WorldWideScience

Sample records for multiple processor architectures

  1. Array processor architecture

    Science.gov (United States)

    Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)

    1983-01-01

    A high speed parallel array data processing architecture fashioned under a computational envelope approach includes a data base memory for secondary storage of programs and data, and a plurality of memory modules interconnected to a plurality of processing modules by a connection network of the Omega gender. Programs and data are fed from the data base memory to the plurality of memory modules, and from there the programs are fed through the connection network to the array of processors (one copy of each program for each processor). Execution of the programs occurs with the processors normally operating quite independently of each other in a multiprocessing fashion. For data-dependent operations and other suitable operations, all processors are instructed to finish one given task or program branch before all are instructed to proceed in parallel processing fashion on the next instruction. Even when functioning in the parallel processing mode, however, the processors are not lock-stepped but execute their own copy of the program individually unless or until another overall processor array synchronization instruction is issued.
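
    The abstract names an Omega connection network but gives no routing details. As a rough illustration only, the sketch below assumes the textbook Omega network (log2 N stages, each a perfect shuffle followed by 2x2 exchange switches) with destination-tag routing; the function names are hypothetical and not taken from the patent.

```python
def perfect_shuffle(pos: int, bits: int) -> int:
    """Rotate a bits-wide line address left by one position (perfect shuffle)."""
    msb = (pos >> (bits - 1)) & 1
    return ((pos << 1) & ((1 << bits) - 1)) | msb


def omega_route(src: int, dst: int, bits: int) -> list[int]:
    """Return the line position after each stage on the path from src to dst.

    Assumed textbook behaviour: at stage i the 2x2 switch forces the low
    address bit to the i-th destination bit, most significant bit first.
    """
    pos = src
    path = []
    for stage in range(bits):
        pos = perfect_shuffle(pos, bits)
        dst_bit = (dst >> (bits - 1 - stage)) & 1
        pos = (pos & ~1) | dst_bit
        path.append(pos)
    return path


if __name__ == "__main__":
    # Route from memory module 2 to processor 5 in an 8-line network.
    print(omega_route(2, 5, bits=3))
```

    Under these assumptions every source reaches its destination after exactly log2 N stages, which is why such a network can connect each memory module to each processor with far fewer switches than a full crossbar.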

  2. Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors

    OpenAIRE

    Catalán, Sandra; Igual, Francisco D.; Mayo, Rafael; Rodríguez-Sánchez, Rafael; Quintana-Ortí, Enrique S.

    2015-01-01

    Asymmetric multicore processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. In addition, given the growing interest in low-power high performance computing, this type of architecture is also being investigated as a means to improve the throughput-per-Watt of complex scientific applications. In this paper, we design and embed several architecture-aware ...

  3. HONEI: A collection of libraries for numerical computations targeting multiple processor architectures

    Science.gov (United States)

    van Dyk, Danny; Geveler, Markus; Mallach, Sven; Ribbrock, Dirk; Göddeke, Dominik; Gutwenger, Carsten

    2009-12-01

    We present HONEI, an open-source collection of libraries offering a hardware-oriented approach to numerical calculations. HONEI abstracts the hardware, and applications written on top of HONEI can be executed on a wide range of computer architectures such as CPUs, GPUs and the Cell processor. We demonstrate the flexibility and performance of our approach with two test applications, a Finite Element multigrid solver for the Poisson problem and a robust and fast simulation of shallow water waves. By linking against HONEI's libraries, we achieve a two-fold speedup over straightforward C++ code using HONEI's SSE backend, and additional 3-4 and 4-16 times faster execution on the Cell and a GPU. A second important aspect of our approach is that the full performance capabilities of the hardware under consideration can be exploited by adding optimised application-specific operations to the HONEI libraries. HONEI provides all necessary infrastructure for development and evaluation of such kernels, significantly simplifying their development.

    Program summary
    Program title: HONEI
    Catalogue identifier: AEDW_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEDW_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: GPLv2
    No. of lines in distributed program, including test data, etc.: 216 180
    No. of bytes in distributed program, including test data, etc.: 1 270 140
    Distribution format: tar.gz
    Programming language: C++
    Computer: x86, x86_64, NVIDIA CUDA GPUs, Cell blades and PlayStation 3
    Operating system: Linux
    RAM: at least 500 MB free
    Classification: 4.8, 4.3, 6.1
    External routines: SSE: none; [1] for GPU, [2] for Cell backend
    Nature of problem: Computational science in general and numerical simulation in particular have reached a turning point. The revolution developers are facing is not primarily driven by a change in (problem-specific) methodology, but rather by the fundamental paradigm shift of the

  4. Multiple Embedded Processors for Fault-Tolerant Computing

    Science.gov (United States)

    Bolotin, Gary; Watson, Robert; Katanyoutanant, Sunant; Burke, Gary; Wang, Mandy

    2005-01-01

    A fault-tolerant computer architecture has been conceived in an effort to reduce vulnerability to single-event upsets (spurious bit flips caused by impingement of energetic ionizing particles or photons). As in some prior fault-tolerant architectures, the redundancy needed for fault tolerance is obtained by use of multiple processors in one computer. Unlike prior architectures, the multiple processors are embedded in a single field-programmable gate array (FPGA). What makes this new approach practical is the recent commercial availability of FPGAs that are capable of having multiple embedded processors. A working prototype (see figure) consists of two embedded IBM PowerPC 405 processor cores and a comparator built on a Xilinx Virtex-II Pro FPGA. This relatively simple instantiation of the architecture implements an error-detection scheme. A planned future version, incorporating four processors and two comparators, would correct some errors in addition to detecting them.
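
    The two-processor prototype described above only detects errors. A minimal sketch of that idea follows, assuming two redundant executions of the same step feed a comparator; the helper names and the injected bit flip are hypothetical and stand in for the FPGA hardware, which the abstract does not describe in detail.

```python
def flip_bit(word: int, bit: int) -> int:
    """Model a single-event upset by inverting one bit of a result word."""
    return word ^ (1 << bit)


def compare_outputs(out_a: int, out_b: int) -> bool:
    """Comparator: report True when both processor outputs agree."""
    return out_a == out_b


def run_duplex(step, operand: int, upset_bit=None):
    """Run the same step on two (simulated) processors and compare results.

    Returns (output_of_processor_a, outputs_agree). With only two copies a
    mismatch can be detected but not attributed to either processor.
    """
    out_a = step(operand)
    out_b = step(operand)
    if upset_bit is not None:
        out_b = flip_bit(out_b, upset_bit)  # simulated upset on processor B
    return out_a, compare_outputs(out_a, out_b)


if __name__ == "__main__":
    triple = lambda x: x * 3
    print(run_duplex(triple, 7))               # fault-free run
    print(run_duplex(triple, 7, upset_bit=2))  # run with an injected bit flip
```

    The sketch stops at detection because a pair of copies cannot tell which one is wrong, which is consistent with the abstract reserving error correction for the planned four-processor version.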

  5. An orthogonal wavelet division multiple-access processor architecture for LTE-advanced wireless/radio-over-fiber systems over heterogeneous networks

    Science.gov (United States)

    Mahapatra, Chinmaya; Leung, Victor CM; Stouraitis, Thanos

    2014-12-01

    The increase in internet traffic, number of users, and availability of mobile devices poses a challenge to wireless technologies. In the long-term evolution (LTE) advanced system, heterogeneous networks (HetNet) using centralized coordinated multipoint (CoMP) transmitting radio over optical fibers (LTE A-ROF) have provided a feasible way of satisfying user demands. In this paper, an orthogonal wavelet division multiple-access (OWDMA) processor architecture is proposed, which is shown to be better suited to LTE advanced systems as compared to orthogonal frequency division multiple access (OFDMA) as in LTE systems 3GPP rel.8 (3GPP, http://www.3gpp.org/DynaReport/36300.htm). ROF systems are a viable alternative to satisfy large data demands; hence, the performance in ROF systems is also evaluated. To validate the architecture, the circuit is designed and synthesized on a Xilinx Virtex-6 field-programmable gate array (FPGA). The synthesis results show that the circuit performs with a clock period as short as 7.036 ns (i.e., a maximum clock frequency of 142.13 MHz) for a transform size of 512. A pipelined version of the architecture reduces the power consumption by approximately 89%. We compare our architecture with similar available architectures for resource utilization and timing and provide a performance comparison with OFDMA systems for various quality metrics of communication systems. The OWDMA architecture is found to perform better than OFDMA for bit error rate (BER) performance versus signal-to-noise ratio (SNR) in the wireless channel as well as ROF media. It also gives higher throughput and mitigates the adverse effect of peak-to-average-power ratio (PAPR).
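
    The two timing figures quoted above are related by f = 1/T. The short check below reproduces the conversion; the function name is a hypothetical helper, not part of the paper.

```python
def max_clock_mhz(period_ns: float) -> float:
    """Convert a minimum clock period in nanoseconds to a frequency in MHz."""
    return 1e3 / period_ns  # 1 / (period_ns * 1e-9 s), expressed in MHz


if __name__ == "__main__":
    # 7.036 ns is the synthesized clock period reported in the abstract.
    print(round(max_clock_mhz(7.036), 2))
```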

  6. Optical linear algebra processors - Architectures and algorithms

    Science.gov (United States)

    Casasent, David

    1986-01-01

    Attention is given to the component design and optical configuration features of a generic optical linear algebra processor (OLAP) architecture, as well as the large number of OLAP architectures, number representations, algorithms and applications encountered in current literature. Number-representation issues associated with bipolar and complex-valued data representations, high-accuracy (including floating point) performance, and the base or radix to be employed, are discussed, together with case studies on a space-integrating frequency-multiplexed architecture and a hybrid space-integrating and time-integrating multichannel architecture.

  7. Architectural design and analysis of a programmable image processor

    International Nuclear Information System (INIS)

    Siyal, M.Y.; Chowdhry, B.S.; Rajput, A.Q.K.

    2003-01-01

    In this paper we present an architectural design and analysis of a programmable image processor, nicknamed Snake. The processor was designed with a high degree of parallelism to speed up a range of image processing operations. Data parallelism found in array processors has been included into the architecture of the proposed processor. The implementation of commonly used image processing algorithms and their performance evaluation are also discussed. The performance of Snake is also compared with other types of processor architectures. (author)

  8. Novel memory architecture for video signal processor

    Science.gov (United States)

    Hung, Jen-Sheng; Lin, Chia-Hsing; Jen, Chein-Wei

    1993-11-01

    An on-chip memory architecture for a video signal processor (VSP) is proposed. This memory structure is a two-level design for the different data locality in video applications. The upper level, Memory A, provides enough storage capacity to reduce the impact of the limited chip I/O bandwidth, and the lower level, Memory B, provides enough data parallelism and flexibility to meet the requirements of multiple reconfigurable pipeline function units in a single VSP chip. The needed memory size is decided by the memory usage analysis for video algorithms and the number of function units. Both levels of memory adopt a dual-port memory scheme to sustain simultaneous read and write operations. In particular, Memory B uses multiple one-read-one-write memory banks to emulate a true multiport memory. Therefore, one can change the configuration of Memory B to several sets of memories with variable read/write ports by adjusting the bus switches. The numbers of read ports and write ports in the proposed memory can then meet the requirements of data flow patterns in different video coding algorithms. We have finished the design of a prototype memory using 1.2-micrometer SPDM SRAM technology and will fabricate it through TSMC in Taiwan.

  9. Acoustooptic linear algebra processors - Architectures, algorithms, and applications

    Science.gov (United States)

    Casasent, D.

    1984-01-01

    Architectures, algorithms, and applications for systolic processors are described with attention to the realization of parallel algorithms on various optical systolic array processors. Systolic processors for matrices with special structure and matrices of general structure, and the realization of matrix-vector, matrix-matrix, and triple-matrix products on such architectures, are described. Parallel algorithms for direct and indirect solutions to systems of linear algebraic equations and their implementation on optical systolic processors are detailed with attention to the pipelining and flow of data and operations. Parallel algorithms and their optical realization for LU and QR matrix decomposition are specifically detailed. These represent the fundamental operations necessary in the implementation of least squares, eigenvalue, and SVD solutions. Specific applications (e.g., the solution of partial differential equations, adaptive noise cancellation, and optimal control) are described to typify the use of matrix processors in modern advanced signal processing.

  10. Reducing Competitive Cache Misses in Modern Processor Architectures

    OpenAIRE

    Prisagjanec, Milcho; Mitrevski, Pece

    2017-01-01

    The increasing number of threads inside the cores of a multicore processor, and competitive access to the shared cache memory, become the main reasons for an increased number of competitive cache misses and performance decline. Inevitably, the development of modern processor architectures leads to an increased number of cache misses. In this paper, we make an attempt to implement a technique for decreasing the number of competitive cache misses in the first level of cache memory. This tec...

  11. Considerations for control system software verification and validation specific to implementations using distributed processor architectures

    International Nuclear Information System (INIS)

    Munro, J.K. Jr.

    1993-01-01

    Until recently, digital control systems have been implemented on centralized processing systems to function in one of several ways: (1) as a single processor control system; (2) as a supervisor at the top of a hierarchical network of multiple processors; or (3) in a client-server mode. Each of these architectures uses a very different set of communication protocols. The latter two architectures also belong to the category of distributed control systems. Distributed control systems can have a central focus, as in the cases just cited, or be quite decentralized in a loosely coupled, shared responsibility arrangement. This last architecture is analogous to autonomous hosts on a local area network. Each of the architectures identified above will have a different set of architecture-associated issues to be addressed in the verification and validation activities during software development. This paper summarizes results of efforts to identify, describe, contrast, and compare these issues

  12. Processor-in-memory-and-storage architecture

    Science.gov (United States)

    DeBenedictis, Erik

    2018-01-02

    A method and apparatus for performing reliable general-purpose computing. Each sub-core of a plurality of sub-cores of a processor core processes a same instruction at a same time. A code analyzer receives a plurality of residues that represents a code word corresponding to the same instruction and an indication of whether the code word is a memory address code or a data code from the plurality of sub-cores. The code analyzer determines whether the plurality of residues are consistent or inconsistent. The code analyzer and the plurality of sub-cores perform a set of operations based on whether the code word is a memory address code or a data code and a determination of whether the plurality of residues are consistent or inconsistent.

  13. Scalable architecture for a room temperature solid-state quantum information processor.

    Science.gov (United States)

    Yao, N Y; Jiang, L; Gorshkov, A V; Maurer, P C; Giedke, G; Cirac, J I; Lukin, M D

    2012-04-24

    The realization of a scalable quantum information processor has emerged over the past decade as one of the central challenges at the interface of fundamental science and engineering. Here we propose and analyse an architecture for a scalable, solid-state quantum information processor capable of operating at room temperature. Our approach is based on recent experimental advances involving nitrogen-vacancy colour centres in diamond. In particular, we demonstrate that the multiple challenges associated with operation at ambient temperature, individual addressing at the nanoscale, strong qubit coupling, robustness against disorder and low decoherence rates can be simultaneously achieved under realistic, experimentally relevant conditions. The architecture uses a novel approach to quantum information transfer and includes a hierarchy of control at successive length scales. Moreover, it alleviates the stringent constraints currently limiting the realization of scalable quantum processors and will provide fundamental insights into the physics of non-equilibrium many-body quantum systems.

  14. Optical chirp z-transform processor with a simplified architecture.

    Science.gov (United States)

    Ngo, Nam Quoc

    2014-12-29

    Using a simplified chirp z-transform (CZT) algorithm based on the discrete-time convolution method, this paper presents the synthesis of a simplified architecture of a reconfigurable optical chirp z-transform (OCZT) processor based on the silica-based planar lightwave circuit (PLC) technology. In the simplified architecture of the reconfigurable OCZT, the required number of optical components is small and there are no waveguide crossings which make fabrication easy. The design of a novel type of optical discrete Fourier transform (ODFT) processor as a special case of the synthesized OCZT is then presented to demonstrate its effectiveness. The designed ODFT can be potentially used as an optical demultiplexer at the receiver of an optical fiber orthogonal frequency division multiplexing (OFDM) transmission system.
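
    The statement that the DFT is a special case of the chirp z-transform can be checked numerically. The sketch below evaluates the CZT directly from its definition, X[k] = sum over n of x[n] * A^(-n) * W^(nk), rather than through the discrete-time convolution form the paper uses; the function names are hypothetical.

```python
import cmath


def czt(x, m, w, a):
    """Direct chirp z-transform: m output points along the spiral a * w**(-k)."""
    return [
        sum(x[n] * a ** (-n) * w ** (n * k) for n in range(len(x)))
        for k in range(m)
    ]


def dft(x):
    """Reference discrete Fourier transform, evaluated from its definition."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


if __name__ == "__main__":
    samples = [1, 2, 3, 4]
    n_pts = len(samples)
    # Choosing A = 1 and W = exp(-2*pi*i/N) with M = N reduces the CZT to the DFT.
    w = cmath.exp(-2j * cmath.pi / n_pts)
    for c, d in zip(czt(samples, n_pts, w, 1 + 0j), dft(samples)):
        print(abs(c - d) < 1e-9)
```

    The direct form costs O(NM) operations; the convolution formulation trades that for a structure that maps onto delay lines and couplers, which is presumably what makes it attractive for an optical implementation.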

  15. Behavioral Simulation and Performance Evaluation of Multi-Processor Architectures

    Directory of Open Access Journals (Sweden)

    Ausif Mahmood

    1996-01-01

    The development of multi-processor architectures requires extensive behavioral simulations to verify the correctness of design and to evaluate its performance. A high level language can provide maximum flexibility in this respect if the constructs for handling concurrent processes and a time mapping mechanism are added. This paper describes a novel technique for emulating hardware processes involved in a parallel architecture such that an object-oriented description of the design is maintained. The communication and synchronization between hardware processes are handled by splitting the processes into their equivalent subprograms at the entry points. The proper scheduling of these subprograms is coordinated by a timing wheel which provides a time mapping mechanism. Finally, a high level language pre-processor is proposed so that the timing wheel and the process emulation details can be made transparent to the user.
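
    The timing-wheel idea can be illustrated with a small discrete-event loop. This is only a sketch under stated assumptions: Python generators stand in for processes split into subprograms at their entry points, and a priority queue stands in for the timing wheel; none of the names come from the paper.

```python
import heapq


def clocked(period, ticks):
    """A simulated hardware process: each resume is one 'subprogram' that
    asks to be woken again after `period` time units."""
    for _ in range(ticks):
        yield period


def run(processes):
    """Resume processes in simulated-time order and log (time, name) pairs."""
    log = []
    wheel = []  # entries: (wake_time, sequence_number, name, generator)
    seq = 0     # tie-breaker so generators are never compared
    for name, gen in processes:
        heapq.heappush(wheel, (0, seq, name, gen))
        seq += 1
    while wheel:
        now, _, name, gen = heapq.heappop(wheel)
        try:
            delay = next(gen)  # run the process up to its next entry point
        except StopIteration:
            continue           # process has finished; drop it
        log.append((now, name))
        heapq.heappush(wheel, (now + delay, seq, name, gen))
        seq += 1
    return log


if __name__ == "__main__":
    for entry in run([("cpu", clocked(2, 3)), ("mem", clocked(3, 2))]):
        print(entry)
```

    Keeping the scheduler separate from the process bodies mirrors the paper's goal of hiding timing-wheel details from the user, here achieved by the generator protocol rather than by a pre-processor.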

  16. FPGA Based Intelligent Co-operative Processor in Memory Architecture

    DEFF Research Database (Denmark)

    Ahmed, Zaki; Sotudeh, Reza; Hussain, Dil Muhammad Akbar

    2011-01-01

    In a continuing effort to improve computer system performance, Processor-In-Memory (PIM) architecture has emerged as an alternative solution. PIM architecture incorporates computational units and control logic directly on the memory to provide immediate access to the data. To exploit the potential benefits of PIM, a concept of Co-operative Intelligent Memory (CIM) was developed by the intelligent system group of University of Hertfordshire, based on the previously developed Co-operative Pseudo Intelligent Memory (CPIM). This paper provides an overview on previous works (CPIM, CIM) and realization...

  17. Advanced Avionics and Processor Systems for a Flexible Space Exploration Architecture

    Science.gov (United States)

    Keys, Andrew S.; Adams, James H.; Smith, Leigh M.; Johnson, Michael A.; Cressler, John D.

    2010-01-01

    The Advanced Avionics and Processor Systems (AAPS) project, formerly known as the Radiation Hardened Electronics for Space Environments (RHESE) project, endeavors to develop advanced avionic and processor technologies anticipated to be used by NASA's currently evolving space exploration architectures. The AAPS project is a part of the Exploration Technology Development Program, which funds an entire suite of technologies that are aimed at enabling NASA's ability to explore beyond low earth orbit. NASA's Marshall Space Flight Center (MSFC) manages the AAPS project. AAPS uses a broad-scoped approach to developing avionic and processor systems. Investment areas include advanced electronic designs and technologies capable of providing environmental hardness, reconfigurable computing techniques, software tools for radiation effects assessment, and radiation environment modeling tools. Near-term emphasis within the multiple AAPS tasks focuses on developing prototype components using semiconductor processes and materials (such as Silicon-Germanium (SiGe)) to enhance a device's tolerance to radiation events and low temperature environments. As the SiGe technology will culminate in a delivered prototype this fiscal year, the project emphasis shifts its focus to developing low-power, high efficiency total processor hardening techniques. In addition to processor development, the project endeavors to demonstrate techniques applicable to reconfigurable computing and partially reconfigurable Field Programmable Gate Arrays (FPGAs). This capability enables avionic architectures the ability to develop FPGA-based, radiation tolerant processor boards that can serve in multiple physical locations throughout the spacecraft and perform multiple functions during the course of the mission. The individual tasks that comprise AAPS are diverse, yet united in the common endeavor to develop electronics capable of operating within the harsh environment of space. Specifically, the AAPS tasks for

  18. Clock generators for SOC processors circuits and architectures

    CERN Document Server

    Fahim, Amr

    2004-01-01

    This book explores the design of fully-integrated frequency synthesizers suitable for system-on-a-chip (SOC) processors. The text takes a more global design perspective in jointly examining the design space at the circuit level as well as at the architectural level. The comprehensive coverage includes summary chapters on circuit theory as well as feedback control theory relevant to the operation of phase locked loops (PLLs). On the circuit level, the discussion includes low-voltage analog design in deep submicron digital CMOS processes and the effects of supply noise, substrate noise, and device noise. On the architectural level, the discussion includes PLL analysis using continuous-time as well as discrete-time models, linear and nonlinear effects of PLL performance, and detailed analysis of locking behavior. The book provides numerous real world applications, as well as practical rules-of-thumb for modern designers to use at the system, architectural, and circuit levels.

  19. Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures

    Science.gov (United States)

    Manolakos, Elias S.

    2015-01-01

    Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub. PMID:26605332

  20. Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures.

    Science.gov (United States)

    Sharma, Anuj; Manolakos, Elias S

    2015-01-01

    Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub.
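
    The efficiency figure in the abstract follows from dividing the reported speedup by the number of cores. A minimal check, with a hypothetical helper name:

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Parallel efficiency = achieved speedup / number of cores used."""
    return speedup / cores


if __name__ == "__main__":
    # Speedup of 42 on the 48-core SCC, as reported in the abstract.
    print(parallel_efficiency(42, 48))  # 0.875, which the abstract rounds to 0.9
```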

  1. FY1995 study of design methodology and environment of high-performance processor architectures; 1995 nendo koseino processor architecture sekkeiho to sekkei kankyo no kenkyu

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1997-03-01

    The aim of our project is to develop high-performance processor architectures for both general-purpose and application-specific use. We also plan to develop basic software, such as compilers, and various design aid tools for those architectures. We are particularly interested in performance evaluation at the architecture design phase, design optimization, automatic generation of compilers from processor designs, and architecture design methodologies combined with circuit layout. We have investigated both microprocessor architectures and design methodologies and environments for the processors. Our goal is to establish design technologies for high-performance, low-power, low-cost and highly reliable systems in the system-on-silicon era. We have proposed the PPRAM architecture for high-performance systems using DRAM and logic mixture technology, the Softcore processor architecture for special-purpose processors in embedded systems, and the Power-Pro architecture for low-power systems. We also developed design methodologies and design environments for the above architectures as well as a new method for design verification of microprocessors. (NEDO)

  2. Directions in parallel processor architecture, and GPUs too

    CERN Multimedia

    CERN. Geneva

    2014-01-01

    Modern computing is power-limited in every domain of computing. Performance increments extracted from instruction-level parallelism (ILP) are no longer power-efficient; they haven't been for some time. Thread-level parallelism (TLP) is a more easily exploited form of parallelism, at the expense of programmer effort to expose it in the program. In this talk, I will introduce you to disparate topics in parallel processor architecture that will impact programming models (and you) in both the near and far future. About the speaker: Olivier is a senior GPU (SM) architect at NVIDIA and an active participant in the concurrency working group of the ISO C++ committee. He has also worked on very large diesel engines as a mechanical engineer, and taught at McGill University (Canada) as a faculty instructor.

  3. Introduction to programming multiple-processor computers

    International Nuclear Information System (INIS)

    Hicks, H.R.; Lynch, V.E.

    1985-04-01

    FORTRAN applications programs can be executed on multiprocessor computers in either a unitasking (traditional) or multitasking form. The latter allows a single job to use more than one processor simultaneously, with a consequent reduction in wall-clock time and, perhaps, the cost of the calculation. An introduction to programming in this environment is presented. The concepts of synchronization and data sharing using EVENTS and LOCKS are illustrated with examples. The strategy of strong synchronization and the use of synchronization templates are proposed. We emphasize that incorrect multitasking programs can produce irreproducible results, which makes debugging more difficult
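
    The report illustrates synchronization and data sharing with FORTRAN EVENTS and LOCKS. As a loose analogue only, the sketch below uses Python's `threading.Event` and `threading.Lock`; the function name and task structure are assumptions for illustration and do not reproduce the report's examples.

```python
import threading


def run_multitasked(n_tasks: int = 4, increments: int = 1000) -> int:
    """Start n_tasks workers that all update one shared counter."""
    total = 0
    lock = threading.Lock()    # LOCK analogue: guards the shared counter
    start = threading.Event()  # EVENT analogue: releases all tasks together

    def task():
        nonlocal total
        start.wait()           # block until the event is posted
        for _ in range(increments):
            with lock:         # only one task updates the counter at a time
                total += 1

    threads = [threading.Thread(target=task) for _ in range(n_tasks)]
    for t in threads:
        t.start()
    start.set()                # post the event
    for t in threads:
        t.join()               # wait for every task to finish
    return total


if __name__ == "__main__":
    print(run_multitasked())
```

    Removing the lock would turn the update into an unsynchronized read-modify-write, the kind of error the report warns can produce irreproducible results.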

  4. Array processors: an introduction to their architecture, software, and applications in nuclear medicine

    International Nuclear Information System (INIS)

    King, M.A.; Doherty, P.W.; Rosenberg, R.J.; Cool, S.L.

    1983-01-01

    Array processors are "number crunchers" that dramatically enhance the processing power of nuclear medicine computer systems for applications dealing with the repetitive operations involved in digital image processing of large segments of data. The general architecture and the programming of array processors are introduced, along with some applications of array processors to the reconstruction of emission tomographic images, digital image enhancement, and functional image formation.

  5. Architecture and VHDL behavioural validation of a parallel processor dedicated to computer vision

    International Nuclear Information System (INIS)

    Collette, Thierry

    1992-01-01

    Image processing is mainly accelerated by using parallel computers; SIMD processors (single instruction stream, multiple data stream) have been developed and have proven highly efficient for low-level image processing operations. Nevertheless, their performance drops for most intermediate- or high-level operations, mainly when random data reorganisations in processor memories are involved. The aim of this thesis was to extend the capabilities of the SIMD computer so that it performs more efficiently at the intermediate level of image processing. The study of some representative algorithms of this class points out the limits of this computer. Nevertheless, these limits can be removed by architectural modifications. This leads us to propose SYMPATIX, a new SIMD parallel computer. To validate its new concept, a behavioural model written in VHDL (a hardware description language) has been elaborated. With this model, the performance of the new computer has been estimated by running simulations of image processing algorithms. The VHDL modelling approach allows top-down electronic design of the system, giving an easy coupling between architectural modifications of the system and their electronic cost. The obtained results show SYMPATIX to be an efficient computer for low- and intermediate-level image processing. It can be connected to a high-level computer, opening up the development of new computer vision applications. This thesis also presents a top-down design method, based on VHDL, intended for electronic system architects. (author)

  6. The architecture of a video image processor for the space station

    Science.gov (United States)

    Yalamanchili, S.; Lee, D.; Fritze, K.; Carpenter, T.; Hoyme, K.; Murray, N.

    1987-01-01

    The architecture of a video image processor for space station applications is described. The architecture was derived from a study of the requirements of algorithms that are necessary to produce the desired functionality of many of these applications. Architectural options were selected based on a simulation of the execution of these algorithms on various architectural organizations. A great deal of emphasis was placed on the ability of the system to evolve and grow over the lifetime of the space station. The result is a hierarchical parallel architecture that is characterized by high level language programmability, modularity, extensibility and can meet the required performance goals.

  7. Tinuso: A processor architecture for a multi-core hardware simulation platform

    DEFF Research Database (Denmark)

    Schleuniger, Pascal; Karlsson, Sven

    2010-01-01

    Multi-core systems have the potential to improve performance, energy and cost properties of embedded systems but also require new design methods and tools to take advantage of the new architectures. Due to the limited accuracy and performance of pure software simulators, we are working on a cycle...... accurate hardware simulation platform. We have developed the Tinuso processor architecture for this platform. Tinuso is a processor architecture optimized for FPGA implementation. The instruction set makes use of predicated instructions and supports C/C++ and assembly language programming. It is designed...... to be easily extendable to maintain the flexibility required for the research on multi-core systems. Tinuso contains a co-processor interface to connect to a network interface. This interface allows for communication over an on-chip network. A clock frequency estimation study on a deeply pipelined Tinuso...

  8. ARTiS, an Asymmetric Real-Time Scheduler for Linux on Multi-Processor Architectures

    OpenAIRE

    Piel, Éric; Marquet, Philippe; Soula, Julien; Osuna, Christophe; Dekeyser, Jean-Luc

    2005-01-01

    The ARTiS system is a real-time extension of the GNU/Linux scheduler dedicated to SMP (Symmetric Multi-Processor) systems. It allows High Performance Computing and real-time workloads to be mixed. ARTiS exploits the SMP architecture to guarantee the preemption of a processor when the system has to schedule a real-time task. The implementation is available as a modification of the Linux kernel, focusing especially on (but not restricted to) the IA-64 architecture. The basic idea of ARTiS is to assign a selected se...

  9. Comparison between research data processing capabilities of AMD and NVIDIA architecture-based graphic processors

    International Nuclear Information System (INIS)

    Dudnik, V.A.; Kudryavtsev, V.I.; Us, S.A.; Shestakov, M.V.

    2015-01-01

    A comparative analysis has been made of the capabilities of the hardware and software tools of the two most widely used modern graphics processor architectures (AMD and NVIDIA). Special features and differences of the GPU architectures are exemplified by fragments of GPGPU programs. The time required for program development has been estimated. Advice is given on the optimum choice of GPU type for speeding up the processing of scientific research results. Recommendations are formulated for the use of software tools that reduce the time of GPGPU application programming for the given types of graphics processors.

  10. A scalable single-chip multi-processor architecture with on-chip RTOS kernel

    NARCIS (Netherlands)

    Theelen, B.D.; Verschueren, A.C.; Reyes Suarez, V.V.; Stevens, M.P.J.; Nunez, A.

    2003-01-01

    Now that system-on-chip technology is emerging, single-chip multi-processors are becoming feasible. A key problem of designing such systems is the complexity of their on-chip interconnects and memory architecture. It is furthermore unclear at what level software should be integrated. An example of a

  11. Extending and implementing the Self-adaptive Virtual Processor for distributed memory architectures

    NARCIS (Netherlands)

    van Tol, M.W.; Koivisto, J.

    2011-01-01

    Many-core architectures of the future are likely to have distributed memory organizations and need fine grained concurrency management to be used effectively. The Self-adaptive Virtual Processor (SVP) is an abstract concurrent programming model which can provide this, but the model and its current

  12. Reversible machine code and its abstract processor architecture

    DEFF Research Database (Denmark)

    Axelsen, Holger Bock; Glück, Robert; Yokoyama, Tetsuo

    2007-01-01

    A reversible abstract machine architecture and its reversible machine code are presented and formalized. For machine code to be reversible, both the underlying control logic and each instruction must be reversible. A general class of machine instruction sets was proven to be reversible, building...

  13. Adaptive Optoelectronic Eyes: Hybrid Sensor/Processor Architectures

    Science.gov (United States)

    2006-11-13

    large arrays of GaAs multiple quantum well (MQW) modulator-arrays to CMOS circuits [Goossen, 1995]. By using a relatively simple flip-chip bonding...WPAFB and developed interactions with the Army Research Laboratory (Dr. Richard Leavitt) in the context of IR detectors. Furthermore, Prof. Madhukar was

  14. Examining the volume efficiency of the cortical architecture in a multi-processor network model.

    Science.gov (United States)

    Ruppin, E; Schwartz, E L; Yeshurun, Y

    1993-01-01

    The convoluted form of the sheet-like mammalian cortex naturally raises the question of whether there is a simple geometrical reason for the prevalence of cortical architecture in the brains of higher vertebrates. Addressing this question, we present a formal analysis of the volume occupied by a massively connected network of processors (neurons) and then consider the pertinent cortical data. Three gross macroscopic features of cortical organization are examined: the segregation of white and gray matter, the circumferential organization of the gray matter around the white matter, and the folded cortical structure. Our results testify to the efficiency of cortical architecture.

  15. A Workload-Adaptive and Reconfigurable Bus Architecture for Multicore Processors

    Directory of Open Access Journals (Sweden)

    Shoaib Akram

    2010-01-01

    Interconnection networks for multicore processors are traditionally designed to serve a diversity of workloads. However, different workloads, or even different execution phases of the same workload, may benefit from different interconnect configurations. In this paper, we first motivate the need for workload-adaptive interconnection networks. Subsequently, we describe an interconnection network framework based on reconfigurable switches for use in medium-scale (up to 32 cores) shared memory multicore processors. Our cost-effective reconfigurable interconnection network is implemented on a traditional shared bus interconnect with snoopy-based coherence, and it enables improved multicore performance. The proposed interconnect architecture distributes the cores of the processor into clusters, with reconfigurable logic between clusters to support workload-adaptive policies for inter-cluster communication. Our interconnection scheme is complemented by interconnect-aware scheduling and additional interconnect optimizations which help boost the performance of multiprogramming and multithreaded workloads. We provide experimental results that show that the overall throughput of multiprogramming workloads (consisting of two and four programs) can be improved by up to 60% with our configurable bus architecture. Similar gains can also be achieved for multithreaded applications, as shown by further experiments. Finally, we present the performance sensitivity of the proposed interconnect architecture to shared memory bandwidth availability.

  16. Optimizing Vector-Quantization Processor Architecture for Intelligent Query-Search Applications

    Science.gov (United States)

    Xu, Huaiyu; Mita, Yoshio; Shibata, Tadashi

    2002-04-01

    The architecture of a very large scale integration (VLSI) vector-quantization processor (VQP) has been optimized to develop a general-purpose intelligent query-search agent. The agent performs a similarity-based search in a large-volume database. Although similarity-based search processing is computationally very expensive, latency-free searches have become possible due to the highly parallel maximum-likelihood search architecture of the VQP chip. Three architectures of the VQP chip have been studied and their performances are compared. In order to give reasonable searching results according to the different policies, the concept of penalty function has been introduced into the VQP. An E-commerce real-estate agency system has been developed using the VQP chip implemented in a field-programmable gate array (FPGA) and the effectiveness of such an agency system has been demonstrated.
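
    The similarity-based search with a penalty function described above can be sketched in software as follows. This is only an illustrative sketch, not the VQP chip's method: the Manhattan distance, the additive per-entry penalty, and the names `nearest_codeword`, `codebook` and `penalty` are all assumptions made for the example.

```python
def nearest_codeword(query, codebook, penalty=None):
    """Return (index, cost) of the stored vector most similar to `query`.

    Assumptions for illustration: similarity is the Manhattan distance,
    and `penalty` is an optional list of per-entry costs added to the
    distance to express a search policy.
    """
    best_index, best_cost = -1, float("inf")
    for i, code in enumerate(codebook):
        # Distance between the query and the i-th stored vector.
        cost = sum(abs(q - c) for q, c in zip(query, code))
        if penalty is not None:
            cost += penalty[i]
        # Strict comparison keeps the first entry on ties.
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index, best_cost


codebook = [[0, 0], [10, 10], [5, 5]]
print(nearest_codeword([6, 4], codebook))                      # closest entry wins
print(nearest_codeword([6, 4], codebook, penalty=[0, 0, 20]))  # policy changes the winner
```

    A hardware implementation would evaluate all entries in parallel; the sequential loop here only shows what is being computed for each entry.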

  17. Real time image synthesis on a SIMD linear array processor: algorithms and architectures

    International Nuclear Information System (INIS)

    Letellier, Laurent

    1993-01-01

    Nowadays, image synthesis has become a widely used technique. The impressive computing power required for real time applications necessitates the use of parallel architectures. In this context, we evaluate an SIMD linear parallel architecture, SYMPATI2, dedicated to image processing. The objective of this study is to propose a cost-effective graphics accelerator relying on SYMPATI2's modular and programmable structure. The parallelization of basic image synthesis algorithms on SYMPATI2 enables us to determine its limits in this application field. These limits lead us to evaluate a new structure with a fast intercommunication network between processors, but processors have to support the message consistency, which brings about a strong decrease in performance. To solve this problem, we suggest a simple network whose access priorities are represented by tokens. The simulations of this new architecture indicate that the SIMD mode causes a drastic cut in parallelism. To cope with this drawback, we propose a context switching procedure which reduces the SIMD rigidity and increases the parallelism rate significantly. Then, the graphics accelerator we propose is compared with existing graphics workstations. This comparison indicates that our structure, which is able to accelerate both image synthesis and image processing, is competitive and well-suited for multimedia applications. (author)

  18. Heterogeneous reconfigurable processors for real-time baseband processing from algorithm to architecture

    CERN Document Server

    Zhang, Chenxin; Öwall, Viktor

    2016-01-01

    This book focuses on domain-specific heterogeneous reconfigurable architectures, demonstrating for readers a computing platform which is flexible enough to support multiple standards, multiple modes, and multiple algorithms. The content is multi-disciplinary, covering areas of wireless communication, computing architecture, and circuit design. The platform described provides real-time processing capability with reasonable implementation cost, achieving balanced trade-offs among flexibility, performance, and hardware costs. The authors discuss efficient design methods for wireless communication processing platforms, from both an algorithm and architecture design perspective. Coverage also includes computing platforms for different wireless technologies and standards, including MIMO, OFDM, Massive MIMO, DVB, WLAN, LTE/LTE-A, and 5G. •Discusses reconfigurable architectures, including hardware building blocks such as processing elements, memory sub-systems, Network-on-Chip (NoC), and dynamic hardware reconfigur...

  19. Probabilistic programmable quantum processors with multiple copies of program states

    International Nuclear Information System (INIS)

    Brazier, Adam; Buzek, Vladimir; Knight, Peter L.

    2005-01-01

    We examine the execution of general U(1) transformations on programmable quantum processors. We show that, with only the minimal assumption of availability of copies of the 1-qubit program state, the apparent advantage of existing schemes proposed by G. Vidal et al. [Phys. Rev. Lett. 88, 047905 (2002)] and M. Hillery et al. [Phys. Rev. A 65, 022301 (2003)] to execute a general U(1) transformation with greater probability using complex program states appears not to hold

  20. An introduction to programming multiple-processor computers

    International Nuclear Information System (INIS)

    Hicks, H.R.; Lynch, V.E.

    1986-01-01

    Fortran application programs can be executed on multiprocessor computers in either a unitasking (traditional) or multitasking form. The latter allows a single job to use more than one processor simultaneously, with a consequent reduction in elapsed time and, perhaps, the cost of the calculation. An introduction to programming in this environment is presented. The concepts of synchronization and data sharing using EVENTS and LOCKS are illustrated with examples. The strategy of strong synchronization and the use of synchronization templates are proposed. We emphasize that incorrect multitasking programs can produce irreproducible results, which makes debugging more difficult.
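
    The roles of events and locks in a multitasking program can be sketched as follows. The record describes Fortran; this sketch uses Python's `threading.Event` and `threading.Lock` as rough analogues, and the function `run_tasks` with its parameters is hypothetical.

```python
import threading


def run_tasks(num_workers=4, increments=1000):
    """Start several tasks on an event and protect shared data with a lock."""
    total = 0
    lock = threading.Lock()    # guards the shared counter
    start = threading.Event()  # signals that all tasks may proceed

    def worker():
        nonlocal total
        start.wait()           # block until the event is posted
        for _ in range(increments):
            with lock:         # only one task updates the counter at a time
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    start.set()                # post the event: release every waiting task
    for t in threads:
        t.join()               # wait for all tasks to finish
    return total


print(run_tasks())
```

    Without the lock, the read-modify-write on `total` could interleave between tasks and the result could vary from run to run, which is the kind of irreproducible behaviour that makes such programs hard to debug.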

  1. A Scalable Multicore Architecture With Heterogeneous Memory Structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs).

    Science.gov (United States)

    Moradi, Saber; Qiao, Ning; Stefanini, Fabio; Indiveri, Giacomo

    2018-02-01

    Neuromorphic computing systems comprise networks of neurons that use asynchronous events for both computation and communication. This type of representation offers several advantages in terms of bandwidth and power consumption in neuromorphic electronic systems. However, managing the traffic of asynchronous events in large scale systems is a daunting task, both in terms of circuit complexity and memory requirements. Here, we present a novel routing methodology that employs both hierarchical and mesh routing strategies and combines heterogeneous memory structures for minimizing both memory requirements and latency, while maximizing programming flexibility to support a wide range of event-based neural network architectures, through parameter configuration. We validated the proposed scheme in a prototype multicore neuromorphic processor chip that employs hybrid analog/digital circuits for emulating synapse and neuron dynamics together with asynchronous digital circuits for managing the address-event traffic. We present a theoretical analysis of the proposed connectivity scheme, describe the methods and circuits used to implement such scheme, and characterize the prototype chip. Finally, we demonstrate the use of the neuromorphic processor with a convolutional neural network for the real-time classification of visual symbols being flashed to a dynamic vision sensor (DVS) at high speed.

  2. Design and evaluation of an architecture for a digital signal processor for instrumentation applications

    Science.gov (United States)

    Fellman, Ronald D.; Kaneshiro, Ronald T.; Konstantinides, Konstantinos

    1990-03-01

    The authors present the design and evaluation of an architecture for a monolithic, programmable, floating-point digital signal processor (DSP) for instrumentation applications. An investigation of the most commonly used algorithms in instrumentation led to a design that satisfies the requirements for high computational and I/O (input/output) throughput. In the arithmetic unit, a 16- x 16-bit multiplier and a 32-bit accumulator provide the capability for single-cycle multiply/accumulate operations, and three format adjusters automatically adjust the data format for increased accuracy and dynamic range. An on-chip I/O unit is capable of handling data block transfers through a direct memory access port and real-time data streams through a pair of parallel I/O ports. I/O operations and program execution are performed in parallel. In addition, the processor includes two data memories with independent addressing units, a microsequencer with instruction RAM, and multiplexers for internal data redirection. The authors also present the structure and implementation of a design environment suitable for the algorithmic, behavioral, and timing simulation of a complete DSP system. Various benchmarking results are reported.
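
    The multiply/accumulate operation built from a 16 x 16-bit multiplier and a 32-bit accumulator can be modelled in a few lines. This is a behavioural sketch under assumptions the abstract does not state: operands are treated as signed two's-complement integers and the accumulator wraps on overflow. The helper names are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def mac(acc, a, b):
    """One multiply/accumulate step: 16-bit operands, 32-bit accumulator.

    Assumption: overflow wraps around instead of saturating.
    """
    product = to_signed(a, 16) * to_signed(b, 16)
    return to_signed(acc + product, 32)


def dot_product(xs, ys):
    """Accumulate a sum of products, one MAC step per pair of samples."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc


print(dot_product([1, 2, 3], [4, 5, 6]))
```

    A dot product like this is the inner loop of filters and transforms, which is why a single-cycle MAC matters for computational throughput.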

  3. Periodic Application of Concurrent Error Detection in Processor Array Architectures. PhD. Thesis -

    Science.gov (United States)

    Chen, Paul Peichuan

    1993-01-01

    Processor arrays can provide an attractive architecture for some applications. Featuring modularity, regular interconnection and high parallelism, such arrays are well-suited for VLSI/WSI implementations, and applications with high computational requirements, such as real-time signal processing. Preserving the integrity of results can be of paramount importance for certain applications. In these cases, fault tolerance should be used to ensure reliable delivery of a system's service. One aspect of fault tolerance is the detection of errors caused by faults. Concurrent error detection (CED) techniques offer the advantage that transient and intermittent faults may be detected with greater probability than with off-line diagnostic tests. Applying time-redundant CED techniques can reduce hardware redundancy costs. However, most time-redundant CED techniques degrade a system's performance.
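
    The idea behind time-redundant concurrent error detection can be sketched as: run the same operation twice on the same hardware and compare the results. The sketch below is a generic illustration, not the scheme developed in the thesis. The function `run_with_ced` and its fault-injection parameter are hypothetical, and the simulated fault assumes integer results.

```python
def run_with_ced(op, *args, inject_fault_on_run=None):
    """Execute `op` twice and raise if the two results disagree.

    `inject_fault_on_run` (0 or 1) flips the lowest bit of that run's
    integer result to simulate a transient fault; it exists only to
    demonstrate detection.
    """
    results = []
    for run in range(2):
        value = op(*args)
        if run == inject_fault_on_run:
            value ^= 1  # simulated transient fault
        results.append(value)
    if results[0] != results[1]:
        raise RuntimeError("concurrent error detection: results differ")
    return results[0]


print(run_with_ced(lambda a, b: a + b, 2, 3))
```

    Every checked operation is executed twice, so no extra hardware is needed but the computation takes roughly twice as long, which illustrates the performance degradation the abstract attributes to most time-redundant techniques.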

  4. Soft-core dataflow processor architecture optimised for radar signal processing: Article

    CSIR Research Space (South Africa)

    Broich, R

    2014-10-01

    Current radar signal processors lack either performance or flexibility. Custom soft-core processors exhibit potential in high-performance signal processing applications, yet remain relatively unexplored in research literature. In this paper, we use...

  5. mAgic-FPU and MADE: A customizable VLIW core and the modular VLIW processor architecture description environment

    Science.gov (United States)

    Paolucci, Pier S.; Kajfasz, Philippe; Bonnot, Philippe; Candaele, Bernard; Maufroid, Daniel; Pastorelli, Elena; Ricciardi, Andrea; Fusella, Yves; Guarino, Eugenio

    2001-09-01

    mAgic-FPU is the architecture of a family of VLIW cores for configurable system-level integration of floating- and fixed-point computing power. mAgic customization permits the designer to tune basic parameters, such as the computing power/memory access ratio of the core processor, the number of available arithmetic operations per cycle, the register file size and number of ports, as well as the number of arithmetic operators. The reconfiguration (e.g., of register file size and number of ports, as well as of the number of arithmetic operators) is supported by the software environment MADE (Modular VLIW processor Architecture and Assembler Description Environment). MADE reads an architecture description file and produces a customized assembler-scheduler for the target VLIW architecture, configuring a general-purpose VLIW optimizer-scheduler engine. The mAgic-FPU core architecture satisfies the requirement of portability among silicon foundries. The first members of the mAgic-FPU core family fit the requirements of 'Smart Antenna for Adaptive Beam-Forming processing' and 'Physical Sound Synthesis'. The first 1 GigaFlops mAgic core will run at 100 MHz within an area of 40 mm² in 0.25 μm ATMEL CMOS technology in the first half of 2002.

  6. SAD PROCESSOR FOR MULTIPLE MACROBLOCK MATCHING IN FAST SEARCH VIDEO MOTION ESTIMATION

    Directory of Open Access Journals (Sweden)

    Nehal N. Shah

    2015-02-01

    Motion estimation is a very important but computationally complex task in video coding. The process of determining motion vectors based on the temporal correlation of consecutive frames is used for video compression. In order to reduce the computational complexity of motion estimation and maintain the quality of encoding during motion compensation, different fast search techniques are available. These block-based motion estimation algorithms use the sum of absolute differences (SAD) between the corresponding macroblock in the current frame and all the candidate macroblocks in the reference frame to identify the best match. Existing implementations can perform SAD between two blocks using a sequential or pipelined approach, but performing multi-operand SAD in a single clock cycle with optimized resources is state of the art. In this paper, various parallel architectures for computation of the fixed-block-size SAD are evaluated and a fast parallel SAD architecture is proposed with optimized resources. Further, a SAD processor is described with 9 processing elements which can be configured for any existing fast search block matching algorithm. The proposed SAD processor consumes 7% fewer adders compared to existing implementations for one processing element. Using nine PEs, it can process 84 HD frames per second in the worst case, which is a good outcome for real-time implementation. In the average case, the architecture processes 325 HD frames per second.
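
    The SAD criterion for block matching can be written down directly. The following is a plain software sketch of the metric and of choosing the best candidate, not the parallel hardware architecture proposed in the paper. The block contents and the function names `sad` and `best_match` are made up for illustration.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def best_match(current, candidates):
    """Return (index, cost) of the candidate block with the smallest SAD."""
    costs = [sad(current, candidate) for candidate in candidates]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs[best]


current = [[1, 2], [3, 4]]
candidates = [
    [[0, 0], [0, 0]],
    [[1, 2], [3, 5]],
    [[4, 4], [4, 4]],
]
print(best_match(current, candidates))
```

    Each candidate's SAD is independent of the others, so several candidates can be scored at once; a fast search algorithm then only decides which candidates to score.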

  7. Fast algorithms for coordinate processors in Galois field for multiplicity t = 4.5 and t > 5

    International Nuclear Information System (INIS)

    Nikityuk, N.M.

    1989-01-01

    Fast algorithms for solving the coordinate equations for special-purpose processors at multiplicity t = 4.5 and t > 5 are described. Block diagrams of coordinate processor for t 4 in Galois field GF(2^m) is presented which is solved by a table method. Economical algorithms for solving the coordinate equations by serial methods at t > 5 are described. The algorithms and devices proposed could be applied when creating fast processors in high energy physics spectrometers. 9 refs.; 3 figs

  8. Automatic differentiation for design sensitivity analysis of structural systems using multiple processors

    Science.gov (United States)

    Nguyen, Duc T.; Storaasli, Olaf O.; Qin, Jiangning; Qamar, Ramzi

    1994-01-01

    An automatic differentiation tool (ADIFOR) is incorporated into a finite element based structural analysis program for shape and non-shape design sensitivity analysis of structural systems. The entire analysis and sensitivity procedures are parallelized and vectorized for high performance computation. Small scale examples to verify the accuracy of the proposed program and a medium scale example to demonstrate the parallel vector performance on multiple CRAY C90 processors are included.
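
    What automatic differentiation provides for sensitivity analysis can be shown with a very small example. ADIFOR works on Fortran source code; the sketch below swaps in a different and much simpler technique, forward-mode differentiation with dual numbers, purely to illustrate how a derivative with respect to a design variable is obtained without finite differences. The class `Dual`, the function `derivative` and the response function are hypothetical.

```python
class Dual:
    """A number carrying its value and its derivative w.r.t. one variable."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(other):
        # Plain numbers are constants: their derivative is zero.
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule: (uv)' = u v' + u' v
        return Dual(
            self.value * other.value,
            self.value * other.deriv + self.deriv * other.value,
        )

    __rmul__ = __mul__


def derivative(f, x):
    """Derivative of f at x, obtained by seeding the input derivative with 1."""
    return f(Dual(x, 1.0)).deriv


def response(x):
    """Made-up response as a function of one design variable x."""
    return 3 * x * x + 2 * x + 1


print(derivative(response, 2.0))
```

    The derivative is exact up to floating-point rounding because it is propagated through each arithmetic operation rather than estimated from perturbed runs.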

  9. DIALIGN P: Fast pair-wise and multiple sequence alignment using parallel processors

    Directory of Open Access Journals (Sweden)

    Kaufmann Michael

    2004-09-01

    Background: Parallel computing is frequently used to speed up computationally expensive tasks in bioinformatics. Results: Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristic that splits sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. Conclusions: By distributing sub-routines to multiple processors, the running time of DIALIGN can be substantially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.
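
    Because the pair-wise comparisons are independent, they can be handed to a pool of workers and collected in any order. The sketch below shows only this distribution pattern and is not DIALIGN's code: `toy_score` is a placeholder that counts matching positions rather than computing an alignment. A thread pool is used to keep the example self-contained. For CPU-bound work in CPython, a process pool would be the more realistic choice.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def toy_score(pair):
    """Placeholder for a pair-wise alignment: count identical positions."""
    a, b = pair
    return sum(1 for x, y in zip(a, b) if x == y)


def all_pairs_scores(sequences, workers=4):
    """Score every unordered pair of sequences using a pool of workers."""
    index_pairs = list(combinations(range(len(sequences)), 2))
    seq_pairs = [(sequences[i], sequences[j]) for i, j in index_pairs]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, whichever worker ran them.
        scores = list(pool.map(toy_score, seq_pairs))
    return dict(zip(index_pairs, scores))


print(all_pairs_scores(["ACGT", "ACGA", "TCGT"]))
```

    The result does not depend on the number of workers, which mirrors the statement that distributing the pair-wise step has no effect on the output alignments.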

  10. Architecture-Aware Optimization of an HEVC decoder on Asymmetric Multicore Processors

    OpenAIRE

    Rodríguez-Sánchez, Rafael; Quintana-Ortí, Enrique S.

    2016-01-01

    Low-power asymmetric multicore processors (AMPs) attract considerable attention due to their appealing performance-power ratio for energy-constrained environments. However, these processors pose a significant programming challenge due to the integration of cores with different performance capabilities, asking for an asymmetry-aware scheduling solution that carefully distributes the workload. The recent HEVC standard, which offers several high-level parallelization strategies, is an important ...

  11. Knowledge Framework Implementation with Multiple Architectures - 13090

    Energy Technology Data Exchange (ETDEWEB)

    Upadhyay, H.; Lagos, L.; Quintero, W.; Shoffner, P. [Applied Research Center, Florida International University, Miami, FL 33174 (United States); DeGregory, J. [Office of D and D and Facility Engineering, Environmental Management, Department of Energy (United States)

    2013-07-01

    Multiple kinds of knowledge management systems are operational in public and private enterprises and in large and small organizations, with a variety of business models that make the design, implementation and operation of integrated knowledge systems very difficult. Recently, there has been sweeping advancement in information technology, leading to the development of sophisticated frameworks and architectures. These platforms need to be used for the development of integrated knowledge management systems that provide a common platform for sharing knowledge across the enterprise, thereby reducing operational inefficiencies and delivering cost savings. This paper discusses the knowledge framework and architecture that can be used for system development and its application to the real-life needs of the nuclear industry. A case study of deactivation and decommissioning (D&D) is discussed with the Knowledge Management Information Tool platform and framework. D&D work is a high priority activity across the Department of Energy (DOE) complex. Subject matter specialists (SMS) associated with DOE sites, the Energy Facility Contractors Group (EFCOG) and the D&D community have gained extensive knowledge and experience over the years in the cleanup of the legacy waste from the Manhattan Project. To prevent the D&D knowledge and expertise from being lost over time from the evolving and aging workforce, DOE and the Applied Research Center (ARC) at Florida International University (FIU) proposed to capture and maintain this valuable information in a universally available and easily usable system. (authors)

  12. Knowledge Framework Implementation with Multiple Architectures - 13090

    International Nuclear Information System (INIS)

    Upadhyay, H.; Lagos, L.; Quintero, W.; Shoffner, P.; DeGregory, J.

    2013-01-01

    Multiple kinds of knowledge management systems are operational in public and private enterprises and in large and small organizations, with a variety of business models that make the design, implementation and operation of integrated knowledge systems very difficult. Recently, there has been sweeping advancement in information technology, leading to the development of sophisticated frameworks and architectures. These platforms need to be used for the development of integrated knowledge management systems that provide a common platform for sharing knowledge across the enterprise, thereby reducing operational inefficiencies and delivering cost savings. This paper discusses the knowledge framework and architecture that can be used for system development and its application to the real-life needs of the nuclear industry. A case study of deactivation and decommissioning (D&D) is discussed with the Knowledge Management Information Tool platform and framework. D&D work is a high priority activity across the Department of Energy (DOE) complex. Subject matter specialists (SMS) associated with DOE sites, the Energy Facility Contractors Group (EFCOG) and the D&D community have gained extensive knowledge and experience over the years in the cleanup of the legacy waste from the Manhattan Project. To prevent the D&D knowledge and expertise from being lost over time from the evolving and aging workforce, DOE and the Applied Research Center (ARC) at Florida International University (FIU) proposed to capture and maintain this valuable information in a universally available and easily usable system. (authors)

  13. A soft-core processor architecture optimised for radar signal processing applications

    CSIR Research Space (South Africa)

    Broich, R

    2013-12-01

    ...-performance soft-core processing architecture is proposed. To develop such a processing architecture, data and signal-flow characteristics of common radar signal processing algorithms are analysed. Each algorithm is broken down into signal processing...

  14. Design Methodology for Multiple Microcomputer Architectures.

    Science.gov (United States)

    1982-07-01

    multimicro design knowledge is true both in industry and in university environments. In the industrial environment, it reduces productivity and increases...Real-Time Processor Problems," Proc. of ELECTRO-81 Tercer Seminario de Ingenieria Electronica, Nov. 9-13, 1981. 14 1981 "D Flip/Flop Substracts

  15. The TMS34010 graphic processor - an architecture for image visualization in NMR tomography

    International Nuclear Information System (INIS)

    Slaets, Jan Frans Willem; Paiva, Maria Stela Veludo de; Almeida, Lirio O.B.

    1989-01-01

    This abstract presents a description of the minimum system implemented with the TMS34010 graphics processor, which will be used in the reconstruction, treatment and interpretation of images obtained by NMR tomography. The project is being developed in the LIE (Electronic Instrumentation Laboratory) of the Sao Carlos Chemistry and Physical Institute, SP, Brazil, and is already in operation.

  16. Very Long Instruction Word Processors

    Indian Academy of Sciences (India)

    Pentium Processor have modified the processor architecture to exploit parallelism in a program. .... The type of operation itself is encoded using 14 bits. .... text of designing simple architectures with low power consumption and execute x86 ...

  17. Handling Multiple Ecologies in Architectural Design

    DEFF Research Database (Denmark)

    Lotz, Katrine; Sattrup, Peter Andreas

    2014-01-01

    In light of the many challenges of resource scarcity, climate change, rapid urbanization and changing social patterns facing societies today, main stream architecture remains remarkably 'resilient' to conceptual innovation regarding its nature and role in society. If the idea of open architecture...

  18. An Energy-Efficient and Scalable Deep Learning/Inference Processor With Tetra-Parallel MIMD Architecture for Big Data Applications.

    Science.gov (United States)

    Park, Seong-Wook; Park, Junyoung; Bong, Kyeongryeol; Shin, Dongjoo; Lee, Jinmook; Choi, Sungpill; Yoo, Hoi-Jun

    2015-12-01

    Deep learning algorithms are widely used for various pattern recognition applications such as text recognition, object recognition and action recognition because of their best-in-class recognition accuracy compared to hand-crafted and shallow-learning-based algorithms. The long learning time caused by their complex structure, however, has so far limited their use to high-cost servers or many-core GPU platforms. On the other hand, the demand for customized pattern recognition within personal devices will grow gradually as more deep learning applications are developed. This paper presents an SoC implementation that enables deep learning applications to run on low-cost platforms such as mobile or portable devices. Unlike conventional works, which have adopted massively parallel architectures, this work adopts a task-flexible architecture and exploits multiple forms of parallelism to cover the complex functions of the convolutional deep belief network, which is one of the popular deep learning/inference algorithms. In this paper, we implement the most energy-efficient deep learning and inference processor for wearable systems. The implemented 2.5 mm × 4.0 mm deep learning/inference processor is fabricated using 65 nm 8-metal CMOS technology for a battery-powered platform with real-time deep inference and deep learning operation. It consumes 185 mW average power and 213.1 mW peak power at 200 MHz operating frequency and 1.2 V supply voltage. It achieves 411.3 GOPS peak performance and 1.93 TOPS/W energy efficiency, which is 2.07× higher than the state-of-the-art.
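
    The reported energy efficiency can be checked against the other figures in the abstract. The short calculation below assumes, as an interpretation that the abstract does not state explicitly, that efficiency is peak performance divided by peak power. The function name `tops_per_watt` is hypothetical.

```python
def tops_per_watt(gops, milliwatts):
    """Convert a throughput in GOPS and a power in mW to TOPS/W."""
    tera_ops = gops / 1000.0    # GOPS -> TOPS
    watts = milliwatts / 1000.0  # mW -> W
    return tera_ops / watts


peak = tops_per_watt(411.3, 213.1)     # peak performance over peak power
average = tops_per_watt(411.3, 185.0)  # same throughput over average power
print(round(peak, 2), round(average, 2))
```

    Under this assumption the peak-power figure rounds to the 1.93 TOPS/W quoted in the abstract, while dividing by the 185 mW average power would give a higher value, which suggests the quoted efficiency refers to peak power.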

  19. A fast band–Krylov eigensolver for macromolecular functional motion simulation on multicore architectures and graphics processors

    Energy Technology Data Exchange (ETDEWEB)

    Aliaga, José I., E-mail: aliaga@uji.es [Depto. Ingeniería y Ciencia de Computadores, Universitat Jaume I, Castellón (Spain); Alonso, Pedro [Departamento de Sistemas Informáticos y Computación, Universitat Politècnica de València (Spain); Badía, José M. [Depto. Ingeniería y Ciencia de Computadores, Universitat Jaume I, Castellón (Spain); Chacón, Pablo [Dept. Biological Chemical Physics, Rocasolano Physics and Chemistry Institute, CSIC, Madrid (Spain); Davidović, Davor [Rudjer Bošković Institute, Centar za Informatiku i Računarstvo – CIR, Zagreb (Croatia); López-Blanco, José R. [Dept. Biological Chemical Physics, Rocasolano Physics and Chemistry Institute, CSIC, Madrid (Spain); Quintana-Ortí, Enrique S. [Depto. Ingeniería y Ciencia de Computadores, Universitat Jaume I, Castellón (Spain)

    2016-03-15

    We introduce a new iterative Krylov subspace-based eigensolver for the simulation of macromolecular motions on desktop multithreaded platforms equipped with multicore processors and, possibly, a graphics accelerator (GPU). The method consists of two stages, with the original problem first reduced into a simpler band-structured form by means of a high-performance compute-intensive procedure. This is followed by a memory-intensive but low-cost Krylov iteration, which is off-loaded to be computed on the GPU by means of an efficient data-parallel kernel. The experimental results reveal the performance of the new eigensolver. Concretely, when applied to the simulation of macromolecules with a few thousands degrees of freedom and the number of eigenpairs to be computed is small to moderate, the new solver outperforms other methods implemented as part of high-performance numerical linear algebra packages for multithreaded architectures.

  20. A fast band–Krylov eigensolver for macromolecular functional motion simulation on multicore architectures and graphics processors

    International Nuclear Information System (INIS)

    Aliaga, José I.; Alonso, Pedro; Badía, José M.; Chacón, Pablo; Davidović, Davor; López-Blanco, José R.; Quintana-Ortí, Enrique S.

    2016-01-01

    We introduce a new iterative Krylov subspace-based eigensolver for the simulation of macromolecular motions on desktop multithreaded platforms equipped with multicore processors and, possibly, a graphics accelerator (GPU). The method consists of two stages, with the original problem first reduced into a simpler band-structured form by means of a high-performance compute-intensive procedure. This is followed by a memory-intensive but low-cost Krylov iteration, which is off-loaded to be computed on the GPU by means of an efficient data-parallel kernel. The experimental results reveal the performance of the new eigensolver. Concretely, when applied to the simulation of macromolecules with a few thousands degrees of freedom and the number of eigenpairs to be computed is small to moderate, the new solver outperforms other methods implemented as part of high-performance numerical linear algebra packages for multithreaded architectures.

  1. An Overview on SDN Architectures with Multiple Controllers

    Directory of Open Access Journals (Sweden)

    Othmane Blial

    2016-01-01

    Software-defined networking offers several benefits for networking by separating the control plane from the data plane. However, networks’ scalability, reliability, and availability remain a major issue. Accordingly, multicontroller architectures are important for SDN-enabled networks. This paper gives a comprehensive overview of SDN multicontroller architectures. It presents SDN and its main instantiation, OpenFlow. Then, it explains in detail the differences between multiple types of multicontroller architectures, such as the distribution method and the communication system. Furthermore, it provides examples of multicontroller architectures, both already implemented and still under research, by describing their design, their communication process, and their performance results.

  2. Media processors using a new microsystem architecture designed for the Internet era

    Science.gov (United States)

    Wyland, David C.

    1999-12-01

    The demands of digital image processing, communications and multimedia applications are growing more rapidly than traditional design methods can fulfill them. Previously, only custom hardware designs could provide the performance required to meet the demands of these applications. However, hardware design has reached a crisis point. Hardware design can no longer deliver a product with the required performance and cost in a reasonable time for a reasonable risk. Software based designs running on conventional processors can deliver working designs in a reasonable time and with low risk but cannot meet the performance requirements. What is needed is a media processing approach that combines very high performance, a simple programming model, complete programmability, short time to market and scalability. The Universal Micro System (UMS) is a solution to these problems. The UMS is a completely programmable (including I/O) system on a chip that combines hardware performance with the fast time to market, low cost and low risk of software designs.

  3. System, methods and apparatus for program optimization for multi-threaded processor architectures

    Science.gov (United States)

    Bastoul, Cedric; Lethin, Richard A; Leung, Allen K; Meister, Benoit J; Szilagyi, Peter; Vasilache, Nicolas T; Wohlford, David E

    2015-01-06

    Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least two multi-stage execution units that allow for parallel execution of tasks. The first custom computing apparatus optimizes the code for parallelism, locality of operations and contiguity of memory accesses on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.

  4. Transportable GPU (General Processor Units) chip set technology for standard computer architectures

    Science.gov (United States)

    Fosdick, R. E.; Denison, H. C.

    1982-11-01

    The USAFR-developed GPU Chip Set has been utilized by Tracor to implement both USAF and Navy Standard 16-Bit Airborne Computer Architectures. Both configurations are currently being delivered into DOD full-scale development programs. Leadless Hermetic Chip Carrier packaging has facilitated implementation of both architectures on single 4 1/2 x 5 substrates. The CMOS and CMOS/SOS implementations of the GPU Chip Set have allowed both CPU implementations to use less than 3 watts of power each. Recent efforts by Tracor for USAF have included the definition of a next-generation GPU Chip Set that will retain the application-proven architecture of the current chip set while offering the added cost advantages of transportability across ISO-CMOS and CMOS/SOS processes and across numerous semiconductor manufacturers using a newly-defined set of common design rules. The Enhanced GPU Chip Set will increase speed by an approximate factor of 3 while significantly reducing chip counts and costs of standard CPU implementations.

  5. Architecture for Multiple Interacting Robot Intelligences

    Science.gov (United States)

    Peters, Richard Alan, II (Inventor)

    2008-01-01

    An architecture for robot intelligence enables a robot to learn new behaviors and create new behavior sequences autonomously and interact with a dynamically changing environment. Sensory information is mapped onto a Sensory Ego-Sphere (SES) that rapidly identifies important changes in the environment and functions much like short term memory. Behaviors are stored in a database associative memory (DBAM) that creates an active map from the robot's current state to a goal state and functions much like long term memory. A dream state converts recent activities stored in the SES and creates or modifies behaviors in the DBAM.

  6. Multiple Connections in RealXtend Architecture

    OpenAIRE

    Vatjus-Anttila, Jukka

    2012-01-01

    RealXtend is an open source virtual space platform implementing both client and server functionality. In the default implementation of realXtend, the client can only log in to one virtual space server at any given time. This research experimented with the ability to make multiple simultaneous connections to virtual spaces. The focus of the research was on how to control multiple virtual spaces within the same client window from a technical point of view. This bachelor thesis presents metho...

  7. Trans-disciplinarity: The Singularities and Multiplicities of Architecture

    Directory of Open Access Journals (Sweden)

    Tahl Kaminer

    2007-10-01

    This inaugural issue of Footprint aims at understanding today’s architecture culture as a negotiation between two antithetical definitions of architecture’s identity. The belief in the disciplinary singularity of architectural objects, irreducible to the conditions of their production, is confronted - in discourse and design - with the perception of architecture as an interdisciplinary mediation between multiple political, economic, social, technological and cultural factors. With the concept of trans-disciplinarity, the negotiation between these two positions is investigated here as an engine of the ‘tradition of the present’ of contemporary architecture - the discourses and designs which emerged in the 1960s and defined orientation points for today’s architectural thought and practice.

  8. Trans-disciplinarity: The Singularities and Multiplicities of Architecture

    Directory of Open Access Journals (Sweden)

    Lukasz Stanek

    2014-07-01

    This inaugural issue of Footprint aims at understanding today’s architecture culture as a negotiation between two antithetical definitions of architecture’s identity. The belief in the disciplinary singularity of architectural objects, irreducible to the conditions of their production, is confronted – in discourse and design – with the perception of architecture as an interdisciplinary mediation between multiple political, economic, social, technological and cultural factors. With the concept of trans-disciplinarity, the negotiation between these two positions is investigated here as an engine of the ‘tradition of the present’ of contemporary architecture – the discourses and designs which emerged in the 1960s and defined orientation points for today’s architectural thought and practice.

  9. Reducing the computational requirements for simulating tunnel fires by combining multiscale modelling and multiple processor calculation

    DEFF Research Database (Denmark)

    Vermesi, Izabella; Rein, Guillermo; Colella, Francesco

    2017-01-01

    Multiscale modelling of tunnel fires that uses a coupled 3D (fire area) and 1D (the rest of the tunnel) model is seen as the solution to the numerical problem of the large domains associated with long tunnels. The present study demonstrates the feasibility of the implementation of this method in FDS version 6.0, a widely used fire-specific, open source CFD software. Furthermore, it compares the reduction in simulation time given by multiscale modelling with the one given by the use of multiple processor calculation. This was done using a 1200 m long tunnel with a rectangular cross-section as a demonstration case. The multiscale implementation consisted of placing a 30 MW fire in the centre of a 400 m long 3D domain, along with two 400 m long 1D ducts on each side of it, that were again bounded by two nodes each. A fixed volume flow was defined in the upstream duct and the two models were coupled...

  10. Noise limitations in optical linear algebra processors.

    Science.gov (United States)

    Batsell, S G; Jong, T L; Walkup, J F; Krile, T F

    1990-05-10

    A general statistical noise model is presented for optical linear algebra processors. A statistical analysis which includes device noise, the multiplication process, and the addition operation is undertaken. We focus on those processes which are architecturally independent. Finally, experimental results which verify the analytical predictions are also presented.

  11. Dual-scale topology optoelectronic processor.

    Science.gov (United States)

    Marsden, G C; Krishnamoorthy, A V; Esener, S C; Lee, S H

    1991-12-15

    The dual-scale topology optoelectronic processor (D-STOP) is a parallel optoelectronic architecture for matrix algebraic processing. The architecture can be used for matrix-vector multiplication and two types of vector outer product. The computations are performed electronically, which allows multiplication and summation concepts in linear algebra to be generalized to various nonlinear or symbolic operations. This generalization permits the application of D-STOP to many computational problems. The architecture uses a minimum number of optical transmitters, which thereby reduces fabrication requirements while maintaining area-efficient electronics. The necessary optical interconnections are space invariant, minimizing space-bandwidth requirements.
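
    The generalization described here, replacing the multiply and sum of a matrix-vector product with other operations, can be illustrated in software. The following is a minimal sketch of the idea only, not a model of the D-STOP hardware; the function name `generalized_matvec` and its parameters are hypothetical.

```python
from functools import reduce


def generalized_matvec(matrix, vector, combine, accumulate):
    """Compute y[i] = accumulate over j of combine(matrix[i][j], vector[j]).

    With combine = multiply and accumulate = add this is the ordinary
    matrix-vector product; other choices give nonlinear or symbolic variants.
    Rows are assumed to be non-empty.
    """
    result = []
    for row in matrix:
        if len(row) != len(vector):
            raise ValueError("row length must match vector length")
        terms = [combine(a, x) for a, x in zip(row, vector)]
        result.append(reduce(accumulate, terms))
    return result


A = [[1, 2], [3, 4]]
x = [5, 6]

# Ordinary linear algebra: multiply, then sum.
ordinary = generalized_matvec(A, x, lambda a, b: a * b, lambda s, t: s + t)

# A nonlinear variant (max-plus): add, then take the maximum.
max_plus = generalized_matvec(A, x, lambda a, b: a + b, max)
```

    Here `ordinary` is the usual product and `max_plus` swaps in addition and maximum, one example of the kind of nonlinear operation the abstract alludes to.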

  12. Parallel point-multiplication architecture using combined group operations for high-speed cryptographic applications.

    Directory of Open Access Journals (Sweden)

    Md Selim Hossain

    In this paper, we propose a novel parallel architecture for fast hardware implementation of elliptic curve point multiplication (ECPM), which is the key operation of an elliptic curve cryptography processor. The point multiplication over binary fields is synthesized on both FPGA and ASIC technology by designing fast elliptic curve group operations in Jacobian projective coordinates. A novel combined point doubling and point addition (PDPA) architecture is proposed for group operations to achieve high speed and low hardware requirements for ECPM. It has been implemented over the binary field which is recommended by the National Institute of Standards and Technology (NIST). The proposed ECPM supports two Koblitz and random curves for the key sizes 233 and 163 bits. For group operations, a finite-field arithmetic operation, e.g. multiplication, is designed on a polynomial basis. The delay of a 233-bit point multiplication is only 3.05 and 3.56 μs, in a Xilinx Virtex-7 FPGA, for Koblitz and random curves, respectively, and 0.81 μs in an ASIC 65-nm technology, which are the fastest hardware implementation results reported in the literature to date. In addition, a 163-bit point multiplication is also implemented in FPGA and ASIC for fair comparison which takes around 0.33 and 0.46 μs, respectively. The area-time product of the proposed point multiplication is very low compared to similar designs. The performance ([Formula: see text]) and Area × Time × Energy (ATE) product of the proposed design are far better than the most significant studies found in the literature.
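
    Point multiplication is built from the two group operations named in this abstract, point doubling and point addition. The sketch below shows the textbook double-and-add loop on a deliberately tiny curve. It is not the paper's design: it assumes a prime field GF(17) and affine coordinates for readability, whereas the paper works over binary fields in Jacobian projective coordinates and combines the two operations in hardware. The curve y^2 = x^3 + 2x + 2 and base point (5, 1) are a common teaching example; all names are illustrative.

```python
# Toy curve y^2 = x^3 + 2x + 2 over GF(17); an assumption for illustration.
P_MOD = 17
A_COEF = 2
INFINITY = None  # the point at infinity (group identity)


def point_add(p, q):
    """Add two affine points; covers doubling and the identity."""
    if p is INFINITY:
        return q
    if q is INFINITY:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INFINITY  # q is the inverse of p
    if p == q:
        # Point doubling: slope of the tangent line.
        slope = (3 * x1 * x1 + A_COEF) * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:
        # Point addition: slope of the chord through p and q.
        slope = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    slope %= P_MOD
    x3 = (slope * slope - x1 - x2) % P_MOD
    y3 = (slope * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_multiply(k, point):
    """Compute k * point by scanning the bits of k from least significant."""
    result = INFINITY
    addend = point
    while k > 0:
        if k & 1:
            result = point_add(result, addend)  # point addition
        addend = point_add(addend, addend)      # point doubling
        k >>= 1
    return result


G = (5, 1)
doubled = scalar_multiply(2, G)
```

    Each loop iteration performs one doubling and, for set bits, one addition, which is why a combined doubling-and-addition unit can shorten the critical path in hardware.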

  13. C-HEAP : a heterogeneous multi-processor architecture template and scalable and flexible protocol for the design of embedded signal processing systems

    NARCIS (Netherlands)

    Nieuwland, A.K.; Kang, J.; Gangwal, O.P.; Sethuraman, R.; Busá, N.G.; Goossens, K.G.W.; Peset Llopis, R.; Lippens, P.E.R.

    2002-01-01

    The key issue in the design of Systems-on-a-Chip (SoC) is to trade-off efficiency against flexibility, and time to market versus cost. Current deep submicron processing technologies enable integration of multiple software programmable processors (e.g., CPUs, DSPs) and dedicated hardware components

  14. Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors

    DEFF Research Database (Denmark)

    Liu, Weifeng; Vinter, Brian

    2015-01-01

    of the same chip is triggered to re-arrange the predicted partial sums for a correct resulting vector. On three heterogeneous processors from Intel, AMD and nVidia, using 20 sparse matrices as a benchmark suite, the experimental results show that our method obtains significant performance improvement over...
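
    The title refers to a segmented sum, the primitive that turns per-nonzero products into per-row results in sparse matrix-vector multiplication. Since the abstract is truncated here, the sketch below shows only the plain, non-speculative formulation on a CSR matrix; the speculative prediction and re-arrangement step is not reproduced, and the function name is hypothetical.

```python
def spmv_segmented_sum(row_ptr, col_idx, values, x):
    """Multiply a CSR matrix by a dense vector using a segmented sum.

    row_ptr has one entry per row plus a final end marker; row i owns the
    nonzeros at positions row_ptr[i] .. row_ptr[i + 1] - 1.
    """
    # Step 1: one product per stored nonzero (independent, data-parallel work).
    products = [v * x[c] for v, c in zip(values, col_idx)]

    # Step 2: segmented sum, one segment per matrix row.
    y = []
    for i in range(len(row_ptr) - 1):
        segment = products[row_ptr[i]:row_ptr[i + 1]]
        y.append(sum(segment))  # an empty row yields 0
    return y


# Matrix [[1, 0, 2], [0, 0, 0], [0, 3, 4]] in CSR form.
row_ptr = [0, 2, 2, 4]
col_idx = [0, 2, 1, 2]
values = [1, 2, 3, 4]
y = spmv_segmented_sum(row_ptr, col_idx, values, [1, 2, 3])
```

    The example matrix includes an empty middle row, for which the segment is empty and the result entry is zero.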

  15. The Molen Polymorphic Media Processor

    NARCIS (Netherlands)

    Kuzmanov, G.K.

    2004-01-01

    In this dissertation, we address high performance media processing based on a tightly coupled co-processor architectural paradigm. More specifically, we introduce a reconfigurable media augmentation of a general purpose processor and implement it into a fully operational processor prototype. The

  16. Dual-core Itanium Processor

    CERN Multimedia

    2006-01-01

    Intel’s first dual-core Itanium processor, code-named "Montecito" is a major release of Intel's Itanium 2 Processor Family, which implements the Intel Itanium architecture on a dual-core processor with two cores per die (integrated circuit). Itanium 2 is much more powerful than its predecessor. It has lower power consumption and thermal dissipation.

  17. Accuracy Limitations in Optical Linear Algebra Processors

    Science.gov (United States)

    Batsell, Stephen Gordon

    1990-01-01

    One of the limiting factors in applying optical linear algebra processors (OLAPs) to real-world problems has been the poor achievable accuracy of these processors. Little previous research has been done on determining noise sources from a systems perspective which would include noise generated in the multiplication and addition operations, noise from spatial variations across arrays, and from crosstalk. In this dissertation, we propose a second-order statistical model for an OLAP which incorporates all these system noise sources. We now apply this knowledge to determining upper and lower bounds on the achievable accuracy. This is accomplished by first translating the standard definition of accuracy used in electronic digital processors to analog optical processors. We then employ our second-order statistical model. Having determined a general accuracy equation, we consider limiting cases such as for ideal and noisy components. From the ideal case, we find the fundamental limitations on improving analog processor accuracy. From the noisy case, we determine the practical limitations based on both device and system noise sources. These bounds allow system trade-offs to be made both in the choice of architecture and in individual components in such a way as to maximize the accuracy of the processor. Finally, by determining the fundamental limitations, we show the system engineer when the accuracy desired can be achieved from hardware or architecture improvements and when it must come from signal pre-processing and/or post-processing techniques.

  18. Multiple-Channel Security Architecture and its Implementation over SSL

    Directory of Open Access Journals (Sweden)

    Song Yong

    2006-01-01

    This paper presents multiple-channel SSL (MC-SSL), an architecture and protocol for protecting client-server communications. In contrast to SSL, which provides a single end-to-end secure channel, MC-SSL enables applications to employ multiple channels, each with its own cipher suite and data-flow direction. Our approach also allows for several partially trusted application proxies. The main advantages of MC-SSL over SSL are (a) support for end-to-end security in the presence of partially trusted proxies, and (b) selective data protection for achieving computational efficiency important to resource-constrained clients and heavily loaded servers.

  19. Hardware Synchronization for Embedded Multi-Core Processors

    DEFF Research Database (Denmark)

    Stoif, Christian; Schoeberl, Martin; Liccardi, Benito

    2011-01-01

    Multi-core processors are about to conquer embedded systems: it is not a question of whether they are coming but of how the architectures of the microcontrollers should look with respect to the strict requirements in the field. We present the step from one to multiple cores in this paper, establi...

  20. Design concepts for a virtualizable embedded MPSoC architecture enabling virtualization in embedded multi-processor systems

    CERN Document Server

    Biedermann, Alexander

    2014-01-01

    Alexander Biedermann presents a generic hardware-based virtualization approach, which may transform an array of any off-the-shelf embedded processors into a multi-processor system with high execution dynamism. Based on this approach, he highlights concepts for the design of energy aware systems, self-healing systems as well as parallelized systems. For the latter, the novel so-called Agile Processing scheme is introduced by the author, which enables a seamless transition between sequential and parallel execution schemes. The design of such virtualizable systems is further aided by introduction

  1. CASPER: Embedding Power Estimation and Hardware-Controlled Power Management in a Cycle-Accurate Micro-Architecture Simulation Platform for Many-Core Multi-Threading Heterogeneous Processors

    Directory of Open Access Journals (Sweden)

    Arun Ravindran

    2012-02-01

    Despite the promising performance improvement observed in emerging many-core architectures in high performance processors, high power consumption prohibitively affects their use and marketability in the low-energy sectors, such as embedded processors, network processors and application specific instruction processors (ASIPs). While most chip architects design power-efficient processors by finding an optimal power-performance balance in their design, some use sophisticated on-chip autonomous power management units, which dynamically reduce the voltage or frequencies of idle cores and hence extend battery life and reduce operating costs. For large scale designs of many-core processors, a holistic approach integrating both these techniques at different levels of abstraction can potentially achieve maximal power savings. In this paper we present CASPER, a robust instruction trace driven cycle-accurate many-core multi-threading micro-architecture simulation platform where we have incorporated power estimation models of a wide variety of tunable many-core micro-architectural design parameters, thus enabling processor architects to explore a sufficiently large design space and achieve power-efficient designs. Additionally, CASPER is designed to accommodate cycle-accurate models of hardware controlled power management units, enabling architects to experiment with and evaluate different autonomous power-saving mechanisms to study the run-time power-performance trade-offs in embedded many-core processors. We have implemented two such techniques in CASPER: Chipwide Dynamic Voltage and Frequency Scaling, and Performance Aware Core-Specific Frequency Scaling, which show average power savings of 35.9% and 26.2%, respectively, on a baseline 4-core SPARC based architecture. This power saving data accounts for the power consumption of the power management units themselves. The CASPER simulation platform also provides users with complete support of SPARCV9

  2. High-speed packet filtering utilizing stream processors

    Science.gov (United States)

    Hummel, Richard J.; Fulp, Errin W.

    2009-04-01

    Parallel firewalls offer a scalable architecture for the next generation of high-speed networks. While these parallel systems can be implemented using multiple firewalls, the latest generation of stream processors can provide similar benefits with a significantly reduced latency due to locality. This paper describes how the Cell Broadband Engine (CBE), a popular stream processor, can be used as a high-speed packet filter. Results show the CBE can potentially process packets arriving at a rate of 1 Gbps with a latency of less than 82 μs. Performance depends on how well the packet filtering process is translated to the unique stream processor architecture. For example, the method used for transmitting data and control messages among the pseudo-independent processor cores has a significant impact on performance. Experimental results also show the current limitations of a CBE operating system when used to process packets. Possible solutions to these issues are discussed.

  3. A Methodology, Based on Analytical Modeling, for the Design of Parallel and Distributed Architectures for Relational Database Query Processors.

    Science.gov (United States)

    1987-12-01

    [Figure 2. Intelligent Disk Controller (diagram labels: Application Programs, Database Management System, Operating System, Host, Intelligent Disk Controller)] ... [Figure 5. Processor-Per-Head] ... data. Therefore, the ... However, these additional properties have been proven in classical set and relation theory [75]. These additional properties are described here

  4. Design Principles for Synthesizable Processor Cores

    DEFF Research Database (Denmark)

    Schleuniger, Pascal; McKee, Sally A.; Karlsson, Sven

    2012-01-01

    As FPGAs get more competitive, synthesizable processor cores become an attractive choice for embedded computing. Currently popular commercial processor cores do not fully exploit current FPGA architectures. In this paper, we propose general design principles to increase instruction throughput...

  5. Parallel computation for distributed parameter system-from vector processors to Adena computer

    Energy Technology Data Exchange (ETDEWEB)

    Nogi, T

    1983-04-01

    Research on advanced parallel hardware and software architectures for very high-speed computation deserves and needs more support and attention to fulfil its promise. Novel architectures for parallel processing are being made ready. Architectures for parallel processing can be roughly divided into two groups. One is a vector processor in which a single central processing unit involves multiple vector-arithmetic registers. The other is a processor array in which slave processors are connected to a host processor to perform parallel computation. In this review, the concept and data structure of the Adena (alternating-direction edition nexus array) architecture, which is conformable to distributed-parameter simulation algorithms, are described. 5 references.

  6. MP CBM-Z V1.0: design for a new CBM-Z gas-phase chemical mechanism architecture for next generation processors

    OpenAIRE

    Wang, Hui; Lin, Junmin; Wu, Qizhong; Chen, Huansheng; Tang, Xiao; Wang, Zifa; Chen, Xueshun; Cheng, Huaqiong; Wang, Lanning

    2018-01-01

    Precise and rapid air quality simulation and forecasting are limited by the computational performance of the air quality model, and the gas-phase chemistry module is the most time-consuming function in the air quality model. In this study, we designed a new framework for the widely used Carbon Bond Mechanism Z (CBM-Z) gas-phase chemical kinetics kernel to adapt it to the Single Instruction Multiple Data (SIMD) technology in next-generation processors and so improve its calculation performance. The...

  7. Rapid prototyping and evaluation of programmable SIMD SDR processors in LISA

    Science.gov (United States)

    Chen, Ting; Liu, Hengzhu; Zhang, Botao; Liu, Dongpei

    2013-03-01

    With the development of international wireless communication standards, the computational requirements for baseband signal processors are increasing. Time-to-market pressure makes it impossible to completely redesign new processors for the evolving standards. Owing to their high flexibility and low power, software defined radio (SDR) digital signal processors have been proposed as a promising technology to replace traditional ASIC and FPGA approaches. In addition, computation-intensive functions process large amounts of parallel data, which fosters the development of single instruction multiple data (SIMD) architectures in SDR platforms. A new way must therefore be found to prototype SDR processors efficiently. In this paper we present a bit-and-cycle accurate model of programmable SIMD SDR processors in the machine description language LISA. LISA is a language for instruction set architectures that enables rapid modeling at the architectural level. To evaluate the proposed processor, three common baseband functions (FFT, FIR digital filter and matrix multiplication) have been mapped onto the SDR platform. Analytical results showed that the SDR processor achieved a performance boost of up to 47.1% relative to the processor it was compared against.

  8. A Mobile Service Oriented Multiple Object Tracking Augmented Reality Architecture for Education and Learning Experiences

    Science.gov (United States)

    Rattanarungrot, Sasithorn; White, Martin; Newbury, Paul

    2014-01-01

    This paper describes the design of our service-oriented architecture to support mobile multiple object tracking augmented reality applications applied to education and learning scenarios. The architecture is composed of a mobile multiple object tracking augmented reality client, a web service framework, and dynamic content providers. Tracking of…

  9. A framework for general sparse matrix-matrix multiplication on GPUs and heterogeneous processors

    DEFF Research Database (Denmark)

    Liu, Weifeng; Vinter, Brian

    2015-01-01

    General sparse matrix-matrix multiplication (SpGEMM) is a fundamental building block for numerous applications such as algebraic multigrid method (AMG), breadth first search and shortest path problem. Compared to other sparse BLAS routines, an efficient parallel SpGEMM implementation has to handle extra irregularity from three aspects: (1) the number of nonzero entries in the resulting sparse matrix is unknown in advance, (2) very expensive parallel insert operations at random positions in the resulting sparse matrix dominate the execution time, and (3) load balancing must account for sparse data... memory space and efficiently utilizes the very limited on-chip scratchpad memory. Parallel insert operations of the nonzero entries are implemented through the GPU merge path algorithm that is experimentally found to be the fastest GPU merge approach. Load balancing builds on the number of necessary...
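
    The first two difficulties listed in this abstract (an output size that is unknown in advance, and inserts at arbitrary positions) are visible even in a sequential formulation. The sketch below is a plain row-by-row SpGEMM with a hash-map accumulator, a standard CPU technique swapped in for illustration; it is not the paper's GPU method, which relies on memory pre-allocation and a merge-path algorithm. The dictionary-per-row storage format and the function name are assumptions.

```python
def spgemm_rowwise(a_rows, b_rows):
    """Multiply sparse matrices stored as one {column: value} dict per row."""
    c_rows = []
    for a_row in a_rows:
        accumulator = {}  # grows as needed: output size is not known up front
        for k, a_val in a_row.items():
            # Scale row k of B by A[i][k] and merge it into the accumulator.
            for j, b_val in b_rows[k].items():
                accumulator[j] = accumulator.get(j, 0) + a_val * b_val
        c_rows.append(accumulator)
    return c_rows


# A = [[1, 0], [0, 2]] and B = [[0, 3], [4, 0]].
A = [{0: 1}, {1: 2}]
B = [{1: 3}, {0: 4}]
C = spgemm_rowwise(A, B)
nonzeros_per_row = [len(row) for row in C]
```

    The number of nonzeros in each output row is only known once that row's accumulator is complete, which is the property that makes memory allocation for the result hard to plan on a GPU.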

  10. A 16-channel real-time digital processor for pulse-shape discrimination in multiplicity assay

    International Nuclear Information System (INIS)

    Joyce, Malcolm J.; Aspinall, M.D.; Cave, F.D.; Lavietes, A.

    2013-06-01

    In recent years, real-time neutron/γ-ray pulse-shape discrimination has become feasible for use with scintillator-based detectors that respond extremely quickly, on the order of 25 ns in terms of pulse width, and their application to a variety of nuclear material assays has been reported. For the in-situ analysis of nuclear materials, measurements are often based on the multiplicity assessment of spontaneous fission events. An example of this is the ²⁴⁰Pu_eff assessment stemming from long-established techniques developed for ³He-based neutron coincidence counters when ³He was abundant and cheap. However, such measurements when using scintillator detectors can be plagued by low detection efficiencies and low orders of coincidence (often limited to triples) if the number of detectors in use is similarly limited to 3-4 detectors. Conversely, an array of >10 detector modules arranged to optimize efficiency and multiplicity sensitivity shifts the emphasis in terms of performance requirement to the real-time digital analyzer and, critically, to the scope remaining in the temporal processing window of these systems. In this paper we report on the design, development and commissioning of a bespoke, 16-channel real-time pulse-shape discrimination analyzer specified for the materials assay challenge summarized above. The analyzer incorporates 16 dedicated and independent high-voltage supplies along with 16 independent digital processing channels offering pulse-shape discrimination at a rate of 3 × 10⁶ events per second. These functions are configured from a dedicated graphical user interface, and all settings can be adjusted on-the-fly with the analyzer effectively configured one-time-only (where desired) for subsequent plug-and-play connection, for example to a fuel bundle organic scintillation detector array. (authors)

  11. Real-time autocorrelator for fluorescence correlation spectroscopy based on graphical-processor-unit architecture: method, implementation, and comparative studies

    Science.gov (United States)

    Laracuente, Nicholas; Grossman, Carl

    2013-03-01

    We developed an algorithm and software to calculate autocorrelation functions from real-time photon-counting data using the fast, parallel capabilities of graphical processor units (GPUs). Recent developments in hardware and software have allowed for general purpose computing with inexpensive GPU hardware. These devices are more suited for emulating hardware autocorrelators than traditional CPU-based software applications by emphasizing parallel throughput over sequential speed. Incoming data are binned in a standard multi-tau scheme with configurable points-per-bin size and are mapped into a GPU memory pattern to reduce time-expensive memory access. Applications include dynamic light scattering (DLS) and fluorescence correlation spectroscopy (FCS) experiments. We ran the software on a 64-core PCI graphics card in a computer with a 3.2 GHz Intel i5 CPU running Linux. FCS measurements were made on Alexa-546 and Texas Red dyes in a standard buffer (PBS). Software correlations were compared to hardware correlator measurements on the same signals. Supported by HHMI and Swarthmore College.
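
    The multi-tau binning mentioned in this abstract can be illustrated with a small offline sketch. It is not the authors' GPU code: the function name, the parameters `points_per_level` and `levels`, and the normalisation by the squared mean count are assumptions made for illustration, and the whole record is processed in memory with NumPy rather than streamed.

```python
import numpy as np

def multi_tau_autocorr(counts, points_per_level=8, levels=4):
    """Normalised autocorrelation on a quasi-logarithmic (multi-tau) lag grid.

    Hypothetical helper: level 0 uses lags 1..points_per_level-1 at the raw
    bin width; each later level doubles the bin width by summing adjacent
    bins and only evaluates the upper half of its lag range.
    """
    x = np.asarray(counts, dtype=float)
    lags, g = [], []
    bin_width = 1
    for level in range(levels):
        start = 1 if level == 0 else points_per_level // 2
        for k in range(start, min(points_per_level, len(x))):
            lags.append(k * bin_width)
            # <x(t) x(t + tau)> / <x>^2 at the current bin width
            g.append(np.mean(x[:-k] * x[k:]) / np.mean(x) ** 2)
        # Coarsen: merge pairs of bins so the next level covers longer lags.
        x = x[: len(x) // 2 * 2].reshape(-1, 2).sum(axis=1)
        bin_width *= 2
    return np.array(lags), np.array(g)

lags, g = multi_tau_autocorr(np.full(256, 3.0), points_per_level=8, levels=3)
print(lags)
```

    Every lag within a level is independent of the others, which is the kind of work that maps naturally onto many parallel GPU threads.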

  12. Green Building between Tradition and Modernity Study Comparative Analysis between Conventional Methods and Updated Styles of Design and Architecture Processors

    Directory of Open Access Journals (Sweden)

    H Elshimy

    2017-03-01

    The green house concept has appeared in every age from ancient to modern, and there is a tendency to combine traditional architecture with pristine ecological environments, moving through increasingly sophisticated systems to modern, upgraded systems whose architectural treatments achieve environmental sustainability. In recent years, the sustainability concept has become the common interest of numerous disciplines. The reason for this popularity is the drive to achieve sustainable development. The concept of green architecture, also known as "sustainable architecture" or "green house," is the theory, science and style of buildings designed and constructed in accordance with environmentally friendly principles. Green house strives to minimize the number of resources consumed in the building's construction, use and operation, as well as curtailing the harm done to the environment through the emission, pollution and waste of its components. Designing, constructing, operating and maintaining buildings consumes energy, water and new materials, and generates waste that has negative effects on health and the environment. In order to limit these effects and design environmentally sound and resource-efficient buildings, "green building systems" must be introduced, clarified, understood and practiced. This paper aims at highlighting these difficult and complex issues of sustainability, which encompass the scope of almost every aspect of human life.

  13. Numeric algorithms for parallel processors computer architectures with applications to the few-groups neutron diffusion equations

    International Nuclear Information System (INIS)

    Zee, S.K.

    1987-01-01

    A numeric algorithm and an associated computer code were developed for the rapid solution of the finite-difference method representation of the few-group neutron-diffusion equations on parallel computers. Applications of the numeric algorithm on both SIMD (vector pipeline) and MIMD/SIMD (multi-CPU/vector pipeline) architectures were explored. The algorithm was successfully implemented in the two-group, 3-D neutron diffusion computer code named DIFPAR3D (DIFfusion PARallel 3-Dimension). Numerical-solution techniques used in the code include the Chebyshev polynomial acceleration technique in conjunction with the power method of outer iteration. For inner iterations, a parallel form of red-black (cyclic) line SOR with automated determination of group dependent relaxation factors and iteration numbers required to achieve specified inner iteration error tolerance is incorporated. The code employs a macroscopic depletion model with trace capability for selected fission products' transients and critical boron. In addition to this, moderator and fuel temperature feedback models are also incorporated into the DIFPAR3D code, for realistic simulation of power reactor cores. The physics models used were proven acceptable in separate benchmarking studies.
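
    The red-black ordering named in this abstract can be shown with a much simpler stand-in. The sketch below is not DIFPAR3D: it assumes a 2-D Poisson model problem instead of the few-group diffusion equations, point rather than line SOR, and a fixed, hand-picked relaxation factor `omega` instead of the automated determination the abstract describes. All names are hypothetical.

```python
import numpy as np

def red_black_sor(f, h, omega=1.5, tol=1e-8, max_sweeps=10_000):
    """Solve -laplace(u) = f on a square grid with u = 0 on the boundary.

    Points are coloured like a checkerboard. A point of one colour only
    reads neighbours of the other colour, so each colour can be updated
    in a single vectorised (parallel) step.
    """
    u = np.zeros_like(f, dtype=float)
    i, j = np.indices(f.shape)
    interior = np.zeros(f.shape, dtype=bool)
    interior[1:-1, 1:-1] = True
    masks = [interior & ((i + j) % 2 == colour) for colour in (0, 1)]
    for sweep in range(1, max_sweeps + 1):
        change = 0.0
        for mask in masks:
            # Sum of the four neighbours, using the latest values of u.
            nb = np.zeros_like(u)
            nb[1:-1, 1:-1] = (u[:-2, 1:-1] + u[2:, 1:-1]
                              + u[1:-1, :-2] + u[1:-1, 2:])
            gauss_seidel = 0.25 * (nb + h * h * f)
            delta = omega * (gauss_seidel[mask] - u[mask])
            u[mask] += delta  # over-relaxed update of one colour
            if delta.size:
                change = max(change, float(np.max(np.abs(delta))))
        if change < tol:
            return u, sweep
    return u, max_sweeps

n = 33
xs = np.linspace(0.0, 1.0, n)
X, Y = np.meshgrid(xs, xs, indexing="ij")
exact = np.sin(np.pi * X) * np.sin(np.pi * Y)
u, sweeps = red_black_sor(2 * np.pi ** 2 * exact, h=1.0 / (n - 1))
print(sweeps, float(np.max(np.abs(u - exact))))
```

    In a line-SOR variant, whole grid lines rather than single points would be coloured alternately and each line solved as a tridiagonal system.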

  14. Architecture of security management unit for safe hosting of multiple agents

    Science.gov (United States)

    Gilmont, Tanguy; Legat, Jean-Didier; Quisquater, Jean-Jacques

    1999-04-01

    In such growing areas as remote applications in large public networks, electronic commerce, digital signature, intellectual property and copyright protection, and even operating system extensibility, the hardware security level offered by existing processors is insufficient. They lack protection mechanisms that prevent the user from tampering with critical data owned by those applications. Some devices are exceptions, but have neither enough processing power nor enough memory to stand up to such applications (e.g. smart cards). This paper proposes a secure processor architecture in which the classical memory management unit is extended into a new security management unit. It allows ciphered code execution and ciphered data processing. An internal permanent memory can store cipher keys and critical data for several client agents simultaneously. The ordinary supervisor privilege scheme is replaced by a privilege inheritance mechanism that is more suited to operating system extensibility. The result is a secure processor that has hardware support for extensible multitask operating systems, and can be used for both general applications and critical applications needing strong protection. The security management unit and the internal permanent memory can be added to an existing CPU core without loss of performance, and do not require it to be modified.

  15. Multiple protein-domain conservation architecture as a non ...

    African Journals Online (AJOL)

    Using two-sets of surface viral glycoproteins of human immunodeficiency virus type I, HIV-1 (gp120) and Ebola virus, EBOV (gp1,2 preprotein) (selected because their CD-architecture has widely been studied, their sequences are available in public databases, and the same are well annotated), the MPDCAs among three ...

  16. Green Secure Processors: Towards Power-Efficient Secure Processor Design

    Science.gov (United States)

    Chhabra, Siddhartha; Solihin, Yan

    With the increasing wealth of digital information stored on computer systems today, security issues have become increasingly important. In addition to attacks targeting the software stack of a system, hardware attacks have become equally likely. Researchers have proposed Secure Processor Architectures which utilize hardware mechanisms for memory encryption and integrity verification to protect the confidentiality and integrity of data and computation, even from sophisticated hardware attacks. While there have been many works addressing performance and other system level issues in secure processor design, power issues have largely been ignored. In this paper, we first analyze the sources of power (energy) increase in different secure processor architectures. We then present a power analysis of various secure processor architectures in terms of their increase in power consumption over a base system with no protection and then provide recommendations for designs that offer the best balance between performance and power without compromising security. We extend our study to the embedded domain as well. We also outline the design of a novel hybrid cryptographic engine that can be used to minimize the power consumption for a secure processor. We believe that if secure processors are to be adopted in future systems (general purpose or embedded), it is critically important that power issues are considered in addition to performance and other system level issues. To the best of our knowledge, this is the first work to examine the power implications of providing hardware mechanisms for security.

  17. Monte Carlo simulations on SIMD computer architectures

    International Nuclear Information System (INIS)

    Burmester, C.P.; Gronsky, R.; Wille, L.T.

    1992-01-01

    In this paper algorithmic considerations regarding the implementation of various materials science applications of the Monte Carlo technique to single instruction multiple data (SIMD) computer architectures are presented. In particular, implementation of the Ising model with nearest, next nearest, and long range screened Coulomb interactions on the SIMD architecture MasPar MP-1 (DEC mpp-12000) series of massively parallel computers is demonstrated. Methods of code development which optimize processor array use and minimize inter-processor communication are presented including lattice partitioning and the use of processor array spanning tree structures for data reduction. Both geometric and algorithmic parallel approaches are utilized. Benchmarks in terms of Monte Carlo updates per second for the MasPar architecture are presented and compared to values reported in the literature from comparable studies on other architectures.
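
    Lattice partitioning for a nearest-neighbour Ising model is commonly done with a checkerboard decomposition, and the sketch below illustrates that idea only. It is an assumption that this resembles the partitioning used in the paper; the function name, the periodic boundaries, and the use of NumPy array operations in place of a MasPar processor array are all choices made for illustration.

```python
import numpy as np

def checkerboard_sweep(spins, beta, rng, J=1.0):
    """One Metropolis sweep of a 2-D nearest-neighbour Ising lattice.

    Sites of one checkerboard colour have no neighbours of the same
    colour, so all of them can be tested and flipped in the same
    data-parallel step. Periodic boundaries need even lattice sizes.
    """
    rows, cols = spins.shape
    if rows % 2 or cols % 2:
        raise ValueError("lattice dimensions must be even")
    i, j = np.indices(spins.shape)
    for colour in (0, 1):
        mask = (i + j) % 2 == colour
        # Sum of the four nearest neighbours with periodic wrap-around.
        nb = (np.roll(spins, 1, 0) + np.roll(spins, -1, 0)
              + np.roll(spins, 1, 1) + np.roll(spins, -1, 1))
        dE = 2.0 * J * spins * nb  # energy cost of flipping each site
        accept = rng.random(spins.shape) < np.exp(-beta * np.maximum(dE, 0.0))
        spins[mask & accept] *= -1
    return spins

rng = np.random.default_rng(0)
lattice = rng.choice([-1, 1], size=(16, 16))
for _ in range(100):
    checkerboard_sweep(lattice, beta=0.6, rng=rng)
print(abs(lattice.mean()))
```

    Only the two `np.roll` directions per axis involve data from neighbouring sites, which corresponds to the nearest-neighbour communication a processor array would need.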

  18. Making CSB+-Trees Processor Conscious

    DEFF Research Database (Denmark)

    Samuel, Michael; Pedersen, Anders Uhl; Bonnet, Philippe

    2005-01-01

    Cache-conscious indexes, such as CSB+-tree, are sensitive to the underlying processor architecture. In this paper, we focus on how to adapt the CSB+-tree so that it performs well on a range of different processor architectures. Previous work has focused on the impact of node size on the performance of the CSB+-tree. We argue that it is necessary to consider a larger group of parameters in order to adapt CSB+-tree to processor architectures as different as Pentium and Itanium. We identify this group of parameters and study how it impacts the performance of CSB+-tree on Itanium 2. Finally, we propose a systematic method for adapting CSB+-tree to new platforms. This work is a first step towards integrating CSB+-tree in MySQL’s heap storage manager.

  19. An architecture model for multiple disease management information systems.

    Science.gov (United States)

    Chen, Lichin; Yu, Hui-Chu; Li, Hao-Chun; Wang, Yi-Van; Chen, Huang-Jen; Wang, I-Ching; Wang, Chiou-Shiang; Peng, Hui-Yu; Hsu, Yu-Ling; Chen, Chi-Huang; Chuang, Lee-Ming; Lee, Hung-Chang; Chung, Yufang; Lai, Feipei

    2013-04-01

    Disease management is a program that attempts to overcome the fragmentation of the healthcare system and improve the quality of care. Many studies have proven the effectiveness of disease management. However, case managers spend the majority of their time on documentation and on coordinating the members of the care team. They need a tool to support their daily practice and to optimize the inefficient workflow. Several discussions have indicated that information technology plays an important role in the era of disease management. Although applications have been developed, it is inefficient to develop an information system for each disease management program individually. The aim of this research is to support the work of disease management, reform the inefficient workflow, and propose an architecture model that enhances reusability and saves time in information system development. The proposed architecture model was successfully implemented in two disease management information systems, and the result was evaluated through reusability analysis, time-consumption analysis, pre- and post-implementation workflow analysis, and a user questionnaire survey. The reusability of the proposed model was high, less than half of the time was consumed, and the workflow was improved. Overall user feedback was positive, and the system was rated as highly supportive of the daily workflow. The system empowers the case managers with better information and leads to better decision making.

  20. Parallel k-means++ for Multiple Shared-Memory Architectures

    Energy Technology Data Exchange (ETDEWEB)

    Mackey, Patrick S.; Lewis, Robert R.

    2016-09-22

    In recent years k-means++ has become a popular initialization technique for improved k-means clustering. To date, most of the work done to improve its performance has involved parallelizing algorithms that are only approximations of k-means++. In this paper we present a parallelization of the exact k-means++ algorithm, with a proof of its correctness. We develop implementations for three distinct shared-memory architectures: multicore CPU, high performance GPU, and the massively multithreaded Cray XMT platform. We demonstrate the scalability of the algorithm on each platform. In addition we present a visual approach for showing which platform performed k-means++ the fastest for varying data sizes.
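
    For orientation, the exact k-means++ seeding rule that this abstract refers to can be sketched serially as follows. This is not the authors' parallel implementation; the function name and the NumPy formulation are assumptions, and the input is assumed to contain at least `k` distinct points.

```python
import numpy as np

def kmeanspp_init(points, k, rng):
    """Pick k initial centres with the exact k-means++ (D^2) rule.

    The first centre is uniform at random; each later centre is drawn
    with probability proportional to the squared distance from a point
    to its nearest already-chosen centre.
    """
    n = points.shape[0]
    centres = np.empty((k, points.shape[1]))
    centres[0] = points[rng.integers(n)]
    d2 = np.sum((points - centres[0]) ** 2, axis=1)
    for c in range(1, k):
        idx = rng.choice(n, p=d2 / d2.sum())
        centres[c] = points[idx]
        # Per-point distance update: independent for every point.
        d2 = np.minimum(d2, np.sum((points - centres[c]) ** 2, axis=1))
    return centres

pts = np.array([[0.0, 0.0], [0.0, 0.0], [10.0, 0.0],
                [10.0, 0.0], [0.0, 10.0], [0.0, 10.0]])
print(kmeanspp_init(pts, 3, np.random.default_rng(1)))
```

    The distance update and the sum inside the loop are per-point operations and a reduction, which is where a shared-memory parallelisation would naturally apply; the draw of each new centre still depends on the previous one.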

  1. Multiple Estimation Architecture in Discrete-Time Adaptive Mixing Control

    Directory of Open Access Journals (Sweden)

    Simone Baldi

    2013-05-01

    Adaptive mixing control (AMC) is a recently developed control scheme for uncertain plants, where the control action coming from a bank of precomputed controllers is mixed based on the parameter estimates generated by an on-line parameter estimator. Even though the stability of the control scheme, also in the presence of modeling errors and disturbances, has been shown analytically, its transient performance might be sensitive to the initial conditions of the parameter estimator. In particular, for some initial conditions, transient oscillations may not be acceptable in practical applications. In order to account for such a possible phenomenon and to improve the learning capability of the adaptive scheme, in this paper a new mixing architecture is developed, involving the use of parallel parameter estimators, or multi-estimators, each one working on a small subset of the uncertainty set. A supervisory logic, using performance signals based on the past and present estimation error, selects the parameter estimate to determine the mixing of the controllers. The stability and robustness properties of the resulting approach, referred to as multi-estimator adaptive mixing control (Multi-AMC), are analytically established. Besides, extensive simulations demonstrate that the scheme improves the transient performance of the original AMC with a single estimator. The control scheme and the analysis are carried out in a discrete-time framework, for easier implementation of the method in digital control.

  2. Architecture

    OpenAIRE

    Clear, Nic

    2014-01-01

    When discussing science fiction’s relationship with architecture, the usual practice is to look at the architecture “in” science fiction—in particular, the architecture in SF films (see Kuhn 75-143) since the spaces of literary SF present obvious difficulties as they have to be imagined. In this essay, that relationship will be reversed: I will instead discuss science fiction “in” architecture, mapping out a number of architectural movements and projects that can be viewed explicitly as scien...

  3. Practical, redundant, failure-tolerant, self-reconfiguring embedded system architecture

    Science.gov (United States)

    Klarer, Paul R.; Hayward, David R.; Amai, Wendy A.

    2006-10-03

    This invention relates to system architectures, specifically failure-tolerant and self-reconfiguring embedded system architectures. The invention provides both a method and architecture for redundancy. There can be redundancy in both software and hardware for multiple levels of redundancy. The invention provides a self-reconfiguring architecture for activating redundant modules whenever other modules fail. The architecture comprises: a communication backbone connected to two or more processors and software modules running on each of the processors. Each software module runs on one processor and resides on one or more of the other processors to be available as a backup module in the event of failure. Each module and backup module reports its status over the communication backbone. If a primary module does not report, its backup module takes over its function. If the primary module becomes available again, the backup module returns to its backup status.
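
    The takeover rule in this abstract (a backup module becomes active when its primary stops reporting, and steps back when the primary returns) can be modelled in a few lines. This is a toy model, not the patented design: the `Module` class, the `reconcile` function and the representation of status reports as `(module, processor)` pairs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Module:
    name: str
    primary_cpu: str
    backup_cpu: str
    active_cpu: Optional[str] = field(init=False, default=None)

    def __post_init__(self):
        self.active_cpu = self.primary_cpu

def reconcile(modules, reports):
    """Apply one cycle of status reports seen on the communication backbone.

    `reports` is a set of (module name, processor) pairs for every copy
    that reported in this cycle.
    """
    for m in modules:
        if (m.name, m.primary_cpu) in reports:
            m.active_cpu = m.primary_cpu   # primary is healthy, backup stands by
        elif (m.name, m.backup_cpu) in reports:
            m.active_cpu = m.backup_cpu    # primary silent, backup takes over
        else:
            m.active_cpu = None            # neither copy reported
    return {m.name: m.active_cpu for m in modules}

mods = [Module("nav", "cpu0", "cpu1")]
print(reconcile(mods, {("nav", "cpu1")}))
print(reconcile(mods, {("nav", "cpu0"), ("nav", "cpu1")}))
```

    A real system would also need timeouts and a way to resolve conflicting reports, which this sketch leaves out.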

  4. Functional Verification of Enhanced RISC Processor

    OpenAIRE

    SHANKER NILANGI; SOWMYA L

    2013-01-01

    This paper presents the design and verification of a 32-bit enhanced RISC processor core with floating point computation integrated within the core, designed to reduce cost and complexity. The designed 3-stage pipelined 32-bit RISC processor is based on the ARM7 processor architecture, with a single precision floating point multiplier, a floating point adder/subtractor for floating point operations, and a 32 x 32 Booth's multiplier added to the integer core of ARM7. The binary representati...

  5. Dynamic information architecture system (DIAS) : multiple model simulation management

    International Nuclear Information System (INIS)

    Simunich, K. L.; Sydelko, P.; Dolph, J.; Christiansen, J.

    2002-01-01

    Dynamic Information Architecture System (DIAS) is a flexible, extensible, object-based framework for developing and maintaining complex multidisciplinary simulations of a wide variety of application contexts. The modeling domain of a specific DIAS-based simulation is determined by (1) software Entity (domain-specific) objects that represent the real-world entities that comprise the problem space (atmosphere, watershed, human), and (2) simulation models and other data processing applications that express the dynamic behaviors of the domain entities. In DIAS, models communicate only with Entity objects, never with each other. Each Entity object has a number of Parameter and Aspect (of behavior) objects associated with it. The Parameter objects contain the state properties of the Entity object. The Aspect objects represent the behaviors of the Entity object and how it interacts with other objects. DIAS extends the "Object" paradigm by abstraction of the object's dynamic behaviors, separating the "WHAT" from the "HOW." DIAS object class definitions contain an abstract description of the various aspects of the object's behavior (the WHAT), but no implementation details (the HOW). Separate DIAS models/applications carry the implementation of object behaviors (the HOW). Any model deemed appropriate, including existing legacy-type models written in other languages, can drive entity object behavior. The DIAS design promotes plug-and-play of alternative models, with minimal recoding of existing applications. The DIAS Context Builder object builds a construct or scenario for the simulation, based on developer specification and user inputs. Because DIAS is a discrete event simulation system, there is a Simulation Manager object with which all events are processed. Any class that registers to receive events must implement an event handler (method) to process the event during execution. Event handlers can schedule other events; create or remove Entities from the

  6. Dynamic information architecture system (DIAS) : multiple model simulation management.

    Energy Technology Data Exchange (ETDEWEB)

    Simunich, K. L.; Sydelko, P.; Dolph, J.; Christiansen, J.

    2002-05-13

    Dynamic Information Architecture System (DIAS) is a flexible, extensible, object-based framework for developing and maintaining complex multidisciplinary simulations of a wide variety of application contexts. The modeling domain of a specific DIAS-based simulation is determined by (1) software Entity (domain-specific) objects that represent the real-world entities that comprise the problem space (atmosphere, watershed, human), and (2) simulation models and other data processing applications that express the dynamic behaviors of the domain entities. In DIAS, models communicate only with Entity objects, never with each other. Each Entity object has a number of Parameter and Aspect (of behavior) objects associated with it. The Parameter objects contain the state properties of the Entity object. The Aspect objects represent the behaviors of the Entity object and how it interacts with other objects. DIAS extends the "Object" paradigm by abstraction of the object's dynamic behaviors, separating the "WHAT" from the "HOW." DIAS object class definitions contain an abstract description of the various aspects of the object's behavior (the WHAT), but no implementation details (the HOW). Separate DIAS models/applications carry the implementation of object behaviors (the HOW). Any model deemed appropriate, including existing legacy-type models written in other languages, can drive entity object behavior. The DIAS design promotes plug-and-play of alternative models, with minimal recoding of existing applications. The DIAS Context Builder object builds a construct or scenario for the simulation, based on developer specification and user inputs. Because DIAS is a discrete event simulation system, there is a Simulation Manager object with which all events are processed. Any class that registers to receive events must implement an event handler (method) to process the event during execution. Event handlers

  7. High performance graphics processors for medical imaging applications

    International Nuclear Information System (INIS)

    Goldwasser, S.M.; Reynolds, R.A.; Talton, D.A.; Walsh, E.S.

    1989-01-01

    This paper describes a family of high-performance graphics processors with special hardware for interactive visualization of 3D human anatomy. The basic architecture expands to multiple parallel processors, each processor using pipelined arithmetic and logical units for high-speed rendering of Computed Tomography (CT), Magnetic Resonance (MR) and Positron Emission Tomography (PET) data. User-selectable display alternatives include multiple 2D axial slices, reformatted images in sagittal or coronal planes and shaded 3D views. Special facilities support applications requiring color-coded display of multiple datasets (such as radiation therapy planning), or dynamic replay of time-varying volumetric data (such as cine-CT or gated MR studies of the beating heart). The current implementation is a single processor system which generates reformatted images in true real time (30 frames per second), and shaded 3D views in a few seconds per frame. It accepts full scale medical datasets in their native formats, so that minimal preprocessing delay exists between data acquisition and display.

  8. The TMS34010 graphic processor - an architecture for image visualization in NMR tomography; O processador grafico TMS34010 - uma arquitetura para visualizacao de imagem em tomografia por RMN

    Energy Technology Data Exchange (ETDEWEB)

    Slaets, Jan Frans Willem; Paiva, Maria Stela Veludo de; Almeida, Lirio O B

    1990-12-31

    This abstract presents a description of the minimum system implemented with the graphic processor TMS34010, which will be used in the reconstruction, treatment and interpretation of images obtained by NMR tomography. The project is being developed in the LIE (Electronic Instrumentation Laboratory), of the Sao Carlos Chemistry and Physical Institute, SP, Brazil, and is already in operation. 4 refs., 7 figs.

  9. The TMS34010 graphic processor - an architecture for image visualization in NMR tomography; O processador grafico TMS34010 - uma arquitetura para visualizacao de imagem em tomografia por RMN

    Energy Technology Data Exchange (ETDEWEB)

    Slaets, Jan Frans Willem; Paiva, Maria Stela Veludo de; Almeida, Lirio O.B

    1989-12-31

    This abstract presents a description of the minimum system implemented with the graphic processor TMS34010, which will be used in the reconstruction, treatment and interpretation of images obtained by NMR tomography. The project is being developed in the LIE (Electronic Instrumentation Laboratory), of the Sao Carlos Chemistry and Physical Institute, SP, Brazil, and is already in operation. 4 refs., 7 figs.

  10. Many-body simulations using an array processor

    International Nuclear Information System (INIS)

    Rapaport, D.C.

    1985-01-01

    Simulations of microscopic models of water and polypeptides using molecular dynamics and Monte Carlo techniques have been carried out with the aid of an FPS array processor. The computational techniques are discussed, with emphasis on the development and optimization of the software to take account of the special features of the processor. The computing requirements of these simulations exceed what could be reasonably carried out on a normal 'scientific' computer. While the FPS processor is highly suited to the kinds of models described, several other computationally intensive problems in statistical mechanics are outlined for which alternative processor architectures are more appropriate

  11. Multi-threaded Sparse Matrix Sparse Matrix Multiplication for Many-Core and GPU Architectures.

    Energy Technology Data Exchange (ETDEWEB)

    Deveci, Mehmet [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Trott, Christian Robert [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Rajamanickam, Sivasankaran [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2018-01-01

    Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two-phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.

  12. Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures.

    Energy Technology Data Exchange (ETDEWEB)

    Deveci, Mehmet [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Rajamanickam, Sivasankaran [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Trott, Christian Robert [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2017-12-01

    Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two-phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.
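
    To make the terms "accumulator" and "two-phase" concrete, here is a serial sketch of row-wise sparse matrix-matrix multiplication on CSR arrays. It is not kkSpGEMM: the function names are hypothetical, a Python `set` and `dict` stand in for the hash-based accumulators, and no parallelism is shown. The symbolic phase computes only the structure of the result, so it can be reused when the same sparsity pattern is multiplied again with new values.

```python
def spgemm_symbolic(a_ptr, a_idx, b_ptr, b_idx):
    """Phase 1: row pointer of C = A @ B, computed from structure only."""
    c_ptr = [0]
    for i in range(len(a_ptr) - 1):
        cols = set()  # column indices that appear in row i of C
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k = a_idx[p]
            cols.update(b_idx[b_ptr[k]:b_ptr[k + 1]])
        c_ptr.append(c_ptr[-1] + len(cols))
    return c_ptr

def spgemm_numeric(a, b, c_ptr):
    """Phase 2: fill column indices and values of C using a known c_ptr."""
    a_ptr, a_idx, a_val = a
    b_ptr, b_idx, b_val = b
    c_idx = [0] * c_ptr[-1]
    c_val = [0.0] * c_ptr[-1]
    for i in range(len(a_ptr) - 1):
        acc = {}  # per-row accumulator: column -> partial sum
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k = a_idx[p]
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + a_val[p] * b_val[q]
        for offset, j in enumerate(sorted(acc)):
            c_idx[c_ptr[i] + offset] = j
            c_val[c_ptr[i] + offset] = acc[j]
    return c_idx, c_val

# A = [[1, 0], [2, 3]] and B = [[0, 4], [5, 0]] in CSR form
A = ([0, 1, 3], [0, 0, 1], [1.0, 2.0, 3.0])
B = ([0, 1, 2], [1, 0], [4.0, 5.0])
c_ptr = spgemm_symbolic(A[0], A[1], B[0], B[1])
print(c_ptr, spgemm_numeric(A, B, c_ptr))
```

    Each output row is computed independently of the others, so rows are the natural unit of parallel work in this formulation.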

  13. Hardware trigger processor for the MDT system

    CERN Document Server

    AUTHOR|(SzGeCERN)757787; The ATLAS collaboration; Hazen, Eric; Butler, John; Black, Kevin; Gastler, Daniel Edward; Ntekas, Konstantinos; Taffard, Anyes; Martinez Outschoorn, Verena; Ishino, Masaya; Okumura, Yasuyuki

    2017-01-01

    We are developing a low-latency hardware trigger processor for the Monitored Drift Tube system in the Muon spectrometer. The processor will fit candidate Muon tracks in the drift tubes in real time, improving significantly the momentum resolution provided by the dedicated trigger chambers. We present a novel pure-FPGA implementation of a Legendre transform segment finder, an associative-memory alternative implementation, an ARM (Zynq) processor-based track fitter, and compact ATCA carrier board architecture. The ATCA architecture is designed to allow a modular, staged approach to deployment of the system and exploration of alternative technologies.

  14. Invasive tightly coupled processor arrays

    CERN Document Server

    Lari, Vahid

    2016-01-01

    This book introduces new massively parallel computer (MPSoC) architectures called invasive tightly coupled processor arrays. It proposes strategies, architecture designs, and programming interfaces for invasive TCPAs that allow invading and subsequently executing loop programs with strict requirements or guarantees of non-functional execution qualities such as performance, power consumption, and reliability. For the first time, such a configurable processor array architecture consisting of locally interconnected VLIW processing elements can be claimed by programs, either in full or in part, using the principle of invasive computing. Invasive TCPAs provide unprecedented energy efficiency for the parallel execution of nested loop programs by avoiding any global memory access such as GPUs and may even support loops with complex dependencies such as loop-carried dependencies that are not amenable to parallel execution on GPUs. For this purpose, the book proposes different invasion strategies for claiming a desire...

  15. Supertracker: A Programmable Parallel Pipeline Arithmetic Processor For Auto-Cueing Target Processing

    Science.gov (United States)

    Mack, Harold; Reddi, S. S.

    1980-04-01

    Supertracker represents a programmable parallel pipeline computer architecture that has been designed to meet the real time image processing requirements of auto-cueing target data processing. The prototype breadboard currently under development will be designed to perform input video preprocessing and processing for 525-line and 875-line TV-format FLIR video, automatic display gain and contrast control, and automatic target cueing, classification, and tracking. The video preprocessor is capable of performing operations on full frames of video data in real time, e.g., frame integration, storage, 3 x 3 convolution, and neighborhood processing. The processor architecture is being implemented using bit-slice microprogrammable arithmetic processors, operating in parallel. Each processor is capable of up to 20 million operations per second. Multiple frame memories are used for additional flexibility.

  16. Lipsi: Probably the Smallest Processor in the World

    DEFF Research Database (Denmark)

    Schoeberl, Martin

    2018-01-01

    While research on high-performance processors is important, it is also interesting to explore processor architectures at the other end of the spectrum: tiny processor cores for auxiliary functions. While it is common to implement small circuits for such functions, such as a serial port, in dedica...... at a minimal cost....

  17. Preliminary design of an advanced programmable digital filter network for large passive acoustic ASW systems. [Parallel processor

    Energy Technology Data Exchange (ETDEWEB)

    McWilliams, T.; Widdoes, Jr., L. C.; Wood, L.

    1976-09-30

    The design of an extremely high performance programmable digital filter of novel architecture, the LLL Programmable Digital Filter, is described. The digital filter is a high-performance multiprocessor having general purpose applicability and high programmability; it is extremely cost effective either in a uniprocessor or a multiprocessor configuration. The architecture and instruction set of the individual processor was optimized with regard to the multiple processor configuration. The optimal structure of a parallel processing system was determined for addressing the specific Navy application centering on the advanced digital filtering of passive acoustic ASW data of the type obtained from the SOSUS net. 148 figures. (RWR)

  18. A high-accuracy optical linear algebra processor for finite element applications

    Science.gov (United States)

    Casasent, D.; Taylor, B. K.

    1984-01-01

    Optical linear processors are computationally efficient computers for solving matrix-matrix and matrix-vector oriented problems. Optical system errors limit their dynamic range to 30-40 dB, which limits their accuracy to 9-12 bits. Large problems, such as the finite element problem in structural mechanics (with tens or hundreds of thousands of variables) which can exploit the speed of optical processors, require the 32 bit accuracy obtainable from digital machines. To obtain this required 32 bit accuracy with an optical processor, the data can be digitally encoded, thereby reducing the dynamic range requirements of the optical system (i.e., decreasing the effect of optical errors on the data) while providing increased accuracy. This report describes a new digitally encoded optical linear algebra processor architecture for solving finite element and banded matrix-vector problems. A linear static plate bending case study is described which quantifies the processor requirements. Multiplication by digital convolution is explained, and the digitally encoded optical processor architecture is advanced.
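
    The idea of multiplication by digital convolution mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the report: the function name, the base-2 default, and the software carry resolution are assumptions chosen for illustration. The digit sequences of the two operands are convolved without carries, and the carries are resolved afterwards by weighting each entry with its power of the base.

```python
def multiply_by_convolution(a: int, b: int, base: int = 2) -> int:
    """Multiply two non-negative integers by convolving their digit sequences.

    Hypothetical helper for illustration; not the report's implementation.
    """
    def digits(n: int) -> list:
        # Least-significant digit first; zero is encoded as [0].
        out = []
        while True:
            n, d = divmod(n, base)
            out.append(d)
            if n == 0:
                return out

    da, db = digits(a), digits(b)
    # Discrete convolution of the digit sequences: entry k collects every
    # partial product da[i] * db[j] with i + j == k. No carries are made
    # here, so an entry may exceed base - 1.
    conv = [0] * (len(da) + len(db) - 1)
    for i, x in enumerate(da):
        for j, y in enumerate(db):
            conv[i + j] += x * y
    # Weighting entry k by base**k resolves the carries and gives the product.
    return sum(c * base**k for k, c in enumerate(conv))
```

    Because each digit is small, the analog hardware only has to represent the bounded convolution entries rather than the full-precision operands, which is the sense in which encoding reduces the dynamic range requirement.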

  19. A lock circuit for a multi-core processor

    DEFF Research Database (Denmark)

    2015-01-01

    An integrated circuit comprising multiple processor cores and a lock circuit that comprises a queue register with respective bits set or reset via respective connections dedicated to respective processor cores, whereby the queue register identifies those among the multiple processor cores...... that are enqueued in the queue register. Furthermore, the integrated circuit comprises a current register and a selector circuit configured to select a processor core and identify that processor core by a value in the current register. A selected processor core is a prioritized processor core among the cores...... configured with an integrated circuit; and a silicon die configured with an integrated circuit....
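
    A behavioural sketch of such a lock circuit is given below, under stated assumptions. The class and method names are hypothetical, and the priority rule (the lowest-numbered enqueued core wins) is an assumption made only so that the example is complete; the abstract does not state which rule the selector uses.

```python
class LockCircuitModel:
    """Software model of a queue register, current register, and selector."""

    def __init__(self, num_cores: int):
        self.num_cores = num_cores
        self.queue = 0       # queue register: bit i set means core i is enqueued
        self.current = None  # current register: index of the selected core

    def request(self, core: int) -> None:
        # Each core sets only its own dedicated bit in the queue register.
        self.queue |= 1 << core
        if self.current is None:
            self._select()

    def release(self, core: int) -> None:
        if core != self.current:
            raise ValueError("only the selected core may release the lock")
        # The releasing core resets its own bit, then the selector runs again.
        self.queue &= ~(1 << core)
        self.current = None
        self._select()

    def _select(self) -> None:
        # Assumed policy: pick the lowest-numbered enqueued core.
        if self.queue:
            lowest_set_bit = self.queue & -self.queue
            self.current = lowest_set_bit.bit_length() - 1
```

    In this model a core holds the lock exactly while its index is stored in `current`; the other requesting cores stay enqueued until the holder releases.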

  20. Keystone Business Models for Network Security Processors

    Directory of Open Access Journals (Sweden)

    Arthur Low

    2013-07-01

    Network security processors are critical components of high-performance systems built for cybersecurity. Development of a network security processor requires multi-domain experience in semiconductors and complex software security applications, and multiple iterations of both software and hardware implementations. Limited by the business models in use today, such an arduous task can be undertaken only by large incumbent companies and government organizations. Neither the “fabless semiconductor” models nor the silicon intellectual-property licensing (“IP-licensing”) models allow small technology companies to successfully compete. This article describes an alternative approach that produces an ongoing stream of novel network security processors for niche markets through continuous innovation by both large and small companies. This approach, referred to here as the "business ecosystem model for network security processors", includes a flexible and reconfigurable technology platform, a “keystone” business model for the company that maintains the platform architecture, and an extended ecosystem of companies that both contribute and share in the value created by innovation. New opportunities for business model innovation by participating companies are made possible by the ecosystem model. This ecosystem model builds on: (i) the lessons learned from the experience of the first author as a senior integrated circuit architect for providers of public-key cryptography solutions and as the owner of a semiconductor startup, and (ii) the latest scholarly research on technology entrepreneurship, business models, platforms, and business ecosystems. This article will be of interest to all technology entrepreneurs, but it will be of particular interest to owners of small companies that provide security solutions and to specialized security professionals seeking to launch their own companies.

  1. Optical Array Processor: Laboratory Results

    Science.gov (United States)

    Casasent, David; Jackson, James; Vaerewyck, Gerard

    1987-01-01

    A Space Integrating (SI) Optical Linear Algebra Processor (OLAP) is described and laboratory results on its performance in several practical engineering problems are presented. The applications include its use in the solution of a nonlinear matrix equation for optimal control and a parabolic Partial Differential Equation (PDE), the transient diffusion equation with two spatial variables. Frequency-multiplexed, analog and high accuracy non-base-two data encoding are used and discussed. A multi-processor OLAP architecture is described and partitioning and data flow issues are addressed.

  2. Matrix multiplication operations with data pre-conditioning in a high performance computing architecture

    Science.gov (United States)

    Eichenberger, Alexandre E; Gschwind, Michael K; Gunnels, John A

    2013-11-05

    Mechanisms for performing matrix multiplication operations with data pre-conditioning in a high performance computing architecture are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A load and splat operation is performed to load an element of a second vector operand and replicating the element to each of a plurality of elements of a second target vector register. A multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product of the matrix multiplication operation is accumulated with other partial products of the matrix multiplication operation.
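
    The sequence of operations listed in this abstract (vector load, load and splat, multiply add, accumulation of partial products) can be sketched in plain Python. The helper names and the use of lists in place of vector registers are assumptions for illustration; the sketch does not model any particular instruction set.

```python
def vector_load(column):
    # Stand-in for loading a vector operand into a target vector register.
    return list(column)

def load_and_splat(value, width):
    # Load one element and replicate it across every lane of a register.
    return [value] * width

def multiply_add(acc, va, vb):
    # Lane-wise fused multiply-add: acc[i] + va[i] * vb[i].
    return [c + x * y for c, x, y in zip(acc, va, vb)]

def matmul_splat(a, b):
    """Compute a @ b one result column at a time using splat and multiply-add."""
    rows, inner, cols = len(a), len(b), len(b[0])
    result = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        acc = [0.0] * rows  # accumulator for column j of the result
        for k in range(inner):
            va = vector_load([a[i][k] for i in range(rows)])  # column k of a
            vb = load_and_splat(b[k][j], rows)                # b[k][j] in all lanes
            acc = multiply_add(acc, va, vb)                   # add partial product
        for i in range(rows):
            result[i][j] = acc[i]
    return result
```

    Each pass through the inner loop contributes one partial product, so after `inner` passes the accumulator holds a finished column of the result.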

  3. A High Performance VLSI Computer Architecture For Computer Graphics

    Science.gov (United States)

    Chin, Chi-Yuan; Lin, Wen-Tai

    1988-10-01

    A VLSI computer architecture consisting of multiple processors is presented in this paper to satisfy the demands of modern computer graphics, e.g., high resolution, realistic animation, and real-time display. All processors share a global memory, which is partitioned into multiple banks. Through a crossbar network, data from one memory bank can be broadcast to many processors. Processors are physically interconnected through a hyper-crossbar network (a crossbar-like network). By programming the network, the topology of communication links among processors can be reconfigured to satisfy the specific dataflows of different applications. Each processor consists of a controller, arithmetic operators, local memory, a local crossbar network, and I/O ports to communicate with other processors, memory banks, and a system controller. Operations in each processor are characterized into two modes, i.e., object domain and space domain, to fully utilize the data-independency characteristics of graphics processing. Special graphics features such as 3D-to-2D conversion, shadow generation, texturing, and reflection can be easily handled. With the current high density interconnection (MI) technology, it is feasible to implement a 64-processor system to achieve 2.5 billion operations per second, a performance needed in most advanced graphics applications.

  4. Generating and executing programs for a floating point single instruction multiple data instruction set architecture

    Science.gov (United States)

    Gschwind, Michael K

    2013-04-16

    Mechanisms for generating and executing programs for a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA) are provided. A computer program product comprising a computer recordable medium having a computer readable program recorded thereon is provided. The computer readable program, when executed on a computing device, causes the computing device to receive one or more instructions and execute the one or more instructions using logic in an execution unit of the computing device. The logic implements a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA), based on data stored in a vector register file of the computing device. The vector register file is configured to store both scalar and floating point values as vectors having a plurality of vector elements.

  5. XL-100S microprogrammable processor

    International Nuclear Information System (INIS)

    Gorbunov, N.V.; Guzik, Z.; Sutulin, V.A.; Forytski, A.

    1983-01-01

    The XL-100S microprogrammable processor, which provides the multiprocessor operation mode in the XL system crate, is described. The processor meets the EUR 6500 CAMAC standards, addresses up to 4 Mbyte of memory, and interacts with 7 CAMAC branches. Eight external requests initiate operations preset by a sequence of microcommands in a memory with a capacity of up to 64 kwords of 32 bits. The microprocessor architecture allows one to emulate the commands of the majority of mini- or micro-computers, including floating point operations. The XL-100S processor may be used in various branches of experimental physics: for physical experiment apparatus control, fast selection of useful physical events, organization of input/output operations (direct memory access included), etc. The Am2900 microprocessor set is used as the elementary base. The device is made in the form of a single-width CAMAC module.

  6. A High-Speed and Low-Energy-Consumption Processor for SVD-MIMO-OFDM Systems

    Directory of Open Access Journals (Sweden)

    Hiroki Iwaizumi

    2013-01-01

    A processor design for singular value decomposition (SVD) and compression/decompression of feedback matrices, which are mandatory operations for SVD multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) systems, is proposed and evaluated. SVD-MIMO is a transmission method for suppressing multistream interference and improving communication quality by beamforming. An application specific instruction-set processor (ASIP) architecture is adopted to achieve flexibility in terms of operations and matrix size. The proposed processor realizes a high-speed/low-power design and real-time processing by the parallelization of floating-point units (FPUs) and arithmetic instructions specialized in complex matrix operations.

  7. Efficient Sorting on the Tilera Manycore Architecture

    Energy Technology Data Exchange (ETDEWEB)

    Morari, Alessandro; Tumeo, Antonino; Villa, Oreste; Secchi, Simone; Valero, Mateo

    2012-10-24

    We present an efficient implementation of the radix sort algorithm for the Tilera TILEPro64 processor. The TILEPro64 is one of the first successful commercial manycore processors. It is composed of 64 tiles interconnected through multiple fast Networks-on-chip and features a fully coherent, shared distributed cache. The architecture has a large degree of flexibility, and allows various optimization strategies. We describe how we mapped the algorithm to this architecture. We present an in-depth analysis of the optimizations for each phase of the algorithm with respect to the processor’s sustained performance. We discuss the overall throughput reached by our radix sort implementation (up to 132 MK/s) and show that it provides comparable or better performance-per-watt with respect to state-of-the-art implementations on x86 processors and graphic processing units.
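
    For readers unfamiliar with the algorithm, a sequential least-significant-digit radix sort is sketched below. It shows the usual three phases per pass (histogram, prefix sum, scatter) but says nothing about how the authors mapped them to the TILEPro64; the function name and the default digit width of 8 bits are assumptions.

```python
def radix_sort(keys, key_bits=32, radix_bits=8):
    """Sort non-negative integers below 2**key_bits, one digit per pass."""
    buckets = 1 << radix_bits
    mask = buckets - 1
    keys = list(keys)
    for shift in range(0, key_bits, radix_bits):
        # Phase 1: histogram of the current digit.
        counts = [0] * buckets
        for key in keys:
            counts[(key >> shift) & mask] += 1
        # Phase 2: exclusive prefix sum gives each bucket's first output slot.
        offsets = [0] * buckets
        total = 0
        for digit in range(buckets):
            offsets[digit] = total
            total += counts[digit]
        # Phase 3: stable scatter of the keys into their buckets.
        output = [0] * len(keys)
        for key in keys:
            digit = (key >> shift) & mask
            output[offsets[digit]] = key
            offsets[digit] += 1
        keys = output
    return keys
```

    The scatter must be stable, because each pass relies on the order established by the passes over the less significant digits.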

  8. Fully distributed monitoring architecture supporting multiple trackees and trackers in indoor mobile asset management application.

    Science.gov (United States)

    Jeong, Seol Young; Jo, Hyeong Gon; Kang, Soon Ju

    2014-03-21

    A tracking service like asset management is essential in a dynamic hospital environment consisting of numerous mobile assets (e.g., wheelchairs or infusion pumps) that are continuously relocated throughout a hospital. The tracking service is accomplished based on the key technologies of an indoor location-based service (LBS), such as locating and monitoring multiple mobile targets inside a building in real time. An indoor LBS such as a tracking service entails numerous resource lookups being requested concurrently and frequently from several locations, as well as a network infrastructure requiring support for high scalability in indoor environments. A traditional centralized architecture needs to maintain a geographic map of the entire building or complex in its central server, which can cause low scalability and traffic congestion. This paper presents a self-organizing and fully distributed indoor mobile asset management (MAM) platform, and proposes an architecture for multiple trackees (such as mobile assets) and trackers based on the proposed distributed platform in real time. In order to verify the suggested platform, scalability performance according to increases in the number of concurrent lookups was evaluated in a real test bed. Tracking latency and traffic load ratio in the proposed tracking architecture was also evaluated.

  9. Median and Morphological Specialized Processors for a Real-Time Image Data Processing

    Directory of Open Access Journals (Sweden)

    Kazimierz Wiatr

    2002-01-01

    This paper presents considerations on selecting a multiprocessor MISD architecture for fast implementation of vision image processing. Drawing on the author's earlier experience with real-time systems, the implementation of specialized hardware processors based on programmable FPGA systems in a pipeline architecture is proposed. In particular, the following processors are presented: a median filter and a morphological processor. The structure of a universal reconfigurable processor is proposed as well. Experimental results are presented as delays of the LCA-level implementation for the median filter, morphological processor, convolution processor, look-up-table processor, logic processor and histogram processor. These times are compared with the delays of a general-purpose processor and a DSP processor.

  10. Nested dissection on a mesh-connected processor array

    International Nuclear Information System (INIS)

    Worley, P.H.; Schreiber, R.

    1986-01-01

    The authors present a parallel implementation of Gaussian elimination without pivoting using the nested dissection ordering for solving Ax=b where A is an N x N symmetric positive definite matrix. If the graph of A is a √N x √N finite element mesh then a parallel complexity of O(√N) can be achieved for Gaussian elimination with the nested dissection ordering. The authors' implementation achieves this parallel complexity on a two dimensional MIMD processor array with N processors and nearest neighbors interconnections. Thus nested dissection is a near optimal algorithm for this problem on this interconnection topology. The parallel implementation on this architecture requires 158√N + O(log₂(√N)) parallel floating point multiplications. It is faster than a Kung-Leiserson systolic array for banded matrices for N≥961, and faster than a serial implementation for N as small as 9.

  11. Evaluation of existing and proposed computer architectures for future ground-based systems

    Science.gov (United States)

    Schulbach, C.

    1985-01-01

    Parallel processing architectures and techniques used in current supercomputers are described and projections are made of future advances. Presently, the von Neumann sequential processing pattern has been accelerated by having separate I/O processors, interleaved memories, wide memories, independent functional units and pipelining. Recent supercomputers have featured single-instruction, multiple-data-stream architectures, which have different processors for performing various operations (vector or pipeline processors). Multiple-instruction, multiple-data-stream machines have also been developed. Data flow techniques, wherein program instructions are activated only when data are available, are expected to play a large role in future supercomputers, along with increased parallel processor arrays. The enhanced operational speeds are essential for adequately treating data from future spacecraft remote sensing instruments such as the Thematic Mapper.

  12. The LASS hardware processor

    International Nuclear Information System (INIS)

    Kunz, P.F.

    1976-01-01

    The problems of data analysis with hardware processors are reviewed and a description is given of a programmable processor. This processor, the 168/E, has been designed for use in the LASS multi-processor system; it has an execution speed comparable to the IBM 370/168 and uses the subset of IBM 370 instructions appropriate to the LASS analysis task. (Auth.)

  13. Performance evaluation of throughput computing workloads using multi-core processors and graphics processors

    Science.gov (United States)

    Dave, Gaurav P.; Sureshkumar, N.; Blessy Trencia Lincy, S. S.

    2017-11-01

    The current trend in processor manufacturing focuses on multi-core architectures rather than on increasing the clock speed for performance improvement. Graphics processors have become commodity hardware for providing fast co-processing in computer systems. Developments in IoT, social networking web applications, and big data have created a huge demand for data processing, and such throughput-intensive applications inherently contain data-level parallelism, which is well suited to SIMD-architecture-based GPUs. This paper reviews the architectural aspects of multi/many-core processors and graphics processors. Different case studies are used to compare the performance of throughput computing applications using shared-memory programming in OpenMP and CUDA API based programming.

  14. Programmable level-1 trigger with 3D-Flow processor array

    International Nuclear Information System (INIS)

    Crosetto, D.

    1994-01-01

    The 3D-Flow parallel processing system is a new concept in processor architecture, system architecture, and assembly architecture. Compared to the electronics used in present systems, this approach reduces the cost and complexity of the hardware and allows easy assembly, disassembly, incremental upgrading, and maintenance of different interconnection topologies. The 3D-Flow parallel-processing system benefits high energy physics (HEP) by allowing: (1) common, less costly hardware to be used in different experiments; (2) new uses of existing installations; (3) tuning of the trigger based on the first analyzed data; and (4) selection of desired events directly from raw data. The goal of this parallel-processing architecture is to acquire multiple data in parallel (up to 100 million frames per second) and to process them at high speed, accomplishing digital filtering on the input data, pattern recognition (particle identification), data moving, and data formatting. The main features of the system are its programmability, scalability, high-speed communication, and low cost. The compactness of the 3D-Flow parallel-processing system in concert with the processor architecture allows processor interconnections to be mapped into the geometry of sensors (detectors in HEP) without large interconnection signal delay, enabling real-time pattern recognition. The overall 3D-Flow project has passed a major design review at Fermilab (reviewers included experts in computers, triggering, system assembly, and electronics).

  15. A low-cost, high-performance, digital signal processor-based lock-in amplifier capable of measuring multiple frequency sweeps simultaneously

    International Nuclear Information System (INIS)

    Sonnaillon, Maximiliano Osvaldo; Bonetto, Fabian Jose

    2005-01-01

    A high-performance digital lock-in amplifier implemented in a low-cost digital signal processor (DSP) board is described. This lock in is capable of measuring simultaneously multiple frequencies that change in time as frequency sweeps (chirps). The 32-bit DSP used has enough computing power to generate N=3 simultaneous reference signals and accurately measure the N=3 responses, operating as three lock ins connected in parallel to a linear system. The lock in stores the measured values in memory until they are downloaded to a personal computer (PC). The lock in works in stand-alone mode and can be programmed and configured through the PC serial port. Downsampling and multiple filter stages were used in order to obtain a sharp roll off and a long time constant in the filters. This makes measurements possible in the presence of high noise levels. Before each measurement, the lock in performs an autocalibration that measures the frequency response of the analog output and input circuitry in order to compensate for the departure from ideal operation. Improvements over previous lock-in implementations allow measuring the frequency response of a system in a short time. Furthermore, the proposed implementation can measure how the frequency response changes with time, a characteristic that is very important in our biotechnological application. The number of simultaneous components that the lock in can generate and measure can be extended, without reprogramming, simply by using other DSPs of the same family that are code compatible and work at higher clock frequencies.
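
    The principle of measuring several frequency components at once can be sketched as follows. This is a simplified illustration, not the authors' DSP code: it assumes fixed frequencies rather than chirps, replaces the downsampling and multi-stage filters with a plain average over the whole record, and uses a hypothetical function name.

```python
import math

def lock_in(signal, freqs, sample_rate):
    """Return an (amplitude, phase) pair for each reference frequency.

    The input is modelled as a sum of terms A * cos(2*pi*f*t + phase).
    """
    n = len(signal)
    results = []
    for f in freqs:
        in_phase = quadrature = 0.0
        for k, sample in enumerate(signal):
            angle = 2.0 * math.pi * f * k / sample_rate
            # Multiply the input by the two quadrature reference signals.
            in_phase += sample * math.cos(angle)
            quadrature += sample * math.sin(angle)
        # Averaging acts as the low-pass filter; the factor 2 restores the
        # amplitude that is halved by the mixing product.
        x = 2.0 * in_phase / n
        y = 2.0 * quadrature / n
        results.append((math.hypot(x, y), math.atan2(-y, x)))
    return results
```

    With a record that spans a whole number of periods of every reference, the components are orthogonal and each one is recovered independently of the others, which is what lets a single processor act as several lock-ins in parallel.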

  16. A low-cost, high-performance, digital signal processor-based lock-in amplifier capable of measuring multiple frequency sweeps simultaneously

    Energy Technology Data Exchange (ETDEWEB)

    Sonnaillon, Maximiliano Osvaldo; Bonetto, Fabian Jose [Laboratorio de Cavitacion y Biotecnologia, San Carlos de Bariloche (8400) (Argentina)

    2005-02-01

    A high-performance digital lock-in amplifier implemented in a low-cost digital signal processor (DSP) board is described. This lock in is capable of measuring simultaneously multiple frequencies that change in time as frequency sweeps (chirps). The 32-bit DSP used has enough computing power to generate N=3 simultaneous reference signals and accurately measure the N=3 responses, operating as three lock ins connected in parallel to a linear system. The lock in stores the measured values in memory until they are downloaded to a personal computer (PC). The lock in works in stand-alone mode and can be programmed and configured through the PC serial port. Downsampling and multiple filter stages were used in order to obtain a sharp roll off and a long time constant in the filters. This makes measurements possible in the presence of high noise levels. Before each measurement, the lock in performs an autocalibration that measures the frequency response of the analog output and input circuitry in order to compensate for the departure from ideal operation. Improvements over previous lock-in implementations allow measuring the frequency response of a system in a short time. Furthermore, the proposed implementation can measure how the frequency response changes with time, a characteristic that is very important in our biotechnological application. The number of simultaneous components that the lock in can generate and measure can be extended, without reprogramming, simply by using other DSPs of the same family that are code compatible and work at higher clock frequencies.

  17. Utilizing a multiprocessor architecture - The performance of MIDAS

    International Nuclear Information System (INIS)

    Maples, C.; Logan, D.; Meng, J.; Rathbun, W.; Weaver, D.

    1983-01-01

    The MIDAS architecture organizes multiple CPUs into clusters called distributed subsystems. Each subsystem consists of an array of processors controlled by a supervisory CPU. The multiprocessor array is composed of commercial CPUs (with floating point hardware) and specialized processing elements. Interprocessor communication within the array may occur either through switched memory modules or common shared memory. The architecture permits multiple processors to be focused on single problems. A distributed subsystem has been constructed and tested. It currently consists of a supervisor CPU; 16 blocks of independently switchable memory; 9 general purpose, VAX-class CPUs; and 2 specialized pipelined processors to handle I/O. Results on a variety of problems indicate that the subsystem performs 8 to 15 times faster than a standard computer with an identical CPU. The difference in performance represents the effect of differing CPU and I/O requirements

  18. Performance of Artificial Intelligence Workloads on the Intel Core 2 Duo Series Desktop Processors

    OpenAIRE

    Abdul Kareem PARCHUR; Kuppangari Krishna RAO; Fazal NOORBASHA; Ram Asaray SINGH

    2010-01-01

    As processor architectures became more advanced, Intel introduced its Intel Core 2 Duo series processors. The performance impact on Intel Core 2 Duo processors is analyzed using SPEC CPU INT 2006 performance numbers. This paper studies the behavior of Artificial Intelligence (AI) benchmarks on Intel Core 2 Duo series processors. Moreover, we estimate the task completion time (TCT) at Intel Core 2 Duo processor frequencies of 1 GHz, 2 GHz and 3 GHz. Our results show the performance scalab...

  19. Floating point only SIMD instruction set architecture including compare, select, Boolean, and alignment operations

    Science.gov (United States)

    Gschwind, Michael K [Chappaqua, NY

    2011-03-01

    Mechanisms for implementing a floating point only single instruction multiple data instruction set architecture are provided. A processor is provided that comprises an issue unit, an execution unit coupled to the issue unit, and a vector register file coupled to the execution unit. The execution unit has logic that implements a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA). The floating point vector registers of the vector register file store both scalar and floating point values as vectors having a plurality of vector elements. The processor may be part of a data processing system.

  20. System and method for integrating and accessing multiple data sources within a data warehouse architecture

    Science.gov (United States)

    Musick, Charles R [Castro Valley, CA; Critchlow, Terence [Livermore, CA; Ganesh, Madhaven [San Jose, CA; Slezak, Tom [Livermore, CA; Fidelis, Krzysztof [Brentwood, CA

    2006-12-19

    A system and method is disclosed for integrating and accessing multiple data sources within a data warehouse architecture. The metadata formed by the present method provide a way to declaratively present domain specific knowledge, obtained by analyzing data sources, in a consistent and useable way. Four types of information are represented by the metadata: abstract concepts, databases, transformations and mappings. A mediator generator automatically generates data management computer code based on the metadata. The resulting code defines a translation library and a mediator class. The translation library provides a data representation for domain specific knowledge represented in a data warehouse, including "get" and "set" methods for attributes that call transformation methods and derive a value of an attribute if it is missing. The mediator class defines methods that take "distinguished" high-level objects as input and traverse their data structures and enter information into the data warehouse.

  1. Probabilistic programmable quantum processors

    International Nuclear Information System (INIS)

    Buzek, V.; Ziman, M.; Hillery, M.

    2004-01-01

    We analyze how to improve the performance of probabilistic programmable quantum processors. We show how the probability of success of the probabilistic processor can be enhanced by using the processor in loops. In addition, we show that arbitrary SU(2) transformations of qubits can be encoded in the program state of a universal programmable probabilistic quantum processor. The probability of success of this processor can be enhanced by a systematic correction of errors via conditional loops. Finally, we show that all our results can also be generalized to qudits. (Abstract Copyright [2004], Wiley Periodicals, Inc.)

  2. Huffman-based code compression techniques for embedded processors

    KAUST Repository

    Bonny, Mohamed Talal; Henkel, Jörg

    2010-01-01

    % for ARM and MIPS, respectively. In our compression technique, we have conducted evaluations using a representative set of applications and we have applied each technique to two major embedded processor architectures, namely ARM and MIPS. © 2010 ACM.

  3. Adaptive Code Division Multiple Access Protocol for Wireless Network-on-Chip Architectures

    Science.gov (United States)

    Vijayakumaran, Vineeth

    Massive levels of integration following Moore's Law ushered in a paradigm shift in the way on-chip interconnections are designed. With an ever higher number of cores on the same die, traditional bus-based interconnections are no longer a scalable communication infrastructure. On-chip networks were proposed to enable a scalable plug-and-play mechanism for interconnecting hundreds of cores on the same chip. Wired interconnects between the cores in a traditional Network-on-Chip (NoC) system become a bottleneck as the number of cores increases, raising the latency and energy needed to transmit signals over them. Hence, many alternative emerging interconnect technologies have been proposed, namely 3D, photonic and multi-band RF interconnects. Although they provide better connectivity, higher speed and higher bandwidth than wired interconnects, they also face challenges with heat dissipation and manufacturing difficulties. On-chip wireless interconnects are another proposed alternative, which needs no physical interconnection layout because data travels over the wireless medium. They are integrated into a hybrid NoC architecture consisting of both wired and wireless links, which provides higher bandwidth, lower latency, smaller area overhead and reduced energy dissipation in communication. However, as the bandwidth of the wireless channels is limited, an efficient media access control (MAC) scheme is required to enhance the utilization of the available bandwidth. This thesis proposes using a multiple access mechanism such as Code Division Multiple Access (CDMA) to enable multiple transmitter-receiver pairs to send data over the wireless channel simultaneously. It will be shown that such a hybrid wireless NoC with an efficient CDMA based MAC protocol can significantly increase the performance of the system while lowering the energy dissipation in data transfer. In this work it is shown that the wireless NoC with the proposed CDMA based MAC protocol

  4. A CNN-Specific Integrated Processor

    Directory of Open Access Journals (Sweden)

    Suleyman Malki

    2009-01-01

    Integrated Processors (IP) are algorithm-specific cores that, either by programming or by configuration, can be re-used within many microelectronic systems. This paper looks at how Cellular Neural Networks (CNN) can be realized as IP. First, current digital implementations are reviewed, and the memory-processor bandwidth issues are analyzed. Then a generic view is taken on the structure of the network, and a new intra-communication protocol based on rotating wheels is proposed. It is shown that this provides for guaranteed high performance with a minimal network interface. The resulting node is small and supports multi-level CNN designs, giving the system a 30-fold increase in capacity compared to classical designs. As it facilitates multiple operations on a single image, and single operations on multiple images, with minimal access to the external image memory, balancing the internal and external data transfer requirements optimizes the system operation. In conventional digital CNN designs, the treatment of boundary nodes requires additional logic to handle the CNN value propagation scheme. In the new architecture, only a slight modification of the existing cells is necessary to model the boundary effect. A typical prototype for visual pattern recognition will house 4096 CNN cells with a 2% overhead for making it an IP.

  5. Learning hardware using multiple-valued logic - Part 2: Cube calculus and architecture

    NARCIS (Netherlands)

    Perkowski, M.A.; Foote, D.; Chen, Qihong; Al-Rabadi, A.; Jozwiak, L.

    2002-01-01

    For Part 1 see ibid. vol.22, no.3 (2002). A massively parallel reconfigurable processor speeds up the logic operators performed in the learning hardware. The approach uses combinatorial synthesis methods developed within the framework of the logic synthesis approach in digital-circuit-design

  6. Embedded Processor Laboratory

    Data.gov (United States)

    Federal Laboratory Consortium — The Embedded Processor Laboratory provides the means to design, develop, fabricate, and test embedded computers for missile guidance electronics systems in support...

  7. Multithreading in vector processors

    Science.gov (United States)

    Evangelinos, Constantinos; Kim, Changhoan; Nair, Ravi

    2018-01-16

    In one embodiment, a system includes a processor having a vector processing mode and a multithreading mode. The processor is configured to operate on one thread per cycle in the multithreading mode. The processor includes a program counter register having a plurality of program counters, and the program counter register is vectorized. Each program counter in the program counter register represents a distinct corresponding thread of a plurality of threads. The processor is configured to execute the plurality of threads by activating the plurality of program counters in a round robin cycle.
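
    The round-robin activation of per-thread program counters can be modelled in a few lines. This is a software sketch under stated assumptions, not the claimed hardware: the instruction lists, the function name, and the rule that finished threads are skipped without consuming a cycle are all hypothetical.

```python
# Hypothetical model: one program counter per thread, one thread issued per cycle.

def run_round_robin(programs, max_cycles=1000):
    """Interleave threads by cycling through a vector of program counters.

    programs: list of instruction lists, one list per thread.
    Returns the issue trace as (cycle, thread_id, instruction) tuples.
    """
    pcs = [0] * len(programs)   # stands in for the vectorized PC register
    trace = []
    thread = 0
    cycle = 0
    while cycle < max_cycles and any(
            pc < len(prog) for pc, prog in zip(pcs, programs)):
        if pcs[thread] < len(programs[thread]):
            # Issue exactly one instruction from the active thread this cycle.
            trace.append((cycle, thread, programs[thread][pcs[thread]]))
            pcs[thread] += 1
            cycle += 1
        # Assumption: a finished thread is skipped without using a cycle.
        thread = (thread + 1) % len(programs)
    return trace

trace = run_round_robin([["a0", "a1", "a2"], ["b0"], ["c0", "c1"]])
issued = [instruction for _, _, instruction in trace]
```

    With the three example threads, instructions are issued in the order a0, b0, c0, a1, c1, a2, one per cycle.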

  8. Multi-processor network implementations in Multibus II and VME

    International Nuclear Information System (INIS)

    Briegel, C.

    1992-01-01

    ACNET (Fermilab Accelerator Controls Network), a proprietary network protocol, is implemented in a multi-processor configuration for both Multibus II and VME. The implementations are contrasted by the bus protocol and software design goals. The Multibus II implementation provides for multiple processors running a duplicate set of tasks on each processor. For a network connected task, messages are distributed by a network round-robin scheduler. Further, messages can be stopped, continued, or re-routed for each task by user-callable commands. The VME implementation provides for multiple processors running one task across all processors. The process can either be fixed to a particular processor or dynamically allocated to an available processor depending on the scheduling algorithm of the multi-processing operating system. (author)

  9. APRON: A Cellular Processor Array Simulation and Hardware Design Tool

    Science.gov (United States)

    Barr, David R. W.; Dudek, Piotr

    2009-12-01

    We present a software environment for the efficient simulation of cellular processor arrays (CPAs). This software (APRON) is used to explore algorithms that are designed for massively parallel fine-grained processor arrays, topographic multilayer neural networks, vision chips with SIMD processor arrays, and related architectures. The software uses a highly optimised core combined with a flexible compiler to provide the user with tools for the design of new processor array hardware architectures and the emulation of existing devices. We present performance benchmarks for the software processor array implemented on standard commodity microprocessors. APRON can be configured to use additional processing hardware if necessary and can be used as a complete graphical user interface and development environment for new or existing CPA systems, allowing more users to develop algorithms for CPA systems.

  10. APRON: A Cellular Processor Array Simulation and Hardware Design Tool

    Directory of Open Access Journals (Sweden)

    David R. W. Barr

    2009-01-01

    We present a software environment for the efficient simulation of cellular processor arrays (CPAs). This software (APRON) is used to explore algorithms that are designed for massively parallel fine-grained processor arrays, topographic multilayer neural networks, vision chips with SIMD processor arrays, and related architectures. The software uses a highly optimised core combined with a flexible compiler to provide the user with tools for the design of new processor array hardware architectures and the emulation of existing devices. We present performance benchmarks for the software processor array implemented on standard commodity microprocessors. APRON can be configured to use additional processing hardware if necessary and can be used as a complete graphical user interface and development environment for new or existing CPA systems, allowing more users to develop algorithms for CPA systems.

  11. Multiprocessor architecture: Synthesis and evaluation

    Science.gov (United States)

    Standley, Hilda M.

    1990-01-01

    Multiprocessor computer architecture evaluation for structural computations is the focus of the research effort described. Results obtained are expected to lead to more efficient use of existing architectures and to suggest designs for new, application specific, architectures. The brief descriptions given outline a number of related efforts directed toward this purpose. The difficulty in analyzing an existing architecture or in designing a new computer architecture lies in the fact that the performance of a particular architecture, within the context of a given application, is determined by a number of factors. These include, but are not limited to, the efficiency of the computation algorithm, the programming language and support environment, the quality of the program written in the programming language, the multiplicity of the processing elements, the characteristics of the individual processing elements, the interconnection network connecting processors and non-local memories, and the shared memory organization covering the spectrum from no shared memory (all local memory) to one global access memory. These performance determiners may be loosely classified as being software or hardware related. This distinction is not clear or even appropriate in many cases. The effect of the choice of algorithm is ignored by assuming that the algorithm is specified as given. Effort directed toward the removal of the effect of the programming language and program resulted in the design of a high-level parallel programming language. Two characteristics of the fundamental structure of the architecture (memory organization and interconnection network) are examined.

  12. Unravelling Darwin's entangled bank: architecture and robustness of mutualistic networks with multiple interaction types.

    Science.gov (United States)

    Dáttilo, Wesley; Lara-Rodríguez, Nubia; Jordano, Pedro; Guimarães, Paulo R; Thompson, John N; Marquis, Robert J; Medeiros, Lucas P; Ortiz-Pulido, Raul; Marcos-García, Maria A; Rico-Gray, Victor

    2016-11-30

    Trying to unravel Darwin's entangled bank further, we describe the architecture of a network involving multiple forms of mutualism (pollination by animals, seed dispersal by birds and plant protection by ants) and evaluate whether this multi-network shows evidence of a structure that promotes robustness. We found that species differed strongly in their contributions to the organization of the multi-interaction network, and that only a few species contributed to the structuring of these patterns. Moreover, we observed that the multi-interaction networks did not enhance community robustness compared with each of the three independent mutualistic networks when analysed across a range of simulated scenarios of species extinction. By simulating the removal of highly interacting species, we observed that, overall, these species enhance network nestedness and robustness, but decrease modularity. We discuss how the organization of interlinked mutualistic networks may be essential for the maintenance of ecological communities, and therefore the long-term ecological and evolutionary dynamics of interactive, species-rich communities. We suggest that conserving these keystone mutualists and their interactions is crucial to the persistence of species-rich mutualistic assemblages, mainly because they support other species and shape the network organization. © 2016 The Author(s).

  13. Keystone Business Models for Network Security Processors

    OpenAIRE

    Arthur Low; Steven Muegge

    2013-01-01

    Network security processors are critical components of high-performance systems built for cybersecurity. Development of a network security processor requires multi-domain experience in semiconductors and complex software security applications, and multiple iterations of both software and hardware implementations. Limited by the business models in use today, such an arduous task can be undertaken only by large incumbent companies and government organizations. Neither the “fabless semiconductor...

  14. Software-defined reconfigurable microwave photonics processor.

    Science.gov (United States)

    Pérez, Daniel; Gasulla, Ivana; Capmany, José

    2015-06-01

    We propose, for the first time to our knowledge, a software-defined reconfigurable microwave photonics signal processor architecture that can be integrated on a chip and is capable of performing all the main functionalities by suitable programming of its control signals. The basic configuration is presented and a thorough end-to-end design model derived that accounts for the performance of the overall processor taking into consideration the impact and interdependencies of both its photonic and RF parts. We demonstrate the model versatility by applying it to several relevant application examples.

  15. Effect of processor temperature on film dosimetry

    International Nuclear Information System (INIS)

    Srivastava, Shiv P.; Das, Indra J.

    2012-01-01

    Optical density (OD) of a radiographic film plays an important role in radiation dosimetry, which depends on various parameters, including beam energy, depth, field size, film batch, dose, dose rate, air-film interface, postexposure processing time, and temperature of the processor. Most of these parameters have been studied for Kodak XV and extended dose range (EDR) films used in radiation oncology. There is very limited information on processor temperature, which is investigated in this study. Multiple XV and EDR films were exposed in the reference condition (d_max, 10 × 10 cm², 100 cm) to a given dose. An automatic film processor (X-Omat 5000) was used for processing films. The temperature of the processor was adjusted manually with increasing temperature. At each temperature, a set of films was processed to evaluate OD at a given dose. For both films, OD is a linear function of processor temperature in the range of 29.4–40.6°C (85–105°F) for various dose ranges. The changes in processor temperature are directly related to the dose by a quadratic function. A simple linear equation is provided for the changes in OD vs. processor temperature, which could be used for correcting dose in radiation dosimetry when film is used.

  16. Optical Associative Processors For Visual Perception

    Science.gov (United States)

    Casasent, David; Telfer, Brian

    1988-05-01

    We consider various associative processor modifications required to allow these systems to be used for visual perception, scene analysis, and object recognition. For these applications, decisions on the class of the objects present in the input image are required and thus heteroassociative memories are necessary (rather than the autoassociative memories that have been given most attention). We analyze the performance of both associative processors and note that there is considerable difference between heteroassociative and autoassociative memories. We describe associative processors suitable for realizing functions such as: distortion invariance (using linear discriminant function memory synthesis techniques), noise and image processing performance (using autoassociative memories in cascade with a heteroassociative processor and with a finite number of autoassociative memory iterations employed), shift invariance (achieved through the use of associative processors operating on feature space data), and the analysis of multiple objects in high noise (which is achieved using associative processing of the output from symbolic correlators). We detail and provide initial demonstrations of the use of associative processors operating on iconic, feature space and symbolic data, as well as adaptive associative processors.

  17. Distributed Processor/Memory Architectures Design Program

    Science.gov (United States)

    1975-02-01

    ...its assigned ID, short descriptor in English, size, production rate, producer, and all consumers. In addition, a communication link matrix describing

  18. Optimized GF(2^k) ONB type I multiplier architecture based on the Massey-Omura multiplication pattern

    International Nuclear Information System (INIS)

    Fournaris, A P; Koufopavlou, O

    2005-01-01

    Multiplication in GF(2^k) finite fields is rapidly becoming a very promising solution for fast, small, efficient binary algorithms designed for hardware applications. GF(2^k) finite fields defined over optimal normal bases (ONB) can be very advantageous in terms of gate count and multiplication time delay. Many ONB multiplier designs have been proposed that use the Massey-Omura multiplication pattern. In this paper, a method for designing type I optimal normal basis multipliers and an ONB type I multiplier hardware architecture are proposed that, through parallelism and pairing categorization of the ONB multiplication table matrix, achieve very interesting results in terms of gate count and multiplication time delay.

  19. Matrix-vector multiplication using digital partitioning for more accurate optical computing

    Science.gov (United States)

    Gary, C. K.

    1992-01-01

    Digital partitioning offers a flexible means of increasing the accuracy of an optical matrix-vector processor. This algorithm can be implemented with the same architecture required for a purely analog processor, which gives optical matrix-vector processors the ability to perform high-accuracy calculations at speeds comparable with or greater than electronic computers as well as the ability to perform analog operations at a much greater speed. Digital partitioning is compared with digital multiplication by analog convolution, residue number systems, and redundant number representation in terms of the size and the speed required for an equivalent throughput as well as in terms of the hardware requirements. Digital partitioning and digital multiplication by analog convolution are found to be the most efficient algorithms if coding time and hardware are considered, and the architecture for digital partitioning permits the use of analog computations to provide the greatest throughput for a single processor.
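
    One way to picture digital partitioning is as a split of every operand into short digits, so that each partial matrix-vector product needs only low-precision operands and the exact result is recovered by shift-and-add. The sketch below follows that reading; the digit width, the helper names, and the restriction to non-negative integers are assumptions made for illustration and may differ from the scheme in the paper.

```python
# Sketch under assumptions: operands are non-negative integers that fit in
# bits * count bits; names and digit width are illustrative only.

def split_digits(value, bits, count):
    """Split an integer into `count` digits of `bits` bits, least significant first."""
    mask = (1 << bits) - 1
    return [(value >> (bits * k)) & mask for k in range(count)]

def partitioned_matvec(matrix, vector, bits=4, count=2):
    """Compute matrix * vector from partial products of short digits."""
    rows, cols = len(matrix), len(vector)
    m_parts = [[split_digits(a, bits, count) for a in row] for row in matrix]
    v_parts = [split_digits(x, bits, count) for x in vector]
    result = [0] * rows
    for p in range(count):        # digit plane of the matrix
        for q in range(count):    # digit plane of the vector
            for i in range(rows):
                # Each partial product uses only `bits`-bit operands, the kind
                # of limited-accuracy operation an analog stage could perform.
                partial = sum(m_parts[i][j][p] * v_parts[j][q]
                              for j in range(cols))
                # Digital recombination: weight by the digit positions.
                result[i] += partial << (bits * (p + q))
    return result

A = [[200, 17], [3, 255]]
x = [129, 64]
y = partitioned_matvec(A, x)
direct = [sum(a * b for a, b in zip(row, x)) for row in A]
```

    Here `y` matches the directly computed product `direct`, although no single partial product ever multiplies operands wider than four bits.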

  20. Control structures for high speed processors

    Science.gov (United States)

    Maki, G. K.; Mankin, R.; Owsley, P. A.; Kim, G. M.

    1982-01-01

    A special processor was designed to function as a Reed Solomon decoder with throughput data rate in the MHz range. This data rate is significantly greater than is possible with conventional digital architectures. To achieve this rate, the processor design includes sequential, pipelined, distributed, and parallel processing. The processor was designed using a high-level register transfer language (RTL). The RTL can be used to describe how the different processes are implemented by the hardware. One problem of special interest was the development of dependent processes which are analogous to software subroutines. For greater flexibility, the RTL control structure was implemented in ROM. The special purpose hardware required approximately 1000 SSI and MSI components. The data rate throughput is 2.5 megabits/second. This data rate is achieved through the use of pipelined and distributed processing. This data rate can be compared with 800 kilobits/second in a recently proposed very large scale integration design of a Reed Solomon encoder.

  1. Real time processor for array speckle interferometry

    Science.gov (United States)

    Chin, Gordon; Florez, Jose; Borelli, Renan; Fong, Wai; Miko, Joseph; Trujillo, Carlos

    1989-02-01

    The authors are constructing a real-time processor to acquire image frames, perform array flat-fielding, execute a 64 x 64 element two-dimensional complex FFT (fast Fourier transform) and average the power spectrum, all within the 25 ms coherence time for speckles at near-IR (infrared) wavelength. The processor will be a compact unit controlled by a PC with real-time display and data storage capability. This will provide the ability to optimize observations and obtain results on the telescope rather than waiting several weeks before the data can be analyzed and viewed with offline methods. The image acquisition and processing, design criteria, and processor architecture are described.
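
    The processing chain named in the abstract (flat-fielding, two-dimensional complex FFT, power-spectrum averaging) can be sketched in software. The version below is a pure-Python illustration on a small frame, assuming flat-fielding is a pixel-wise division by a flat-field frame and that the frame size is a power of two; it says nothing about how the hardware performs these steps.

```python
import cmath

def fft(values):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def fft2(frame):
    """2D FFT: transform every row, then every column."""
    rows = [fft(row) for row in frame]
    cols = [fft(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def mean_power_spectrum(frames, flat):
    """Flat-field each frame (assumed: divide by `flat`), then average |FFT|^2."""
    n = len(flat)
    acc = [[0.0] * n for _ in range(n)]
    for frame in frames:
        corrected = [[p / f for p, f in zip(frow, flrow)]
                     for frow, flrow in zip(frame, flat)]
        spectrum = fft2(corrected)
        for u in range(n):
            for v in range(n):
                acc[u][v] += abs(spectrum[u][v]) ** 2
    return [[a / len(frames) for a in row] for row in acc]

N = 4  # the instrument uses 64 x 64; a small size keeps the example quick
flat = [[2.0] * N for _ in range(N)]
frames = [[[2.0] * N for _ in range(N)] for _ in range(3)]
power = mean_power_spectrum(frames, flat)
```

    For these uniform test frames all the power lands in the zero-frequency bin `power[0][0]`, which is a convenient sanity check.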

  2. Integrated fuel processor development

    International Nuclear Information System (INIS)

    Ahmed, S.; Pereira, C.; Lee, S. H. D.; Krumpelt, M.

    2001-01-01

    The Department of Energy's Office of Advanced Automotive Technologies has been supporting the development of fuel-flexible fuel processors at Argonne National Laboratory. These fuel processors will enable fuel cell vehicles to operate on fuels available through the existing infrastructure. The constraints of on-board space and weight require that these fuel processors be designed to be compact and lightweight, while meeting the performance targets for efficiency and gas quality needed for the fuel cell. This paper discusses the performance of a prototype fuel processor that has been designed and fabricated to operate with liquid fuels, such as gasoline, ethanol, methanol, etc. Rated for a capacity of 10 kWe (one-fifth of that needed for a car), the prototype fuel processor integrates the unit operations (vaporization, heat exchange, etc.) and processes (reforming, water-gas shift, preferential oxidation reactions, etc.) necessary to produce the hydrogen-rich gas (reformate) that will fuel the polymer electrolyte fuel cell stacks. The fuel processor work is being complemented by analytical and fundamental research. With the ultimate objective of meeting on-board fuel processor goals, these studies include: modeling fuel cell systems to identify design and operating features; evaluating alternative fuel processing options; and developing appropriate catalysts and materials. Issues and outstanding challenges that need to be overcome in order to develop practical, on-board devices are discussed

  3. Adaptive Optics Simulation for the World's Largest Telescope on Multicore Architectures with Multiple GPUs

    KAUST Repository

    Ltaief, Hatem; Gratadour, Damien; Charara, Ali; Gendron, Eric

    2016-01-01

    We present a high performance comprehensive implementation of a multi-object adaptive optics (MOAO) simulation on multicore architectures with hardware accelerators in the context of computational astronomy. This implementation will be used

  4. Logistic Fuel Processor Development

    National Research Council Canada - National Science Library

    Salavani, Reza

    2004-01-01

    ... to light gases then steam reform the light gases into hydrogen rich stream. This report documents the efforts in developing a fuel processor capable of providing hydrogen to a 3kW fuel cell stack...

  5. 3081/E processor

    International Nuclear Information System (INIS)

    Kunz, P.F.; Gravina, M.; Oxoby, G.

    1984-04-01

    The 3081/E project was formed to prepare a much improved IBM mainframe emulator for the future. Its design is based on a large amount of experience in using the 168/E processor to increase available CPU power in both online and offline environments. The processor will be at least equal to the execution speed of a 370/168 and up to 1.5 times faster for heavy floating point code. A single processor will thus be at least four times more powerful than the VAX 11/780, and five processors on a system would equal at least the performance of the IBM 3081K. With its large memory space and simple but flexible high speed interface, the 3081/E is well suited for the online and offline needs of high energy physics in the future

  6. Logistic Fuel Processor Development

    National Research Council Canada - National Science Library

    Salavani, Reza

    2004-01-01

    The Air Base Technologies Division of the Air Force Research Laboratory has developed a logistic fuel processor that removes the sulfur content of the fuel and in the process converts logistic fuel...

  7. Adaptive signal processor

    Energy Technology Data Exchange (ETDEWEB)

    Walz, H.V.

    1980-07-01

    An experimental, general purpose adaptive signal processor system has been developed, utilizing a quantized (clipped) version of the Widrow-Hoff least-mean-square adaptive algorithm developed by Moschner. The system accommodates 64 adaptive weight channels with 8-bit resolution for each weight. Internal weight update arithmetic is performed with 16-bit resolution, and the system error signal is measured with 12-bit resolution. An adapt cycle of adjusting all 64 weight channels is accomplished in 8 μsec. Hardware of the signal processor utilizes primarily Schottky-TTL type integrated circuits. A prototype system with 24 weight channels has been constructed and tested. This report presents details of the system design and describes basic experiments performed with the prototype signal processor. Finally some system configurations and applications for this adaptive signal processor are discussed.
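
    A clipped least-mean-square update can be sketched as follows. This is a floating-point illustration that assumes "clipped" means the input samples are replaced by their signs in the weight update; the step size, the two-weight test system, and the absence of the 8-, 12- and 16-bit quantization described above are simplifications, not properties of the reported hardware.

```python
import random

def sign(v):
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)

def clipped_lms(samples, desired, n_weights, mu=0.05):
    """Widrow-Hoff LMS with the input clipped to its sign in the update.

    samples: list of input vectors; desired: list of target outputs.
    Returns the final weights and the error history.
    """
    weights = [0.0] * n_weights
    errors = []
    for x, d in zip(samples, desired):
        y = sum(w * xi for w, xi in zip(weights, x))   # filter output
        e = d - y                                      # error signal
        # Clipped update: only the sign of each input sample is used.
        weights = [w + mu * e * sign(xi) for w, xi in zip(weights, x)]
        errors.append(e)
    return weights, errors

rng = random.Random(0)
target = [0.5, -0.25]   # hypothetical system to identify
samples = [[rng.uniform(-1, 1) for _ in target] for _ in range(3000)]
desired = [sum(t * x for t, x in zip(target, s)) for s in samples]
weights, errors = clipped_lms(samples, desired, len(target))
```

    Using only the sign of the input removes a multiplication from every weight update, which is one reason such variants suit simple hardware; in this noise-free example the weights settle close to `target`.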

  8. Adaptive signal processor

    International Nuclear Information System (INIS)

    Walz, H.V.

    1980-07-01

    An experimental, general purpose adaptive signal processor system has been developed, utilizing a quantized (clipped) version of the Widrow-Hoff least-mean-square adaptive algorithm developed by Moschner. The system accommodates 64 adaptive weight channels with 8-bit resolution for each weight. Internal weight update arithmetic is performed with 16-bit resolution, and the system error signal is measured with 12-bit resolution. An adapt cycle of adjusting all 64 weight channels is accomplished in 8 μsec. Hardware of the signal processor utilizes primarily Schottky-TTL type integrated circuits. A prototype system with 24 weight channels has been constructed and tested. This report presents details of the system design and describes basic experiments performed with the prototype signal processor. Finally some system configurations and applications for this adaptive signal processor are discussed

  9. Quantum chemistry on a superconducting quantum processor

    Energy Technology Data Exchange (ETDEWEB)

    Kaicher, Michael P.; Wilhelm, Frank K. [Theoretical Physics, Saarland University, 66123 Saarbruecken (Germany); Love, Peter J. [Department of Physics and Astronomy, Tufts University, Medford, MA 02155 (United States)

    2016-07-01

    Quantum chemistry is the most promising civilian application for quantum processors to date. We study its adaptation to superconducting (sc) quantum systems, computing the ground state energy of LiH through a variational hybrid quantum classical algorithm. We demonstrate how interactions native to sc qubits further reduce the amount of quantum resources needed, pushing sc architectures as a near-term candidate for simulations of more complex atoms/molecules.

  10. Does NASA's Constellation Architecture Offer Opportunities to Achieve Multiple Additional Goals in Space?

    Science.gov (United States)

    Thronson, Harley; Lester, Daniel F.

    2008-01-01

    Every major NASA human spaceflight program in the last four decades has been modified to achieve goals in space not incorporated within the original design goals: the Apollo Applications Program, Skylab, Space Shuttle, and International Space Station. Several groups in the US have been identifying major future science goals, the science facilities necessary to investigate them, as well as possible roles for augmented versions of elements of NASA's Constellation program. Specifically, teams in the astronomy community have been developing concepts for very capable missions to follow the James Webb Space Telescope that could take advantage of - or require - free-space operations by astronauts and/or robots. Take as one example the Single-Aperture Far-InfraRed (SAFIR) telescope, with an approx. 10+ m aperture, proposed for operation in the 2020 timeframe. According to current NASA plans, the Ares V launch vehicle (or a variant) will be available about the same time, as will the capability to transport astronauts to the vicinity of the Moon via the Orion Crew Exploration Vehicle and associated systems. [As the lunar surface offers no advantages - and major disadvantages - for most major optical systems, the expensive system for landing and operating on the lunar surface is not required.] Although as currently conceived, SAFIR and other astronomical missions will operate at the Sun-Earth L2 location, it appears trivial to travel for servicing to the more accessible Earth-Moon L1,2 locations. Moreover, as the recent Orbital Express and Automated Transfer Vehicle missions have demonstrated, future robotic capabilities should offer capabilities that would (remotely) extend human presence far beyond the vicinity of the Earth. In addition to multiplying the value of NASA's architecture for future human spaceflight to achieve the goals of multiple major stakeholders, if humans one day travel beyond the Earth-Moon system - say, to Mars - technologies and capabilities for operating

  11. Functional unit for a processor

    NARCIS (Netherlands)

    Rohani, A.; Kerkhoff, Hans G.

    2013-01-01

    The invention relates to a functional unit for a processor, such as a Very Long Instruction Word (VLIW) processor. The invention further relates to a processor comprising at least one such functional unit. The invention further relates to a functional unit and processor capable of mitigating the effect of

  12. Joint Segmentation of Multiple Thoracic Organs in CT Images with Two Collaborative Deep Architectures.

    Science.gov (United States)

    Trullo, Roger; Petitjean, Caroline; Nie, Dong; Shen, Dinggang; Ruan, Su

    2017-09-01

    Computed Tomography (CT) is the standard imaging technique for radiotherapy planning. The delineation of Organs at Risk (OAR) in thoracic CT images is a necessary step before radiotherapy, for preventing irradiation of healthy organs. However, due to low contrast, multi-organ segmentation is a challenge. In this paper, we focus on developing a novel framework for automatic delineation of OARs. Different from previous works in OAR segmentation where each organ is segmented separately, we propose two collaborative deep architectures to jointly segment all organs, including esophagus, heart, aorta and trachea. Since most of the organ borders are ill-defined, we believe spatial relationships must be taken into account to overcome the lack of contrast. The aim of combining two networks is to learn anatomical constraints with the first network, which will be used in the second network, when each OAR is segmented in turn. Specifically, we use the first deep architecture, a deep SharpMask architecture, for providing an effective combination of low-level representations with deep high-level features, and then take into account the spatial relationships between organs by the use of Conditional Random Fields (CRF). Next, the second deep architecture is employed to refine the segmentation of each organ by using the maps obtained on the first deep architecture to learn anatomical constraints for guiding and refining the segmentations. Experimental results show superior performance on 30 CT scans, compared with other state-of-the-art methods.

  13. System Level Design of Reconfigurable Server Farms Using Elliptic Curve Cryptography Processor Engines

    Directory of Open Access Journals (Sweden)

    Sangook Moon

    2014-01-01

    As today’s hardware architecture becomes more and more complicated, it is getting harder to modify or improve the microarchitecture of a design in register transfer level (RTL). Consequently, traditional methods we have used to develop a design are not capable of coping with complex designs. In this paper, we suggest a way of designing complex digital logic circuits with a soft and advanced type of SystemVerilog at an electronic system level. We apply the concept of design-and-reuse with a high level of abstraction to implement elliptic curve crypto-processor server farms. With the concept of the superior level of abstraction to the RTL used with the traditional HDL design, we successfully achieved the soft implementation of the crypto-processor server farms as well as robust test bench code with trivial effort in the same simulation environment. Otherwise, it could have required error-prone Verilog simulations for the hardware IPs and other time-consuming jobs such as C/SystemC verification for the software, sacrificing more time and effort. In the design of the elliptic curve cryptography processor engine, we propose a 3X faster GF(2^m) serial multiplication architecture.

  14. SET: Session Layer-Assisted Efficient TCP Management Architecture for 6LoWPAN with Multiple Gateways

    Directory of Open Access Journals (Sweden)

    Akbar AliHammad

    2010-01-01

    6LoWPAN (IPv6 based Low-Power Personal Area Network) is a protocol specification that facilitates communication of IPv6 packets on top of IEEE 802.15.4 so that Internet and wireless sensor networks can be inter-connected. This interconnection is especially required in commercial and enterprise applications of sensor networks where reliable and timely data transfers such as multiple code updates are needed from Internet nodes to sensor nodes. For this type of inbound traffic which is mostly bulk, TCP as transport layer protocol is essential, resulting in end-to-end TCP session through a default gateway. In this scenario, a single gateway tends to become the bottleneck because of non-uniform connectivity to all the sensor nodes besides being vulnerable to buffer overflow. We propose SET, a management architecture for multiple split-TCP sessions across a number of serving gateways. SET implements striping and multiple TCP session management through a shim at session layer. Through analytical modeling and ns2 simulations, we show that our proposed architecture optimizes communication for ingress bulk data transfer while providing associated load balancing services. We conclude that multiple split-TCP sessions managed in parallel across a number of gateways result in reduced latency for bulk data transfer and provide robustness against gateway failures.

  15. Positron emission tomographic images and expectation maximization: A VLSI architecture for multiple iterations per second

    International Nuclear Information System (INIS)

    Jones, W.F.; Byars, L.G.; Casey, M.E.

    1988-01-01

    A digital electronic architecture for parallel processing of the expectation maximization (EM) algorithm for Positron Emission tomography (PET) image reconstruction is proposed. Rapid (0.2 second) EM iterations on high resolution (256 x 256) images are supported. Arrays of two very large scale integration (VLSI) chips perform forward and back projection calculations. A description of the architecture is given, including data flow and partitioning relevant to EM and parallel processing. EM images shown are produced with software simulating the proposed hardware reconstruction algorithm. Projected cost of the system is estimated to be small in comparison to the cost of current PET scanners
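
    The forward and back projection steps that dominate each EM iteration can be seen in a minimal software sketch of the standard maximum-likelihood EM update. The tiny 4-pixel image, the 5-bin system matrix, and the function name below are illustrative assumptions; they are unrelated to the 256 x 256 images or the VLSI partitioning described in the abstract.

```python
# Sketch of one standard ML-EM update; the geometry below is a toy assumption.

def mlem_iteration(image, system, counts):
    """One EM update: forward project, compare with the data, back project.

    image:  current activity estimate, one value per pixel
    system: system[i][j] = contribution of pixel j to detector bin i
    counts: measured counts per detector bin
    """
    n_bins, n_pix = len(system), len(image)
    # Forward projection: counts expected from the current estimate.
    expected = [sum(system[i][j] * image[j] for j in range(n_pix))
                for i in range(n_bins)]
    ratios = [counts[i] / expected[i] if expected[i] > 0 else 0.0
              for i in range(n_bins)]
    # Back projection of the measured/expected ratios.
    back = [sum(system[i][j] * ratios[i] for i in range(n_bins))
            for j in range(n_pix)]
    # Sensitivity: total weight with which each pixel is seen.
    sens = [sum(system[i][j] for i in range(n_bins)) for j in range(n_pix)]
    return [image[j] * back[j] / sens[j] if sens[j] > 0 else 0.0
            for j in range(n_pix)]

# Toy 2 x 2 image viewed along rows, columns and one diagonal.
system = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
]
truth = [1.0, 2.0, 3.0, 4.0]
counts = [sum(a * t for a, t in zip(row, truth)) for row in system]

image = [1.0] * 4
for _ in range(50):
    image = mlem_iteration(image, system, counts)
```

    Each iteration is one forward projection followed by one back projection over the whole system matrix, which is why dedicating hardware to those two operations is attractive.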

  16. The UA1 upgrade calorimeter trigger processor

    International Nuclear Information System (INIS)

    Bains, M.; Charleton, D.; Ellis, N.; Garvey, J.; Gregory, J.; Jimack, M.P.; Jovanovic, P.; Kenyon, I.R.; Baird, S.A.; Campbell, D.; Cawthraw, M.; Coughlan, J.; Flynn, P.; Galagedera, S.; Grayer, G.; Halsall, R.; Shah, T.P.; Stephens, R.; Biddulph, P.; Eisenhandler, E.; Fensome, I.F.; Landon, M.; Robinson, D.; Oliver, J.; Sumorok, K.

    1990-01-01

    The increased luminosity of the improved CERN Collider and the more subtle signals of second-generation collider physics demand increasingly sophisticated triggering. We have built a new first-level trigger processor designed to use the excellent granularity of the UA1 upgrade calorimeter. This device is entirely digital and handles events in 1.5 μs, thus introducing no dead time. Its most novel feature is fast two-dimensional electromagnetic cluster-finding with the possibility of demanding an isolated shower of limited penetration. The processor allows multiple combinations of triggers on electromagnetic showers, hadronic jets and energy sums, including a total-energy veto of multiple interactions and a full vector sum of missing transverse energy. This hard-wired processor is about five times more powerful than its predecessor, and makes extensive use of pipelining techniques. It was used extensively in the 1988 and 1989 runs of the CERN Collider. (orig.)

  17. The UA1 upgrade calorimeter trigger processor

    International Nuclear Information System (INIS)

    Bains, N.; Baird, S.A.; Biddulph, P.

    1990-01-01

    The increased luminosity of the improved CERN Collider and the more subtle signals of second-generation collider physics demand increasingly sophisticated triggering. We have built a new first-level trigger processor designed to use the excellent granularity of the UA1 upgrade calorimeter. This device is entirely digital and handles events in 1.5 μs, thus introducing no deadtime. Its most novel feature is fast two-dimensional electromagnetic cluster-finding with the possibility of demanding an isolated shower of limited penetration. The processor allows multiple combinations of triggers on electromagnetic showers, hadronic jets and energy sums, including a total-energy veto of multiple interactions and a full vector sum of missing transverse energy. This hard-wired processor is about five times more powerful than its predecessor, and makes extensive use of pipelining techniques. It was used extensively in the 1988 and 1989 runs of the CERN Collider. (author)

  18. 16-Bit RISC Processor Design for Convolution Application

    OpenAIRE

    Anand Nandakumar Shardul

    2013-01-01

    In this project, we propose a 16-bit non-pipelined RISC processor for signal processing applications. The processor consists of the following blocks: program counter, clock control unit, ALU, IDU and registers. Advantageous architectural modifications have been made in the incrementer circuit used in the program counter and in the carry select adder unit of the ALU in the RISC CPU core. Furthermore, a high-speed, low-power modified multiplier has been designed and introduced in ...

  19. Description and Simulation of a Fast Packet Switch Architecture for Communication Satellites

    Science.gov (United States)

    Quintana, Jorge A.; Lizanich, Paul J.

    1995-01-01

    The NASA Lewis Research Center has been developing the architecture for a multichannel communications signal processing satellite (MCSPS) as part of a flexible, low-cost meshed-VSAT (very small aperture terminal) network. The MCSPS architecture is based on a multifrequency, time-division-multiple-access (MF-TDMA) uplink and a time-division multiplex (TDM) downlink. There are eight uplink MF-TDMA beams, and eight downlink TDM beams, with eight downlink dwells per beam. The information-switching processor, which decodes, stores, and transmits each packet of user data to the appropriate downlink dwell onboard the satellite, has been fully described by using VHSIC (Very High Speed Integrated-Circuit) Hardware Description Language (VHDL). This VHDL code, which was developed in-house to simulate the information switching processor, showed that the architecture is both feasible and viable. This paper describes a shared-memory-per-beam architecture, its VHDL implementation, and the simulation efforts.

  20. A Multi-Objective Compounded Local Mobile Cloud Architecture Using Priority Queues to Process Multiple Jobs.

    Science.gov (United States)

    Wei, Xiaohui; Sun, Bingyi; Cui, Jiaxu; Xu, Gaochao

    2016-01-01

    As a result of the greatly increased use of mobile devices, the disadvantages of portable devices have gradually begun to emerge. To solve these problems, the use of mobile cloud computing assisted by cloud data centers has been proposed. However, cloud data centers are always very far from the mobile requesters. In this paper, we propose an improved multi-objective local mobile cloud model: Compounded Local Mobile Cloud Architecture with Dynamic Priority Queues (LMCpri). This new architecture can briefly store jobs that arrive simultaneously at the cloudlet in different priority positions according to the result of auction processing, and then execute partitioned tasks on capable helpers. In the Scheduling Module, NSGA-II is employed as the scheduling algorithm to shorten processing time and decrease requester cost relative to PSO and sequential scheduling. The simulation results show that setting the number of iterations to 30 is the best choice for the system. In addition, compared with LMCque, LMCpri is able to effectively accommodate a requester who would like his job to be executed in advance and to shorten execution time. Finally, we conduct a comparative experiment between LMCpri and a cloud-assisted architecture, and the results reveal that LMCpri performs better than the cloud-assisted architecture.
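
    The idea of placing simultaneously arriving jobs in priority positions according to an auction result can be sketched with a binary heap. This is an assumption-laden illustration, not the LMCpri implementation: the class `JobQueue`, the use of a single numeric `bid` as the auction outcome, and the first-come tie-break are all hypothetical.

```python
import heapq
import itertools

class JobQueue:
    """Hypothetical priority queue: higher bid runs first, ties keep arrival order."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # monotonically increasing tie-breaker

    def submit(self, job_id, bid):
        # heapq is a min-heap, so the bid is negated to pop the highest bid first.
        heapq.heappush(self._heap, (-bid, next(self._arrival), job_id))

    def next_job(self):
        _, _, job_id = heapq.heappop(self._heap)
        return job_id

    def __len__(self):
        return len(self._heap)

def drain(jobs):
    """Submit (job_id, bid) pairs, then return job ids in execution order."""
    queue = JobQueue()
    for job_id, bid in jobs:
        queue.submit(job_id, bid)
    return [queue.next_job() for _ in range(len(queue))]

order = drain([("a", 1.0), ("b", 3.0), ("c", 3.0), ("d", 2.0)])
```

    A requester who wants a job executed earlier would, under this assumption, simply win a higher position through a larger bid.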

  1. A Multi-Objective Compounded Local Mobile Cloud Architecture Using Priority Queues to Process Multiple Jobs.

    Directory of Open Access Journals (Sweden)

    Xiaohui Wei

    As a result of the greatly increased use of mobile devices, the disadvantages of portable devices have gradually begun to emerge. To solve these problems, the use of mobile cloud computing assisted by cloud data centers has been proposed. However, cloud data centers are always very far from the mobile requesters. In this paper, we propose an improved multi-objective local mobile cloud model: Compounded Local Mobile Cloud Architecture with Dynamic Priority Queues (LMCpri). This new architecture can briefly store jobs that arrive simultaneously at the cloudlet in different priority positions according to the result of auction processing, and then execute partitioned tasks on capable helpers. In the Scheduling Module, NSGA-II is employed as the scheduling algorithm to shorten processing time and decrease requester cost relative to PSO and sequential scheduling. The simulation results show that setting the number of iterations to 30 is the best choice for the system. In addition, compared with LMCque, LMCpri is able to effectively accommodate a requester who would like his job to be executed in advance and to shorten execution time. Finally, we conduct a comparative experiment between LMCpri and a cloud-assisted architecture, and the results reveal that LMCpri performs better than the cloud-assisted architecture.

  2. Nonlinear Wave Simulation on the Xeon Phi Knights Landing Processor

    Science.gov (United States)

    Hristov, Ivan; Goranov, Goran; Hristova, Radoslava

    2018-02-01

    We consider a standing wave simulation that is interesting from a computational point of view, obtained by solving coupled 2D perturbed Sine-Gordon equations. We make an OpenMP realization which exploits both thread and SIMD levels of parallelism. We test the OpenMP program on two different energy-equivalent Intel architectures: 2× Xeon E5-2695 v2 processors (code-named "Ivy Bridge-EP") in the Hybrilit cluster, and a Xeon Phi 7250 processor (code-named "Knights Landing", KNL). The results show 2 times better performance on the KNL processor.

  3. Nonlinear Wave Simulation on the Xeon Phi Knights Landing Processor

    Directory of Open Access Journals (Sweden)

    Hristov Ivan

    2018-01-01

    We consider a standing wave simulation that is interesting from a computational point of view, obtained by solving coupled 2D perturbed Sine-Gordon equations. We make an OpenMP realization which exploits both thread and SIMD levels of parallelism. We test the OpenMP program on two different energy-equivalent Intel architectures: 2× Xeon E5-2695 v2 processors (code-named “Ivy Bridge-EP”) in the Hybrilit cluster, and a Xeon Phi 7250 processor (code-named “Knights Landing”, KNL). The results show 2 times better performance on the KNL processor.

  4. Token-Aware Completion Functions for Elastic Processor Verification

    Directory of Open Access Journals (Sweden)

    Sudarshan K. Srinivasan

    2009-01-01

    We develop a formal verification procedure to check that elastic pipelined processor designs correctly implement their instruction set architecture (ISA) specifications. The notion of correctness we use is based on refinement. Refinement proofs are based on refinement maps, which, in the context of this problem, are functions that map elastic processor states to states of the ISA specification model. Data flow in elastic architectures is complicated by the insertion of any number of buffers in any place in the design, making it hard to construct refinement maps for elastic systems in a systematic manner. We introduce token-aware completion functions, which incorporate a mechanism to track the flow of data in elastic pipelines, as a highly automated and systematic approach to construct refinement maps. We demonstrate the efficiency of the overall verification procedure based on token-aware completion functions using six elastic pipelined processor models based on the DLX architecture.

  5. Recursive Matrix Inverse Update On An Optical Processor

    Science.gov (United States)

    Casasent, David P.; Baranoski, Edward J.

    1988-02-01

    A high accuracy optical linear algebraic processor (OLAP) using the digital multiplication by analog convolution (DMAC) algorithm is described for use in an efficient matrix inverse update algorithm with speed and accuracy advantages. The solution of the parameters in the algorithm is addressed and the advantages of optical over digital linear algebraic processors are advanced.

  6. 3081/E processor

    International Nuclear Information System (INIS)

    Kunz, P.F.; Gravina, M.; Oxoby, G.; Trang, Q.; Fucci, A.; Jacobs, D.; Martin, B.; Storr, K.

    1983-03-01

    Since the introduction of the 168/E, emulating processors have been successful over an amazingly wide range of applications. This paper will describe a second generation processor, the 3081/E. This new processor, which is being developed as a collaboration between SLAC and CERN, goes beyond just fixing the obvious faults of the 168/E. Not only will the 3081/E have much more memory space, incorporate many more IBM instructions, and have full double precision floating point arithmetic, but it will also have faster execution times and be much simpler to build, debug, and maintain. The simple interface and reasonable cost of the 168/E will be maintained for the 3081/E.

  7. Accelerating molecular dynamic simulation on the cell processor and Playstation 3.

    Science.gov (United States)

    Luttmann, Edgar; Ensign, Daniel L; Vaidyanathan, Vishal; Houston, Mike; Rimon, Noam; Øland, Jeppe; Jayachandran, Guha; Friedrichs, Mark; Pande, Vijay S

    2009-01-30

    Implementation of molecular dynamics (MD) calculations on novel architectures will vastly increase its power to calculate the physical properties of complex systems. Herein, we detail algorithmic advances developed to accelerate MD simulations on the Cell processor, a commodity processor found in PlayStation 3 (PS3). In particular, we discuss issues regarding memory access versus computation and the types of calculations which are best suited for streaming processors such as the Cell, focusing on implicit solvation models. We conclude with a comparison of improved performance on the PS3's Cell processor over more traditional processors. (c) 2008 Wiley Periodicals, Inc.

  8. Optimizing the performance of streaming numerical kernels on the IBM Blue Gene/P PowerPC 450 processor

    KAUST Repository

    Malas, Tareq Majed Yasin; Ahmadia, Aron; Brown, Jed; Gunnels, John A.; Keyes, David E.

    2012-01-01

    Several emerging petascale architectures use energy-efficient processors with vectorized computational units and in-order thread processing. On these architectures the sustained performance of streaming numerical kernels, ubiquitous in the solution

  9. The prefrontal landscape: implications of functional architecture for understanding human mentation and the central executive.

    Science.gov (United States)

    Goldman-Rakic, P S

    1996-10-29

    The functional architecture of prefrontal cortex is central to our understanding of human mentation and cognitive prowess. This region of the brain is often treated as an undifferentiated structure, on the one hand, or as a mosaic of psychological faculties, on the other. This paper focuses on the working memory processor as a specialization of prefrontal cortex and argues that the different areas within prefrontal cortex represent iterations of this function for different information domains, including spatial cognition, object cognition and additionally, in humans, semantic processing. According to this parallel processing architecture, the 'central executive' could be considered an emergent property of multiple domain-specific processors operating interactively. These processors are specializations of different prefrontal cortical areas, each interconnected both with the domain-relevant long-term storage sites in posterior regions of the cortex and with appropriate output pathways.

  10. Launching applications on compute and service processors running under different operating systems in scalable network of processor boards with routers

    Science.gov (United States)

    Tomkins, James L. (Albuquerque, NM); Camp, William J. (Albuquerque, NM)

    2009-03-17

    A multiple processor computing apparatus includes a physical interconnect structure that is flexibly configurable to support selective segregation of classified and unclassified users. The physical interconnect structure also permits easy physical scalability of the computing apparatus. The computing apparatus can include an emulator which permits applications from the same job to be launched on processors that use different operating systems.

  11. A NEW OS ARCHITECTURE FOR IOT

    Directory of Open Access Journals (Sweden)

    Jean Y. Astier

    2018-03-01

    Current computer operating system architectures are not well suited for the coming world of connected objects, known as the Internet of Things (IoT), for multiple reasons: poor communication performance in both point-to-point and broadcast cases, poor operational reliability and network security, and excessive requirements in terms of both processor power and memory size, leading to excessive electrical power consumption. We introduce a new computer operating system architecture well adapted to IoT devices, from the most modest to the most complex, and more generally able to significantly raise the input/output capacities of any communicating computer. This architecture rests on the principles of the Von Neumann hardware model, and is composed of two types of asymmetric distributed containers, which communicate by message passing. We describe the sub-systems of both of these types of containers, where each sub-system has its own scheduler and a dedicated execution level.

  12. VLSI Architectures for the Multiplication of Integers Modulo a Fermat Number

    Science.gov (United States)

    Chang, J. J.; Truong, T. K.; Reed, I. S.; Hsu, I. S.

    1984-01-01

    Multiplication is central in the implementation of Fermat number transforms and other residue number algorithms. There is a need for a good multiplication algorithm that can be realized easily on a very large scale integration (VLSI) chip. The Leibowitz multiplier is modified to realize multiplication in the ring of integers modulo a Fermat number. This new algorithm requires only a sequence of cyclic shifts and additions. The designs developed for this new multiplier are regular, simple, expandable, and, therefore, suitable for VLSI implementation.
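
    Why shifts and additions suffice can be seen from the identity 2^n ≡ -1 (mod 2^n + 1): a bit shifted past position n wraps around with its sign flipped. The sketch below is a plain shift-and-add multiplier built on that identity. It is not the modified Leibowitz multiplier of the abstract, and the function names and the parameter `t` (with n = 2^t) are hypothetical.

```python
def double_mod_fermat(x, n):
    """Return 2*x mod (2**n + 1) for 0 <= x <= 2**n using a shift and one add/subtract."""
    modulus = (1 << n) + 1
    shifted = x << 1
    high = shifted >> n                 # bits that overflowed past position n
    low = shifted & ((1 << n) - 1)      # bits that stayed in range
    result = low - high                 # 2**n is congruent to -1, so overflow is subtracted
    return result + modulus if result < 0 else result

def mul_mod_fermat(a, b, t):
    """Multiply a*b modulo the Fermat number F_t = 2**(2**t) + 1 by shift-and-add."""
    n = 1 << t
    modulus = (1 << n) + 1
    if not (0 <= a < modulus and 0 <= b < modulus):
        raise ValueError("operands must already be reduced modulo F_t")
    acc, term = 0, a
    while b:
        if b & 1:
            acc += term
            if acc >= modulus:          # acc and term are both < modulus
                acc -= modulus
        term = double_mod_fermat(term, n)
        b >>= 1
    return acc

product = mul_mod_fermat(12345, 54321, 4)   # arithmetic modulo F_4 = 65537
```

    The loop uses no general division or multiplication, only shifts, masks, additions, and subtractions, which is the property that makes this style of arithmetic attractive for regular hardware.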

  13. Code compression for VLIW embedded processors

    Science.gov (United States)

    Piccinelli, Emiliano; Sannino, Roberto

    2004-04-01

    The implementation of processors for embedded systems involves various issues: the main constraints are cost, power dissipation, and die area. On the other hand, new terminals perform functions that require more computational flexibility and effort. Long code streams must be loaded into memories, which are expensive and power consuming, to run on DSPs or CPUs. To overcome this issue, the "SlimCode" proprietary algorithm presented in this paper (patent pending technology) can reduce the size of the program memory. It can run offline and work directly on the binary code the compiler generates, compressing it and creating a new binary file, about 40% smaller than the original one, to be loaded into the program memory of the processor. The decompression unit will be a small ASIC, placed between the Memory Controller and the System bus of the processor, leaving the internal CPU architecture unchanged: this implies that the methodology is completely transparent to the core. We present comparisons against the state-of-the-art IBM Codepack algorithm, along with its architectural implementation in the ST200 VLIW family core.

  14. The Central Trigger Processor (CTP)

    CERN Multimedia

    Franchini, Matteo

    2016-01-01

    The Central Trigger Processor (CTP) receives trigger information from the calorimeter and muon trigger processors, as well as from other sources of trigger. It makes the Level-1 decision (L1A) based on a trigger menu.

  15. Self-Organizing Maps on the Cell Broadband Engine Architecture

    International Nuclear Information System (INIS)

    McConnell, Sabine M

    2010-01-01

    We present and evaluate novel parallel implementations of Self-Organizing Maps for the Cell Broadband Engine Architecture. Motivated by the interactive nature of the data-mining process, we evaluate the scalability of the implementations on two clusters using different network characteristics and incarnations (PS3™ console and PowerXCell 8i) of the architecture. Our implementations use varying combinations of the Power Processing Elements (PPEs) and Synergistic Processing Elements (SPEs) found in the Cell architecture. For a single processor, our implementation scaled well with the number of SPEs regardless of the incarnation. When combining multiple PS3™ consoles, the synchronization over the slower network resulted in poor speedups and demonstrated that the use of such a low-cost cluster may be severely restricted, even without the use of SPEs. When using multiple SPEs for the PowerXCell 8i cluster, the speedup grew linearly with increasing number of SPEs for a given number of processors, and linearly up to a maximum with the number of processors for a given number of SPEs. Our implementation achieved a worst-case efficiency of 67% for the maximum number of processing elements involved in the computation, but consistently higher values for smaller numbers of processing elements with speedups of up to 70.

  16. Computer Architecture A Quantitative Approach

    CERN Document Server

    Hennessy, John L

    2007-01-01

    The era of seemingly unlimited growth in processor performance is over: single chip architectures can no longer overcome the performance limitations imposed by the power they consume and the heat they generate. Today, Intel and other semiconductor firms are abandoning the single fast processor model in favor of multi-core microprocessors--chips that combine two or more processors in a single package. In the fourth edition of Computer Architecture, the authors focus on this historic shift, increasing their coverage of multiprocessors and exploring the most effective ways of achieving parallelis

  17. Evolution of the Florida Launch Site Architecture: Embracing Multiple Customers, Enhancing Launch Opportunities

    Science.gov (United States)

    Colloredo, Scott; Gray, James A.

    2011-01-01

    The impending conclusion of the Space Shuttle Program and the Constellation Program cancellation unveiled in the FY2011 President's budget created a large void for human spaceflight capability and specifically launch activity from the Florida Launch Site (FLS). This void created an opportunity to re-architect the launch site to be more accommodating to the future NASA heavy lift and commercial space industry. The goal is to evolve the heritage capabilities into a more affordable and flexible launch complex. This case study will discuss the FLS architecture evolution from the trade studies to select primary launch site locations for future customers, to improving infrastructure; promoting environmental remediation/compliance; improving offline processing, manufacturing, & recovery; developing range interface and control services with the US Air Force, and developing modernization efforts for the Launch Pad, Vehicle Assembly Building, Mobile Launcher, and supporting infrastructure. The architecture studies will steer how to best invest limited modernization funding from initiatives like the 21 st elSe and other potential funding.

  18. Genetic architecture of carbon isotope composition and growth in Eucalyptus across multiple environments.

    Science.gov (United States)

    Bartholomé, Jérôme; Mabiala, André; Savelli, Bruno; Bert, Didier; Brendel, Oliver; Plomion, Christophe; Gion, Jean-Marc

    2015-06-01

    In the context of climate change, the water-use efficiency (WUE) of highly productive tree varieties, such as eucalypts, has become a major issue for breeding programmes. This study set out to dissect the genetic architecture of carbon isotope composition (δ¹³C), a proxy of WUE, across several environments. A family of Eucalyptus urophylla × E. grandis was planted in three trials and phenotyped for δ¹³C and growth traits. High-resolution genetic maps enabled us to target genomic regions underlying δ¹³C quantitative trait loci (QTLs) on the E. grandis genome. Of the 15 QTLs identified for δ¹³C, nine were stable across the environments and three displayed significant QTL-by-environment interaction, suggesting medium to high genetic determinism for this trait. Only one colocalization was found between growth and δ¹³C. Gene ontology (GO) term enrichment analysis suggested candidate genes related to foliar δ¹³C, including two involved in the regulation of stomatal movements. This study provides the first report of the genetic architecture of δ¹³C and its relation to growth in Eucalyptus. The low correlations found between the two traits at phenotypic and genetic levels suggest the possibility of improving the WUE of Eucalyptus varieties without having an impact on breeding for growth. © 2015 CIRAD. New Phytologist © 2015 New Phytologist Trust.

  19. Adaptive Optics Simulation for the World's Largest Telescope on Multicore Architectures with Multiple GPUs

    KAUST Repository

    Ltaief, Hatem

    2016-06-02

    We present a high performance comprehensive implementation of a multi-object adaptive optics (MOAO) simulation on multicore architectures with hardware accelerators in the context of computational astronomy. This implementation will be used as an operational testbed for simulating the design of new instruments for the European Extremely Large Telescope project (E-ELT), the world's biggest eye and one of Europe's highest priorities in ground-based astronomy. The simulation corresponds to a multi-step multi-stage procedure, which is fed, near real-time, by system and turbulence data coming from the telescope environment. Based on the PLASMA library powered by the OmpSs dynamic runtime system, our implementation relies on a task-based programming model to permit an asynchronous out-of-order execution. Using modern multicore architectures associated with the enormous computing power of GPUs, the resulting data-driven compute-intensive simulation of the entire MOAO application, composed of the tomographic reconstructor and the observing sequence, is capable of coping with the aforementioned real-time challenge and stands as a reference implementation for the computational astronomy community.

  20. Bulk-memory processor for data acquisition

    International Nuclear Information System (INIS)

    Nelson, R.O.; McMillan, D.E.; Sunier, J.W.; Meier, M.; Poore, R.V.

    1981-01-01

    To meet the diverse needs and data rate requirements at the Van de Graaff and Weapons Neutron Research (WNR) facilities, a bulk memory system has been implemented which includes a fast and flexible processor. This bulk memory processor (BMP) utilizes bit slice and microcode techniques and features a 24-bit-wide internal architecture allowing direct addressing of up to 16 megawords of memory and histogramming up to 16 million counts per channel without overflow. The BMP is interfaced to the MOSTEK MK 8000 bulk memory system and to the standard MODCOMP computer I/O bus. Coding for the BMP both at the microcode level and with macro instructions is supported. The generalized data acquisition system has been extended to support the BMP in a manner transparent to the user.

  1. Parallel processor programs in the Federal Government

    Science.gov (United States)

    Schneck, P. B.; Austin, D.; Squires, S. L.; Lehmann, J.; Mizell, D.; Wallgren, K.

    1985-01-01

    In 1982, a report dealing with the nation's research needs in high-speed computing called for increased access to supercomputing resources for the research community, research in computational mathematics, and increased research in the technology base needed for the next generation of supercomputers. Since that time a number of programs addressing future generations of computers, particularly parallel processors, have been started by U.S. government agencies. The present paper provides a description of the largest government programs in parallel processing. Established in fiscal year 1985 by the Institute for Defense Analyses for the National Security Agency, the Supercomputing Research Center will pursue research to advance the state of the art in supercomputing. Attention is also given to the DOE applied mathematical sciences research program, the NYU Ultracomputer project, the DARPA multiprocessor system architectures program, NSF research on multiprocessor systems, ONR activities in parallel computing, and NASA parallel processor projects.

  2. Multiple Resource Host Architecture (MRHA) for the Mobile Detection Assessment Response System (MDARS) Revision A

    National Research Council Canada - National Science Library

    Everett, H

    2000-01-01

    The Mobile Detection Assessment and Response System (MDARS) program employs multiple robotic security platforms operating under the high level control of a remote host, with the direct supervision of a human operator...

  3. MULTI-CORE AND OPTICAL PROCESSOR RELATED APPLICATIONS RESEARCH AT OAK RIDGE NATIONAL LABORATORY

    Energy Technology Data Exchange (ETDEWEB)

    Barhen, Jacob [ORNL; Kerekes, Ryan A [ORNL; ST Charles, Jesse Lee [ORNL; Buckner, Mark A [ORNL

    2008-01-01

    performs the matrix-vector multiplications, where the nominal matrix size is 256x256. The system clock is 125 MHz. At each clock cycle, 128K multiply-and-add operations are carried out, which yields a peak performance of 16 Tera operations per second (TeraOPS). IBM Cell Broadband Engine. The Cell processor is the product of 5 years of sustained, intensive R&D collaboration (involving over $400M investment) between IBM, Sony, and Toshiba. Its architecture comprises one multithreaded 64-bit PowerPC processor element (PPE) with VMX capabilities and two levels of globally coherent cache, and 8 synergistic processor elements (SPEs). Each SPE consists of a processor (SPU) designed for streaming workloads, local memory, and a globally coherent direct memory access (DMA) engine. Computations are performed in 128-bit wide single instruction multiple data streams (SIMD). An integrated high-bandwidth element interconnect bus (EIB) connects the nine processors and their ports to external memory and to system I/O. The Applied Software Engineering Research (ASER) Group at ORNL is applying the Cell to a variety of text and image analysis applications. Research on Cell-equipped PlayStation3 (PS3) consoles has led to the development of a correlation-based image recognition engine that enables a single PS3 to process images at more than 10X the speed of state-of-the-art single-core processors. NVIDIA Graphics Processing Units. The ASER group is also employing the latest NVIDIA graphical processing units (GPUs) to accelerate clustering of thousands of text documents using recently developed clustering algorithms such as document flocking and affinity propagation.
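
    The 16 TeraOPS figure quoted for the optical matrix-vector processor can be checked with a few lines of arithmetic. The sketch assumes that one 256x256 matrix-vector product completes per clock cycle and that each multiply-accumulate counts as two operations; both are interpretations of the abstract rather than statements taken from it.

```python
MATRIX_SIZE = 256          # nominal matrix dimension from the abstract
CLOCK_HZ = 125e6           # 125 MHz system clock from the abstract

macs_per_cycle = MATRIX_SIZE * MATRIX_SIZE   # one multiply-accumulate per matrix entry
ops_per_cycle = 2 * macs_per_cycle           # assumption: multiply and add counted separately
peak_ops_per_second = ops_per_cycle * CLOCK_HZ

print(f"{ops_per_cycle // 1024}K operations per cycle")
print(f"{peak_ops_per_second / 1e12:.3f} TeraOPS peak")
```

    Under these assumptions the per-cycle count is 131,072 (128K) and the peak rate is about 16.4 × 10^12 operations per second, consistent with the rounded figure in the abstract.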

  4. MULTI-CORE AND OPTICAL PROCESSOR RELATED APPLICATIONS RESEARCH AT OAK RIDGE NATIONAL LABORATORY

    International Nuclear Information System (INIS)

    Barhen, Jacob; Kerekes, Ryan A.; St Charles, Jesse Lee; Buckner, Mark A.

    2008-01-01

    performs the matrix-vector multiplications, where the nominal matrix size is 256x256. The system clock is 125 MHz. At each clock cycle, 128K multiply-and-add operations are carried out, which yields a peak performance of 16 Tera operations per second (TeraOPS). IBM Cell Broadband Engine. The Cell processor is the product of 5 years of sustained, intensive R&D collaboration (involving over $400M investment) between IBM, Sony, and Toshiba. Its architecture comprises one multithreaded 64-bit PowerPC processor element (PPE) with VMX capabilities and two levels of globally coherent cache, and 8 synergistic processor elements (SPEs). Each SPE consists of a processor (SPU) designed for streaming workloads, local memory, and a globally coherent direct memory access (DMA) engine. Computations are performed in 128-bit wide single instruction multiple data streams (SIMD). An integrated high-bandwidth element interconnect bus (EIB) connects the nine processors and their ports to external memory and to system I/O. The Applied Software Engineering Research (ASER) Group at ORNL is applying the Cell to a variety of text and image analysis applications. Research on Cell-equipped PlayStation3 (PS3) consoles has led to the development of a correlation-based image recognition engine that enables a single PS3 to process images at more than 10X the speed of state-of-the-art single-core processors. NVIDIA Graphics Processing Units. The ASER group is also employing the latest NVIDIA graphical processing units (GPUs) to accelerate clustering of thousands of text documents using recently developed clustering algorithms such as document flocking and affinity propagation.

  5. ARCHITECTURE AND DYNAMICS OF KEPLER'S CANDIDATE MULTIPLE TRANSITING PLANET SYSTEMS

    Energy Technology Data Exchange (ETDEWEB)

    Lissauer, Jack J.; Jenkins, Jon M.; Borucki, William J.; Bryson, Stephen T.; Howell, Steve B. [NASA Ames Research Center, Moffett Field, CA 94035 (United States); Ragozzine, Darin; Holman, Matthew J.; Carter, Joshua A. [Harvard-Smithsonian Center for Astrophysics, Cambridge, MA 02138 (United States); Fabrycky, Daniel C.; Fortney, Jonathan J. [Department of Astronomy and Astrophysics, University of California, Santa Cruz, CA 95064 (United States); Steffen, Jason H. [Fermilab Center for Particle Astrophysics, Batavia, IL 60510 (United States); Ford, Eric B. [211 Bryant Space Science Center, University of Florida, Gainesville, FL 32611 (United States); Shporer, Avi [Las Cumbres Observatory Global Telescope Network, Santa Barbara, CA 93117 (United States); Rowe, Jason F.; Quintana, Elisa V.; Caldwell, Douglas A. [SETI Institute/NASA Ames Research Center, Moffett Field, CA 94035 (United States); Batalha, Natalie M. [Department of Physics and Astronomy, San Jose State University, San Jose, CA 95192 (United States); Ciardi, David [Exoplanet Science Institute/Caltech, Pasadena, CA 91125 (United States); Dunham, Edward W. [Lowell Observatory, Flagstaff, AZ 86001 (United States); Gautier, Thomas N. III, E-mail: Jack.Lissauer@nasa.gov [Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA 91109 (United States); and others

    2011-11-01

    About one-third of the approximately 1200 transiting planet candidates detected in the first four months of Kepler data are members of multiple candidate systems. There are 115 target stars with two candidate transiting planets, 45 with three, 8 with four, and 1 each with five and six. We characterize the dynamical properties of these candidate multi-planet systems. The distribution of observed period ratios shows that the vast majority of candidate pairs are neither in nor near low-order mean-motion resonances. Nonetheless, there are small but statistically significant excesses of candidate pairs both in resonance and spaced slightly too far apart to be in resonance, particularly near the 2:1 resonance. We find that virtually all candidate systems are stable, as tested by numerical integrations that assume a nominal mass-radius relationship. Several considerations strongly suggest that the vast majority of these multi-candidate systems are true planetary systems. Using the observed multiplicity frequencies, we find that a single population of planetary systems that matches the higher multiplicities underpredicts the number of singly transiting systems. We provide constraints on the true multiplicity and mutual inclination distribution of the multi-candidate systems, revealing a population of systems with multiple super-Earth-size and Neptune-size planets with low to moderate mutual inclinations.
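
    The "about one-third" statement can be cross-checked against the per-multiplicity counts given in the abstract. The short tally below is an illustrative check only; the total of 1200 is the approximate figure from the abstract, and the variable names are ours.

```python
# Number of target stars, keyed by how many candidate transiting planets each has.
systems_by_multiplicity = {2: 115, 3: 45, 4: 8, 5: 1, 6: 1}

multi_systems = sum(systems_by_multiplicity.values())
candidates_in_multis = sum(n * count for n, count in systems_by_multiplicity.items())

TOTAL_CANDIDATES = 1200  # approximate total quoted in the abstract
fraction = candidates_in_multis / TOTAL_CANDIDATES

print(multi_systems, "multi-candidate systems")
print(candidates_in_multis, "candidates in those systems")
print(round(fraction, 2), "of all candidates")
```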

  6. Time-Predictable Computer Architecture

    Directory of Open Access Journals (Sweden)

    Schoeberl Martin

    2009-01-01

    Full Text Available Today's general-purpose processors are optimized for maximum throughput. Real-time systems need a processor with both a reasonable and a known worst-case execution time (WCET). Features such as pipelines with instruction dependencies, caches, branch prediction, and out-of-order execution complicate WCET analysis and lead to very conservative estimates. In this paper, we evaluate the issues of current architectures with respect to WCET analysis. Then, we propose solutions for a time-predictable computer architecture. The proposed architecture is evaluated with implementation of some features in a Java processor. The resulting processor is a good target for WCET analysis and still performs well in the average case.

  7. Performance of Artificial Intelligence Workloads on the Intel Core 2 Duo Series Desktop Processors

    Directory of Open Access Journals (Sweden)

    Abdul Kareem PARCHUR

    2010-12-01

    Full Text Available As processor architectures became more advanced, Intel introduced its Intel Core 2 Duo series processors. The performance impact on Intel Core 2 Duo processors is analyzed using SPEC CPU INT 2006 performance numbers. This paper studied the behavior of Artificial Intelligence (AI) benchmarks on Intel Core 2 Duo series processors. Moreover, we estimated the task completion time (TCT) at Intel Core 2 Duo series processor frequencies of 1 GHz, 2 GHz and 3 GHz. Our results show the performance scalability in Intel Core 2 Duo series processors. Even though AI benchmarks have similar execution times, they have dissimilar characteristics, which are identified using principal component analysis and dendrograms. As the processor frequency increased from 1.8 GHz to 3.167 GHz, the execution time decreased by ~370 sec for AI workloads. In the case of Physics/Quantum Computing programs it was ~940 sec.

  8. Multimode power processor

    Science.gov (United States)

    O'Sullivan, George A.; O'Sullivan, Joseph A.

    1999-01-01

    In one embodiment, a power processor operates in three modes: an inverter mode wherein power is delivered from a battery to an AC power grid or load; a battery charger mode wherein the battery is charged by a generator; and a parallel mode wherein the generator supplies power to the AC power grid or load in parallel with the battery. In the parallel mode, the system adapts to arbitrary non-linear loads. The power processor may operate on a per-phase basis wherein the load may be synthetically transferred from one phase to another by way of a bumpless transfer which causes no interruption of power to the load when transferring energy sources. Voltage transients and frequency transients delivered to the load when switching between the generator and battery sources are minimized, thereby providing an uninterruptible power supply. The power processor may be used as part of a hybrid electrical power source system which may contain, in one embodiment, a photovoltaic array, diesel engine, and battery power sources.

  9. Complex matrix multiplication operations with data pre-conditioning in a high performance computing architecture

    Science.gov (United States)

    Eichenberger, Alexandre E; Gschwind, Michael K; Gunnels, John A

    2014-02-11

    Mechanisms for performing a complex matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the complex matrix multiplication operation to a first target vector register. The first vector operand comprises a real and imaginary part of a first complex vector value. A complex load and splat operation is performed to load a second complex vector value of a second vector operand and replicate the second complex vector value within a second target vector register. The second complex vector value has a real and imaginary part. A cross multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the complex matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored in a result vector register.
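
    The load, load-and-splat, and cross-multiply-add steps described above can be modelled in software. The sketch below is not the patented mechanism: it assumes a register is a Python list of interleaved (real, imaginary) values, and every function name is hypothetical. It only shows how the partial products accumulate into one column-by-column matrix-vector product.

```python
# Illustrative model only: "registers" are Python lists holding interleaved
# (real, imaginary) pairs. All helper names are hypothetical.

def vector_load(memory, offset, lanes):
    """Load `lanes` consecutive floats into a register."""
    return memory[offset:offset + lanes]

def load_and_splat(memory, offset, lanes):
    """Load one complex value (re, im) and replicate it across the register."""
    re, im = memory[offset], memory[offset + 1]
    return [re, im] * (lanes // 2)

def cross_multiply_add(acc, a, b):
    """Add the pairwise complex products a[k] * b[k] to the accumulator."""
    out = list(acc)
    for k in range(0, len(a), 2):
        a_re, a_im = a[k], a[k + 1]
        b_re, b_im = b[k], b[k + 1]
        out[k] += a_re * b_re - a_im * b_im      # real part
        out[k + 1] += a_re * b_im + a_im * b_re  # imaginary part
    return out

def complex_matvec_column_major(a_cols, x, rows):
    """Compute y = A x, with A supplied column by column in interleaved layout."""
    lanes = 2 * rows
    acc = [0.0] * lanes
    for j, col in enumerate(a_cols):
        a_reg = vector_load(col, 0, lanes)        # first vector operand
        x_reg = load_and_splat(x, 2 * j, lanes)   # replicated complex scalar
        acc = cross_multiply_add(acc, a_reg, x_reg)  # accumulate partial product
    return acc

# A = [[1+2j, 3+4j], [5+6j, 7+8j]], x = [1+1j, 2-1j]
columns = [[1.0, 2.0, 5.0, 6.0], [3.0, 4.0, 7.0, 8.0]]
vector = [1.0, 1.0, 2.0, -1.0]
print(complex_matvec_column_major(columns, vector, rows=2))  # [9.0, 8.0, 21.0, 20.0]
```

    Replicating the second operand lets every lane pair reuse the same complex scalar, so one pass over a column of the first operand yields a complete partial product.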

  10. Eigensolution of finite element problems in a completely connected parallel architecture

    Science.gov (United States)

    Akl, Fred A.; Morel, Michael R.

    1989-01-01

    A parallel algorithm for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi)=(M)(phi)(omega), where (K) and (M) are of order N, and (omega) is of order q, is presented. The parallel algorithm is based on a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm has been successfully implemented on a tightly coupled multiple-instruction-multiple-data (MIMD) parallel processing computer, Cray X-MP. A finite element model is divided into m domains, each of which is assumed to process n elements. Each domain is then assigned to a processor, or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macro-tasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effects of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. For a 64-element rectangular plate, speed-ups of 1.86, 3.13, 3.18 and 3.61 are achieved on two, four, six and eight processors, respectively.
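
    The reported speed-ups can be turned into parallel efficiencies. The snippet below assumes the usual definition, efficiency = speed-up / number of processors; the abstract does not state which definition the authors used.

```python
# Speed-ups reported for the 64-element rectangular plate, keyed by processor count.
speedups = {2: 1.86, 4: 3.13, 6: 3.18, 8: 3.61}

# Assumed definition: efficiency = speed-up divided by the number of processors.
efficiency = {p: s / p for p, s in speedups.items()}

for p, e in efficiency.items():
    print(f"{p} processors: speed-up {speedups[p]:.2f}, efficiency {e:.2f}")
```

    Under this definition the efficiency falls from about 0.93 on two processors to about 0.45 on eight.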

  11. Very wide register : an asymmetric register file organization for low power embedded processors.

    NARCIS (Netherlands)

    Raghavan, P.; Lambrechts, A.; Jayapala, M.; Catthoor, F.; Verkest, D.T.M.L.; Corporaal, H.

    2007-01-01

    In current embedded systems processors, multi-ported register files are one of the most power hungry parts of the processor, even when they are clustered. This paper presents a novel register file architecture, which has single ported cells and asymmetric interfaces to the memory and to the

  12. Space and frequency-multiplexed optical linear algebra processor - Fabrication and initial tests

    Science.gov (United States)

    Casasent, D.; Jackson, J.

    1986-01-01

    A new optical linear algebra processor architecture is described. Space and frequency-multiplexing are used to accommodate bipolar and complex-valued data. A fabricated laboratory version of this processor is described, the electronic support system used is discussed, and initial test data obtained on it are presented.

  13. Multiple-image authentication with a cascaded multilevel architecture based on amplitude field random sampling and phase information multiplexing.

    Science.gov (United States)

    Fan, Desheng; Meng, Xiangfeng; Wang, Yurong; Yang, Xiulun; Pan, Xuemei; Peng, Xiang; He, Wenqi; Dong, Guoyan; Chen, Hongyi

    2015-04-10

    A multiple-image authentication method with a cascaded multilevel architecture in the Fresnel domain is proposed, in which a synthetic encoded complex amplitude is first fabricated, and its real amplitude component is generated by iterative amplitude encoding, random sampling, and space multiplexing for the low-level certification images, while the phase component of the synthetic encoded complex amplitude is constructed by iterative phase information encoding and multiplexing for the high-level certification images. Then the synthetic encoded complex amplitude is iteratively encoded into two phase-type ciphertexts located in two different transform planes. During high-level authentication, when the two phase-type ciphertexts and the high-level decryption key are presented to the system and then the Fresnel transform is carried out, a meaningful image with good quality and a high correlation coefficient with the original certification image can be recovered in the output plane. Similar to the procedure of high-level authentication, in the case of low-level authentication with the aid of a low-level decryption key, no significant or meaningful information is retrieved, but it can result in a remarkable peak output in the nonlinear correlation coefficient of the output image and the corresponding original certification image. Therefore, the method realizes different levels of accessibility to the original certification image for different authority levels with the same cascaded multilevel architecture.

  14. A Model of Distraction using new Architectural Mechanisms to Manage Multiple Goals

    NARCIS (Netherlands)

    Taatgen, Niels; Katidioti, Ioanna; Borst, Jelmer; van Vugt, Marieke; Taatgen, Niels; van Vugt, Marieke; Borst, Jelmer; Mehlhorn, Katja

    2015-01-01

    Cognitive models assume a one-to-one correspondence between task and goals. We argue that modeling a task by combining multiple goals has several advantages: a task can be constructed from components that are reused from other tasks, and it enables modeling thought processes that compete with or

  15. 3D-Flow processor for a programmable Level-1 trigger (feasibility study)

    International Nuclear Information System (INIS)

    Crosetto, D.

    1992-10-01

    A feasibility study has been made to use the 3D-Flow processor in a pipelined programmable parallel processing architecture to identify particles such as electrons, jets, muons, etc., in high-energy physics experiments.

  16. Analysis of the computational requirements of a pulse-doppler radar signal processor

    CSIR Research Space (South Africa)

    Broich, R

    2012-05-01

    Full Text Available In an attempt to find an optimal processing architecture for radar signal processing applications, the different algorithms that are typically used in a pulse-Doppler radar signal processor are investigated. Radar algorithms are broken down...

  17. Onboard spectral imager data processor

    Science.gov (United States)

    Otten, Leonard J.; Meigs, Andrew D.; Franklin, Abraham J.; Sears, Robert D.; Robison, Mark W.; Rafert, J. Bruce; Fronterhouse, Donald C.; Grotbeck, Ronald L.

    1999-10-01

    Previous papers have described the concept behind the MightySat II.1 program, the satellite's Fourier Transform imaging spectrometer's optical design, the design for the spectral imaging payload, and its initial qualification testing. This paper discusses the on board data processing designed to reduce the amount of downloaded data by an order of magnitude and provide a demonstration of a smart spaceborne spectral imaging sensor. Two custom components, a spectral imager interface 6U VME card that moves data at over 30 MByte/sec, and four TI C-40 processors mounted to a second 6U VME and daughter card, are used to adapt the sensor to the spacecraft and provide the necessary high speed processing. A system architecture that offers both on board real time image processing and high-speed post data collection analysis of the spectral data has been developed. In addition to the on board processing of the raw data into a usable spectral data volume, one feature extraction technique has been incorporated. This algorithm operates on the basic interferometric data. The algorithm is integrated within the data compression process to search for uploadable feature descriptions.

  18. Application of Advanced Multi-Core Processor Technologies to Oceanographic Research

    Science.gov (United States)

    2013-09-30

    1 DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited. Application of Advanced Multi-Core Processor Technologies...STM32 NXP LPC series No Proprietary Microchip PIC32/DSPIC No > 500 mW; < 5 W ARM Cortex TI OMAP TI Sitara Broadcom BCM2835 Varies FPGA...state-of-the-art information processing architectures. OBJECTIVES Next-generation processor architectures (multi-core, multi-threaded) hold the

  19. Transitions in Land Use Architecture under Multiple Human Driving Forces in a Semi-Arid Zone

    Directory of Open Access Journals (Sweden)

    Issa Ouedraogo

    2015-07-01

    Full Text Available The present study aimed to detect the main shifts in land-use architecture and assess the factors behind the changes in typical tropical semi-arid land in Burkina Faso. Three sets of time-series LANDSAT data over a 23-year period were used to detect land use changes and their underpinning drivers in multifunctional but vulnerable ecologies. Group discussions in selected villages were organized for mapping output interpretation and collection of essential drivers of change as perceived by local populations. Results revealed profound changes and transitions during the study period. During the last decade, shrub and wood savannahs exhibited high net changes (39% and −37%, respectively) with a weak net positive change for cropland (only 2%), while cropland and shrub savannah exhibited high swap (8% and 16%). This suggests that the area of cropland remained almost unchanged but was subject to relocation, wood savannah decreased drastically, and shrub savannah increased exponentially. Cropland exhibited a null net persistence while shrub and wood savannahs exhibited positive and negative net persistence (1.91 and −10.24, respectively), indicating that there is movement toward agricultural intensification and wood savannah tended to disappear to the benefit of shrub savannah. Local people are aware of the changes that have occurred and support the idea that illegal wood cutting and farming are inappropriate farming practices associated with immigration; absence of alternative cash generation sources, overgrazing and increasing demand for wood energy are driving the changes in their ecosystems. Policies that integrate restoration and conservation of natural ecosystems and promote sustainable agroforestry practices in the study zone are highly recommended.

  20. OpenCL code generation for low energy wide SIMD architectures with explicit datapath.

    NARCIS (Netherlands)

    She, D.; He, Y.; Waeijen, L.J.W.; Corporaal, H.; Jeschke, H.; Silvén, O.

    2013-01-01

    Energy efficiency is one of the most important aspects in designing embedded processors. The use of a wide SIMD processor architecture is a promising approach to build energy-efficient high performance embedded processors. In this paper, we propose a configurable wide SIMD architecture that utilizes

  1. Evaluation of the Intel Sandy Bridge-EP server processor

    CERN Document Server

    Jarp, S; Leduc, J; Nowak, A; CERN. Geneva. IT Department

    2012-01-01

    In this paper we report on a set of benchmark results recently obtained by CERN openlab when comparing an 8-core “Sandy Bridge-EP” processor with Intel’s previous microarchitecture, the “Westmere-EP”. The Intel marketing names for these processors are “Xeon E5-2600 processor series” and “Xeon 5600 processor series”, respectively. Both processors are produced in a 32nm process, and both platforms are dual-socket servers. Multiple benchmarks were used to get a good understanding of the performance of the new processor. We used both industry-standard benchmarks, such as SPEC2006, and specific High Energy Physics benchmarks, representing both simulation of physics detectors and data analysis of physics events. Before summarizing the results we must stress the fact that benchmarking of modern processors is a very complex affair. One has to control (at least) the following features: processor frequency, overclocking via Turbo mode, the number of physical cores in use, the use of logical cores ...

  2. Video frame processor

    International Nuclear Information System (INIS)

    Joshi, V.M.; Agashe, Alok; Bairi, B.R.

    1993-01-01

    This report provides a technical description of the Video Frame Processor (VFP) developed at Bhabha Atomic Research Centre. The instrument captures video images available in CCIR format. Two memory planes, each with a capacity of 512 x 512 x 8 bit data, enable storage of two video image frames. The stored image can be processed on-line and on-line image subtraction can also be carried out for image comparisons. The VFP is a PC Add-on board and is I/O mapped within the host IBM PC/AT compatible computer. (author). 9 refs., 4 figs., 19 photographs

  3. Trigger and decision processors

    International Nuclear Information System (INIS)

    Franke, G.

    1980-11-01

    In recent years there have been many attempts in high energy physics to make trigger and decision processes faster and more sophisticated. This became necessary due to a permanent increase of the number of sensitive detector elements in wire chambers and calorimeters, and in fact it was possible because of the fast developments in integrated circuits technique. In this paper the present situation will be reviewed. The discussion will be mainly focussed upon event filtering by pure software methods and - rather hardware related - microprogrammable processors as well as random access memory triggers. (orig.)

  4. Optical Finite Element Processor

    Science.gov (United States)

    Casasent, David; Taylor, Bradley K.

    1986-01-01

    A new high-accuracy optical linear algebra processor (OLAP) with many advantageous features is described. It achieves floating point accuracy, handles bipolar data by sign-magnitude representation, performs LU decomposition using only one channel, easily partitions and considers data flow. A new application (finite element (FE) structural analysis) for OLAPs is introduced and the results of a case study presented. Error sources in encoded OLAPs are addressed for the first time. Their modeling and simulation are discussed and quantitative data are presented. Dominant error sources and the effects of composite error sources are analyzed.

  5. Globe hosts launch of new processor

    CERN Multimedia

    2006-01-01

    Launch of the quadcore processor chip at the Globe. On 14 November, in a series of major media events around the world, the chip-maker Intel launched its new 'quadcore' processor. For the regions of Europe, the Middle East and Africa, the day-long launch event took place in CERN's Globe of Science and Innovation, with over 30 journalists in attendance, coming from as far away as Johannesburg and Dubai. CERN was a significant choice for the event: the first tests of this new generation of processor in Europe had been made at CERN over the preceding months, as part of CERN openlab, a research partnership with leading IT companies such as Intel, HP and Oracle. The event also provided the opportunity for the journalists to visit ATLAS and the CERN Computer Centre. The strategy of putting multiple processor cores on the same chip, which has been pursued by Intel and other chip-makers in the last few years, represents an important departure from the more traditional improvements in the sheer speed of such chips. ...

  6. Nash Bargaining Game-Theoretic Framework for Power Control in Distributed Multiple-Radar Architecture Underlying Wireless Communication System

    Directory of Open Access Journals (Sweden)

    Chenguang Shi

    2018-04-01

    Full Text Available This paper presents a novel Nash bargaining solution (NBS)-based cooperative game-theoretic framework for power control in a distributed multiple-radar architecture underlying a wireless communication system. Our primary objective is to minimize the total power consumption of the distributed multiple-radar system (DMRS) with the protection of wireless communication user’s transmission, while guaranteeing each radar’s target detection requirement. A unified cooperative game-theoretic framework is proposed for the optimization problem, where interference power constraints (IPCs) are imposed to protect the communication user’s transmission, and a minimum signal-to-interference-plus-noise ratio (SINR) requirement is employed to provide reliable target detection for each radar. The existence, uniqueness and fairness of the NBS to this cooperative game are proven. An iterative Nash bargaining power control algorithm with low computational complexity and fast convergence is developed and is shown to converge to a Pareto-optimal equilibrium for the cooperative game model. Numerical simulations and analyses are further presented to highlight the advantages and testify to the efficiency of our proposed cooperative game algorithm. It is demonstrated that the distributed algorithm is effective for power control and could protect the communication system with limited implementation overhead.

  7. Design of an Elliptic Curve Cryptography processor for RFID tag chips.

    Science.gov (United States)

    Liu, Zilong; Liu, Dongsheng; Zou, Xuecheng; Lin, Hui; Cheng, Jian

    2014-09-26

    Radio Frequency Identification (RFID) is an important technique for wireless sensor networks and the Internet of Things. Recently, considerable research has been performed in the combination of public key cryptography and RFID. In this paper, an efficient architecture of Elliptic Curve Cryptography (ECC) Processor for RFID tag chip is presented. We adopt a new inversion algorithm which requires fewer registers to store variables than the traditional schemes. A new method for coordinate swapping is proposed, which can reduce the complexity of the controller and shorten the time of iterative calculation effectively. A modified circular shift register architecture is presented in this paper, which is an effective way to reduce the area of register files. Clock gating and asynchronous counter are exploited to reduce the power consumption. The simulation and synthesis results show that the time needed for one elliptic curve scalar point multiplication over GF(2^163) is 176.7 K clock cycles and the gate area is 13.8 K with UMC 0.13 μm Complementary Metal Oxide Semiconductor (CMOS) technology. Moreover, the low power and low cost consumption make the Elliptic Curve Cryptography Processor (ECP) a prospective candidate for application in the RFID tag chip.
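
    Scalar point multiplication over a binary field with 163-bit elements, GF(2^163), is built from field multiplications. The sketch below shows a plain shift-and-add multiplication in software, not the authors' hardware design. The reduction polynomial x^163 + x^7 + x^6 + x^3 + 1 is an assumption (it is the one NIST specifies for its 163-bit binary curves); the abstract does not say which polynomial or curve was used.

```python
# Field elements are Python ints whose bits are polynomial coefficients over GF(2).
M = 163

# Assumed reduction polynomial: x^163 + x^7 + x^6 + x^3 + 1 (NIST binary curves).
REDUCTION = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1

def gf_mul(a, b):
    """Multiply two elements of GF(2^163) by shift-and-add with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a        # addition in GF(2^m) is XOR
        b >>= 1
        a <<= 1                # multiply a by x
        if a >> M:             # degree reached 163: reduce modulo the polynomial
            a ^= REDUCTION
    return result

# x * x^162 = x^163, which reduces to x^7 + x^6 + x^3 + 1.
print(hex(gf_mul(2, 1 << 162)))  # 0xc9
```

    A hardware design would typically process several bits per clock cycle, but the arithmetic it has to reproduce is the same.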

  8. AMD's 64-bit Opteron processor

    CERN Multimedia

    CERN. Geneva

    2003-01-01

    This talk concentrates on issues that relate to obtaining peak performance from the Opteron processor. Compiler options, memory layout, MPI issues in multi-processor configurations and the use of a NUMA kernel will be covered. A discussion of recent benchmarking projects and results will also be included. Biographies: David Rich. David directs AMD's efforts in high performance computing and also in the use of Opteron processors...

  9. Implications of Multi-Core Architectures on the Development of Multiple Independent Levels of Security (MILS) Compliant Systems

    Science.gov (United States)

    2012-10-01

    Two CBEA based products available for purchase were: Sony PlayStation 3 and the IBM BladeCenter QS20. The CBEA provides a single general purpose...and second generation Sony PlayStation 3 devices. Figure 13 provides the block diagram for the CMOS 90 nm Cell Broadband Engine processor [IBM07a...information between two red networks via a black network. In this scenario, a Sony PlayStation 3 is used for the CBEA-compliant processor. The CBEA

  10. Design and development of microblaze processor based Remote Terminal Units for Fast Breeder Reactors

    International Nuclear Information System (INIS)

    Gour, Aditya; Santhanaraj, A.; Behera, R.P.; Murali, N.; Satyamurty, S.A.V.

    2013-01-01

    Remote Terminal Units (RTUs) are single board remote data acquisition and control systems that are widely used in FBRs during all states of plant operation. A Distributed Digital Control System (DDCS) architecture is followed for plant control and operation, which mandates support for multiple sockets in TCP/IP Ethernet communication in an embedded system. Existing RTUs are 89C51 microcontroller based systems in which TCP/IP communication is done using a Wiznet module. These modules support a maximum of four sockets and are already obsolete. In this paper a new RTU design is described in which the complete digital logic of a board is implemented in a single FPGA device using a soft-core processor and an EMAC controller with multiple socket support for Ethernet communication. This makes the design more reliable and immune to obsolescence. (author)

  11. The Heidelberg POLYP - a flexible and fault-tolerant poly-processor

    International Nuclear Information System (INIS)

    Maenner, R.; Deluigi, B.

    1981-01-01

    The Heidelberg poly-processor system POLYP is described. It is intended to be used in nuclear physics for reprocessing of experimental data, in high energy physics as second-stage trigger processor, and generally in other applications requiring high-computing power. The POLYP system consists of any number of I/O-processors, processor modules (eventually of different types), global memory segments, and a host processor. All modules (up to several hundred) are connected by a multiple common-data-bus system; all processors, additionally, by a multiple sync bus system for processor/task-scheduling. All hard- and software is designed to be decentralized and free of bottle-necks. Most hardware-faults like single-bit errors in memory or multi-bit errors during transfers are automatically corrected. Defective modules, buses, etc., can be removed with only a graceful degradation of the system-throughput. (orig.)

  12. Compute-unified device architecture implementation of a block-matching algorithm for multiple graphical processing unit cards.

    Science.gov (United States)

    Massanes, Francesc; Cadennes, Marie; Brankov, Jovan G

    2011-07-01

    In this paper we describe and evaluate a fast implementation of a classical block matching motion estimation algorithm for multiple Graphical Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) computing engine. The implemented block matching algorithm (BMA) uses the summed absolute difference (SAD) error criterion and full grid search (FS) for finding the optimal block displacement. In this evaluation we compared the execution time of a GPU and CPU implementation for images of various sizes, using integer and non-integer search grids. The results show that use of a GPU card can shorten computation time by a factor of 200 for an integer and 1000 for a non-integer search grid. The additional speedup for the non-integer search grid comes from the fact that the GPU has built-in hardware for image interpolation. Further, when using multiple GPU cards, the presented evaluation shows the importance of the data splitting method across multiple cards, but an almost linear speedup with the number of cards is achievable. In addition we compared the execution time of the proposed FS GPU implementation with two existing, highly optimized non-full grid search CPU based motion estimation methods, namely the implementation of the Pyramidal Lucas Kanade Optical flow algorithm in OpenCV and the Simplified Unsymmetrical multi-Hexagon search in the H.264/AVC standard. In these comparisons, the FS GPU implementation still showed modest improvement even though the computational complexity of the FS GPU implementation is substantially higher than that of the non-FS CPU implementations. We also demonstrated that for an image sequence of 720×480 pixels in resolution, commonly used in video surveillance, the proposed GPU implementation is sufficiently fast for real-time motion estimation at 30 frames-per-second using two NVIDIA C1060 Tesla GPU cards.
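
    The block matching scheme named above (SAD error criterion with full grid search) is compact enough to sketch. The following is a serial, pure-Python reference for the integer search grid only; it is not the authors' CUDA code, and the function names, image representation (lists of rows), and parameters are illustrative assumptions.

```python
def sad(ref, cur, ref_y, ref_x, cur_y, cur_x, block):
    """Summed absolute difference between a block in `ref` and a block in `cur`."""
    total = 0
    for dy in range(block):
        for dx in range(block):
            total += abs(ref[ref_y + dy][ref_x + dx] - cur[cur_y + dy][cur_x + dx])
    return total

def full_search(ref, cur, y, x, block, radius):
    """Try every integer displacement within `radius` for the block at (y, x).

    Returns (best_dy, best_dx, best_sad), or None if no candidate fits in `ref`.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = y + dy, x + dx
            # Skip candidate blocks that fall outside the reference image.
            if ry < 0 or rx < 0 or ry + block > height or rx + block > width:
                continue
            cost = sad(ref, cur, ry, rx, y, x, block)
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best
```

    Every candidate displacement is evaluated independently of the others, which is what makes the full search a natural fit for the GPU parallelism described in the abstract.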

  13. Coordinated Energy Management in Heterogeneous Processors

    Directory of Open Access Journals (Sweden)

    Indrani Paul

    2014-01-01

    Full Text Available This paper examines energy management in a heterogeneous processor consisting of an integrated CPU–GPU for high-performance computing (HPC) applications. Energy management for HPC applications is challenged by their uncompromising performance requirements and complicated by the need for coordinating energy management across distinct core types – a new and less understood problem. We examine the intra-node CPU–GPU frequency sensitivity of HPC applications on tightly coupled CPU–GPU architectures as the first step in understanding power and performance optimization for a heterogeneous multi-node HPC system. The insights from this analysis form the basis of a coordinated energy management scheme, called DynaCo, for integrated CPU–GPU architectures. We implement DynaCo on a modern heterogeneous processor and compare its performance to a state-of-the-art power- and performance-management algorithm. DynaCo improves the measured average energy-delay squared (ED²) product by up to 30% with less than 2% average performance loss across several exascale and other HPC workloads.
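
    The energy-delay squared product weights execution time more heavily than energy: it is conventionally computed as energy multiplied by the square of the delay. The numbers in the sketch below are made up purely to illustrate the metric; they are not measurements from the paper.

```python
def energy_delay_squared(energy_joules, delay_seconds):
    """Energy-delay squared product: E * D^2 (lower is better)."""
    return energy_joules * delay_seconds ** 2

# Hypothetical run: 2% slower than the baseline but using 32% less energy.
baseline = energy_delay_squared(energy_joules=100.0, delay_seconds=10.0)
managed = energy_delay_squared(energy_joules=68.0, delay_seconds=10.2)

improvement = 1.0 - managed / baseline
print(f"ED^2 improvement: {improvement:.1%}")
```

    Because the delay is squared, even a small slowdown offsets part of the energy saving, which is why the metric suits workloads with strict performance requirements.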

  14. Multi-Softcore Architecture on FPGA

    Directory of Open Access Journals (Sweden)

    Mouna Baklouti

    2014-01-01

    Full Text Available To meet the high performance demands of embedded multimedia applications, embedded systems are integrating multiple processing units. However, they are mostly based on custom-logic design methodology. Designing parallel multicore systems using available standard intellectual properties while maintaining high performance is also a challenging issue. Softcore processors and field programmable gate arrays (FPGAs) are a cheap and fast option to develop and test such systems. This paper describes an FPGA-based design methodology to implement a rapid prototype of parametric multicore systems. A study of the viability of making the SoC using the NIOS II soft-processor core from Altera is also presented. The NIOS II features a general-purpose RISC CPU architecture designed to address a wide range of applications. The performance of the implemented architecture is discussed, and some parallel applications are used for testing the speedup and efficiency of the system. Experimental results demonstrate the performance of the proposed multicore system, which achieves better speedup than the GPU (29.5% faster for the FIR filter and 23.6% faster for the matrix-matrix multiplication).

  15. Does supporting multiple student strategies lead to greater learning and motivation? Investigating a source of complexity in the architecture of intelligent tutoring systems

    NARCIS (Netherlands)

    Waalkens, Maaike; Aleven, Vincent; Taatgen, Niels

    Intelligent tutoring systems (ITS) support students in learning a complex problem-solving skill. One feature that makes an ITS architecturally complex, and hard to build, is support for strategy freedom, that is, the ability to let students pursue multiple solution strategies within a given problem.

  16. Reconfigurable signal processor designs for advanced digital array radar systems

    Science.gov (United States)

    Suarez, Hernan; Zhang, Yan (Rockee); Yu, Xining

    2017-05-01

    The new challenges originating from Digital Array Radar (DAR) demand a new generation of reconfigurable backend processors in the system. The new FPGA devices can support much higher speed, more bandwidth and greater processing capability, as needed by the digital Line Replaceable Unit (LRU). This study focuses on using the latest Altera and Xilinx devices in an adaptive beamforming processor. The field reprogrammable RF devices from Analog Devices are used as analog front end transceivers. Unlike other existing Software-Defined Radio transceivers on the market, this processor is designed for distributed adaptive beamforming in a networked environment. The following aspects of the novel radar processor will be presented: (1) A new system-on-chip architecture based on Altera's devices and an adaptive processing module, especially for adaptive beamforming and pulse compression. (2) Successful implementation of generation 2 serial RapidIO data links on FPGA, which support the VITA-49 radio packet format for large distributed DAR processing. (3) Demonstration of the feasibility and capabilities of the processor in a Micro-TCA based, SRIO switching backplane to support multichannel beamforming in real-time. (4) Application of this processor in ongoing radar system development projects, including OU's dual-polarized digital array radar, the planned new cylindrical array radars, and future airborne radars.

  17. The architecture of gene regulatory variation across multiple human tissues: the MuTHER study.

    Directory of Open Access Journals (Sweden)

    Alexandra C Nica

    2011-02-01

    While there have been studies exploring regulatory variation in one or more tissues, the complexity of tissue-specificity in multiple primary tissues is not yet well understood. We explore in depth the role of cis-regulatory variation in three human tissues: lymphoblastoid cell lines (LCL), skin, and fat. The samples (156 LCL, 160 skin, 166 fat) were derived simultaneously from a subset of well-phenotyped healthy female twins of the MuTHER resource. We discover an abundance of cis-eQTLs in each tissue similar to previous estimates (858 or 4.7% of genes). In addition, we apply factor analysis (FA) to remove effects of latent variables, thus more than doubling the number of our discoveries (1,822 eQTL genes). The unique study design (Matched Co-Twin Analysis, MCTA) permits immediate replication of eQTLs using co-twins (93%-98%) and validation of the considerable gain in eQTL discovery after FA correction. We highlight the challenges of comparing eQTLs between tissues. After verifying previous significance threshold-based estimates of tissue-specificity, we show their limitations given their dependency on statistical power. We propose that continuous estimates of the proportion of tissue-shared signals and direct comparison of the magnitude of effect on the fold change in expression are essential properties that jointly provide a biologically realistic view of tissue-specificity. Under this framework we demonstrate that 30% of eQTLs are shared among the three tissues studied, while another 29% appear exclusively tissue-specific. However, even among the shared eQTLs, a substantial proportion (10%-20%) have significant differences in the magnitude of fold change between genotypic classes across tissues. Our results underline the need to account for the complexity of eQTL tissue-specificity in an effort to assess consequences of such variants for complex traits.

  18. Nonlinear Wave Simulation on the Xeon Phi Knights Landing Processor

    OpenAIRE

    Hristov Ivan; Goranov Goran; Hristova Radoslava

    2018-01-01

    We consider a standing wave simulation that is interesting from a computational point of view, obtained by solving coupled 2D perturbed Sine-Gordon equations. We make an OpenMP realization which explores both thread and SIMD levels of parallelism. We test the OpenMP program on two different energy equivalent Intel architectures: 2× Xeon E5-2695 v2 processors (code-named “Ivy Bridge-EP”) in the Hybrilit cluster, and a Xeon Phi 7250 processor (code-named “Knights Landing”, KNL). The results show 2 times better per...

  19. UNIBUS processor interface for a FASTBUS data acquisition system

    International Nuclear Information System (INIS)

    Larwill, M.; Lagerlund, T.D.; Barsotti, E.; Taff, L.M.; Franzen, J.

    1981-01-01

    Current work on a FASTBUS data acquisition system at Fermilab is described. The system will consist of three pieces of FASTBUS hardware: a UNIBUS processor interface (UPI), a dual-ported bulk memory, and a FASTBUS "event builder" (i.e., data acquisition processor). Primary efforts have been on specifying and constructing a UPI. The present specification includes capability for all basic FASTBUS operations, including list processing of consecutive FASTBUS operations. Some possible FASTBUS data acquisition system architectures employing the UPI are discussed along with some detailed specifications of the UPI itself.

  20. Parallel Processor for 3D Recovery from Optical Flow

    Directory of Open Access Journals (Sweden)

    Jose Hugo Barron-Zambrano

    2009-01-01

    Considerable effort has been devoted to 3D recovery from motion in computer vision systems in recent years. The main problem lies in the number of operations and memory accesses to be performed by the majority of the existing techniques when translated to hardware or software implementations. This paper proposes a parallel processor for 3D recovery from optical flow. Its main features are the maximum reuse of data and the low number of clock cycles needed to calculate the optical flow, along with the precision with which 3D recovery is achieved. The results of the proposed architecture as well as those from processor synthesis are presented.

  1. Graphics processor efficiency for realization of rapid tabular computations

    International Nuclear Information System (INIS)

    Dudnik, V.A.; Kudryavtsev, V.I.; Us, S.A.; Shestakov, M.V.

    2016-01-01

    Capabilities of graphics processing units (GPU) and central processing units (CPU) have been investigated for realization of fast-calculation algorithms with the use of tabulated functions. The realization of tabulated functions is exemplified by the GPU/CPU architecture-based processors. Comparison is made between the operating efficiencies of GPU and CPU, employed for tabular calculations at different conditions of use. Recommendations are formulated for the use of graphical and central processors to speed up scientific and engineering computations through the use of tabulated functions

  2. Scientific Computing Kernels on the Cell Processor

    Energy Technology Data Exchange (ETDEWEB)

    Williams, Samuel W.; Shalf, John; Oliker, Leonid; Kamil, Shoaib; Husbands, Parry; Yelick, Katherine

    2007-04-04

    The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists. As a result, the high performance computing community is examining alternative architectures that address the limitations of modern cache-based designs. In this work, we examine the potential of using the recently-released STI Cell processor as a building block for future high-end computing systems. Our work contains several novel contributions. First, we introduce a performance model for Cell and apply it to several key scientific computing kernels: dense matrix multiply, sparse matrix vector multiply, stencil computations, and 1D/2D FFTs. The difficulty of programming Cell, which requires assembly level intrinsics for the best performance, makes this model useful as an initial step in algorithm design and evaluation. Next, we validate the accuracy of our model by comparing results against published hardware results, as well as our own implementations on a 3.2GHz Cell blade. Additionally, we compare Cell performance to benchmarks run on leading superscalar (AMD Opteron), VLIW (Intel Itanium2), and vector (Cray X1E) architectures. Our work also explores several different mappings of the kernels and demonstrates a simple and effective programming model for Cell's unique architecture. Finally, we propose modest microarchitectural modifications that could significantly increase the efficiency of double-precision calculations. Overall results demonstrate the tremendous potential of the Cell architecture for scientific computations in terms of both raw performance and power efficiency.
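
    The abstract does not spell out the form of its performance model. One common first-order estimate for a software-managed memory hierarchy, given here only as an illustrative assumption and not as the authors' model, takes the larger of the compute time and the data-transfer time, on the premise that transfers can be overlapped with computation. All parameter names and numbers below are hypothetical.

```python
def estimated_kernel_time(flops: float,
                          bytes_moved: float,
                          peak_flops_per_s: float,
                          mem_bandwidth_bytes_per_s: float) -> float:
    """Bound-based estimate: the slower of computation and memory traffic.

    Assumes data transfers fully overlap with computation (for example via
    double buffering), so the longer of the two phases determines the runtime.
    """
    compute_time = flops / peak_flops_per_s
    memory_time = bytes_moved / mem_bandwidth_bytes_per_s
    return max(compute_time, memory_time)


# Hypothetical kernel and machine: 1 GFLOP of work, 4 GB of traffic,
# 100 GFLOP/s peak compute, 20 GB/s memory bandwidth.
t = estimated_kernel_time(1e9, 4e9, 1e11, 2e10)
print(f"estimated time: {t:.3f} s")
```

    A kernel whose memory time exceeds its compute time, as in this example, is bandwidth bound, and adding arithmetic throughput would not shorten it under this estimate.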

  3. Trade-Off Exploration for Target Tracking Application in a Customized Multiprocessor Architecture

    Directory of Open Access Journals (Sweden)

    Yassin El-Hillali

    2009-01-01

    This paper presents the design of an FPGA-based multiprocessor-system-on-chip (MPSoC) architecture optimized for Multiple Target Tracking (MTT) in automotive applications. An MTT system uses an automotive radar to track the speed and relative position of all the vehicles (targets) within its field of view. As the number of targets increases, the computational needs of the MTT system also increase, making it difficult for a single processor to handle it alone. Our implementation distributes the computational load among multiple soft processor cores optimized for executing specific computational tasks. The paper explains how we designed and profiled the MTT application to partition it among different processors. It also explains how we applied different optimizations to customize the individual processor cores to their assigned tasks and to assess their impact on performance and FPGA resource utilization. The result is a complete MTT application running on an optimized MPSoC architecture that fits in a contemporary medium-sized FPGA and that meets the application's real-time constraints.

  4. Composable processor virtualization for embedded systems

    NARCIS (Netherlands)

    Molnos, A.M.; Milutinovic, A.; She, D.; Goossens, K.G.W.

    2010-01-01

    Processor virtualization divides a physical processor's time among a set of virtual machines, enabling efficient hardware utilization, application security and allowing co-existence of different operating systems on the same processor. Though initially intended for the server domain, virtualization

  5. Parallel eigenanalysis of finite element models in a completely connected architecture

    Science.gov (United States)

    Akl, F. A.; Morel, M. R.

    1989-01-01

    A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis, $K\Phi = M\Phi\Omega$, where $K$ and $M$ are of order $N$, and $\Omega$ is of order $q$. The concurrent solution of the eigenproblem is based on the multifrontal/modified subspace method and is achieved in a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm was successfully implemented on a tightly coupled multiple-instruction multiple-data parallel processing machine, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macrotasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. A parallel finite element dynamic analysis program, p-feda, is documented and the performance of its subroutines in parallel environment is analyzed.
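
    To make the generalized eigenproblem concrete, the sketch below finds the lowest eigenpair of a 2x2 stiffness/mass pair by plain inverse iteration. This is a deliberately simpler, serial technique swapped in for illustration; it is not the multifrontal/modified subspace method of the report, it computes a single mode (q = 1) rather than a subspace of q modes, and the matrices are made-up values.

```python
import math


def solve_2x2(a, b):
    """Solve a @ x = b for a 2x2 matrix a using Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return [x0, x1]


def matvec(a, x):
    """Dense matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def lowest_mode(k, m, iterations=60):
    """Inverse iteration for the smallest eigenvalue of K x = lambda M x."""
    x = [1.0, 0.0]
    for _ in range(iterations):
        x = solve_2x2(k, matvec(m, x))          # x <- K^-1 M x
        norm = math.sqrt(dot(x, matvec(m, x)))  # normalize in the M-norm
        x = [xi / norm for xi in x]
    # The Rayleigh quotient gives the eigenvalue estimate for the converged vector.
    eigenvalue = dot(x, matvec(k, x)) / dot(x, matvec(m, x))
    return eigenvalue, x


# Illustrative stiffness and mass matrices (hypothetical values).
K = [[2.0, -1.0], [-1.0, 2.0]]
M = [[2.0, 0.0], [0.0, 1.0]]
omega, phi = lowest_mode(K, M)
print(f"lowest eigenvalue = {omega:.6f}")
```

    For these matrices the characteristic equation is 2λ² − 6λ + 3 = 0, so the iteration should approach (3 − √3)/2, the smaller of its two roots.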

  6. Modal Processor Effects Inspired by Hammond Tonewheel Organs

    Directory of Open Access Journals (Sweden)

    Kurt James Werner

    2016-06-01

    In this design study, we introduce a novel class of digital audio effects that extend the recently introduced modal processor approach to artificial reverberation and effects processing. These pitch and distortion processing effects mimic the design and sonics of a classic additive-synthesis-based electromechanical musical instrument, the Hammond tonewheel organ. As a reverb effect, the modal processor simulates a room response as the sum of resonant filter responses. This architecture provides precise, interactive control over the frequency, damping, and complex amplitude of each mode. Into this framework, we introduce two types of processing effects: pitch effects inspired by the Hammond organ’s equal tempered “tonewheels”, “drawbar” tone controls, vibrato/chorus circuit, and distortion effects inspired by the pseudo-sinusoidal shape of its tonewheels and electromagnetic pickup distortion. The result is an effects processor that imprints the Hammond organ’s sonics onto any audio input.
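
    The idea of a response built as a sum of resonant filter responses can be sketched as a bank of parallel complex one-pole resonators, one per mode, each with its own frequency, decay time and amplitude. The code below is a minimal illustration under that assumption; the function name, the (frequency, T60, amplitude) parameterization and the example modes are hypothetical and are not taken from the paper.

```python
import cmath
import math


def modal_process(signal, modes, sample_rate):
    """Filter `signal` through parallel complex one-pole resonators and sum them.

    Each mode is a tuple (frequency_hz, t60_seconds, amplitude), where t60 is
    the time for that mode to decay by 60 dB.
    """
    poles = []
    for freq_hz, t60, _ in modes:
        # Per-sample decay chosen so the mode falls by a factor of 1000 after t60.
        decay = math.exp(-math.log(1000.0) / (t60 * sample_rate))
        poles.append(decay * cmath.exp(2j * math.pi * freq_hz / sample_rate))

    states = [0j] * len(modes)
    output = []
    for sample in signal:
        total = 0.0
        for i, (_, _, amplitude) in enumerate(modes):
            states[i] = poles[i] * states[i] + amplitude * sample
            total += states[i].real  # the real part is the audible mode output
        output.append(total)
    return output


# Impulse response of two hypothetical modes at a 48 kHz sample rate.
impulse = [1.0] + [0.0] * 99
response = modal_process(impulse, [(440.0, 0.5, 1.0), (660.0, 0.3, 0.5)], 48000)
```

    Because every mode is an independent filter, changing one mode's frequency, damping or amplitude leaves the others untouched, which is what makes per-mode control straightforward in this structure.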

  7. A single chip pulse processor for nuclear spectroscopy

    International Nuclear Information System (INIS)

    Hilsenrath, F.; Bakke, J.C.; Voss, H.D.

    1985-01-01

    A high performance digital pulse processor, integrated into a single gate array microcircuit, has been developed for spaceflight applications. The new approach takes advantage of the latest CMOS high speed A/D flash converters and low-power gated logic arrays. The pulse processor measures pulse height, pulse area and the required timing information (e.g. multi-detector coincidence and pulse pile-up detection). The pulse processor features a high throughput rate (e.g. 0.5 MHz for 2 μs Gaussian pulses) and improved differential linearity (e.g. ±0.2 LSB for a ±1 LSB A/D). Because of the parallel digital architecture of the device, the interface is microprocessor bus compatible. A satellite flight application of this module is presented for use in the X-ray imager and high energy particle spectrometers of the PEM experiment on the Upper Atmospheric Research Satellite.

  8. Stepping motor control processor reference manual. Volume I

    International Nuclear Information System (INIS)

    Holloway, F.W.; VanArsdall, P.J.; Suski, G.J.; Gant, R.G.; Rash, M.

    1980-01-01

    This manual is intended to serve several purposes. The first goal is to describe the capabilities and operation of the SMC processor package from an operator or user point of view. Secondly, the manual will describe in some detail the basic hardware elements and how they can be used effectively to implement a step motor control system. Practical information on the use, installation and checkout of the hardware set is presented in the following sections along with programming suggestions. Available related system software is described in this manual for reference and as an aid in understanding the system architecture. Section two presents an overview and operations manual of the SMC processor describing its composition and functional capabilities. Section three contains hardware descriptions in some detail for the LLL-designed hardware used in the SMC processor. Basic theory of operation and important features are explained

  9. Microprocessor architectures RISC, CISC and DSP

    CERN Document Server

    Heath, Steve

    1995-01-01

    'Why are there all these different processor architectures and what do they all mean? Which processor will I use? How should I choose it?' Given the task of selecting an architecture or design approach, both engineers and managers require a knowledge of the whole system and an explanation of the design tradeoffs and their effects. This is information that rarely appears in data sheets or user manuals. This book fills that knowledge gap. Section 1 provides a primer and history of the three basic microprocessor architectures. Section 2 describes the ways in which the architectures react with the

  10. Level Zero Trigger Processor for the NA62 experiment

    Science.gov (United States)

    Soldi, D.; Chiozzi, S.

    2018-05-01

    The NA62 experiment is designed to measure the branching ratio of the ultra-rare decay $K^+ \to \pi^+ \nu \bar{\nu}$ with a precision of ~10% at the CERN Super Proton Synchrotron (SPS). The trigger system of NA62 consists of three different levels designed to select events of physics interest in a high beam rate environment. The L0 Trigger Processor (L0TP) is the lowest level system of the trigger chain. It is implemented in hardware using programmable logic. The architecture of the NA62 L0TP system is a new approach compared to existing systems used in high-energy physics experiments. It is fully digital, based on standard gigabit Ethernet communication between detectors and the L0TP Board. The L0TP Board is a commercial development board, mounting a programmable logic device (FPGA). The primitives generated by sub-detectors are sent asynchronously using the UDP protocol to the L0TP during the entire beam spill period. The L0TP realigns in time the primitives coming from seven different sources and performs a data selection based on the characteristics of the event such as energy, multiplicity and topology of hits in the sub-detectors. It guarantees a maximum latency of 1 ms. The maximum input rate is about 10 MHz for each sub-detector, while the design maximum output trigger rate is 1 MHz. A description of the trigger algorithm is presented here.
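
    The time-realignment step can be pictured as sorting asynchronously arriving primitives into time slots and issuing a trigger for every slot in which a required set of sources reported. The sketch below is a software illustration of that idea only; the real L0TP runs in FPGA firmware, and the slot width, source names and selection rule used here are hypothetical.

```python
from collections import defaultdict


def select_triggers(primitives, required_sources, window_ns):
    """Return the time-slot indices in which every required source has a primitive.

    primitives: iterable of (timestamp_ns, source_name), in any arrival order.
    required_sources: set of source names that must coincide in a slot.
    window_ns: slot width used to realign primitives in time.
    """
    slots = defaultdict(set)
    for timestamp_ns, source in primitives:
        slots[timestamp_ns // window_ns].add(source)
    return sorted(slot for slot, sources in slots.items()
                  if required_sources <= sources)


# Primitives arrive out of order, as they would over an asynchronous UDP link.
arrivals = [(105, "B"), (12, "A"), (101, "A"), (230, "A")]
print(select_triggers(arrivals, {"A", "B"}, window_ns=25))
```

    Fixed slots are the simplest choice but can split a genuine coincidence that straddles a slot boundary; a practical design would need to handle that case.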

  11. Data driven processor 'Vertex Trigger' for B experiments

    International Nuclear Information System (INIS)

    Hartouni, E.P.

    1993-01-01

    Data Driven Processors (DDP's) are specialized computation engines configured to solve specific numerical problems, such as vertex reconstruction. The architecture of the DDP which is the subject of this talk was designed and implemented by W. Sippach and B.C. Knapp at Nevis Lab. in the early 1980's. This particular implementation allows multiple parallel streams of data to provide input to a heterogeneous collection of simple operators whose interconnections form an algorithm. The local data flow control allows this device to execute algorithms extremely quickly provided that care is taken in the layout of the algorithm. I/O rates of several hundred megabytes/second are routinely achieved, thus making DDP's attractive candidates for complex online calculations. The original question was "can a DDP reconstruct tracks in a Silicon Vertex Detector, find events with a separated vertex and do it fast enough to be used as an online trigger?" Restating this inquiry as three questions and describing the answers to the questions will be the subject of this talk. The three specific questions are: (1) Can an algorithm be found which reconstructs tracks in a planar geometry with no magnetic field? (2) Can separated vertices be recognized in some way? (3) Can the algorithm be implemented in the Nevis-UMass and DDP and execute in 10-20 μs?

  12. Distributed processor systems

    International Nuclear Information System (INIS)

    Zacharov, B.

    1976-01-01

    In recent years, there has been a growing tendency in high-energy physics and in other fields to solve computational problems by distributing tasks among the resources of inter-coupled processing devices and associated system elements. This trend has gained further momentum more recently with the increased availability of low-cost processors and with the development of the means of data distribution. In two lectures, the broad question of distributed computing systems is examined and the historical development of such systems reviewed. An attempt is made to examine the reasons for the existence of these systems and to discern the main trends for the future. The components of distributed systems are discussed in some detail and particular emphasis is placed on the importance of standards and conventions in certain key system components. The ideas and principles of distributed systems are discussed in general terms, but these are illustrated by a number of concrete examples drawn from the context of the high-energy physics environment. (Auth.)

  13. High-level language computer architecture

    CERN Document Server

    Chu, Yaohan

    1975-01-01

    High-Level Language Computer Architecture offers a tutorial on high-level language computer architecture, including von Neumann architecture and syntax-oriented architecture as well as direct and indirect execution architecture. Design concepts of Japanese-language data processing systems are discussed, along with the architecture of stack machines and the SYMBOL computer system. The conceptual design of a direct high-level language processor is also described. Comprised of seven chapters, this book first presents a classification of high-level language computer architecture according to the pr

  14. Scientific programming on massively parallel processor CP-PACS

    International Nuclear Information System (INIS)

    Boku, Taisuke

    1998-01-01

    The massively parallel processor CP-PACS targets a range of problems in computational physics, and its architecture has been designed to handle various kinds of numerical processing. This report outlines the CP-PACS and gives a programming example using the Kernel CG benchmark from the NAS Parallel Benchmarks, version 1. It describes two major features of the CP-PACS architecture: the pseudo vector processing mechanism, and the parallel processing tuning of scientific and technical computation using the three-dimensional hyper crossbar network. The CP-PACS uses PUs based on a RISC processor with an added pseudo vector processor. Pseudo vector processing is realized as loop processing with scalar instructions. The features of the interconnection network of the PUs are explained. The algorithm of the NPB version 1 Kernel CG is shown. The most time-consuming part of the main loop is the matrix-vector product (matvec), and the parallel processing of the matvec is explained. The computation time on the CPU is determined. For performance evaluation, the execution time, the short vector processing of the pseudo vector processor based on the slide window, and a comparison with other parallel computers are reported. (K.I.)
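
    One generic way to parallelize a matrix-vector product is to give each processing unit a contiguous block of rows. The sketch below shows that row partitioning with a dense matrix and a serial loop standing in for the processing units; it is an assumption-laden illustration and not the scheme used on the CP-PACS, whose CG kernel works on a sparse matrix and whose data distribution is not described in this abstract.

```python
def split_rows(n_rows, n_workers):
    """Return contiguous (start, stop) row ranges, one per worker."""
    base, extra = divmod(n_rows, n_workers)
    ranges, start = [], 0
    for worker in range(n_workers):
        # The first `extra` workers take one additional row each.
        stop = start + base + (1 if worker < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def partitioned_matvec(matrix, vector, n_workers):
    """Compute matrix @ vector block by block, one row block per worker."""
    result = []
    for start, stop in split_rows(len(matrix), n_workers):
        # On a parallel machine each block would run on its own processing unit.
        for row in matrix[start:stop]:
            result.append(sum(a * x for a, x in zip(row, vector)))
    return result


print(partitioned_matvec([[1, 2], [3, 4], [5, 6]], [1, 1], n_workers=2))
```

    With this layout each worker needs the whole input vector but only its own rows of the matrix, so the communication cost is dominated by distributing the vector.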

  15. Multibus-based parallel processor for simulation

    Science.gov (United States)

    Ogrady, E. P.; Wang, C.-H.

    1983-01-01

    A Multibus-based parallel processor simulation system is described. The system is intended to serve as a vehicle for gaining hands-on experience, testing system and application software, and evaluating parallel processor performance during development of a larger system based on the horizontal/vertical-bus interprocessor communication mechanism. The prototype system consists of up to seven Intel iSBC 86/12A single-board computers which serve as processing elements, a multiple transmission controller (MTC) designed to support system operation, and an Intel Model 225 Microcomputer Development System which serves as the user interface and input/output processor. All components are interconnected by a Multibus/IEEE 796 bus. An important characteristic of the system is that it provides a mechanism for a processing element to broadcast data to other selected processing elements. This parallel transfer capability is provided through the design of the MTC and a minor modification to the iSBC 86/12A board. The operation of the MTC, the basic hardware-level operation of the system, and pertinent details about the iSBC 86/12A and the Multibus are described.

  16. Simple and cost-effective fabrication of size-tunable zinc oxide architectures by multiple size reduction technique

    Directory of Open Access Journals (Sweden)

    Hyeong-Ho Park, Xin Zhang, Seon-Yong Hwang, Sang Hyun Jung, Semin Kang, Hyun-Beom Shin, Ho Kwan Kang, Hyung-Ho Park, Ross H Hill and Chul Ki Ko

    2012-01-01

    We present a simple size reduction technique for fabricating 400 nm zinc oxide (ZnO) architectures using a silicon master containing only microscale architectures. In this approach, the overall fabrication, from the master to the molds and the final ZnO architectures, features cost-effective UV photolithography, instead of electron beam lithography or deep-UV photolithography. A photosensitive Zn-containing sol–gel precursor was used to imprint architectures by direct UV-assisted nanoimprint lithography (UV-NIL). The resulting Zn-containing architectures were then converted to ZnO architectures with reduced feature sizes by thermal annealing at 400 °C for 1 h. The imprinted and annealed ZnO architectures were also used as new masters for the size reduction technique. ZnO pillars of 400 nm diameter were obtained from a silicon master with pillars of 1000 nm diameter by simply repeating the size reduction technique. The photosensitivity and contrast of the Zn-containing precursor were measured as 6.5 J cm−2 and 16.5, respectively. Interesting complex ZnO patterns, with both microscale pillars and nanoscale holes, were demonstrated by the combination of dose-controlled UV exposure and a two-step UV-NIL.

  17. Simple and cost-effective fabrication of size-tunable zinc oxide architectures by multiple size reduction technique

    International Nuclear Information System (INIS)

    Park, Hyeong-Ho; Hwang, Seon-Yong; Jung, Sang Hyun; Kang, Semin; Shin, Hyun-Beom; Kang, Ho Kwan; Ko, Chul Ki; Zhang Xin; Hill, Ross H; Park, Hyung-Ho

    2012-01-01

    We present a simple size reduction technique for fabricating 400 nm zinc oxide (ZnO) architectures using a silicon master containing only microscale architectures. In this approach, the overall fabrication, from the master to the molds and the final ZnO architectures, features cost-effective UV photolithography, instead of electron beam lithography or deep-UV photolithography. A photosensitive Zn-containing sol–gel precursor was used to imprint architectures by direct UV-assisted nanoimprint lithography (UV-NIL). The resulting Zn-containing architectures were then converted to ZnO architectures with reduced feature sizes by thermal annealing at 400 °C for 1 h. The imprinted and annealed ZnO architectures were also used as new masters for the size reduction technique. ZnO pillars of 400 nm diameter were obtained from a silicon master with pillars of 1000 nm diameter by simply repeating the size reduction technique. The photosensitivity and contrast of the Zn-containing precursor were measured as 6.5 J cm−2 and 16.5, respectively. Interesting complex ZnO patterns, with both microscale pillars and nanoscale holes, were demonstrated by the combination of dose-controlled UV exposure and a two-step UV-NIL.

  18. The ATLAS Level-1 Calorimeter Trigger Architecture

    CERN Document Server

    Garvey, J; Mahout, G; Moye, T H; Staley, R J; Watkins, P M; Watson, A T; Achenbach, R; Hanke, P; Kluge, E E; Meier, K; Meshkov, P; Nix, O; Penno, K; Schmitt, K; Ay, Cc; Bauss, B; Dahlhoff, A; Jakobs, K; Mahboubi, K; Schäfer, U; Trefzger, T M; Eisenhandler, E F; Landon, M; Moyse, E; Thomas, J; Apostoglou, P; Barnett, B M; Brawn, I P; Davis, A O; Edwards, J; Gee, C N P; Gillman, A R; Perera, V J O; Qian, W; Bohm, C; Hellman, S; Hidvégi, A; Silverstein, S; RT 2003 13th IEEE-NPSS Real Time Conference

    2004-01-01

    The architecture of the ATLAS Level-1 Calorimeter Trigger system (L1Calo) is presented. Common approaches have been adopted for data distribution, result merging, readout, and slow control across the three different subsystems. A significant amount of common hardware is utilized, yielding substantial savings in cost, spares, and development effort. A custom, high-density backplane has been developed with data paths suitable for both the em/tt cluster processor (CP) and jet/energy-summation processor (JEP) subsystems. Common modules also provide interfaces to VME, CANbus and the LHC Timing, Trigger and Control system (TTC). A common data merger module (CMM) uses FPGAs with multiple configurations for summing electron/photon and tau/hadron cluster multiplicities, jet multiplicities, or total and missing transverse energy. The CMM performs both crate- and system-level merging. A common, FPGA-based readout driver (ROD) is used by all of the subsystems to send input, intermediate and output data to the data acquis...

  19. Processors and systems (picture processing)

    Energy Technology Data Exchange (ETDEWEB)

    Gemmar, P

    1983-01-01

    Automatic picture processing requires high performance computers and high transmission capacities in the processor units. The author examines the possibilities of operating processors in parallel in order to accelerate the processing of pictures. He therefore discusses a number of available processors and systems for picture processing and illustrates their capacities for special types of picture processing. He stresses the fact that the amount of storage required for picture processing is exceptionally high. The author concludes that it is as yet difficult to decide whether very large groups of simple processors or highly complex multiprocessor systems will provide the best solution. Both methods will be aided by the development of VLSI. New solutions have already been offered (systolic arrays and 3-d processing structures) but they also are subject to losses caused by inherently parallel algorithms. Greater efforts must be made to produce suitable software for multiprocessor systems. Some possibilities for future picture processing systems are discussed. 33 references.

  20. Design of RISC Processor Using VHDL and Cadence

    Science.gov (United States)

    Moslehpour, Saeid; Puliroju, Chandrasekhar; Abu-Aisheh, Akram

    The project deals with the development of a basic RISC processor. The processor is designed with a basic architecture consisting of internal modules such as a clock generator, memory, program counter, instruction register, accumulator, arithmetic and logic unit, and decoder. This processor is mainly intended for simple general-purpose tasks such as arithmetic operations, and it can be further developed into a general-purpose processor by increasing the size of the instruction register. The processor is designed in VHDL using Xilinx 8.1i. The present project also serves as an application of the knowledge gained from past studies of the PSPICE program. The study will show how PSPICE can be used to simplify massive complex circuits designed in VHDL synthesis. The purpose of the project is to explore the designed RISC model piece by piece, examine and understand the input/output pins, and show how the VHDL synthesis code can be converted to a simplified PSPICE model. The project will also serve as a collection of various research materials about the pieces of the circuit.
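
    How a program counter, instruction register, accumulator, ALU and decoder cooperate can be illustrated with a small fetch-decode-execute loop. The sketch below is written in Python rather than VHDL, and its five-instruction set (LOAD, ADD, SUB, STORE, HALT) is a hypothetical example, not the instruction set of the processor described in the abstract.

```python
def run(program, memory):
    """Execute (opcode, operand) pairs on a one-accumulator machine.

    Returns the accumulator value when HALT is reached; `memory` is
    modified in place by STORE instructions.
    """
    pc = 0    # program counter
    acc = 0   # accumulator
    while True:
        opcode, operand = program[pc]   # fetch into the "instruction register"
        pc += 1
        # Decode and execute; the arithmetic cases play the role of the ALU.
        if opcode == "LOAD":
            acc = memory[operand]
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "SUB":
            acc -= memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode: {opcode}")


# Add the values at addresses 0 and 1 and store the sum at address 2.
data = [7, 5, 0]
code = [("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT", 0)]
result = run(code, data)
```

    Widening the instruction format, as the abstract suggests for the instruction register, would correspond here to adding more opcodes or a larger operand range.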

  1. Seismometer array station processors

    International Nuclear Information System (INIS)

    Key, F.A.; Lea, T.G.; Douglas, A.

    1977-01-01

    A description is given of the design, construction and initial testing of two types of Seismometer Array Station Processor (SASP), one to work with data stored on magnetic tape in analogue form, the other with data in digital form. The purpose of a SASP is to detect the short period P waves recorded by a UK-type array of 20 seismometers and to edit these on to a digital library tape or disc. The edited data are then processed to obtain a rough location for the source and to produce seismograms (after optimum processing) for analysis by a seismologist. SASPs are an important component in the scheme for monitoring underground explosions advocated by the UK in the Conference of the Committee on Disarmament. With digital input a SASP can operate at 30 times real time using a linear detection process and at 20 times real time using the log detector of Weichert. Although the log detector is slower, it has the advantage over the linear detector that signals with lower signal-to-noise ratio can be detected and spurious large amplitudes are less likely to produce a detection. It is recommended, therefore, that where possible array data should be recorded in digital form for input to a SASP and that the log detector of Weichert be used. Trial runs show that a SASP is capable of detecting signals down to signal-to-noise ratios of about two with very few false detections, and at mid-continental array sites it should be capable of detecting most, if not all, the signals with magnitude above m_b 4.5; the UK argues that, given a suitable network, it is realistic to hope that sources of this magnitude and above can be detected and identified by seismological means alone. (author)

  2. Processor tradeoffs in distributed real-time systems

    Science.gov (United States)

    Krishna, C. M.; Shin, Kang G.; Bhandari, Inderpal S.

    1987-01-01

    The problem of the optimization of the design of real-time distributed systems is examined with reference to a class of computer architectures similar to the continuously reconfigurable multiprocessor flight control system structure, CM2FCS. Particular attention is given to the impact of processor replacement and the burn-in time on the probability of dynamic failure and mean cost. The solution is obtained numerically and interpreted in the context of real-time applications.

  3. A UNIX-based prototype biomedical virtual image processor

    International Nuclear Information System (INIS)

    Fahy, J.B.; Kim, Y.

    1987-01-01

    The authors have developed a multiprocess virtual image processor for the IBM PC/AT, in order to maximize image processing software portability for biomedical applications. An interprocess communication scheme, based on two-way metacode exchange, has been developed and verified for this purpose. Application programs call a device-independent image processing library, which transfers commands over a shared data bridge to one or more Autonomous Virtual Image Processors (AVIP). Each AVIP runs as a separate process in the UNIX operating system, and implements the device-independent functions on the image processor to which it corresponds. Application programs can control multiple image processors at a time, change the image processor configuration used at any time, and are completely portable among image processors for which an AVIP has been implemented. Run-time speeds have been found to be acceptable for higher level functions, although rather slow for lower level functions, owing to the overhead associated with sending commands and data over the shared data bridge

  4. VLSI Architecture and Design

    OpenAIRE

    Johnsson, Lennart

    1980-01-01

    Integrated circuit technology is rapidly approaching a state where feature sizes of one micron or less are tractable. Chip sizes are increasing slowly. These two developments result in considerably increased complexity in chip design. The physical characteristics of integrated circuit technology are also changing. The cost of communication will be dominating making new architectures and algorithms both feasible and desirable. A large number of processors on a single chip will be possible....

  5. Applying the roofline performance model to the intel xeon phi knights landing processor

    OpenAIRE

    Doerfler, D; Deslippe, J; Williams, S; Oliker, L; Cook, B; Kurth, T; Lobet, M; Malas, T; Vay, JL; Vincenti, H

    2016-01-01

    © Springer International Publishing AG 2016. The Roofline Performance Model is a visually intuitive method used to bound the sustained peak floating-point performance of any given arithmetic kernel on any given processor architecture. In the Roofline, performance is nominally measured in floating-point operations per second as a function of arithmetic intensity (operations per byte of data). In this study we determine the Roofline for the Intel Knights Landing (KNL) processor, determining t...
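
    The bound described in this abstract can be sketched in a few lines. This is a minimal illustration of the general Roofline formula (attainable performance is the smaller of the compute peak and arithmetic intensity times memory bandwidth), not the authors' tooling. The function names and the machine figures below are hypothetical and are not KNL measurements.

```python
def roofline_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Upper bound on attainable GFLOP/s for a kernel.

    arithmetic_intensity: floating-point operations per byte moved.
    peak_gflops: compute ceiling of the machine in GFLOP/s.
    peak_bandwidth_gbs: memory bandwidth ceiling in GB/s.
    """
    if arithmetic_intensity < 0:
        raise ValueError("arithmetic intensity must be non-negative")
    # Memory-bound region: intensity * bandwidth; compute-bound region: peak.
    return min(peak_gflops, arithmetic_intensity * peak_bandwidth_gbs)


def ridge_point(peak_gflops, peak_bandwidth_gbs):
    """Arithmetic intensity at which a kernel stops being memory-bound."""
    return peak_gflops / peak_bandwidth_gbs


# Hypothetical machine, chosen only to make the arithmetic easy to follow.
PEAK_GFLOPS = 1000.0
PEAK_BANDWIDTH_GBS = 100.0

if __name__ == "__main__":
    for ai in (0.5, 10.0, 50.0):
        bound = roofline_gflops(ai, PEAK_GFLOPS, PEAK_BANDWIDTH_GBS)
        print(f"intensity {ai:5.1f} flop/byte -> bound {bound:7.1f} GFLOP/s")
```

    Kernels whose intensity lies below the ridge point are limited by memory traffic, so raising their intensity (for example through cache blocking) moves them up the sloped part of the roof.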

  6. Rational calculation accuracy in acousto-optical matrix-vector processor

    Science.gov (United States)

    Oparin, V. V.; Tigin, Dmitry V.

    1994-01-01

    The high speed of parallel computations for a comparatively small-size processor and acceptable power consumption makes the usage of acousto-optic matrix-vector multiplier (AOMVM) attractive for processing of large amounts of information in real time. The limited accuracy of computations is an essential disadvantage of such a processor. The reduced accuracy requirements allow for considerable simplification of the AOMVM architecture and the reduction of the demands on its components.

  7. Precision analog signal processor for beam position measurements in electron storage rings

    International Nuclear Information System (INIS)

    Hinkson, J.A.; Unser, K.B.

    1995-05-01

    Beam position monitors (BPM) in electron and positron storage rings have evolved from simple systems composed of beam pickups, coaxial cables, multiplexing relays, and a single receiver (usually an analyzer) into very complex and costly systems of multiple receivers and processors. The older systems may have taken minutes to measure the circulating beam closed orbit. Today instrumentation designers are required to provide high-speed measurements of the beam orbit, often at the ring revolution frequency. In addition the instruments must have very high accuracy and resolution. A BPM has been developed for the Advanced Light Source (ALS) in Berkeley which features high resolution and relatively low cost. The instrument has a single purpose: to measure the position of a stable stored beam. Because the pickup signals are multiplexed into a single receiver, and due to its narrow bandwidth, the receiver is not intended for single-turn studies. The receiver delivers normalized measurements of X and Y position entirely by analog means at nominally 1 V/mm. No computers are involved. No software is required. Bergoz, a French company specializing in precision beam instrumentation, integrated the ALS design in their new BPM analog signal processor module. Performance comparisons were made on the ALS. In this paper we report on the architecture and performance of the ALS prototype BPM

  8. Precision analog signal processor for beam position measurements in electron storage rings

    International Nuclear Information System (INIS)

    Hinkson, J.A.; Unser, K.B.

    1995-01-01

    Beam position monitors (BPM) in electron and positron storage rings have evolved from simple systems composed of beam pickups, coaxial cables, multiplexing relays, and a single receiver (usually an analyzer) into very complex and costly systems of multiple receivers and processors. The older systems may have taken minutes to measure the circulating beam closed orbit. Today instrumentation designers are required to provide high-speed measurements of the beam orbit, often at the ring revolution frequency. In addition the instruments must have very high accuracy and resolution. A BPM has been developed for the Advanced Light Source (ALS) in Berkeley which features high resolution and relatively low cost. The instrument has a single purpose: to measure the position of a stable stored beam. Because the pickup signals are multiplexed into a single receiver, and due to its narrow bandwidth, the receiver is not intended for single-turn studies. The receiver delivers normalized measurements of X and Y position entirely by analog means at nominally 1 V/mm. No computers are involved. No software is required. Bergoz, a French company specializing in precision beam instrumentation, integrated the ALS design in their new BPM analog signal processor module. Performance comparisons were made on the ALS. In this paper we report on the architecture and performance of the ALS prototype BPM

  9. 'Iconic' tracking algorithms for high energy physics using the TRAX-I massively parallel processor

    International Nuclear Information System (INIS)

    Vesztergombi, G.

    1989-01-01

    TRAX-I, a cost-effective parallel microcomputer, applying associative string processor (ASP) architecture with 16 K parallel processing elements, is being built by Aspex Microsystems Ltd. (UK). When applied to the tracking problem of very complex events with several hundred tracks, the large number of processors allows one to dedicate one or more processors to each wire (in MWPC), each pixel (in digitized images from streamer chambers or other visual detectors), or each pad (in TPC) to perform very efficient pattern recognition. Some linear tracking algorithms based on this 'iconic' representation are presented. (orig.)

  10. 'Iconic' tracking algorithms for high energy physics using the TRAX-I massively parallel processor

    International Nuclear Information System (INIS)

    Vesztergombi, G.

    1989-11-01

    TRAX-I, a cost-effective parallel microcomputer, applying Associative String Processor (ASP) architecture with 16 K parallel processing elements, is being built by Aspex Microsystems Ltd. (UK). When applied to the tracking problem of very complex events with several hundred tracks, the large number of processors allows one to dedicate one or more processors to each wire (in MWPC), each pixel (in digitized images from streamer chambers or other visual detectors), or each pad (in TPC) to perform very efficient pattern recognition. Some linear tracking algorithms based on this 'iconic' representation are presented. (orig.)

  11. Multichannel Baseband Processor for Wideband CDMA

    Science.gov (United States)

    Jalloul, Louay M. A.; Lin, Jim

    2005-12-01

    The system architecture of the cellular base station modem engine (CBME) is described. The CBME is a single-chip multichannel transceiver capable of processing and demodulating signals from multiple users simultaneously. It is optimized to process different classes of code-division multiple-access (CDMA) signals. The paper will show that through key functional system partitioning, tightly coupled small digital signal processing cores, and time-sliced reuse architecture, CBME is able to achieve a high degree of algorithmic flexibility while maintaining efficiency. The paper will also highlight the implementation and verification aspects of the CBME chip design. In this paper, wideband CDMA is used as an example to demonstrate the architecture concept.

  12. Multichannel Baseband Processor for Wideband CDMA

    Directory of Open Access Journals (Sweden)

    Jim Lin

    2005-07-01

    The system architecture of the cellular base station modem engine (CBME) is described. The CBME is a single-chip multichannel transceiver capable of processing and demodulating signals from multiple users simultaneously. It is optimized to process different classes of code-division multiple-access (CDMA) signals. The paper will show that through key functional system partitioning, tightly coupled small digital signal processing cores, and time-sliced reuse architecture, CBME is able to achieve a high degree of algorithmic flexibility while maintaining efficiency. The paper will also highlight the implementation and verification aspects of the CBME chip design. In this paper, wideband CDMA is used as an example to demonstrate the architecture concept.

  13. Slowdown in the $M/M/1$ discriminatory processor-sharing queue

    NARCIS (Netherlands)

    Cheung, S.K.; Kim, Bara; Kim, Jeongsim

    2008-01-01

    We consider a queue with multiple K job classes, Poisson arrivals, and exponentially distributed required service times in which a single processor serves according to the discriminatory processor-sharing (DPS) discipline. For this queue, we obtain the first and second moments of the slowdown, which

  14. Java Processor Optimized for RTSJ

    Directory of Open Access Journals (Sweden)

    Tu Shiliang

    2007-01-01

    Due to the preeminent work of the real-time specification for Java (RTSJ), Java is increasingly expected to become the leading programming language in real-time systems. To provide a Java platform suitable for real-time applications, a Java processor which can execute Java bytecode directly is proposed in this paper. It provides efficient support in hardware for some mechanisms specified in the RTSJ and offers a simpler programming model by improving the scoped memory of the RTSJ. The worst case execution time (WCET) of the bytecodes implemented in this processor is predictable by employing the optimization method proposed in our previous work, in which all the processing that interferes with predictability is handled before bytecode execution. A further advantage of this method is that it makes the implementation of the processor simpler and suited to a low-cost FPGA chip.

  15. Satellite on-board real-time SAR processor prototype

    Science.gov (United States)

    Bergeron, Alain; Doucet, Michel; Harnisch, Bernd; Suess, Martin; Marchese, Linda; Bourqui, Pascal; Desnoyers, Nicholas; Legros, Mathieu; Guillot, Ludovic; Mercier, Luc; Châteauneuf, François

    2017-11-01

    A Compact Real-Time Optronic SAR Processor has been successfully developed and tested up to a Technology Readiness Level of 4 (TRL4), the breadboard validation in a laboratory environment. SAR, or Synthetic Aperture Radar, is an active system allowing day and night imaging independent of the cloud coverage of the planet. The SAR raw data is a set of complex data for range and azimuth, which cannot be compressed. Specifically, for planetary missions and unmanned aerial vehicle (UAV) systems with limited communication data rates this is a clear disadvantage. SAR images are typically processed electronically applying dedicated Fourier transformations. This, however, can also be performed optically in real-time. Originally the first SAR images were optically processed. The optical Fourier processor architecture provides inherent parallel computing capabilities allowing real-time SAR data processing and thus the ability for compression and strongly reduced communication bandwidth requirements for the satellite. SAR signal return data are in general complex data. Both amplitude and phase must be combined optically in the SAR processor for each range and azimuth pixel. Amplitude and phase are generated by dedicated spatial light modulators and superimposed by an optical relay set-up. The spatial light modulators display the full complex raw data information over a two-dimensional format, one for the azimuth and one for the range. Since the entire signal history is displayed at once, the processor operates in parallel yielding real-time performances, i.e. without resulting bottleneck. Processing of both azimuth and range information is performed in a single pass. This paper focuses on the onboard capabilities of the compact optical SAR processor prototype that allows in-orbit processing of SAR images. Examples of processed ENVISAT ASAR images are presented. 
Various SAR processor parameters such as processing capabilities, image quality (point target analysis), weight and

  16. A portable modular architecture for robotic manipulator control

    International Nuclear Information System (INIS)

    Butler, P.L.

    1993-01-01

    A control architecture has been developed to provide a framework for robotic manipulator control. This architecture, called the Modular Integrated Control Architecture (MICA), has been successfully applied to two different manipulator systems. MICA is a portable system in two respects. First, it can be used for the control of different types of manipulator systems. Second, the MICA code is portable across several operating environments. This portability allows the sharing of common control code among various systems. A major portion of MICA is the precise control of multiple processors that have to be coordinated to control a manipulator system. By having MICA control the processor synchronization, the system developer can concentrate on the specific aspects of a new manipulator system. MICA also provides standard functions for trajectory generation that can be used for most manipulators. Custom trajectory generators can be easily added to suit the needs of a particular robotic control system. Another facility that MICA provides is a simulation of the manipulator, allowing the control code to be simulated before trying it on a manipulator system. Using this technique, one can develop code for a manipulator system without risking damage to the arm during development

  17. Architectures for single-chip image computing

    Science.gov (United States)

    Gove, Robert J.

    1992-04-01

    This paper will focus on the architectures of VLSI programmable processing components for image computing applications. TI, the maker of industry-leading RISC, DSP, and graphics components, has developed an architecture for a new-generation of image processors capable of implementing a plurality of image, graphics, video, and audio computing functions. We will show that the use of a single-chip heterogeneous MIMD parallel architecture best suits this class of processors--those which will dominate the desktop multimedia, document imaging, computer graphics, and visualization systems of this decade.

  18. Discussion paper for a highly parallel array processor-based machine

    International Nuclear Information System (INIS)

    Hagstrom, R.; Bolotin, G.; Dawson, J.

    1984-01-01

    The architectural plan for a quickly realizable implementation of a highly parallel special-purpose computer system with peak performance in the range of 6 billion floating point operations per second is discussed. The architecture is suitable to Lattice Gauge theoretical computations of fundamental physics interest and may be applicable to a range of other problems which deal with numerically intensive computational problems. The plan is quickly realizable because it employs a maximum of commercially available hardware subsystems and because the architecture is software-transparent to the individual processors, allowing straightforward re-use of whatever commercially available operating systems and support software are suitable to run on the commercially-produced processors. A tiny prototype instrument, designed along this architecture, has already operated. A few elementary examples of programs which can run efficiently are presented. The large machine which the authors would propose to build would be based upon a highly competent array-processor, the ST-100 Array Processor, and specific design possibilities are discussed. The first step toward realizing this plan practically is to install a single ST-100 to allow algorithm development to proceed while a demonstration unit is built using two of the ST-100 Array Processors

  19. Fast processor for dilepton triggers

    International Nuclear Information System (INIS)

    Katsanevas, S.; Kostarakis, P.; Baltrusaitis, R.

    1983-01-01

    We describe a fast trigger processor, developed for and used in Fermilab experiment E-537, for selecting high-mass dimuon events produced by negative pions and anti-protons. The processor finds candidate tracks by matching hit information received from drift chambers and scintillation counters, and determines their momenta. Invariant masses are calculated for all possible pairs of tracks and an event is accepted if any invariant mass is greater than some preselectable minimum mass. The whole process, accomplished within 5 to 10 microseconds, achieves up to a ten-fold reduction in trigger rate
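
    The selection logic described in this abstract (compute the invariant mass of every pair of candidate tracks, accept the event if any pair exceeds a preset minimum) can be sketched in software as follows. This is only an illustration of the decision rule under simplifying assumptions: tracks are given as momentum 3-vectors in GeV/c, both particles are assumed to be muons, and the function names and the threshold are hypothetical. The actual E-537 processor was dedicated hardware and may have used different approximations.

```python
import math
from itertools import combinations

MUON_MASS_GEV = 0.105658  # muon rest mass in GeV/c^2


def invariant_mass(p1, p2, mass=MUON_MASS_GEV):
    """Invariant mass (GeV/c^2) of two tracks given as (px, py, pz) in GeV/c."""
    e1 = math.sqrt(sum(c * c for c in p1) + mass * mass)
    e2 = math.sqrt(sum(c * c for c in p2) + mass * mass)
    total_energy = e1 + e2
    total_p_squared = sum((a + b) ** 2 for a, b in zip(p1, p2))
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(total_energy ** 2 - total_p_squared, 0.0))


def accept_event(tracks, min_mass_gev):
    """True if any pair of candidate tracks has invariant mass above the cut."""
    return any(
        invariant_mass(a, b) > min_mass_gev for a, b in combinations(tracks, 2)
    )


if __name__ == "__main__":
    # Hypothetical event: two roughly back-to-back tracks plus a soft one.
    event = [(0.0, 0.0, 5.0), (0.0, 0.0, -5.0), (0.1, 0.0, 0.3)]
    print(accept_event(event, min_mass_gev=4.0))
```

    An event with fewer than two tracks has no pairs and is therefore rejected, which matches the requirement of a dimuon candidate.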

  20. Novel WLL Architecture Based on Color Pixel Multiple Access Implemented on a Terrestrial Video Network as the Overlay

    DEFF Research Database (Denmark)

    Sanyal, Rajarshi; Cianca, Ernestina; Prasad, Ramjee

    2013-01-01

    Wireless Local Loop deployments are based on the traditional cellular technologies. However, there are limitations in terms of intricacy, cost and time to deploy. In this paper, the authors introduce a Wireless Local Loop architecture employing the proposed CPMA technique on existing overlay video...

  1. List-mode PET image reconstruction for motion correction using the Intel XEON PHI co-processor

    Science.gov (United States)

    Ryder, W. J.; Angelis, G. I.; Bashar, R.; Gillam, J. E.; Fulton, R.; Meikle, S.

    2014-03-01

    List-mode image reconstruction with motion correction is computationally expensive, as it requires projection of hundreds of millions of rays through a 3D array. To decrease reconstruction time it is possible to use symmetric multiprocessing computers or graphics processing units. The former can have high financial costs, while the latter can require refactoring of algorithms. The Xeon Phi is a new co-processor card with a Many Integrated Core architecture that can run 4 multiple-instruction, multiple data threads per core with each thread having a 512-bit single instruction, multiple data vector register. Thus, it is possible to run in the region of 220 threads simultaneously. The aim of this study was to investigate whether the Xeon Phi co-processor card is a viable alternative to an x86 Linux server for accelerating list-mode PET image reconstruction for motion correction. An existing list-mode image reconstruction algorithm with motion correction was ported to run on the Xeon Phi co-processor with the multi-threading implemented using pthreads. There were no differences between images reconstructed using the Phi co-processor card and images reconstructed using the same algorithm run on a Linux server. However, it was found that the reconstruction runtimes were 3 times greater for the Phi than the server. A new version of the image reconstruction algorithm was developed in C++ using OpenMP for multi-threading and the Phi runtimes decreased to 1.67 times that of the host Linux server. Data transfer from the host to co-processor card was found to be a rate-limiting step; this needs to be carefully considered in order to maximize runtime speeds. When considering the purchase price of a Linux workstation with Xeon Phi co-processor card and top of the range Linux server, the former is a cost-effective computation resource for list-mode image reconstruction. 
A multi-Phi workstation could be a viable alternative to cluster computers at a lower cost for medical imaging

  2. VLSI Design of a Variable-Length FFT/IFFT Processor for OFDM-Based Communication Systems

    Directory of Open Access Journals (Sweden)

    Jen-Chih Kuo

    2003-12-01

    The technique of orthogonal frequency division multiplexing (OFDM) is well known for its robustness against frequency-selective fading channels. This technique has been widely used in many wired and wireless communication systems. In general, the fast Fourier transform (FFT) and inverse FFT (IFFT) operations are used as the modulation/demodulation kernel in OFDM systems, and the sizes of the FFT/IFFT operations vary across different applications of OFDM systems. In this paper, we design and implement a variable-length prototype FFT/IFFT processor to cover different specifications of OFDM applications. The cached-memory FFT architecture is our suggested VLSI system architecture for the prototype FFT/IFFT processor, chosen for its low power consumption. We also implement the twiddle factor butterfly processing element (PE) based on the coordinate rotation digital computer (CORDIC) algorithm, which avoids the use of a conventional multiplication-and-accumulation unit and evaluates the trigonometric functions using only add-and-shift operations. Finally, we implement a variable-length prototype FFT/IFFT processor with TSMC 0.35 μm 1P4M CMOS technology. The simulation results show that the chip can perform 64- to 2048-point FFT/IFFT operations at up to 80 MHz operating frequency, which can meet the speed requirement of most OFDM standards such as WLAN, ADSL, VDSL (256∼2K), DAB, and 2K-mode DVB.
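
    The add-and-shift idea behind CORDIC, which this abstract uses for the twiddle factor rotation, can be illustrated with a small fixed-point sketch. This is a generic rotation-mode CORDIC, not the processing element of the paper; the word length (16 fractional bits), the iteration count, and the function names are assumptions made for the example. The loop body uses only additions, subtractions, shifts, and a table lookup.

```python
import math

FRAC_BITS = 16      # assumed fixed-point format: 16 fractional bits
ITERATIONS = 16     # assumed number of micro-rotations
ONE = 1 << FRAC_BITS

# Precomputed table of atan(2^-i), in fixed point.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]

# The micro-rotations stretch the vector by a constant gain; precompute 1/gain.
_gain = 1.0
for _i in range(ITERATIONS):
    _gain *= math.sqrt(1.0 + 2.0 ** (-2 * _i))
INV_GAIN = round(ONE / _gain)


def cordic_rotate(x, y, angle):
    """Rotate the fixed-point vector (x, y) by `angle` (fixed-point radians).

    The result is scaled by the CORDIC gain. The input angle must lie within
    roughly +/-1.74 rad, the convergence range of this basic form.
    """
    z = angle
    for i in range(ITERATIONS):
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return x, y


def cordic_cos_sin(theta):
    """Approximate (cos(theta), sin(theta)) for |theta| <= pi/2."""
    # Starting from (1/gain, 0) cancels the gain, so no multiplier is needed.
    x, y = cordic_rotate(INV_GAIN, 0, round(theta * ONE))
    return x / ONE, y / ONE


if __name__ == "__main__":
    c, s = cordic_cos_sin(math.pi / 6)
    print(f"cos ~ {c:.4f}, sin ~ {s:.4f}")
```

    In an FFT butterfly the same `cordic_rotate` step would be applied to the real and imaginary parts of a sample, with the rotation angle set to the twiddle factor phase and the constant gain compensated once.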

  3. MZDASoft: a software architecture that enables large-scale comparison of protein expression levels over multiple samples based on liquid chromatography/tandem mass spectrometry.

    Science.gov (United States)

    Ghanat Bari, Mehrab; Ramirez, Nelson; Wang, Zhiwei; Zhang, Jianqiu Michelle

    2015-10-15

    Without accurate peak linking/alignment, only the expression levels of a small percentage of proteins can be compared across multiple samples in Liquid Chromatography/Mass Spectrometry/Tandem Mass Spectrometry (LC/MS/MS) due to the selective nature of tandem MS peptide identification. This greatly hampers biomedical research that aims at finding biomarkers for disease diagnosis, treatment, and the understanding of disease mechanisms. A recent algorithm, PeakLink, has allowed the accurate linking of LC/MS peaks without tandem MS identifications to their corresponding ones with identifications across multiple samples collected from different instruments, tissues and labs, which greatly enhanced the ability of comparing proteins. However, PeakLink cannot be implemented practically for large numbers of samples based on existing software architectures, because it requires access to peak elution profiles from multiple LC/MS/MS samples simultaneously. We propose a new architecture based on parallel processing, which extracts LC/MS peak features, and saves them in database files to enable the implementation of PeakLink for multiple samples. The software has been deployed in High-Performance Computing (HPC) environments. The core part of the software, MZDASoft Parallel Peak Extractor (PPE), can be downloaded with a user and developer's guide, and it can be run on HPC centers directly. The quantification applications, MZDASoft TandemQuant and MZDASoft PeakLink, are written in Matlab, which are compiled with a Matlab runtime compiler. A sample script that incorporates all necessary processing steps of MZDASoft for LC/MS/MS quantification in a parallel processing environment is available. The project webpage is http://compgenomics.utsa.edu/zgroup/MZDASoft. The proposed architecture enables the implementation of PeakLink for multiple samples. Significantly more (100%-500%) proteins can be compared over multiple samples with better quantification accuracy in test cases. 
MZDASoft

  4. Wavelength-encoded OCDMA system using opto-VLSI processors.

    Science.gov (United States)

    Aljada, Muhsen; Alameh, Kamal

    2007-07-01

    We propose and experimentally demonstrate a 2.5 Gbits/s per user wavelength-encoded optical code-division multiple-access encoder-decoder structure based on opto-VLSI processing. Each encoder and decoder is constructed using a single 1D opto-very-large-scale-integrated (VLSI) processor in conjunction with a fiber Bragg grating (FBG) array of different Bragg wavelengths. The FBG array spectrally and temporally slices the broadband input pulse into several components and the opto-VLSI processor generates codewords using digital phase holograms. System performance is measured in terms of the autocorrelation and cross-correlation functions as well as the eye diagram.

  5. Wavelength-encoded OCDMA system using opto-VLSI processors

    Science.gov (United States)

    Aljada, Muhsen; Alameh, Kamal

    2007-07-01

    We propose and experimentally demonstrate a 2.5 Gbits/s per user wavelength-encoded optical code-division multiple-access encoder-decoder structure based on opto-VLSI processing. Each encoder and decoder is constructed using a single 1D opto-very-large-scale-integrated (VLSI) processor in conjunction with a fiber Bragg grating (FBG) array of different Bragg wavelengths. The FBG array spectrally and temporally slices the broadband input pulse into several components and the opto-VLSI processor generates codewords using digital phase holograms. System performance is measured in terms of the autocorrelation and cross-correlation functions as well as the eye diagram.

  6. The hardware track finder processor in CMS at CERN

    CERN Document Server

    Kluge, A

    1997-01-01

    The work covers the design of the Track Finder Processor in the high energy experiment CMS (Compact Muon Solenoid, planned for 2005) at CERN/Geneva. The task of this processor is to identify muons and measure their transverse momentum. The track finder processor makes it possible to determine the physical relevance of each high energetic collision and to forward only interesting data to the data analysis units. Data of more than two hundred thousand detector cells are used to determine the location of muons and measure their transverse momentum. Each 25 ns a new data set is generated. Measurement of location and transverse momentum of the muons can be terminated within 350 ns by using an ASIC (Application Specific Integrated Circuit). A pipeline architecture processes new data sets with the required data rate of 40 MHz to ensure dead time free operation. In the framework of this study specifications and the overall concept of the track finder processor were worked out in detail. Simulations were performed...
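
    The timing figures quoted in this abstract are mutually consistent, as a short calculation shows. The sketch below assumes, for illustration only, a pipeline that accepts one data set per 25 ns period; the constant and function names are hypothetical and do not come from the original work.

```python
BUNCH_PERIOD_NS = 25   # a new data set every 25 ns (from the abstract)
LATENCY_NS = 350       # processing completes within 350 ns (from the abstract)


def data_rate_mhz(period_ns):
    """Data-set rate in MHz for a given period in nanoseconds."""
    return 1000.0 / period_ns


def data_sets_in_flight(latency_ns, period_ns):
    """Number of data sets that must be held in the pipeline at once.

    Uses integer ceiling division, so a partial period counts as a full slot.
    """
    return -(-latency_ns // period_ns)


if __name__ == "__main__":
    print(data_rate_mhz(BUNCH_PERIOD_NS))                    # rate in MHz
    print(data_sets_in_flight(LATENCY_NS, BUNCH_PERIOD_NS))  # pipeline depth
```

    Under that assumption, a 25 ns period corresponds to the stated 40 MHz rate, and a 350 ns latency means up to 14 data sets are being processed simultaneously, which is why a pipelined design is needed for dead time free operation.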

  7. Very Long Instruction Word Processors

    Indian Academy of Sciences (India)

    Explicitly Parallel Instruction Computing (EPIC) is an instruction processing paradigm that has been in the spotlight due to its adoption by the next generation of Intel processors starting with the IA-64. The EPIC processing paradigm is an evolution of the Very Long Instruction Word (VLIW) paradigm. This article gives an ...

  8. Exploring multiple feature combination strategies with a recurrent neural network architecture for off-line handwriting recognition

    Science.gov (United States)

    Mioulet, L.; Bideault, G.; Chatelain, C.; Paquet, T.; Brunessaux, S.

    2015-01-01

    The BLSTM-CTC is a novel recurrent neural network architecture that has outperformed previous state of the art algorithms in tasks such as speech recognition or handwriting recognition. It has the ability to process long term dependencies in temporal signals in order to label unsegmented data. This paper describes different ways of combining features using a BLSTM-CTC architecture. Not only do we explore the low level combination (feature space combination) but we also explore high level combination (decoding combination) and mid-level (internal system representation combination). The results are compared on the RIMES word database. Our results show that the low level combination works best, thanks to the powerful data modeling of the LSTM neurons.

  9. VON WISPR Family Processors: Volume 1

    National Research Council Canada - National Science Library

    Wagstaff, Ronald

    1997-01-01

    ...) and the background noise they are embedded in. Processors utilizing those fluctuations such as the von WISPR Family Processors discussed herein, are methods or algorithms that preferentially attenuate the fluctuating signals and noise...

  10. Deterministic chaos in the processor load

    International Nuclear Information System (INIS)

    Halbiniak, Zbigniew; Jozwiak, Ireneusz J.

    2007-01-01

    In this article we present the results of research whose purpose was to identify the phenomenon of deterministic chaos in the processor load. We analysed the time series of the processor load during efficiency tests of database software. Our research was done on a Sparc Alpha processor working on the UNIX Sun Solaris 5.7 operating system. The conducted analyses proved the presence of the deterministic chaos phenomenon in the processor load in this particular case

  11. Large computer systems and new architectures

    International Nuclear Information System (INIS)

    Bloch, T.

    1978-01-01

    The super-computers of today are becoming quite specialized and one can no longer expect to get all the state-of-the-art software and hardware facilities in one package. In order to achieve faster and faster computing it is necessary to experiment with new architectures, and the cost of developing each experimental architecture into a general-purpose computer system is too high when one considers the relatively small market for these computers. The result is that such computers are becoming 'back-ends' either to special systems (BSP, DAP) or to anything (CRAY-1). Architecturally the CRAY-1 is the most attractive today since it guarantees a speed gain of a factor of two over a CDC 7600 thus allowing us to regard any speed up resulting from vectorization as a bonus. It looks, however, as if it will be very difficult to make substantially faster computers using only pipe-lining techniques and that it will be necessary to explore multiple processors working on the same problem. The experience which will be gained with the BSP and the DAP over the next few years will certainly be most valuable in this respect. (Auth.)

  12. Digital design and computer architecture

    CERN Document Server

    Harris, David

    2010-01-01

    Digital Design and Computer Architecture is designed for courses that combine digital logic design with computer organization/architecture or that teach these subjects as a two-course sequence. Digital Design and Computer Architecture begins with a modern approach by rigorously covering the fundamentals of digital logic design and then introducing Hardware Description Languages (HDLs). Featuring examples of the two most widely-used HDLs, VHDL and Verilog, the first half of the text prepares the reader for what follows in the second: the design of a MIPS Processor. By the end of D

  13. In-Network Adaptation of Video Streams Using Network Processors

    Directory of Open Access Journals (Sweden)

    Mohammad Shorfuzzaman

    2009-01-01

    problem can be addressed, near the network edge, by applying dynamic, in-network adaptation (e.g., transcoding) of video streams to meet available connection bandwidth, machine characteristics, and client preferences. In this paper, we extrapolate from earlier work (Shorfuzzaman et al., 2006), in which we implemented and assessed an MPEG-1 transcoding system on the Intel IXP1200 network processor, to consider the feasibility of in-network transcoding for other video formats and network processor architectures. The use of “on-the-fly” video adaptation near the edge of the network offers the promise of simpler support for a wide range of end devices that differ in display and other characteristics and that can be used in different types of environments.

  14. JIST: Just-In-Time Scheduling Translation for Parallel Processors

    Directory of Open Access Journals (Sweden)

    Giovanni Agosta

    2005-01-01

    The application fields of bytecode virtual machines and VLIW processors overlap in the area of embedded and mobile systems, where the two technologies offer different benefits, namely high code portability, low power consumption and reduced hardware cost. Dynamic compilation makes it possible to bridge the gap between the two technologies, but special attention must be paid to software instruction scheduling, a must for the VLIW architectures. We have implemented JIST, a Virtual Machine and JIT compiler for Java Bytecode targeted to a VLIW processor. We show the impact of various optimizations on the performance of code compiled with JIST through an experimental study on a set of benchmark programs. We report significant speedups, and increases of up to 50% in the number of instructions issued per cycle with respect to the non-scheduling version of the JIT compiler. Further optimizations are discussed.

  15. A Time-Composable Operating System for the Patmos Processor

    DEFF Research Database (Denmark)

    Ziccardi, Marco; Schoeberl, Martin; Vardanega, Tullio

    2015-01-01

    In the last couple of decades we have witnessed a steady growth in the complexity and spread of real-time systems. In order to master the rising complexity in the timing behaviour of those systems, rightful attention has been given to the development of time-predictable computer architectures... -composable operating system, on top of a time-composable processor, facilitates incremental development, which is highly desirable for industry. This paper makes a twofold contribution. First, we present enhancements to the Patmos processor to allow achieving time composability at the operating system level. Second, we extend an existing time-composable operating system, TiCOS, to make best use of advanced Patmos hardware features in the pursuit of time composability.

  16. Programming massively parallel processors a hands-on approach

    CERN Document Server

    Kirk, David B

    2010-01-01

    Programming Massively Parallel Processors discusses basic concepts about parallel programming and GPU architecture. "Massively parallel" refers to the use of a large number of processors to perform a set of computations in a coordinated parallel way. The book details various techniques for constructing parallel programs. It also discusses the development process, performance level, floating-point format, parallel patterns, and dynamic parallelism. The book serves as a teaching guide where parallel programming is the main topic of the course. It builds on the basics of C programming for CUDA, a parallel programming environment that is supported on NVIDIA GPUs. Composed of 12 chapters, the book begins with basic information about the GPU as a parallel computer source. It also explains the main concepts of CUDA, data parallelism, and the importance of memory access efficiency using CUDA. The target audience of the book is graduate and undergraduate students from all science and engineering disciplines who ...

  17. Initial explorations of ARM processors for scientific computing

    International Nuclear Information System (INIS)

    Abdurachmanov, David; Elmer, Peter; Eulisse, Giulio; Muzaffar, Shahzad

    2014-01-01

    Power efficiency is becoming an ever more important metric for both high performance and high throughput computing. Over the course of the next decade it is expected that flops/watt will be a major driver for the evolution of computer architecture. Servers with large numbers of ARM processors, already ubiquitous in mobile computing, are a promising alternative to traditional x86-64 computing. We present the results of our initial investigations into the use of ARM processors for scientific computing applications. In particular we report the results from our work with a current generation ARMv7 development board to explore ARM-specific issues regarding the software development environment, operating system, performance benchmarks and issues for porting High Energy Physics software.

  18. JPP: A Java Pre-Processor

    OpenAIRE

    Kiniry, Joseph R.; Cheong, Elaine

    1998-01-01

    The Java Pre-Processor, or JPP for short, is a parsing pre-processor for the Java programming language. Unlike its namesake (the C/C++ Pre-Processor, cpp), JPP provides functionality above and beyond simple textual substitution. JPP's capabilities include code beautification, code standard conformance checking, class and interface specification and testing, and documentation generation.

  19. Raexplore: Enabling Rapid, Automated Architecture Exploration for Full Applications

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Yao [Argonne National Lab. (ANL), Argonne, IL (United States); Balaprakash, Prasanna [Argonne National Lab. (ANL), Argonne, IL (United States); Meng, Jiayuan [Argonne National Lab. (ANL), Argonne, IL (United States); Morozov, Vitali [Argonne National Lab. (ANL), Argonne, IL (United States); Parker, Scott [Argonne National Lab. (ANL), Argonne, IL (United States); Kumaran, Kalyan [Argonne National Lab. (ANL), Argonne, IL (United States)

    2014-12-01

    We present Raexplore, a performance modeling framework for architecture exploration. Raexplore enables rapid, automated, and systematic search of architecture design space by combining hardware counter-based performance characterization and analytical performance modeling. We demonstrate Raexplore for two recent manycore processors, the IBM Blue Gene/Q compute chip and the Intel Xeon Phi, targeting a set of scientific applications. Our framework is able to capture complex interactions between architectural components including instruction pipeline, cache, and memory, and to achieve a 3–22% error for same-architecture and cross-architecture performance predictions. Furthermore, we apply our framework to assess the two processors, and discover and evaluate a list of architectural scaling options for future processor designs.

  20. HTGR core seismic analysis using an array processor

    International Nuclear Information System (INIS)

    Shatoff, H.; Charman, C.M.

    1983-01-01

    A Floating Point Systems array processor performs nonlinear dynamic analysis of the high-temperature gas-cooled reactor (HTGR) core with significant time and cost savings. The graphite HTGR core consists of approximately 8000 blocks of various shapes which are subject to motion and impact during a seismic event. Two-dimensional computer programs (CRUNCH2D, MCOCO) can perform explicit step-by-step dynamic analyses of up to 600 blocks for time-history motions. However, use of two-dimensional codes was limited by the large cost and run times required. Three-dimensional analysis of the entire core, or even a large part of it, had been considered totally impractical. Because of the needs of the HTGR core seismic program, a Floating Point Systems array processor was used to enhance computer performance of the two-dimensional core seismic computer programs, MCOCO and CRUNCH2D. This effort began by converting the computational algorithms used in the codes to a form which takes maximum advantage of the parallel and pipeline processors offered by the architecture of the Floating Point Systems array processor. The subsequent conversion of the vectorized FORTRAN coding to the array processor required a significant programming effort to make the system work on the General Atomic (GA) UNIVAC 1100/82 host. These efforts were quite rewarding, however, since the cost of running the codes has been reduced approximately 50-fold and the time threefold. The core seismic analysis with large two-dimensional models has now become routine and extension to three-dimensional analysis is feasible. These codes simulate the one-fifth-scale full-array HTGR core model. This paper compares the analysis with the test results for sine-sweep motion

  1. Online Fastbus processor for LEP

    International Nuclear Information System (INIS)

    Mueller, H.

    1986-01-01

    The author describes the online computing aspects of Fastbus systems using a processor module which has been developed at CERN and is now available commercially. These General Purpose Master/Slaves (GPMS) are based on 68000/10 (or optionally 68020/68881) processors. Applications include use as event-filters (DELPHI), supervisory controllers, Fastbus stand-alone diagnostic tools, and multiprocessor array components. The direct mapping of single, 32-bit assembly instructions to execute Fastbus protocols makes the use of a GPM both simple and flexible. Loosely coupled processing in Fastbus networks is possible between GPMs as they support access semaphores and use a two-port memory as I/O buffer for Fastbus. Both master and slave ports support block transfers up to 20 Mbytes/s. The CERN standard Fastbus software and the MoniCa symbolic debugging monitor are available on the GPM with real time, multiprocessing support. (Auth.)

  2. Multiple 3d Approaches for the Architectural Study of the Medieval Abbey of Cormery in the Loire Valley

    Science.gov (United States)

    Pouyet, T.

    2017-02-01

    This paper will focus on the technical approaches used for a PhD thesis regarding the architecture and spatial organization of Benedictine abbeys in Touraine in the Middle Ages, in particular the abbey of Cormery in the heart of the Loire Valley. Monastic space is approached in a diachronic way, from the early Middle Ages to modern times, using multi-source data: architectural study, written sources, ancient maps, various iconographic documents… Many scales are used in the analysis, from the establishment of the abbeys in a territory to the scale of a building like the tower-entrance of the church of Cormery. These methodological axes have been developed in the research unit CITERES for many years and 3D technology is now used to go further along in that field. The recording in 3D of the buildings of the abbey of Cormery allows us to work at the scale of the monastery and to produce useful data such as sections or orthoimages of the ground and the wall faces, which are afterwards drawn and analysed. The study of these documents, crossed with the other historical sources, allowed us to emphasize the presence of walls older than what we thought and to discover construction elements that had not been recognized earlier and which enhance the debate about the construction date of the St Paul tower and the associated monastic church.

  3. Application of Raptor-M3G to reactor dosimetry problems on massively parallel architectures - 026

    International Nuclear Information System (INIS)

    Longoni, G.

    2010-01-01

    The solution of complex 3-D radiation transport problems requires significant resources both in terms of computation time and memory availability. Therefore, parallel algorithms and multi-processor architectures are required to solve large 3-D radiation transport problems efficiently. This paper presents the application of RAPTOR-M3G (Rapid Parallel Transport Of Radiation - Multiple 3D Geometries) to reactor dosimetry problems. RAPTOR-M3G is a newly developed parallel computer code designed to solve the discrete ordinates (SN) equations on multi-processor computer architectures. This paper presents the results for a reactor dosimetry problem using a 3-D model of a commercial 2-loop pressurized water reactor (PWR). The accuracy and performance of RAPTOR-M3G will be analyzed and the numerical results obtained from the calculation will be compared directly to measurements of the neutron field in the reactor cavity air gap. The parallel performance of RAPTOR-M3G on massively parallel architectures, where the number of computing nodes is on the order of hundreds, will be analyzed up to four hundred processors. The performance results will be presented based on two supercomputing architectures: the POPLE supercomputer operated by the Pittsburgh Supercomputing Center and the Westinghouse computer cluster. The Westinghouse computer cluster is equipped with a standard Ethernet network connection and an InfiniBand(R) interconnect capable of a bandwidth in excess of 20 Gbit/s. Therefore, the impact of the network architecture on RAPTOR-M3G performance will be analyzed as well. (authors)

  4. A discussion of tools and techniques for distributed processor based control systems using CAMAC

    International Nuclear Information System (INIS)

    Tippie, J.W.; Scandora, A.E.

    1985-01-01

    This paper describes and analyzes various distributed processor architectures using commercially available CAMAC components. The general orientation is toward distributed control systems using Digital Equipment Corporation LSI11 processors in a CAMAC environment. The paper describes in detail software tools available to simplify the development of applications software and to provide a high-level runtime environment both at the host and the remote processors. Discussion focuses on techniques for downloading operating systems from a large host and applications tasks written in high-level languages. It also discusses software tools which enable tasks in the remote processors to exchange messages and data with tasks in the host in a simple and elegant way.

  5. A system architecture, processor, and communication protocol for secure implants

    NARCIS (Netherlands)

    C. Strydis (Christos); R.M. Seepers (Robert); P. Peris-Lopez (Pedro); D. Siskos (Dimitrios); I. Sourdis (Ioannis)

    2013-01-01

    Secure and energy-efficient communication between Implantable Medical Devices (IMDs) and authorized external users is attracting increasing attention these days. However, there currently exists no systematic approach to the problem, while solutions from neighboring fields, such as

  6. Evaluation of Instruction Set Processor Architecture by Program Tracing

    Science.gov (United States)

    1974-07-01

  7. Performance Evaluation of Superscalar Processor Architecture Through UML

    OpenAIRE

    Taskeen Zaidi; Vipin Saxena

    2013-01-01

    In the current scenario, most applications are based on a graphical user interface and depend on object-oriented technology. Software industries are interested in converting old structured software into object-oriented software and in reducing the number of lines of application code in order to reduce execution time. Therefore, it is a big challenge to reduce the execution time of applications based on object-oriented technology. The presen...

  8. Architectural prototyping

    DEFF Research Database (Denmark)

    Bardram, Jakob Eyvind; Christensen, Henrik Bærbak; Hansen, Klaus Marius

    2004-01-01

    A major part of software architecture design is learning how specific architectural designs balance the concerns of stakeholders. We explore the notion of "architectural prototypes", correspondingly architectural prototyping, as a means of using executable prototypes to investigate stakeholders...

  9. Interference control by best-effort process duty-cycling in chip multi-processor systems for real-time medical image processing

    NARCIS (Netherlands)

    Westmijze, M.; Bekooij, Marco Jan Gerrit; Smit, Gerardus Johannes Maria

    2013-01-01

    Systems with chip multi-processors are currently used for several applications that have real-time requirements. In chip multi-processor architectures, many hardware resources such as parts of the cache hierarchy are shared between cores and by using such resources, applications can significantly

  10. Onboard Data Processors for Planetary Ice-Penetrating Sounding Radars

    Science.gov (United States)

    Tan, I. L.; Friesenhahn, R.; Gim, Y.; Wu, X.; Jordan, R.; Wang, C.; Clark, D.; Le, M.; Hand, K. P.; Plaut, J. J.

    2011-12-01

    Among the many concerns faced by outer planetary missions, science data storage and transmission hold special significance. Such missions must contend with limited onboard storage, brief data downlink windows, and low downlink bandwidths. A potential solution to these issues lies in employing onboard data processors (OBPs) to convert raw data into products that are smaller and closely capture relevant scientific phenomena. In this paper, we present the implementation of two OBP architectures for ice-penetrating sounding radars tasked with exploring Europa and Ganymede. Our first architecture utilizes an unfocused processing algorithm extended from the Mars Advanced Radar for Subsurface and Ionosphere Sounding (MARSIS, Jordan et al. 2009). Compared to downlinking raw data, we are able to reduce data volume by approximately 100 times through OBP usage. To ensure the viability of our approach, we have implemented, simulated, and synthesized this architecture using both VHDL and Matlab models (with fixed-point and floating-point arithmetic) in conjunction with Modelsim. Creation of a VHDL model of our processor is the principal step in transitioning to actual digital hardware, whether in an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit), and successful simulation and synthesis strongly indicate feasibility. In addition, we examined the tradeoffs faced in the OBP between fixed-point accuracy, resource consumption, and data product fidelity. Our second architecture is based upon a focused fast back projection (FBP) algorithm that requires a modest amount of computing power and on-board memory while yielding high along-track resolution and improved slope detection capability. We present an overview of the algorithm and details of our implementation, also in VHDL. With the appropriate tradeoffs, the use of OBPs can significantly reduce data downlink requirements without sacrificing data product fidelity. Through the development

  11. Architecture on Architecture

    DEFF Research Database (Denmark)

    Olesen, Karen

    2016-01-01

    This paper will discuss the challenges faced by architectural education today. It takes as its starting point the double commitment of any school of architecture: on the one hand the task of preserving the particular knowledge that belongs to the discipline of architecture, and on the other hand... that is not scientific or academic but is more like a latent body of data that we find embedded in existing works of architecture. This information, it is argued, is not limited by the historical context of the work. It can be thought of as a virtual capacity – a reservoir of spatial configurations that can... correlation between the study of existing architectures and the training of competences to design for present-day realities.

  12. Towards a Systematic Exploration of the Optimization Space for Many-Core Processors

    NARCIS (Netherlands)

    Fang, J.

    2014-01-01

    The architecture diversity of many-core processors - with their different types of cores, and memory hierarchies - makes the old model of reprogramming every application for every platform infeasible. Therefore, inter-platform portability has become a desirable feature of programming models. While

  13. An FPGA design flow for reconfigurable network-based multi-processor systems on chip

    NARCIS (Netherlands)

    Kumar, A.; Hansson, M.A; Huisken, J.; Corporaal, H.

    2007-01-01

    Multi-processor systems on chip (MPSoC) platforms are becoming increasingly heterogeneous and are shifting towards a more communication-centric methodology. Networks on chip (NoC) have emerged as the design paradigm for scalable on-chip communication architectures. As the system complexity

  14. A general model of concurrency and its implementation as many-core dynamic RISC processors

    NARCIS (Netherlands)

    Bernard, T.; Bousias, K.; Guang, L.; Jesshope, C.R.; Lankamp, M.; van Tol, M.W.; Zhang, L.

    2008-01-01

    This paper presents a concurrent execution model and its micro-architecture based on in-order RISC processors, which schedules instructions from large pools of contextualised threads. The model admits a strategy for programming chip multiprocessors using parallelising compilers based on existing

  15. Does the Intel Xeon Phi processor fit HEP workloads?

    Science.gov (United States)

    Nowak, A.; Bitzes, G.; Dotti, A.; Lazzaro, A.; Jarp, S.; Szostek, P.; Valsan, L.; Botezatu, M.; Leduc, J.

    2014-06-01

    This paper summarizes the five years of CERN openlab's efforts focused on the Intel Xeon Phi co-processor, from the time of its inception to public release. We consider the architecture of the device vis-à-vis the characteristics of HEP software and identify key opportunities for HEP processing, as well as scaling limitations. We report on improvements and speedups linked to parallelization and vectorization on benchmarks involving software frameworks such as Geant4 and ROOT. Finally, we extrapolate current software and hardware trends and project them onto accelerators of the future, with the specifics of offline and online HEP processing in mind.

  16. High performance deformable image registration algorithms for manycore processors

    CERN Document Server

    Shackleford, James; Sharp, Gregory

    2013-01-01

    High Performance Deformable Image Registration Algorithms for Manycore Processors develops highly data-parallel image registration algorithms suitable for use on modern multi-core architectures, including graphics processing units (GPUs). Focusing on deformable registration, we show how to develop data-parallel versions of the registration algorithm suitable for execution on the GPU. Image registration is the process of aligning two or more images into a common coordinate frame and is a fundamental step to be able to compare or fuse data obtained from different sensor measurements. E

  17. Does the Intel Xeon Phi processor fit HEP workloads?

    International Nuclear Information System (INIS)

    Nowak, A; Bitzes, G; Dotti, A; Lazzaro, A; Jarp, S; Szostek, P; Valsan, L; Botezatu, M; Leduc, J

    2014-01-01

    This paper summarizes the five years of CERN openlab's efforts focused on the Intel Xeon Phi co-processor, from the time of its inception to public release. We consider the architecture of the device vis-à-vis the characteristics of HEP software and identify key opportunities for HEP processing, as well as scaling limitations. We report on improvements and speedups linked to parallelization and vectorization on benchmarks involving software frameworks such as Geant4 and ROOT. Finally, we extrapolate current software and hardware trends and project them onto accelerators of the future, with the specifics of offline and online HEP processing in mind.

  18. Addressing Thermal and Performance Variability Issues in Dynamic Processors

    Energy Technology Data Exchange (ETDEWEB)

    Yoshii, Kazutomo [Argonne National Lab. (ANL), Argonne, IL (United States); Llopis, Pablo [Univ. Carlos III de Madrid (Spain); Zhang, Kaicheng [Northwestern Univ., Evanston, IL (United States); Luo, Yingyi [Northwestern Univ., Evanston, IL (United States); Ogrenci-Memik, Seda [Northwestern Univ., Evanston, IL (United States); Memik, Gokhan [Northwestern Univ., Evanston, IL (United States); Sankaran, Rajesh [Argonne National Lab. (ANL), Argonne, IL (United States); Beckman, Pete [Argonne National Lab. (ANL), Argonne, IL (United States)

    2017-03-01

    As CMOS scaling nears its end, parameter variations (process, temperature and voltage) are becoming a major concern. To overcome parameter variations and provide stability, modern processors are becoming dynamic, opportunistically adjusting voltage and frequency based on thermal and energy constraints, which negatively impacts traditional bulk-synchronous parallelism-minded hardware and software designs. As node-level architecture is growing in complexity, implementing variation control mechanisms only with hardware can be a challenging task. In this paper we investigate a software strategy to manage hardware-induced variations, leveraging low-level monitoring/controlling mechanisms.

  19. Multipurpose silicon photonics signal processor core.

    Science.gov (United States)

    Pérez, Daniel; Gasulla, Ivana; Crudgington, Lee; Thomson, David J; Khokhar, Ali Z; Li, Ke; Cao, Wei; Mashanovich, Goran Z; Capmany, José

    2017-09-21

    Integrated photonics changes the scaling laws of information and communication systems offering architectural choices that combine photonics with electronics to optimize performance, power, footprint, and cost. Application-specific photonic integrated circuits, where particular circuits/chips are designed to optimally perform particular functionalities, require a considerable number of design and fabrication iterations leading to long development times. A different approach inspired by electronic Field Programmable Gate Arrays is the programmable photonic processor, where a common hardware implemented by a two-dimensional photonic waveguide mesh realizes different functionalities through programming. Here, we report the demonstration of such reconfigurable waveguide mesh in silicon. We demonstrate over 20 different functionalities with a simple seven hexagonal cell structure, which can be applied to different fields including communications, chemical and biomedical sensing, signal processing, multiprocessor networks, and quantum information systems. Our work is an important step toward this paradigm. Integrated optical circuits today are typically designed for a few special functionalities and require complex design and development procedures. Here, the authors demonstrate a reconfigurable but simple silicon waveguide mesh with different functionalities.

  20. Computer architecture a quantitative approach

    CERN Document Server

    Hennessy, John L

    2019-01-01

    Computer Architecture: A Quantitative Approach, Sixth Edition has been considered essential reading by instructors, students and practitioners of computer design for over 20 years. The sixth edition of this classic textbook is fully revised with the latest developments in processor and system architecture. It now features examples from the RISC-V (RISC Five) instruction set architecture, a modern RISC instruction set developed and designed to be a free and openly adoptable standard. It also includes a new chapter on domain-specific architectures and an updated chapter on warehouse-scale computing that features the first public information on Google's newest WSC. True to its original mission of demystifying computer architecture, this edition continues the longstanding tradition of focusing on areas where the most exciting computing innovation is happening, while always keeping an emphasis on good engineering design.

  1. Software architecture evolution

    DEFF Research Database (Denmark)

    Barais, Olivier; Le Meur, Anne-Francoise; Duchien, Laurence

    2008-01-01

    Software architectures must frequently evolve to cope with changing requirements, and this evolution often implies integrating new concerns. Unfortunately, when the new concerns are crosscutting, existing architecture description languages provide little or no support for this kind of evolution. The software architect must modify multiple elements of the architecture manually, which risks introducing inconsistencies. This chapter provides an overview, comparison and detailed treatment of the various state-of-the-art approaches to describing and evolving software architectures. Furthermore, we discuss one particular framework named TranSAT, which addresses the above problems of software architecture evolution. TranSAT provides a new element in the software architecture description language, called an architectural aspect, for describing new concerns and their integration into an existing...

  2. Multicore technology architecture, reconfiguration, and modeling

    CERN Document Server

    Qadri, Muhammad Yasir

    2013-01-01

    The saturation of design complexity and clock frequencies for single-core processors has resulted in the emergence of multicore architectures as an alternative design paradigm. Nowadays, multicore/multithreaded computing systems are not only a de-facto standard for high-end applications, they are also gaining popularity in the field of embedded computing. The start of the multicore era has altered the concepts relating to almost all of the areas of computer architecture design, including core design, memory management, thread scheduling, application support, inter-processor communication, debu

  3. Modern multicore and manycore architectures: Modelling, optimisation and benchmarking a multiblock CFD code

    Science.gov (United States)

    Hadade, Ioan; di Mare, Luca

    2016-08-01

    Modern multicore and manycore processors exhibit multiple levels of parallelism through a wide range of architectural features such as SIMD for data parallel execution or threads for core parallelism. The exploitation of multi-level parallelism is therefore crucial for achieving superior performance on current and future processors. This paper presents the performance tuning of a multiblock CFD solver on Intel SandyBridge and Haswell multicore CPUs and the Intel Xeon Phi Knights Corner coprocessor. Code optimisations have been applied on two computational kernels exhibiting different computational patterns: the update of flow variables and the evaluation of the Roe numerical fluxes. We discuss at great length the code transformations required for achieving efficient SIMD computations for both kernels across the selected devices including SIMD shuffles and transpositions for flux stencil computations and global memory transformations. Core parallelism is expressed through threading based on a number of domain decomposition techniques together with optimisations pertaining to alleviating NUMA effects found in multi-socket compute nodes. Results are correlated with the Roofline performance model in order to assert their efficiency for each distinct architecture. We report significant speedups for single thread execution across both kernels: 2-5X on the multicore CPUs and 14-23X on the Xeon Phi coprocessor. Computations at full node and chip concurrency deliver a factor of three speedup on the multicore processors and up to 24X on the Xeon Phi manycore coprocessor.
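
    The abstract above correlates measured kernel performance with the Roofline model. As a rough illustration only, the basic form of that model can be sketched as follows; the function names and every number in the usage note are hypothetical and are not taken from the paper.

```python
def roofline_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Attainable performance (GFLOP/s) under the basic Roofline model.

    arithmetic_intensity: FLOPs performed per byte moved to/from memory.
    peak_gflops:          peak compute rate of the device (GFLOP/s).
    peak_bandwidth_gbs:   peak memory bandwidth of the device (GB/s).
    """
    if arithmetic_intensity <= 0:
        raise ValueError("arithmetic intensity must be positive")
    # A kernel is limited either by compute or by memory traffic,
    # whichever ceiling is lower at its arithmetic intensity.
    return min(peak_gflops, arithmetic_intensity * peak_bandwidth_gbs)


def fraction_of_roofline(measured_gflops, arithmetic_intensity,
                         peak_gflops, peak_bandwidth_gbs):
    """Ratio of measured performance to the Roofline ceiling."""
    ceiling = roofline_gflops(arithmetic_intensity, peak_gflops,
                              peak_bandwidth_gbs)
    return measured_gflops / ceiling
```

    For a hypothetical device with a 100 GFLOP/s peak and 50 GB/s of bandwidth, a kernel at 0.5 FLOP/byte is memory-bound with a 25 GFLOP/s ceiling, while a kernel at 10 FLOP/byte is compute-bound at 100 GFLOP/s.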

  4. Assembly of finite element methods on graphics processors

    KAUST Repository

    Cecka, Cris; Lew, Adrian J.; Darve, E.

    2010-01-01

    in assembling and solving sparse linear systems with NVIDIA GPUs and the Compute Unified Device Architecture (CUDA) are created and analyzed. Multiple strategies for efficient use of global, shared, and local memory, methods to achieve memory coalescing

  5. Design and implementation of a high performance network security processor

    Science.gov (United States)

    Wang, Haixin; Bai, Guoqiang; Chen, Hongyi

    2010-03-01

    The last few years have seen significant progress in the field of application-specific processors. One example is network security processors (NSPs) that perform various cryptographic operations specified by network security protocols and help to offload computation-intensive burdens from network processors (NPs). This article presents a high performance NSP system architecture implementation intended for both internet protocol security (IPSec) and secure socket layer (SSL) protocol acceleration, which are widely employed in virtual private network (VPN) and e-commerce applications. The efficient dual one-way pipelined data transfer skeleton and optimised integration scheme of the heterogeneous parallel crypto engine arrays lead to a Gbps rate NSP, which is programmable with domain specific descriptor-based instructions. The descriptor-based control flow fragments large data packets and distributes them to the crypto engine arrays, which fully utilises the parallel computation resources and improves the overall system data throughput. A prototyping platform for this NSP design is implemented with a Xilinx XC3S5000 based FPGA chip set. Results show that the design gives a peak throughput for the IPSec ESP tunnel mode of 2.85 Gbps with over 2100 full SSL handshakes per second at a clock rate of 95 MHz.

  6. Performance Analysis of an Astrophysical Simulation Code on the Intel Xeon Phi Architecture

    OpenAIRE

    Noormofidi, Vahid; Atlas, Susan R.; Duan, Huaiyu

    2015-01-01

    We have developed the astrophysical simulation code XFLAT to study neutrino oscillations in supernovae. XFLAT is designed to utilize multiple levels of parallelism through MPI, OpenMP, and SIMD instructions (vectorization). It can run on both CPU and Xeon Phi co-processors based on the Intel Many Integrated Core Architecture (MIC). We analyze the performance of XFLAT on configurations with CPU only, Xeon Phi only and both CPU and Xeon Phi. We also investigate the impact of I/O and the multi-n...

  7. The breaking point of modern processor and platform technology

    CERN Document Server

    Nowak, A; Lazzaro, A; Leduc, J

    2011-01-01

    This work is an overview of state of the art processors used in High Energy Physics, their architecture and an extensive outline of the forthcoming technologies. Silicon process science and hardware design are making constant and rapid progress, and a solid grasp of these developments is imperative to the understanding of their possible future applications, which might include software strategy, optimizations, computing center operations and hardware acquisitions. In particular, the current issue of software and platform scalability is becoming more and more noticeable, and will develop in the near future with the growing core count of single chips and the approach of certain x86 architectural limits. Other topics brought forward include the hard, physical limits of innovation, the applicability of tried and tested computing formulas to modern technologies, as well as an analysis of viable alternate choices for continued development.

  8. Evaluation of the Intel Westmere-EP server processor

    CERN Document Server

    Jarp, S; Leduc, J; Nowak, A; CERN. Geneva. IT Department

    2010-01-01

    In this paper we report on a set of benchmark results recently obtained by CERN openlab when comparing the 6-core “Westmere-EP” processor with Intel’s previous generation of the same microarchitecture, the “Nehalem-EP”. The former is produced in a new 32nm process, the latter in 45nm. Both platforms are dual-socket servers. Multiple benchmarks were used to get a good understanding of the performance of the new processor. We used both industry-standard benchmarks, such as SPEC2006, and specific High Energy Physics benchmarks, representing both simulation of physics detectors and data analysis of physics events. Before summarizing the results we must stress the fact that benchmarking of modern processors is a very complex affair. One has to control (at least) the following features: processor frequency, overclocking via Turbo mode, the number of physical cores in use, the use of logical cores via Simultaneous Multi-Threading (SMT), the cache sizes available, the memory configuration installed, as well...

  9. Evaluation of the Intel Nehalem-EX server processor

    CERN Document Server

    Jarp, S; Leduc, J; Nowak, A; CERN. Geneva. IT Department

    2010-01-01

    In this paper we report on a set of benchmark results recently obtained by the CERN openlab by comparing the 4-socket, 32-core Intel Xeon X7560 server with the previous generation 4-socket server, based on the Xeon X7460 processor. The Xeon X7560 processor represents a major change in many respects, especially the memory sub-system, so it was important to make multiple comparisons. In most benchmarks the two 4-socket servers were compared. It should be underlined that both servers represent the “top of the line” in terms of frequency. However, in some cases, it was important to compare systems that integrated the latest processor features, such as QPI links, Symmetric multithreading and over-clocking via Turbo mode, and in such situations the X7560 server was compared to a dual socket L5520 based system with an identical frequency of 2.26 GHz. Before summarizing the results we must stress the fact that benchmarking of modern processors is a very complex affair. One has to control (at least) the following ...

  10. Reducing adaptive optics latency using Xeon Phi many-core processors

    Science.gov (United States)

    Barr, David; Basden, Alastair; Dipper, Nigel; Schwartz, Noah

    2015-11-01

    The next generation of Extremely Large Telescopes (ELTs) for astronomy will rely heavily on the performance of their adaptive optics (AO) systems. Real-time control is at the heart of the critical technologies that will enable telescopes to deliver the best possible science and will require a very significant extrapolation from current AO hardware existing for 4-10 m telescopes. Investigating novel real-time computing architectures and testing their eligibility against anticipated challenges is one of the main priorities of technology development for the ELTs. This paper investigates the suitability of the Intel Xeon Phi, which is a commercial off-the-shelf hardware accelerator. We focus on wavefront reconstruction performance, implementing a straightforward matrix-vector multiplication (MVM) algorithm. We present benchmarking results of the Xeon Phi on a real-time Linux platform, both as a standalone processor and integrated into an existing real-time controller (RTC). The performance of single and multiple Xeon Phis is investigated. We show that this technology has the potential to greatly reduce the mean latency and the variations in execution time (jitter) of large AO systems. We present both a detailed performance analysis of the Xeon Phi for a typical E-ELT first-light instrument and a more general approach that enables us to extend to any AO system size. We show that systematic and detailed performance analysis is an essential part of testing novel real-time control hardware to guarantee optimal science results.
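
    The matrix-vector multiplication (MVM) used for wavefront reconstruction can be illustrated in a few lines. This is only a sketch: the names `control_matrix` and `slopes` are assumptions for illustration, and the paper's actual implementation on the Xeon Phi is vectorised and threaded rather than written in plain Python.

```python
def mvm_reconstruct(control_matrix, slopes):
    """Multiply a (n_actuators x n_slopes) control matrix by a vector of
    wavefront-sensor slope measurements to obtain actuator commands."""
    for row in control_matrix:
        if len(row) != len(slopes):
            raise ValueError("each matrix row must match the slope vector length")
    return [sum(r * s for r, s in zip(row, slopes)) for row in control_matrix]


if __name__ == "__main__":
    # Tiny hypothetical system: 2 actuators, 3 slope measurements.
    control_matrix = [[1, 0, 2],
                      [0, 1, -1]]
    slopes = [3, 4, 5]
    print(mvm_reconstruct(control_matrix, slopes))
```

    Each output element depends on every slope measurement, so the cost grows with the product of the two dimensions, which is why latency and jitter become critical at ELT scale.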

  11. Accuracies Of Optical Processors For Adaptive Optics

    Science.gov (United States)

    Downie, John D.; Goodman, Joseph W.

    1992-01-01

    Paper presents analysis of accuracies and requirements concerning accuracies of optical linear-algebra processors (OLAP's) in adaptive-optics imaging systems. Much faster than digital electronic processor and eliminate some residual distortion. Question whether errors introduced by analog processing of OLAP overcome advantage of greater speed. Paper addresses issue by presenting estimate of accuracy required in general OLAP that yields smaller average residual aberration of wave front than digital electronic processor computing at given speed.

  12. Alternative Water Processor Test Development

    Science.gov (United States)

    Pickering, Karen D.; Mitchell, Julie; Vega, Leticia; Adam, Niklas; Flynn, Michael; Wjee (er. Rau); Lunn, Griffin; Jackson, Andrew

    2012-01-01

    The Next Generation Life Support Project is developing an Alternative Water Processor (AWP) as a candidate water recovery system for long duration exploration missions. The AWP consists of a biological water processor (BWP) integrated with a forward osmosis secondary treatment system (FOST). The basis of the BWP is a membrane aerated biological reactor (MABR), developed in concert with Texas Tech University. Bacteria located within the MABR metabolize organic material in wastewater, converting approximately 90% of the total organic carbon to carbon dioxide. In addition, bacteria convert a portion of the ammonia-nitrogen present in the wastewater to nitrogen gas, through a combination of nitrification and denitrification. The effluent from the BWP system is low in organic contaminants, but high in total dissolved solids. The FOST system, integrated downstream of the BWP, removes dissolved solids through a combination of concentration-driven forward osmosis and pressure-driven reverse osmosis. The integrated system is expected to produce water with a total organic carbon less than 50 mg/l and dissolved solids that meet potable water requirements for spaceflight. This paper describes the test definition, the design of the BWP and FOST subsystems, and plans for integrated testing.

  13. The UA1 trigger processor

    International Nuclear Information System (INIS)

    Grayer, G.H.

    1981-01-01

    Experiment UA1 is a large multi-purpose spectrometer at the CERN proton-antiproton collider, scheduled for late 1981. The principal trigger is formed on the basis of the energy deposition in calorimeters. A trigger decision taken in under 2.4 microseconds can avoid dead time losses due to the bunched nature of the beam. To achieve this we have built fast 8-bit charge to digital converters followed by two identical digital processors tailored to the experiment. The outputs of groups of the 2440 photomultipliers in the calorimeters are summed to form a total of 288 input channels to the ADCs. A look-up table in RAM is used to convert the digitised photomultiplier signals to energy in one processor, combinations of input channels, and also counts the number of clusters with electromagnetic or hadronic energy above pre-determined levels. Up to twelve combinations of these conditions, together with external information, may be combined in coincidence or in veto to form the final trigger. Provision has been made for testing using simulated data in an off-line mode, and sampling real data when on-line. (orig.)

  14. Data register and processor for multiwire chambers

    International Nuclear Information System (INIS)

    Karpukhin, V.V.

    1985-01-01

    A data register and a processor for receiving and processing data from the drift chambers of a device for investigating relativistic positronium are described. The data are delivered to the register input as 8-bit Gray code, stored, and transformed to a position code. The register information is delivered to the CAMAC bus and to the front-panel connector. The processor selects particle tracks in the horizontal plane of the facility. The maximum coordinate divergence ΔY and the minimum number of points on a track are set from the processor front panel. The processor decision time is 16 μs; the maximum number of simultaneously analyzed coordinates is 16.
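
    The Gray-code decoding step described in this abstract can be sketched as follows. The sketch assumes that the "position code" is an ordinary binary wire number, which the abstract does not state; the function names are illustrative and the real register performs the conversion in hardware.

```python
def gray_to_binary(gray: int) -> int:
    """Decode a reflected Gray code word into plain binary by XOR-ing
    the word with all of its right shifts."""
    binary = gray
    shift = gray >> 1
    while shift:
        binary ^= shift
        shift >>= 1
    return binary


def binary_to_gray(value: int) -> int:
    """Encode plain binary as reflected Gray code (used here for checking)."""
    return value ^ (value >> 1)


if __name__ == "__main__":
    # Round-trip every 8-bit word, matching the 8-bit register width.
    assert all(gray_to_binary(binary_to_gray(n)) == n for n in range(256))
    print(format(gray_to_binary(0b1101), "04b"))
```

    Adjacent Gray code words differ in a single bit, which limits the error caused by sampling a value while it is changing.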

  15. Sensitometric control of roentgen film processors

    International Nuclear Information System (INIS)

    Forsberg, H.; Karolinska Sjukhuset, Stockholm

    1987-01-01

    Monitoring of film processors performance is essential since image quality, patient dose and costs are influenced by the performance. A system for sensitometric constancy control of film processors and their associated components is described. Experience with the system for 3 years is given when implemented on 17 film processors. Modern high quality film processors have a stability that makes a test frequency of once a week sufficient to maintain adequate image quality. The test system is so sensitive that corrective actions almost invariably have been taken before any technical problem degraded the image quality to a visible degree. (orig.)

  16. Designing Next Generation Massively Multithreaded Architectures for Irregular Applications

    Energy Technology Data Exchange (ETDEWEB)

    Tumeo, Antonino; Secchi, Simone; Villa, Oreste

    2012-08-31

    Irregular applications, such as data mining or graph-based computations, show unpredictable memory/network access patterns and control structures. Massively multi-threaded architectures with large node count, like the Cray XMT, have been shown to address their requirements better than commodity clusters. In this paper we present the approaches that we are currently pursuing to design future generations of these architectures. First, we introduce the Cray XMT and compare it to other multithreaded architectures. We then propose an evolution of the architecture, integrating multiple cores per node and next generation network interconnect. We advocate the use of hardware support for remote memory reference aggregation to optimize network utilization. For this evaluation we developed a highly parallel, custom simulation infrastructure for multi-threaded systems. Our simulator executes unmodified XMT binaries with very large datasets, capturing effects due to contention and hot-spotting, while predicting execution times with greater than 90% accuracy. We also discuss the FPGA prototyping approach that we are employing to study efficient support for irregular applications in next generation manycore processors.

  17. Power estimation on functional level for programmable processors

    Directory of Open Access Journals (Sweden)

    M. Schneider

    2004-01-01

    In this contribution, different approaches to power estimation for programmable processors are presented and evaluated with respect to their applicability to modern processor architectures such as Very Long Instruction Word (VLIW) architectures. Special emphasis is placed on the concept of so-called Functional-Level Power Analysis (FLPA). This approach is based on dividing the processor architecture into functional blocks such as the processing unit, clock network, internal memory and others. The power consumption of these blocks is described by parameter-dependent arithmetic model functions. Through automated, parser-based analysis of the assembler code of the system to be estimated, input parameters such as the achieved degree of parallelism or the type of memory access can be obtained. The approach is evaluated on two modern digital signal processors using a variety of basic digital signal processing algorithms. The estimates obtained for the individual algorithms are compared with physically measured values, giving a very small maximum estimation error of 3%.

  18. Power estimation on functional level for programmable processors

    Science.gov (United States)

    Schneider, M.; Blume, H.; Noll, T. G.

    2004-05-01

    In this contribution, different approaches to power estimation for programmable processors are presented and evaluated with respect to their applicability to modern processor architectures such as Very Long Instruction Word (VLIW) architectures. Special emphasis is placed on the concept of so-called Functional-Level Power Analysis (FLPA). This approach is based on dividing the processor architecture into functional blocks such as the processing unit, clock network, internal memory and others. The power consumption of these blocks is described by parameter-dependent arithmetic model functions. Through automated, parser-based analysis of the assembler code of the system to be estimated, input parameters such as the achieved degree of parallelism or the type of memory access can be obtained. The approach is evaluated on two modern digital signal processors using a variety of basic digital signal processing algorithms. The estimates obtained for the individual algorithms are compared with physically measured values, giving a very small maximum estimation error of 3%.

  19. Scalable Motion Estimation Processor Core for Multimedia System-on-Chip Applications

    Science.gov (United States)

    Lai, Yeong-Kang; Hsieh, Tian-En; Chen, Lien-Fei

    2007-04-01

    In this paper, we describe a high-throughput and scalable motion estimation processor architecture for multimedia system-on-chip applications. The number of processing elements (PEs) is scalable according to the variable algorithm parameters and the performance required for different applications. Using the PE rings efficiently and an intelligent memory-interleaving organization, the efficiency of the architecture can be increased. Moreover, using efficient on-chip memories and a data management technique can effectively decrease the power consumption and memory bandwidth. Techniques for reducing the number of interconnections and external memory accesses are also presented. Our results demonstrate that the proposed scalable PE-ringed architecture is a flexible and high-performance processor core in multimedia system-on-chip applications.

  20. GPU: the biggest key processor for AI and parallel processing

    Science.gov (United States)

    Baji, Toru

    2017-07-01

    Two types of processors exist in the market: the conventional CPU and the Graphics Processing Unit (GPU). A typical CPU is composed of 1 to 8 cores, while a GPU has thousands of cores. The CPU is good for sequential processing, while the GPU is good at accelerating software with heavily parallel execution. The GPU was initially dedicated to 3D graphics. However, from 2006, when GPUs started to use general-purpose cores, it was noticed that this architecture can be used as a general-purpose massively parallel processor. NVIDIA developed a software framework, Compute Unified Device Architecture (CUDA), that makes it possible to easily program the GPU for these applications. With CUDA, GPUs started to be used widely in workstations and supercomputers. Recently two key technologies have been highlighted in the industry: Artificial Intelligence (AI) and autonomous driving cars. AI requires massively parallel operation to train many-layer neural networks. With the CPU alone, it was impossible to finish the training in a practical time. The latest multi-GPU system with P100 makes it possible to finish the training in a few hours. For autonomous driving cars, TOPS-class performance is required to implement perception, localization, and path planning processing, and again an SoC with an integrated GPU will play a key role there. In this paper, the evolution of the GPU, which is one of the biggest commercial devices requiring state-of-the-art fabrication technology, will be introduced. An overview of key GPU-demanding applications like the ones described above will also be introduced.

  1. A Real-Time Sound Field Rendering Processor

    Directory of Open Access Journals (Sweden)

    Tan Yiyu

    2017-12-01

    Real-time sound field renderings are computationally intensive and memory-intensive. Traditional rendering systems based on computer simulations are limited by memory bandwidth and arithmetic units. The computation is time-consuming, and the sample rate of the output sound is low because of the long computation time at each time step. In this work, a processor with a hybrid architecture is proposed to speed up computation and improve the sample rate of the output sound, and an interface is developed for system scalability through simply cascading many chips to enlarge the simulated area. To render a three-minute Beethoven wave sound in a small shoe-box room with dimensions of 1.28 m × 1.28 m × 0.64 m, the field-programmable gate array (FPGA)-based prototype machine with the proposed architecture carries out the sound rendering at run-time, while the software simulation with OpenMP parallelization takes about 12.70 min on a personal computer (PC) with 32 GB of random access memory (RAM) and an Intel i7-6800K six-core processor running at 3.4 GHz. The throughput of the software simulation is about 194 M grids/s, while it is 51.2 G grids/s in the prototype machine, even though the clock frequency of the prototype machine is much lower than that of the PC. The rendering processor with a processing element (PE) and interfaces consumes about 238,515 gates when fabricated in the 0.18 µm process technology from ROHM Semiconductor Co., Ltd. (Kyoto, Japan), and the power consumption is about 143.8 mW.

  2. Confabulation Based Real-time Anomaly Detection for Wide-area Surveillance Using Heterogeneous High Performance Computing Architecture

    Science.gov (United States)

    2015-06-01

    CONFABULATION BASED REAL-TIME ANOMALY DETECTION FOR WIDE-AREA SURVEILLANCE USING HETEROGENEOUS HIGH PERFORMANCE COMPUTING ARCHITECTURE SYRACUSE...DETECTION FOR WIDE-AREA SURVEILLANCE USING HETEROGENEOUS HIGH PERFORMANCE COMPUTING ARCHITECTURE 5a. CONTRACT NUMBER FA8750-12-1-0251 5b. GRANT...processors including graphic processor units (GPUs) and Intel Xeon Phi processors. Experimental results showed significant speedups, which can enable

  3. The Serial Link Processor for the Fast TracKer (FTK) processor at ATLAS

    CERN Document Server

    Biesuz, Nicolo Vladi; The ATLAS collaboration; Luciano, Pierluigi; Magalotti, Daniel; Rossi, Enrico

    2015-01-01

    The Associative Memory (AM) system of the Fast Tracker (FTK) processor has been designed to perform pattern matching using the hit information of the ATLAS experiment silicon tracker. The AM is the heart of FTK and is mainly based on the use of ASICs (AM chips) designed on purpose to execute pattern matching with a high degree of parallelism. It finds track candidates at low resolution that are seeds for a full resolution track fitting. To solve the very challenging data traffic problems inside FTK, multiple board and chip designs have been performed. The currently proposed solution is named the “Serial Link Processor” and is based on an extremely powerful network of 2 Gb/s serial links. This paper reports on the design of the Serial Link Processor consisting of two types of boards, the Local Associative Memory Board (LAMB), a mezzanine where the AM chips are mounted, and the Associative Memory Board (AMB), a 9U VME board which holds and exercises four LAMBs. We report on the performance of the intermedia...

  4. The Serial Link Processor for the Fast TracKer (FTK) processor at ATLAS

    CERN Document Server

    Andreani, A; The ATLAS collaboration; Beccherle, R; Beretta, M; Cipriani, R; Citraro, S; Citterio, M; Colombo, A; Crescioli, F; Dimas, D; Donati, S; Giannetti, P; Kordas, K; Lanza, A; Liberali, V; Luciano, P; Magalotti, D; Neroutsos, P; Nikolaidis, S; Piendibene, M; Sakellariou, A; Shojaii, S; Sotiropoulou, C-L; Stabile, A

    2014-01-01

    The Associative Memory (AM) system of the FTK processor has been designed to perform pattern matching using the hit information of the ATLAS silicon tracker. The AM is the heart of the FTK and it finds track candidates at low resolution that are seeds for a full resolution track fitting. To solve the very challenging data traffic problems inside the FTK, multiple designs and tests have been performed. The currently proposed solution is named the “Serial Link Processor” and is based on an extremely powerful network of 2 Gb/s serial links. This paper reports on the design of the Serial Link Processor consisting of the AM chip, an ASIC designed and optimized to perform pattern matching, and two types of boards, the Local Associative Memory Board (LAMB), a mezzanine where the AM chips are mounted, and the Associative Memory Board (AMB), a 9U VME board which holds and exercises four LAMBs. Special relevance will be given to the AMchip design that includes two custom cells optimized for low consumption. We repo...

  5. The Serial Link Processor for the Fast TracKer (FTK) processor at ATLAS

    CERN Document Server

    Biesuz, Nicolo Vladi; The ATLAS collaboration; Luciano, Pierluigi; Magalotti, Daniel; Rossi, Enrico

    2015-01-01

    The Associative Memory (AM) system of the Fast Tracker (FTK) processor has been designed to perform pattern matching using the hit information of the ATLAS experiment silicon tracker. The AM is the heart of FTK and is mainly based on the use of ASICs (AM chips) designed to execute pattern matching with a high degree of parallelism. The AM system finds track candidates at low resolution that are seeds for a full resolution track fitting. To solve the very challenging data traffic problems inside FTK, multiple board and chip designs have been performed. The currently proposed solution is named the “Serial Link Processor” and is based on an extremely powerful network of 828 2 Gbit/s serial links for a total in/out bandwidth of 56 Gb/s. This paper reports on the design of the Serial Link Processor consisting of two types of boards, the Local Associative Memory Board (LAMB), a mezzanine where the AM chips are mounted, and the Associative Memory Board (AMB), a 9U VME board which holds and exercises four LAMBs. ...

  6. Synchronization of faulty processors in coarse-grained TMR protected partially reconfigurable FPGA designs

    International Nuclear Information System (INIS)

    Kretzschmar, U.; Gomez-Cornejo, J.; Astarloa, A.; Bidarte, U.; Ser, J. Del

    2016-01-01

    The expansion of FPGA technology in numerous application fields is a fact. Single Event Effects (SEE) are a critical factor for the reliability of FPGA based systems. For this reason, a number of researchers have been studying fault tolerance techniques to harden different elements of FPGA designs. Using Partial Reconfiguration (PR) in conjunction with Triple Modular Redundancy (TMR) is an emerging approach in recent publications dealing with the implementation of fault tolerant processors on SRAM-based FPGAs. While these works pay great attention to the repair of erroneous instances by means of reconfiguration, the essential step of synchronizing the repaired processors is insufficiently addressed. In this context, this paper poses four different synchronization approaches for soft core processors, which balance differently the trade-off between synchronization speed and hardware overhead. All approaches are assessed in practice by synchronizing TMR protected PicoBlaze processors implemented on a Virtex-5 FPGA. Nevertheless, all methods are of a general nature and can be applied to different processor architectures in a straightforward fashion. - Highlights: • Four different synchronization methods for faulty processors are proposed. • The methods balance between synchronization speed and hardware overhead. • They can be applied to TMR-protected reconfigurable FPGA designs. • The proposed schemes are implemented and tested in real hardware.
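
    The voting step that underlies Triple Modular Redundancy can be sketched in software. This is a generic bitwise majority voter, not the synchronization logic proposed in the paper; the helper `faulty_replica` is a hypothetical name showing how a voter can flag the instance that would need reconfiguration and resynchronization.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority of three replica outputs."""
    return (a & b) | (a & c) | (b & c)


def faulty_replica(a: int, b: int, c: int):
    """Return the index (0, 1 or 2) of the single replica whose output
    disagrees with the voted value, or None if no single replica does."""
    voted = majority_vote(a, b, c)
    mismatches = [i for i, out in enumerate((a, b, c)) if out != voted]
    return mismatches[0] if len(mismatches) == 1 else None


if __name__ == "__main__":
    outputs = (0b1010, 0b1010, 0b0110)   # third replica has flipped bits
    print(format(majority_vote(*outputs), "04b"), faulty_replica(*outputs))
```

    The voter masks the error immediately, but the repaired replica still has to be brought back to the same internal state as the other two, which is the problem the abstract addresses.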

  7. Producing chopped firewood with firewood processors

    International Nuclear Information System (INIS)

    Kaerhae, K.; Jouhiaho, A.

    2009-01-01

    The TTS Institute's research and development project studied both the productivity of new, chopped firewood processors (cross-cutting and splitting machines) suitable for professional and independent small-scale production, and the costs of the chopped firewood produced. Seven chopped firewood processors were tested in the research, six of which were sawing processors and one shearing processor. The chopping work was carried out using wood feeding racks and a wood lifter. The work was also carried out without any feeding appliances. Altogether 132.5 solid m³ of wood were chopped in the time studies. The firewood processor used had the most significant impact on chopping work productivity. In addition to the firewood processor, the stem mid-diameter, the length of the raw material, and of the firewood were also found to affect productivity. The wood feeding systems also affected productivity. If there is a feeding rack and hydraulic grapple loader available for use in chopping firewood, then it is worth using the wood feeding rack. A wood lifter is only worth using with the largest stems (over 20 cm mid-diameter) if a feeding rack cannot be used. When producing chopped firewood from small-diameter wood, i.e. with a mid-diameter less than 10 cm, the costs of chopping work were over 10 EUR solid m⁻³ with sawing firewood processors. The shearing firewood processor with a guillotine blade achieved a cost level of 5 EUR solid m⁻³ when the mid-diameter of the chopped stem was 10 cm. In addition to the raw material, the cost-efficient chopping work also requires several hundred annual operating hours with a firewood processor, which is difficult for individual firewood entrepreneurs to achieve. The operating hours of firewood processors can be increased to the required level by the joint use of the processors by a number of firewood entrepreneurs. (author)

  8. Allocating application to group of consecutive processors in fault-tolerant deadlock-free routing path defined by routers obeying same rules for path selection

    Science.gov (United States)

    Leung, Vitus J [Albuquerque, NM; Phillips, Cynthia A [Albuquerque, NM; Bender, Michael A [East Northport, NY; Bunde, David P [Urbana, IL

    2009-07-21

    In a multiple processor computing apparatus, directional routing restrictions and a logical channel construct permit fault tolerant, deadlock-free routing. Processor allocation can be performed by creating a linear ordering of the processors based on routing rules used for routing communications between the processors. The linear ordering can assume a loop configuration, and bin-packing is applied to this loop configuration. The interconnection of the processors can be conceptualized as a generally rectangular 3-dimensional grid, and the MC allocation algorithm is applied with respect to the 3-dimensional grid.
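
    The idea of allocating a job to consecutive processors along a linear ordering that closes into a loop can be sketched as below. First-fit is used here only as a simple stand-in for the bin-packing step; the abstract does not say which packing heuristic is applied, and the function name and the `busy` list are illustrative assumptions.

```python
def first_fit_on_loop(busy, size):
    """Find `size` consecutive free processors on a circular ordering.

    `busy[i]` is True if the processor at position i of the ordering is
    already allocated. Returns the list of positions chosen (wrapping
    past the end of the ordering if necessary), or None if nothing fits.
    """
    n = len(busy)
    if size <= 0 or size > n:
        return None
    for start in range(n):
        span = [(start + k) % n for k in range(size)]
        if not any(busy[i] for i in span):
            return span
    return None


if __name__ == "__main__":
    busy = [False, True, True, False, False, True, False, False]
    print(first_fit_on_loop(busy, 2))
    print(first_fit_on_loop(busy, 3))
    print(first_fit_on_loop(busy, 4))
```

    Because the ordering follows the routing rules, processors that are adjacent in the ordering are also close in the network, so a contiguous span keeps a job's communication local.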

  9. The Potential of the Cell Processor for Scientific Computing

    Energy Technology Data Exchange (ETDEWEB)

    Williams, Samuel; Shalf, John; Oliker, Leonid; Husbands, Parry; Kamil, Shoaib; Yelick, Katherine

    2005-10-14

    The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists. As a result, the high performance computing community is examining alternative architectures that address the limitations of modern cache-based designs. In this work, we examine the potential of using the forthcoming STI Cell processor as a building block for future high-end computing systems. Our work contains several novel contributions. We are the first to present quantitative Cell performance data on scientific kernels and show direct comparisons against leading superscalar (AMD Opteron), VLIW (Intel Itanium2), and vector (Cray X1) architectures. Since neither Cell hardware nor cycle-accurate simulators are currently publicly available, we develop both analytical models and simulators to predict kernel performance. Our work also explores the complexity of mapping several important scientific algorithms onto the Cell's unique architecture. Additionally, we propose modest microarchitectural modifications that could significantly increase the efficiency of double-precision calculations. Overall results demonstrate the tremendous potential of the Cell architecture for scientific computations in terms of both raw performance and power efficiency.

  10. Micro processors for plant protection

    International Nuclear Information System (INIS)

    McAffer, N.T.C.

    1976-01-01

    Micro computers can be used satisfactorily in general protection duties with economic advantages over hardwired systems. The reliability of such protection functions can be enhanced by keeping the task performed by each protection micro processor simple and by avoiding such a task being dependent on others in any substantial way. This implies that vital work done for any task is kept within it and that any communications from it to outside or to it from outside are restricted to those for controlling data transfer. It also implies that the amount of this data should be the minimum consistent with satisfactory task execution. Technology is changing rapidly and devices may become obsolete and be supplanted by new ones before their theoretical reliability can be confirmed or otherwise by field service. This emphasises the need for users to pool device performance data so that effective reliability judgements can be made within the lifetime of the devices. (orig.)

  11. Computer architecture fundamentals and principles of computer design

    CERN Document Server

    Dumas II, Joseph D

    2005-01-01

    Introduction to Computer Architecture; What is Computer Architecture?; Architecture vs. Implementation; Brief History of Computer Systems; The First Generation; The Second Generation; The Third Generation; The Fourth Generation; Modern Computers - The Fifth Generation; Types of Computer Systems; Single Processor Systems; Parallel Processing Systems; Special Architectures; Quality of Computer Systems; Generality and Applicability; Ease of Use; Expandability; Compatibility; Reliability; Success and Failure of Computer Architectures and Implementations; Quality and the Perception of Quality; Cost Issues; Architectural Openness, Market Timi

  12. Towards a Process Algebra for Shared Processors

    DEFF Research Database (Denmark)

    Buchholtz, Mikael; Andersen, Jacob; Løvengreen, Hans Henrik

    2002-01-01

    We present initial work on a timed process algebra that models sharing of processor resources allowing preemption at arbitrary points in time. This enables us to model both the functional and the timely behaviour of concurrent processes executed on a single processor. We give a refinement relation...

  13. Vector and parallel processors in computational science

    International Nuclear Information System (INIS)

    Duff, I.S.; Reid, J.K.

    1985-01-01

    These proceedings contain the articles presented at the named conference. These concern hardware and software for vector and parallel processors, numerical methods and algorithms for the computation on such processors, as well as applications of such methods to different fields of physics and related sciences. See hints under the relevant topics. (HSI)

  14. The communication processor of TUMULT-64

    NARCIS (Netherlands)

    Smit, Gerardus Johannes Maria; Jansen, P.G.

    1988-01-01

    Tumult (Twente University MULTi-processor system) is a modular extendible multi-processor system designed and implemented at the Twente University of Technology in co-operation with Oce Nederland B.V. and the Dr. Neher Laboratories (Dutch PTT). Characteristics of the hardware are: MIMD type,

  15. An interactive parallel processor for data analysis

    International Nuclear Information System (INIS)

    Mong, J.; Logan, D.; Maples, C.; Rathbun, W.; Weaver, D.

    1984-01-01

    A parallel array of eight minicomputers has been assembled in an attempt to deal with kiloparameter data events. By exporting computer system functions to a separate processor, the authors have been able to achieve computer amplification linearly proportional to the number of executing processors

  16. Comparison of Processor Performance of SPECint2006 Benchmarks of some Intel Xeon Processors

    OpenAIRE

    Abdul Kareem PARCHUR; Ram Asaray SINGH

    2012-01-01

    High performance is a critical requirement for all microprocessor manufacturers. The present paper compares the performance of two main Intel Xeon series processors (Type A: Intel Xeon X5260, X5460, E5450 and L5320 and Type B: Intel Xeon X5140, 5130, 5120 and E5310). The microarchitecture of these processors is implemented using the basis of a new family of processors from Intel starting with the Pentium 4 processor. These processors can provide a performance boost for many ke...

  17. Architectural slicing

    DEFF Research Database (Denmark)

    Christensen, Henrik Bærbak; Hansen, Klaus Marius

    2013-01-01

    Architectural prototyping is a widely used practice, concerned with taking architectural decisions through experiments with lightweight implementations. However, many architectural decisions are only taken when systems are already (partially) implemented. This is problematic in the context of architectural prototyping since experiments with full systems are complex and expensive and thus architectural learning is hindered. In this paper, we propose a novel technique for harvesting architectural prototypes from existing systems, "architectural slicing", based on dynamic program slicing. Given a system and a slicing criterion, architectural slicing produces an architectural prototype that contains the elements in the architecture that are dependent on the elements in the slicing criterion. Furthermore, we present an initial design and implementation of an architectural slicer for Java.

  18. Neurovision processor for designing intelligent sensors

    Science.gov (United States)

    Gupta, Madan M.; Knopf, George K.

    1992-03-01

    A programmable multi-task neuro-vision processor, called the Positive-Negative (PN) neural processor, is proposed as a plausible hardware mechanism for constructing robust multi-task vision sensors. The computational operations performed by the PN neural processor are loosely based on the neural activity fields exhibited by certain nervous tissue layers situated in the brain. The neuro-vision processor can be programmed to generate diverse dynamic behavior that may be used for spatio-temporal stabilization (STS), short-term visual memory (STVM), spatio-temporal filtering (STF) and pulse frequency modulation (PFM). A multi-functional vision sensor that performs a variety of information processing operations on time-varying two-dimensional sensory images can be constructed from a parallel and hierarchical structure of numerous individually programmed PN neural processors.

  19. Development of a highly reliable CRT processor

    International Nuclear Information System (INIS)

    Shimizu, Tomoya; Saiki, Akira; Hirai, Kenji; Jota, Masayoshi; Fujii, Mikiya

    1996-01-01

    Although CRT processors have been employed by the main control board to reduce the operator's workload during monitoring, the control systems are still operated by hardware switches. For further advancement, direct controller operation through a display device is expected. A CRT processor providing direct controller operation must be as reliable as the hardware switches are. The authors are developing a new type of highly reliable CRT processor that enables direct controller operations. In this paper, we discuss the design principles behind a highly reliable CRT processor. The principles are defined by studies of software reliability and of the functional reliability of the monitoring and operation systems. The functional configuration of an advanced CRT processor is also addressed. (author)

  20. Online track processor for the CDF upgrade

    International Nuclear Information System (INIS)

    Thomson, E. J.

    2002-01-01

    A trigger track processor, called the eXtremely Fast Tracker (XFT), has been designed for the CDF upgrade. This processor identifies high transverse momentum (> 1.5 GeV/c) charged particles in the new central outer tracking chamber for CDF II. The XFT design is highly parallel to handle the input rate of 183 Gbits/s and output rate of 44 Gbits/s. The processor is pipelined and reports the result for a new event every 132 ns. The processor uses three stages: hit classification, segment finding, and segment linking. The pattern recognition algorithms for the three stages are implemented in programmable logic devices (PLDs) which allow in-situ modification of the algorithm at any time. The PLDs reside on three different types of modules. The complete system has been installed and commissioned at CDF II. An overview of the track processor and performance in CDF Run II are presented

  1. Computer Generated Inputs for NMIS Processor Verification

    International Nuclear Information System (INIS)

    J. A. Mullens; J. E. Breeding; J. A. McEvers; R. W. Wysor; L. G. Chiang; J. R. Lenarduzzi; J. T. Mihalczo; J. K. Mattingly

    2001-01-01

    Proper operation of the Nuclear Materials Identification System (NMIS) processor can be verified using computer-generated inputs [BIST (Built-In-Self-Test)] at the digital inputs. Preselected sequences of input pulses to all channels with known correlation functions are compared to the output of the processor. These types of verifications have been utilized in NMIS type correlation processors at the Oak Ridge National Laboratory since 1984. The use of this test confirmed a malfunction in an NMIS processor at the All-Russian Scientific Research Institute of Experimental Physics (VNIIEF) in 1998. The NMIS processor boards were returned to the U.S. for repair and subsequently used in NMIS passive and active measurements with Pu at VNIIEF in 1999.

  2. The Chameleon Architecture for Streaming DSP Applications

    Directory of Open Access Journals (Sweden)

    André B. J. Kokkeler

    2007-02-01

    We focus on architectures for streaming DSP applications such as wireless baseband processing and image processing. We aim at a single generic architecture that is capable of dealing with different DSP applications. This architecture has to be energy efficient and fault tolerant. We introduce a heterogeneous tiled architecture and present the details of a domain-specific reconfigurable tile processor called Montium. This reconfigurable processor has a small footprint (1.8 mm² in a 130 nm process), is power efficient, and exploits the locality of reference principle. Reconfiguring the device is very fast; for example, loading the coefficients for a 200-tap FIR filter is done within 80 clock cycles. The tiles on the tiled architecture are connected to a Network-on-Chip (NoC) via a network interface (NI). Two NoCs have been developed: a packet-switched and a circuit-switched version. Both provide two types of services: guaranteed throughput (GT) and best effort (BE). For both NoCs, estimates of power consumption are presented. The NI synchronizes data transfers, configures and starts/stops the tile processor. For dynamically mapping applications onto the tiled architecture, we introduce a run-time mapping tool.

  3. The Chameleon Architecture for Streaming DSP Applications

    Directory of Open Access Journals (Sweden)

    Heysters PaulM

    2007-01-01

    We focus on architectures for streaming DSP applications such as wireless baseband processing and image processing. We aim at a single generic architecture that is capable of dealing with different DSP applications. This architecture has to be energy efficient and fault tolerant. We introduce a heterogeneous tiled architecture and present the details of a domain-specific reconfigurable tile processor called Montium. This reconfigurable processor has a small footprint (1.8 mm² in a 130 nm process), is power efficient, and exploits the locality of reference principle. Reconfiguring the device is very fast; for example, loading the coefficients for a 200-tap FIR filter is done within 80 clock cycles. The tiles on the tiled architecture are connected to a Network-on-Chip (NoC) via a network interface (NI). Two NoCs have been developed: a packet-switched and a circuit-switched version. Both provide two types of services: guaranteed throughput (GT) and best effort (BE). For both NoCs, estimates of power consumption are presented. The NI synchronizes data transfers, configures and starts/stops the tile processor. For dynamically mapping applications onto the tiled architecture, we introduce a run-time mapping tool.

  4. NMRFx Processor: a cross-platform NMR data processing program

    International Nuclear Information System (INIS)

    Norris, Michael; Fetler, Bayard; Marchant, Jan; Johnson, Bruce A.

    2016-01-01

    NMRFx Processor is a new program for the processing of NMR data. Written in the Java programming language, NMRFx Processor is a cross-platform application and runs on Linux, Mac OS X and Windows operating systems. The application can be run in both a graphical user interface (GUI) mode and from the command line. Processing scripts are written in the Python programming language and executed so that the low-level Java commands are automatically run in parallel on computers with multiple cores or CPUs. Processing scripts can be generated automatically from the parameters of NMR experiments or interactively constructed in the GUI. A wide variety of processing operations are provided, including methods for processing of non-uniformly sampled datasets using iterative soft thresholding. The interactive GUI also enables the use of the program as an educational tool for teaching basic and advanced techniques in NMR data analysis.

  5. NMRFx Processor: a cross-platform NMR data processing program

    Energy Technology Data Exchange (ETDEWEB)

    Norris, Michael; Fetler, Bayard [One Moon Scientific, Inc. (United States)]; Marchant, Jan [University of Maryland Baltimore County, Howard Hughes Medical Institute (United States)]; Johnson, Bruce A., E-mail: bruce.johnson@asrc.cuny.edu [One Moon Scientific, Inc. (United States)]

    2016-08-15

    NMRFx Processor is a new program for the processing of NMR data. Written in the Java programming language, NMRFx Processor is a cross-platform application and runs on Linux, Mac OS X and Windows operating systems. The application can be run in both a graphical user interface (GUI) mode and from the command line. Processing scripts are written in the Python programming language and executed so that the low-level Java commands are automatically run in parallel on computers with multiple cores or CPUs. Processing scripts can be generated automatically from the parameters of NMR experiments or interactively constructed in the GUI. A wide variety of processing operations are provided, including methods for processing of non-uniformly sampled datasets using iterative soft thresholding. The interactive GUI also enables the use of the program as an educational tool for teaching basic and advanced techniques in NMR data analysis.

  6. A VLSI image processor via pseudo-mersenne transforms

    International Nuclear Information System (INIS)

    Sei, W.J.; Jagadeesh, J.M.

    1986-01-01

    The computational burden on image processing in medical fields where a large amount of information must be processed quickly and accurately has led to consideration of special-purpose image processor chip design for some time. The very large scale integration (VLSI) revolution has made it cost-effective and feasible to consider the design of special purpose chips for medical imaging fields. This paper describes a VLSI CMOS chip suitable for parallel implementation of image processing algorithms and cyclic convolutions by using the Pseudo-Mersenne Number Transform (PMNT). The main advantages of the PMNT over the Fast Fourier Transform (FFT) are: (1) no multiplications are required; (2) integer arithmetic is used. The design and development of this processor, which operates on a 32-point convolution or a 5 x 5 window image, are described.
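
    The two advantages listed above can be seen in a small sketch. Note the swap: this is the plain Mersenne number transform (modulus 2^p - 1 with p chosen so the modulus is prime, root 2, length p), a simpler relative of the pseudo-Mersenne transform used in the paper, and the function names are illustrative assumptions. Because the root is 2, every twiddle factor is a power of two, so the transform itself needs only shifts, additions and a modular reduction; the pointwise product in the transform domain still uses ordinary integer multiplication.

```python
def mersenne_transform(x, p, inverse=False):
    """Length-p number theoretic transform modulo M = 2**p - 1 with root 2.

    Assumes len(x) == p and that 2**p - 1 is prime (e.g. p = 5, M = 31).
    Requires Python 3.8+ for pow(n, -1, M).
    """
    modulus = (1 << p) - 1
    n = p
    out = []
    for k in range(n):
        acc = 0
        for i, xi in enumerate(x):
            shift = (i * k) % n          # 2**n == 1 (mod M), so reduce the exponent
            if inverse:
                shift = (-shift) % n     # 2**(-s) == 2**(n - s) (mod M)
            acc += (xi % modulus) << shift   # multiply by a power of two via a shift
        out.append(acc % modulus)
    if inverse:
        n_inv = pow(n, -1, modulus)
        out = [(v * n_inv) % modulus for v in out]
    return out


def cyclic_convolution(x, h, p):
    """Exact cyclic convolution, valid while every true output is below 2**p - 1."""
    modulus = (1 << p) - 1
    xf = mersenne_transform(x, p)
    hf = mersenne_transform(h, p)
    yf = [(a * b) % modulus for a, b in zip(xf, hf)]
    return mersenne_transform(yf, p, inverse=True)
```

    All arithmetic is on integers, so the result is exact as long as the true convolution values stay below the modulus; there is no rounding error of the kind an FFT-based convolution introduces.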

  7. Performance of Distributed CFAR Processors in Pearson Distributed Clutter

    Directory of Open Access Journals (Sweden)

    Messali Zoubeida

    2007-01-01

    This paper deals with the distributed constant false alarm rate (CFAR) radar detection of targets embedded in heavy-tailed Pearson distributed clutter. In particular, we extend the results obtained for the cell averaging (CA), order statistics (OS), and censored mean level (CMLD) CFAR processors operating in positive alpha-stable (PαS) random variables to more general situations, specifically to the presence of interfering targets and distributed CFAR detectors. The receiver operating characteristics of the greatest of (GO) and the smallest of (SO) CFAR processors are also determined. The performance characteristics of distributed systems are presented and compared in both homogeneous clutter and in the presence of interfering targets. We demonstrate, via simulation results, that the distributed systems, when the clutter is modelled as a positive alpha-stable distribution, offer robustness properties against multiple target situations, especially when using the "OR" fusion rule.
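
    For readers unfamiliar with the detectors named above, the following is a minimal sketch of a cell averaging CFAR test and of "OR" fusion across sensors. The function names, the window layout and the `scale` argument are assumptions for illustration; in particular, the threshold multiplier appropriate for Pearson distributed clutter has to be derived as in the paper and is not reproduced here.

```python
def ca_cfar(power, num_train, num_guard, scale):
    """Cell averaging CFAR over a 1-D list of power samples (illustrative).

    For each cell under test, the noise level is the mean of num_train
    training cells (half on each side), skipping num_guard guard cells
    next to the cell under test. Cells without a full window are skipped.
    `scale` is a caller-supplied threshold multiplier (an assumption here).
    """
    half = num_train // 2
    detections = []
    for i in range(len(power)):
        lead = power[max(0, i - num_guard - half): max(0, i - num_guard)]
        lag = power[i + num_guard + 1: i + num_guard + 1 + half]
        window = list(lead) + list(lag)
        if len(window) < 2 * half:
            continue  # not enough training cells near the edges
        noise = sum(window) / len(window)
        if power[i] > scale * noise:
            detections.append(i)
    return detections


def or_fusion(decisions_per_sensor):
    """'OR' fusion rule: declare a target where any sensor declared one."""
    fused = set()
    for decisions in decisions_per_sensor:
        fused.update(decisions)
    return sorted(fused)
```

    An order statistics (OS) variant would replace the mean of the window with its k-th smallest sample, leaving the rest of the structure unchanged.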

  8. Performance of Distributed CFAR Processors in Pearson Distributed Clutter

    Directory of Open Access Journals (Sweden)

    Faouzi Soltani

    2007-01-01

    This paper deals with the distributed constant false alarm rate (CFAR) radar detection of targets embedded in heavy-tailed Pearson distributed clutter. In particular, we extend the results obtained for the cell averaging (CA), order statistics (OS), and censored mean level (CMLD) CFAR processors operating in positive alpha-stable (PαS) random variables to more general situations, specifically to the presence of interfering targets and distributed CFAR detectors. The receiver operating characteristics of the greatest of (GO) and the smallest of (SO) CFAR processors are also determined. The performance characteristics of distributed systems are presented and compared in both homogeneous clutter and in the presence of interfering targets. We demonstrate, via simulation results, that the distributed systems, when the clutter is modelled as a positive alpha-stable distribution, offer robustness properties against multiple target situations, especially when using the “OR” fusion rule.

  9. Migration of vectorized iterative solvers to distributed memory architectures

    Energy Technology Data Exchange (ETDEWEB)

    Pommerell, C. [AT&T Bell Labs., Murray Hill, NJ (United States)]; Ruehl, R. [CSCS-ETH, Manno (Switzerland)]

    1994-12-31

    Both necessity and opportunity motivate the use of high-performance computers for iterative linear solvers. Necessity results from the size of the problems being solved; smaller problems are often better handled by direct methods. Opportunity arises from the formulation of the iterative methods in terms of simple linear algebra operations, even if this "natural" parallelism is not easy to exploit in irregularly structured sparse matrices and with good preconditioners. As a result, high-performance implementations of iterative solvers have attracted a lot of interest in recent years. Most efforts are geared to vectorize or parallelize the dominating operation (structured or unstructured sparse matrix-vector multiplication), or to increase locality and parallelism by reformulating the algorithm (reducing global synchronization in inner products or local data exchange in preconditioners). Target architectures for iterative solvers currently include mostly vector supercomputers and architectures with one or few optimized (e.g., super-scalar and/or super-pipelined RISC) processors and hierarchical memory systems. More recently, parallel computers with physically distributed memory and a better price/performance ratio have been offered by vendors as a very interesting alternative to vector supercomputers. However, programming comfort on such distributed memory parallel processors (DMPPs) still lags behind. Here the authors are concerned with iterative solvers and their changing computing environment. In particular, they are considering migration from traditional vector supercomputers to DMPPs. Application requirements force one to use flexible and portable libraries. They want to extend the portability of iterative solvers rather than reimplementing everything for each new machine, or even for each new architecture.

  10. Analytical Bounds on the Threads in IXP1200 Network Processor

    OpenAIRE

    Ramakrishna, STGS; Jamadagni, HS

    2003-01-01

    Increasing link speeds have placed an enormous burden on processing requirements, and processors are expected to carry out a variety of tasks. Network Processors (NP) [1] [2] is the blanket name given to processors that trade off flexibility and performance. Network Processors are offered by a number of vendors to take the main burden of network-related processing off conventional processors. The Network Processors cover a spectrum of design trad...

  11. The Square Kilometre Array Science Data Processor. Preliminary compute platform design

    International Nuclear Information System (INIS)

    Broekema, P.C.; Nieuwpoort, R.V. van; Bal, H.E.

    2015-01-01

    The Square Kilometre Array is a next-generation radio-telescope, to be built in South Africa and Western Australia. It is currently in its detailed design phase, with procurement and construction scheduled to start in 2017. The SKA Science Data Processor is the high-performance computing element of the instrument, responsible for producing science-ready data. This is a major IT project, with the Science Data Processor expected to challenge the computing state of the art even in 2020. In this paper we introduce the preliminary Science Data Processor design and the principles that guide the design process, as well as the constraints on the design. We introduce a highly scalable and flexible system architecture capable of handling the SDP workload.

  12. Implementation of an EPICS IOC on an Embedded Soft Core Processor Using Field Programmable Gate Arrays

    International Nuclear Information System (INIS)

    Douglas Curry; Alicia Hofler; Hai Dong; Trent Allison; J. Hovater; Kelly Mahoney

    2005-01-01

    At Jefferson Lab, we have been evaluating soft core processors running an EPICS IOC over μClinux on our custom hardware. A soft core processor is a flexible CPU architecture that is configured in the FPGA as opposed to a hard core processor which is fixed in silicon. Combined with an on-board Ethernet port, the technology incorporates the IOC and digital control hardware within a single FPGA. By eliminating the general purpose computer IOC, the designer is no longer tied to a specific platform, e.g. PC, VME, or VXI, to serve as the intermediary between the high level controls and the field hardware. This paper will discuss the design and development process as well as specific applications for JLab's next generation low-level RF controls and Machine Protection Systems

  13. Dataflow formalisation of real-time streaming applications on a composable and predictable multi-processor SOC

    NARCIS (Netherlands)

    Nelson, A.T.; Goossens, K.G.W.; Akesson, K.B.

    2015-01-01

    Embedded systems often contain multiple applications, some of which have real-time requirements and whose performance must be guaranteed. To efficiently execute applications, modern embedded systems contain Globally Asynchronous Locally Synchronous (GALS) processors, network on chip, DRAM and SRAM

  14. Discrete Fourier transformation processor based on complex radix (−1 + j) number system

    Directory of Open Access Journals (Sweden)

    Anidaphi Shadap

    2017-02-01

    Complex radix (−1 + j) allows the arithmetic operations of complex numbers to be done without treating the divide and conquer rules, which offers a significant speed improvement for complex number computation circuitry. The design and hardware implementation of a complex radix (−1 + j) converter are introduced in this paper. Extensive simulation results have been incorporated and an application of this converter towards the implementation of a discrete Fourier transformation (DFT) processor has been presented. The functionality of the DFT processor has been verified in Xilinx ISE design suite version 14.7 and performance parameters like propagation delay and dynamic switching power consumption have been calculated by the Virtuoso platform in Cadence. The proposed DFT processor has been implemented through conversion, multiplication and addition. The performance parameter matrix in terms of delay and power consumption offered a significant improvement over other traditional implementations of the DFT processor.
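
    To make the number system concrete, here is a minimal software sketch of conversion between a Gaussian integer a + bj and its binary digit string in radix (−1 + j). It uses the standard repeated-division method and is not the paper's hardware converter; the function names are illustrative assumptions.

```python
def to_radix_m1_plus_j(a, b):
    """Digits (most significant first) of the Gaussian integer a + bj in radix -1 + j."""
    if a == 0 and b == 0:
        return "0"
    digits = []
    while a != 0 or b != 0:
        # a + bj is divisible by (-1 + j) exactly when a + b is even,
        # so the next digit is the parity of a + b.
        d = (a + b) % 2
        digits.append(str(d))
        a -= d
        # Exact division: (a + bj) / (-1 + j) = ((b - a) + (-a - b) j) / 2
        a, b = (b - a) // 2, (-a - b) // 2
    return "".join(reversed(digits))


def from_radix_m1_plus_j(bits):
    """Inverse conversion by Horner's rule; returns the pair (a, b)."""
    a, b = 0, 0
    for ch in bits:
        # Multiply the running value by (-1 + j): (-a - b) + (a - b) j
        a, b = -a - b, a - b
        a += int(ch)
    return a, b
```

    For example, the real number 2 is written as 1100, since (−1 + j)³ + (−1 + j)² = (2 + 2j) + (−2j) = 2. A single digit string encodes both the real and imaginary parts, with no separate sign bits.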

  15. Development of Innovative Design Processor

    International Nuclear Information System (INIS)

    Park, Y.S.; Park, C.O.

    2004-01-01

    Nuclear design analysis requires time-consuming and error-prone model-input preparation, code runs, output analysis and a quality assurance process. To reduce human effort and improve design quality and productivity, the Innovative Design Processor (IDP) is being developed. The two basic principles of IDP are document-oriented design and web-based design. Document-oriented design means that, if the designer writes a design document called an active document and feeds it to a special program, the final document with complete analysis, tables and plots is made automatically. The active documents can be written with ordinary HTML editors or created automatically on the web, which is another framework of IDP. Using a proper mix of server-side and client-side programming under the LAMP (Linux/Apache/MySQL/PHP) environment, the design process on the web is modeled in a design wizard style so that even a novice designer can make the design document easily. This automation using the IDP is now being implemented for all the reload design of Korea Standard Nuclear Power Plant (KSNP) type PWRs. The introduction of this process will allow a large reduction in all reload design efforts of KSNP and provide a platform for design and R&D tasks of KNFC. (authors)

  16. A data base processor semantics specification package

    Science.gov (United States)

    Fishwick, P. A.

    1983-01-01

    A Semantics Specification Package (DBPSSP) for the Intel Data Base Processor (DBP) is defined. DBPSSP serves as a collection of cross assembly tools that allow the analyst to assemble request blocks on the host computer for passage to the DBP. The assembly tools discussed in this report may be effectively used in conjunction with a DBP compatible data communications protocol to form a query processor, precompiler, or file management system for the database processor. The source modules representing the components of DBPSSP are fully commented and included.

  17. Locality-Aware Task Scheduling and Data Distribution for OpenMP Programs on NUMA Systems and Manycore Processors

    Directory of Open Access Journals (Sweden)

    Ananya Muddukrishna

    2015-01-01

    Performance degradation due to nonuniform data access latencies has worsened on NUMA systems and can now be felt on-chip in manycore processors. Distributing data across NUMA nodes and manycore processor caches is necessary to reduce the impact of nonuniform latencies. However, techniques for distributing data are error-prone and fragile and require low-level architectural knowledge. Existing task scheduling policies favor quick load-balancing at the expense of locality and ignore NUMA node/manycore cache access latencies while scheduling. Locality-aware scheduling, in conjunction with or as a replacement for existing scheduling, is necessary to minimize NUMA effects and sustain performance. We present a data distribution and locality-aware scheduling technique for task-based OpenMP programs executing on NUMA systems and manycore processors. Our technique relieves the programmer from thinking of NUMA system/manycore processor architecture details by delegating data distribution to the runtime system and uses task data dependence information to guide the scheduling of OpenMP tasks to reduce data stall times. We demonstrate our technique on a four-socket AMD Opteron machine with eight NUMA nodes and on the TILEPro64 processor and identify that data distribution and locality-aware task scheduling improve performance up to 69% for scientific benchmarks compared to default policies and yet provide an architecture-oblivious approach for programmers.

  18. Multi-mode sensor processing on a dynamically reconfigurable massively parallel processor array

    Science.gov (United States)

    Chen, Paul; Butts, Mike; Budlong, Brad; Wasson, Paul

    2008-04-01

    This paper introduces a novel computing architecture that can be reconfigured in real time to adapt on demand to multi-mode sensor platforms' dynamic computational and functional requirements. This 1 teraOPS reconfigurable Massively Parallel Processor Array (MPPA) has 336 32-bit processors. The programmable 32-bit communication fabric provides streamlined inter-processor connections with deterministically high performance. Software programmability, scalability, ease of use, and fast reconfiguration time (ranging from microseconds to milliseconds) are the most significant advantages over FPGAs and DSPs. This paper introduces the MPPA architecture, its programming model, and methods of reconfigurability. An MPPA platform for reconfigurable computing is based on a structural object programming model. Objects are software programs running concurrently on hundreds of 32-bit RISC processors and memories. They exchange data and control through a network of self-synchronizing channels. A common application design pattern on this platform, called a work farm, is a parallel set of worker objects, with one input and one output stream. Statically configured work farms with homogeneous and heterogeneous sets of workers have been used in video compression and decompression, network processing, and graphics applications.
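
    The work farm pattern described above (one input stream, a parallel set of workers, one output stream) can be sketched in software. This is only an analogy under stated assumptions: Python threads and `queue.Queue` stand in for the MPPA's processors and self-synchronizing channels, and `work_farm` is a hypothetical name, not part of the platform's programming model.

```python
import queue
import threading


def work_farm(items, worker_fn, num_workers=4):
    """Distribute items to parallel workers and collect results in input order."""
    inbox = queue.Queue()    # single input stream feeding all workers
    outbox = queue.Queue()   # single output stream fed by all workers

    def worker():
        while True:
            job = inbox.get()
            if job is None:          # sentinel: no more work for this worker
                break
            index, value = job
            outbox.put((index, worker_fn(value)))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    count = 0
    for index, item in enumerate(items):
        inbox.put((index, item))
        count += 1
    for _ in threads:
        inbox.put(None)              # one sentinel per worker
    for t in threads:
        t.join()

    # All workers have finished, so the output queue holds every result.
    results = [None] * count
    while not outbox.empty():
        index, result = outbox.get()
        results[index] = result
    return results
```

    Tagging each item with its index lets the collector restore input order even though workers finish at different times; a heterogeneous farm would pass a different `worker_fn` to each worker.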

  19. Software and DVFS Tuning for Performance and Energy-Efficiency on Intel KNL Processors

    Directory of Open Access Journals (Sweden)

    Enrico Calore

    2018-06-01

    Energy consumption of processors and memories is quickly becoming a limiting factor in the deployment of large computing systems. For this reason, it is important to understand the energy performance of these processors and to study strategies allowing their use in the most efficient way. In this work, we focus on the computing and energy performance of the Knights Landing Xeon Phi, the latest Intel many-core architecture processor for HPC applications. We consider the 64-core Xeon Phi 7230 and profile its performance and energy efficiency using both its on-chip MCDRAM and the off-chip DDR4 memory as the main storage for application data. As a benchmark application, we use a lattice Boltzmann code heavily optimized for this architecture and implemented using several different arrangements of the application data in memory (data-layouts, in short). We also assess the dependence of energy consumption on data-layouts, memory configurations (DDR4 or MCDRAM) and the number of threads per core. We finally consider possible trade-offs between computing performance and energy efficiency, tuning the clock frequency of the processor using the Dynamic Voltage and Frequency Scaling (DVFS) technique.

  20. Multi-threaded ATLAS simulation on Intel Knights Landing processors

    Science.gov (United States)

    Farrell, Steven; Calafiura, Paolo; Leggett, Charles; Tsulaia, Vakhtang; Dotti, Andrea; ATLAS Collaboration

    2017-10-01

    The Knights Landing (KNL) release of the Intel Many Integrated Core (MIC) Xeon Phi line of processors is a potential game changer for HEP computing. With 72 cores and deep vector registers, the KNL cards promise significant performance benefits for highly-parallel, compute-heavy applications. Cori, the newest supercomputer at the National Energy Research Scientific Computing Center (NERSC), was delivered to its users in two phases with the first phase online at the end of 2015 and the second phase now online at the end of 2016. Cori Phase 2 is based on the KNL architecture and contains over 9000 compute nodes with 96GB DDR4 memory. ATLAS simulation with the multithreaded Athena Framework (AthenaMT) is a good potential use-case for the KNL architecture and supercomputers like Cori. ATLAS simulation jobs have a high ratio of CPU computation to disk I/O and have been shown to scale well in multi-threading and across many nodes. In this paper we will give an overview of the ATLAS simulation application with details on its multi-threaded design. Then, we will present a performance analysis of the application on KNL devices and compare it to a traditional x86 platform to demonstrate the capabilities of the architecture and evaluate the benefits of utilizing KNL platforms like Cori for ATLAS production.

  1. A fast inner product processor based on equal alignments

    Energy Technology Data Exchange (ETDEWEB)

    Smith, S.P.; Torng, H.C.

    1985-11-01

    Inner product computation is an important operation, invoked repeatedly in matrix multiplications. A high-speed inner product processor can be very useful (among many possible applications) in real-time signal processing. This paper presents the design of a fast inner product processor, with appreciably reduced latency and cost. The inner product processor is implemented with a tree of carry-propagate or carry-save adders; this structure is obtained with the incorporation of three innovations in the conventional multiply/add tree: The leaf-multipliers are expanded into adder subtrees, thus achieving an O(log Nb) latency, where N denotes the number of elements in a vector and b the number of bits in each element. The partial products, to be summed in producing an inner product, are reordered according to their "minimum alignments." This reordering brings approximately a 20% savings in hardware, including adders and data paths. The reduction in adder widths also yields savings in carry propagation time for carry-propagate adders. For trees implemented with carry-save adders, the partial product reordering also serves to truncate the carry propagation chain in the final propagation stage by 2 log b - 1 positions, thus significantly reducing the latency further. A form of the Baugh and Wooley algorithm is adopted to implement two's complement notation with changes only in peripheral hardware.
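
    The idea of summing partial products directly, rather than forming each element product first, can be illustrated in software. The sketch below assumes unsigned operands and only groups one-bit partial products by their alignment (bit weight); it does not model the paper's adder tree, its "minimum alignment" ordering, or the two's complement handling. The function name `inner_product_by_alignment` is hypothetical.

```python
def inner_product_by_alignment(xs, ys, bits):
    """Inner product of unsigned `bits`-wide integers via partial products.

    No per-element product x*y is formed. Every one-bit partial product
    is counted in the column that matches its alignment j + k, and the
    columns are summed once at the end.
    """
    columns = [0] * (2 * bits - 1)  # columns[w] holds weight 2**w
    for x, y in zip(xs, ys):
        for j in range(bits):
            if not (x >> j) & 1:
                continue
            for k in range(bits):
                if (y >> k) & 1:
                    columns[j + k] += 1
    return sum(count << weight for weight, count in enumerate(columns))
```

    Because all N * b * b partial products feed one summation, a hardware tree over them has depth logarithmic in their number, which is consistent with the O(log Nb) latency quoted in the abstract.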

  2. Design of an ultra-low-power digital processor for passive UHF RFID tags

    Energy Technology Data Exchange (ETDEWEB)

    Shi Wanggen; Zhuang Yiqi; Li Xiaoming; Wang Xianghua; Jin Zhao; Wang Dan, E-mail: wanggen_shi@163.co [Key Laboratory of the Ministry of Education for Wide Band-Gap Semiconductor Materials and Devices, Institute of Microelectronics, Xidian University, Xi'an 710071 (China)]

    2009-04-15

    A new architecture of digital processors for passive UHF radio-frequency identification tags is proposed. This architecture is based on ISO/IEC 18000-6C and targeted at ultra-low power consumption. By applying methods like system-level power management, global clock gating and low voltage implementation, the total power of the design is reduced to a few microwatts. In addition, an innovative way for the design of a true RNG is presented, which contributes to both low power and secure data transaction. The digital processor is verified by an integrated FPGA platform and implemented by the Synopsys design kit for ASIC flows. The design fits different CMOS technologies and has been taped out using the 2P4M 0.35 μm process of Chartered Semiconductor.

  3. Design of an ultra-low-power digital processor for passive UHF RFID tags

    International Nuclear Information System (INIS)

    Shi Wanggen; Zhuang Yiqi; Li Xiaoming; Wang Xianghua; Jin Zhao; Wang Dan

    2009-01-01

    A new architecture of digital processors for passive UHF radio-frequency identification tags is proposed. This architecture is based on ISO/IEC 18000-6C and targeted at ultra-low power consumption. By applying methods like system-level power management, global clock gating and low voltage implementation, the total power of the design is reduced to a few microwatts. In addition, an innovative way for the design of a true RNG is presented, which contributes to both low power and secure data transaction. The digital processor is verified by an integrated FPGA platform and implemented by the Synopsys design kit for ASIC flows. The design fits different CMOS technologies and has been taped out using the 2P4M 0.35 μm process of Chartered Semiconductor.

  4. Monte Carlo dose calculation using a cell processor based PlayStation 3 system

    International Nuclear Information System (INIS)

    Chow, James C L; Lam, Phil; Jaffray, David A

    2012-01-01

    This study investigates the performance of the EGSnrc computer code coupled with a Cell-based hardware in Monte Carlo simulation of radiation dose in radiotherapy. Performance evaluations of two processor-intensive functions, namely HOWNEAR and RANMAR_GET in the EGSnrc code, were carried out based on the 20-80 rule (Pareto principle). The execution speeds of the two functions were measured by the profiler gprof specifying the number of executions and total time spent on the functions. A testing architecture designed for Cell processor was implemented in the evaluation using a PlayStation3 (PS3) system. The evaluation results show that the algorithms examined are readily parallelizable on the Cell platform, provided that an architectural change of the EGSnrc was made. However, as the EGSnrc performance was limited by the PowerPC Processing Element in the PS3, PC coupled with graphics processing units or GPCPU may provide a more viable avenue for acceleration.

  5. Monte Carlo dose calculation using a cell processor based PlayStation 3 system

    Science.gov (United States)

    Chow, James C. L.; Lam, Phil; Jaffray, David A.

    2012-02-01

    This study investigates the performance of the EGSnrc computer code coupled with a Cell-based hardware in Monte Carlo simulation of radiation dose in radiotherapy. Performance evaluations of two processor-intensive functions, namely HOWNEAR and RANMAR_GET in the EGSnrc code, were carried out based on the 20-80 rule (Pareto principle). The execution speeds of the two functions were measured by the profiler gprof specifying the number of executions and total time spent on the functions. A testing architecture designed for Cell processor was implemented in the evaluation using a PlayStation3 (PS3) system. The evaluation results show that the algorithms examined are readily parallelizable on the Cell platform, provided that an architectural change of the EGSnrc was made. However, as the EGSnrc performance was limited by the PowerPC Processing Element in the PS3, PC coupled with graphics processing units or GPCPU may provide a more viable avenue for acceleration.

  6. Photonics and Fiber Optics Processor Lab

    Data.gov (United States)

    Federal Laboratory Consortium — The Photonics and Fiber Optics Processor Lab develops, tests and evaluates high speed fiber optic network components as well as network protocols. In addition, this...

  7. NeuroFlow: A General Purpose Spiking Neural Network Simulation Platform using Customizable Processors.

    Science.gov (United States)

    Cheung, Kit; Schultz, Simon R; Luk, Wayne

    2015-01-01

    NeuroFlow is a scalable spiking neural network simulation platform for off-the-shelf high performance computing systems using customizable hardware processors such as Field-Programmable Gate Arrays (FPGAs). Unlike multi-core processors and application-specific integrated circuits, the processor architecture of NeuroFlow can be redesigned and reconfigured to suit a particular simulation to deliver optimized performance, such as the degree of parallelism to employ. The compilation process supports using PyNN, a simulator-independent neural network description language, to configure the processor. NeuroFlow supports a number of commonly used current or conductance based neuronal models such as integrate-and-fire and Izhikevich models, and the spike-timing-dependent plasticity (STDP) rule for learning. A 6-FPGA system can simulate a network of up to ~600,000 neurons and can achieve a real-time performance of 400,000 neurons. Using one FPGA, NeuroFlow delivers a speedup of up to 33.6 times the speed of an 8-core processor, or 2.83 times the speed of GPU-based platforms. With high flexibility and throughput, NeuroFlow provides a viable environment for large-scale neural network simulation.

  8. Demonstration of two-qubit algorithms with a superconducting quantum processor.

    Science.gov (United States)

    DiCarlo, L; Chow, J M; Gambetta, J M; Bishop, Lev S; Johnson, B R; Schuster, D I; Majer, J; Blais, A; Frunzio, L; Girvin, S M; Schoelkopf, R J

    2009-07-09

    Quantum computers, which harness the superposition and entanglement of physical states, could outperform their classical counterparts in solving problems with technological impact, such as factoring large numbers and searching databases. A quantum processor executes algorithms by applying a programmable sequence of gates to an initialized register of qubits, which coherently evolves into a final state containing the result of the computation. Building a quantum processor is challenging because of the need to meet simultaneously requirements that are in conflict: state preparation, long coherence times, universal gate operations and qubit readout. Processors based on a few qubits have been demonstrated using nuclear magnetic resonance, cold ion trap and optical systems, but a solid-state realization has remained an outstanding challenge. Here we demonstrate a two-qubit superconducting processor and the implementation of the Grover search and Deutsch-Jozsa quantum algorithms. We use a two-qubit interaction, tunable in strength by two orders of magnitude on nanosecond timescales, which is mediated by a cavity bus in a circuit quantum electrodynamics architecture. This interaction allows the generation of highly entangled states with concurrence up to 94 per cent. Although this processor constitutes an important step in quantum computing with integrated circuits, continuing efforts to increase qubit coherence times, gate performance and register size will be required to fulfil the promise of a scalable technology.
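
    For readers unfamiliar with Grover search on two qubits, the following is an idealized, noise-free state-vector sketch. It is not a model of the superconducting hardware or its gate sequence; the function name `grover_two_qubit` is hypothetical. It illustrates the textbook result that, for four basis states, a single Grover iteration concentrates all probability on the marked state.

```python
import math


def grover_two_qubit(marked):
    """Return measurement probabilities after one Grover iteration.

    `marked` is the index (0..3) of the basis state the oracle flags.
    """
    n = 4
    # Hadamard gates on |00> give a uniform superposition.
    state = [1 / math.sqrt(n)] * n
    # Oracle: flip the sign of the marked amplitude.
    state[marked] = -state[marked]
    # Diffusion: reflect every amplitude about the mean amplitude.
    mean = sum(state) / n
    state = [2 * mean - amplitude for amplitude in state]
    return [amplitude * amplitude for amplitude in state]
```

    With amplitudes (1/2, 1/2, 1/2, -1/2) the mean is 1/4, so the reflection sends the unmarked amplitudes to 0 and the marked one to 1. On real hardware, decoherence and gate errors reduce this success probability.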

  9. Real time monitoring of electron processors

    International Nuclear Information System (INIS)

    Nablo, S.V.; Kneeland, D.R.; McLaughlin, W.L.

    1995-01-01

    A real time radiation monitor (RTRM) has been developed for monitoring the dose rate (current density) of electron beam processors. The system provides continuous monitoring of processor output, electron beam uniformity, and an independent measure of operating voltage or electron energy. In view of the device's ability to replace labor-intensive dosimetry in verification of machine performance on a real-time basis, its application to providing archival performance data for in-line processing is discussed. (author)

  10. Point and track-finding processors for multiwire chambers

    CERN Document Server

    Hansroul, M

    1973-01-01

    The hardware processors described below are designed to be used in conjunction with multi-wire chambers. They have the characteristic of being based on computational methods in contrast to analogue procedures. In a sense, they are hardware implementations of computer programs. But, being specially designed for their purpose, they are free of the restrictions imposed by the architecture of the computer on which the equivalent program is to run. The parallelism inherent in the algorithms can thus be fully exploited. Combined with the use of fast access scratch-pad memories and the non-sequential nature of the control program, the parallelism accounts for the fact that these processors are expected to execute 2-3 orders of magnitude faster than the equivalent Fortran programs on a CDC 7600 or 6600. As a consequence, methods which are simple and straightforward, but which are impractical because they require an exorbitant amount of computer time can on the contrary be very attractive for hardware implementation. ...

  11. Consumer Electronics Processors for Critical Real-Time Systems: a (Failed) Practical Experience

    OpenAIRE

    Fernandez, Gabriel; Cazorla, Francisco; Abella, Jaume

    2018-01-01

    The convergence between consumer electronics and critical real-time markets has increased the need for hardware platforms able to deliver high performance as well as high (sustainable) performance guarantees. Using the ARM big.LITTLE architecture as example of those platforms, in this paper we report our experience with one of its implementations (the Qualcomm SnapDragon 810 processor) to derive performance bounds with measurement-based techniques. Our theoretical and ...

  12. Multi-Threaded Dense Linear Algebra Libraries for Low-Power Asymmetric Multicore Processors

    OpenAIRE

    Catalán, Sandra; Herrero, José R.; Igual, Francisco D.; Rodríguez-Sánchez, Rafael; Quintana-Ortí, Enrique S.

    2015-01-01

    Dense linear algebra libraries, such as BLAS and LAPACK, provide a relevant collection of numerical tools for many scientific and engineering applications. While there exist high performance implementations of the BLAS (and LAPACK) functionality for many current multi-threaded architectures, the adaption of these libraries for asymmetric multicore processors (AMPs) is still pending. In this paper we address this challenge by developing an asymmetry-aware implementation of the BLAS, based on the...

  13. Benchmarking Data Analysis and Machine Learning Applications on the Intel KNL Many-Core Processor

    OpenAIRE

    Byun, Chansup; Kepner, Jeremy; Arcand, William; Bestor, David; Bergeron, Bill; Gadepally, Vijay; Houle, Michael; Hubbell, Matthew; Jones, Michael; Klein, Anna; Michaleas, Peter; Milechin, Lauren; Mullen, Julie; Prout, Andrew; Rosa, Antonio

    2017-01-01

    Knights Landing (KNL) is the code name for the second-generation Intel Xeon Phi product family. KNL has generated significant interest in the data analysis and machine learning communities because its new many-core architecture targets both of these workloads. The KNL many-core vector processor design enables it to exploit much higher levels of parallelism. At the Lincoln Laboratory Supercomputing Center (LLSC), the majority of users are running data analysis applications such as MATLAB and O...

  14. MAP3D: a media processor approach for high-end 3D graphics

    Science.gov (United States)

    Darsa, Lucia; Stadnicki, Steven; Basoglu, Chris

    1999-12-01

    Equator Technologies, Inc. has used a software-first approach to produce several programmable and advanced VLIW processor architectures that have the flexibility to run both traditional systems tasks and an array of media-rich applications. For example, Equator's MAP1000A is the world's fastest single-chip programmable signal and image processor targeted for digital consumer and office automation markets. The Equator MAP3D is a proposal for the architecture of the next generation of the Equator MAP family. The MAP3D is designed to achieve high-end 3D performance and a variety of customizable special effects by combining special graphics features with high performance floating-point and media processor architecture. As a programmable media processor, it offers the advantages of a completely configurable 3D pipeline, allowing developers to experiment with different algorithms and to tailor their pipeline to achieve the highest performance for a particular application. With the support of Equator's advanced C compiler and toolkit, MAP3D programs can be written in a high-level language. This allows the compiler to successfully find and exploit any parallelism in a programmer's code, thus decreasing the time to market of a given application. The ability to run an operating system makes it possible to run concurrent applications in the MAP3D chip, such as video decoding while executing the 3D pipelines, so that integration of applications is easily achieved, for instance by using real-time decoded imagery for texturing 3D objects. This novel architecture enables an affordable, integrated solution for high performance 3D graphics.

  15. An Evaluation of an Ada Implementation of the Rete Algorithm for Embedded Flight Processors

    Science.gov (United States)

    1990-12-01

    computers was desired. The VAX VMS operating system has many built-in methods for determining program performance (including VAX PCA), but these methods... overview of the target environment, the MIL-STD-1750A VHSIC Avionic Modular Processor (VAMP), running under the Ada Avionics Real-Time Software (AARTS... computers. MIL-STD-1750A, the Air Force’s standard flight computer architecture, however, places severe constraints on applications software processing

  16. Vector and parallel processors in computational science. Proceedings

    Energy Technology Data Exchange (ETDEWEB)

    Duff, I S; Reid, J K

    1985-01-01

    This volume contains papers from most of the invited talks and from several of the contributed talks and poster sessions presented at VAPP II. The contents present an extensive coverage of all important aspects of vector and parallel processors, including hardware, languages, numerical algorithms and applications. The topics covered include descriptions of new machines (both research and commercial machines), languages and software aids, and general discussions of whole classes of machines and their uses. Numerical methods papers include Monte Carlo algorithms, iterative and direct methods for solving large systems, finite elements, optimization, random number generation and mathematical software. The specific applications covered include neutron diffusion calculations, molecular dynamics, weather forecasting, lattice gauge calculations, fluid dynamics, flight simulation, cartography, image processing and cryptography. Most machines and architecture types are being used for these applications. many refs.

  17. Airborne ocean water lidar (OWL) real time processor (RTP)

    Science.gov (United States)

    Hryszko, M.

    1995-03-01

    The Hyperflo Real Time Processor (RTP) was developed by Pacific-Sierra Research Corporation as a part of the Naval Air Warfare Center's Ocean Water Lidar (OWL) system. The RTP was used for real time support of open ocean field tests at Barbers Point, Hawaii, in March 1993 (EMERALD I field test), and Jacksonville, Florida, in July 1994 (EMERALD I field test). This report describes the system configuration and the accomplishments associated with the preparation and execution of these exercises. This document is intended to supplement the overall test reports and provide insight into the development and use of the RTP. A secondary objective is to provide basic information on the capabilities, versatility and expandability of the Hyperflo RTP for possible future projects. It is assumed herein that the reader has knowledge of the OWL system, field test operations, general lidar processing methods, and basic computer architecture.

  18. Robotic architectures

    CSIR Research Space (South Africa)

    Mtshali, M

    2010-01-01

    In the development of mobile robotic systems, a robotic architecture plays a crucial role in interconnecting all the sub-systems and controlling the system. The design of robotic architectures for mobile autonomous robots is a challenging...

  19. Efficient Programming for Multicore Processor Heterogeneity: OpenMP versus OmpSs

    OpenAIRE

    Butko, Anastasiia; Bruguier, Florent; Gamatié, Abdoulaye; Sassatelli, Gilles

    2017-01-01

    ARM single-ISA heterogeneous multicore processors combine high-performance big cores with power-efficient small cores. They aim at achieving a suitable balance between performance and energy. However, a main challenge is to program such architectures so as to efficiently exploit their features. In this paper, we study the impact on performance and energy trade-offs of single-ISA architecture according to OpenMP 3.0 and the OmpSs programming models. We consider differ...

  20. TMS320C25 Digital Signal Processor For 2-Dimensional Fast Fourier Transform Computation

    International Nuclear Information System (INIS)

    Ardisasmita, M. Syamsa

    1996-01-01

    The Fourier transform is one of the most important mathematical tools in signal processing and analysis; it converts information from the time/spatial domain into the frequency domain. Even with implementation of the Fast Fourier Transform algorithms in imaging data, the discrete Fourier transform execution consumes a lot of time. Digital signal processors are designed specifically to perform computation-intensive digital signal processing algorithms. By taking advantage of the advanced architecture, parallel processing, and dedicated digital signal processing (DSP) instruction sets, this device can execute millions of DSP operations per second. The device architecture, characteristics and features suitable for fast Fourier transform application and speed-up are discussed.
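
    A two-dimensional transform is commonly computed by row-column decomposition: a one-dimensional FFT over every row, followed by a one-dimensional FFT over every column of the result. The sketch below shows that structure in plain Python, assuming power-of-two dimensions. It is not TMS320C25 code and makes no claim about how the paper organizes the computation; the function names are hypothetical.

```python
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft2(image):
    """2-D FFT of a list of equal-length rows via row-column decomposition."""
    rows = [fft(list(row)) for row in image]         # transform each row
    cols = [fft(list(col)) for col in zip(*rows)]    # transform each column
    return [list(row) for row in zip(*cols)]         # transpose back
```

    The row transforms are independent of one another, as are the column transforms, which is what makes this decomposition a natural fit for processors with parallel or pipelined multiply-accumulate hardware.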

  1. High-performance reconfigurable hardware architecture for restricted Boltzmann machines.

    Science.gov (United States)

    Ly, Daniel Le; Chow, Paul

    2010-11-01

    Despite the popularity and success of neural networks in research, the number of resulting commercial or industrial applications has been limited. A primary cause for this lack of adoption is that neural networks are usually implemented as software running on general-purpose processors. Hence, a hardware implementation that can exploit the inherent parallelism in neural networks is desired. This paper investigates how the restricted Boltzmann machine (RBM), which is a popular type of neural network, can be mapped to a high-performance hardware architecture on field-programmable gate array (FPGA) platforms. The proposed modular framework is designed to reduce the time complexity of the computations through heavily customized hardware engines. A method to partition large RBMs into smaller congruent components is also presented, allowing the distribution of one RBM across multiple FPGA resources. The framework is tested on a platform of four Xilinx Virtex II-Pro XC2VP70 FPGAs running at 100 MHz through a variety of different configurations. The maximum performance was obtained by instantiating an RBM of 256 × 256 nodes distributed across four FPGAs, which resulted in a computational speed of 3.13 billion connection-updates-per-second and a speedup of 145-fold over an optimized C program running on a 2.8-GHz Intel processor.

  2. Negative base encoding in optical linear algebra processors

    Science.gov (United States)

    Perlee, C.; Casasent, D.

    1986-01-01

    In the digital multiplication by analog convolution algorithm, the bits of two encoded numbers are convolved to form the product of the two numbers in mixed binary representation; this output can be easily converted to binary. Attention is presently given to negative base encoding, treating base -2 initially, and then showing that the negative base system can be readily extended to any radix. In general, negative base encoding in optical linear algebra processors represents a more efficient technique than either sign magnitude or 2's complement encoding, when the additions of digitally encoded products are performed in parallel.
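
    The digital-multiplication-by-convolution idea can be checked numerically. The sketch below encodes integers in base -2, convolves the two digit sequences, and evaluates the resulting mixed representation (digits may exceed 1) back to an integer. It is a software illustration only; the optical implementation and the function names used here are not taken from the paper.

```python
def to_negabinary(n):
    """Digits of integer n in base -2, least significant digit first."""
    if n == 0:
        return [0]
    digits = []
    while n != 0:
        n, r = divmod(n, -2)
        if r < 0:  # force the remainder into {0, 1}
            n, r = n + 1, r + 2
        digits.append(r)
    return digits


def convolve(a, b):
    """Discrete convolution of two digit sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def from_mixed(digits, base=-2):
    """Evaluate a digit sequence whose entries may exceed base-1 in magnitude."""
    return sum(d * base ** i for i, d in enumerate(digits))


def multiply(x, y):
    """Multiply two signed integers by convolving their base -2 digits."""
    return from_mixed(convolve(to_negabinary(x), to_negabinary(y)))
```

    Because base -2 represents positive and negative integers with digits 0 and 1 and no separate sign, the same convolution handles all sign combinations, which is the efficiency argument made against sign-magnitude and 2's complement encoding.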

  3. Architecture & Environment

    Science.gov (United States)

    Erickson, Mary; Delahunt, Michael

    2010-01-01

    Most art teachers would agree that architecture is an important form of visual art, but they do not always include it in their curriculums. In this article, the authors share core ideas from "Architecture and Environment," a teaching resource that they developed out of a long-term interest in teaching architecture and their fascination with the…

  4. System on chip module configured for event-driven architecture

    Science.gov (United States)

    Robbins, Kevin; Brady, Charles E.; Ashlock, Tad A.

    2017-10-17

    A system on chip (SoC) module is described herein, wherein the SoC module comprises a processor subsystem and a hardware logic subsystem. The processor subsystem and hardware logic subsystem are in communication with one another, and transmit event messages between one another. The processor subsystem executes software actors, while the hardware logic subsystem includes hardware actors. The software actors and hardware actors conform to an event-driven architecture, such that the software actors receive and generate event messages and the hardware actors receive and generate event messages.
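
    As a rough software analogy of the event-driven arrangement described above, the sketch below lets actors receive event messages and generate new ones through a shared FIFO. Everything here (the `Actor` class, the `run` dispatcher, the event names) is hypothetical and is not drawn from the patent; in particular, the "hardware" actor is just another Python object.

```python
from collections import deque


class Actor:
    """Receives event messages and may generate new ones."""

    def __init__(self, name, handler):
        self.name = name
        # handler(event) returns a list of (target_name, event) pairs
        self.handler = handler


def run(actors, initial, max_events=100):
    """Deliver events in FIFO order until none are pending."""
    by_name = {actor.name: actor for actor in actors}
    pending = deque(initial)
    log = []
    while pending and len(log) < max_events:
        target, event = pending.popleft()
        log.append((target, event))
        pending.extend(by_name[target].handler(event))
    return log


# Hypothetical pair: a software actor requests work, a hardware actor replies.
software = Actor("sw", lambda ev: [("hw", "compute")] if ev == "start" else [])
hardware = Actor("hw", lambda ev: [("sw", "done")] if ev == "compute" else [])
trace = run([software, hardware], [("sw", "start")])
```

    Both kinds of actor share one message interface, so the dispatcher does not need to know which side of the software/hardware boundary a recipient sits on.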

  5. Reframing information architecture

    CERN Document Server

    Resmini, Andrea

    2014-01-01

    Information architecture has changed dramatically since the mid-1990s and earlier conceptions of the world and the internet being different and separate have given way to a much more complex scenario in the present day. In the post-digital world that we now inhabit the digital and the physical blend easily and our activities and usage of information take place through multiple contexts and via multiple devices and unstable, emergent choreographies. Information architecture now is steadily growing into a channel- or medium-specific multi-disciplinary framework, with contributions coming from a

  6. Homogeneous and Heterogeneous MPSoC Architectures with Network-On-Chip Connectivity for Low-Power and Real-Time Multimedia Signal Processing

    Directory of Open Access Journals (Sweden)

    Sergio Saponara

    2012-01-01

    Two multiprocessor system-on-chip (MPSoC) architectures are proposed and compared in the paper with reference to audio and video processing applications. One architecture exploits a homogeneous topology; it consists of 8 identical tiles, each made of a 32-bit RISC core enhanced by a 64-bit DSP coprocessor with local memory. The other MPSoC architecture exploits a heterogeneous-tile topology with on-chip distributed memory resources; the tiles act as application specific processors supporting a different class of algorithms. In both architectures, the multiple tiles are interconnected by a network-on-chip (NoC) infrastructure, through network interfaces and routers, which allows parallel operations of the multiple tiles. The functional performances and the implementation complexity of the NoC-based MPSoC architectures are assessed by synthesis results in submicron CMOS technology. Among the large set of supported algorithms, two case studies are considered: the real-time implementation of an H.264/MPEG AVC video codec and of a low-distortion digital audio amplifier. The heterogeneous architecture ensures a higher power efficiency and a smaller area occupation and is more suited for low-power multimedia processing, such as in mobile devices. The homogeneous scheme allows for a higher flexibility and easier system scalability and is more suited for general-purpose DSP tasks in power-supplied devices.

  7. Performances of multiprocessor multidisk architectures for continuous media storage

    Science.gov (United States)

    Gennart, Benoit A.; Messerli, Vincent; Hersch, Roger D.

    1996-03-01

    Multimedia interfaces increase the need for large image databases, capable of storing and reading streams of data with strict synchronicity and isochronicity requirements. In order to fulfill these requirements, we consider a parallel image server architecture which relies on arrays of intelligent disk nodes, each disk node being composed of one processor and one or more disks. This contribution analyzes through bottleneck performance evaluation and simulation the behavior of two multi-processor multi-disk architectures: a point-to-point architecture and a shared-bus architecture similar to current multiprocessor workstation architectures. We compare the two architectures on the basis of two multimedia algorithms: the compute-bound frame resizing by resampling and the data-bound disk-to-client stream transfer. The results suggest that the shared bus is a potential bottleneck despite its very high hardware throughput (400 Mbytes/s) and that an architecture with addressable local memories located closely to their respective processors could partially remove this bottleneck. The point-to-point architecture is scalable and able to sustain high throughputs for simultaneous compute-bound and data-bound operations.

  8. Supercomputers and parallel computation. Based on the proceedings of a workshop on progress in the use of vector and array processors organised by the Institute of Mathematics and its Applications and held in Bristol, 2-3 September 1982

    International Nuclear Information System (INIS)

    Paddon, D.J.

    1984-01-01

    This book is based on the proceedings of a conference on parallel computing held in 1982. There are 18 papers which cover the following topics: VLSI parallel architectures, the theory of parallel computing and vector and array processor computing. One paper on 'Tough Problems in Reactor Design' is indexed separately. All the contributions are on research done in the United Kingdom. Although much of the experience in array processor computing is associated with the ICL distributed array processor (DAP) and this is reflected in the contributions, the research relating to the ICL DAP is relevant to all types of array processors. (UK)

  9. CERN Technical Training: Signal Processor

    CERN Multimedia

    HR Department

    2009-01-01

    A new training course is going to be held at CERN on the ADSP SHARC Family. The "System Development and Programming with the Analog Devices’ SHARC Family" course is a 3.5-day hands-on training on Analog Devices SHARC DSPs, focusing on the latest ‘368/9 and 37x families. General DSP architecture, available peripherals, the boot process and DSP code development will be covered. Hardware tools, debugging and hardware design guidelines will be introduced as well. The course is designed for System Designers needing to make informed decisions on design tradeoffs, Hardware Designers needing to develop external interfaces, and Code Developers needing to know how to get the highest performance from their algorithms. The course will take place, in English, from 31 March to 4 April in the CERN Technical Training Center. A few places are still available. Registration is open on the Technical Training page. More information on our catalogue: http://cta.cern.ch/cta2/f?p=110:9 or contact us with your que...

  10. METRIC context unit architecture

    Energy Technology Data Exchange (ETDEWEB)

    Simpson, R.O.

    1988-01-01

    METRIC is an architecture for a simple but powerful Reduced Instruction Set Computer (RISC). Its speed comes from the simultaneous processing of several instruction streams, with instructions from the various streams being dispatched into METRIC's execution pipeline as they become available for execution. The pipeline is thus kept full, with a mix of instructions for several contexts in execution at the same time. True parallel programming is supported within a single execution unit, the METRIC Context Unit. METRIC's architecture provides for expansion through the addition of multiple Context Units and of specialized Functional Units. The architecture thus spans a range of size and performance from a single-chip microcomputer up through large and powerful multiprocessors. This research concentrates on the specification of the METRIC Context Unit at the architectural level. Performance tradeoffs made during METRIC's design are discussed, and projections of METRIC's performance are made based on simulation studies.

  11. Architectural heritage or theme park

    Directory of Open Access Journals (Sweden)

    Ignasi Solà-Morales

    1998-04-01

    The growing parallelism between the perception and the consumer use of theme parks and architectural heritage gives rise to a reflection about the fact that the architectural object has been turned into a museum piece, stripped of its original value and its initial cultural substance to become images exposed to multiple gazes, thus producing what the author calls the "Theme Park effect", with consequences on protected architecture.

  12. Embedded processor extensions for image processing

    Science.gov (United States)

    Thevenin, Mathieu; Paindavoine, Michel; Letellier, Laurent; Heyrman, Barthélémy

    2008-04-01

    The advent of camera phones marks a new phase in embedded camera sales. By late 2009, the total number of camera phones will exceed that of both conventional and digital cameras shipped since the invention of photography. Use in mobile phones of applications like visiophony, matrix code readers and biometrics requires a high degree of component flexibility that image processors (IPs) have not, to date, been able to provide. For all these reasons, programmable processor solutions have become essential. This paper presents several techniques geared to speeding up image processors. It demonstrates that a twofold gain is possible for the complete image acquisition chain and the enhancement pipeline downstream of the video sensor. Such results confirm the potential of these computing systems for supporting future applications.

  13. Development methods for VLSI-processors

    International Nuclear Information System (INIS)

    Horninger, K.; Sandweg, G.

    1982-01-01

    The aim of this project, which was originally planned for 3 years, was the development of modern system and circuit concepts for VLSI-processors having a 32 bit wide data path. The result of this first year's work is the concept of a general purpose processor. This processor is not only logically but also physically (on the chip) divided into four functional units: a microprogrammable instruction unit, an execution unit in slice technique, a fully associative cache memory and an I/O unit. For the ALU of the execution unit, circuits in PLA and slice techniques have been realized. On the basis of regularity, area consumption and achievable performance the slice technique has been preferred. The designs utilize selftesting circuitry. (orig.) [de

  14. The ATLAS Level-1 Muon to Central Trigger Processor Interface

    CERN Document Server

    Berge, D; Farthouat, P; Haas, S; Klofver, P; Krasznahorkay, A; Messina, A; Pauly, T; Schuler, G; Spiwoks, R; Wengler, T; PH-EP

    2007-01-01

    The Muon to Central Trigger Processor Interface (MUCTPI) is part of the ATLAS Level-1 trigger system and connects the output of the muon trigger system to the Central Trigger Processor (CTP). At every bunch crossing (BC), the MUCTPI receives information on muon candidates from each of the 208 muon trigger sectors and calculates the total multiplicity for each of six transverse momentum (pT) thresholds. This multiplicity value is then sent to the CTP, where it is used together with the input from the Calorimeter trigger to make the final Level-1 Accept (L1A) decision. In addition, the MUCTPI provides summary information to the Level-2 trigger and to the data acquisition (DAQ) system for events selected at Level-1. This information is used to define the regions of interest (RoIs) that drive the Level-2 muon trigger processing. The MUCTPI system consists of a 9U VME chassis with a dedicated active backplane and 18 custom designed modules. The design of the modules is based on state-of-the-art FPGA devices and special ...

  15. Discovering Motifs in Biological Sequences Using the Micron Automata Processor.

    Science.gov (United States)

    Roy, Indranil; Aluru, Srinivas

    2016-01-01

    Finding approximately conserved sequences, called motifs, across multiple DNA or protein sequences is an important problem in computational biology. In this paper, we consider the (l, d) motif search problem of identifying one or more motifs of length l present in at least q of the n given sequences, with each occurrence differing from the motif in at most d substitutions. The problem is known to be NP-complete, and the largest solved instance reported to date is (26,11). We propose a novel algorithm for the (l, d) motif search problem using streaming execution over a large set of non-deterministic finite automata (NFA). This solution is designed to take advantage of the Micron Automata Processor, a new technology close to deployment that can simultaneously execute multiple NFA in parallel. We demonstrate the capability for solving much larger instances of the (l, d) motif search problem using the resources available within a single automata processor board, by estimating run-times for problem instances (39,18) and (40,17). The paper serves as a useful guide to solving problems using this new accelerator technology.
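
    The (l, d) motif search problem defined in this abstract can be illustrated with a small exhaustive search. The sketch below is not the NFA-based algorithm of the paper; it simply enumerates every candidate string of length l and is only practical for very small l. The function names and the example sequences are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs(motif, seq, d):
    """True if some window of seq is within d substitutions of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def motif_search(seqs, l, d, q, alphabet="ACGT"):
    """Return every length-l string that occurs, with at most d
    substitutions, in at least q of the sequences.

    Exhaustive enumeration: the cost grows as len(alphabet) ** l.
    """
    found = []
    for cand in product(alphabet, repeat=l):
        motif = "".join(cand)
        hits = sum(occurs(motif, s, d) for s in seqs)
        if hits >= q:
            found.append(motif)
    return found


# Hypothetical toy input, chosen only to exercise the definition.
seqs = ["ACGTAC", "TTACGA", "GGACTT"]
exact_in_all = motif_search(seqs, l=3, d=0, q=3)
approx_in_all = motif_search(seqs, l=3, d=1, q=3)
```

    With these toy sequences no 3-mer occurs exactly in all three, whereas allowing one substitution admits candidates such as ACG. The exponential growth in l is what makes instances like (26,11) hard for a direct search of this kind.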

  16. Implicit Unstructured Aerodynamics on Emerging Multi- and Many-Core HPC Architectures

    KAUST Repository

    Al Farhan, Mohammed A.

    2017-03-13

    Shared memory parallelization of PETSc-FUN3D, an unstructured tetrahedral mesh Euler code previously characterized for distributed memory Single Program, Multiple Data (SPMD) for thousands of nodes, is hybridized with shared memory Single Instruction, Multiple Data (SIMD) for hundreds of threads per node. We explore thread-level performance optimizations on state-of-the-art multi- and many-core Intel processors, including the second generation of Xeon Phi, Knights Landing (KNL). We study the performance on the KNL with different configurations of memory and cluster modes, with code optimizations to minimize indirect addressing and enhance the cache locality. The optimizations employed are expected to be of value to other unstructured applications as many-core architecture evolves.

  17. Multi-threaded ATLAS simulation on Intel Knights Landing processors

    CERN Document Server

    AUTHOR|(INSPIRE)INSPIRE-00014247; The ATLAS collaboration; Calafiura, Paolo; Leggett, Charles; Tsulaia, Vakhtang; Dotti, Andrea

    2017-01-01

    The Knights Landing (KNL) release of the Intel Many Integrated Core (MIC) Xeon Phi line of processors is a potential game changer for HEP computing. With 72 cores and deep vector registers, the KNL cards promise significant performance benefits for highly-parallel, compute-heavy applications. Cori, the newest supercomputer at the National Energy Research Scientific Computing Center (NERSC), was delivered to its users in two phases with the first phase online at the end of 2015 and the second phase now online at the end of 2016. Cori Phase 2 is based on the KNL architecture and contains over 9000 compute nodes with 96GB DDR4 memory. ATLAS simulation with the multithreaded Athena Framework (AthenaMT) is a good potential use-case for the KNL architecture and supercomputers like Cori. ATLAS simulation jobs have a high ratio of CPU computation to disk I/O and have been shown to scale well in multi-threading and across many nodes. In this paper we will give an overview of the ATLAS simulation application with detai...

  18. Multi-threaded ATLAS Simulation on Intel Knights Landing Processors

    CERN Document Server

    Farrell, Steven; The ATLAS collaboration; Calafiura, Paolo; Leggett, Charles

    2016-01-01

    The Knights Landing (KNL) release of the Intel Many Integrated Core (MIC) Xeon Phi line of processors is a potential game changer for HEP computing. With 72 cores and deep vector registers, the KNL cards promise significant performance benefits for highly-parallel, compute-heavy applications. Cori, the newest supercomputer at the National Energy Research Scientific Computing Center (NERSC), will be delivered to its users in two phases with the first phase online now and the second phase expected in mid-2016. Cori Phase 2 will be based on the KNL architecture and will contain over 9000 compute nodes with 96GB DDR4 memory. ATLAS simulation with the multithreaded Athena Framework (AthenaMT) is a great use-case for the KNL architecture and supercomputers like Cori. Simulation jobs have a high ratio of CPU computation to disk I/O and have been shown to scale well in multi-threading and across many nodes. In this presentation we will give an overview of the ATLAS simulation application with details on its multi-thr...

  19. Algorithms for computational fluid dynamics on parallel processors

    International Nuclear Information System (INIS)

    Van de Velde, E.F.

    1986-01-01

    A study of parallel algorithms for the numerical solution of partial differential equations arising in computational fluid dynamics is presented. The actual implementation on parallel processors of shared and nonshared memory design is discussed. The performance of these algorithms is analyzed in terms of machine efficiency, communication time, bottlenecks and software development costs. For elliptic equations, a parallel preconditioned conjugate gradient method is described, which has been used to solve pressure equations discretized with high order finite elements on irregular grids. A parallel full multigrid method and a parallel fast Poisson solver are also presented. Hyperbolic conservation laws were discretized with parallel versions of finite difference methods like the Lax-Wendroff scheme and with the Random Choice method. Techniques are developed for comparing the behavior of an algorithm on different architectures as a function of problem size and local computational effort. Effective use of these advanced architecture machines requires the use of machine dependent programming. It is shown that the portability problems can be minimized by introducing high level operations on vectors and matrices structured into program libraries
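
    For readers unfamiliar with the preconditioned conjugate gradient method mentioned in this abstract, a minimal serial sketch follows. It assumes a small dense symmetric positive-definite matrix and a Jacobi (diagonal) preconditioner; the abstract does not say which preconditioner or data layout the author used, and the parallel aspects are not modeled. All names are illustrative.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A.

    Uses a Jacobi preconditioner (an assumption made for this sketch):
    the preconditioned residual is z = r / diag(A).
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                                  # residual for x = 0
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    z = [d * ri for d, ri in zip(inv_diag, r)]
    p = list(z)                                  # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:               # converged
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)                  # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz                       # direction update factor
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Hypothetical 2x2 test system with exact solution (1/11, 7/11).
solution = pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

    The matrix-vector product and the inner products are the operations that a parallel implementation would distribute across processors, which is consistent with the abstract's suggestion to build on high level vector and matrix operations.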

  20. Parallel processor for fast event analysis

    International Nuclear Information System (INIS)

    Hensley, D.C.

    1983-01-01

    Current maximum data rates from the Spin Spectrometer of approx. 5000 events/s (up to 1.3 MBytes/s) and minimum analysis requiring at least 3000 operations/event require a CPU cycle time near 70 ns. In order to achieve an effective cycle time of 70 ns, a parallel processing device is proposed where up to 4 independent processors will be implemented in parallel. The individual processors are designed around the Am2910 Microsequencer, the Am29116 μP, and the Am29517 Multiplier. Satellite histogramming in a mass memory system will be managed by a commercial 16-bit μP system
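
    The cycle-time figure quoted in this abstract can be checked with a short calculation. The sketch below assumes one operation per CPU cycle and perfect scaling across the parallel processors; neither assumption is stated in the abstract, and the function names are illustrative.

```python
def required_cycle_time_ns(events_per_s, ops_per_event):
    """Cycle time needed to keep up, assuming one operation per cycle."""
    ops_per_s = events_per_s * ops_per_event
    return 1e9 / ops_per_s


def per_processor_cycle_time_ns(effective_ns, n_processors):
    """Cycle time each processor may have if n processors share the load.

    Assumes ideal scaling, i.e. no synchronization or memory contention.
    """
    return effective_ns * n_processors


def bytes_per_event(bytes_per_s, events_per_s):
    """Average event size implied by the quoted data and event rates."""
    return bytes_per_s / events_per_s


needed_ns = required_cycle_time_ns(5000, 3000)      # about 66.7 ns
relaxed_ns = per_processor_cycle_time_ns(70, 4)     # 280 ns per processor
event_size = bytes_per_event(1.3e6, 5000)           # about 260 bytes
```

    Under these assumptions, 5000 events/s at 3000 operations/event is 15 million operations per second, or roughly 67 ns per operation, which matches the "near 70 ns" in the abstract. Four ideally parallel processors could then each run with a cycle time of about 280 ns.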

  1. Time Manager Software for a Flight Processor

    Science.gov (United States)

    Zoerne, Roger

    2012-01-01

    Data analysis is a process of inspecting, cleaning, transforming, and modeling data to highlight useful information and suggest conclusions. Accurate timestamps and a timeline of vehicle events are needed to analyze flight data. By moving the timekeeping to the flight processor, there is no longer a need for a redundant time source. If each flight processor is initially synchronized to GPS, they can freewheel and maintain a fairly accurate time throughout the flight with no additional GPS time messages received. However, additional GPS time messages will ensure an even greater accuracy. When a timestamp is required, a gettime function is called that immediately reads the time-base register.
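
    The freewheeling scheme described in this abstract can be sketched as follows. This is an illustration only, not the flight software: the class name, the tick frequency, and the way the time-base register value is passed in are all assumptions, and a real implementation would read a hardware register rather than take the count as an argument.

```python
class FlightClock:
    """Keeps time from a free-running tick counter after one GPS sync.

    Hypothetical model: tick_hz is the assumed frequency of the
    time-base register, and callers supply the register value.
    """

    def __init__(self, tick_hz):
        self.tick_hz = tick_hz
        self.sync_gps_s = None      # GPS time at the last sync, in seconds
        self.sync_ticks = None      # time-base count at the last sync

    def sync(self, gps_seconds, timebase_ticks):
        """Record a GPS time message together with the current tick count."""
        self.sync_gps_s = gps_seconds
        self.sync_ticks = timebase_ticks

    def gettime(self, timebase_ticks):
        """Convert a time-base reading into GPS-referenced seconds."""
        if self.sync_ticks is None:
            raise RuntimeError("clock has not been synchronized to GPS")
        elapsed_ticks = timebase_ticks - self.sync_ticks
        return self.sync_gps_s + elapsed_ticks / self.tick_hz


clock = FlightClock(tick_hz=1_000_000)          # assumed 1 MHz time base
clock.sync(gps_seconds=100.0, timebase_ticks=5_000_000)
stamp = clock.gettime(timebase_ticks=7_500_000)
```

    Each later GPS message would simply call sync again, which resets the accumulated oscillator drift; this is one way to read the abstract's remark that additional GPS messages improve accuracy.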

  2. Temporal analysis and scheduling of hard real-time radios running on a multi-processor

    NARCIS (Netherlands)

    Moreira, O.

    2012-01-01

    On a multi-radio baseband system, multiple independent transceivers must share the resources of a multi-processor, while meeting each its own hard real-time requirements. Not all possible combinations of transceivers are known at compile time, so a solution must be found that either allows for

  3. Some questions of using the algebraic coding theory for construction of special-purpose processors in high energy physics spectrometers

    International Nuclear Information System (INIS)

    Nikityuk, N.M.

    1989-01-01

    The results of investigations of using the algebraic coding theory for the creation of parallel encoders, majority coincidence schemes and coordinate processors for the first and second trigger levels are described. Concrete examples of calculation and structure of special-purpose processor using the table arithmetic method are given for multiplicity t ≤ 5. The question of using parallel and sequential syndrome coding methods for the registration of events with clusters is discussed. 30 refs.; 10 figs

  4. Comparison of Processor Performance of SPECint2006 Benchmarks of some Intel Xeon Processors

    Directory of Open Access Journals (Sweden)

    Abdul Kareem PARCHUR

    2012-08-01

    High performance is a critical requirement for all microprocessor manufacturers. The present paper describes the comparison of performance in two main Intel Xeon series processors (Type A: Intel Xeon X5260, X5460, E5450 and L5320; Type B: Intel Xeon X5140, 5130, 5120 and E5310). The microarchitecture of these processors is implemented on the basis of a new family of processors from Intel starting with the Pentium 4 processor. These processors can provide a performance boost for many key application areas in the modern generation. The scaling of performance in these two series of Intel Xeon processors has been analyzed using the performance numbers of 12 CPU2006 integer benchmarks, numbers that exhibit significant differences in performance. The results and analysis can be used by performance engineers, scientists and developers to better understand the performance scaling in modern generation processors.

  5. Simulation of a parallel processor on a serial processor: The neutron diffusion equation

    International Nuclear Information System (INIS)

    Honeck, H.C.

    1981-01-01

    Parallel processors could provide the nuclear industry with very high computing power at a very moderate cost. Will we be able to make effective use of this power? This paper explores the use of a very simple parallel processor for solving the neutron diffusion equation to predict power distributions in a nuclear reactor. We first describe a simple parallel processor and estimate its theoretical performance based on the current hardware technology. Next, we show how the parallel processor could be used to solve the neutron diffusion equation. We then present the results of some simulations of a parallel processor run on a serial processor and measure some of the expected inefficiencies. Finally we extrapolate the results to estimate how actual design codes would perform. We find that the standard numerical methods for solving the neutron diffusion equation are still applicable when used on a parallel processor. However, some simple modifications to these methods will be necessary if we are to achieve the full power of these new computers. (orig.) [de

  6. Real-time stereo matching architecture based on 2D MRF model: a memory-efficient systolic array

    Directory of Open Access Journals (Sweden)

    Park Sungchan

    2011-01-01

    There is a growing need in computer vision applications for stereopsis, requiring not only accurate distance but also fast and compact physical implementation. Global energy minimization techniques provide remarkably precise results, but they suffer from huge computational complexity. One of the main challenges is to parallelize the iterative computation, solving the memory access problem between the big external memory and the massive processors. Remarkable memory saving can be obtained with our memory reduction scheme, and our new architecture is a systolic array. If we expand it into N chips in a cascaded manner, we can cope with various ranges of image resolutions. We have realized it using FPGA technology. Our architecture requires 19 times less memory than the global minimization technique, which is a principal step toward real-time chip implementation of various iterative image processing algorithms with tiny and distributed memory resources, such as optical flow and image restoration.

  7. Special purpose processors for high energy physics applications

    International Nuclear Information System (INIS)

    Verkerk, C.

    1978-01-01

    A review is given of hardware processors, ranging from very fast decision logic for the split field magnet facility at CERN to a point-finding processor used to relieve the data-acquisition minicomputer from the task of monitoring the SPS experiment. Block diagrams of the decision making processor, point-finding processor, coplanarity and opening angle processor and programmable track selector module are presented and discussed. The applications of a fully programmable but slower processor on the one hand, and very fast and programmable decision logic on the other hand, are given in this review

  8. Performance analysis of general purpose and digital signal processor kernels for heterogeneous systems-on-chip

    Directory of Open Access Journals (Sweden)

    T. von Sydow

    2003-01-01

    Various reasons like technology progress, flexibility demands, shortened product cycle time and shortened time to market have brought up the possibility and necessity to integrate different architecture blocks on one heterogeneous System-on-Chip (SoC). Architecture blocks like programmable processor cores (DSP- and GPP-kernels), embedded FPGAs as well as dedicated macros will be integral parts of such a SoC. Especially programmable architecture blocks and associated optimization techniques are discussed in this contribution. Design space exploration and thus the choice which architecture blocks should be integrated in a SoC is a challenging task. Crucial to this exploration is the evaluation of the application domain characteristics and the costs caused by individual architecture blocks integrated on a SoC. An ATE-cost function has been applied to examine the performance of the aforementioned programmable architecture blocks. Therefore, representative discrete devices have been analyzed. Furthermore, several architecture dependent optimization steps and their effects on the cost ratios are presented.

  9. XOP: A second generation fast processor for on-line use in high energy physics experiments

    International Nuclear Information System (INIS)

    Lingjaerde, T.

    1981-01-01

    Processors for trigger calculations and data compression in high energy physics are characterized by a high data input capability combined with fast execution of relatively simple routines. In order to achieve the required performance it is advantageous to replace the classical computer instruction-set by microcoded instructions, the various fields of which control the internal subunits in parallel. The fast processor called ESOP is based on such a principle: the different operations are handled step by step by dedicated optimized modules under control of a central instruction unit. Thus, the arithmetic operations, address calculations, conditional checking, loop counts and next instruction evaluation all overlap in time. Based upon the experience from ESOP the architecture of a new processor 'XOP' is beginning to take shape which will be faster and easier to use. In this context the most important innovations are: easy handling of operands in the arithmetic unit by means of three data buses and large data files, a powerful data addressing unit for easy handling of vectors, as well as single operands, and a very flexible logic for conditional branching. Input/output will be made transparent through the introduction of internal fast processors which will be used in conjunction with powerful firmware as a software debugging aid. (orig.)

  10. gFEX, the ATLAS Calorimeter Level-1 Real Time Processor

    CERN Document Server

    AUTHOR|(SzGeCERN)759889; The ATLAS collaboration; Begel, Michael; Chen, Hucheng; Lanni, Francesco; Takai, Helio; Wu, Weihao

    2016-01-01

    The global feature extractor (gFEX) is a component of the Level-1 Calorimeter trigger Phase-I upgrade for the ATLAS experiment. It is intended to identify patterns of energy associated with the hadronic decays of high momentum Higgs, W, & Z bosons, top quarks, and exotic particles in real time at the LHC crossing rate. The single processor board will be packaged in an Advanced Telecommunications Computing Architecture (ATCA) module and implemented as a fast reconfigurable processor based on three Xilinx Virtex UltraScale FPGAs. The board will receive coarse-granularity information from all the ATLAS calorimeters on 276 optical fibers with the data transferred at the 40 MHz Large Hadron Collider (LHC) clock frequency. The gFEX will be controlled by a single system-on-chip processor, ZYNQ, that will be used to configure all the processor Field-Programmable Gate Arrays (FPGAs), monitor board health, and interface to external signals. Now, the pre-prototype board which includes one ZYNQ and one Virtex-7 FPGA ...

  11. gFEX, the ATLAS Calorimeter Level 1 Real Time Processor

    CERN Document Server

    Tang, Shaochun; The ATLAS collaboration

    2015-01-01

    The global feature extractor (gFEX) is a component of the Level-1 Calorimeter trigger Phase-I upgrade for the ATLAS experiment. It is intended to identify patterns of energy associated with the hadronic decays of high momentum Higgs, W, & Z bosons, top quarks, and exotic particles in real time at the LHC crossing rate. The single processor board will be packaged in an Advanced Telecommunications Computing Architecture (ATCA) module and implemented as a fast reconfigurable processor based on three Xilinx UltraScale FPGAs. The board will receive coarse-granularity information from all the ATLAS calorimeters on 264 optical fibers with the data transferred at the 40 MHz LHC clock frequency. The gFEX will be controlled by a single system-on-chip processor, ZYNQ, that will be used to configure all the processor FPGAs, monitor board health, and interface to external signals. Now, the pre-prototype board which includes one ZYNQ and one Virtex-7 FPGA has been designed for testing and verification. The performance ...

  12. Computing on Knights and Kepler Architectures

    International Nuclear Information System (INIS)

    Bortolotti, G; Caberletti, M; Ferraro, A; Giacomini, F; Manzali, M; Maron, G; Salomoni, D; Crimi, G; Zanella, M

    2014-01-01

    A recent trend in scientific computing is the increasingly important role of co-processors, originally built to accelerate graphics rendering, and now used for general high-performance computing. The INFN Computing On Knights and Kepler Architectures (COKA) project focuses on assessing the suitability of co-processor boards for scientific computing in a wide range of physics applications, and on studying the best programming methodologies for these systems. Here we present in a comparative way our results in porting a Lattice Boltzmann code on two state-of-the-art accelerators: the NVIDIA K20X, and the Intel Xeon-Phi. We describe our implementations, analyze results and compare with a baseline architecture adopting Intel Sandy Bridge CPUs.

  13. Adaptive Motion Estimation Processor for Autonomous Video Devices

    Directory of Open Access Journals (Sweden)

    Dias T

    2007-01-01

    Motion estimation is the most demanding operation of a video encoder, corresponding to at least 80% of the overall computational cost. As a consequence, with the proliferation of autonomous and portable handheld devices that support digital video coding, data-adaptive motion estimation algorithms have been required to dynamically configure the search pattern not only to avoid unnecessary computations and memory accesses but also to save energy. This paper proposes an application-specific instruction set processor (ASIP) to implement data-adaptive motion estimation algorithms that is characterized by a specialized datapath and a minimal, optimized instruction set. Due to its low-power nature, this architecture is highly suitable for developing motion estimators for portable, mobile, and battery-supplied devices. Based on the proposed architecture and the considered adaptive algorithms, several motion estimators were synthesized both for a Virtex-II Pro XC2VP30 FPGA from Xilinx, integrated within an ML310 development platform, and using a StdCell library based on a 0.18 μm CMOS process. Experimental results show that the proposed architecture is able to estimate motion vectors in real time for QCIF and CIF video sequences with very low power consumption. Moreover, it is also able to adapt its operation to the available energy level at runtime. By adjusting the search pattern and setting up a more convenient operating frequency, it can change the power consumption in the interval between 1.6 mW and 15 mW.

  14. Manned/Unmanned Common Architecture Program (MCAP) net centric flight tests

    Science.gov (United States)

    Johnson, Dale

    2009-04-01

    Properly architected avionics systems can reduce the costs of periodic functional improvements, maintenance, and obsolescence. With this in mind, the U.S. Army Aviation Applied Technology Directorate (AATD) initiated the Manned/Unmanned Common Architecture Program (MCAP) in 2003 to develop an affordable, high-performance embedded mission processing architecture for potential application to multiple aviation platforms. MCAP analyzed Army helicopter and unmanned air vehicle (UAV) missions, identified supporting subsystems, surveyed advanced hardware and software technologies, and defined computational infrastructure technical requirements. The project selected a set of modular open systems standards and market-driven commercial-off-the-shelf (COTS) electronics and software, and developed experimental mission processors, network architectures, and software infrastructures supporting the integration of new capabilities, interoperability, and life cycle cost reductions. MCAP integrated the new mission processing architecture into an AH-64D Apache Longbow and participated in Future Combat Systems (FCS) network-centric operations field experiments in 2006 and 2007 at White Sands Missile Range (WSMR), New Mexico and at the Nevada Test and Training Range (NTTR) in 2008. The MCAP Apache also participated in PM C4ISR On-the-Move (OTM) Capstone Experiments 2007 (E07) and 2008 (E08) at Ft. Dix, NJ and conducted Mesa, Arizona local area flight tests in December 2005, February 2006, and June 2008.

  15. An Alternative Water Processor for Long Duration Space Missions

    Science.gov (United States)

    Barta, Daniel J.; Pickering, Karen D.; Meyer, Caitlin; Pennsinger, Stuart; Vega, Leticia; Flynn, Michael; Jackson, Andrew; Wheeler, Raymond

    2014-01-01

    A new wastewater recovery system has been developed that combines novel biological and physicochemical components for recycling wastewater on long duration human space missions. Functionally, this Alternative Water Processor (AWP) would replace the Urine Processing Assembly on the International Space Station and reduce or eliminate the need for the multi-filtration beds of the Water Processing Assembly (WPA). At its center are two unique game changing technologies: 1) a biological water processor (BWP) to mineralize organic forms of carbon and nitrogen and 2) an advanced membrane processor (Forward Osmosis Secondary Treatment) for removal of solids and inorganic ions. The AWP is designed for recycling larger quantities of wastewater from multiple sources expected during future exploration missions, including urine, hygiene (hand wash, shower, oral and shave) and laundry. The BWP utilizes a single-stage membrane-aerated biological reactor for simultaneous nitrification and denitrification. The Forward Osmosis Secondary Treatment (FOST) system uses a combination of forward osmosis (FO) and reverse osmosis (RO), is resistant to biofouling and can easily tolerate wastewaters high in non-volatile organics and solids associated with shower and/or hand washing. The BWP has been operated continuously for over 300 days. After startup, the mature biological system averaged 85% organic carbon removal and 44% nitrogen removal, close to stoichiometric maximum based on available carbon. To date, the FOST has averaged 93% water recovery, with a maximum of 98%. If the wastewater is slightly acidified, ammonia rejection is optimal. This paper will provide a description of the technology and summarize results from ground-based testing using real wastewater

  16. Cassava processors' awareness of occupational and environmental ...

    African Journals Online (AJOL)

    A larger percentage (74.5%) of the respondents indicated that the Agricultural Development Programme (ADP) is their source of information. The result also showed that processors' awareness of occupational hazards associated with the different stages of cassava processing varies because their involvement in these stages

  17. A high-speed analog neural processor

    NARCIS (Netherlands)

    Masa, P.; Masa, Peter; Hoen, Klaas; Hoen, Klaas; Wallinga, Hans

    1994-01-01

    Targeted at high-energy physics research applications, our special-purpose analog neural processor can classify up to 70 dimensional vectors within 50 nanoseconds. The decision-making process of the implemented feedforward neural network enables this type of computation to tolerate weight

  18. Beeldverwerking met de Micron Automatic Processor

    OpenAIRE

    Goyens, Frank

    2017-01-01

    This thesis investigates image processing applications on the Micron Automata Processor hardware. The hardware is compared with popular contemporary hardware. The study also contains useful information and strategies for developing new applications. Findings of this study include proof-of-concept algorithms and a practical application.

  19. 7 CFR 1215.14 - Processor.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 10 2010-01-01 2010-01-01 false Processor. 1215.14 Section 1215.14 Agriculture Regulations of the Department of Agriculture (Continued) AGRICULTURAL MARKETING SERVICE (MARKETING AGREEMENTS... CONSUMER INFORMATION Popcorn Promotion, Research, and Consumer Information Order Definitions § 1215.14...

  20. Simplifying cochlear implant speech processor fitting

    NARCIS (Netherlands)

    Willeboer, C.

    2008-01-01

    Conventional fittings of the speech processor of a cochlear implant (CI) rely to a large extent on the implant recipient's subjective responses. For each of the 22 intracochlear electrodes the recipient has to indicate the threshold level (T-level) and comfortable loudness level (C-level) while

  1. Vector and parallel processors in computational science

    International Nuclear Information System (INIS)

    Duff, I.S.; Reid, J.K.

    1985-01-01

    This book presents the papers given at a conference which reviewed the new developments in parallel and vector processing. Topics considered at the conference included hardware (array processors, supercomputers), programming languages, software aids, numerical methods (e.g., Monte Carlo algorithms, iterative methods, finite elements, optimization), and applications (e.g., neutron transport theory, meteorology, image processing)

  2. Space Station Water Processor Process Pump

    Science.gov (United States)

    Parker, David

    1995-01-01

    This report presents the results of the development program conducted under contract NAS8-38250-12 related to the International Space Station (ISS) Water Processor (WP) Process Pump. The results of the Process Pumps evaluation conducted on this program indicate that further development is required in order to achieve the performance and life requirements for the ISSWP.

  3. Interleaved Subtask Scheduling on Multi Processor SOC

    NARCIS (Netherlands)

    Zhe, M.

    2006-01-01

    Ever-advancing semiconductor processing techniques have integrated more and more embedded processors on a single system-on-a-chip (SoC). With such powerful SoC platforms, and also due to the stringent time-to-market deadlines, many functionalities which used to be implemented in ASICs are

  4. User manual Dieka PreProcessor

    NARCIS (Netherlands)

    Valkering, Kasper

    2000-01-01

    This is the user manual belonging to the Dieka-PreProcessor. This application was written by Wenhua Cao and revised and expanded by Kasper Valkering. The aim of this preprocessor is to be able to draw and mesh extrusion dies in ProEngineer, and do the FE-calculation in Dieka. The preprocessor makes

  5. Event analysis using a massively parallel processor

    International Nuclear Information System (INIS)

    Bale, A.; Gerelle, E.; Messersmith, J.; Warren, R.; Hoek, J.

    1990-01-01

    This paper describes a system for performing histogramming of n-tuple data at interactive rates using a commercial SIMD processor array connected to a work-station running the well-known Physics Analysis Workstation software (PAW). Results indicate that an order of magnitude performance improvement over current RISC technology is easily achievable

  6. A Versatile Image Processor For Digital Diagnostic Imaging And Its Application In Computed Radiography

    Science.gov (United States)

    Blume, H.; Alexandru, R.; Applegate, R.; Giordano, T.; Kamiya, K.; Kresina, R.

    1986-06-01

    In a digital diagnostic imaging department, the majority of operations for handling and processing of images can be grouped into a small set of basic operations, such as image data buffering and storage, image processing and analysis, image display, image data transmission and image data compression. These operations occur in almost all nodes of the diagnostic imaging communications network of the department. An image processor architecture was developed in which each of these functions has been mapped into hardware and software modules. The modular approach has advantages in terms of economics, service, expandability and upgradeability. The architectural design is based on the principles of hierarchical functionality, distributed and parallel processing and aims at real time response. Parallel processing and real time response are facilitated in part by a dual bus system: a VME control bus and a high speed image data bus, consisting of 8 independent parallel 16-bit busses, capable of handling up to 144 MBytes/sec combined. The presented image processor is versatile enough to meet the video rate processing needs of digital subtraction angiography, the large pixel matrix processing requirements of static projection radiography, or the broad range of manipulation and display needs of a multi-modality diagnostic work station. Several hardware modules are described in detail. To illustrate the capabilities of the image processor, processed 2000 x 2000 pixel computed radiographs are shown and estimated computation times for executing the processing operations are presented.

  7. Microlens array processor with programmable weight mask and direct optical input

    Science.gov (United States)

    Schmid, Volker R.; Lueder, Ernst H.; Bader, Gerhard; Maier, Gert; Siegordner, Jochen

    1999-03-01

    We present an optical feature extraction system with a microlens array processor. The system is suitable for online implementation of a variety of transforms such as the Walsh transform and DCT. Operating with incoherent light, our processor accepts direct optical input. Employing a sandwich-like architecture, we obtain a very compact design of the optical system. The key elements of the microlens array processor are a square array of 15 X 15 spherical microlenses on acrylic substrate and a spatial light modulator as transmissive mask. The light distribution behind the mask is imaged onto the pixels of a customized a-Si image sensor with adjustable gain. We obtain one output sample for each microlens image and its corresponding weight mask area as summation of the transmitted intensity within one sensor pixel. The resulting architecture is very compact and robust like a conventional camera lens while incorporating a high degree of parallelism. We successfully demonstrate a Walsh transform into the spatial frequency domain as well as the implementation of a discrete cosine transform with digitized gray values. We provide results showing the transformation performance for both synthetic image patterns and images of natural texture samples. The extracted frequency features are suitable for neural classification of the input image. Other transforms and correlations can be implemented in real-time allowing adaptive optical signal processing.
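
The "one output sample per weight mask" idea in the abstract above can be illustrated numerically. The following is only a minimal sketch, not the authors' optical implementation: it assumes that each signed Walsh pattern is split into two non-negative 0/1 masks (a transmissive mask cannot carry negative weights) whose summed intensities are subtracted afterwards, and it uses natural (Hadamard) ordering rather than sequency ordering. The function names are hypothetical.

```python
def hadamard(n):
    """Return the n x n Sylvester Hadamard matrix (entries +1/-1); n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def masked_sum(image, mask):
    """Sum of image intensities weighted by a mask: one 'sensor pixel' reading."""
    return sum(p * m
               for img_row, mask_row in zip(image, mask)
               for p, m in zip(img_row, mask_row))


def walsh_transform(image):
    """2D Walsh-Hadamard transform of a square image, computed mask by mask."""
    n = len(image)
    h = hadamard(n)
    out = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            # Signed basis pattern for coefficient (u, v).
            weights = [[h[u][y] * h[v][x] for x in range(n)] for y in range(n)]
            # Assumption: split into two non-negative masks and subtract the sums.
            pos_mask = [[1 if w > 0 else 0 for w in row] for row in weights]
            neg_mask = [[1 if w < 0 else 0 for w in row] for row in weights]
            out[u][v] = masked_sum(image, pos_mask) - masked_sum(image, neg_mask)
    return out
```

For a uniform image only the (0, 0) coefficient is non-zero, and applying the transform twice returns the input scaled by n squared, which gives a quick sanity check of the masks.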

  8. THOR Fields and Wave Processor - FWP

    Science.gov (United States)

    Soucek, Jan; Rothkaehl, Hanna; Ahlen, Lennart; Balikhin, Michael; Carr, Christopher; Dekkali, Moustapha; Khotyaintsev, Yuri; Lan, Radek; Magnes, Werner; Morawski, Marek; Nakamura, Rumi; Uhlir, Ludek; Yearby, Keith; Winkler, Marek; Zaslavsky, Arnaud

    2017-04-01

    If selected, Turbulence Heating ObserveR (THOR) will become the first spacecraft mission dedicated to the study of plasma turbulence. The Fields and Waves Processor (FWP) is an integrated electronics unit for all electromagnetic field measurements performed by THOR. FWP will interface with all THOR fields sensors: electric field antennas of the EFI instrument, the MAG fluxgate magnetometer, and search-coil magnetometer (SCM), and perform signal digitization and on-board data processing. The FWP box will house multiple data acquisition sub-units and signal analyzers all sharing a common power supply and data processing unit and thus a single data and power interface to the spacecraft. Integrating all the electromagnetic field measurements in a single unit will improve the consistency of field measurement and accuracy of time synchronization. The scientific value of highly sensitive electric and magnetic field measurements in space has been demonstrated by Cluster (among other spacecraft) and THOR instrumentation will further improve on this heritage. The large dynamic range of the instruments will be complemented by a thorough electromagnetic cleanliness program, which will prevent perturbation of field measurements by interference from payload and platform subsystems. Taking advantage of the capabilities of modern electronics and the large telemetry bandwidth of THOR, FWP will provide multi-component electromagnetic field waveforms and spectral data products at a high time resolution. Fully synchronized sampling of many signals will make it possible to resolve wave phase information and estimate wavelength via interferometric correlations between EFI probes. FWP will also implement a plasma resonance sounder and a digital plasma quasi-thermal noise analyzer designed to provide high cadence measurements of plasma density and temperature complementary to data from particle instruments. FWP will rapidly transmit information about magnetic field vector and spacecraft potential to the

  9. Architectural Contestation

    NARCIS (Netherlands)

    Merle, J.

    2012-01-01

    This dissertation addresses the reductive reading of Georges Bataille's work done within the field of architectural criticism and theory which tends to set aside the fundamental ‘broken’ totality of Bataille's oeuvre and also to narrowly interpret it as a mere critique of architectural form,

  10. Architecture Sustainability

    NARCIS (Netherlands)

    Avgeriou, Paris; Stal, Michael; Hilliard, Rich

    2013-01-01

    Software architecture is the foundation of software system development, encompassing a system's architects' and stakeholders' strategic decisions. A special issue of IEEE Software is intended to raise awareness of architecture sustainability issues and increase interest and work in the area. The

  11. Memory architecture

    NARCIS (Netherlands)

    2012-01-01

    A memory architecture is presented. The memory architecture comprises a first memory and a second memory. The first memory has at least a bank with a first width addressable by a single address. The second memory has a plurality of banks of a second width, said banks being addressable by components

  12. National Positioning, Navigation, and Timing Architecture

    National Research Council Canada - National Science Library

    Huested, Patrick; Popejoy, Paul D

    2008-01-01

    .... The strategy is supported by vectors, or enterprise architecture elements, for using multiple PNT-related phenomenologies and interchangeable PNT solutions, PNT and Communications synergy, and co...

  13. Architectural Narratives

    DEFF Research Database (Denmark)

    Kiib, Hans

    2010-01-01

    a functional framework for these concepts, but tries increasingly to endow the main idea of the cultural project with a spatially aesthetic expression - a shift towards “experience architecture.” A great number of these projects typically recycle and reinterpret narratives related to historical buildings......In this essay, I focus on the combination of programs and the architecture of cultural projects that have emerged within the last few years. These projects are characterized as “hybrid cultural projects,” because they intend to combine experience with entertainment, play, and learning. This essay...... and architectural heritage; another group tries to embed new performative technologies in expressive architectural representation. Finally, this essay provides a theoretical framework for the analysis of the political rationales of these projects and for the architectural representation bridges the gap between...

  14. A high-speed digital signal processor for atmospheric radar, part 7.3A

    Science.gov (United States)

    Brosnahan, J. W.; Woodard, D. M.

    1984-01-01

    The Model SP-320 device is a monolithic realization of a complex general purpose signal processor, incorporating such features as a 32-bit ALU, a 16-bit x 16-bit combinatorial multiplier, and a 16-bit barrel shifter. The SP-320 is designed to operate as a slave processor to a host general purpose computer in applications such as coherent integration of a radar return signal in multiple ranges, or dedicated FFT processing. Presently available is an I/O module conforming to the Intel Multichannel interface standard; other I/O modules will be designed to meet specific user requirements. The main processor board includes input and output FIFO (First In First Out) memories, both with depths of 4096 W, to permit asynchronous operation between the source of data and the host computer. This design permits burst data rates in excess of 5 MW/s.
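
Coherent integration of a radar return over multiple ranges, one of the applications named above, amounts to summing complex (in-phase/quadrature) samples pulse by pulse, separately for each range gate. The sketch below shows only that textbook operation in software. It is not the SP-320 microcode, and the function name and data layout are assumptions made for illustration.

```python
def coherent_integrate(pulses):
    """Sum complex samples across pulses, independently for each range gate.

    pulses: list of pulses; each pulse is a list of complex samples,
            one sample per range gate (hypothetical layout).
    Returns a list with one accumulated complex value per range gate.
    """
    if not pulses:
        raise ValueError("at least one pulse is required")
    n_gates = len(pulses[0])
    acc = [0j] * n_gates
    for pulse in pulses:
        if len(pulse) != n_gates:
            raise ValueError("all pulses must cover the same range gates")
        for gate, sample in enumerate(pulse):
            # Phase is preserved, so a steady echo adds up linearly while
            # samples with opposing phase cancel.
            acc[gate] += sample
    return acc
```

With two pulses `[1+1j, 2]` and `[1+1j, -2]`, the first gate accumulates to `2+2j` while the second cancels to zero.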

  15. Case Study of Using High Performance Commercial Processors in Space

    Science.gov (United States)

    Ferguson, Roscoe C.; Olivas, Zulema

    2009-01-01

    The purpose of the Space Shuttle Cockpit Avionics Upgrade project (1999-2004) was to reduce crew workload and improve situational awareness. The upgrade was to augment the Shuttle avionics system with new hardware and software. A major success of this project was the validation of the hardware architecture and software design. This was significant because the project incorporated new technology and approaches for the development of human rated space software. An early version of this system was tested at the Johnson Space Center for one month by teams of astronauts. The results were positive, but NASA eventually cancelled the project towards the end of the development cycle. The goal to reduce crew workload and improve situational awareness resulted in the need for high performance Central Processing Units (CPUs). The CPU selected was from the PowerPC family, a reduced instruction set computer (RISC) line known for its high performance. However, the requirement for radiation tolerance resulted in the re-evaluation of the selected family member of the PowerPC line. Radiation testing revealed that the originally selected processor (PowerPC 7400) was too soft to meet mission objectives, and an effort was established to perform trade studies and performance testing to determine a feasible candidate. At that time, the PowerPC RAD750s were radiation tolerant, but did not meet the required performance needs of the project. Thus, the final solution was to select the PowerPC 7455. This processor did not have a radiation tolerant version, but had some ability to detect failures. However, its cache tags did not provide parity and thus the project incorporated a software strategy to detect radiation failures. The strategy was to incorporate dual paths for software generating commands to the legacy Space Shuttle avionics to prevent failures due to the softness of the upgraded avionics.

  16. Data collection from FASTBUS to a DEC UNIBUS processor through the UNIBUS-Processor Interface

    International Nuclear Information System (INIS)

    Larwill, M.; Barsotti, E.; Lesny, D.; Pordes, R.

    1983-01-01

    This paper describes the use of the UNIBUS Processor Interface, an interface between FASTBUS and the Digital Equipment Corporation UNIBUS. The UPI was developed by Fermilab and the University of Illinois. Details of the use of this interface in a high energy physics experiment at Fermilab are given. The paper includes a discussion of the operation of the UPI on the UNIBUS of a VAX-11, and plans for using the UPI to perform data acquisition from FASTBUS to a VAX-11 Processor

  17. Architecture Of High Speed Image Processing System

    Science.gov (United States)

    Konishi, Toshio; Hayashi, Hiroshi; Ohki, Tohru

    1988-01-01

    An architecture for a high-speed image processing system, corresponding to a new algorithm for shape understanding, is proposed, and a hardware system based on this architecture was developed. The main design considerations were that the processors used should match the processing sequence of the target image and that the developed system should be practical for industrial use. As a result, each processing step could be performed at a speed of 80 nanoseconds per pixel.

  18. Kalman filter tracking on parallel architectures

    Science.gov (United States)

    Cerati, G.; Elmer, P.; Krutelyov, S.; Lantz, S.; Lefebvre, M.; McDermott, K.; Riley, D.; Tadel, M.; Wittich, P.; Wurthwein, F.; Yagil, A.

    2017-10-01

    We report on the progress of our studies towards a Kalman filter track reconstruction algorithm with optimal performance on manycore architectures. The combinatorial structure of these algorithms is not immediately compatible with an efficient SIMD (or SIMT) implementation; the challenge for us is to recast the existing software so it can readily generate hundreds of shared-memory threads that exploit the underlying instruction set of modern processors. We show how the data and associated tasks can be organized in a way that is conducive to both multithreading and vectorization. We demonstrate very good performance on Intel Xeon and Xeon Phi architectures, as well as promising first results on Nvidia GPUs.

  19. Array processors based on Gaussian fraction-free method

    Energy Technology Data Exchange (ETDEWEB)

    Peng, S; Sedukhin, S [Aizu Univ., Aizuwakamatsu, Fukushima (Japan); Sedukhin, I

    1998-03-01

    The design of algorithmic array processors for solving linear systems of equations using the fraction-free Gaussian elimination method is presented. The design is based on a formal approach which systematically constructs a family of planar array processors. These array processors are synthesized and analyzed. It is shown that some array processors are optimal in the framework of linear allocation of computations and in terms of number of processing elements and computing time. (author)
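
For readers unfamiliar with fraction-free elimination, the sketch below shows the sequential recurrence on an integer matrix. It assumes the well-known single-step Bareiss variant, which the abstract does not name explicitly, and it is a software illustration only, not the array-processor design. Every division is exact, so all intermediate values stay integers, and the last pivot equals the determinant up to the sign of the row swaps.

```python
def bareiss_determinant(matrix):
    """Determinant of a square integer matrix via fraction-free elimination.

    Assumption: single-step Bareiss recurrence
        a[i][j] <- (a[i][j] * a[k][k] - a[i][k] * a[k][j]) / previous_pivot
    where the division is always exact for integer input.
    """
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    sign = 1
    prev_pivot = 1
    for k in range(n - 1):
        if a[k][k] == 0:
            # Look for a row below with a non-zero entry in column k.
            for r in range(k + 1, n):
                if a[r][k] != 0:
                    a[k], a[r] = a[r], a[k]
                    sign = -sign
                    break
            else:
                return 0  # whole column is zero: singular matrix
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                a[i][j] = (a[i][j] * a[k][k] - a[i][k] * a[k][j]) // prev_pivot
            a[i][k] = 0
        prev_pivot = a[k][k]
    return sign * a[n - 1][n - 1]
```

The inner update for a fixed `k` touches each `(i, j)` entry independently, which is the kind of regular data flow that lends itself to a planar array of processing elements.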

  20. Electromagnetic Physics Models for Parallel Computing Architectures

    International Nuclear Information System (INIS)

    Amadio, G; Bianchini, C; Iope, R; Ananya, A; Apostolakis, J; Aurora, A; Bandieramonte, M; Brun, R; Carminati, F; Gheata, A; Gheata, M; Goulas, I; Nikitina, T; Bhattacharyya, A; Mohanty, A; Canal, P; Elvira, D; Jun, S Y; Lima, G; Duhem, L

    2016-01-01

    The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well. (paper)

  1. Electromagnetic Physics Models for Parallel Computing Architectures

    Science.gov (United States)

    Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.

    2016-10-01

    The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well.

  2. HEP - A semaphore-synchronized multiprocessor with central control. [Heterogeneous Element Processor

    Science.gov (United States)

    Gilliland, M. C.; Smith, B. J.; Calvert, W.

    1976-01-01

    The paper describes the design concept of the Heterogeneous Element Processor (HEP), a system tailored to the special needs of scientific simulation. In order to achieve high-speed computation required by simulation, HEP features a hierarchy of processes executing in parallel on a number of processors, with synchronization being largely accomplished by hardware. A full-empty-reserve scheme of synchronization is realized by zero-one-valued hardware semaphores. A typical system has, besides the control computer and the scheduler, an algebraic module, a memory module, a first-in first-out (FIFO) module, an integrator module, and an I/O module. The architecture of the scheduler and the algebraic module is examined in detail.
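
The full/empty part of the synchronization scheme described above can be mimicked in software. The sketch below is only an analogy built on a Python condition variable, not the HEP hardware mechanism: the "reserve" state is omitted, and the class and method names are hypothetical. A write waits until the cell is empty and marks it full; a read waits until the cell is full and marks it empty.

```python
import threading


class FullEmptyCell:
    """A single memory cell guarded by a full/empty flag (software analogy)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False   # the zero/one-valued state
        self._value = None

    def write(self, value):
        """Block until the cell is empty, store the value, mark the cell full."""
        with self._cond:
            while self._full:
                self._cond.wait()
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read(self):
        """Block until the cell is full, take the value, mark the cell empty."""
        with self._cond:
            while not self._full:
                self._cond.wait()
            value = self._value
            self._full = False
            self._cond.notify_all()
            return value
```

Because the cell holds at most one value, a producer thread and a consumer thread sharing it are forced into strict alternation, so values arrive in the order they were written without any explicit lock in the user code.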

  3. Phase space simulation of collisionless stellar systems on the massively parallel processor

    International Nuclear Information System (INIS)

    White, R.L.

    1987-01-01

    A numerical technique for solving the collisionless Boltzmann equation describing the time evolution of a self-gravitating fluid in phase space was implemented on the Massively Parallel Processor (MPP). The code performs calculations for a two-dimensional phase space grid (with one space and one velocity dimension). Some results from calculations are presented. The execution speed of the code is comparable to the speed of a single processor of a Cray-XMP. Advantages and disadvantages of the MPP architecture for this type of problem are discussed. The nearest neighbor connectivity of the MPP array does not pose a significant obstacle. Future MPP-like machines should have much more local memory and easier access to staging memory and disks in order to be effective for this type of problem.

  4. Architectural technology

    DEFF Research Database (Denmark)

    2005-01-01

    The booklet offers an overall introduction to the Institute of Architectural Technology and its projects and activities, and an invitation to the reader to contact the institute or the individual researcher for further information. The research, which takes place at the Institute of Architectural...... Technology at the Royal Danish Academy of Fine Arts, School of Architecture, reflects a spread between strategic, goal-oriented pilot projects, commissioned by a ministry, a fund or a private company, and on the other hand projects which originate from strong personal interests and enthusiasm of individual...

  5. Systemic Architecture

    DEFF Research Database (Denmark)

    Poletto, Marco; Pasquero, Claudia

    -up or tactical design, behavioural space and the boundary of the natural and the artificial realms within the city and architecture. A new kind of "real-time world-city" is illustrated in the form of an operational design manual for the assemblage of proto-architectures, the incubation of proto-gardens...... and the coding of proto-interfaces. These prototypes of machinic architecture materialize as synthetic hybrids embedded with biological life (proto-gardens), computational power, behavioural responsiveness (cyber-gardens), spatial articulation (coMachines and fibrous structures), remote sensing (FUNclouds...

  6. Humanizing Architecture

    DEFF Research Database (Denmark)

    Toft, Tanya Søndergaard

    2015-01-01

    The article proposes the urban digital gallery as an opportunity to explore the relationship between ‘human’ and ‘technology,’ through the programming of media architecture. It takes a curatorial perspective when proposing an ontological shift from considering media facades as visual spectacles...... agency and a sense of being by way of dematerializing architecture. This is achieved by way of programming the symbolic to provide new emotional realizations and situations of enlightenment in the public audience. This reflects a greater potential to humanize the digital in media architecture....

  7. Architectural Theatricality

    DEFF Research Database (Denmark)

    Tvedebrink, Tenna Doktor Olsen

    environments and a knowledge gap therefore exists in present hospital designs. Consequently, the purpose of this thesis has been to investigate if any research-based knowledge exist supporting the hypothesis that the interior architectural qualities of eating environments influence patient food intake, health...... and well-being, as well as outline a set of basic design principles ‘predicting’ the future interior architectural qualities of patient eating environments. Methodologically the thesis is based on an explorative study employing an abductive approach and hermeneutic-interpretative strategy utilizing tactics...... and food intake, as well as a series of references exist linking the interior architectural qualities of healthcare environments with the health and wellbeing of patients. On the basis of these findings, the thesis presents the concept of Architectural Theatricality as well as a set of design principles...

  8. Heterogeneous System Architectures from APUs to discrete GPUs

    CERN Multimedia

    CERN. Geneva

    2013-01-01

    We will present the Heterogeneous System Architectures that new AMD processors are introducing with the new GCN-based GPUs and the new APUs. We will show how together they represent a huge step forward in programming flexibility and performance efficiency for compute.

  9. Stream-processing pipelines: processing of streams on multiprocessor architecture

    NARCIS (Netherlands)

    Kavaldjiev, N.K.; Smit, Gerardus Johannes Maria; Jansen, P.G.

    In this paper we study the timing aspects of the operation of stream-processing applications that run on a multiprocessor architecture. Dependencies are derived for the processing and communication times of the processors in such a system. Three cases of real-time constrained operation and four

  10. Design and Test Space Exploration of Transport-Triggered Architectures

    NARCIS (Netherlands)

    Zivkovic, V.; Tangelder, R.J.W.T.; Kerkhoff, Hans G.

    2000-01-01

    This paper describes a new approach in the high level design and test of transport-triggered architectures (TTA), a special type of application specific instruction processors (ASIP). The proposed method introduces the test as an additional constraint, besides throughput and circuit area. The

  11. Database architecture evolution: Mammals flourished long before dinosaurs became extinct

    NARCIS (Netherlands)

    S. Manegold (Stefan); M.L. Kersten (Martin); P.A. Boncz (Peter)

    2009-01-01

    The holy grail for database architecture research is to find a solution that is Scalable & Speedy, to run on anything from small ARM processors up to globally distributed compute clusters, Stable & Secure, to service a broad user community, Small & Simple, to be comprehensible to a small

  12. Processor architecture exploration and synthesis of massively parallel multi-processor accelerators in application to LDPC decoding

    NARCIS (Netherlands)

    Jan, Y.; Jóźwiak, Lech

    Numerous modern applications in various fields, such as communication and networking, multimedia, encryption, etc., impose extremely high demands regarding performance while at the same time requiring low energy consumption, low cost, and short design time. Often these very high demands cannot be

  13. Design of Processors with Reconfigurable Microarchitecture

    Directory of Open Access Journals (Sweden)

    Andrey Mokhov

    2014-01-01

    Energy becomes a dominating factor for a wide spectrum of computations: from intensive data processing in “big data” companies resulting in large electricity bills, to infrastructure monitoring with wireless sensors relying on energy harvesting. In this context it is essential for a computation system to be adaptable to the power supply and the service demand, which often vary dramatically during runtime. In this paper we present an approach to building processors with reconfigurable microarchitecture capable of changing the way they fetch and execute instructions depending on energy availability and application requirements. We show how to use Conditional Partial Order Graphs to formally specify the microarchitecture of such a processor, explore the design possibilities for its instruction set, and synthesise the instruction decoder using correct-by-construction techniques. The paper is focused on the design methodology, which is evaluated by implementing a power-proportional version of Intel 8051 microprocessor.

  14. Real time processor for array speckle interferometry

    International Nuclear Information System (INIS)

    Chin, G.; Florez, J.; Borelli, R.; Fong, W.; Miko, J.; Trujillo, C.

    1989-01-01

    With the construction of several new large aperture telescopes and the development of large format array detectors in the near IR, the ability to obtain diffraction limited seeing via IR array speckle interferometry offers a powerful tool. We are constructing a real-time processor to acquire image frames, perform array flat-fielding, execute a 64 x 64 element 2D complex FFT, and average the power spectrum, all within the 25 msec coherence time for speckles at near-IR wavelengths. The processor is a compact unit controlled by a PC with real time display and data storage capability. It provides the ability to optimize observations and obtain results on the telescope rather than waiting several weeks before the data can be analyzed and viewed with off-line methods.
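
The per-frame pipeline named in the abstract above (flat-fielding, 2D transform, power-spectrum averaging) can be sketched in a few lines. This is an illustration under stated assumptions, not the processor's firmware: flat-fielding is taken to mean division by a flat-field frame, a naive O(N^4) DFT stands in for the 64 x 64 FFT so that the example needs only the standard library, and all function names are hypothetical.

```python
import cmath


def flat_field(frame, flat):
    """Divide each pixel by the matching flat-field pixel (assumed correction)."""
    return [[p / f for p, f in zip(frame_row, flat_row)]
            for frame_row, flat_row in zip(frame, flat)]


def dft2(frame):
    """Naive 2D discrete Fourier transform; a real system would use an FFT."""
    rows, cols = len(frame), len(frame[0])
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for y in range(rows):
                for x in range(cols):
                    phase = -2j * cmath.pi * (u * y / rows + v * x / cols)
                    total += frame[y][x] * cmath.exp(phase)
            out[u][v] = total
    return out


def mean_power_spectrum(frames, flat):
    """Average |DFT|^2 over all flat-fielded frames."""
    rows, cols = len(flat), len(flat[0])
    acc = [[0.0] * cols for _ in range(rows)]
    count = 0
    for frame in frames:
        spectrum = dft2(flat_field(frame, flat))
        for u in range(rows):
            for v in range(cols):
                acc[u][v] += abs(spectrum[u][v]) ** 2
        count += 1
    if count == 0:
        raise ValueError("at least one frame is required")
    return [[a / count for a in row] for row in acc]
```

Averaging the power spectrum rather than the frames themselves is the point of the technique: the squared modulus discards the random phase of each short exposure, so high-spatial-frequency information survives the average.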

  15. RISC Processors and High Performance Computing

    Science.gov (United States)

    Bailey, David H.; Saini, Subhash; Craw, James M. (Technical Monitor)

    1995-01-01

    This tutorial will discuss the top five RISC microprocessors and the parallel systems in which they are used. It will provide a unique cross-machine comparison not available elsewhere. The effective performance of these processors will be compared by citing standard benchmarks in the context of real applications. The latest NAS Parallel Benchmarks, both absolute performance and performance per dollar, will be listed. The next generation of the NPB will be described. The tutorial will conclude with a discussion of future directions in the field. Technology Transfer Considerations: All of these computer systems are commercially available internationally. Information about these processors is available in the public domain, mostly from the vendors themselves. The NAS Parallel Benchmarks and their results have been previously approved numerous times for public release, beginning back in 1991.

  16. Using of new possibilities of Fermi architecture by development of GPGPU programs

    International Nuclear Information System (INIS)

    Dudnik, V.A.; Kudryavtsev, V.I.; Us, S.A.; Shestakov, M.V.

    2013-01-01

    The additional hardware and software functions provided by NVIDIA's new Fermi graphics processor architecture are described. Recommendations are given for using them when implementing scientific and technical computing algorithms on graphics processors. The application of the new capabilities of the Fermi architecture and of NVIDIA's CUDA technology (Compute Unified Device Architecture, a unified hardware-software solution for parallel computing on GPUs) is described, with the aim of reducing the development time of applications that use GPGPU to accelerate data processing.

  17. Multi-Core Processor Memory Contention Benchmark Analysis Case Study

    Science.gov (United States)

    Simon, Tyler; McGalliard, James

    2009-01-01

    Multi-core processors dominate current mainframe, server, and high performance computing (HPC) systems. This paper provides synthetic kernel and natural benchmark results from an HPC system at the NASA Goddard Space Flight Center that illustrate the performance impacts of multi-core (dual- and quad-core) vs. single core processor systems. Analysis of processor design, application source code, and synthetic and natural test results all indicate that multi-core processors can suffer from significant memory subsystem contention compared to similar single-core processors.

  18. VIRTUS: a multi-processor system in FASTBUS

    International Nuclear Information System (INIS)

    Ellett, J.; Jackson, R.; Ritter, R.; Schlein, P.; Yaeger, D.; Zweizig, J.

    1986-01-01

    VIRTUS is a system of parallel MC68000-based processors interconnected by FASTBUS that is used either on-line as an intelligent trigger component or off-line for full event processing. Each processor receives the complete set of data from one event. The host computer, a VAX 11/780, down-line loads all software to the processors, controls and monitors the functioning of all processors, and writes processed data to tape. Instructions, programs, and data are transferred among the processors and the host in the form of fixed format, variable length data blocks. (Auth.)

  19. Low-Latency Embedded Vision Processor (LLEVS)

    Science.gov (United States)

    2016-03-01

    algorithms, low-latency video processing, embedded image processor, wearable electronics, helmet-mounted systems, alternative night / day imaging...external subsystems and data sources with the device. The establishment of data interfaces in terms of data transfer rates, formats and types are...video signals from Near-visible Infrared (NVIR) sensor, Shortwave IR (SWIR) and Longwave IR (LWIR) is the main processing for Night Vision (NI) system

  20. Silicon Processors Using Organically Reconfigurable Techniques (SPORT)

    Science.gov (United States)

    2014-05-19

    AFRL-OSR-VA-TR-2014-0132 SILICON PROCESSORS USING ORGANICALLY RECONFIGURABLE TECHNIQUES ( SPORT ) Dennis Prather UNIVERSITY OF DELAWARE Final Report 05...5a. CONTRACT NUMBER Silicon Processes for Organically Reconfigurable Techniques ( SPORT ) 5b. GRANT NUMBER FA9550-10-1-0363 5c...Contract: Silicon Processes for Organically Reconfigurable Techniques ( SPORT ) Contract #: FA9550-10-1-0363 Reporting Period: 1 July 2010 – 31 December

  1. Debugging in a multi-processor environment

    International Nuclear Information System (INIS)

    Spann, J.M.

    1981-01-01

    The Supervisory Control and Diagnostic System (SCDS) for the Mirror Fusion Test Facility (MFTF) consists of nine 32-bit minicomputers arranged in a tightly coupled distributed computer system utilizing a shared memory as the data exchange medium. Debugging of more than one program in the multi-processor environment is a difficult process. This paper describes the new tools that were developed and how the testing of software is performed in the SCDS for the MFTF project.

  2. Intelligent trigger processor for the crystal box

    International Nuclear Information System (INIS)

    Sanders, G.H.; Butler, H.S.; Cooper, M.D.

    1981-01-01

    A large solid angle modular NaI(Tl) detector with 432 phototubes and 88 trigger scintillators is being used to search simultaneously for three lepton flavor changing decays of the muon. A beam of up to 10^6 muons stopping per second with a 6% duty factor would yield up to 1000 triggers per second from random triple coincidences. A reduction of the trigger rate to 10 Hz is required from a hardwired primary trigger processor described in this paper. Further reduction to < 1 Hz is achieved by a microprocessor based secondary trigger processor. The primary trigger hardware imposes voter coincidence logic, stringent timing requirements, and a non-adjacency requirement in the trigger scintillators defined by hardwired circuits. Sophisticated geometric requirements are imposed by a PROM-based matrix logic, and energy and vector-momentum cuts are imposed by a hardwired processor using LSI flash ADC's and digital arithmetic logic. The secondary trigger employs four satellite microprocessors to do a sparse data scan, multiplex the data acquisition channels and apply additional event filtering.

  3. Techniques for optimizing inerting in electron processors

    International Nuclear Information System (INIS)

    Rangwalla, I.J.; Korn, D.J.; Nablo, S.V.

    1993-01-01

    The design of an "inert gas" distribution system in an electron processor must satisfy a number of requirements. The first of these is the elimination or control of beam produced ozone and NOx which can be transported from the process zone by the product into the work area. Since the tolerable levels for O3 in occupied areas around the processor are 3 in the beam heated process zone, or exhausting and dilution of the gas at the processor exit. The second requirement of the inerting system is to provide a suitable environment for completing efficient, free radical initiated addition polymerization. The competition between radical loss through de-excitation and that from O2 quenching must be understood. This group has used gas chromatographic analysis of electron cured coatings to study the trade-offs of delivered dose, dose rate and O2 concentrations in the process zone to determine the tolerable ranges of parameter excursions for production quality control purposes. These techniques are described for an ink coating system on paperboard, where a broad range of process parameters have been studied (D, D radical, O2). It is then shown how the technique is used to optimize the use of higher purity (10-100 ppm O2) nitrogen gas for inerting, in combination with lower purity (2-20,000 ppm O2) non-cryogenically produced gas, as from a membrane or pressure swing adsorption generators. (author)

  4. Treecode with a Special-Purpose Processor

    Science.gov (United States)

    Makino, Junichiro

    1991-08-01

    We describe an implementation of the modified Barnes-Hut tree algorithm for a gravitational N-body calculation on a GRAPE (GRAvity PipE) backend processor. GRAPE is a special-purpose computer for N-body calculations. It receives the positions and masses of particles from a host computer and then calculates the gravitational force at each coordinate specified by the host. To use this GRAPE processor with the hierarchical tree algorithm, the host computer must maintain a list of all nodes that exert force on a particle. If we create this list for each particle of the system at each timestep, the number of floating-point operations on the host and that on GRAPE would become comparable, and the increased speed obtained by using GRAPE would be small. In our modified algorithm, we create a list of nodes for many particles. Thus, the amount of the work required of the host is significantly reduced. This algorithm was originally developed by Barnes in order to vectorize the force calculation on a Cyber 205. With this algorithm, the computing time of the force calculation becomes comparable to that of the tree construction, if the GRAPE backend processor is sufficiently fast. The obtained speed-up factor is 30 to 50 for a RISC-based host computer and GRAPE-1A with a peak speed of 240 Mflops.

  5. Implementation of the DPM Monte Carlo code on a parallel architecture for treatment planning applications.

    Science.gov (United States)

    Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J

    2004-09-01

    We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan-Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudo random number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster consisting of 800 MHz Intel Pentium III processors shows an almost linear speedup up to 32 processors for simulating 1 × 10^8 or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors based on availability) which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors on increasing the problem size up to 8 × 10^8 histories. For a smaller number of histories (1 × 10^8) the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1 × 10^8 histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron Cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of the 64-bit architecture Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster as compared to the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy.
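The efficiency figure quoted in this abstract can be related to speedup with the usual definitions (speedup = serial time / parallel time, efficiency = speedup / number of processors). The sketch below is illustrative only: the function names and the toy cost model (compute time divided evenly among processors plus a fixed communication cost) are assumptions, not the authors' measurements or method.

```python
def speedup_from_efficiency(efficiency, n_procs):
    """Speedup implied by a parallel efficiency on n_procs processors."""
    return efficiency * n_procs


def model_efficiency(t_serial, t_comm, n_procs):
    """Efficiency under a toy model (an assumption for illustration):
    parallel time = t_serial / n_procs + t_comm, with t_comm fixed."""
    t_parallel = t_serial / n_procs + t_comm
    return t_serial / (n_procs * t_parallel)


# 83.9% efficiency on 24 processors, as quoted in the abstract.
implied_speedup = speedup_from_efficiency(0.839, 24)
print(round(implied_speedup, 1))

# Hypothetical timings: an 8x larger problem amortizes the same
# communication cost better, so efficiency rises.
small = model_efficiency(t_serial=100.0, t_comm=1.0, n_procs=24)
large = model_efficiency(t_serial=800.0, t_comm=1.0, n_procs=24)
print(small < large)
```

Under this toy model the efficiency is 1 / (1 + n_procs * t_comm / t_serial), which is one way to read the abstract's observation that efficiency recovers as the number of histories grows.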

  6. Implementation of the DPM Monte Carlo code on a parallel architecture for treatment planning applications

    International Nuclear Information System (INIS)

    Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J.

    2004-01-01

    We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan-Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudo random number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster consisting of 800 MHz Intel Pentium III processors shows an almost linear speedup up to 32 processors for simulating 1 × 10^8 or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors based on availability) which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors on increasing the problem size up to 8 × 10^8 histories. For a smaller number of histories (1 × 10^8) the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1 × 10^8 histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron Cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of the 64-bit architecture Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster as compared to the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy.

  7. Deep Trek Re-configurable Processor for Data Acquisition (RPDA)

    Energy Technology Data Exchange (ETDEWEB)

    Bruce Ohme; Michael Johnson

    2009-06-30

    This report summarizes technical progress achieved during the cooperative research agreement between Honeywell and U.S. Department of Energy to develop a high-temperature Re-configurable Processor for Data Acquisition (RPDA). The RPDA development has incorporated multiple high-temperature (225C) electronic components within a compact co-fired ceramic Multi-Chip-Module (MCM) package. This assembly is suitable for use in down-hole oil and gas applications. The RPDA module is programmable to support a wide range of functionality. Specifically this project has demonstrated functional integrity of the RPDA package and internal components, as well as functional integrity of the RPDA configured to operate as a Multi-Channel Data Acquisition Controller. This report reviews the design considerations, electrical hardware design, MCM package design, considerations for manufacturing assembly, test and screening, and results from prototype assembly and characterization testing.

  8. State-based Communication on Time-predictable Multicore Processors

    DEFF Research Database (Denmark)

    Sørensen, Rasmus Bo; Schoeberl, Martin; Sparsø, Jens

    2016-01-01

    Some real-time systems use a form of task-to-task communication called state-based or sample-based communication that does not impose any flow control among the communicating tasks. The concept is similar to a shared variable, where a reader may read the same value multiple times or may not read...... a given value at all. This paper explores time-predictable implementations of state-based communication in network-on-chip based multicore platforms through five algorithms. With the presented analysis of the implemented algorithms, the communicating tasks of one core can be scheduled independently...... of tasks on other cores. Assuming a specific time-predictable multicore processor, we evaluate how the read and write primitives of the five algorithms contribute to the worst-case execution time of the communicating tasks. Each of the five algorithms has specific capabilities that make them suitable...
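The shared-variable semantics described in this abstract (no flow control, so a reader may see the same value twice or miss a value entirely) can be illustrated with a single-slot channel. This is a minimal single-threaded sketch; the class name `StateChannel` is hypothetical, and it does not reproduce any of the five algorithms the paper evaluates, nor the atomicity guarantees a real multicore implementation would need.

```python
class StateChannel:
    """Single-slot channel: a write overwrites, a read never consumes."""

    def __init__(self, initial=None):
        self._value = initial

    def write(self, value):
        # No flow control: the writer never waits for the reader.
        self._value = value

    def read(self):
        # Returns the most recent state; the slot is left unchanged.
        return self._value


channel = StateChannel(initial=0)
channel.write(1)         # this value is overwritten before anyone reads it
channel.write(2)
first = channel.read()   # sees the latest state
second = channel.read()  # sees the same state again
print(first, second)
```

Contrast this with a queue, where every written item is delivered exactly once and a full queue would block or reject the writer.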

  9. Analyzing gigahertz bunch length instabilities with a digital signal processor

    International Nuclear Information System (INIS)

    Stege, R.E. Jr.; Krejcik, P.; Minty, M.G.

    1992-11-01

    A bunch length instability, nicknamed the "sawtooth" because of its transient behavior, has been observed at high current running in the Stanford Linear Collider (SLC) electron damping ring. The incompatibility of this instability with successful SLC running prompted its study using a high bandwidth real-time spectrum analyzer, the Tektronix 3052 digital signal processor (DSP) system. This device has been used to study energy ramping in storage rings but this is the first time it has been used to study transient instability phenomena. It is a particularly valuable tool for use in understanding non-linear, multiple frequency phenomena. The frequency range of this device has been extended through the use of radio frequency (RF) down converters. This paper describes the measurement setup and presents some of the results.

  10. Architectural freedom and industrialized architecture

    DEFF Research Database (Denmark)

    Vestergaard, Inge

    2012-01-01

    to explain that architecture can be thought as a complex and diverse design through customization, telling exactly the revitalized storey about the change to a contemporary sustainable and better performing expression in direct relation to the given context. Through the last couple of years we have...... proportions, to organize the process on site choosing either one room wall components or several rooms wall components – either horizontally or vertically. Combined with the seamless joint the playing with these possibilities the new industrialized architecture can deliver variations in choice of solutions...... for retrofit design. If we add the question of the installations e.g. ventilation to this systematic thinking of building technique we get a diverse and functional architecture, thereby creating a new and clearer story telling about new and smart system based thinking behind architectural expression....

  11. Architectural freedom and industrialized architecture

    DEFF Research Database (Denmark)

    Vestergaard, Inge

    2012-01-01

    to explain that architecture can be thought as a complex and diverse design through customization, telling exactly the revitalized storey about the change to a contemporary sustainable and better performing expression in direct relation to the given context. Through the last couple of years we have...... expression in the specific housing area. It is the aim of this article to expand the different design strategies which architects can use – to give the individual project attitudes and designs with architectural quality. Through the customized component production it is possible to choose different...... for retrofit design. If we add the question of the installations e.g. ventilation to this systematic thinking of building technique we get a diverse and functional architecture, thereby creating a new and clearer story telling about new and smart system based thinking behind architectural expression....

  12. Architectural freedom and industrialised architecture

    DEFF Research Database (Denmark)

    Vestergaard, Inge

    2012-01-01

    Architectural freedom and industrialized architecture. Inge Vestergaard, Associate Professor, Cand. Arch. Aarhus School of Architecture, Denmark Noerreport 20, 8000 Aarhus C Telephone +45 89 36 0000 E-mail inge.vestergaard@aarch.dk Based on the repetitive architecture from the "building boom" 1960...... customization, telling exactly the revitalized storey about the change to a contemporary sustainable and better performed expression in direct relation to the given context. Through the last couple of years we have in Denmark been focusing a more sustainable and low energy building technique, which also include...... to the building physic problems a new industrialized period has started based on light weight elements basically made of wooden structures, faced with different suitable materials meant for individual expression for the specific housing area. It is the purpose of this article to widen up the different design...

  13. System-on-chip architecture and validation for real-time transceiver optimization: APC implementation on FPGA

    Science.gov (United States)

    Suarez, Hernan; Zhang, Yan R.

    2015-05-01

    New radar applications need to perform complex algorithms and process large quantities of data to generate useful information for the users. This situation has motivated the search for better processing solutions that include low-power high-performance processors, efficient algorithms, and high-speed interfaces. In this work, hardware implementations of adaptive pulse compression for real-time transceiver optimization are presented; they are based on a System-on-Chip architecture for Xilinx devices. This study also evaluates the performance of dedicated coprocessors as hardware accelerator units to speed up and improve the computation of computing-intensive tasks such as matrix multiplication and matrix inversion, which are essential units to solve the covariance matrix. The tradeoffs between latency and hardware utilization are also presented. Moreover, the system architecture takes advantage of the embedded processor, which is interconnected with the logic resources through the high performance AXI buses, to perform floating-point operations, control the processing blocks, and communicate with an external PC through a customized software interface. The overall system functionality is demonstrated and tested for real-time operations using a Ku-band testbed together with a low-cost channel emulator for different types of waveforms.

  14. Merged ozone profiles from four MIPAS processors

    Science.gov (United States)

    Laeng, Alexandra; von Clarmann, Thomas; Stiller, Gabriele; Dinelli, Bianca Maria; Dudhia, Anu; Raspollini, Piera; Glatthor, Norbert; Grabowski, Udo; Sofieva, Viktoria; Froidevaux, Lucien; Walker, Kaley A.; Zehner, Claus

    2017-04-01

    The Michelson Interferometer for Passive Atmospheric Sounding (MIPAS) was an infrared (IR) limb emission spectrometer on the Envisat platform. Currently, there are four MIPAS ozone data products, including the operational Level-2 ozone product processed at ESA, with the scientific prototype processor being operated at IFAC Florence, and three independent research products developed by the Istituto di Fisica Applicata Nello Carrara (ISAC-CNR)/University of Bologna, Oxford University, and the Karlsruhe Institute of Technology-Institute of Meteorology and Climate Research/Instituto de Astrofísica de Andalucía (KIT-IMK/IAA). Here we present a dataset of ozone vertical profiles obtained by merging ozone retrievals from four independent Level-2 MIPAS processors. We also discuss the advantages and the shortcomings of this merged product. As the four processors retrieve ozone in different parts of the spectra (microwindows), the source measurements can be considered as nearly independent with respect to measurement noise. Hence, the information content of the merged product is greater and the precision is better than those of any parent (source) dataset. The merging is performed on a profile per profile basis. Parent ozone profiles are weighted based on the corresponding error covariance matrices; the error correlations between different profile levels are taken into account. The intercorrelations between the processors' errors are evaluated statistically and are used in the merging. The height range of the merged product is 20-55 km, and error covariance matrices are provided as diagnostics. Validation of the merged dataset is performed by comparison with ozone profiles from ACE-FTS (Atmospheric Chemistry Experiment-Fourier Transform Spectrometer) and MLS (Microwave Limb Sounder). Even though the merging is not supposed to remove the biases of the parent datasets, around the ozone volume mixing ratio peak the merged product is found to have a smaller (up to 0.1 ppmv
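The claim that the merged product is more precise than any parent dataset can be seen in the simplest form of error-weighted merging: an inverse-variance weighted mean. The sketch below is a deliberate simplification and not the authors' procedure: it treats each altitude level independently and assumes the processors' errors are uncorrelated, whereas the abstract states that inter-level and inter-processor error correlations are taken into account. All names and numbers are hypothetical.

```python
def merge_level(values, variances):
    """Inverse-variance weighted mean of one level's retrievals.

    Returns (merged_value, merged_variance). Assumes independent errors,
    which is a simplification of the covariance-based weighting described
    in the abstract.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    merged = sum(w * x for w, x in zip(weights, values)) / total
    return merged, 1.0 / total


def merge_profiles(profiles, variances):
    """Merge several profiles given on the same altitude grid, level by level."""
    n_levels = len(profiles[0])
    return [
        merge_level([p[k] for p in profiles], [v[k] for v in variances])
        for k in range(n_levels)
    ]


# Hypothetical ozone values (ppmv) from two processors at two levels.
profiles = [[6.0, 8.0], [6.4, 7.6]]
variances = [[0.04, 0.09], [0.04, 0.01]]
for value, variance in merge_profiles(profiles, variances):
    print(round(value, 3), round(variance, 4))
```

Because the merged variance is the reciprocal of a sum of positive weights, it is always smaller than the smallest parent variance under these assumptions.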

  15. PICNIC Architecture.

    Science.gov (United States)

    Saranummi, Niilo

    2005-01-01

    The PICNIC architecture aims at supporting inter-enterprise integration and the facilitation of collaboration between healthcare organisations. The concept of a Regional Health Economy (RHE) is introduced to illustrate the varying nature of inter-enterprise collaboration between healthcare organisations collaborating in providing health services to citizens and patients in a regional setting. The PICNIC architecture comprises a number of PICNIC IT Services, the interfaces between them and presents a way to assemble these into a functioning Regional Health Care Network meeting the needs and concerns of its stakeholders. The PICNIC architecture is presented through a number of views relevant to different stakeholder groups. The stakeholders of the first view are national and regional health authorities and policy makers. The view describes how the architecture enables the implementation of national and regional health policies, strategies and organisational structures. The stakeholders of the second view, the service viewpoint, are the care providers, health professionals, patients and citizens. The view describes how the architecture supports and enables regional care delivery and process management including continuity of care (shared care) and citizen-centred health services. The stakeholders of the third view, the engineering view, are those that design, build and implement the RHCN. The view comprises four sub views: software engineering, IT services engineering, security and data. The proposed architecture is grounded in the mainstream of how distributed computing environments are evolving. The architecture is realised using the web services approach. A number of well established technology platforms and generic standards exist that can be used to implement the software components. The software components that are specified in PICNIC are implemented in Open Source.

  16. Architectural freedom and industrialised architecture

    DEFF Research Database (Denmark)

    Vestergaard, Inge

    2012-01-01

    to the building physic problems a new industrialized period has started based on light weight elements basically made of wooden structures, faced with different suitable materials meant for individual expression for the specific housing area. It is the purpose of this article to widen up the different design...... to this systematic thinking of the building technique we get a diverse and functional architecture. Creating a new and clearer story telling about new and smart system based thinking behind the architectural expression....

  17. Huffman-based code compression techniques for embedded processors

    KAUST Repository

    Bonny, Mohamed Talal

    2010-09-01

    The size of embedded software is increasing at a rapid pace. It is often challenging and time consuming to fit an amount of required software functionality within a given hardware resource budget. Code compression is a means to alleviate the problem by providing substantial savings in terms of code size. In this article we introduce a novel and efficient hardware-supported compression technique that is based on Huffman Coding. Our technique reduces the size of the generated decoding table, which takes a large portion of the memory. It combines our previous techniques, Instruction Splitting Technique and Instruction Re-encoding Technique, into a new one called Combined Compression Technique to improve the final compression ratio by taking advantage of both previous techniques. The Instruction Splitting Technique is instruction set architecture (ISA)-independent. It splits the instructions into portions of varying size (called patterns) before Huffman coding is applied. This technique improves the final compression ratio by more than 20% compared to other known schemes based on Huffman Coding. The average compression ratios achieved using this technique are 48% and 50% for ARM and MIPS, respectively. The Instruction Re-encoding Technique is ISA-dependent. It investigates the benefits of reencoding unused bits (we call them reencodable bits) in the instruction format for a specific application to improve the compression ratio. Reencoding those bits can reduce the size of decoding tables by up to 40%. Using this technique, we improve the final compression ratios in comparison to the first technique to 46% and 45% for ARM and MIPS, respectively (including all overhead incurred). The Combined Compression Technique improves the compression ratio to 45% and 42% for ARM and MIPS, respectively. In our compression technique, we have conducted evaluations using a representative set of applications and we have applied each technique to two major embedded processor architectures.
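As background for the Huffman-based schemes in this abstract, the sketch below builds a plain Huffman code over a list of instruction "patterns" and computes a compression ratio. It is a textbook construction, not the authors' hardware-supported technique: the pattern strings are made up, the ratio is assumed to mean compressed size divided by original size (lower is better), and the decoding-table overhead that the abstract includes is ignored here.

```python
import heapq
from collections import Counter


def huffman_codes(symbols):
    """Return a dict mapping each distinct symbol to a prefix-free bit string."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # Degenerate case: a single symbol still needs one bit.
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tie_breaker, {symbol: code_so_far}).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, codes1 = heapq.heappop(heap)
        c2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]


def compression_ratio(symbols, codes, bits_per_symbol):
    """Compressed bits / original bits, ignoring decoding-table overhead."""
    compressed = sum(len(codes[s]) for s in symbols)
    return compressed / (len(symbols) * bits_per_symbol)


# Hypothetical 2-bit patterns with a skewed frequency distribution.
patterns = ["00"] * 4 + ["01"] * 2 + ["10"] + ["11"]
codes = huffman_codes(patterns)
ratio = compression_ratio(patterns, codes, bits_per_symbol=2)
print(codes, ratio)
```

Frequent patterns receive shorter codes, which is why splitting instructions into patterns with a skewed distribution can help; a real scheme must also store the decoding table, which is the overhead the abstract's techniques aim to reduce.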

  18. Design and implementation of an ASIP-based cryptography processor for AES, IDEA, and MD5

    OpenAIRE

    Karim Shahbazi; Mohammad Eshghi; Reza Faghih Mirzaee

    2017-01-01

    In this paper, a new 32-bit ASIP-based crypto processor for AES, IDEA, and MD5 is designed. The instruction-set consists of both general purpose and specific instructions for the above cryptographic algorithms. The proposed architecture has nine function units and two data buses. It also has two types of 32-bit instruction formats for executing Memory Reference (M.R.), Register Reference (R.R.), and Input/Output Reference (I/O R.) instructions. The maximum achieved frequency is 166.916 MHz. T...

  19. The definitive guide to ARM Cortex-M3 and Cortex-M4 processors

    CERN Document Server

    Yiu, Joseph

    2013-01-01

    This book presents the background of the ARM architecture and outlines the features of the processors such as the instruction set, interrupt-handling and also demonstrates how to program and utilize the advanced features available such as the Memory Protection Unit (MPU). Chapters on getting started with IAR, Keil, gcc and CooCox CoIDE tools help beginners develop program codes.  Coverage also includes the important areas of software development such as using the low power features, handling information input/output, mixed language projects with assembly and C, and other advanced topics. Tw

  20. How to harness the performance potential of current multi-core processors

    International Nuclear Information System (INIS)

    Jarp, Sverre; Lazzaro, Alfio; Leduc, Julien; Nowak, Andrzej

    2011-01-01

    Leakage currents have put a stop to the semiconductor industry's ability to increase processor frequency in order to enhance the performance of new microprocessors. Instead, we observe a slew of changes inside the micro-architecture with an aim of enhancing the performance. Several of these changes, however, do not translate into automatic speed improvements for the software. This paper discusses the increased complexity of modern microprocessors by separating out into dimensions each feature that impacts performance and mentions briefly ways of improving software, in particular that of the High Energy Physics community, to take full advantage.