network processor architectures: Topics by WorldWideScience.org

Sample records for network processor architectures

Array processor architecture

Science.gov (United States)

Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)

1983-01-01

A high speed parallel array data processing architecture fashioned under a computational envelope approach includes a data base memory for secondary storage of programs and data, and a plurality of memory modules interconnected to a plurality of processing modules by a connection network of the Omega type. Programs and data are fed from the data base memory to the plurality of memory modules, and from there the programs are fed through the connection network to the array of processors (one copy of each program for each processor). Execution of the programs occurs with the processors normally operating quite independently of each other in a multiprocessing fashion. For data-dependent operations and other suitable operations, all processors are instructed to finish one given task or program branch before all are instructed to proceed in parallel processing fashion on the next instruction. Even when functioning in the parallel processing mode, however, the processors are not in lock-step but execute their own copy of the program individually unless or until another overall processor array synchronization instruction is issued.
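
The record does not describe how the Omega connection network routes a request. As a rough illustration only, the sketch below uses the textbook destination-tag scheme for an Omega network with 2^n ports: each of the n stages applies a perfect shuffle and then a 2x2 switch that selects its upper or lower output from one bit of the destination address. The function name `omega_route` and its parameters are hypothetical and are not taken from the patent.

```python
def omega_route(source, dest, n_bits):
    """Return the port positions visited when routing source -> dest.

    Assumes a textbook Omega network with 2**n_bits ports and n_bits stages.
    This is an illustrative model, not the patented design.
    """
    size = 1 << n_bits
    if not (0 <= source < size and 0 <= dest < size):
        raise ValueError("source and dest must be valid port numbers")

    position = source
    path = [position]
    for stage in range(n_bits):
        # Perfect shuffle: rotate the n-bit position left by one bit.
        position = ((position << 1) | (position >> (n_bits - 1))) & (size - 1)
        # 2x2 switch: destination bit (most significant first) picks the
        # upper (0) or lower (1) output, which sets the lowest position bit.
        bit = (dest >> (n_bits - 1 - stage)) & 1
        position = (position & ~1) | bit
        path.append(position)
    return path


if __name__ == "__main__":
    # Route from port 5 to port 2 in an 8-port network.
    print(omega_route(5, 2, 3))
```

After n stages every bit of the source address has been shifted out and replaced by a destination bit, which is why the final position always equals the destination regardless of where the request entered.
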
Tinuso: A processor architecture for a multi-core hardware simulation platform

DEFF Research Database (Denmark)

Schleuniger, Pascal; Karlsson, Sven

2010-01-01

Multi-core systems have the potential to improve performance, energy and cost properties of embedded systems but also require new design methods and tools to take advantage of the new architectures. Due to the limited accuracy and performance of pure software simulators, we are working on a cycle...... accurate hardware simulation platform. We have developed the Tinuso processor architecture for this platform. Tinuso is a processor architecture optimized for FPGA implementation. The instruction set makes use of predicated instructions and supports C/C++ and assembly language programming. It is designed...... to be easily extendable to maintain the flexibility required for the research on multi-core systems. Tinuso contains a co-processor interface to connect to a network interface. This interface allows for communication over an on-chip network. A clock frequency estimation study on a deeply pipelined Tinuso...
A Workload-Adaptive and Reconfigurable Bus Architecture for Multicore Processors

Directory of Open Access Journals (Sweden)

Shoaib Akram

2010-01-01

Interconnection networks for multicore processors are traditionally designed to serve a diversity of workloads. However, different workloads or even different execution phases of the same workload may benefit from different interconnect configurations. In this paper, we first motivate the need for workload-adaptive interconnection networks. Subsequently, we describe an interconnection network framework based on reconfigurable switches for use in medium-scale (up to 32 cores) shared memory multicore processors. Our cost-effective reconfigurable interconnection network is implemented on a traditional shared bus interconnect with snoopy-based coherence, and it enables improved multicore performance. The proposed interconnect architecture distributes the cores of the processor into clusters with reconfigurable logic between clusters to support workload-adaptive policies for inter-cluster communication. Our interconnection scheme is complemented by interconnect-aware scheduling and additional interconnect optimizations which help boost the performance of multiprogramming and multithreaded workloads. We provide experimental results that show that the overall throughput of multiprogramming workloads (consisting of two and four programs) can be improved by up to 60% with our configurable bus architecture. Similar gains can also be achieved for multithreaded applications, as shown by further experiments. Finally, we present the performance sensitivity of the proposed interconnect architecture to shared memory bandwidth availability.
Examining the volume efficiency of the cortical architecture in a multi-processor network model.

Science.gov (United States)

Ruppin, E; Schwartz, E L; Yeshurun, Y

1993-01-01

The convoluted form of the sheet-like mammalian cortex naturally raises the question whether there is a simple geometrical reason for the prevalence of cortical architecture in the brains of higher vertebrates. Addressing this question, we present a formal analysis of the volume occupied by a massively connected network of processors (neurons) and then consider the pertaining cortical data. Three gross macroscopic features of cortical organization are examined: the segregation of white and gray matter, the circumferential organization of the gray matter around the white matter, and the folded cortical structure. Our results testify to the efficiency of cortical architecture.
In-Network Adaptation of Video Streams Using Network Processors

Directory of Open Access Journals (Sweden)

Mohammad Shorfuzzaman

2009-01-01

problem can be addressed, near the network edge, by applying dynamic, in-network adaptation (e.g., transcoding) of video streams to meet available connection bandwidth, machine characteristics, and client preferences. In this paper, we extrapolate from earlier work of Shorfuzzaman et al. 2006, in which we implemented and assessed an MPEG-1 transcoding system on the Intel IXP1200 network processor, to consider the feasibility of in-network transcoding for other video formats and network processor architectures. The use of “on-the-fly” video adaptation near the edge of the network offers the promise of simpler support for a wide range of end devices with different display and other characteristics that can be used in different types of environments.
Architectural design and analysis of a programmable image processor

International Nuclear Information System (INIS)

Siyal, M.Y.; Chowdhry, B.S.; Rajput, A.Q.K.

2003-01-01

In this paper we present an architectural design and analysis of a programmable image processor, nicknamed Snake. The processor was designed with a high degree of parallelism to speed up a range of image processing operations. Data parallelism found in array processors has been incorporated into the architecture of the proposed processor. The implementation of commonly used image processing algorithms and their performance evaluation are also discussed. The performance of Snake is also compared with other types of processor architectures. (author)
FY1995 study of design methodology and environment of high-performance processor architectures; 1995 nendo koseino processor architecture sekkeiho to sekkei kankyo no kenkyu

Energy Technology Data Exchange (ETDEWEB)

NONE

1997-03-01

The aim of our project is to develop high-performance processor architectures for both general-purpose and application-specific use. We also plan to develop basic software, such as compilers, and various design aid tools for those architectures. We are particularly interested in performance evaluation at the architecture design phase, design optimization, automatic generation of compilers from processor designs, and architecture design methodologies combined with circuit layout. We have investigated both microprocessor architectures and design methodologies / environments for the processors. Our goal is to establish design technologies for high-performance, low-power, low-cost and highly-reliable systems in the system-on-silicon era. We have proposed the PPRAM architecture for high-performance systems using DRAM and logic mixture technology, the Softcore processor architecture for special purpose processors in embedded systems, and the Power-Pro architecture for low power systems. We also developed design methodologies and design environments for the above architectures as well as a new method for design verification of microprocessors. (NEDO)
Considerations for control system software verification and validation specific to implementations using distributed processor architectures

International Nuclear Information System (INIS)

Munro, J.K. Jr.

1993-01-01

Until recently, digital control systems have been implemented on centralized processing systems to function in one of several ways: (1) as a single processor control system; (2) as a supervisor at the top of a hierarchical network of multiple processors; or (3) in a client-server mode. Each of these architectures uses a very different set of communication protocols. The latter two architectures also belong to the category of distributed control systems. Distributed control systems can have a central focus, as in the cases just cited, or be quite decentralized in a loosely coupled, shared responsibility arrangement. This last architecture is analogous to autonomous hosts on a local area network. Each of the architectures identified above will have a different set of architecture-associated issues to be addressed in the verification and validation activities during software development. This paper summarizes results of efforts to identify, describe, contrast, and compare these issues
Acoustooptic linear algebra processors - Architectures, algorithms, and applications

Science.gov (United States)

Casasent, D.

1984-01-01

Architectures, algorithms, and applications for systolic processors are described with attention to the realization of parallel algorithms on various optical systolic array processors. Systolic processors for matrices with special structure and matrices of general structure, and the realization of matrix-vector, matrix-matrix, and triple-matrix products and such architectures are described. Parallel algorithms for direct and indirect solutions to systems of linear algebraic equations and their implementation on optical systolic processors are detailed with attention to the pipelining and flow of data and operations. Parallel algorithms and their optical realization for LU and QR matrix decomposition are specifically detailed. These represent the fundamental operations necessary in the implementation of least squares, eigenvalue, and SVD solutions. Specific applications (e.g., the solution of partial differential equations, adaptive noise cancellation, and optimal control) are described to typify the use of matrix processors in modern advanced signal processing.
Keystone Business Models for Network Security Processors

Directory of Open Access Journals (Sweden)

Arthur Low

2013-07-01

Network security processors are critical components of high-performance systems built for cybersecurity. Development of a network security processor requires multi-domain experience in semiconductors and complex software security applications, and multiple iterations of both software and hardware implementations. Limited by the business models in use today, such an arduous task can be undertaken only by large incumbent companies and government organizations. Neither the “fabless semiconductor” models nor the silicon intellectual-property licensing (“IP-licensing”) models allow small technology companies to successfully compete. This article describes an alternative approach that produces an ongoing stream of novel network security processors for niche markets through continuous innovation by both large and small companies. This approach, referred to here as the "business ecosystem model for network security processors", includes a flexible and reconfigurable technology platform, a “keystone” business model for the company that maintains the platform architecture, and an extended ecosystem of companies that both contribute and share in the value created by innovation. New opportunities for business model innovation by participating companies are made possible by the ecosystem model. This ecosystem model builds on: (i) the lessons learned from the experience of the first author as a senior integrated circuit architect for providers of public-key cryptography solutions and as the owner of a semiconductor startup, and (ii) the latest scholarly research on technology entrepreneurship, business models, platforms, and business ecosystems. This article will be of interest to all technology entrepreneurs, but it will be of particular interest to owners of small companies that provide security solutions and to specialized security professionals seeking to launch their own companies.
Performance evaluation of throughput computing workloads using multi-core processors and graphics processors

Science.gov (United States)

Dave, Gaurav P.; Sureshkumar, N.; Blessy Trencia Lincy, S. S.

2017-11-01

The current trend in processor manufacturing focuses on multi-core architectures rather than on increasing the clock speed for performance improvement. Graphics processors have become commodity hardware for providing fast co-processing in computer systems. Developments in IoT, social networking web applications, and big data have created huge demand for data processing activities, and such throughput-intensive applications inherently contain data-level parallelism, which is well suited to SIMD-architecture-based GPUs. This paper reviews the architectural aspects of multi/many-core processors and graphics processors. Different case studies are taken to compare the performance of throughput computing applications using shared memory programming in OpenMP and CUDA API based programming.
Real time image synthesis on a SIMD linear array processor: algorithms and architectures

International Nuclear Information System (INIS)

Letellier, Laurent

1993-01-01

Nowadays, image synthesis has become a widely used technique. The impressive computing power required for real time applications necessitates the use of parallel architectures. In this context, we evaluate an SIMD linear parallel architecture, SYMPATI2, dedicated to image processing. The objective of this study is to propose a cost-effective graphics accelerator relying on SYMPATI2's modular and programmable structure. The parallelization of basic image synthesis algorithms on SYMPATI2 enables us to determine its limits in this application field. These limits lead us to evaluate a new structure with a fast intercommunication network between processors, but processors have to support the message consistency, which brings about a strong decrease in performance. To solve this problem, we suggest a simple network whose access priorities are represented by tokens. The simulations of this new architecture indicate that the SIMD mode causes a drastic cut in parallelism. To cope with this drawback, we propose a context switching procedure which reduces the SIMD rigidity and increases the parallelism rate significantly. Then, the graphics accelerator we propose is compared with existing graphics workstations. This comparison indicates that our structure, which is able to accelerate both image synthesis and image processing, is competitive and well-suited for multimedia applications. (author)
Optical linear algebra processors - Architectures and algorithms

Science.gov (United States)

Casasent, David

1986-01-01

Attention is given to the component design and optical configuration features of a generic optical linear algebra processor (OLAP) architecture, as well as the large number of OLAP architectures, number representations, algorithms and applications encountered in current literature. Number-representation issues associated with bipolar and complex-valued data representations, high-accuracy (including floating point) performance, and the base or radix to be employed, are discussed, together with case studies on a space-integrating frequency-multiplexed architecture and a hybrid space-integrating and time-integrating multichannel architecture.
NeuroFlow: A General Purpose Spiking Neural Network Simulation Platform using Customizable Processors.

Science.gov (United States)

Cheung, Kit; Schultz, Simon R; Luk, Wayne

2015-01-01

NeuroFlow is a scalable spiking neural network simulation platform for off-the-shelf high performance computing systems using customizable hardware processors such as Field-Programmable Gate Arrays (FPGAs). Unlike multi-core processors and application-specific integrated circuits, the processor architecture of NeuroFlow can be redesigned and reconfigured to suit a particular simulation to deliver optimized performance, such as the degree of parallelism to employ. The compilation process supports using PyNN, a simulator-independent neural network description language, to configure the processor. NeuroFlow supports a number of commonly used current or conductance based neuronal models such as integrate-and-fire and Izhikevich models, and the spike-timing-dependent plasticity (STDP) rule for learning. A 6-FPGA system can simulate a network of up to ~600,000 neurons and can achieve a real-time performance of 400,000 neurons. Using one FPGA, NeuroFlow delivers a speedup of up to 33.6 times the speed of an 8-core processor, or 2.83 times the speed of GPU-based platforms. With high flexibility and throughput, NeuroFlow provides a viable environment for large-scale neural network simulation.
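
To make the neuron models named in this record concrete, here is a minimal software sketch of a single Izhikevich neuron integrated with forward Euler steps. It is not NeuroFlow code and says nothing about how the FPGA implementation works. The parameter values (a = 0.02, b = 0.2, c = -65, d = 8) are the commonly quoted "regular spiking" settings of the Izhikevich model, and the function name, step size, and input currents are assumptions chosen for illustration.

```python
def simulate_izhikevich(current, duration_ms=1000.0, dt=0.5,
                        a=0.02, b=0.2, c=-65.0, d=8.0):
    """Simulate one Izhikevich neuron and return its spike times in ms.

    current is a constant input current (model units); all defaults are
    illustrative assumptions, not values taken from the NeuroFlow paper.
    """
    v = c          # membrane potential (mV)
    u = b * v      # recovery variable
    spike_times = []
    for step in range(int(duration_ms / dt)):
        # Forward Euler update of the two model equations.
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
        u += dt * a * (b * v - u)
        if v >= 30.0:
            # Spike: record the time, then reset v and bump u.
            spike_times.append(step * dt)
            v = c
            u += d
    return spike_times


if __name__ == "__main__":
    for current in (0.0, 10.0):
        print(current, len(simulate_izhikevich(current)))
```

With no input the neuron settles at its resting potential and stays silent, while a sustained positive current removes the resting state and produces repeated spikes.
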
Design and implementation of a high performance network security processor

Science.gov (United States)

Wang, Haixin; Bai, Guoqiang; Chen, Hongyi

2010-03-01

The last few years have seen significant progress in the field of application-specific processors. One example is network security processors (NSPs) that perform various cryptographic operations specified by network security protocols and help to offload the computation-intensive burdens from network processors (NPs). This article presents a high performance NSP system architecture implementation intended for both internet protocol security (IPSec) and secure socket layer (SSL) protocol acceleration, which are widely employed in virtual private network (VPN) and e-commerce applications. The efficient dual one-way pipelined data transfer skeleton and optimised integration scheme of the heterogeneous parallel crypto engine arrays lead to a Gbps rate NSP, which is programmable with domain specific descriptor-based instructions. The descriptor-based control flow fragments large data packets and distributes them to the crypto engine arrays, which fully utilises the parallel computation resources and improves the overall system data throughput. A prototyping platform for this NSP design is implemented with a Xilinx XC3S5000 based FPGA chip set. Results show that the design gives a peak throughput for the IPSec ESP tunnel mode of 2.85 Gbps with over 2100 full SSL handshakes per second at a clock rate of 95 MHz.
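
The descriptor-based control flow described above (fragment a large packet, then hand the fragments to parallel engines) can be sketched in software as follows. This is a loose illustration under stated assumptions: the `Descriptor` fields, the round-robin assignment, and the XOR stand-in for a crypto engine are all hypothetical, and the record does not specify the real descriptor format or scheduling policy. The XOR transform is a placeholder and provides no security.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    packet_id: int
    offset: int   # byte offset of the fragment within the packet
    length: int   # fragment length in bytes
    engine: int   # index of the engine assigned to this fragment


def build_descriptors(packet_id, packet_len, fragment_size, num_engines):
    """Split a packet into fragments and assign engines round-robin."""
    descriptors = []
    for index, offset in enumerate(range(0, packet_len, fragment_size)):
        length = min(fragment_size, packet_len - offset)
        descriptors.append(
            Descriptor(packet_id, offset, length, index % num_engines))
    return descriptors


def process_packet(packet, descriptors, engines):
    """Run each fragment through its engine and reassemble the result."""
    out = bytearray(len(packet))
    for d in descriptors:
        chunk = packet[d.offset:d.offset + d.length]
        out[d.offset:d.offset + d.length] = engines[d.engine](chunk)
    return bytes(out)


def placeholder_engine(chunk):
    # Stand-in for a crypto engine: XOR every byte with a constant.
    return bytes(b ^ 0x5A for b in chunk)


if __name__ == "__main__":
    packet = b"0123456789"
    descs = build_descriptors(1, len(packet), 4, 3)
    engines = [placeholder_engine] * 3
    print(descs)
    print(process_packet(packet, descs, engines))
```

Fragments are treated as independent here. Real cipher modes often chain state across blocks, so an actual design would need descriptors that carry or coordinate that state.
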
An FPGA design flow for reconfigurable network-based multi-processor systems on chip

NARCIS (Netherlands)

Kumar, A.; Hansson, M.A; Huisken, J.; Corporaal, H.

2007-01-01

Multi-processor systems on chip (MPSoC) platforms are becoming increasingly more heterogeneous and are shifting towards a more communication-centric methodology. Networks on chip (NoC) have emerged as the design paradigm for scalable on-chip communication architectures. As the system complexity
Reducing Competitive Cache Misses in Modern Processor Architectures

OpenAIRE

Prisagjanec, Milcho; Mitrevski, Pece

2017-01-01

The increasing number of threads inside the cores of a multicore processor, and competitive access to the shared cache memory, become the main reasons for an increased number of competitive cache misses and performance decline. Inevitably, the development of modern processor architectures leads to an increased number of cache misses. In this paper, we make an attempt to implement a technique for decreasing the number of competitive cache misses in the first level of cache memory. This tec...
An efficient optical architecture for sparsely connected neural networks

Science.gov (United States)

Hine, Butler P., III; Downie, John D.; Reid, Max B.

1990-01-01

An architecture for a general-purpose optical neural network processor is presented in which the interconnections and weights are formed by directing coherent beams holographically, thereby making use of the space-bandwidth products of the recording medium for sparsely interconnected networks more efficiently than the commonly used vector-matrix multiplier, since all of the hologram area is in use. An investigation is made of the use of computer-generated holograms recorded on such updatable media as thermoplastic materials, in order to define the interconnections and weights of a neural network processor; attention is given to limits on interconnection densities, diffraction efficiencies, and weighting accuracies possible with such an updatable thin film holographic device.
Multi-processor network implementations in Multibus II and VME

International Nuclear Information System (INIS)

Briegel, C.

1992-01-01

ACNET (Fermilab Accelerator Controls Network), a proprietary network protocol, is implemented in a multi-processor configuration for both Multibus II and VME. The implementations are contrasted by the bus protocol and software design goals. The Multibus II implementation provides for multiple processors running a duplicate set of tasks on each processor. For a network connected task, messages are distributed by a network round-robin scheduler. Further, messages can be stopped, continued, or re-routed for each task by user-callable commands. The VME implementation provides for multiple processors running one task across all processors. The process can either be fixed to a particular processor or dynamically allocated to an available processor depending on the scheduling algorithm of the multi-processing operating system. (author)
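
As a rough picture of the round-robin distribution with stop, continue, and re-route commands mentioned in this record, the sketch below rotates over a list of processors and skips or redirects targets as requested. It does not reflect the actual ACNET software: the class name, the method names, and the choice to apply the commands per processor are assumptions made for illustration.

```python
class RoundRobinDispatcher:
    """Pick a target processor for each message in rotating order."""

    def __init__(self, processors):
        self.processors = list(processors)
        self.stopped = set()   # processors currently not accepting messages
        self.routes = {}       # processor -> replacement target
        self._next = 0

    def stop(self, processor):
        self.stopped.add(processor)

    def resume(self, processor):
        self.stopped.discard(processor)

    def reroute(self, source, target):
        self.routes[source] = target

    def dispatch(self, message):
        """Return the processor that should receive message, or None."""
        for _ in range(len(self.processors)):
            candidate = self.processors[self._next]
            self._next = (self._next + 1) % len(self.processors)
            target = self.routes.get(candidate, candidate)
            if target not in self.stopped:
                return target
        return None  # every candidate is stopped


if __name__ == "__main__":
    dispatcher = RoundRobinDispatcher(["cpu0", "cpu1", "cpu2"])
    dispatcher.stop("cpu1")
    print([dispatcher.dispatch(n) for n in range(4)])
```
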
Optical chirp z-transform processor with a simplified architecture.

Science.gov (United States)

Ngo, Nam Quoc

2014-12-29

Using a simplified chirp z-transform (CZT) algorithm based on the discrete-time convolution method, this paper presents the synthesis of a simplified architecture of a reconfigurable optical chirp z-transform (OCZT) processor based on the silica-based planar lightwave circuit (PLC) technology. In the simplified architecture of the reconfigurable OCZT, the required number of optical components is small and there are no waveguide crossings, which makes fabrication easy. The design of a novel type of optical discrete Fourier transform (ODFT) processor as a special case of the synthesized OCZT is then presented to demonstrate its effectiveness. The designed ODFT can potentially be used as an optical demultiplexer at the receiver of an optical fiber orthogonal frequency division multiplexing (OFDM) transmission system.
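
The record does not give the paper's simplified algorithm, so the sketch below shows only the standard convolution form of the chirp z-transform, X_k = sum over n of x[n] A^(-n) W^(nk), obtained from the identity nk = (n^2 + k^2 - (k - n)^2) / 2. It also checks the special case mentioned above: with A = 1 and W = exp(-2*pi*j/N) the CZT reduces to the DFT. Function names are hypothetical, and the direct double loop is chosen for clarity rather than speed.

```python
import cmath


def czt(x, m, w, a=1.0):
    """Chirp z-transform of x at the m points z_k = a * w**(-k).

    Uses the convolution form: pre-multiply by a chirp, convolve with the
    inverse chirp, then post-multiply by a chirp.
    """
    n = len(x)
    log_w = cmath.log(w)

    def chirp(k):
        # W ** (k*k / 2), computed through one fixed branch of log(W).
        return cmath.exp(log_w * k * k / 2)

    # Pre-multiplication: y[i] = x[i] * A**(-i) * W**(i*i/2)
    y = [x[i] * a ** (-i) * chirp(i) for i in range(n)]

    result = []
    for k in range(m):
        # Discrete convolution of y with the inverse chirp W**(-(k-i)**2/2).
        acc = sum(y[i] / chirp(k - i) for i in range(n))
        # Post-multiplication by W**(k*k/2).
        result.append(chirp(k) * acc)
    return result


def dft(x):
    """Direct DFT by definition, used only as a reference."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n)
                for i in range(n)) for k in range(n)]


if __name__ == "__main__":
    samples = [1.0, 2.0, 0.0, -1.0, 3.0]
    w = cmath.exp(-2j * cmath.pi / len(samples))
    print(czt(samples, len(samples), w))
    print(dft(samples))
```
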

Analytical Bounds on the Threads in IXP1200 Network Processor

OpenAIRE

Ramakrishna, STGS; Jamadagni, HS

2003-01-01

Increasing link speeds have placed an enormous burden on processing requirements, and processors are expected to carry out a variety of tasks. Network Processors (NP) [1] [2] is the blanket name given to the processors, which are traded for flexibility and performance. Network Processors are offered by a number of vendors to take the main burden of processing requirement of network related operations from the conventional processors. The Network Processors cover a spectrum of design trad...
Behavioral Simulation and Performance Evaluation of Multi-Processor Architectures

Directory of Open Access Journals (Sweden)

Ausif Mahmood

1996-01-01

The development of multi-processor architectures requires extensive behavioral simulations to verify the correctness of design and to evaluate its performance. A high level language can provide maximum flexibility in this respect if the constructs for handling concurrent processes and a time mapping mechanism are added. This paper describes a novel technique for emulating hardware processes involved in a parallel architecture such that an object-oriented description of the design is maintained. The communication and synchronization between hardware processes is handled by splitting the processes into their equivalent subprograms at the entry points. The proper scheduling of these subprograms is coordinated by a timing wheel which provides a time mapping mechanism. Finally, a high level language pre-processor is proposed so that the timing wheel and the process emulation details can be made transparent to the user.
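
The idea of splitting processes at entry points and scheduling the pieces on a timing wheel can be illustrated with Python generators, where each `yield` marks an entry point and returns the delay until the process should resume. This is a sketch under stated assumptions and not the paper's implementation: the `TimingWheel` class, the slot count, and the example processes are all hypothetical.

```python
from collections import deque


class TimingWheel:
    """Circular array of time slots; slot index = simulation time mod size."""

    def __init__(self, slots=8):
        self.slots = [deque() for _ in range(slots)]
        self.now = 0
        self.pending = 0

    def schedule(self, delay, process):
        # Delays must fit within one revolution of the wheel.
        if not 0 <= delay < len(self.slots):
            raise ValueError("delay must be in [0, number of slots)")
        self.slots[(self.now + delay) % len(self.slots)].append(process)
        self.pending += 1

    def run(self):
        while self.pending:
            slot = self.slots[self.now % len(self.slots)]
            while slot:
                process = slot.popleft()
                self.pending -= 1
                try:
                    # Resume the process up to its next entry point.
                    delay = next(process)
                except StopIteration:
                    continue  # process finished
                self.schedule(delay, process)
            self.now += 1


def hardware_process(wheel, log, name, period, count):
    """Emulated process: logs an event, then waits `period` time units."""
    for step in range(count):
        log.append((wheel.now, name, step))
        yield period


def simulate():
    wheel = TimingWheel(slots=8)
    log = []
    wheel.schedule(0, hardware_process(wheel, log, "cpu0", 2, 3))
    wheel.schedule(1, hardware_process(wheel, log, "cpu1", 3, 2))
    wheel.run()
    return log


if __name__ == "__main__":
    for event in simulate():
        print(event)
```

Generators play the role of the pre-processor-generated subprograms here: the language runtime saves each process's state at the `yield`, so the user code reads as one sequential description.
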
A Scalable Multicore Architecture With Heterogeneous Memory Structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs).

Science.gov (United States)

Moradi, Saber; Qiao, Ning; Stefanini, Fabio; Indiveri, Giacomo

2018-02-01

Neuromorphic computing systems comprise networks of neurons that use asynchronous events for both computation and communication. This type of representation offers several advantages in terms of bandwidth and power consumption in neuromorphic electronic systems. However, managing the traffic of asynchronous events in large scale systems is a daunting task, both in terms of circuit complexity and memory requirements. Here, we present a novel routing methodology that employs both hierarchical and mesh routing strategies and combines heterogeneous memory structures for minimizing both memory requirements and latency, while maximizing programming flexibility to support a wide range of event-based neural network architectures, through parameter configuration. We validated the proposed scheme in a prototype multicore neuromorphic processor chip that employs hybrid analog/digital circuits for emulating synapse and neuron dynamics together with asynchronous digital circuits for managing the address-event traffic. We present a theoretical analysis of the proposed connectivity scheme, describe the methods and circuits used to implement such scheme, and characterize the prototype chip. Finally, we demonstrate the use of the neuromorphic processor with a convolutional neural network for the real-time classification of visual symbols being flashed to a dynamic vision sensor (DVS) at high speed.
An orthogonal wavelet division multiple-access processor architecture for LTE-advanced wireless/radio-over-fiber systems over heterogeneous networks

Science.gov (United States)

Mahapatra, Chinmaya; Leung, Victor CM; Stouraitis, Thanos

2014-12-01

The increase in internet traffic, number of users, and availability of mobile devices poses a challenge to wireless technologies. In long-term evolution (LTE) advanced system, heterogeneous networks (HetNet) using centralized coordinated multipoint (CoMP) transmitting radio over optical fibers (LTE A-ROF) have provided a feasible way of satisfying user demands. In this paper, an orthogonal wavelet division multiple-access (OWDMA) processor architecture is proposed, which is shown to be better suited to LTE advanced systems as compared to orthogonal frequency division multiple access (OFDMA) as in LTE systems 3GPP rel.8 (3GPP, http://www.3gpp.org/DynaReport/36300.htm). ROF systems are a viable alternative to satisfy large data demands; hence, the performance in ROF systems is also evaluated. To validate the architecture, the circuit is designed and synthesized on a Xilinx Virtex-6 field-programmable gate array (FPGA). The synthesis results show that the circuit performs with a clock period as short as 7.036 ns (i.e., a maximum clock frequency of 142.13 MHz) for transform size of 512. A pipelined version of the architecture reduces the power consumption by approximately 89%. We compare our architecture with similar available architectures for resource utilization and timing and provide performance comparison with OFDMA systems for various quality metrics of communication systems. The OWDMA architecture is found to perform better than OFDMA for bit error rate (BER) performance versus signal-to-noise ratio (SNR) in wireless channel as well as ROF media. It also gives higher throughput and mitigates the bad effect of peak-to-average-power ratio (PAPR).
Scalable architecture for a room temperature solid-state quantum information processor.

Science.gov (United States)

Yao, N Y; Jiang, L; Gorshkov, A V; Maurer, P C; Giedke, G; Cirac, J I; Lukin, M D

2012-04-24

The realization of a scalable quantum information processor has emerged over the past decade as one of the central challenges at the interface of fundamental science and engineering. Here we propose and analyse an architecture for a scalable, solid-state quantum information processor capable of operating at room temperature. Our approach is based on recent experimental advances involving nitrogen-vacancy colour centres in diamond. In particular, we demonstrate that the multiple challenges associated with operation at ambient temperature, individual addressing at the nanoscale, strong qubit coupling, robustness against disorder and low decoherence rates can be simultaneously achieved under realistic, experimentally relevant conditions. The architecture uses a novel approach to quantum information transfer and includes a hierarchy of control at successive length scales. Moreover, it alleviates the stringent constraints currently limiting the realization of scalable quantum processors and will provide fundamental insights into the physics of non-equilibrium many-body quantum systems.
Array processors: an introduction to their architecture, software, and applications in nuclear medicine

International Nuclear Information System (INIS)

King, M.A.; Doherty, P.W.; Rosenberg, R.J.; Cool, S.L.

1983-01-01

Array processors are "number crunchers" that dramatically enhance the processing power of nuclear medicine computer systems for applications dealing with the repetitive operations involved in digital image processing of large segments of data. The general architecture and the programming of array processors are introduced, along with some applications of array processors to the reconstruction of emission tomographic images, digital image enhancement, and functional image formation.
Design of Networks-on-Chip for Real-Time Multi-Processor Systems-on-Chip

DEFF Research Database (Denmark)

Sparsø, Jens

2012-01-01

This paper addresses the design of networks-on-chips for use in multi-processor systems-on-chips - the hardware platforms used in embedded systems. These platforms typically have to guarantee real-time properties, and as the network is a shared resource, it has to provide service guarantees...... (bandwidth and/or latency) to different communication flows. The paper reviews some past work in this field and the lessons learned, and the paper discusses ongoing research conducted as part of the project "Time-predictable Multi-Core Architecture for Embedded Systems" (T-CREST), supported by the European...
ARTiS, an Asymmetric Real-Time Scheduler for Linux on Multi-Processor Architectures

OpenAIRE

Piel, Éric; Marquet, Philippe; Soula, Julien; Osuna, Christophe; Dekeyser, Jean-Luc

2005-01-01

The ARTiS system is a real-time extension of the GNU/Linux scheduler dedicated to SMP (Symmetric Multi-Processor) systems. It allows High Performance Computing and real-time workloads to be mixed. ARTiS exploits the SMP architecture to guarantee the preemption of a processor when the system has to schedule a real-time task. The implementation is available as a modification of the Linux kernel, focusing especially on (but not restricted to) the IA-64 architecture. The basic idea of ARTiS is to assign a selected se...
CASPER: Embedding Power Estimation and Hardware-Controlled Power Management in a Cycle-Accurate Micro-Architecture Simulation Platform for Many-Core Multi-Threading Heterogeneous Processors

Directory of Open Access Journals (Sweden)

Arun Ravindran

2012-02-01

Despite the promising performance improvement observed in emerging many-core architectures in high performance processors, high power consumption prohibitively affects their use and marketability in the low-energy sectors, such as embedded processors, network processors and application specific instruction processors (ASIPs). While most chip architects design power-efficient processors by finding an optimal power-performance balance in their design, some use sophisticated on-chip autonomous power management units, which dynamically reduce the voltage or frequencies of idle cores and hence extend battery life and reduce operating costs. For large scale designs of many-core processors, a holistic approach integrating both these techniques at different levels of abstraction can potentially achieve maximal power savings. In this paper we present CASPER, a robust instruction trace driven cycle-accurate many-core multi-threading micro-architecture simulation platform where we have incorporated power estimation models of a wide variety of tunable many-core micro-architectural design parameters, thus enabling processor architects to explore a sufficiently large design space and achieve power-efficient designs. Additionally, CASPER is designed to accommodate cycle-accurate models of hardware controlled power management units, enabling architects to experiment with and evaluate different autonomous power-saving mechanisms to study the run-time power-performance trade-offs in embedded many-core processors. We have implemented two such techniques in CASPER: Chipwide Dynamic Voltage and Frequency Scaling, and Performance Aware Core-Specific Frequency Scaling, which show average power savings of 35.9% and 26.2%, respectively, on a baseline 4-core SPARC based architecture. This power saving data accounts for the power consumption of the power management units themselves. The CASPER simulation platform also provides users with complete support of SPARCV9
Keystone Business Models for Network Security Processors

OpenAIRE

Arthur Low; Steven Muegge

2013-01-01

Network security processors are critical components of high-performance systems built for cybersecurity. Development of a network security processor requires multi-domain experience in semiconductors and complex software security applications, and multiple iterations of both software and hardware implementations. Limited by the business models in use today, such an arduous task can be undertaken only by large incumbent companies and government organizations. Neither the “fabless semiconductor...
Advanced Avionics and Processor Systems for a Flexible Space Exploration Architecture

Science.gov (United States)

Keys, Andrew S.; Adams, James H.; Smith, Leigh M.; Johnson, Michael A.; Cressler, John D.

2010-01-01

The Advanced Avionics and Processor Systems (AAPS) project, formerly known as the Radiation Hardened Electronics for Space Environments (RHESE) project, endeavors to develop advanced avionic and processor technologies anticipated to be used by NASA's currently evolving space exploration architectures. The AAPS project is a part of the Exploration Technology Development Program, which funds an entire suite of technologies that are aimed at enabling NASA's ability to explore beyond low earth orbit. NASA's Marshall Space Flight Center (MSFC) manages the AAPS project. AAPS uses a broad-scoped approach to developing avionic and processor systems. Investment areas include advanced electronic designs and technologies capable of providing environmental hardness, reconfigurable computing techniques, software tools for radiation effects assessment, and radiation environment modeling tools. Near-term emphasis within the multiple AAPS tasks focuses on developing prototype components using semiconductor processes and materials (such as Silicon-Germanium (SiGe)) to enhance a device's tolerance to radiation events and low temperature environments. As the SiGe technology will culminate in a delivered prototype this fiscal year, the project emphasis shifts its focus to developing low-power, high efficiency total processor hardening techniques. In addition to processor development, the project endeavors to demonstrate techniques applicable to reconfigurable computing and partially reconfigurable Field Programmable Gate Arrays (FPGAs). This capability enables avionic architectures the ability to develop FPGA-based, radiation tolerant processor boards that can serve in multiple physical locations throughout the spacecraft and perform multiple functions during the course of the mission. The individual tasks that comprise AAPS are diverse, yet united in the common endeavor to develop electronics capable of operating within the harsh environment of space. Specifically, the AAPS tasks for
FPGA Based Intelligent Co-operative Processor in Memory Architecture

DEFF Research Database (Denmark)

Ahmed, Zaki; Sotudeh, Reza; Hussain, Dil Muhammad Akbar

2011-01-01

In a continuing effort to improve computer system performance, Processor-In-Memory (PIM) architecture has emerged as an alternative solution. PIM architecture incorporates computational units and control logic directly on the memory to provide immediate access to the data. To exploit the potential benefits of PIM, a concept of Co-operative Intelligent Memory (CIM) was developed by the intelligent system group of University of Hertfordshire, based on the previously developed Co-operative Pseudo Intelligent Memory (CPIM). This paper provides an overview on previous works (CPIM, CIM) and realization...
The architecture of a video image processor for the space station

Science.gov (United States)

Yalamanchili, S.; Lee, D.; Fritze, K.; Carpenter, T.; Hoyme, K.; Murray, N.

1987-01-01

The architecture of a video image processor for space station applications is described. The architecture was derived from a study of the requirements of algorithms that are necessary to produce the desired functionality of many of these applications. Architectural options were selected based on a simulation of the execution of these algorithms on various architectural organizations. A great deal of emphasis was placed on the ability of the system to evolve and grow over the lifetime of the space station. The result is a hierarchical parallel architecture that is characterized by high level language programmability, modularity, extensibility and can meet the required performance goals.
Network Coding on Heterogeneous Multi-Core Processors for Wireless Sensor Networks

Science.gov (United States)

Kim, Deokho; Park, Karam; Ro, Won W.

2011-01-01

While network coding is well known for its efficiency and usefulness in wireless sensor networks, the excessive costs associated with decoding computation and complexity still hinder its adoption into practical use. On the other hand, high-performance microprocessors with heterogeneous multi-cores would be used as processing nodes of the wireless sensor networks in the near future. To this end, this paper introduces an efficient network coding algorithm developed for heterogeneous multi-core processors. The proposed idea is fully tested on one of the currently available heterogeneous multi-core processors, the Cell Broadband Engine. PMID:22164053
APRON: A Cellular Processor Array Simulation and Hardware Design Tool

Science.gov (United States)

Barr, David R. W.; Dudek, Piotr

2009-12-01

We present a software environment for the efficient simulation of cellular processor arrays (CPAs). This software (APRON) is used to explore algorithms that are designed for massively parallel fine-grained processor arrays, topographic multilayer neural networks, vision chips with SIMD processor arrays, and related architectures. The software uses a highly optimised core combined with a flexible compiler to provide the user with tools for the design of new processor array hardware architectures and the emulation of existing devices. We present performance benchmarks for the software processor array implemented on standard commodity microprocessors. APRON can be configured to use additional processing hardware if necessary and can be used as a complete graphical user interface and development environment for new or existing CPA systems, allowing more users to develop algorithms for CPA systems.
APRON: A Cellular Processor Array Simulation and Hardware Design Tool

Directory of Open Access Journals (Sweden)

David R. W. Barr

2009-01-01

We present a software environment for the efficient simulation of cellular processor arrays (CPAs). This software (APRON) is used to explore algorithms that are designed for massively parallel fine-grained processor arrays, topographic multilayer neural networks, vision chips with SIMD processor arrays, and related architectures. The software uses a highly optimised core combined with a flexible compiler to provide the user with tools for the design of new processor array hardware architectures and the emulation of existing devices. We present performance benchmarks for the software processor array implemented on standard commodity microprocessors. APRON can be configured to use additional processing hardware if necessary and can be used as a complete graphical user interface and development environment for new or existing CPA systems, allowing more users to develop algorithms for CPA systems.
Clock generators for SOC processors circuits and architectures

CERN Document Server

Fahim, Amr

2004-01-01

This book explores the design of fully-integrated frequency synthesizers suitable for system-on-a-chip (SOC) processors. The text takes a more global design perspective in jointly examining the design space at the circuit level as well as at the architectural level. The comprehensive coverage includes summary chapters on circuit theory as well as feedback control theory relevant to the operation of phase locked loops (PLLs). On the circuit level, the discussion includes low-voltage analog design in deep submicron digital CMOS processes, effects of supply noise, substrate noise, as well as device noise. On the architectural level, the discussion includes PLL analysis using continuous-time as well as discrete-time models, linear and nonlinear effects of PLL performance, and detailed analysis of locking behavior. The book provides numerous real world applications, as well as practical rules-of-thumb for modern designers to use at the system, architectural, as well as the circuit level.
Comparison between research data processing capabilities of AMD and NVIDIA architecture-based graphic processors

International Nuclear Information System (INIS)

Dudnik, V.A.; Kudryavtsev, V.I.; Us, S.A.; Shestakov, M.V.

2015-01-01

A comparative analysis has been made of the capabilities of the hardware and software tools of the two most widely used modern graphic processor architectures (AMD and NVIDIA). Special features and differences of the GPU architectures are exemplified by fragments of GPGPU programs. The time needed for program development has been estimated. Some advice is given on the optimum choice of GPU type for speeding up the processing of scientific research results. Recommendations are formulated for the use of software tools that reduce the time of GPGPU application programming for the given types of graphic processors.
High-speed packet filtering utilizing stream processors

Science.gov (United States)

Hummel, Richard J.; Fulp, Errin W.

2009-04-01

Parallel firewalls offer a scalable architecture for the next generation of high-speed networks. While these parallel systems can be implemented using multiple firewalls, the latest generation of stream processors can provide similar benefits with a significantly reduced latency due to locality. This paper describes how the Cell Broadband Engine (CBE), a popular stream processor, can be used as a high-speed packet filter. Results show the CBE can potentially process packets arriving at a rate of 1 Gbps with a latency of less than 82 μs. Performance depends on how well the packet filtering process is translated to the unique stream processor architecture. For example, the method used for transmitting data and control messages among the pseudo-independent processor cores has a significant impact on performance. Experimental results will also show the current limitations of a CBE operating system when used to process packets. Possible solutions to these issues will be discussed.
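
As a rough illustration of the packet filtering task this record refers to, the sketch below applies an ordered rule list to a packet header and returns the action of the first matching rule. The abstract does not describe the rule format or the matching policy, so the `Rule` fields, the first-match policy, and the default action are all assumptions made for illustration; nothing here models the CBE itself.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical filter rule: source prefix, optional destination port, action."""
    src: str                  # source prefix in CIDR notation, e.g. "10.0.0.0/8"
    dst_port: Optional[int]   # None matches any destination port
    action: str               # "accept" or "deny"


def filter_packet(rules, src_ip, dst_port, default="deny"):
    """Return the action of the first rule that matches the packet header."""
    addr = ip_address(src_ip)
    for rule in rules:
        if addr in ip_network(rule.src) and rule.dst_port in (None, dst_port):
            return rule.action
    return default  # assumed policy when no rule matches


# Example policy: block SSH from 10.0.0.0/8, accept everything else from it.
rules = [
    Rule("10.0.0.0/8", 22, "deny"),
    Rule("10.0.0.0/8", None, "accept"),
]
print(filter_packet(rules, "10.1.2.3", 80))
```

Because each packet is checked independently of the others, a rule list like this can in principle be evaluated for many packets at once, which is the property parallel and stream-processor filters rely on.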
Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors

OpenAIRE

Catalán, Sandra; Igual, Francisco D.; Mayo, Rafael; Rodríguez-Sánchez, Rafael; Quintana-Ortí, Enrique S.

2015-01-01

Asymmetric multicore processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. In addition, given the growing interest in low-power high performance computing, this type of architecture is also being investigated as a means to improve the throughput-per-Watt of complex scientific applications. In this paper, we design and embed several architecture-aware ...

Optimizing Vector-Quantization Processor Architecture for Intelligent Query-Search Applications

Science.gov (United States)

Xu, Huaiyu; Mita, Yoshio; Shibata, Tadashi

2002-04-01

The architecture of a very large scale integration (VLSI) vector-quantization processor (VQP) has been optimized to develop a general-purpose intelligent query-search agent. The agent performs a similarity-based search in a large-volume database. Although similarity-based search processing is computationally very expensive, latency-free searches have become possible due to the highly parallel maximum-likelihood search architecture of the VQP chip. Three architectures of the VQP chip have been studied and their performances are compared. In order to give reasonable searching results according to the different policies, the concept of penalty function has been introduced into the VQP. An E-commerce real-estate agency system has been developed using the VQP chip implemented in a field-programmable gate array (FPGA) and the effectiveness of such an agency system has been demonstrated.
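
The similarity-based search with a penalty function mentioned in this record can be sketched as a nearest-codeword search. The abstract does not give the distance measure or the form of the penalty function, so the element-wise absolute difference and the `penalty` callable below are assumptions; the sequential loop only mimics the result of a search that the VQP chip performs in parallel.

```python
def best_match(query, codebook, penalty=lambda d: d):
    """Return the index of the codebook vector with the lowest penalised distance.

    `penalty` maps each element-wise absolute difference to a cost; the
    identity function gives a plain Manhattan-distance search.
    """
    def score(vec):
        return sum(penalty(abs(q - v)) for q, v in zip(query, vec))

    return min(range(len(codebook)), key=lambda i: score(codebook[i]))


codebook = [(0, 0), (5, 5), (10, 0)]
query = (9, 9)

# Plain Manhattan distance prefers the template that is moderately close on every element.
print(best_match(query, codebook))
# A capped penalty tolerates one large mismatch, so a different template wins.
print(best_match(query, codebook, penalty=lambda d: min(d, 3)))
```

Swapping the penalty function changes which template counts as most similar without touching the search loop, which is one way to read "different policies" in the abstract.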
Extending and implementing the Self-adaptive Virtual Processor for distributed memory architectures

NARCIS (Netherlands)

van Tol, M.W.; Koivisto, J.

2011-01-01

Many-core architectures of the future are likely to have distributed memory organizations and need fine grained concurrency management to be used effectively. The Self-adaptive Virtual Processor (SVP) is an abstract concurrent programming model which can provide this, but the model and its current
A COMPARATIVE STUDY OF SYSTEM NETWORK ARCHITECTURE Vs DIGITAL NETWORK ARCHITECTURE

OpenAIRE

Seema; Mukesh Arya

2011-01-01

Efficient management of resources is essential to the successful running of any network. This paper describes two of the most popular network architectures: System Network Architecture (SNA), developed by IBM, and Digital Network Architecture (DNA). Network standards and protocols are needed by network developers as well as by users. Some standards are the IEEE 802.3 standards (The Institute of Electrical and Electronics Engineers 1980) (LAN), IBM Sta...
Deep Space Network information system architecture study

Science.gov (United States)

Beswick, C. A.; Markley, R. W. (Editor); Atkinson, D. J.; Cooper, L. P.; Tausworthe, R. C.; Masline, R. C.; Jenkins, J. S.; Crowe, R. A.; Thomas, J. L.; Stoloff, M. J.

1992-01-01

The purpose of this article is to describe an architecture for the DSN information system in the years 2000-2010 and to provide guidelines for its evolution during the 1990's. The study scope is defined to be from the front-end areas at the antennas to the end users (spacecraft teams, principal investigators, archival storage systems, and non-NASA partners). The architectural vision provides guidance for major DSN implementation efforts during the next decade. A strong motivation for the study is an expected dramatic improvement in information-systems technologies--i.e., computer processing, automation technology (including knowledge-based systems), networking and data transport, software and hardware engineering, and human-interface technology. The proposed Ground Information System has the following major features: unified architecture from the front-end area to the end user; open-systems standards to achieve interoperability; DSN production of level 0 data; delivery of level 0 data from the Deep Space Communications Complex, if desired; dedicated telemetry processors for each receiver; security against unauthorized access and errors; and highly automated monitor and control.
A scalable single-chip multi-processor architecture with on-chip RTOS kernel

NARCIS (Netherlands)

Theelen, B.D.; Verschueren, A.C.; Reyes Suarez, V.V.; Stevens, M.P.J.; Nunez, A.

2003-01-01

Now that system-on-chip technology is emerging, single-chip multi-processors are becoming feasible. A key problem of designing such systems is the complexity of their on-chip interconnects and memory architecture. It is furthermore unclear at what level software should be integrated. An example of a
Re-engineering Nascom's network management architecture

Science.gov (United States)

Drake, Brian C.; Messent, David

1994-01-01

The development of Nascom systems for ground communications began in 1958 with Project Vanguard. The low-speed systems (rates less than 9.6 Kbs) were developed following existing standards, but there were no comparable standards for high-speed systems. As a result, these systems were developed using custom protocols and custom hardware. Technology has made enormous strides since the ground support systems were implemented. Standards for computer equipment, software, and high-speed communications exist and the performance of current workstations exceeds that of the mainframes used in the development of the ground systems. Nascom is in the process of upgrading its ground support systems and providing additional services. The Message Switching System (MSS), Communications Address Processor (CAP), and Multiplexer/Demultiplexer (MDM) Automated Control System (MACS) are all examples of Nascom systems developed using standards such as X-windows, Motif, and Simple Network Management Protocol (SNMP). Also, the Earth Observing System (EOS) Communications (Ecom) project is stressing standards as an integral part of its network. The move towards standards has produced a reduction in development, maintenance, and interoperability costs, while providing operational quality improvement. The Facility and Resource Manager (FARM) project has been established to integrate the Nascom networks and systems into a common network management architecture. The maximization of standards and implementation of computer automation in the architecture will lead to continued cost reductions and increased operational efficiency. The first step has been to derive overall Nascom requirements and identify the functionality common to all the current management systems. The identification of these common functions will enable the reuse of processes in the management architecture and promote increased use of automation throughout the Nascom network. The MSS, CAP, MACS, and Ecom projects have indicated
The Chameleon Architecture for Streaming DSP Applications

Directory of Open Access Journals (Sweden)

André B. J. Kokkeler

2007-02-01

We focus on architectures for streaming DSP applications such as wireless baseband processing and image processing. We aim at a single generic architecture that is capable of dealing with different DSP applications. This architecture has to be energy efficient and fault tolerant. We introduce a heterogeneous tiled architecture and present the details of a domain-specific reconfigurable tile processor called Montium. This reconfigurable processor has a small footprint (1.8 mm² in a 130 nm process), is power efficient and exploits the locality of reference principle. Reconfiguring the device is very fast; for example, loading the coefficients for a 200 tap FIR filter is done within 80 clock cycles. The tiles on the tiled architecture are connected to a Network-on-Chip (NoC) via a network interface (NI). Two NoCs have been developed: a packet-switched and a circuit-switched version. Both provide two types of services: guaranteed throughput (GT) and best effort (BE). For both NoCs, estimates of power consumption are presented. The NI synchronizes data transfers, configures and starts/stops the tile processor. For dynamically mapping applications onto the tiled architecture, we introduce a run-time mapping tool.
The Chameleon Architecture for Streaming DSP Applications

Directory of Open Access Journals (Sweden)

Paul M. Heysters

2007-01-01

We focus on architectures for streaming DSP applications such as wireless baseband processing and image processing. We aim at a single generic architecture that is capable of dealing with different DSP applications. This architecture has to be energy efficient and fault tolerant. We introduce a heterogeneous tiled architecture and present the details of a domain-specific reconfigurable tile processor called Montium. This reconfigurable processor has a small footprint (1.8 mm² in a 130 nm process), is power efficient and exploits the locality of reference principle. Reconfiguring the device is very fast; for example, loading the coefficients for a 200 tap FIR filter is done within 80 clock cycles. The tiles on the tiled architecture are connected to a Network-on-Chip (NoC) via a network interface (NI). Two NoCs have been developed: a packet-switched and a circuit-switched version. Both provide two types of services: guaranteed throughput (GT) and best effort (BE). For both NoCs, estimates of power consumption are presented. The NI synchronizes data transfers, configures and starts/stops the tile processor. For dynamically mapping applications onto the tiled architecture, we introduce a run-time mapping tool.
Green Secure Processors: Towards Power-Efficient Secure Processor Design

Science.gov (United States)

Chhabra, Siddhartha; Solihin, Yan

With the increasing wealth of digital information stored on computer systems today, security issues have become increasingly important. In addition to attacks targeting the software stack of a system, hardware attacks have become equally likely. Researchers have proposed Secure Processor Architectures which utilize hardware mechanisms for memory encryption and integrity verification to protect the confidentiality and integrity of data and computation, even from sophisticated hardware attacks. While there have been many works addressing performance and other system level issues in secure processor design, power issues have largely been ignored. In this paper, we first analyze the sources of power (energy) increase in different secure processor architectures. We then present a power analysis of various secure processor architectures in terms of their increase in power consumption over a base system with no protection and then provide recommendations for designs that offer the best balance between performance and power without compromising security. We extend our study to the embedded domain as well. We also outline the design of a novel hybrid cryptographic engine that can be used to minimize the power consumption for a secure processor. We believe that if secure processors are to be adopted in future systems (general purpose or embedded), it is critically important that power issues are considered in addition to performance and other system level issues. To the best of our knowledge, this is the first work to examine the power implications of providing hardware mechanisms for security.
Very Long Instruction Word Processors

Indian Academy of Sciences (India)

Pentium Processor have modified the processor architecture to exploit parallelism in a program. .... The type of operation itself is encoded using 14 bits. .... text of designing simple architectures with low power consump- tion and execute x86 ...
Multi-mode sensor processing on a dynamically reconfigurable massively parallel processor array

Science.gov (United States)

Chen, Paul; Butts, Mike; Budlong, Brad; Wasson, Paul

2008-04-01

This paper introduces a novel computing architecture that can be reconfigured in real time to adapt on demand to multi-mode sensor platforms' dynamic computational and functional requirements. This 1 teraOPS reconfigurable Massively Parallel Processor Array (MPPA) has 336 32-bit processors. The programmable 32-bit communication fabric provides streamlined inter-processor connections with deterministically high performance. Software programmability, scalability, ease of use, and fast reconfiguration time (ranging from microseconds to milliseconds) are the most significant advantages over FPGAs and DSPs. This paper introduces the MPPA architecture, its programming model, and methods of reconfigurability. An MPPA platform for reconfigurable computing is based on a structural object programming model. Objects are software programs running concurrently on hundreds of 32-bit RISC processors and memories. They exchange data and control through a network of self-synchronizing channels. A common application design pattern on this platform, called a work farm, is a parallel set of worker objects, with one input and one output stream. Statically configured work farms with homogeneous and heterogeneous sets of workers have been used in video compression and decompression, network processing, and graphics applications.
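
The work farm pattern described above (a parallel set of workers with one input stream and one output stream) can be approximated in software as follows. This is only an analogy using a thread pool: the MPPA's objects and self-synchronizing hardware channels are not modelled, and the function name `work_farm` and its parameters are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def work_farm(worker, input_stream, n_workers=4):
    """Distribute items from one input stream over a pool of identical workers
    and merge their results into one output stream, preserving input order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map yields results in the order the inputs were submitted,
        # so the farm behaves like a single ordered output stream.
        yield from pool.map(worker, input_stream)


def square(x):
    """Stand-in for a real per-item task such as compressing one video block."""
    return x * x


print(list(work_farm(square, range(8))))
```

This corresponds to the homogeneous case, where every worker runs the same function; a heterogeneous farm would route items to different worker functions instead.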
A High Performance VLSI Computer Architecture For Computer Graphics

Science.gov (United States)

Chin, Chi-Yuan; Lin, Wen-Tai

1988-10-01

A VLSI computer architecture, consisting of multiple processors, is presented in this paper to satisfy the demands of modern computer graphics, e.g. high resolution, realistic animation, real-time display, etc. All processors share a global memory which is partitioned into multiple banks. Through a crossbar network, data from one memory bank can be broadcast to many processors. Processors are physically interconnected through a hyper-crossbar network (a crossbar-like network). By programming the network, the topology of communication links among processors can be reconfigured to satisfy specific dataflows of different applications. Each processor consists of a controller, arithmetic operators, local memory, a local crossbar network, and I/O ports to communicate with other processors, memory banks, and a system controller. Operations in each processor are characterized into two modes, i.e. object domain and space domain, to fully utilize the data-independency characteristics of graphics processing. Special graphics features such as 3D-to-2D conversion, shadow generation, texturing, and reflection can be easily handled. With the current high density interconnection (MI) technology, it is feasible to implement a 64-processor system to achieve 2.5 billion operations per second, a performance needed in most advanced graphics applications.
Towards a networkArchitecture

DEFF Research Database (Denmark)

Rüdiger, Bjarne; Tournay, Bruno

2001-01-01

Poster, contribution to the DAL competition. Where industry was the inspiration for the development of modern architecture, IT is the technical and aesthetic foundation for the emerging NetworkArchitecture. The computer and networks of computers are thus more than a metaphor for NetworkArchitecture....... NetworkArchitecture consists of intelligent building components connected to one another in a network and interacting with their surroundings....
mAgic-FPU and MADE: A customizable VLIW core and the modular VLIW processor architecture description environment

Science.gov (United States)

Paolucci, Pier S.; Kajfasz, Philippe; Bonnot, Philippe; Candaele, Bernard; Maufroid, Daniel; Pastorelli, Elena; Ricciardi, Andrea; Fusella, Yves; Guarino, Eugenio

2001-09-01

mAgic-FPU is the architecture of a family of VLIW cores for configurable system level integration of floating and fixed point computing power. mAgic customization permits the designer to tune basic parameters, such as the computing power/memory access ratio of the core processor, the number of available arithmetic operations per cycle, the register file size and number of ports, as well as the number of arithmetic operators. The reconfiguration (e.g., of register file size and number of ports, as well as of the number of arithmetic operators) is supported by the software environment MADE (Modular VLIW processor Architecture and Assembler Description Environment). MADE reads an architecture description file and produces a customized assembler-scheduler for the target VLIW architecture, configuring a general purpose VLIW optimizer-scheduler engine. The mAgic-FPU core architecture satisfies the requirement of portability among silicon foundries. The first members of the mAgic FPU core family architecture fit the requirements of 'Smart Antenna for Adaptive Beam-Forming processing' and 'Physical Sound Synthesis'. The first 1 GigaFlops mAgic core will run at 100 MHz within an area of 40 mm² in 0.25 μm ATMEL CMOS technology in the first half of 2002.
Multiple Embedded Processors for Fault-Tolerant Computing

Science.gov (United States)

Bolotin, Gary; Watson, Robert; Katanyoutanant, Sunant; Burke, Gary; Wang, Mandy

2005-01-01

A fault-tolerant computer architecture has been conceived in an effort to reduce vulnerability to single-event upsets (spurious bit flips caused by impingement of energetic ionizing particles or photons). As in some prior fault-tolerant architectures, the redundancy needed for fault tolerance is obtained by use of multiple processors in one computer. Unlike prior architectures, the multiple processors are embedded in a single field-programmable gate array (FPGA). What makes this new approach practical is the recent commercial availability of FPGAs that are capable of having multiple embedded processors. A working prototype (see figure) consists of two embedded IBM PowerPC 405 processor cores and a comparator built on a Xilinx Virtex-II Pro FPGA. This relatively simple instantiation of the architecture implements an error-detection scheme. A planned future version, incorporating four processors and two comparators, would correct some errors in addition to detecting them.
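
The difference between detecting and correcting errors can be sketched in a few lines. The first function models the two-processor prototype: a comparator flags a mismatch but cannot tell which core is wrong. The second is one plausible reading of the planned four-processor, two-comparator version (a pair-and-spare arrangement); the abstract does not say how correction is actually done, so that scheme and all function names are assumptions.

```python
def duplex(out_a, out_b):
    """Comparator for two replicated processors.

    Returns (value, ok). A mismatch is detected, but with only two copies
    there is no way to know which one suffered the upset.
    """
    if out_a == out_b:
        return out_a, True
    return None, False


def pair_and_spare(pair_1, pair_2):
    """Assumed four-processor scheme: use the first pair whose comparator agrees."""
    for out_a, out_b in (pair_1, pair_2):
        value, ok = duplex(out_a, out_b)
        if ok:
            return value
    raise RuntimeError("both pairs disagree: error detected but not correctable")


good = 42
upset = good ^ (1 << 3)   # simulate a single-event upset flipping bit 3

print(duplex(good, upset))                        # mismatch is detected only
print(pair_and_spare((good, upset), (good, good)))  # the healthy pair masks the fault
```

The sketch also shows why the abstract says "some errors": if both pairs disagree, the fault is still detected but cannot be corrected.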
Directions in parallel processor architecture, and GPUs too

CERN Multimedia

CERN. Geneva

2014-01-01

Modern computing is power-limited in every domain of computing. Performance increments extracted from instruction-level parallelism (ILP) are no longer power-efficient; they haven't been for some time. Thread-level parallelism (TLP) is a more easily exploited form of parallelism, at the expense of programmer effort to expose it in the program. In this talk, I will introduce you to disparate topics in parallel processor architecture that will impact programming models (and you) in both the near and far future. About the speaker: Olivier is a senior GPU (SM) architect at NVIDIA and an active participant in the concurrency working group of the ISO C++ committee. He has also worked on very large diesel engines as a mechanical engineer, and taught at McGill University (Canada) as a faculty instructor.
The Molen Polymorphic Media Processor

NARCIS (Netherlands)

Kuzmanov, G.K.

2004-01-01

In this dissertation, we address high performance media processing based on a tightly coupled co-processor architectural paradigm. More specifically, we introduce a reconfigurable media augmentation of a general purpose processor and implement it into a fully operational processor prototype. The
Balanced Bipartite Graph Based Register Allocation for Network Processors in Mobile and Wireless Networks

Directory of Open Access Journals (Sweden)

Feilong Tang

2010-01-01

Mobile and wireless networks are the integral infrastructure of mobile and pervasive computing, which aims at providing transparent and preferred information and services for people anytime, anywhere. In such environments, end-to-end network bandwidth is crucial to improving the user's experience when providing on-demand services such as mobile video playing. As a result, substantial computing power is required for networked nodes, especially for routers. General-purpose processors cannot meet such requirements due to their limited processing ability and poor programmability and scalability. Intel's network processor IXP is specially designed for fast packet processing to achieve a broad bandwidth. IXP provides a large number of registers to reduce the number of memory accesses. Registers in an IXP are physically partitioned into two banks so that the two source operands of an instruction have to come from different banks, which makes IXP register allocation tricky and different from conventional register allocation. In this paper, we investigate an approach for efficiently generating balanced bipartite graphs and register allocation algorithms for the dual-bank register allocation in IXPs. The paper presents a graph uniform 2-way partition algorithm (FPT), which provides an optimal solution to the graph partition, and a heuristic algorithm for generating balanced bipartite graphs. Finally, we design a framework for IXP register allocation. Experimental results demonstrate that the framework and the algorithms are efficient in register allocation for IXP network processors.
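
The dual-bank constraint can be viewed as a graph problem: make each register a node, join the two source operands of every instruction with an edge, and the operands can be placed in different banks exactly when that graph is bipartite. The sketch below is a plain breadth-first 2-colouring with a greedy attempt to keep the banks balanced. It is not the paper's FPT partition algorithm or its heuristic, and the function name, bank labels, and balancing rule are assumptions for illustration.

```python
from collections import deque


def assign_banks(operand_pairs):
    """Assign each register to bank "A" or "B" so that the two source operands
    of every instruction land in different banks.

    Returns a dict register -> bank, or None if the conflict graph contains an
    odd cycle and therefore is not bipartite.
    """
    graph = {}
    for a, b in operand_pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    bank = {}
    sizes = {"A": 0, "B": 0}
    for start in graph:
        if start in bank:
            continue
        # 2-colour one connected component with breadth-first search.
        side = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in side:
                    side[v] = 1 - side[u]
                    queue.append(v)
                elif side[v] == side[u]:
                    return None  # odd cycle: no valid two-bank assignment
        # Greedy balancing: give the larger side to the currently smaller bank.
        groups = sorted(
            ([r for r in side if side[r] == s] for s in (0, 1)),
            key=len,
            reverse=True,
        )
        smaller, larger = sorted(sizes, key=sizes.get)
        for reg in groups[0]:
            bank[reg] = smaller
        for reg in groups[1]:
            bank[reg] = larger
        sizes[smaller] += len(groups[0])
        sizes[larger] += len(groups[1])
    return bank


# Each pair holds the two source operands of one (hypothetical) instruction.
pairs = [("r1", "r2"), ("r2", "r3"), ("r4", "r5")]
print(assign_banks(pairs))
```

When the function returns None, a real allocator would have to break the odd cycle, for example by inserting a register-to-register move; that step is outside this sketch.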
Study on Optimization of I and C Architecture for Research Reactors Using Bayesian Networks

Energy Technology Data Exchange (ETDEWEB)

Rahman, Khaili Ur; Shin, Jinsoo; Heo, Gyunyoung [Kyung Hee Univ., Yongin (Korea, Republic of)

2013-07-01

The optimization in terms of redundancy of modules and components in Instrumentation and Control (I and C) architecture is based on cost and availability, assuming regulatory requirements are satisfied. The motive of this study is to find an optimized I and C architecture, either in hybrid formation, fully digital or analog, with respect to system availability and relative cost of architecture. The cost of research reactor I and C systems tends to affect market competitiveness. As a demonstrative example, the reactor protection system of research reactors is selected. Four cases with different architecture formations were developed with single and double redundancy of bi-stable modules, coincidence processor modules, and safety or protection circuit actuation logic. The architecture configurations are transformed into reliability block diagrams (RBDs) based on the logical operation and function of modules. A Bayesian Network (BN) model is constructed from the RBD to assess availability. A cost estimation is proposed and a reliability cost index, RI, is suggested.
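The effect of single versus double redundancy on availability can be illustrated with the standard series and parallel formulas for a reliability block diagram. This is a rough sketch that assumes independent module failures; it is not the Bayesian Network model of the study, and the module availability values below are hypothetical placeholders, not figures from the paper.

```python
def series(*availabilities):
    """Availability of blocks in series: every block must be available."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities):
    """Availability of redundant blocks: at least one must be available."""
    unavailable = 1.0
    for a in availabilities:
        unavailable *= 1.0 - a
    return 1.0 - unavailable


# Hypothetical per-module availabilities (illustration only).
BISTABLE = 0.99
COINCIDENCE = 0.995
ACTUATION = 0.998

# Single chain: bi-stable -> coincidence processor -> actuation logic.
single = series(BISTABLE, COINCIDENCE, ACTUATION)

# Double redundancy applied to each stage of the same chain.
double = series(
    parallel(BISTABLE, BISTABLE),
    parallel(COINCIDENCE, COINCIDENCE),
    parallel(ACTUATION, ACTUATION),
)
```

Under these assumptions the doubly redundant chain is always at least as available as the single chain; the cost side of the trade-off is not modeled here.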

Dual-core Itanium Processor

CERN Multimedia

2006-01-01

Intel's first dual-core Itanium processor, code-named "Montecito", is a major release of Intel's Itanium 2 Processor Family, which implements the Intel Itanium architecture on a dual-core processor with two cores per die (integrated circuit). Itanium 2 is much more powerful than its predecessor. It has lower power consumption and thermal dissipation.
Heterogeneous network architectures

DEFF Research Database (Denmark)

Christiansen, Henrik Lehrmann

2006-01-01

Future networks will be heterogeneous! Due to the sheer size of networks (e.g., the Internet) upgrades cannot be instantaneous and thus heterogeneity appears. This means that instead of trying to find the solution, networks should be designed as being heterogeneous. One of the key requirements here is flexibility. This thesis investigates such heterogeneous network architectures and how to make them flexible. A survey of algorithms for network design is presented, and it is described how using heuristics can increase the speed. A hierarchical, MPLS based network architecture is described and it is discussed that it is advantageous to heterogeneous networks and illustrated by a number of examples. Modeling and simulation is a well-known way of doing performance evaluation. An approach to event-driven simulation of communication networks is presented and mixed complexity modeling, which can simplify...
A performance analysis of advanced I/O architectures for PC-based network file servers

Science.gov (United States)

Huynh, K. D.; Khoshgoftaar, T. M.

1994-12-01

In the personal computing and workstation environments, more and more I/O adapters are becoming complete functional subsystems that are intelligent enough to handle I/O operations on their own without much intervention from the host processor. The IBM Subsystem Control Block (SCB) architecture has been defined to enhance the potential of these intelligent adapters by defining services and conventions that deliver command information and data to and from the adapters. In recent years, a new storage architecture, the Redundant Array of Independent Disks (RAID), has been quickly gaining acceptance in the world of computing. In this paper, we would like to discuss critical system design issues that are important to the performance of a network file server. We then present a performance analysis of the SCB architecture and disk array technology in typical network file server environments based on personal computers (PCs). One of the key issues investigated in this paper is whether a disk array can outperform a group of disks (of same type, same data capacity, and same cost) operating independently, not in parallel as in a disk array.
Virtualized cognitive network architecture for 5G cellular networks

KAUST Repository

Elsawy, Hesham

2015-07-17

Cellular networks have preserved an application agnostic and base station (BS) centric architecture for decades. Network functionalities (e.g. user association) are decided and performed regardless of the underlying application (e.g. automation, tactile Internet, online gaming, multimedia). Such an ossified architecture imposes several hurdles against achieving the ambitious metrics of next generation cellular systems. This article first highlights the features and drawbacks of such architectural ossification. Then the article proposes a virtualized and cognitive network architecture, wherein network functionalities are implemented via software instances in the cloud, and the underlying architecture can adapt to the application of interest as well as to changes in channels and traffic conditions. The adaptation is done in terms of the network topology by manipulating connectivities and steering traffic via different paths, so as to attain the applications' requirements and network design objectives. The article presents cognitive strategies to implement some of the classical network functionalities, along with their related implementation challenges. The article further presents a case study illustrating the performance improvement of the proposed architecture as compared to conventional cellular networks, both in terms of outage probability and handover rate.
An Energy-Efficient and Scalable Deep Learning/Inference Processor With Tetra-Parallel MIMD Architecture for Big Data Applications.

Science.gov (United States)

Park, Seong-Wook; Park, Junyoung; Bong, Kyeongryeol; Shin, Dongjoo; Lee, Jinmook; Choi, Sungpill; Yoo, Hoi-Jun

2015-12-01

Deep learning algorithms are widely used for various pattern recognition applications such as text recognition, object recognition and action recognition because of their best-in-class recognition accuracy compared to hand-crafted and shallow-learning-based algorithms. The long learning time caused by their complex structure, however, has so far limited their usage to high-cost servers or many-core GPU platforms. On the other hand, the demand for customized pattern recognition within personal devices will grow gradually as more deep learning applications are developed. This paper presents an SoC implementation that enables deep learning applications to run on low-cost platforms such as mobile or portable devices. Unlike conventional works, which have adopted massively parallel architectures, this work adopts a task-flexible architecture and exploits multiple forms of parallelism to cover the complex functions of the convolutional deep belief network, which is one of the popular deep learning/inference algorithms. In this paper, we implement the most energy-efficient deep learning and inference processor for wearable systems. The implemented 2.5 mm × 4.0 mm deep learning/inference processor is fabricated using 65 nm 8-metal CMOS technology for a battery-powered platform with real-time deep inference and deep learning operation. It consumes 185 mW average power and 213.1 mW peak power at a 200 MHz operating frequency and 1.2 V supply voltage. It achieves 411.3 GOPS peak performance and 1.93 TOPS/W energy efficiency, which is 2.07× higher than the state-of-the-art.
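The reported energy efficiency can be cross-checked from the other figures in the abstract. The short calculation below assumes that the efficiency is peak performance divided by peak power; the abstract does not state which power figure was used, so this pairing is an inference, and the helper name `tops_per_watt` is hypothetical.

```python
def tops_per_watt(gops, milliwatts):
    """Convert throughput in GOPS and power in mW to TOPS/W."""
    tera_ops_per_second = gops / 1000.0
    watts = milliwatts / 1000.0
    return tera_ops_per_second / watts


# 411.3 GOPS peak performance at 213.1 mW peak power (values from the abstract).
peak_efficiency = tops_per_watt(411.3, 213.1)
print(round(peak_efficiency, 2))  # 1.93
```

Using the 185 mW average power instead would give a higher value, which suggests that the published 1.93 TOPS/W corresponds to the peak-power case.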
The architectural design of networks of protein domain architectures.

Science.gov (United States)

Hsu, Chia-Hsin; Chen, Chien-Kuo; Hwang, Ming-Jing

2013-08-23

Protein domain architectures (PDAs), in which single domains are linked to form multiple-domain proteins, are a major molecular form used by evolution for the diversification of protein functions. However, the design principles of PDAs remain largely uninvestigated. In this study, we constructed networks to connect domain architectures that had grown out from the same single domain for every single domain in the Pfam-A database and found that there are three main distinctive types of these networks, which suggests that evolution can exploit PDAs in three different ways. Further analysis showed that these three different types of PDA networks are each adopted by different types of protein domains, although many networks exhibit the characteristics of more than one of the three types. Our results shed light on nature's blueprint for protein architecture and provide a framework for understanding architectural design from a network perspective.
Making CSB+-Trees Processor Conscious

DEFF Research Database (Denmark)

Samuel, Michael; Pedersen, Anders Uhl; Bonnet, Philippe

2005-01-01

Cache-conscious indexes, such as CSB+-tree, are sensitive to the underlying processor architecture. In this paper, we focus on how to adapt the CSB+-tree so that it performs well on a range of different processor architectures. Previous work has focused on the impact of node size on the performance of the CSB+-tree. We argue that it is necessary to consider a larger group of parameters in order to adapt CSB+-tree to processor architectures as different as Pentium and Itanium. We identify this group of parameters and study how it impacts the performance of CSB+-tree on Itanium 2. Finally, we propose a systematic method for adapting CSB+-tree to new platforms. This work is a first step towards integrating CSB+-tree in MySQL's heap storage manager....
Information network architectures

Science.gov (United States)

Murray, N. D.

1985-01-01

Graphs, charts, diagrams and outlines of information relative to information network architectures for advanced aerospace missions, such as the Space Station, are presented. Local area information networks are considered a likely technology solution. The principal needs for the network are listed.
FTS2000 network architecture

Science.gov (United States)

Klenart, John

1991-01-01

The network architecture of FTS2000 is graphically depicted. A map of network A topology is provided, with interservice nodes. Next, the four basic elements of the architecture are laid out. Then, the FTS2000 time line is reproduced. A list of equipment supporting FTS2000 dedicated transmissions is given. Finally, access alternatives are shown.
UMA/GAN network architecture analysis

Science.gov (United States)

Yang, Liang; Li, Wensheng; Deng, Chunjian; Lv, Yi

2009-07-01

This paper critically analyzes the architecture of UMA, which is one of the Fixed Mobile Convergence (FMC) solutions and is also included by the Third Generation Partnership Project (3GPP). In the UMA/GAN network architecture, the UMA Network Controller (UNC) is the key equipment, connecting the cellular core network and the mobile station (MS). A UMA network can be easily integrated into existing cellular networks without influencing the mobile core network, and can provide high-quality mobile services with preferentially priced indoor voice and data usage. This helps to improve the subscriber's experience. In addition, the UMA/GAN architecture helps to integrate other radio techniques into the cellular network, including WiFi, Bluetooth, and WiMax. This offers traditional mobile operators an opportunity to integrate WiMax into the cellular network. At the end of this article, we also analyze the potential influence of the UMA network on cellular core networks.
NEBULAS A High Performance Data-Driven Event-Building Architecture based on an Asynchronous Self-Routing Packet-Switching Network

CERN Multimedia

Costa, M; Letheren, M; Djidi, K; Gustafsson, L; Lazraq, T; Minerskjold, M; Tenhunen, H; Manabe, A; Nomachi, M; Watase, Y

2002-01-01

RD31: The project is evaluating a new approach to event building for level-two and level-three processor farms at high rate experiments. It is based on the use of commercial switching fabrics to replace the traditional bus-based architectures used in most previous data acquisition systems. Switching fabrics permit the construction of parallel, expandable, hardware-driven event builders that can deliver higher aggregate throughput than the bus-based architectures. A standard industrial switching fabric technology is being evaluated. It is based on Asynchronous Transfer Mode (ATM) packet-switching network technology. Commercial, expandable ATM switching fabrics and processor interfaces, now being developed for the future Broadband ISDN infrastructure, could form the basis of an implementation. The goals of the project are to demonstrate the viability of this approach, to evaluate the trade-offs involved in make versus buy options, to study the interfacing of the physics frontend data buffers to such a fabric, a...
Energy Model of Networks-on-Chip and a Bus

NARCIS (Netherlands)

Wolkotte, P.T.; Smit, Gerardus Johannes Maria; Kavaldjiev, N.K.; Becker, Jens E.; Becker, Jürgen; Nurmi, J.; Takala, J.; Hamalainen, T.D.

2005-01-01

A Network-on-Chip (NoC) is an energy-efficient on-chip communication architecture for Multi-Processor System-on-Chip (MPSoC) architectures. In earlier papers we proposed two Network-on-Chip architectures based on packet-switching and circuit-switching. In this paper we derive an energy model for both
Architecture and VHDL behavioural validation of a parallel processor dedicated to computer vision

International Nuclear Information System (INIS)

Collette, Thierry

1992-01-01

Speeding up image processing is mainly obtained using parallel computers; SIMD processors (single instruction stream, multiple data stream) have been developed, and have proven highly efficient for low-level image processing operations. Nevertheless, their performance drops for most intermediate or high-level operations, mainly when random data reorganisations in processor memories are involved. The aim of this thesis was to extend the SIMD computer capabilities to allow it to perform more efficiently at the intermediate level of image processing. The study of some representative algorithms of this class points out the limits of this computer. Nevertheless, these limits can be removed by architectural modifications. This leads us to propose SYMPATIX, a new SIMD parallel computer. To validate its new concept, a behavioural model written in VHDL, a hardware description language, has been elaborated. With this model, the performance of the new computer has been estimated by running image processing algorithm simulations. The VHDL modeling approach allows a top-down electronic design of the system, giving an easy coupling between system architectural modifications and their electronic cost. The obtained results show SYMPATIX to be an efficient computer for low and intermediate level image processing. It can be connected to a high-level computer, opening up the development of new computer vision applications. This thesis also presents a top-down design method, based on VHDL, intended for electronic system architects. (author)
Distributed Prognostics and Health Management with a Wireless Network Architecture

Science.gov (United States)

Goebel, Kai; Saha, Sankalita; Sha, Bhaskar

2013-01-01

A heterogeneous set of system components monitored by a varied suite of sensors and a particle-filtering (PF) framework, with the power and the flexibility to adapt to the different diagnostic and prognostic needs, has been developed. Both the diagnostic and prognostic tasks are formulated as a particle-filtering problem in order to explicitly represent and manage uncertainties in state estimation and remaining life estimation. Current state-of-the-art prognostic health management (PHM) systems are mostly centralized in nature, where all the processing is reliant on a single processor. This can lead to a loss in functionality in case of a crash of the central processor or monitor. Furthermore, with increases in the volume of sensor data as well as the complexity of algorithms, traditional centralized systems become for a number of reasons somewhat ungainly for successful deployment, and efficient distributed architectures can be more beneficial. The distributed health management architecture is comprised of a network of smart sensor devices. These devices monitor the health of various subsystems or modules. They perform diagnostics operations and trigger prognostics operations based on user-defined thresholds and rules. The sensor devices, called computing elements (CEs), consist of a sensor, or set of sensors, and a communication device (i.e., a wireless transceiver beside an embedded processing element). The CE runs in either a diagnostic or prognostic operating mode. The diagnostic mode is the default mode where a CE monitors a given subsystem or component through a low-weight diagnostic algorithm. If a CE detects a critical condition during monitoring, it raises a flag. Depending on availability of resources, a networked local cluster of CEs is formed that then carries out prognostics and fault mitigation by efficient distribution of the tasks. It should be noted that the CEs are expected not to suspend their previous tasks in the prognostic mode. When the
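The mode switching of the computing elements (CEs) described in this abstract can be sketched as a small state model: each CE starts in the diagnostic mode, raises a flag when a reading crosses a threshold, and a local cluster switches to the prognostic mode when resources allow. The class, field, and function names below, the single numeric threshold, and the minimum cluster size are all hypothetical simplifications for illustration; the particle-filtering algorithms themselves are not shown.

```python
from dataclasses import dataclass


@dataclass
class ComputingElement:
    name: str
    threshold: float            # hypothetical user-defined alarm threshold
    mode: str = "diagnostic"    # the abstract names diagnostic as the default mode
    flagged: bool = False

    def monitor(self, reading: float) -> bool:
        """Lightweight diagnostic check: raise a flag on a critical reading."""
        if reading > self.threshold:
            self.flagged = True
        return self.flagged


def form_cluster(elements, min_size=2):
    """Switch available CEs to prognostic mode once any CE has raised a flag.

    Returns the list of CEs in the cluster, or an empty list when no flag
    is raised or too few CEs are available (min_size is an assumption).
    """
    if not any(ce.flagged for ce in elements):
        return []
    if len(elements) < min_size:
        return []
    for ce in elements:
        ce.mode = "prognostic"
    return list(elements)
```

A usage example would create a few `ComputingElement` instances, feed sensor readings to `monitor`, and call `form_cluster` after each round of readings.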
Design and simulation of parallel and distributed architectures for images processing

International Nuclear Information System (INIS)

Pirson, Alain

1990-01-01

The exploitation of visual information requires special computers. The diversity of operations and the computing power involved bring about structures founded on the concepts of concurrency and distributed processing. This work identifies a vision computer with an association of dedicated intelligent entities, exchanging messages according to the model of parallelism introduced by the language Occam. It puts forward an architecture of the 'enriched processor network' type. It consists of a classical multiprocessor structure where each node is provided with specific devices. These devices perform processing tasks as well as inter-node dialogues. Such an architecture benefits from the homogeneity of multiprocessor networks and the power of dedicated resources. Its implementation corresponds to that of a distributed structure, tasks being allocated to each computing element. This approach culminates in an original architecture called ATILA. This modular structure is based on a transputer network supplied with vision-dedicated co-processors and powerful communication devices. (author)
Preliminary design of an advanced programmable digital filter network for large passive acoustic ASW systems. [Parallel processor

Energy Technology Data Exchange (ETDEWEB)

McWilliams, T.; Widdoes, Jr., L. C.; Wood, L.

1976-09-30

The design of an extremely high performance programmable digital filter of novel architecture, the LLL Programmable Digital Filter, is described. The digital filter is a high-performance multiprocessor having general purpose applicability and high programmability; it is extremely cost effective either in a uniprocessor or a multiprocessor configuration. The architecture and instruction set of the individual processor was optimized with regard to the multiple processor configuration. The optimal structure of a parallel processing system was determined for addressing the specific Navy application centering on the advanced digital filtering of passive acoustic ASW data of the type obtained from the SOSUS net. 148 figures. (RWR)
Efficient Sorting on the Tilera Manycore Architecture

Energy Technology Data Exchange (ETDEWEB)

Morari, Alessandro; Tumeo, Antonino; Villa, Oreste; Secchi, Simone; Valero, Mateo

2012-10-24

We present an efficient implementation of the radix sort algorithm for the Tilera TILEPro64 processor. The TILEPro64 is one of the first successful commercial manycore processors. It is composed of 64 tiles interconnected through multiple fast Networks-on-chip and features a fully coherent, shared distributed cache. The architecture has a large degree of flexibility, and allows various optimization strategies. We describe how we mapped the algorithm to this architecture. We present an in-depth analysis of the optimizations for each phase of the algorithm with respect to the processor's sustained performance. We discuss the overall throughput reached by our radix sort implementation (up to 132 MK/s) and show that it provides comparable or better performance-per-watt with respect to state-of-the-art implementations on x86 processors and graphic processing units.
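For readers unfamiliar with the algorithm, the phases of a least-significant-digit radix sort can be sketched sequentially as follows. This is a plain single-threaded illustration, not the authors' TILEPro64 mapping; the 8-bit digit width and 32-bit key size are assumptions chosen for the example.

```python
def radix_sort(keys, bits_per_pass=8, key_bits=32):
    """LSD radix sort for non-negative integers below 2**key_bits."""
    buckets = 1 << bits_per_pass
    mask = buckets - 1
    for shift in range(0, key_bits, bits_per_pass):
        # Phase 1: histogram of the current digit.
        counts = [0] * buckets
        for key in keys:
            counts[(key >> shift) & mask] += 1

        # Phase 2: exclusive prefix sum gives each bucket's start offset.
        offsets = [0] * buckets
        total = 0
        for digit in range(buckets):
            offsets[digit] = total
            total += counts[digit]

        # Phase 3: stable scatter of the keys into their buckets.
        output = [0] * len(keys)
        for key in keys:
            digit = (key >> shift) & mask
            output[offsets[digit]] = key
            offsets[digit] += 1
        keys = output
    return keys
```

In a parallel version, each core would typically build a local histogram over its share of the keys before a global prefix sum; how the paper distributes these phases across tiles is not described in the abstract.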
SAPIENS: Spreading Activation Processor for Information Encoded in Network Structures. Technical Report No. 296.

Science.gov (United States)

Ortony, Andrew; Radin, Dean I.

The product of researchers' efforts to develop a computer processor which distinguishes between relevant and irrelevant information in the database, Spreading Activation Processor for Information Encoded in Network Structures (SAPIENS) exhibits (1) context sensitivity, (2) efficiency, (3) decreasing activation over time, (4) summation of…
Modular architectures for quantum networks

Science.gov (United States)

Pirker, A.; Wallnöfer, J.; Dür, W.

2018-05-01

We consider the problem of generating multipartite entangled states in a quantum network upon request. We follow a top-down approach, where the required entanglement is initially present in the network in form of network states shared between network devices, and then manipulated in such a way that the desired target state is generated. This minimizes generation times, and allows for network structures that are in principle independent of physical links. We present a modular and flexible architecture, where a multi-layer network consists of devices of varying complexity, including quantum network routers, switches and clients, that share certain resource states. We concentrate on the generation of graph states among clients, which are resources for numerous distributed quantum tasks. We assume minimal functionality for clients, i.e. they do not participate in the complex and distributed generation process of the target state. We present architectures based on shared multipartite entangled Greenberger–Horne–Zeilinger states of different size, and fully connected decorated graph states, respectively. We compare the features of these architectures to an approach that is based on bipartite entanglement, and identify advantages of the multipartite approach in terms of memory requirements and complexity of state manipulation. The architectures can handle parallel requests, and are designed in such a way that the network state can be dynamically extended if new clients or devices join the network. For generation or dynamical extension of the network states, we propose a quantum network configuration protocol, where entanglement purification is used to establish high fidelity states. The latter also allows one to show that the entanglement generated among clients is private, i.e. the network is secure.
Median and Morphological Specialized Processors for a Real-Time Image Data Processing

Directory of Open Access Journals (Sweden)

Kazimierz Wiatr

2002-01-01

This paper presents considerations on selecting a multiprocessor MISD architecture for fast implementation of vision image processing. Drawing on the author's earlier experience with real-time systems, the implementation of specialized hardware processors based on programmable FPGA systems in a pipeline architecture has been proposed. In particular, the following processors are presented: a median filter and a morphological processor. The structure of a universal reconfigurable processor has been proposed as well. Experimental results are presented as delays of LCA-level implementations of the median filter, morphological processor, convolution processor, look-up-table processor, logic processor and histogram processor. These times are compared with delays in a general-purpose processor and a DSP processor.
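The operation performed by a median filter processor can be illustrated in software. The sketch below applies a 3×3 median to a grayscale image stored as a list of rows; the window size and the choice to copy border pixels unchanged are assumptions made for this example, and the code says nothing about the FPGA pipeline structure of the paper.

```python
def median_filter_3x3(image):
    """Return a copy of image with each interior pixel replaced by the
    median of its 3x3 neighborhood. Border pixels are copied unchanged."""
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [
                image[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            window.sort()
            result[y][x] = window[4]  # middle of the 9 sorted values
    return result
```

A single bright outlier surrounded by uniform pixels is removed by this filter, which is the typical use case for impulse-noise suppression.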

Parallel computation for distributed parameter system-from vector processors to Adena computer

Energy Technology Data Exchange (ETDEWEB)

Nogi, T

1983-04-01

Research on advanced parallel hardware and software architectures for very high-speed computation deserves and needs more support and attention to fulfil its promise. Novel architectures for parallel processing are being made ready. Architectures for parallel processing can be roughly divided into two groups. One is a vector processor in which a single central processing unit involves multiple vector-arithmetic registers. The other is a processor array in which slave processors are connected to a host processor to perform parallel computation. In this review, the concept and data structure of the Adena (alternating-direction edition nexus array) architecture, which is conformable to distributed-parameter simulation algorithms, are described. 5 references.
Self-Organizing Maps on the Cell Broadband Engine Architecture

International Nuclear Information System (INIS)

McConnell, Sabine M

2010-01-01

We present and evaluate novel parallel implementations of Self-Organizing Maps for the Cell Broadband Engine Architecture. Motivated by the interactive nature of the data-mining process, we evaluate the scalability of the implementations on two clusters using different network characteristics and incarnations (PS3™ console and PowerXCell 8i) of the architecture. Our implementations use varying combinations of the Power Processing Elements (PPEs) and Synergistic Processing Elements (SPEs) found in the Cell architecture. For a single processor, our implementation scaled well with the number of SPEs regardless of the incarnation. When combining multiple PS3™ consoles, the synchronization over the slower network resulted in poor speedups and demonstrated that the use of such a low-cost cluster may be severely restricted, even without the use of SPEs. When using multiple SPEs for the PowerXCell 8i cluster, the speedup grew linearly with increasing number of SPEs for a given number of processors, and linear up to a maximum with the number of processors for a given number of SPEs. Our implementation achieved a worst-case efficiency of 67% for the maximum number of processing elements involved in the computation, but consistently higher values for smaller numbers of processing elements with speedups of up to 70.
Hardware trigger processor for the MDT system

CERN Document Server

AUTHOR|(SzGeCERN)757787; The ATLAS collaboration; Hazen, Eric; Butler, John; Black, Kevin; Gastler, Daniel Edward; Ntekas, Konstantinos; Taffard, Anyes; Martinez Outschoorn, Verena; Ishino, Masaya; Okumura, Yasuyuki

2017-01-01

We are developing a low-latency hardware trigger processor for the Monitored Drift Tube system in the Muon spectrometer. The processor will fit candidate Muon tracks in the drift tubes in real time, improving significantly the momentum resolution provided by the dedicated trigger chambers. We present a novel pure-FPGA implementation of a Legendre transform segment finder, an associative-memory alternative implementation, an ARM (Zynq) processor-based track fitter, and compact ATCA carrier board architecture. The ATCA architecture is designed to allow a modular, staged approach to deployment of the system and exploration of alternative technologies.
Design Principles for Synthesizable Processor Cores

DEFF Research Database (Denmark)

Schleuniger, Pascal; McKee, Sally A.; Karlsson, Sven

2012-01-01

As FPGAs get more competitive, synthesizable processor cores become an attractive choice for embedded computing. Currently popular commercial processor cores do not fully exploit current FPGA architectures. In this paper, we propose general design principles to increase instruction throughput...
Fast Optimal Replica Placement with Exhaustive Search Using Dynamically Reconfigurable Processor

Directory of Open Access Journals (Sweden)

Hidetoshi Takeshita

2011-01-01

This paper proposes a new replica placement algorithm that expands the exhaustive search limit with reasonable calculation time. It combines a new type of parallel data-flow processor with an architecture tuned for fast calculation. The replica placement problem is to find a replica-server set satisfying service constraints in a content delivery network (CDN). It is derived from the set cover problem, which is known to be NP-hard. It is impractical to use exhaustive search to obtain optimal replica placement in large-scale networks, because calculation time increases with the number of combinations. To reduce calculation time, heuristic algorithms have been proposed, but it is known that no heuristic algorithm is assured of finding the optimal solution. The proposed algorithm suits parallel processing and pipeline execution and is implemented on DAPDNA-2, a dynamically reconfigurable processor. Experiments show that the proposed algorithm expands the exhaustive search limit by a factor of 18.8 compared to the conventional algorithm search limit running on a Neumann-type processor.
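The exhaustive search that this abstract refers to can be illustrated as a brute-force set cover: try every combination of candidate servers, smallest first, and stop at the first combination that serves all clients. This is a naive software sketch showing why the number of combinations grows quickly; it is not the proposed algorithm or its DAPDNA-2 implementation, and the function name and input format are hypothetical.

```python
from itertools import combinations


def optimal_replica_set(coverage, clients):
    """Return a smallest tuple of servers whose coverage includes all clients.

    coverage: {server: set of clients that server can serve within the
    service constraints}. Returns None if no combination covers everyone.
    """
    servers = sorted(coverage)
    needed = set(clients)
    for size in range(1, len(servers) + 1):
        # Checking every subset of this size is what makes the search
        # exponential in the number of candidate servers.
        for candidate in combinations(servers, size):
            served = set()
            for server in candidate:
                served |= coverage[server]
            if needed <= served:
                return candidate
    return None
```

Because sizes are tried in increasing order, the first covering combination found is optimal in the number of replicas, at the cost of examining up to 2^n subsets for n candidate servers.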
Acceleration of spiking neural network based pattern recognition on NVIDIA graphics processors.

Science.gov (United States)

Han, Bing; Taha, Tarek M

2010-04-01

There is currently a strong push in the research community to develop biological scale implementations of neuron based vision models. Systems at this scale are computationally demanding and generally utilize more accurate neuron models, such as the Izhikevich and the Hodgkin-Huxley models, in preference to the more popular integrate-and-fire model. We examine the feasibility of using graphics processing units (GPUs) to accelerate a spiking neural network based character recognition network to enable such large scale systems. Two versions of the network utilizing the Izhikevich and Hodgkin-Huxley models are implemented. Three NVIDIA general-purpose (GP) GPU platforms are examined, including the GeForce 9800 GX2, the Tesla C1060, and the Tesla S1070. Our results show that the GPGPUs can provide significant speedup over conventional processors. In particular, the fastest GPGPU utilized, the Tesla S1070, provided a speedup of 5.6 and 84.4 over highly optimized implementations on the fastest central processing unit (CPU) tested, a quad-core 2.67 GHz Xeon processor, for the Izhikevich and the Hodgkin-Huxley models, respectively. The CPU implementation utilized all four cores and the vector data parallelism offered by the processor. The results indicate that GPUs are well suited for this application domain.
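To give a sense of the per-neuron work involved, the Izhikevich model mentioned in this abstract can be simulated for a single neuron in a few lines. The sketch below uses forward Euler integration and the commonly cited regular-spiking parameter values (a = 0.02, b = 0.2, c = -65, d = 8); the step size, input current, and function name are assumptions for illustration, and this is not the GPU implementation evaluated in the paper.

```python
def simulate_izhikevich(current, steps, dt=0.5, a=0.02, b=0.2, c=-65.0, d=8.0):
    """Simulate one Izhikevich neuron with a constant input current.

    Returns the list of step indices at which the neuron spiked.
    dt is the Euler step in milliseconds (an assumption of this sketch).
    """
    v = c          # membrane potential (mV)
    u = b * v      # recovery variable
    spike_steps = []
    for step in range(steps):
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
        u += dt * (a * (b * v - u))
        if v >= 30.0:          # spike threshold: record, then reset
            spike_steps.append(step)
            v = c
            u += d
    return spike_steps
```

Every neuron is updated independently within a time step, which is the kind of data parallelism that maps naturally onto a GPU thread per neuron.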
Novel memory architecture for video signal processor

Science.gov (United States)

Hung, Jen-Sheng; Lin, Chia-Hsing; Jen, Chein-Wei

1993-11-01

An on-chip memory architecture for a video signal processor (VSP) is proposed. This memory structure is a two-level design for the different data locality in video applications. The upper level, Memory A, provides enough storage capacity to reduce the impact of the limited chip I/O bandwidth, and the lower level, Memory B, provides enough data parallelism and flexibility to meet the requirements of multiple reconfigurable pipeline function units in a single VSP chip. The needed memory size is decided by memory usage analysis for video algorithms and by the number of function units. Both levels of memory adopt a dual-port memory scheme to sustain simultaneous read and write operations. In particular, Memory B uses multiple one-read-one-write memory banks to emulate a real multiport memory. Therefore, one can change the configuration of Memory B to several sets of memories with variable read/write ports by adjusting the bus switches. The numbers of read ports and write ports in the proposed memory can then meet the requirements of the data flow patterns in different video coding algorithms. We have finished the design of a prototype memory using 1.2-micrometer SPDM SRAM technology and will fabricate it through TSMC, in Taiwan.
An Architectural Modelfor Intelligent Network Management

Institute of Scientific and Technical Information of China (English)

罗军舟; 顾冠群; 费翔

2000-01-01

The traditional network management approach involves managing each vendor's equipment and network segment in isolation through its own proprietary element management system. It is necessary to set up a new network management architecture that calls for operation consolidation across vendor and technology boundaries. In this paper, an architectural model for Intelligent Network Management (INM) is presented. The INM system includes a manager system, which controls all subsystems and coordinates different management tasks; an expert system, which is responsible for handling particularly difficult problems; and intelligent agents, which bring the management closer to applications and user requirements by spreading through network segments or domains. Within the proposed expert system model, an intelligent fault management system in particular is described. The architectural model is intended for building an INM system that meets the needs of managing modern network systems.
Home networking architecture for IPv6

OpenAIRE

Arkko, Jari; Weil, Jason; Troan, Ole; Brandt, Anders

2012-01-01

This text describes evolving networking technology within increasingly large residential home networks. The goal of this document is to define an architecture for IPv6-based home networking while describing the associated principles, considerations and requirements. The text briefly highlights the specific implications of the introduction of IPv6 for home networking, discusses the elements of the architecture, and suggests how standard IPv6 mechanisms and addressing can be employed in home ne...
Mobile networks architecture

CERN Document Server

Perez, Andre

2013-01-01

This book explains the evolutions of architecture for mobiles and summarizes the different technologies:
- 2G: the GSM (Global System for Mobile) network, the GPRS (General Packet Radio Service) network and the EDGE (Enhanced Data for Global Evolution) evolution;
- 3G: the UMTS (Universal Mobile Telecommunications System) network and the HSPA (High Speed Packet Access) evolutions:
  - HSDPA (High Speed Downlink Packet Access),
  - HSUPA (High Speed Uplink Packet Access),
  - HSPA+;
- 4G: the EPS (Evolved Packet System) network.
The telephone service and data transmission are the
Data center networks and network architecture

Science.gov (United States)

Esaki, Hiroshi

2014-02-01

This paper discusses and proposes an architectural framework for data center networks. Data center networks pose new technical challenges, and they offer a good opportunity to change the functions that are not needed in current and future networks. Based on observation and consideration of data center networks, this paper proposes: (i) a broadcast-free layer 2 network (i.e., emulation of broadcast at the end-node), (ii) full-mesh point-to-point pipes, and (iii) IRIDES (Invitation Routing aDvertisement for path Engineering System).
Quantifying loopy network architectures.

Directory of Open Access Journals (Sweden)

Eleni Katifori

Biology presents many examples of planar distribution and structural networks having dense sets of closed loops. An archetype of this form of network organization is the vasculature of dicotyledonous leaves, which showcases a hierarchically-nested architecture containing closed loops at many different levels. Although a number of approaches have been proposed to measure aspects of the structure of such networks, a robust metric to quantify their hierarchical organization is still lacking. We present an algorithmic framework, the hierarchical loop decomposition, that allows mapping loopy networks to binary trees, preserving in the connectivity of the trees the architecture of the original graph. We apply this framework to investigate computer generated graphs, such as artificial models and optimal distribution networks, as well as natural graphs extracted from digitized images of dicotyledonous leaves and vasculature of rat cerebral neocortex. We calculate various metrics based on the asymmetry, the cumulative size distribution and the Strahler bifurcation ratios of the corresponding trees and discuss the relationship of these quantities to the architectural organization of the original graphs. This algorithmic framework decouples the geometric information (exact location of edges and nodes) from the metric topology (connectivity and edge weight) and it ultimately allows us to perform a quantitative statistical comparison between predictions of theoretical models and naturally occurring loopy graphs.
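One of the tree metrics mentioned in this abstract builds on Strahler ordering. As a small illustration, the sketch below computes the Horton-Strahler order of a binary tree. It does not implement the hierarchical loop decomposition itself; it only shows a metric that could be evaluated on the binary tree such a decomposition produces. The nested-tuple tree representation and the function name are assumptions made for this example.

```python
def strahler(tree):
    """Return the Horton-Strahler order of a binary tree.

    A tree is either None (a leaf) or a pair (left, right) of subtrees.
    This representation is hypothetical, chosen for brevity.
    """
    if tree is None:
        return 1
    left = strahler(tree[0])
    right = strahler(tree[1])
    # Two children of equal order merge into the next order;
    # otherwise the parent inherits the larger order.
    return left + 1 if left == right else max(left, right)


if __name__ == "__main__":
    balanced = ((None, None), (None, None))
    lopsided = (((None, None), None), None)
    print(strahler(balanced), strahler(lopsided))
```

A perfectly balanced tree gains one order per level, while a maximally lopsided tree stays at order 2 however deep it is, which is why Strahler-based quantities are sensitive to how symmetric the nesting is.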
Reconfigurable signal processor designs for advanced digital array radar systems

Science.gov (United States)

Suarez, Hernan; Zhang, Yan (Rockee); Yu, Xining

2017-05-01

The new challenges originating from Digital Array Radar (DAR) demand a new generation of reconfigurable backend processors in the system. The new FPGA devices can support much higher speed, more bandwidth and greater processing capabilities for the needs of the digital Line Replaceable Unit (LRU). This study focuses on using the latest Altera and Xilinx devices in an adaptive beamforming processor. The field reprogrammable RF devices from Analog Devices are used as analog front end transceivers. Unlike other existing Software-Defined Radio transceivers on the market, this processor is designed for distributed adaptive beamforming in a networked environment. The following aspects of the novel radar processor will be presented: (1) a new system-on-chip architecture based on Altera's devices and an adaptive processing module, especially for adaptive beamforming and pulse compression; (2) successful implementation of generation 2 serial RapidIO data links on FPGA, which support the VITA-49 radio packet format for large distributed DAR processing; (3) demonstration of the feasibility and capabilities of the processor in a Micro-TCA based, SRIO switching backplane to support multichannel beamforming in real time; and (4) application of this processor in ongoing radar system development projects, including OU's dual-polarized digital array radar, the planned new cylindrical array radars, and future airborne radars.
Time-Predictable Computer Architecture

Directory of Open Access Journals (Sweden)

Schoeberl Martin

2009-01-01

Today's general-purpose processors are optimized for maximum throughput. Real-time systems need a processor with both a reasonable and a known worst-case execution time (WCET). Features such as pipelines with instruction dependencies, caches, branch prediction, and out-of-order execution complicate WCET analysis and lead to very conservative estimates. In this paper, we evaluate the issues of current architectures with respect to WCET analysis. Then, we propose solutions for a time-predictable computer architecture. The proposed architecture is evaluated with implementation of some features in a Java processor. The resulting processor is a good target for WCET analysis and still performs well in the average case.
Architectures of electro-optical packet switched networks

DEFF Research Database (Denmark)

Berger, Michael Stubert

2004-01-01

and examines possible architectures for future high capacity networks with high capacity nodes. It is assumed that optics will play a key role in this scenario, and in this respect, the European IST research project DAVID aimed at proposing viable architectures for optical packet switching, exploiting the best...... from optics and electronics. An overview of the DAVID network architecture is given, focusing on the MAN and WAN architecture as well as the MPLS based network hierarchy. A statistical model of the optical slot generation process is presented and utilised to evaluate delay vs. efficiency. Furthermore...... architecture for a buffered crossbar switch is presented. The architecture uses two levels of backpressure (flow control) with different constraints on round trip time. No additional scheduling complexity is introduced, and for the actual example shown, a reduction in memory of 75% was obtained at the cost...
Rapid prototyping and evaluation of programmable SIMD SDR processors in LISA

Science.gov (United States)

Chen, Ting; Liu, Hengzhu; Zhang, Botao; Liu, Dongpei

2013-03-01

With the development of international wireless communication standards, the computational requirements for baseband signal processors are increasing. Time-to-market pressure makes it impossible to completely redesign new processors for the evolving standards. Owing to their high flexibility and low power, software defined radio (SDR) digital signal processors have been proposed as a promising technology to replace traditional ASIC and FPGA approaches. In addition, computation-intensive functions process large amounts of parallel data, which fosters the development of single instruction multiple data (SIMD) architectures in SDR platforms. A new way must therefore be found to prototype SDR processors efficiently. In this paper we present a bit-and-cycle accurate model of programmable SIMD SDR processors in the machine description language LISA. LISA is a language for describing instruction set architectures that enables rapid modeling at the architectural level. To evaluate the proposed processor, three common baseband functions (FFT, FIR digital filter and matrix multiplication) have been mapped onto the SDR platform. Analytical results showed that the SDR processor achieved a maximum performance boost of 47.1% relative to the comparison processor.
Discussion paper for a highly parallel array processor-based machine

International Nuclear Information System (INIS)

Hagstrom, R.; Bolotin, G.; Dawson, J.

1984-01-01

The architectural plan for a quickly realizable implementation of a highly parallel special-purpose computer system with peak performance in the range of 6 billion floating point operations per second is discussed. The architecture is suitable for Lattice Gauge theoretical computations of fundamental physics interest and may be applicable to a range of other numerically intensive computational problems. The plan is quickly realizable because it employs a maximum of commercially available hardware subsystems and because the architecture is software-transparent to the individual processors, allowing straightforward re-use of whatever commercially available operating systems and support software are suitable to run on the commercially produced processors. A tiny prototype instrument designed along this architecture has already operated. A few elementary examples of programs which can run efficiently are presented. The large machine which the authors would propose to build would be based upon a highly competent array processor, the ST-100 Array Processor, and specific design possibilities are discussed. The first step toward realizing this plan practically is to install a single ST-100 to allow algorithm development to proceed while a demonstration unit is built using two of the ST-100 Array Processors.
Lipsi: Probably the Smallest Processor in the World

DEFF Research Database (Denmark)

Schoeberl, Martin

2018-01-01

While research on high-performance processors is important, it is also interesting to explore processor architectures at the other end of the spectrum: tiny processor cores for auxiliary functions. While it is common to implement small circuits for such functions, such as a serial port, in dedica...... at a minimal cost....
A research on the application of software defined networking in satellite network architecture

Science.gov (United States)

Song, Huan; Chen, Jinqiang; Cao, Suzhi; Cui, Dandan; Li, Tong; Su, Yuxing

2017-10-01

Software defined network is a new type of network architecture, which decouples control plane and data plane of traditional network, has the feature of flexible configurations and is a direction of the next generation terrestrial Internet development. Satellite network is an important part of the space-ground integrated information network, while the traditional satellite network has the disadvantages of difficult network topology maintenance and slow configuration. The application of SDN technology in satellite network can solve these problems that traditional satellite network faces. At present, the research on the application of SDN technology in satellite network is still in the stage of preliminary study. In this paper, we start with introducing the SDN technology and satellite network architecture. Then we mainly introduce software defined satellite network architecture, as well as the comparison of different software defined satellite network architecture and satellite network virtualization. Finally, the present research status and development trend of SDN technology in satellite network are analyzed.
Stable architectures for deep neural networks

Science.gov (United States)

Haber, Eldad; Ruthotto, Lars

2018-01-01

Deep neural networks have become invaluable tools for supervised machine learning, e.g. classification of text or images. While often offering superior results over traditional techniques and successfully expressing complicated patterns in data, deep architectures are known to be challenging to design and train such that they generalize well to new data. Critical issues with deep architectures are numerical instabilities in derivative-based learning algorithms commonly called exploding or vanishing gradients. In this paper, we propose new forward propagation techniques inspired by systems of ordinary differential equations (ODE) that overcome this challenge and lead to well-posed learning problems for arbitrarily deep networks. The backbone of our approach is our interpretation of deep learning as a parameter estimation problem of nonlinear dynamical systems. Given this formulation, we analyze stability and well-posedness of deep learning and use this new understanding to develop new network architectures. We relate the exploding and vanishing gradient phenomenon to the stability of the discrete ODE and present several strategies for stabilizing deep learning for very deep networks. While our new architectures restrict the solution space, several numerical experiments show their competitiveness with state-of-the-art networks.
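The link between residual-style layers and ODEs mentioned in this abstract can be illustrated with a short sketch: one layer is read as a forward Euler step y + h * tanh(K y + b) of the ODE y' = tanh(K y + b). The antisymmetric construction K = W - W^T, the tanh activation, the step size h, and all function names are assumptions chosen for illustration; the abstract itself does not specify the architectures, so this is not presented as the authors' implementation.

```python
import math


def antisymmetric(w):
    """Return K = W - W^T, which satisfies K^T = -K."""
    n = len(w)
    return [[w[i][j] - w[j][i] for j in range(n)] for i in range(n)]


def euler_layer(y, k, b, h):
    """One forward-Euler step: y + h * tanh(K y + b)."""
    n = len(y)
    z = [sum(k[i][j] * y[j] for j in range(n)) + b[i] for i in range(n)]
    return [y[i] + h * math.tanh(z[i]) for i in range(n)]


def forward(y, weights, biases, h=0.1):
    """Propagate y through one Euler step per (W, b) pair."""
    for w, b in zip(weights, biases):
        y = euler_layer(y, antisymmetric(w), b, h)
    return y


if __name__ == "__main__":
    w = [[0.3, 1.0], [-0.5, 0.2]]
    print(forward([1.0, 0.0], [w] * 20, [[0.0, 0.0]] * 20))
```

Because each step adds at most h per component, the step size directly bounds how far the features can move across the whole network, which is one way to see why the discretization matters for very deep networks.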

Neuron splitting in compute-bound parallel network simulations enables runtime scaling with twice as many processors.

Science.gov (United States)

Hines, Michael L; Eichner, Hubert; Schürmann, Felix

2008-08-01

Neuron tree topology equations can be split into two subtrees and solved on different processors with no change in accuracy, stability, or computational effort; communication costs involve only sending and receiving two double precision values by each subtree at each time step. Splitting cells is useful in attaining load balance in neural network simulations, especially when there is a wide range of cell sizes and the number of cells is about the same as the number of processors. For compute-bound simulations load balance results in almost ideal runtime scaling. Application of the cell splitting method to two published network models exhibits good runtime scaling on twice as many processors as could be effectively used with whole-cell balancing.
Hybrid architecture for building secure sensor networks

Science.gov (United States)

Owens, Ken R., Jr.; Watkins, Steve E.

2012-04-01

Sensor networks have various communication and security architectural concerns. Three approaches are defined to address these concerns for sensor networks. The first area is the utilization of new computing architectures that leverage embedded virtualization software on the sensor. Deploying a small, embedded virtualization operating system on the sensor nodes that is designed to communicate to low-cost cloud computing infrastructure in the network is the foundation to delivering low-cost, secure sensor networks. The second area focuses on securing the sensor. Sensor security components include developing an identification scheme, and leveraging authentication algorithms and protocols that address security assurance within the physical, communication network, and application layers. This function will primarily be accomplished through encrypting the communication channel and integrating sensor network firewall and intrusion detection/prevention components to the sensor network architecture. Hence, sensor networks will be able to maintain high levels of security. The third area addresses the real-time and high priority nature of the data that sensor networks collect. This function requires that a quality-of-service (QoS) definition and algorithm be developed for delivering the right data at the right time. A hybrid architecture is proposed that combines software and hardware features to handle network traffic with diverse QoS requirements.
Functional Verification of Enhanced RISC Processor

OpenAIRE

SHANKER NILANGI; SOWMYA L

2013-01-01

This paper presents the design and verification of a 32-bit enhanced RISC processor core with floating point computations integrated within the core, designed to reduce cost and complexity. The designed 3 stage pipelined 32-bit RISC processor is based on the ARM7 processor architecture, with a single precision floating point multiplier, a floating point adder/subtractor for floating point operations and a 32 x 32 Booth's multiplier added to the integer core of ARM7. The binary representati...
Communication and Memory Architecture Design of Application-Specific High-End Multiprocessors

Directory of Open Access Journals (Sweden)

Yahya Jan

2012-01-01

This paper is devoted to the design of communication and memory architectures of massively parallel hardware multiprocessors necessary for the implementation of highly demanding applications. We demonstrate that for massively parallel hardware multiprocessors the traditionally used flat communication architectures and multi-port memories do not scale well, and that the influence of the memory and communication network on both throughput and circuit area dominates the influence of the processors. To resolve these problems and ensure scalability, we propose to design highly optimized application-specific hierarchical and/or partitioned communication and memory architectures by exploring and exploiting the regularity and hierarchy of the actual data flows of a given application. Furthermore, we propose some data distribution and related data mapping schemes in the shared (global) partitioned memories with the aim of eliminating memory access conflicts and ensuring that our communication design strategies will be applicable. We incorporated these architecture synthesis strategies into our quality-driven model-based multi-processor design method and related automated architecture exploration framework. Using this framework, we performed a large series of experiments that demonstrate various important features of the synthesized memory and communication architectures. They also demonstrate that our method and related framework are able to efficiently synthesize well scalable memory and communication architectures even for high-end multiprocessors. Gains as high as 12-times in performance and 25-times in area can be obtained when using hierarchical communication networks instead of flat networks. However, for high parallelism levels only the partitioned approach ensures scalability in performance.
Scalable High-Performance Parallel Design for Network Intrusion Detection Systems on Many-Core Processors

OpenAIRE

Jiang, Hayang; Xie, Gaogang; Salamatian, Kavé; Mathy, Laurent

2013-01-01

Network Intrusion Detection Systems (NIDSes) face significant challenges coming from the relentless network link speed growth and increasing complexity of threats. Both hardware accelerated and parallel software-based NIDS solutions, based on commodity multi-core and GPU processors, have been proposed to overcome these challenges. ...
Homogeneous and Heterogeneous MPSoC Architectures with Network-On-Chip Connectivity for Low-Power and Real-Time Multimedia Signal Processing

Directory of Open Access Journals (Sweden)

Sergio Saponara

2012-01-01

Two multiprocessor system-on-chip (MPSoC) architectures are proposed and compared in the paper with reference to audio and video processing applications. One architecture exploits a homogeneous topology; it consists of 8 identical tiles, each made of a 32-bit RISC core enhanced by a 64-bit DSP coprocessor with local memory. The other MPSoC architecture exploits a heterogeneous-tile topology with on-chip distributed memory resources; the tiles act as application specific processors supporting a different class of algorithms. In both architectures, the multiple tiles are interconnected by a network-on-chip (NoC) infrastructure, through network interfaces and routers, which allows parallel operations of the multiple tiles. The functional performances and the implementation complexity of the NoC-based MPSoC architectures are assessed by synthesis results in submicron CMOS technology. Among the large set of supported algorithms, two case studies are considered: the real-time implementation of an H.264/MPEG AVC video codec and of a low-distortion digital audio amplifier. The heterogeneous architecture ensures a higher power efficiency and a smaller area occupation and is more suited for low-power multimedia processing, such as in mobile devices. The homogeneous scheme allows for a higher flexibility and easier system scalability and is more suited for general-purpose DSP tasks in power-supplied devices.
Invasive tightly coupled processor arrays

CERN Document Server

LARI, VAHID

2016-01-01

This book introduces new massively parallel computer (MPSoC) architectures called invasive tightly coupled processor arrays. It proposes strategies, architecture designs, and programming interfaces for invasive TCPAs that allow invading and subsequently executing loop programs with strict requirements or guarantees of non-functional execution qualities such as performance, power consumption, and reliability. For the first time, such a configurable processor array architecture consisting of locally interconnected VLIW processing elements can be claimed by programs, either in full or in part, using the principle of invasive computing. Invasive TCPAs provide unprecedented energy efficiency for the parallel execution of nested loop programs by avoiding any global memory access such as GPUs and may even support loops with complex dependencies such as loop-carried dependencies that are not amenable to parallel execution on GPUs. For this purpose, the book proposes different invasion strategies for claiming a desire...
Future Network Architectures

DEFF Research Database (Denmark)

Wessing, Henrik; Bozorgebrahimi, Kurosh; Belter, Bartosz

2015-01-01

This study identifies key requirements for NRENs towards future network architectures that become apparent as users become more mobile and have increased expectations in terms of availability of data. In addition, cost saving requirements call for federated use of, in particular, the optical...
An energy-efficient high-performance processor with reconfigurable data-paths using RSFQ circuits

International Nuclear Information System (INIS)

Takagi, Naofumi

2013-01-01

Highlights: ► An idea of a high-performance computer using RSFQ circuits is shown. ► An outline of a processor with reconfigurable data-paths (RDPs) is shown. ► Architectural details of an SFQ-RDP are described. -- Abstract: We show recent progress in our research on an energy-efficient high-performance processor with reconfigurable data-paths (RDPs) using rapid single-flux-quantum (RSFQ) circuits. We mainly describe the architectural details of an RDP implemented using RSFQ circuits. An RDP consists of many floating-point units (FPUs) and operand routing networks (ORNs) which connect the FPUs. We reconfigure the RDP to fit a computation, i.e., a group of floating-point operations appearing in a ‘for’ loop of programs for numerical computations, by setting the routes in the ORNs before the execution of the loop. In the RDP, many FPUs work in parallel in pipelined fashion, and hence very high-performance computation is achieved.
Satellite ATM Networks: Architectures and Guidelines Developed

Science.gov (United States)

vonDeak, Thomas C.; Yegendu, Ferit

1999-01-01

An important element of satellite-supported asynchronous transfer mode (ATM) networking will involve support for the routing and rerouting of active connections. Work published under the auspices of the Telecommunications Industry Association (http://www.tiaonline.org) describes basic architectures and routing protocol issues for satellite ATM (SATATM) networks. The architectures and issues identified will serve as a basis for further development of technical specifications for these SATATM networks. Three ATM network architectures for bent pipe satellites and three ATM network architectures for satellites with onboard ATM switches were developed. The architectures differ from one another in terms of required level of mobility, supported data rates, supported terrestrial interfaces, and onboard processing and switching requirements. The documentation addresses low-, middle-, and geosynchronous-Earth-orbit satellite configurations. The satellite environment may require real-time routing to support the mobility of end devices and nodes of the ATM network itself. This requires the network to be able to reroute active circuits in real time. In addition to supporting mobility, rerouting can also be used to (1) optimize network routing, (2) respond to changing quality-of-service requirements, and (3) provide a fault tolerance mechanism. Traffic management and control functions are necessary in ATM to ensure that the quality-of-service requirements associated with each connection are not violated and also to provide flow and congestion control functions. Functions related to traffic management were identified and described. Most of these traffic management functions will be supported by on-ground ATM switches, but in a hybrid terrestrial-satellite ATM network, some of the traffic management functions may have to be supported by the onboard satellite ATM switch. Future work is planned to examine the tradeoffs of placing traffic management functions onboard a satellite as
Investigating the effectiveness of many-core network processors for high performance cyber protection systems. Part I, FY2011.

Energy Technology Data Exchange (ETDEWEB)

Wheeler, Kyle Bruce; Naegle, John Hunt; Wright, Brian J.; Benner, Robert E., Jr.; Shelburg, Jeffrey Scott; Pearson, David Benjamin; Johnson, Joshua Alan; Onunkwo, Uzoma A.; Zage, David John; Patel, Jay S.

2011-09-01

This report documents our first-year efforts to address the use of many-core processors for high performance cyber protection. As the demands grow for higher bandwidth (beyond 1 Gbit/s) on network connections, the need to provide faster and more efficient solutions to cyber security grows. Fortunately, in recent years, the development of many-core network processors has seen increased interest. Prior working experience with many-core processors has led us to investigate their effectiveness for cyber protection tools, with particular emphasis on high performance firewalls. Although advanced algorithms for smarter cyber protection of high-speed network traffic are being developed, these advanced analysis techniques require significantly more computational capability than static techniques. Moreover, many locations where cyber protections are deployed have limited power, space and cooling resources. This makes the use of traditionally large computing systems impractical for the front-end systems that process large network streams; hence the drive for this study, which could potentially yield a highly reconfigurable and rapidly scalable solution.
Power estimation on functional level for programmable processors

Directory of Open Access Journals (Sweden)

M. Schneider

2004-01-01

In this contribution, different approaches to power estimation for programmable processors are presented and evaluated with respect to their applicability to modern processor architectures such as Very Long Instruction Word (VLIW) architectures. Special emphasis is placed on the concept of so-called Functional-Level Power Analysis (FLPA). This approach is based on partitioning the processor architecture into functional blocks such as the processing unit, clock network, internal memory and others. The power consumption of these blocks is described by parameter-dependent arithmetic model functions. The input parameters, such as the achieved degree of parallelism or the kind of memory access, are obtained by automated, parser-based analysis of the assembler code of the system to be estimated. The approach is evaluated on two modern digital signal processors using a variety of basic digital signal processing algorithms. The estimates obtained for the individual algorithms are compared with physically measured values, yielding a very small maximum estimation error of 3%.
Power estimation on functional level for programmable processors

Science.gov (United States)

Schneider, M.; Blume, H.; Noll, T. G.

2004-05-01

In this contribution, different approaches to power estimation for programmable processors are presented and evaluated with respect to their applicability to modern processor architectures such as Very Long Instruction Word (VLIW) architectures. Special emphasis is placed on the concept of so-called Functional-Level Power Analysis (FLPA). This approach is based on partitioning the processor architecture into functional blocks such as the processing unit, clock network, internal memory and others. The power consumption of these blocks is described by parameter-dependent arithmetic model functions. The input parameters, such as the achieved degree of parallelism or the kind of memory access, are obtained by automated, parser-based analysis of the assembler code of the system to be estimated. The approach is evaluated on two modern digital signal processors using a variety of basic digital signal processing algorithms. The estimates obtained for the individual algorithms are compared with physically measured values, yielding a very small maximum estimation error of 3%.
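The structure of a functional-level power analysis (FLPA) estimate, as outlined in this abstract, can be sketched as a sum of per-block model functions evaluated on parameters extracted from the assembler code. Everything concrete below is hypothetical: the block names follow the abstract, but the model functions, their coefficients, the parameter names, and the milliwatt unit are invented for illustration and do not come from the paper.

```python
# Hypothetical per-block model functions (mW); coefficients are invented.
BLOCK_MODELS = {
    "clock_network": lambda p: 50.0,
    "processing_unit": lambda p: 20.0 + 15.0 * p["parallelism"],
    "internal_memory": lambda p: 10.0 + 40.0 * p["mem_access_rate"],
}


def flpa_power(params, block_models):
    """Sum the per-block power estimates for one program."""
    return sum(model(params) for model in block_models.values())


def relative_error(estimated, measured):
    """Relative estimation error against a measured reference value."""
    return abs(estimated - measured) / measured


if __name__ == "__main__":
    # Parameters a parser might extract from assembler code (assumed values).
    params = {"parallelism": 4.0, "mem_access_rate": 0.5}
    print(flpa_power(params, BLOCK_MODELS))
```

In such a scheme the model functions would be fitted once per processor from measurements, after which new programs only need their parameters extracted rather than being measured on hardware.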
Token-Aware Completion Functions for Elastic Processor Verification

Directory of Open Access Journals (Sweden)

Sudarshan K. Srinivasan

2009-01-01

We develop a formal verification procedure to check that elastic pipelined processor designs correctly implement their instruction set architecture (ISA) specifications. The notion of correctness we use is based on refinement. Refinement proofs are based on refinement maps, which, in the context of this problem, are functions that map elastic processor states to states of the ISA specification model. Data flow in elastic architectures is complicated by the insertion of any number of buffers in any place in the design, making it hard to construct refinement maps for elastic systems in a systematic manner. We introduce token-aware completion functions, which incorporate a mechanism to track the flow of data in elastic pipelines, as a highly automated and systematic approach to construct refinement maps. We demonstrate the efficiency of the overall verification procedure based on token-aware completion functions using six elastic pipelined processor models based on the DLX architecture.
Speeding up the MATLAB complex networks package using graphic processors

International Nuclear Information System (INIS)

Zhang Bai-Da; Wu Jun-Jie; Li Xin; Tang Yu-Hua

2011-01-01

The availability of computers and communication networks allows us to gather and analyse data on a far larger scale than previously. At present, it is believed that statistics is a suitable method to analyse networks with millions, or more, of vertices. The MATLAB language, with its mass of statistical functions, is a good choice to rapidly realize an algorithm prototype of complex networks. The performance of the MATLAB codes can be further improved by using graphic processor units (GPU). This paper presents the strategies and performance of the GPU implementation of a complex networks package, and the Jacket toolbox of MATLAB is used. Compared with some commercially available CPU implementations, GPU can achieve a speedup of, on average, 11.3×. The experimental result proves that the GPU platform combined with the MATLAB language is a good combination for complex network research. (interdisciplinary physics and related areas of science and technology)
Intrinsic and task-evoked network architectures of the human brain

Science.gov (United States)

Cole, Michael W.; Bassett, Danielle S.; Power, Jonathan D.; Braver, Todd S.; Petersen, Steven E.

2014-01-01

Summary Many functional network properties of the human brain have been identified during rest and task states, yet it remains unclear how the two relate. We identified a whole-brain network architecture present across dozens of task states that was highly similar to the resting-state network architecture. The most frequent functional connectivity strengths across tasks closely matched the strengths observed at rest, suggesting this is an “intrinsic”, standard architecture of functional brain organization. Further, a set of small but consistent changes common across tasks suggests the existence of a task-general network architecture distinguishing task states from rest. These results indicate the brain’s functional network architecture during task performance is shaped primarily by an intrinsic network architecture that is also present during rest, and secondarily by evoked task-general and task-specific network changes. This establishes a strong relationship between resting-state functional connectivity and task-evoked functional connectivity – areas of neuroscientific inquiry typically considered separately. PMID:24991964
The functional consequences of mutualistic network architecture.

Directory of Open Access Journals (Sweden)

José M Gómez

The architecture and properties of many complex networks play a significant role in the functioning of the systems they describe. Recently, complex network theory has been applied to ecological entities, like food webs or mutualistic plant-animal interactions. Unfortunately, we still lack an accurate view of the relationship between the architecture and functioning of ecological networks. In this study we explore this link by building individual-based pollination networks from eight Erysimum mediohispanicum (Brassicaceae) populations. In these individual-based networks, each individual plant in a population was considered a node, and was connected by means of undirected links to conspecifics sharing pollinators. The architecture of these unipartite networks was described by means of nestedness, connectivity and transitivity. Network functioning was estimated by quantifying the performance of the population described by each network as the number of per-capita juvenile plants produced per population. We found a consistent relationship between the topology of the networks and their functioning, since variation across populations in the average per-capita production of juvenile plants was positively and significantly related with network nestedness, connectivity and clustering. Subtle changes in the composition of diverse pollinator assemblages can drive major consequences for plant population performance and local persistence through modifications in the structure of the inter-plant pollination networks.
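The network construction described in this record (each plant is a node, and two plants are linked when they share a pollinator) can be sketched in Python as below. The visitation data and all names are hypothetical. Connectance is used here as one simple reading of "connectivity", transitivity is computed as the global clustering coefficient, and nestedness is omitted; the study's actual metric definitions may differ.

```python
from itertools import combinations

# Hypothetical visitation data: plant individual -> set of pollinator taxa.
visits = {
    "plant_a": {"bee", "beetle"},
    "plant_b": {"bee"},
    "plant_c": {"beetle", "fly"},
    "plant_d": {"moth"},
}

def build_network(visits):
    """Link two plants (undirected) when they share at least one pollinator."""
    edges = set()
    for p, q in combinations(sorted(visits), 2):
        if visits[p] & visits[q]:
            edges.add((p, q))
    return edges

def connectance(n_nodes, edges):
    """Realized links divided by possible links in an undirected graph."""
    return len(edges) / (n_nodes * (n_nodes - 1) / 2)

def transitivity(nodes, edges):
    """Global clustering: closed triples divided by all connected triples."""
    adj = {n: set() for n in nodes}
    for p, q in edges:
        adj[p].add(q)
        adj[q].add(p)
    # A connected triple is a node together with any two of its neighbours.
    triples = sum(len(a) * (len(a) - 1) // 2 for a in adj.values())
    # A triple is closed when those two neighbours are themselves linked.
    closed = sum(1 for n in nodes
                 for p, q in combinations(sorted(adj[n]), 2) if q in adj[p])
    return closed / triples if triples else 0.0

edges = build_network(visits)
```

With this toy data, `plant_a` is linked to `plant_b` and `plant_c`, while `plant_d` shares no pollinator with anyone and stays isolated.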
Tenet: An Architecture for Tiered Embedded Networks

OpenAIRE

Ramesh Govindan; Eddie Kohler; Deborah Estrin; Fang Bian; Krishna Chintalapudi; Om Gnawali; Sumit Rangwala; Ramakrishna Gummadi; Thanos Stathopoulos

2005-01-01

Future large-scale sensor network deployments will be tiered, with the motes providing dense sensing and a higher tier of 32-bit master nodes with more powerful radios providing increased overall network capacity. In this paper, we describe a functional architecture for wireless sensor networks that leverages this structure to simplify the overall system. Our Tenet architecture has the nice property that the mote-layer software is generic and reusable, and all application functionality reside...
Description and Simulation of a Fast Packet Switch Architecture for Communication Satellites

Science.gov (United States)

Quintana, Jorge A.; Lizanich, Paul J.

1995-01-01

The NASA Lewis Research Center has been developing the architecture for a multichannel communications signal processing satellite (MCSPS) as part of a flexible, low-cost meshed-VSAT (very small aperture terminal) network. The MCSPS architecture is based on a multifrequency, time-division-multiple-access (MF-TDMA) uplink and a time-division multiplex (TDM) downlink. There are eight uplink MF-TDMA beams, and eight downlink TDM beams, with eight downlink dwells per beam. The information-switching processor, which decodes, stores, and transmits each packet of user data to the appropriate downlink dwell onboard the satellite, has been fully described by using VHSIC (Very High Speed Integrated-Circuit) Hardware Description Language (VHDL). This VHDL code, which was developed in-house to simulate the information switching processor, showed that the architecture is both feasible and viable. This paper describes a shared-memory-per-beam architecture, its VHDL implementation, and the simulation efforts.
Application of Advanced Multi-Core Processor Technologies to Oceanographic Research

Science.gov (United States)

2013-09-30

DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited. Application of Advanced Multi-Core Processor Technologies ... STM32 NXP LPC series No Proprietary Microchip PIC32/DSPIC No > 500 mW; < 5 W ARM Cortex TI OMAP TI Sitara Broadcom BCM2835 Varies FPGA ... state-of-the-art information processing architectures. OBJECTIVES Next-generation processor architectures (multi-core, multi-threaded) hold the

An Evolutionary Optimization Framework for Neural Networks and Neuromorphic Architectures

Energy Technology Data Exchange (ETDEWEB)

Schuman, Catherine D [ORNL; Plank, James [University of Tennessee (UT); Disney, Adam [University of Tennessee (UT); Reynolds, John [University of Tennessee (UT)

2016-01-01

As new neural network and neuromorphic architectures are being developed, new training methods that operate within the constraints of the new architectures are required. Evolutionary optimization (EO) is a convenient training method for new architectures. In this work, we review a spiking neural network architecture and a neuromorphic architecture, and we describe an EO training framework for these architectures. We present the results of this training framework on four classification data sets and compare those results to other neural network and neuromorphic implementations. We also discuss how this EO framework may be extended to other architectures.
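A generic evolutionary optimization loop of the kind this record refers to can be sketched in Python as below. It is not the authors' framework: it evolves a fixed-length vector of real values against a toy fitness function, whereas a real EO trainer for spiking or neuromorphic networks would encode network structure and parameters and score each candidate on a classification task. The function names, population size, selection scheme and mutation scale are all assumptions.

```python
import random

def evolve(fitness, genome_len, pop_size=20, generations=50,
           mutation_scale=0.1, seed=0):
    """Minimal evolutionary loop: selection, one-point crossover, mutation."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        # Truncation selection: the better half survives unchanged (elitism).
        parents = population[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)      # one-point crossover
            child = a[:cut] + b[cut:]
            # Gaussian mutation applied to every gene of the child.
            child = [g + rng.gauss(0, mutation_scale) for g in child]
            children.append(child)
        population = parents + children
    population.sort(key=fitness, reverse=True)
    history.append(fitness(population[0]))
    return population[0], history

# Toy stand-in for network quality: genomes closer to the target score higher.
target = [0.5, -0.25, 0.75]

def toy_fitness(genome):
    return -sum((g - t) ** 2 for g, t in zip(genome, target))

best, history = evolve(toy_fitness, genome_len=3)
```

Because the surviving parents are carried over unchanged, the best fitness recorded in `history` never decreases from one generation to the next.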
Design and evaluation of an architecture for a digital signal processor for instrumentation applications

Science.gov (United States)

Fellman, Ronald D.; Kaneshiro, Ronald T.; Konstantinides, Konstantinos

1990-03-01

The authors present the design and evaluation of an architecture for a monolithic, programmable, floating-point digital signal processor (DSP) for instrumentation applications. An investigation of the most commonly used algorithms in instrumentation led to a design that satisfies the requirements for high computational and I/O (input/output) throughput. In the arithmetic unit, a 16- x 16-bit multiplier and a 32-bit accumulator provide the capability for single-cycle multiply/accumulate operations, and three format adjusters automatically adjust the data format for increased accuracy and dynamic range. An on-chip I/O unit is capable of handling data block transfers through a direct memory access port and real-time data streams through a pair of parallel I/O ports. I/O operations and program execution are performed in parallel. In addition, the processor includes two data memories with independent addressing units, a microsequencer with instruction RAM, and multiplexers for internal data redirection. The authors also present the structure and implementation of a design environment suitable for the algorithmic, behavioral, and timing simulation of a complete DSP system. Various benchmarking results are reported.
Heterogeneous reconfigurable processors for real-time baseband processing from algorithm to architecture

CERN Document Server

Zhang, Chenxin; Öwall, Viktor

2016-01-01

This book focuses on domain-specific heterogeneous reconfigurable architectures, demonstrating for readers a computing platform which is flexible enough to support multiple standards, multiple modes, and multiple algorithms. The content is multi-disciplinary, covering areas of wireless communication, computing architecture, and circuit design. The platform described provides real-time processing capability with reasonable implementation cost, achieving balanced trade-offs among flexibility, performance, and hardware costs. The authors discuss efficient design methods for wireless communication processing platforms, from both an algorithm and architecture design perspective. Coverage also includes computing platforms for different wireless technologies and standards, including MIMO, OFDM, Massive MIMO, DVB, WLAN, LTE/LTE-A, and 5G. •Discusses reconfigurable architectures, including hardware building blocks such as processing elements, memory sub-systems, Network-on-Chip (NoC), and dynamic hardware reconfigur...
Hardware Synchronization for Embedded Multi-Core Processors

DEFF Research Database (Denmark)

Stoif, Christian; Schoeberl, Martin; Liccardi, Benito

2011-01-01

Multi-core processors are about to conquer embedded systems: the question is not whether they are coming but how the architectures of the microcontrollers should look with respect to the strict requirements in the field. We present the step from one to multiple cores in this paper, establi...
Many-body simulations using an array processor

International Nuclear Information System (INIS)

Rapaport, D.C.

1985-01-01

Simulations of microscopic models of water and polypeptides using molecular dynamics and Monte Carlo techniques have been carried out with the aid of an FPS array processor. The computational techniques are discussed, with emphasis on the development and optimization of the software to take account of the special features of the processor. The computing requirements of these simulations exceed what could be reasonably carried out on a normal 'scientific' computer. While the FPS processor is highly suited to the kinds of models described, several other computationally intensive problems in statistical mechanics are outlined for which alternative processor architectures are more appropriate.
Periodic Application of Concurrent Error Detection in Processor Array Architectures. Ph.D. Thesis

Science.gov (United States)

Chen, Paul Peichuan

1993-01-01

Processor arrays can provide an attractive architecture for some applications. Featuring modularity, regular interconnection and high parallelism, such arrays are well-suited for VLSI/WSI implementations, and applications with high computational requirements, such as real-time signal processing. Preserving the integrity of results can be of paramount importance for certain applications. In these cases, fault tolerance should be used to ensure reliable delivery of a system's service. One aspect of fault tolerance is the detection of errors caused by faults. Concurrent error detection (CED) techniques offer the advantage that transient and intermittent faults may be detected with greater probability than with off-line diagnostic tests. Applying time-redundant CED techniques can reduce hardware redundancy costs. However, most time-redundant CED techniques degrade a system's performance.
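Time-redundant concurrent error detection, as mentioned in this record, can be illustrated with a small Python sketch. The technique shown is recomputation with shifted operands, chosen here only as a familiar example of time redundancy; the abstract does not say which CED techniques the thesis uses. The adder model and the injected stuck-at-0 fault are hypothetical.

```python
def make_adder(stuck_at_zero_bit=None):
    """Return an adder; optionally force one output bit to 0 to model a fault."""
    def adder(a, b):
        total = a + b
        if stuck_at_zero_bit is not None:
            total &= ~(1 << stuck_at_zero_bit)
        return total
    return adder

def checked_add(adder, a, b):
    """Run the operation twice on the same unit and compare the results.

    The second pass uses operands shifted left by one bit, so a fault tied to
    a fixed bit position corrupts a different bit of the true result.
    """
    first = adder(a, b)
    second = adder(a << 1, b << 1) >> 1
    return first, first == second

healthy = make_adder()
faulty = make_adder(stuck_at_zero_bit=3)

ok_result = checked_add(healthy, 5, 3)
bad_result = checked_add(faulty, 5, 3)
```

The second pass doubles the time spent per checked operation, which is the kind of performance cost the abstract attributes to most time-redundant CED techniques.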
Support for Programming Models in Network-on-Chip-based Many-core Systems

DEFF Research Database (Denmark)

Rasmussen, Morten Sleth

This thesis addresses aspects of support for programming models in Network-on-Chip-based many-core architectures. The main focus is to consider architectural support for a plethora of programming models in a single system. The thesis has three main parts. The first part considers parallelization...... models to be supported by a single architecture. The architecture features a specialized network interface processor which allows extensive configurability of the memory system. Based on this architecture, a detailed implementation of the cache coherent shared memory programming model is presented...
A high-accuracy optical linear algebra processor for finite element applications

Science.gov (United States)

Casasent, D.; Taylor, B. K.

1984-01-01

Optical linear processors are computationally efficient computers for solving matrix-matrix and matrix-vector oriented problems. Optical system errors limit their dynamic range to 30-40 dB, which limits their accuracy to 9-12 bits. Large problems, such as the finite element problem in structural mechanics (with tens or hundreds of thousands of variables) which can exploit the speed of optical processors, require the 32 bit accuracy obtainable from digital machines. To obtain this required 32 bit accuracy with an optical processor, the data can be digitally encoded, thereby reducing the dynamic range requirements of the optical system (i.e., decreasing the effect of optical errors on the data) while providing increased accuracy. This report describes a new digitally encoded optical linear algebra processor architecture for solving finite element and banded matrix-vector problems. A linear static plate bending case study is described which quantifies the processor requirements. Multiplication by digital convolution is explained, and the digitally encoded optical processor architecture is advanced.
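Multiplication by digital convolution, which this record mentions, rests on the fact that convolving the digit sequences of two numbers gives the digits of their product before carries are resolved. The Python sketch below shows this for binary encoding. It illustrates the arithmetic only and does not model the optical hardware; the function names are hypothetical.

```python
def to_bits(n: int) -> list[int]:
    """Little-endian binary digits of a non-negative integer."""
    bits = []
    while n:
        bits.append(n & 1)
        n >>= 1
    return bits or [0]

def convolve(a: list[int], b: list[int]) -> list[int]:
    """Discrete linear convolution of two digit sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def multiply_by_convolution(p: int, q: int) -> int:
    # The convolution output is in "mixed binary" form: position k carries
    # weight 2**k, but a digit may exceed 1 until carries are resolved.
    mixed = convolve(to_bits(p), to_bits(q))
    # Weighting each digit by its power of two resolves the carries.
    return sum(d << k for k, d in enumerate(mixed))

mixed = convolve(to_bits(13), to_bits(11))   # [1, 1, 1, 3, 1, 1, 1]
product = multiply_by_convolution(13, 11)    # 143
```

Each convolution output only needs to be distinguished among a handful of levels (at most the word length), which is why digital encoding relaxes the dynamic range demanded of the analog optical system.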
Rio: a dynamic self-healing services architecture using Jini networking technology

Science.gov (United States)

Clarke, James B.

2002-06-01

Current mainstream distributed Java architectures offer great capabilities embracing conventional enterprise architecture patterns and designs. These traditional systems provide robust transaction oriented environments that are in large part focused on data and host processors. Typically, these implementations require that an entire application be deployed on every machine that will be used as a compute resource. In order for this to happen, the application is usually taken down, installed and started with all systems in-sync and knowing about each other. Static environments such as these present an extremely difficult environment to setup, deploy and administer.
A FPGA-Based, Granularity-Variable Neuromorphic Processor and Its Application in a MIMO Real-Time Control System.

Science.gov (United States)

Zhang, Zhen; Ma, Cheng; Zhu, Rong

2017-08-23

Artificial Neural Networks (ANNs), including Deep Neural Networks (DNNs), have become the state-of-the-art methods in machine learning and achieved amazing success in speech recognition, visual object recognition, and many other domains. There are several hardware platforms for developing accelerated implementation of ANN models. Since Field Programmable Gate Array (FPGA) architectures are flexible and can provide high performance per watt of power consumption, they have drawn a number of applications from scientists. In this paper, we propose a FPGA-based, granularity-variable neuromorphic processor (FBGVNP). The traits of FBGVNP can be summarized as granularity variability, scalability, integrated computing, and addressing ability: first, the number of neurons is variable rather than constant in one core; second, the multi-core network scale can be extended in various forms; third, the neuron addressing and computing processes are executed simultaneously. These make the processor more flexible and better suited for different applications. Moreover, a neural network-based controller is mapped to FBGVNP and applied in a multi-input, multi-output, (MIMO) real-time, temperature-sensing and control system. Experiments validate the effectiveness of the neuromorphic processor. The FBGVNP provides a new scheme for building ANNs, which is flexible, highly energy-efficient, and can be applied in many areas.
A FPGA-Based, Granularity-Variable Neuromorphic Processor and Its Application in a MIMO Real-Time Control System

Directory of Open Access Journals (Sweden)

Zhen Zhang

2017-08-01

Artificial Neural Networks (ANNs), including Deep Neural Networks (DNNs), have become the state-of-the-art methods in machine learning and achieved amazing success in speech recognition, visual object recognition, and many other domains. There are several hardware platforms for developing accelerated implementation of ANN models. Since Field Programmable Gate Array (FPGA) architectures are flexible and can provide high performance per watt of power consumption, they have drawn a number of applications from scientists. In this paper, we propose a FPGA-based, granularity-variable neuromorphic processor (FBGVNP). The traits of FBGVNP can be summarized as granularity variability, scalability, integrated computing, and addressing ability: first, the number of neurons is variable rather than constant in one core; second, the multi-core network scale can be extended in various forms; third, the neuron addressing and computing processes are executed simultaneously. These make the processor more flexible and better suited for different applications. Moreover, a neural network-based controller is mapped to FBGVNP and applied in a multi-input, multi-output, (MIMO) real-time, temperature-sensing and control system. Experiments validate the effectiveness of the neuromorphic processor. The FBGVNP provides a new scheme for building ANNs, which is flexible, highly energy-efficient, and can be applied in many areas.
Design of an Elliptic Curve Cryptography processor for RFID tag chips.

Science.gov (United States)

Liu, Zilong; Liu, Dongsheng; Zou, Xuecheng; Lin, Hui; Cheng, Jian

2014-09-26

Radio Frequency Identification (RFID) is an important technique for wireless sensor networks and the Internet of Things. Recently, considerable research has been performed in the combination of public key cryptography and RFID. In this paper, an efficient architecture of Elliptic Curve Cryptography (ECC) Processor for RFID tag chip is presented. We adopt a new inversion algorithm which requires fewer registers to store variables than the traditional schemes. A new method for coordinate swapping is proposed, which can reduce the complexity of the controller and shorten the time of iterative calculation effectively. A modified circular shift register architecture is presented in this paper, which is an effective way to reduce the area of register files. Clock gating and asynchronous counter are exploited to reduce the power consumption. The simulation and synthesis results show that the time needed for one elliptic curve scalar point multiplication over GF(2^163) is 176.7 K clock cycles and the gate area is 13.8 K with UMC 0.13 μm Complementary Metal Oxide Semiconductor (CMOS) technology. Moreover, the low power consumption and low cost make the Elliptic Curve Cryptography Processor (ECP) a prospective candidate for application in the RFID tag chip.
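The field arithmetic underneath scalar point multiplication over the binary field GF(2^163) can be sketched in Python as below. Two points are assumptions: the reduction polynomial x^163 + x^7 + x^6 + x^3 + 1 is the one NIST specifies for its 163-bit binary curves, and the abstract does not say which polynomial the processor uses; the inversion shown is the textbook Fermat exponentiation, not the register-saving inversion algorithm the paper proposes.

```python
M = 163
# Assumed reduction polynomial: x^163 + x^7 + x^6 + x^3 + 1.
REDUCTION = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1

def gf_mul(a: int, b: int) -> int:
    """Shift-and-add (carry-less) multiplication modulo REDUCTION.

    Field elements are integers below 2**163 whose bits are the
    coefficients of a polynomial over GF(2).
    """
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^m) is XOR
        b >>= 1
        a <<= 1                  # multiply a by x
        if a >> M:               # degree reached 163: reduce
            a ^= REDUCTION
    return result

def gf_inv(a: int) -> int:
    """Inverse via Fermat's little theorem: a^(2^163 - 2)."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    result = 1
    exponent = (1 << M) - 2
    while exponent:
        if exponent & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)         # squaring step of square-and-multiply
        exponent >>= 1
    return result

x_to_163 = gf_mul(1 << 162, 2)   # x^162 * x reduces to x^7 + x^6 + x^3 + 1
```

A hardware design would typically serialize these operations over shift registers, but the algebra is the same: point addition and doubling reduce to sequences of such multiplications, squarings and one inversion.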
An architecture for human-network interfaces

DEFF Research Database (Denmark)

Sonnenwald, Diane H.

1990-01-01

Some of the issues (and their consequences) that arise when human-network interfaces (HNIs) are viewed from the perspective of people who use and develop them are examined. Target attributes of HNI architecture are presented. A high-level architecture model that supports the attributes is discussed...
Accuracy Limitations in Optical Linear Algebra Processors

Science.gov (United States)

Batsell, Stephen Gordon

1990-01-01

One of the limiting factors in applying optical linear algebra processors (OLAPs) to real-world problems has been the poor achievable accuracy of these processors. Little previous research has been done on determining noise sources from a systems perspective which would include noise generated in the multiplication and addition operations, noise from spatial variations across arrays, and from crosstalk. In this dissertation, we propose a second-order statistical model for an OLAP which incorporates all these system noise sources. We now apply this knowledge to determining upper and lower bounds on the achievable accuracy. This is accomplished by first translating the standard definition of accuracy used in electronic digital processors to analog optical processors. We then employ our second-order statistical model. Having determined a general accuracy equation, we consider limiting cases such as for ideal and noisy components. From the ideal case, we find the fundamental limitations on improving analog processor accuracy. From the noisy case, we determine the practical limitations based on both device and system noise sources. These bounds allow system trade-offs to be made both in the choice of architecture and in individual components in such a way as to maximize the accuracy of the processor. Finally, by determining the fundamental limitations, we show the system engineer when the accuracy desired can be achieved from hardware or architecture improvements and when it must come from signal pre-processing and/or post-processing techniques.
Security Shift in Future Network Architectures

NARCIS (Netherlands)

Hartog, T.; Schotanus, H.A.; Verkoelen, C.A.A.

2010-01-01

In current practice military communication infrastructures are deployed as stand-alone networked information systems. Network-Enabled Capabilities (NEC) and combined military operations lead to new requirements which current communication architectures cannot deliver. This paper informs IT
Network Analysis, Architecture, and Design

CERN Document Server

McCabe, James D

2007-01-01

Traditionally, networking has had little or no basis in analysis or architectural development, with designers relying on technologies they are most familiar with or being influenced by vendors or consultants. However, the landscape of networking has changed so that network services have now become one of the most important factors to the success of many third generation networks. It has become an important feature of the designer's job to define the problems that exist in his network, choose and analyze several optimization parameters during the analysis process, and then prioritize and evalua
Application of Raptor-M3G to reactor dosimetry problems on massively parallel architectures - 026

International Nuclear Information System (INIS)

Longoni, G.

2010-01-01

The solution of complex 3-D radiation transport problems requires significant resources both in terms of computation time and memory availability. Therefore, parallel algorithms and multi-processor architectures are required to solve large 3-D radiation transport problems efficiently. This paper presents the application of RAPTOR-M3G (Rapid Parallel Transport Of Radiation - Multiple 3D Geometries) to reactor dosimetry problems. RAPTOR-M3G is a newly developed parallel computer code designed to solve the discrete ordinates (SN) equations on multi-processor computer architectures. This paper presents the results for a reactor dosimetry problem using a 3-D model of a commercial 2-loop pressurized water reactor (PWR). The accuracy and performance of RAPTOR-M3G will be analyzed and the numerical results obtained from the calculation will be compared directly to measurements of the neutron field in the reactor cavity air gap. The parallel performance of RAPTOR-M3G on massively parallel architectures, where the number of computing nodes is in the order of hundreds, will be analyzed up to four hundred processors. The performance results will be presented based on two supercomputing architectures: the POPLE supercomputer operated by the Pittsburgh Supercomputing Center and the Westinghouse computer cluster. The Westinghouse computer cluster is equipped with a standard Ethernet network connection and InfiniBand® interconnects capable of a bandwidth in excess of 20 GBit/sec. Therefore, the impact of the network architecture on RAPTOR-M3G performance will be analyzed as well. (authors)
Mobile network architecture of the long-range WindScanner system

OpenAIRE

Vasiljevic, Nikola; Lea, Guillaume; Hansen, Per; Jensen, Henrik M.

2016-01-01

In this report we have presented the network architecture of the long-range WindScanner system that allows utilization of mobile network connections without the use of static public IP addresses. The architecture mitigates the issues of additional fees and contractual obligations that are linked to the acquisition of the mobile network connections with static public IP addresses. The architecture consists of a hardware VPN solution based on the network appliances Z1 and MX60 from Cisco Meraki...
Optical Array Processor: Laboratory Results

Science.gov (United States)

Casasent, David; Jackson, James; Vaerewyck, Gerard

1987-01-01

A Space Integrating (SI) Optical Linear Algebra Processor (OLAP) is described and laboratory results on its performance in several practical engineering problems are presented. The applications include its use in the solution of a nonlinear matrix equation for optimal control and a parabolic Partial Differential Equation (PDE), the transient diffusion equation with two spatial variables. Frequency-multiplexed, analog and high accuracy non-base-two data encoding are used and discussed. A multi-processor OLAP architecture is described and partitioning and data flow issues are addressed.
Code compression for VLIW embedded processors

Science.gov (United States)

Piccinelli, Emiliano; Sannino, Roberto

2004-04-01

The implementation of processors for embedded systems implies various issues: main constraints are cost, power dissipation and die area. On the other side, new terminals perform functions that require more computational flexibility and effort. Long code streams must be loaded into memories, which are expensive and power consuming, to run on DSPs or CPUs. To overcome this issue, the "SlimCode" proprietary algorithm presented in this paper (patent pending technology) can reduce the dimensions of the program memory. It can run offline and work directly on the binary code the compiler generates, by compressing it and creating a new binary file, about 40% smaller than the original one, to be loaded into the program memory of the processor. The decompression unit will be a small ASIC, placed between the Memory Controller and the System bus of the processor, keeping unchanged the internal CPU architecture: this implies that the methodology is completely transparent to the core. We present comparisons versus the state-of-the-art IBM Codepack algorithm, along with its architectural implementation into the ST200 VLIW family core.

Designing Next Generation Massively Multithreaded Architectures for Irregular Applications

Energy Technology Data Exchange (ETDEWEB)

Tumeo, Antonino; Secchi, Simone; Villa, Oreste

2012-08-31

Irregular applications, such as data mining or graph-based computations, show unpredictable memory/network access patterns and control structures. Massively multi-threaded architectures with large node count, like the Cray XMT, have been shown to address their requirements better than commodity clusters. In this paper we present the approaches that we are currently pursuing to design future generations of these architectures. First, we introduce the Cray XMT and compare it to other multithreaded architectures. We then propose an evolution of the architecture, integrating multiple cores per node and next generation network interconnect. We advocate the use of hardware support for remote memory reference aggregation to optimize network utilization. For this evaluation we developed a highly parallel, custom simulation infrastructure for multi-threaded systems. Our simulator executes unmodified XMT binaries with very large datasets, capturing effects due to contention and hot-spotting, while predicting execution times with greater than 90% accuracy. We also discuss the FPGA prototyping approach that we are employing to study efficient support for irregular applications in next generation manycore processors.
Optical Neural Network Classifier Architectures

National Research Council Canada - National Science Library

Getbehead, Mark

1998-01-01

We present an adaptive opto-electronic neural network hardware architecture capable of exploiting parallel optics to realize real-time processing and classification of high-dimensional data for Air...
GPU: the biggest key processor for AI and parallel processing

Science.gov (United States)

Baji, Toru

2017-07-01

Two types of processors exist in the market: the conventional CPU and the Graphics Processing Unit (GPU). A typical CPU is composed of 1 to 8 cores, while a GPU has thousands of cores. The CPU is good for sequential processing, while the GPU is good at accelerating software with heavily parallel execution. The GPU was initially dedicated to 3D graphics. However, from 2006, when GPUs started to adopt general-purpose cores, it was noticed that this architecture can be used as a general-purpose massively parallel processor. NVIDIA developed a software framework, Compute Unified Device Architecture (CUDA), that makes it possible to easily program the GPU for these applications. With CUDA, GPUs started to be used widely in workstations and supercomputers. Recently, two key technologies have been highlighted in the industry: Artificial Intelligence (AI) and autonomous driving cars. AI requires massively parallel operations to train many-layered neural networks. With CPUs alone, it was impossible to finish the training in a practical time. The latest multi-GPU systems with the P100 make it possible to finish the training in a few hours. For autonomous driving cars, TOPS-class performance is required to implement perception, localization, and path planning, and again an SoC with an integrated GPU will play a key role. In this paper, the evolution of the GPU, one of the biggest commercial devices requiring state-of-the-art fabrication technology, is introduced. An overview of key GPU-demanding applications such as those described above is also given.
Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures

Science.gov (United States)

Manolakos, Elias S.

2015-01-01

Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub. PMID:26605332
Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures.

Science.gov (United States)

Sharma, Anuj; Manolakos, Elias S

2015-01-01

Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub.
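
The efficiency figure quoted above follows from the usual definition of parallel efficiency, speedup divided by the number of cores. The short sketch below only reproduces that arithmetic; the function name `parallel_efficiency` is an illustrative choice and is not taken from the authors' software.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Parallel efficiency = speedup / number of cores used."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# Figures quoted in the abstract: speedup of 42 on the 48-core SCC.
eff = parallel_efficiency(42, 48)
print(f"{eff:.3f}")  # 0.875, which rounds to the reported 0.9
```
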
DIMACS Workshop on Interconnection Networks and Mapping, and Scheduling Parallel Computations

CERN Document Server

Rosenberg, Arnold L; Sotteau, Dominique; NSF Science and Technology Center in Discrete Mathematics and Theoretical Computer Science; Interconnection networks and mapping and scheduling parallel computations

1995-01-01

The interconnection network is one of the most basic components of a massively parallel computer system. Such systems consist of hundreds or thousands of processors interconnected to work cooperatively on computations. One of the central problems in parallel computing is the task of mapping a collection of processes onto the processors and routing network of a parallel machine. Once this mapping is done, it is critical to schedule computations within and communication among processors so that inputs for a process are available where and when the process is scheduled to be computed. This book contains the refereed pro... from universities and laboratories, as well as practitioners involved in the design, implementation, and application of massively parallel systems. Focusing on interconnection networks of parallel architectures of today and of the near future, the book includes topics such as network topologies, network properties, message routing, network embeddings, network emulation, mappings, and efficient scheduling.
Joint Hybrid Backhaul and Access Links Design in Cloud-Radio Access Networks

KAUST Repository

Dhifallah, Oussama Najeeb; Dahrouj, Hayssam; Al-Naffouri, Tareq Y.; Alouini, Mohamed-Slim

2015-01-01

The cloud-radio access network (CRAN) is expected to be the core network architecture for next generation mobile radio systems. In this paper, we consider the downlink of a CRAN formed of one central processor (the cloud) and several base station
A Scalable Architecture for VoIP Conferencing

Directory of Open Access Journals (Sweden)

R Venkatesha Prasad

2003-10-01

Real-time services are traditionally supported on circuit-switched networks. However, there is a need to port these services to packet-switched networks. An architecture for an audio conferencing application over the Internet, in the light of the ITU-T H.323 recommendations, is considered. In a conference, considering packets only from a set of selected clients can reduce speech quality degradation, because mixing packets from all clients can lead to a lack of speech clarity. A distributed algorithm and architecture for selecting clients for mixing is suggested here, based on a new quantifier of voice activity called "Loudness Number" (LN). The proposed system distributes the computation load and reduces the load on client terminals. The highlights of this architecture are scalability, bandwidth saving and speech quality enhancement. Client selection for playing out tries to mimic a physical conference, where the most vocal participants attract more attention. The contributions of the paper are expected to aid implementations of the H.323 recommendations for Multipoint Processors (MP). A working prototype based on the proposed architecture is already functional.
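
The client selection step described above can be pictured as a top-N choice over per-client Loudness Numbers. The sketch below is only an illustration under stated assumptions: it takes the LN values as already computed (the abstract does not say how LN is derived), and the function name `select_clients_for_mixing` and the parameter `n_max` are hypothetical.

```python
from typing import Dict, List


def select_clients_for_mixing(loudness: Dict[str, float], n_max: int) -> List[str]:
    """Return up to n_max client ids with the highest Loudness Number (LN).

    Assumption: a larger LN means a more vocal participant. Ties are
    broken by client id so that the result is deterministic.
    """
    if n_max < 0:
        raise ValueError("n_max must be non-negative")
    ranked = sorted(loudness.items(), key=lambda kv: (-kv[1], kv[0]))
    return [client for client, _ in ranked[:n_max]]


# Hypothetical LN values for four conference participants.
ln = {"alice": 0.82, "bob": 0.10, "carol": 0.55, "dave": 0.31}
selected = select_clients_for_mixing(ln, n_max=2)
print(selected)  # ['alice', 'carol']
```

Only the streams of the selected clients would then be mixed, which is where the bandwidth saving mentioned in the abstract would come from.
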
Network interconnections: an architectural reference model

NARCIS (Netherlands)

Butscher, B.; Lenzini, L.; Morling, R.; Vissers, C.A.; Popescu-Zeletin, R.; van Sinderen, Marten J.; Heger, D.; Krueger, G.; Spaniol, O.; Zorn, W.

1985-01-01

One of the major problems in understanding the different approaches in interconnecting networks of different technologies is the lack of reference to a general model. The paper develops the rationales for a reference model of network interconnection and focuses on the architectural implications for
Software Defined Networking (SDN) controlled all optical switching networks with multi-dimensional switching architecture

Science.gov (United States)

Zhao, Yongli; Ji, Yuefeng; Zhang, Jie; Li, Hui; Xiong, Qianjin; Qiu, Shaofeng

2014-08-01

An ultrahigh throughput capacity requirement is challenging current optical switching nodes, given the fast development of data center networks. Pbit/s-level all-optical switching networks need to be deployed soon, which will increase the complexity of node architecture. How to control the future network and node equipment together will become a new problem. An enhanced Software Defined Networking (eSDN) control architecture is proposed in the paper, which consists of Provider NOX (P-NOX) and Node NOX (N-NOX). With the cooperation of P-NOX and N-NOX, flexible control of the entire network can be achieved. An all-optical switching network testbed has been experimentally demonstrated with efficient eSDN control. The Pbit/s-level all-optical switching nodes in the testbed are implemented based on a multi-dimensional switching architecture, i.e. multi-level and multi-planar. Due to space and cost limitations, each optical switching node is equipped with only four input line boxes and four output line boxes. Experimental results are given to verify the performance of our proposed control and switching architecture.
Dual-scale topology optoelectronic processor.

Science.gov (United States)

Marsden, G C; Krishnamoorthy, A V; Esener, S C; Lee, S H

1991-12-15

The dual-scale topology optoelectronic processor (D-STOP) is a parallel optoelectronic architecture for matrix algebraic processing. The architecture can be used for matrix-vector multiplication and two types of vector outer product. The computations are performed electronically, which allows multiplication and summation concepts in linear algebra to be generalized to various nonlinear or symbolic operations. This generalization permits the application of D-STOP to many computational problems. The architecture uses a minimum number of optical transmitters, which thereby reduces fabrication requirements while maintaining area-efficient electronics. The necessary optical interconnections are space invariant, minimizing space-bandwidth requirements.
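
The generalization mentioned above, replacing multiplication and summation with other operations, can be illustrated in software. The sketch below is not a model of the D-STOP hardware; it only shows a matrix-vector product and an outer product parameterized by a `combine` and an `accumulate` operation, both of which are names chosen here for illustration.

```python
from functools import reduce
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def generalized_matvec(
    matrix: Sequence[Sequence[T]],
    vector: Sequence[T],
    combine: Callable[[T, T], T],
    accumulate: Callable[[T, T], T],
) -> List[T]:
    """Matrix-vector product where 'multiply' and 'sum' are pluggable."""
    result = []
    for row in matrix:
        if not row or len(row) != len(vector):
            raise ValueError("each row must be non-empty and match the vector length")
        terms = [combine(a, x) for a, x in zip(row, vector)]
        result.append(reduce(accumulate, terms))
    return result


def generalized_outer(u: Sequence[T], v: Sequence[T],
                      combine: Callable[[T, T], T]) -> List[List[T]]:
    """Outer product with a pluggable element-wise operation."""
    return [[combine(a, b) for b in v] for a in u]


A = [[1, 2], [3, 4]]
x = [5, 6]
# Ordinary linear algebra: multiply, then add.
print(generalized_matvec(A, x, lambda a, b: a * b, lambda s, t: s + t))  # [17, 39]
# A nonlinear variant: element-wise min, then max over the row.
print(generalized_matvec(A, x, min, max))  # [2, 4]
print(generalized_outer([1, 2], [3, 4], lambda a, b: a * b))  # [[3, 4], [6, 8]]
```
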
The Hi-Ring architecture for datacentre networks

DEFF Research Database (Denmark)

Galili, Michael; Kamchevska, Valerija; Ding, Yunhong

2016-01-01

This paper summarizes recent work on a hierarchical ring-based network architecture (Hi-Ring) for datacentre and short-range applications. The architecture allows leveraging benefits of optical switching technologies while maintaining a high level of connection granularity. We discuss results...
Raexplore: Enabling Rapid, Automated Architecture Exploration for Full Applications

Energy Technology Data Exchange (ETDEWEB)

Zhang, Yao [Argonne National Lab. (ANL), Argonne, IL (United States); Balaprakash, Prasanna [Argonne National Lab. (ANL), Argonne, IL (United States); Meng, Jiayuan [Argonne National Lab. (ANL), Argonne, IL (United States); Morozov, Vitali [Argonne National Lab. (ANL), Argonne, IL (United States); Parker, Scott [Argonne National Lab. (ANL), Argonne, IL (United States); Kumaran, Kalyan [Argonne National Lab. (ANL), Argonne, IL (United States)

2014-12-01

We present Raexplore, a performance modeling framework for architecture exploration. Raexplore enables rapid, automated, and systematic search of the architecture design space by combining hardware counter-based performance characterization and analytical performance modeling. We demonstrate Raexplore for two recent manycore processors, the IBM Blue Gene/Q compute chip and the Intel Xeon Phi, targeting a set of scientific applications. Our framework is able to capture complex interactions between architectural components including instruction pipeline, cache, and memory, and to achieve a 3–22% error for same-architecture and cross-architecture performance predictions. Furthermore, we apply our framework to assess the two processors, and discover and evaluate a list of architectural scaling options for future processor designs.
Computer Architecture A Quantitative Approach

CERN Document Server

Hennessy, John L

2007-01-01

The era of seemingly unlimited growth in processor performance is over: single chip architectures can no longer overcome the performance limitations imposed by the power they consume and the heat they generate. Today, Intel and other semiconductor firms are abandoning the single fast processor model in favor of multi-core microprocessors--chips that combine two or more processors in a single package. In the fourth edition of Computer Architecture, the authors focus on this historic shift, increasing their coverage of multiprocessors and exploring the most effective ways of achieving parallelis
A Survey of Some Approaches to Distributed Data Base & Distributed File System Architecture.

Science.gov (United States)

1980-01-01

[Figure 7-1: MUFFIN logical architecture. Legend: A = A Cell, D = D Cell; labelled components include bus, bus interface, and conventional processor.] ...and Applied Mathematics (14), December, 1966. [Kimbleton 79] Kimbleton, Stephen; Wang, Pearl; and Fong, Elizabeth. XNDM: An Experimental Network
Soft-core dataflow processor architecture optimised for radar signal processing: Article

CSIR Research Space (South Africa)

Broich, R

2014-10-01

Current radar signal processors lack either performance or flexibility. Custom soft-core processors exhibit potential in high-performance signal processing applications, yet remain relatively unexplored in research literature. In this paper, we use...
dSDiVN: a distributed Software-Defined Networking architecture for Infrastructure-less Vehicular Networks

OpenAIRE

Alioua, Ahmed; Senouci, Sidi-Mohammed; Moussaoui, Samira

2017-01-01

In the last few years, the emerging network architecture paradigm of Software-Defined Networking (SDN) has become one of the most important technologies for managing large-scale networks such as Vehicular Ad-hoc Networks (VANETs). Recently, several works have shown interest in the use of the SDN paradigm in VANETs. SDN brings flexibility, scalability and ease of management to current VANETs. However, almost all proposed Software-Defined VANET (SDVN) architectures are infrastructure-based. This pa...
Fast decision algorithms in low-power embedded processors for quality-of-service based connectivity of mobile sensors in heterogeneous wireless sensor networks.

Science.gov (United States)

Jaraíz-Simón, María D; Gómez-Pulido, Juan A; Vega-Rodríguez, Miguel A; Sánchez-Pérez, Juan M

2012-01-01

When a mobile wireless sensor moves through heterogeneous wireless sensor networks, it can often be under the coverage of more than one network. In these situations, the Vertical Handoff process can take place, in which the mobile sensor decides to change its connection from its current network to the best of the available ones according to their quality of service characteristics. A fitness function, which should be minimized, is used for the handoff decision. This is an optimization problem that consists of adjusting a set of weights for the quality of service parameters. Solving this problem efficiently is relevant to heterogeneous wireless sensor networks in many advanced applications. Numerous works dealing with the vertical handoff decision can be found in the literature, although they all suffer from the same shortfall: a non-comparable efficiency. Therefore, the aim of this work is twofold: first, to develop a fast decision algorithm that explores the entire space of possible combinations of weights, searching for the one that minimizes the fitness function; and second, to design and implement a system-on-chip architecture based on reconfigurable hardware and embedded processors to achieve several goals necessary for competitive mobile terminals: good performance, low power consumption, low economic cost, and small area integration.
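
An exhaustive search over weight combinations, as described above, can be sketched as follows. Everything specific here is an assumption made for illustration and not the authors' formulation: the fitness is taken to be a weighted sum of normalized QoS costs (lower is better), weights lie on a discrete grid and sum to 1, every weight is at least one grid step, and the names `weight_grid` and `best_handoff` are hypothetical.

```python
from itertools import product
from typing import Dict, Iterator, Optional, Sequence, Tuple


def weight_grid(n: int, steps: int, min_units: int = 1) -> Iterator[Tuple[float, ...]]:
    """Yield every n-tuple of weights that are multiples of 1/steps,
    are each at least min_units/steps, and sum to 1."""
    for units in product(range(min_units, steps + 1), repeat=n - 1):
        last = steps - sum(units)
        if last >= min_units:
            yield tuple(u / steps for u in units) + (last / steps,)


def best_handoff(
    networks: Dict[str, Sequence[float]], steps: int = 10, min_units: int = 1
) -> Optional[Tuple[float, str, Tuple[float, ...]]]:
    """Scan all (network, weights) pairs and return the minimum-fitness one."""
    best = None
    for name, costs in networks.items():
        for w in weight_grid(len(costs), steps, min_units):
            fitness = sum(wi * ci for wi, ci in zip(w, costs))  # assumed weighted sum
            if best is None or fitness < best[0]:
                best = (fitness, name, w)
    return best


# Hypothetical normalized costs per network: (delay, monetary cost, power).
candidates = {"wifi": (0.2, 0.5, 0.9), "lte": (0.4, 0.3, 0.6)}
fitness, network, weights = best_handoff(candidates)
print(network, weights)  # wifi (0.8, 0.1, 0.1)
```

The search space grows quickly with the number of QoS parameters and the grid resolution, which is presumably why the abstract pairs the algorithm with a hardware implementation.
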
Multiprocessor architecture: Synthesis and evaluation

Science.gov (United States)

Standley, Hilda M.

1990-01-01

Multiprocessor computer architecture evaluation for structural computations is the focus of the research effort described. Results obtained are expected to lead to more efficient use of existing architectures and to suggest designs for new, application-specific architectures. The brief descriptions given outline a number of related efforts directed toward this purpose. The difficulty in analyzing an existing architecture or in designing a new computer architecture lies in the fact that the performance of a particular architecture, within the context of a given application, is determined by a number of factors. These include, but are not limited to, the efficiency of the computation algorithm, the programming language and support environment, the quality of the program written in the programming language, the multiplicity of the processing elements, the characteristics of the individual processing elements, the interconnection network connecting processors and non-local memories, and the shared memory organization covering the spectrum from no shared memory (all local memory) to one global access memory. These performance determiners may be loosely classified as being software or hardware related. This distinction is not clear or even appropriate in many cases. The effect of the choice of algorithm is ignored by assuming that the algorithm is specified as given. Effort directed toward the removal of the effect of the programming language and program resulted in the design of a high-level parallel programming language. Two characteristics of the fundamental structure of the architecture (memory organization and interconnection network) are examined.
Security Shift in Future Network Architectures

OpenAIRE

Hartog, T.; Schotanus, H.A.; Verkoelen, C.A.A.

2010-01-01

In current practice military communication infrastructures are deployed as stand-alone networked information systems. Network-Enabled Capabilities (NEC) and combined military operations lead to new requirements which current communication architectures cannot deliver. This paper informs IT architects, information architects and security specialists about the separation of network and information security, the consequences of this shift and our view on future communication infrastructures in d...

A CNN-Specific Integrated Processor

Directory of Open Access Journals (Sweden)

Suleyman Malki

2009-01-01

Integrated Processors (IP) are algorithm-specific cores that, either by programming or by configuration, can be re-used within many microelectronic systems. This paper looks at how Cellular Neural Networks (CNN) can be realized as IP. First, current digital implementations are reviewed, and the memory-processor bandwidth issues are analyzed. Then a generic view is taken of the structure of the network, and a new intra-communication protocol based on rotating wheels is proposed. It is shown that this provides guaranteed high performance with a minimal network interface. The resulting node is small and supports multi-level CNN designs, giving the system a 30-fold increase in capacity compared to classical designs. As it facilitates multiple operations on a single image, and single operations on multiple images, with minimal access to the external image memory, balancing the internal and external data transfer requirements optimizes the system operation. In conventional digital CNN designs, the treatment of boundary nodes requires additional logic to handle the CNN value propagation scheme. In the new architecture, only a slight modification of the existing cells is necessary to model the boundary effect. A typical prototype for visual pattern recognition will house 4096 CNN cells with a 2% overhead for making it an IP.
Scaling architecture-on-demand based optical networks

NARCIS (Netherlands)

Meyer, Hugo; Sancho, Jose Carlos; Mrdakovic, Milica; Peng, Shuping; Simeonidou, Dimitra; Miao, Wang; Calabretta, Nicola

2016-01-01

This paper analyzes methodologies that allow scaling properly Architecture-On-Demand (AoD) based optical networks. As Data Centers and HPC systems are growing in size and complexity, optical networks seem to be the way to scale the bandwidth of current network infrastructures. To scale the number of
Atmel's New Rad-Hard Sparc V8 Processor 200Mhz & Low Power System on Chip

Science.gov (United States)

Ganry, Nicolas; Mantelet, Guy; Parkes, Steve; McClements, Chris

2014-08-01

The AT6981 is a new generation of processor designed for critical spaceflight applications, which combines a high-performance SPARC® V8 radiation-hard processor with enough on-chip memory for many aerospace applications and state-of-the-art SpaceWire networking technology from STAR-Dundee. The AT6981 is implemented in Atmel 90nm rad-hard technology, enabling a 200 MHz operating speed for the processor with power consumption levels around 1W. This advanced technology allows strong system integration in a SoC with embedded peripherals like CAN, 1553, Ethernet and DDR, and embedded memory with 1 Mbyte of SRAM. The device is ITAR-free and is developed in France by Atmel Aerospace, which has more than 30 years of space experience. This paper describes this new SoC architecture and the technical options considered to ensure the best performance, the minimum power consumption and high reliability. This device will be available on the market in H2 2014 for evaluation, with first flight models targeted for the end of 2015.
Mobile network architecture of the long-range WindScanner system

DEFF Research Database (Denmark)

Vasiljevic, Nikola; Lea, Guillaume; Hansen, Per

to the acquisition of the mobile network connections with static public IP addresses. The architecture consists of a hardware VPN solution based on the network appliances Z1 and MX60 from Cisco Meraki with additional 3G or 4G dongles. With the presented network architecture and appropriate configuration, we fulfill...
Practical, redundant, failure-tolerant, self-reconfiguring embedded system architecture

Science.gov (United States)

Klarer, Paul R.; Hayward, David R.; Amai, Wendy A.

2006-10-03

This invention relates to system architectures, specifically failure-tolerant and self-reconfiguring embedded system architectures. The invention provides both a method and architecture for redundancy. There can be redundancy in both software and hardware for multiple levels of redundancy. The invention provides a self-reconfiguring architecture for activating redundant modules whenever other modules fail. The architecture comprises: a communication backbone connected to two or more processors and software modules running on each of the processors. Each software module runs on one processor and resides on one or more of the other processors to be available as a backup module in the event of failure. Each module and backup module reports its status over the communication backbone. If a primary module does not report, its backup module takes over its function. If the primary module becomes available again, the backup module returns to its backup status.
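
The takeover rule described above (a backup module acts while its primary is silent and steps back once the primary reports again) can be sketched in a few lines. This is only an illustration under assumptions: the class names `Backbone` and `ModuleStatus`, the fixed `timeout`, and the explicit `now` timestamps are hypothetical and are not details from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ModuleStatus:
    primary: str   # processor that normally runs the module
    backup: str    # processor holding the resident backup copy
    last_report: Dict[str, float] = field(default_factory=dict)


class Backbone:
    """Toy stand-in for the communication backbone carrying status reports."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.modules: Dict[str, ModuleStatus] = {}

    def register(self, module: str, primary: str, backup: str) -> None:
        self.modules[module] = ModuleStatus(primary, backup)

    def report(self, module: str, processor: str, now: float) -> None:
        """Record a status report from one copy of the module."""
        self.modules[module].last_report[processor] = now

    def active_processor(self, module: str, now: float) -> str:
        """Primary if it reported within the timeout, otherwise the backup."""
        status = self.modules[module]
        last = status.last_report.get(status.primary)
        if last is not None and now - last <= self.timeout:
            return status.primary
        return status.backup


bus = Backbone(timeout=2.0)
bus.register("nav", primary="cpu0", backup="cpu1")
bus.report("nav", "cpu0", now=0.0)
bus.report("nav", "cpu1", now=0.0)
print(bus.active_processor("nav", now=1.0))  # cpu0
print(bus.active_processor("nav", now=5.0))  # cpu1 (primary went silent)
bus.report("nav", "cpu0", now=6.0)
print(bus.active_processor("nav", now=6.5))  # cpu0 (primary is back)
```
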
On the efficacy of using the transfer-controlled procedure during periods of STP processor overloads in SS7 networks

Science.gov (United States)

Rumsewicz, Michael

1994-04-01

In this paper, we examine call completion performance, rather than message throughput, in a Common Channel Signaling network in which the processing resources, and not the transmission resources, of a Signaling Transfer Point (STP) are overloaded. Specifically, we perform a transient analysis, via simulation, of a network consisting of a single Central Processor-based STP connecting many local exchanges. We consider the efficacy of using the Transfer Controlled (TFC) procedure when the network call attempt rate exceeds the processing capability of the STP. We find the following: (1) the success of the control depends critically on the rate at which TFCs are sent; (2) use of the TFC procedure in the event of processor overload can provide reasonable call completion rates.
The Fermilab Advanced Computer Program multi-array processor system (ACPMAPS): A site oriented supercomputer for theoretical physics

International Nuclear Information System (INIS)

Nash, T.; Areti, H.; Atac, R.

1988-08-01

The ACP Multi-Array Processor System (ACPMAPS) is a highly cost effective, local memory parallel computer designed for floating point intensive grid based problems. The processing nodes of the system are single board array processors based on the FORTRAN and C programmable Weitek XL chip set. The nodes are connected by a network of very high bandwidth 16 port crossbar switches. The architecture is designed to achieve the highest possible cost effectiveness while maintaining a high level of programmability. The primary application of the machine at Fermilab will be lattice gauge theory. The hardware is supported by a transparent site oriented software system called CANOPY which shields theorist users from the underlying node structure. 4 refs., 2 figs
Software Defined Networks in Wireless Sensor Architectures

Directory of Open Access Journals (Sweden)

Jesús Antonio Puente Fernández

2018-03-01

Nowadays, different protocols coexist on the Internet to provide services to users. Unfortunately, control decisions and distributed management make it hard to control networks. These problems result in inefficient and unpredictable network behaviour. Software Defined Networks (SDN) is a new concept of network architecture. It aims to be more flexible and to simplify network management with respect to traditional architectures. Each of these aspects is possible because of the separation of the control plane (controller) and the data plane (switches) in network devices. OpenFlow is the most common protocol for SDN networks and provides the communication between the control and data planes. Moreover, decoupling the control and data planes enables a quick evolution of protocols and their deployment without replacing data plane switches. In this survey, we review SDN technology and the OpenFlow protocol and their related works. Specifically, we describe some technologies, such as Wireless Sensor Networks and Wireless Cellular Networks, and how SDN can be included within them in order to solve their challenges. We classify the different solutions for each technology according to the problem that is being fixed.
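
The separation of control plane and data plane described above can be illustrated with a toy model: the switch only looks up rules in a flow table, and on a table miss it asks the controller, which decides and installs a rule. This sketch does not implement the OpenFlow protocol or any real controller API; the classes `Controller` and `Switch` and their methods are hypothetical names chosen for illustration.

```python
from typing import Dict, Optional


class Controller:
    """Control plane: holds the forwarding policy and programs switches."""

    def __init__(self, routes: Dict[str, int]) -> None:
        self.routes = routes  # destination address -> output port

    def packet_in(self, switch: "Switch", dst: str) -> Optional[int]:
        """Handle a table miss reported by a switch."""
        port = self.routes.get(dst)
        if port is not None:
            switch.install_rule(dst, port)  # push the decision to the data plane
        return port


class Switch:
    """Data plane: forwards by flow-table lookup only."""

    def __init__(self, controller: Controller) -> None:
        self.controller = controller
        self.flow_table: Dict[str, int] = {}
        self.misses = 0

    def install_rule(self, dst: str, out_port: int) -> None:
        self.flow_table[dst] = out_port

    def forward(self, dst: str) -> Optional[int]:
        """Return the output port, or None if the packet is dropped."""
        if dst in self.flow_table:
            return self.flow_table[dst]
        self.misses += 1
        return self.controller.packet_in(self, dst)


ctrl = Controller(routes={"10.0.0.2": 3})
sw = Switch(ctrl)
print(sw.forward("10.0.0.2"), sw.misses)  # 3 1  (first packet goes via the controller)
print(sw.forward("10.0.0.2"), sw.misses)  # 3 1  (rule now cached in the switch)
print(sw.forward("10.0.0.9"), sw.misses)  # None 2  (no policy, packet dropped)
```

Changing the forwarding policy here means editing only the controller's `routes`, which mirrors the point in the abstract that protocols can evolve without replacing data plane switches.
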
Programmable level-1 trigger with 3D-Flow processor array

International Nuclear Information System (INIS)

Crosetto, D.

1994-01-01

The 3D-Flow parallel processing system is a new concept in processor architecture, system architecture, and assembly architecture. Compared to the electronics used in present systems, this approach reduces the cost and complexity of the hardware and allows easy assembly, disassembly, incremental upgrading, and maintenance of different interconnection topologies. The 3D-Flow parallel-processing system benefits high energy physics (HEP) by allowing: (1) common, less costly hardware to be used in different experiments; (2) new uses of existing installations; (3) tuning of the trigger based on the first analyzed data; and (4) selection of desired events directly from raw data. The goal of this parallel-processing architecture is to acquire multiple data in parallel (up to 100 million frames per second) and to process them at high speed, accomplishing digital filtering on the input data, pattern recognition (particle identification), data moving, and data formatting. The main features of the system are its programmability, scalability, high-speed communication, and low cost. The compactness of the 3D-Flow parallel-processing system in concert with the processor architecture allows processor interconnections to be mapped into the geometry of sensors (detectors in HEP) without large interconnection signal delay, enabling real-time pattern recognition. The overall 3D-Flow project has passed a major design review at Fermilab (reviewers included experts in computers, triggering, system assembly, and electronics).
Real time processor for array speckle interferometry

Science.gov (United States)

Chin, Gordon; Florez, Jose; Borelli, Renan; Fong, Wai; Miko, Joseph; Trujillo, Carlos

1989-02-01

The authors are constructing a real-time processor to acquire image frames, perform array flat-fielding, execute a 64 x 64 element two-dimensional complex FFT (fast Fourier transform) and average the power spectrum, all within the 25 ms coherence time for speckles at near-IR (infrared) wavelength. The processor will be a compact unit controlled by a PC with real-time display and data storage capability. This will provide the ability to optimize observations and obtain results on the telescope rather than waiting several weeks before the data can be analyzed and viewed with offline methods. The image acquisition and processing, design criteria, and processor architecture are described.
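
The processing chain named above (flat-fielding, a two-dimensional complex FFT, and averaging of the power spectrum) can be sketched in software. This is a plain-Python illustration, not the authors' hardware design; it assumes that flat-fielding is a per-pixel division by a gain map, that frames are square with a power-of-two side, and the function names are hypothetical.

```python
import cmath
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 FFT; the length must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft2(frame: Sequence[Sequence[complex]]) -> List[List[complex]]:
    """2D FFT: transform every row, then every column."""
    rows = [fft(r) for r in frame]
    cols = [fft(c) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def mean_power_spectrum(frames, flat) -> List[List[float]]:
    """Flat-field each frame, take |FFT2|^2, and average over all frames."""
    n = len(flat)
    acc = [[0.0] * n for _ in range(n)]
    for frame in frames:
        corrected = [[p / g for p, g in zip(prow, grow)]
                     for prow, grow in zip(frame, flat)]
        spectrum = fft2(corrected)
        for i in range(n):
            for j in range(n):
                acc[i][j] += abs(spectrum[i][j]) ** 2
    return [[v / len(frames) for v in row] for row in acc]


# Two synthetic 64 x 64 frames with uniform illumination and a uniform gain map.
N = 64
flat = [[2.0] * N for _ in range(N)]
frames = [[[2.0] * N for _ in range(N)], [[4.0] * N for _ in range(N)]]
power = mean_power_spectrum(frames, flat)
print(power[0][0])  # all power sits in the zero-frequency bin for uniform frames
```
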
Emulation of Neural Networks on a Nanoscale Architecture

International Nuclear Information System (INIS)

Eshaghian-Wilner, Mary M; Friesz, Aaron; Khitun, Alex; Navab, Shiva; Parker, Alice C; Wang, Kang L; Zhou, Chongwu

2007-01-01

In this paper, we propose using a nanoscale spin-wave-based architecture for implementing neural networks. We show that this architecture can efficiently realize highly interconnected neural network models such as the Hopfield model. In our proposed architecture, no point-to-point interconnection is required, so unlike standard VLSI design, no fan-in/fan-out constraint limits the interconnectivity. Using spin-waves, each neuron could broadcast to all other neurons simultaneously and similarly a neuron could concurrently receive and process multiple data. Therefore in this architecture, the total weighted sum to each neuron can be computed by the sum of the values from all the incoming waves to that neuron. In addition, using the superposition property of waves, this computation can be done in O(1) time, and neurons can update their states quite rapidly
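
The weighted-sum update of the Hopfield model mentioned above can be written out in software. The sketch below is a conventional software version under assumptions: Hebbian weights, bipolar states of +1 and -1, and a synchronous update (all neurons at once, loosely mirroring the simultaneous broadcast in the abstract). It says nothing about the spin-wave hardware, where the sum is formed by wave superposition; here each sum costs O(n) per neuron.

```python
from typing import List, Sequence


def train_hebbian(patterns: Sequence[Sequence[int]]) -> List[List[float]]:
    """Hebbian weight matrix for bipolar (+1/-1) patterns, zero diagonal."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def update(weights: Sequence[Sequence[float]], state: Sequence[int]) -> List[int]:
    """Synchronous step: each neuron takes the sign of its weighted input sum."""
    new_state = []
    for i, row in enumerate(weights):
        total = sum(w_ij * s_j for w_ij, s_j in zip(row, state))
        new_state.append(1 if total > 0 else -1 if total < 0 else state[i])
    return new_state


def recall(weights, state: Sequence[int], max_steps: int = 10) -> List[int]:
    """Iterate updates until the state stops changing or max_steps is reached."""
    state = list(state)
    for _ in range(max_steps):
        nxt = update(weights, state)
        if nxt == state:
            break
        state = nxt
    return state


stored = [1, -1, 1, -1, 1, -1]
weights = train_hebbian([stored])
noisy = [1, -1, 1, -1, 1, 1]   # last element flipped
print(recall(weights, noisy))  # [1, -1, 1, -1, 1, -1]
```
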
Space Mobile Network: A Near Earth Communication and Navigation Architecture

Science.gov (United States)

Israel, Dave J.; Heckler, Greg; Menrad, Robert J.

2016-01-01

This paper describes a Space Mobile Network architecture, the result of a recently completed NASA study exploring architectural concepts to produce a vision for the future Near Earth communications and navigation systems. The Space Mobile Network (SMN) incorporates technologies, such as Disruption Tolerant Networking (DTN) and optical communications, and new operations concepts, such as User Initiated Services, to provide user services analogous to a terrestrial smartphone user. The paper will describe the SMN Architecture, envisioned future operations concepts, opportunities for industry and international collaboration and interoperability, and technology development areas and goals.
Architecture of high reliable control systems using complex software

International Nuclear Information System (INIS)

Tallec, M.

1990-01-01

The problems raised by the use of complex software in control systems that must ensure a very high level of safety are examined. The first part gives a brief description of the prototype of the PROSPER system. PROSPER stands for high-performance protection system for nuclear reactors. It was installed at a French nuclear power plant at the beginning of 1987 and has been working continuously since then. This prototype is built on a multiprocessor system. The processors communicate with each other using interrupts and protected shared memories. On each processor, one or more protection algorithms are implemented. These algorithms use data coming directly from the plant and, possibly, data computed by the other protection algorithms. Each processor makes its own acquisitions from the process and sends warning messages if an operating anomaly is detected. All algorithms are activated concurrently and asynchronously. The results are presented and the safety-related problems are detailed. - The second part is about measurement validation. First, we describe how the sensor measurements will be used in a protection system. Then, a proposal for a method based on artificial intelligence techniques (expert systems and neural networks) is presented. - The last part is about the problems of system architectures including hardware and software: it details the different types of redundancy used until now and a proposal for a multiprocessor architecture with an operating system that is able to manage several tasks running on different processors, that verifies the correct operation of each of those tasks and of the related processors, and that allows the system to carry on operating, even in a degraded manner, when a failure has been detected [fr
Architectural transformations in network services and distributed systems

CERN Document Server

Luntovskyy, Andriy

2017-01-01

With this work we set out to help not only readers but also ourselves, as professionals actively involved in networking, to understand the trends that have developed over the last two decades in distributed systems and networks. Important architectural transformations of distributed systems are examined, and examples of new architectural solutions are discussed. Content: Periodization of service development; Energy efficiency; Architectural transformations in Distributed Systems; Clustering and Parallel Computing, performance models; Cloud Computing, RAICs, Virtualization, SDN; Smart Grid, Internet of Things, Fog Computing; Mobile Communication from LTE to 5G, DIDO, SAT-based systems; Data Security Guaranteeing Distributed Systems. Target Groups: Students in EE and IT of universities and (dual) technical high schools; Graduated engineers as well as teaching staff. About the Authors: Andriy Luntovskyy provides classes on networks, mobile communication, software technology, distributed systems, ...
Accelerating molecular dynamic simulation on the cell processor and Playstation 3.

Science.gov (United States)

Luttmann, Edgar; Ensign, Daniel L; Vaidyanathan, Vishal; Houston, Mike; Rimon, Noam; Øland, Jeppe; Jayachandran, Guha; Friedrichs, Mark; Pande, Vijay S

2009-01-30

Implementation of molecular dynamics (MD) calculations on novel architectures will vastly increase its power to calculate the physical properties of complex systems. Herein, we detail algorithmic advances developed to accelerate MD simulations on the Cell processor, a commodity processor found in PlayStation 3 (PS3). In particular, we discuss issues regarding memory access versus computation and the types of calculations which are best suited for streaming processors such as the Cell, focusing on implicit solvation models. We conclude with a comparison of improved performance on the PS3's Cell processor over more traditional processors. (c) 2008 Wiley Periodicals, Inc.
Poster: A Software-Defined Multi-Camera Network

OpenAIRE

Chen, Po-Yen; Chen, Chien; Selvaraj, Parthiban; Claesen, Luc

2016-01-01

The widespread popularity of OpenFlow leads to a significant increase in the number of applications developed in Software-Defined Networking (SDN). In this work, we propose the architecture of a Software-Defined Multi-Camera Network consisting of small, flexible, economic, and programmable cameras which combine the functions of the processor, switch, and camera. A Software-Defined Multi-Camera Network can effectively reduce the overall network bandwidth and reduce a large amount of the Capex a...
Software-defined reconfigurable microwave photonics processor.

Science.gov (United States)

Pérez, Daniel; Gasulla, Ivana; Capmany, José

2015-06-01

We propose, for the first time to our knowledge, a software-defined reconfigurable microwave photonics signal processor architecture that can be integrated on a chip and is capable of performing all the main functionalities by suitable programming of its control signals. The basic configuration is presented and a thorough end-to-end design model derived that accounts for the performance of the overall processor taking into consideration the impact and interdependencies of both its photonic and RF parts. We demonstrate the model versatility by applying it to several relevant application examples.
Design and optimizing factors of PACS network architecture

International Nuclear Information System (INIS)

Tao Yonghao; Miao Jingtao

2001-01-01

Objective: To explore the design and optimizing factors of picture archiving and communication system (PACS) network architecture. Methods: Using the PACS of Shanghai First Hospital, the network bandwidth and transmission rate required by different PACS functions and procedures were measured and tested under static and dynamic network traffic conditions, using the network monitoring tools built into the workstations and provided by Windows NT. Results: Under static conditions there was no obvious difference between switch and hub equipment, except that routing slowed the rate markedly. Under dynamic conditions the switch provided higher bandwidth utilization than the hub, and communication within the local system scope achieved a faster transmission rate than communication across the global system. Conclusion: The primary optimizing factors in PACS network architecture design are a concise network topology and the division of heavy global traffic into multiple distributed local-scope communications, which reduces traffic on the network backbone. The most important issue is to guarantee the essential bandwidth for the diagnostic procedures of medical imaging.
Smart business networks: architectural aspects and risks

NARCIS (Netherlands)

L-F. Pau (Louis-François)

2004-01-01

This paper summarizes key attributes and the uniqueness of smart business networks [1], to propose thereafter an operational implementation architecture. It involves, amongst others, the embedding of business logic specific to a network of business partners, inside the communications
Fiber-wireless convergence in next-generation communication networks systems, architectures, and management

CERN Document Server

Chang, Gee-Kung; Ellinas, Georgios

2017-01-01

This book investigates new enabling technologies for Fi-Wi convergence. The editors discuss Fi-Wi technologies at the three major network levels involved in the path towards convergence: system level, network architecture level, and network management level. The main topics are: (a) at the system level: Radio over Fiber (digitized vs. analog, standardization, E-band and beyond) and 5G wireless technologies; (b) at the network architecture level: NGPON, WDM-PON, BBU Hotelling, Cloud Radio Access Networks (C-RANs), HetNets; (c) at the network management level: SDN for convergence, Next-generation Point-of-Presence, Wi-Fi LTE Handover, Cooperative MultiPoint. • Addresses the Fi-Wi convergence issues at three different levels, namely at the system level, network architecture level, and network management level • Provides approaches in communication systems, network architecture, and management that are expected to steer the evolution towards fiber-wireless convergence • Contributions from leading experts in the field of...

Monte Carlo simulations on SIMD computer architectures

International Nuclear Information System (INIS)

Burmester, C.P.; Gronsky, R.; Wille, L.T.

1992-01-01

In this paper algorithmic considerations regarding the implementation of various materials science applications of the Monte Carlo technique to single instruction multiple data (SIMD) computer architectures are presented. In particular, implementation of the Ising model with nearest, next nearest, and long range screened Coulomb interactions on the SIMD architecture MasPar MP-1 (DEC mpp-12000) series of massively parallel computers is demonstrated. Methods of code development which optimize processor array use and minimize inter-processor communication are presented including lattice partitioning and the use of processor array spanning tree structures for data reduction. Both geometric and algorithmic parallel approaches are utilized. Benchmarks in terms of Monte Carlo updates per second for the MasPar architecture are presented and compared to values reported in the literature from comparable studies on other architectures
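For nearest-neighbor interactions, lattice partitioning is commonly realized as a checkerboard (two-color) decomposition: sites of one color never neighbor each other, so they can all be updated in the same SIMD step. The following is a minimal pure-Python sketch of that idea, not the authors' MasPar code; the function names, lattice size, coupling J = 1, and temperature are illustrative assumptions.

```python
import math
import random


def checkerboard_sweep(spins, beta, rng):
    """One Metropolis sweep of a 2D nearest-neighbor Ising lattice (J = 1).

    Sites are visited one color at a time. Same-colored sites are never
    neighbors (for even lattice sizes), so every site of a color could be
    updated simultaneously on a SIMD machine; the inner loops stand in for
    that parallel step.
    """
    n = len(spins)
    for color in (0, 1):
        for i in range(n):
            for j in range(n):
                if (i + j) % 2 != color:
                    continue
                # Sum of the four nearest neighbors, periodic boundaries.
                nb = (spins[(i - 1) % n][j] + spins[(i + 1) % n][j]
                      + spins[i][(j - 1) % n] + spins[i][(j + 1) % n])
                delta_e = 2 * spins[i][j] * nb  # energy change if flipped
                if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
                    spins[i][j] = -spins[i][j]


def magnetization(spins):
    """Mean spin per site, in [-1, 1]."""
    n = len(spins)
    return sum(sum(row) for row in spins) / (n * n)


rng = random.Random(0)
n = 8  # even size keeps the two-coloring consistent across the periodic wrap
spins = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(n)]
for _ in range(50):
    checkerboard_sweep(spins, beta=0.6, rng=rng)
print(magnetization(spins))
```

Next-nearest and long-range interactions need more colors or a different partitioning, since two colors no longer guarantee independent sites.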
Microprocessor architectures RISC, CISC and DSP

CERN Document Server

Heath, Steve

1995-01-01

'Why are there all these different processor architectures and what do they all mean? Which processor will I use? How should I choose it?' Given the task of selecting an architecture or design approach, both engineers and managers require a knowledge of the whole system and an explanation of the design tradeoffs and their effects. This is information that rarely appears in data sheets or user manuals. This book fills that knowledge gap. Section 1 provides a primer and history of the three basic microprocessor architectures. Section 2 describes the ways in which the architectures react with the
Scalable Motion Estimation Processor Core for Multimedia System-on-Chip Applications

Science.gov (United States)

Lai, Yeong-Kang; Hsieh, Tian-En; Chen, Lien-Fei

2007-04-01

In this paper, we describe a high-throughput and scalable motion estimation processor architecture for multimedia system-on-chip applications. The number of processing elements (PEs) is scalable according to the variable algorithm parameters and the performance required for different applications. Using the PE rings efficiently and an intelligent memory-interleaving organization, the efficiency of the architecture can be increased. Moreover, using efficient on-chip memories and a data management technique can effectively decrease the power consumption and memory bandwidth. Techniques for reducing the number of interconnections and external memory accesses are also presented. Our results demonstrate that the proposed scalable PE-ringed architecture is a flexible and high-performance processor core in multimedia system-on-chip applications.
Network topology exploration of mesh-based coarse-grain reconfigurable architectures

NARCIS (Netherlands)

Bansal, N.; Gupta, S.; Dutt, N.D.; Nicolau, A.; Gupta, R.

2004-01-01

Several coarse-grain reconfigurable architectures proposed recently consist of a large number of processing elements (PEs) connected in a mesh-like network topology. We study the effects of three aspects of network topology exploration on the performance of applications on these architectures: (a)
Data Architecture for Sensor Network

Directory of Open Access Journals (Sweden)

Jan Ježek

2012-03-01

The rapid development of hardware in recent years has led to the high availability of simple sensing devices at minimal cost. As a consequence, there are many sensor networks nowadays. These networks can continuously produce a large amount of observed data, including the location of each measurement. Designing an optimal data architecture for this purpose is challenging because of the large scale and spatio-temporal nature of the data. The aim of this paper is to describe the data architecture used in a particular solution for the storage of sensor data. This solution is based on the relational data model, specifically PostgreSQL and PostGIS. We report our experience from real-world projects focused on car monitoring and on agricultural sensor networks. We also briefly demonstrate the possibilities of a client-side API and the potential of other open source libraries that can be used for cartographic visualization (e.g. GeoServer). The main objective is to describe the strengths and weaknesses of using a relational database system for this purpose and to introduce alternative approaches based on the NoSQL concept.
Microsoft Windows 2000 Network Architecture Guide

National Research Council Canada - National Science Library

Bartock, Paul

2000-01-01

The purpose of this guide is to inform the reader about the services that are available in the Microsoft Windows 2000 environment and how to integrate these services into their network architecture...
A Versatile Image Processor For Digital Diagnostic Imaging And Its Application In Computed Radiography

Science.gov (United States)

Blume, H.; Alexandru, R.; Applegate, R.; Giordano, T.; Kamiya, K.; Kresina, R.

1986-06-01

In a digital diagnostic imaging department, the majority of operations for handling and processing of images can be grouped into a small set of basic operations, such as image data buffering and storage, image processing and analysis, image display, image data transmission and image data compression. These operations occur in almost all nodes of the diagnostic imaging communications network of the department. An image processor architecture was developed in which each of these functions has been mapped into hardware and software modules. The modular approach has advantages in terms of economics, service, expandability and upgradeability. The architectural design is based on the principles of hierarchical functionality, distributed and parallel processing and aims at real time response. Parallel processing and real time response are facilitated in part by a dual bus system: a VME control bus and a high speed image data bus, consisting of 8 independent parallel 16-bit busses, capable of handling up to 144 MBytes/sec combined. The presented image processor is versatile enough to meet the video rate processing needs of digital subtraction angiography, the large pixel matrix processing requirements of static projection radiography, or the broad range of manipulation and display needs of a multi-modality diagnostic work station. Several hardware modules are described in detail. To illustrate the capabilities of the image processor, processed 2000 x 2000 pixel computed radiographs are shown and estimated computation times for executing the processing operations are presented.
XL-100S microprogrammable processor

International Nuclear Information System (INIS)

Gorbunov, N.V.; Guzik, Z.; Sutulin, V.A.; Forytski, A.

1983-01-01

The XL-100S microprogrammable processor, which provides the multiprocessor operation mode in the XL system crate, is described. The processor meets the EUR 6500 CAMAC standards, addresses up to 4 Mbyte of memory, and interacts with 7 CAMAC branches. Eight external requests initiate operations preset by a sequence of microcommands in a memory with a capacity of up to 64 kwords of 32 bits. The microprocessor architecture allows one to emulate the commands of the majority of mini- or micro-computers, including floating point operations. The XL-100S processor may be used in various branches of experimental physics: for control of physical experiment apparatus, fast selection of useful physical events, organization of input/output operations, including direct access to memory, etc. The Am2900 microprocessor set is used as the component base. The device is made in the form of a single width CAMAC module
Performance of Artificial Intelligence Workloads on the Intel Core 2 Duo Series Desktop Processors

OpenAIRE

Abdul Kareem PARCHUR; Kuppangari Krishna RAO; Fazal NOORBASHA; Ram Asaray SINGH

2010-01-01

As processor architectures became more advanced, Intel introduced its Intel Core 2 Duo series processors. The performance of Intel Core 2 Duo processors is analyzed using SPEC CPU INT 2006 performance numbers. This paper studies the behavior of Artificial Intelligence (AI) benchmarks on Intel Core 2 Duo series processors. Moreover, we estimate the task completion time (TCT) at processor frequencies of 1 GHz, 2 GHz and 3 GHz. Our results show the performance scalab...
Architecture in the network society

DEFF Research Database (Denmark)

2004-01-01

Under the theme Architecture in the Network Society, participants were invited to focus on the dialog and sharing of knowledge between architects and other disciplines and to reflect on, and propose, new methods in the design process, to enhance and improve the impact of information technology...
Photonics and Fiber Optics Processor Lab

Data.gov (United States)

Federal Laboratory Consortium — The Photonics and Fiber Optics Processor Lab develops, tests and evaluates high speed fiber optic network components as well as network protocols. In addition, this...
Architectures for single-chip image computing

Science.gov (United States)

Gove, Robert J.

1992-04-01

This paper will focus on the architectures of VLSI programmable processing components for image computing applications. TI, the maker of industry-leading RISC, DSP, and graphics components, has developed an architecture for a new-generation of image processors capable of implementing a plurality of image, graphics, video, and audio computing functions. We will show that the use of a single-chip heterogeneous MIMD parallel architecture best suits this class of processors--those which will dominate the desktop multimedia, document imaging, computer graphics, and visualization systems of this decade.
Coordinated Energy Management in Heterogeneous Processors

Directory of Open Access Journals (Sweden)

Indrani Paul

2014-01-01

This paper examines energy management in a heterogeneous processor consisting of an integrated CPU–GPU for high-performance computing (HPC) applications. Energy management for HPC applications is challenged by their uncompromising performance requirements and complicated by the need for coordinating energy management across distinct core types – a new and less understood problem. We examine the intra-node CPU–GPU frequency sensitivity of HPC applications on tightly coupled CPU–GPU architectures as the first step in understanding power and performance optimization for a heterogeneous multi-node HPC system. The insights from this analysis form the basis of a coordinated energy management scheme, called DynaCo, for integrated CPU–GPU architectures. We implement DynaCo on a modern heterogeneous processor and compare its performance to a state-of-the-art power- and performance-management algorithm. DynaCo improves measured average energy-delay squared (ED²) product by up to 30% with less than 2% average performance loss across several exascale and other HPC workloads.
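The energy-delay squared product weights run time more heavily than energy: it is the energy consumed multiplied by the square of the execution time, and lower is better. A minimal sketch of how such a comparison could be computed is shown here; the function names and the measurements are hypothetical and are not taken from the paper.

```python
def ed2(energy_joules, delay_seconds):
    """Energy-delay-squared product, E * D^2 (lower is better)."""
    return energy_joules * delay_seconds ** 2


def ed2_improvement(baseline, candidate):
    """Fractional ED^2 reduction of candidate relative to baseline.

    Each argument is an (energy_joules, delay_seconds) pair.
    """
    return 1.0 - ed2(*candidate) / ed2(*baseline)


# Hypothetical measurements for illustration only.
baseline = (1000.0, 10.0)  # 1000 J over 10 s
managed = (700.0, 10.2)    # 30% less energy, 2% longer run time
print(f"ED^2 improvement: {ed2_improvement(baseline, managed):.1%}")
```

Because delay enters squared, a scheme that saves energy by slowing the application down loses its ED² advantage quickly, which is why the metric suits performance-sensitive HPC workloads.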
Efficient network-matrix architecture for general flow transport inspired by natural pinnate leaves.

Science.gov (United States)

Hu, Liguo; Zhou, Han; Zhu, Hanxing; Fan, Tongxiang; Zhang, Di

2014-11-14

Networks embedded in three dimensional matrices are beneficial to deliver physical flows to the matrices. Leaf architectures, pervasive natural network-matrix architectures, endow leaves with high transpiration rates and low water pressure drops, providing inspiration for efficient network-matrix architectures. In this study, the network-matrix model for general flow transport inspired by natural pinnate leaves is investigated analytically. The results indicate that the optimal network structure inspired by natural pinnate leaves can greatly reduce the maximum potential drop and the total potential drop caused by the flow through the network while maximizing the total flow rate through the matrix. These results can be used to design efficient networks in network-matrix architectures for a variety of practical applications, such as tissue engineering, cell culture, photovoltaic devices and heat transfer.
A Processor-Sharing Scheduling Strategy for NFV Nodes

Directory of Open Access Journals (Sweden)

Giuseppe Faraci

2016-01-01

The introduction of the two paradigms SDN and NFV to “softwarize” the current Internet is making management and resource allocation two key challenges in the evolution towards the Future Internet. In this context, this paper proposes Network-Aware Round Robin (NARR), a processor-sharing strategy, to reduce delays in traversing SDN/NFV nodes. The application of NARR alleviates the job of the Orchestrator by automatically working at the intranode level, dynamically assigning the processor slices to the virtual network functions (VNFs) according to the state of the queues associated with the output links of the network interface cards (NICs). An extensive simulation set is presented to show the improvements achieved with respect to two more processor-sharing strategies chosen as reference.
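The abstract does not give the rule NARR uses to turn output-queue state into processor slices. The sketch below only illustrates the general idea of queue-aware processor sharing under a loudly stated assumption: a VNF that feeds a longer output queue receives a smaller slice, with weight 1 / (1 + backlog). The function name, the weighting rule, and the VNF names are all hypothetical.

```python
def queue_aware_shares(output_queue_lengths):
    """Split one processor among VNFs from their output-queue backlogs.

    ASSUMPTION (not from the paper): the weight of a VNF is
    1 / (1 + backlog of the output queue it feeds), and the weights are
    normalized so that the shares sum to 1.
    """
    weights = {vnf: 1.0 / (1 + backlog)
               for vnf, backlog in output_queue_lengths.items()}
    total = sum(weights.values())
    return {vnf: weight / total for vnf, weight in weights.items()}


# Hypothetical VNFs and queue lengths (packets waiting at the output NIC).
shares = queue_aware_shares({"firewall": 0, "nat": 3, "dpi": 1})
for vnf, share in shares.items():
    print(f"{vnf}: {share:.2f}")
```

With equal backlogs the rule degenerates to plain round robin with equal slices, which is one way to read NARR as a network-aware extension of round robin.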
Designing network on-chip architectures in the nanoscale era

CERN Document Server

Flich, Jose

2010-01-01

Going beyond isolated research ideas and design experiences, Designing Network On-Chip Architectures in the Nanoscale Era covers the foundations and design methods of network on-chip (NoC) technology. The contributors draw on their own lessons learned to provide strong practical guidance on various design issues. Exploring the design process of the network, the first part of the book focuses on basic aspects of switch architecture and design, topology selection, and routing implementation. In the second part, contributors discuss their experiences in the industry, offering a roadmap to recent p
Nonlinear Wave Simulation on the Xeon Phi Knights Landing Processor

Science.gov (United States)

Hristov, Ivan; Goranov, Goran; Hristova, Radoslava

2018-02-01

We consider a standing wave simulation that is interesting from a computational point of view, obtained by solving coupled 2D perturbed Sine-Gordon equations. We develop an OpenMP implementation that exploits both thread-level and SIMD-level parallelism. We test the OpenMP program on two different energy-equivalent Intel architectures: 2× Xeon E5-2695 v2 processors (code-named "Ivy Bridge-EP") in the Hybrilit cluster, and a Xeon Phi 7250 processor (code-named "Knights Landing", KNL). The results show 2 times better performance on the KNL processor.
Architecture-Aware Optimization of an HEVC decoder on Asymmetric Multicore Processors

OpenAIRE

Rodríguez-Sánchez, Rafael; Quintana-Ortí, Enrique S.

2016-01-01

Low-power asymmetric multicore processors (AMPs) attract considerable attention due to their appealing performance-power ratio for energy-constrained environments. However, these processors pose a significant programming challenge due to the integration of cores with different performance capabilities, asking for an asymmetry-aware scheduling solution that carefully distributes the workload. The recent HEVC standard, which offers several high-level parallelization strategies, is an important ...
Communication Network Architectures Based on Ethernet Passive Optical Network for Offshore Wind Power Farms

Directory of Open Access Journals (Sweden)

Mohamed A. Ahmed

2016-03-01

Nowadays, with large-scale offshore wind power farms (WPFs) becoming a reality, more efforts are needed to maintain a reliable communication network for WPF monitoring. Deployment topologies, redundancy, and network availability are the main items to enhance the communication reliability between wind turbines (WTs) and control centers. Traditional communication networks for monitoring and control (i.e., supervisory control and data acquisition (SCADA) systems) using switched gigabit Ethernet will not be sufficient for the huge amount of data passing through the network. In this paper, the optical power budget, optical path loss, reliability, and network cost of the proposed Ethernet Passive Optical Network (EPON)-based communication network for small-size offshore WPFs have been evaluated for five different network architectures. The proposed network model consists of an optical network unit (ONU) device deployed on the WT side for collecting data from different internal networks. All ONUs from different WTs are connected to a central optical line terminal (OLT), placed in the control center. There are no active electronic elements used between the ONUs and the OLT, which reduces the costs and complexity of maintenance and deployment. As fiber access networks without any protection are characterized by poor reliability, three different protection schemes have been configured, explained, and discussed. Considering the cost of network components, the total implementation expense of different architectures with, or without, protection has been calculated and compared. The proposed network model can significantly contribute to the communication network architecture for next generation WPFs.
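An optical power budget check of the kind mentioned above can be sketched with the standard link-budget arithmetic: the budget is the transmit power minus the receiver sensitivity, and the path loss sums fiber attenuation, splitter loss, and connector loss. The attenuation, connector loss, power levels, and topology values below are hypothetical placeholders, not figures from the paper, and the splitter is treated as ideal.

```python
import math


def splitter_loss_db(split_ratio):
    """Loss of an ideal 1:N passive splitter in dB (excess loss ignored)."""
    return 10 * math.log10(split_ratio)


def path_loss_db(fiber_km, split_ratio, connectors,
                 fiber_db_per_km=0.35, connector_db=0.5):
    """Total OLT-to-ONU optical path loss in dB.

    The default attenuation and connector loss are illustrative assumptions.
    """
    return (fiber_km * fiber_db_per_km
            + splitter_loss_db(split_ratio)
            + connectors * connector_db)


def power_margin_db(tx_dbm, rx_sensitivity_dbm, loss_db):
    """Remaining margin: power budget (Tx minus Rx sensitivity) minus loss."""
    return (tx_dbm - rx_sensitivity_dbm) - loss_db


# Hypothetical link: 10 km of fiber, a 1:16 splitter, four connectors.
loss = path_loss_db(fiber_km=10, split_ratio=16, connectors=4)
margin = power_margin_db(tx_dbm=2.0, rx_sensitivity_dbm=-24.0, loss_db=loss)
print(f"path loss {loss:.2f} dB, margin {margin:.2f} dB")
```

A positive margin means the ONU still receives enough power; protection schemes that add splitters or longer fiber routes reduce it.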
Security Aspects of an Enterprise-Wide Network Architecture.

Science.gov (United States)

Loew, Robert; Stengel, Ingo; Bleimann, Udo; McDonald, Aidan

1999-01-01

Presents an overview of two projects that concern local area networks and the common point between networks as they relate to network security. Discusses security architectures based on firewall components, packet filters, application gateways, security-management components, an intranet solution, user registration by Web form, and requests for…

System architecture for ubiquitous live video streaming in university network environment

CSIR Research Space (South Africa)

Dludla, AG

2013-09-01

...an architecture which supports ubiquitous live streaming for university or campus networks using a modified bluetooth inquiry mechanism with extended ID, integrated end-user device usage and adaptation to heterogeneous networks. Riding on that architecture...
Tree-based server-middleman-client architecture: improving scalability and reliability for voting-based network games in ad hoc wireless networks

Science.gov (United States)

Guo, Y.; Fujinoki, H.

2006-10-01

The concept of a new tree-based architecture for networked multi-player games was proposed by Matuszek to improve scalability in network traffic and, at the same time, to improve reliability. The architecture (we refer to it as the "Tree-Based Server-Middlemen-Client architecture", TB-SMC) addresses two major problems in ad hoc wireless networks, frequent link failures and significant battery power consumption at wireless transceivers, by using two new techniques: recursive aggregation of client messages and subscription-based propagation of game state. However, the performance of the TB-SMC architecture has never been quantitatively studied. In this paper, the TB-SMC architecture is compared with the client-server architecture using simulation experiments. We developed an event-driven simulator to evaluate the performance of the TB-SMC architecture. In the network traffic scalability experiments, the TB-SMC architecture resulted in less than 1/14 of the network traffic load for 200 end users. In the reliability experiments, the TB-SMC architecture improved the number of successfully delivered players' votes by 31.6, 19.0, and 12.4% over the client-server architecture at high (90%), moderate (50%) and low (10%) failure probability, respectively.
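Recursive aggregation of client messages can be pictured as each middleman merging the votes of its subtree into a single tally before forwarding it upward, so the server receives one message per child rather than one per client. The sketch below illustrates that idea only; the Node class, the vote labels, and the tree shape are hypothetical and do not reproduce the protocol evaluated in the paper.

```python
from collections import Counter


class Node:
    """A client (leaf holding one vote) or a middleman/server (has children)."""

    def __init__(self, vote=None, children=()):
        self.vote = vote
        self.children = list(children)


def aggregate(node):
    """Return the merged vote tally of the subtree rooted at node."""
    if not node.children:
        return Counter([node.vote])
    tally = Counter()
    for child in node.children:
        # Each child forwards a single, already merged tally to its parent.
        tally += aggregate(child)
    return tally


# Hypothetical game round: two middlemen, three clients each.
left = Node(children=[Node("left"), Node("left"), Node("right")])
right = Node(children=[Node("left"), Node("right"), Node("left")])
server = Node(children=[left, right])

result = aggregate(server)
print(result.most_common(1)[0][0])          # winning vote
print(len(server.children), "messages reach the server instead of 6")
```

The traffic saving grows with the fan-out of the tree, at the cost of extra hops between a client and the server.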
16-Bit RISC Processor Design for Convolution Application

OpenAIRE

Anand Nandakumar Shardul

2013-01-01

In this project, we propose a 16-bit non-pipelined RISC processor, which is used for signal processing applications. The processor consists of the following blocks: program counter, clock control unit, ALU, IDU and registers. Advantageous architectural modifications have been made in the incrementer circuit used in the program counter and in the carry select adder unit of the ALU in the RISC CPU core. Furthermore, a high speed and low power modified multiplier has been designed and introduced in ...
Space and frequency-multiplexed optical linear algebra processor - Fabrication and initial tests

Science.gov (United States)

Casasent, D.; Jackson, J.

1986-01-01

A new optical linear algebra processor architecture is described. Space and frequency-multiplexing are used to accommodate bipolar and complex-valued data. A fabricated laboratory version of this processor is described, the electronic support system used is discussed, and initial test data obtained on it are presented.
OpenCL code generation for low energy wide SIMD architectures with explicit datapath.

NARCIS (Netherlands)

She, D.; He, Y.; Waeijen, L.J.W.; Corporaal, H.; Jeschke, H.; Silvén, O.

2013-01-01

Energy efficiency is one of the most important aspects in designing embedded processors. The use of a wide SIMD processor architecture is a promising approach to build energy-efficient high performance embedded processors. In this paper, we propose a configurable wide SIMD architecture that utilizes
Robust Networking Architecture and Secure Communication Scheme for Heterogeneous Wireless Sensor Networks

Science.gov (United States)

McNeal, McKenzie, III.

2012-01-01

Current networking architectures and communication protocols used for Wireless Sensor Networks (WSNs) have been designed for energy efficiency, low latency, and long network lifetime. One major issue that must be addressed is the security in data communication. Due to the limited capabilities of low cost and small sized sensor nodes, designing…
Architecture for Cognitive Networking within NASA's Future Space Communications Infrastructure

Science.gov (United States)

Clark, Gilbert J., III; Eddy, Wesley M.; Johnson, Sandra K.; Barnes, James; Brooks, David

2016-01-01

Future space mission concepts and designs pose many networking challenges for command, telemetry, and science data applications with diverse end-to-end data delivery needs. For future end-to-end architecture designs, a key challenge is meeting expected application quality of service requirements for multiple simultaneous mission data flows with options to use diverse onboard local data buses, commercial ground networks, and multiple satellite relay constellations in LEO, MEO, GEO, or even deep space relay links. Effectively utilizing a complex network topology requires orchestration and direction that spans the many discrete, individually addressable computer systems, which cause them to act in concert to achieve the overall network goals. The system must be intelligent enough to not only function under nominal conditions, but also adapt to unexpected situations, and reorganize or adapt to perform roles not originally intended for the system or explicitly programmed. This paper describes architecture features of cognitive networking within the future NASA space communications infrastructure, and interacting with the legacy systems and infrastructure in the meantime. The paper begins by discussing the need for increased automation, including inter-system collaboration. This discussion motivates the features of an architecture including cognitive networking for future missions and relays, interoperating with both existing endpoint-based networking models and emerging information-centric models. From this basis, we discuss progress on a proof-of-concept implementation of this architecture as a cognitive networking on-orbit application on the SCaN Testbed attached to the International Space Station.
Rational calculation accuracy in acousto-optical matrix-vector processor

Science.gov (United States)

Oparin, V. V.; Tigin, Dmitry V.

1994-01-01

The high speed of parallel computations for a comparatively small-size processor and acceptable power consumption makes the usage of acousto-optic matrix-vector multiplier (AOMVM) attractive for processing of large amounts of information in real time. The limited accuracy of computations is an essential disadvantage of such a processor. The reduced accuracy requirements allow for considerable simplification of the AOMVM architecture and the reduction of the demands on its components.
Towards A New Opportunistic IoT Network Architecture for Wildlife Monitoring System

NARCIS (Netherlands)

Ayele, Eyuel Debebe; Meratnia, Nirvana; Havinga, Paul J.M.

In this paper we introduce an opportunistic dual radio IoT network architecture for wildlife monitoring systems (WMS). Since data processing consumes less energy than transmitting the raw data, the proposed architecture leverages opportunistic mobile networks in a fixed LPWAN IoT network
Convolutional neural network architectures for predicting DNA–protein binding

Science.gov (United States)

Zeng, Haoyang; Edwards, Matthew D.; Liu, Ge; Gifford, David K.

2016-01-01

Motivation: Convolutional neural networks (CNN) have outperformed conventional methods in modeling the sequence specificity of DNA–protein binding. Yet inappropriate CNN architectures can yield poorer performance than simpler models. Thus an in-depth understanding of how to match CNN architecture to a given task is needed to fully harness the power of CNNs for computational biology applications. Results: We present a systematic exploration of CNN architectures for predicting DNA sequence binding using a large compendium of transcription factor datasets. We identify the best-performing architectures by varying CNN width, depth and pooling designs. We find that adding convolutional kernels to a network is important for motif-based tasks. We show the benefits of CNNs in learning rich higher-order sequence features, such as secondary motifs and local sequence context, by comparing network performance on multiple modeling tasks ranging in difficulty. We also demonstrate how careful construction of sequence benchmark datasets, using approaches that control potentially confounding effects like positional or motif strength bias, is critical in making fair comparisons between competing methods. We explore how to establish the sufficiency of training data for these learning tasks, and we have created a flexible cloud-based framework that permits the rapid exploration of alternative neural network architectures for problems in computational biology. Availability and Implementation: All the models analyzed are available at http://cnn.csail.mit.edu. Contact: gifford@mit.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307608
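To make the link between convolutional kernels and motifs concrete, the sketch below slides one fixed kernel over a one-hot encoded DNA sequence, applies a ReLU, and takes the global maximum, which is the basic scan that a first convolutional layer with max pooling performs. It is a pure-Python illustration, not one of the models from the paper; the kernel is hand-set to the hypothetical motif "TATA" rather than learned.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


def conv_max_pool(seq, kernel):
    """Scan seq with one kernel, apply ReLU, and return the global maximum.

    kernel is a list of 4-weight rows, one row per motif position.
    Assumes len(seq) >= len(kernel).
    """
    x = one_hot(seq)
    k = len(kernel)
    best = 0.0
    for start in range(len(x) - k + 1):
        score = sum(x[start + p][c] * kernel[p][c]
                    for p in range(k) for c in range(4))
        best = max(best, max(0.0, score))  # ReLU, then running max pool
    return best


# Hand-set kernel that responds to the hypothetical motif "TATA".
tata_kernel = one_hot("TATA")
print(conv_max_pool("GGTATACC", tata_kernel))
print(conv_max_pool("GGGGCCCC", tata_kernel))
```

Adding kernels, as the abstract discusses, amounts to running several such scans in parallel, each free to learn a different motif.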
Performances of multiprocessor multidisk architectures for continuous media storage

Science.gov (United States)

Gennart, Benoit A.; Messerli, Vincent; Hersch, Roger D.

1996-03-01

Multimedia interfaces increase the need for large image databases, capable of storing and reading streams of data with strict synchronicity and isochronicity requirements. In order to fulfill these requirements, we consider a parallel image server architecture which relies on arrays of intelligent disk nodes, each disk node being composed of one processor and one or more disks. This contribution analyzes through bottleneck performance evaluation and simulation the behavior of two multi-processor multi-disk architectures: a point-to-point architecture and a shared-bus architecture similar to current multiprocessor workstation architectures. We compare the two architectures on the basis of two multimedia algorithms: the compute-bound frame resizing by resampling and the data-bound disk-to-client stream transfer. The results suggest that the shared bus is a potential bottleneck despite its very high hardware throughput (400 Mbytes/s) and that an architecture with addressable local memories located closely to their respective processors could partially remove this bottleneck. The point-to-point architecture is scalable and able to sustain high throughputs for simultaneous compute-bound and data-bound operations.
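The idea behind a bottleneck evaluation can be sketched in a few lines: the sustainable rate is set by whichever resource saturates first. Only the 400 Mbytes/s bus figure comes from the abstract; the per-disk rate, the disk counts, and the assumption that every byte crosses the shared bus twice are hypothetical values chosen for illustration.

```python
def bottleneck_throughput(capacity, demand_per_unit):
    """Return (max sustainable units of work per second, limiting resource).

    capacity        -- resource name -> capacity in Mbytes/s
    demand_per_unit -- resource name -> Mbytes the resource must move
                       to deliver one unit of work
    """
    limits = {
        name: capacity[name] / demand
        for name, demand in demand_per_unit.items()
        if demand > 0
    }
    limiting = min(limits, key=limits.get)
    return limits[limiting], limiting

# One unit of work = 1 Mbyte delivered to clients.
# Assumption: each byte crosses the shared bus twice (disk -> memory -> network).
demand = {"bus": 2.0, "disks": 1.0}

DISK_RATE = 5.0  # Mbytes/s per disk, hypothetical

few_disks = bottleneck_throughput({"bus": 400.0, "disks": 8 * DISK_RATE}, demand)
many_disks = bottleneck_throughput({"bus": 400.0, "disks": 64 * DISK_RATE}, demand)
```

With these made-up numbers the disks limit the small configuration, while the shared bus becomes the limit once enough disks are added, which is the qualitative behaviour the abstract reports.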
Scientific programming on massively parallel processor CP-PACS

International Nuclear Information System (INIS)

Boku, Taisuke

1998-01-01

The massively parallel processor CP-PACS targets a range of problems in computational physics, and its architecture has been designed to support various kinds of numerical processing. This report outlines the CP-PACS and gives a programming example using the Kernel CG benchmark from the NAS Parallel Benchmarks (NPB), version 1. It describes two major features of the CP-PACS architecture: the pseudo vector processing mechanism, and the parallel-processing tuning of scientific and technical computations using the three-dimensional hyper crossbar network. The processing units (PUs) of the CP-PACS are based on a RISC processor extended with a pseudo vector processor. Pseudo vector processing is realized as loop processing with scalar instructions. The features of the network connecting the PUs are explained. The algorithm of the NPB version 1 Kernel CG is shown. The most time-consuming part of the main loop is the matrix-vector product (matvec), and the parallel processing of the matvec is explained. The computation time on the CPU is determined. For the performance evaluation, the execution time, the short-vector processing of the slide-window-based pseudo vector processor, and a comparison with other parallel computers are reported. (K.I.)
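As a rough illustration of how a matrix-vector product can be split across processing units, the sketch below assigns contiguous row blocks to each PU. The abstract does not say how CP-PACS partitions the matvec, so the row-block scheme, the dense matrix (the CG kernel itself works on a sparse matrix), and all function names are assumptions.

```python
def matvec_rows(rows, x):
    """Multiply a block of matrix rows by the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]

def partition(n_rows, n_pus):
    """Split range(n_rows) into n_pus contiguous blocks of near-equal size."""
    base, extra = divmod(n_rows, n_pus)
    blocks, start = [], 0
    for p in range(n_pus):
        size = base + (1 if p < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks

def parallel_matvec(A, x, n_pus):
    """Compute A @ x block by block. The loop runs serially here; on a
    parallel machine each block would be handled by its own PU."""
    y = []
    for lo, hi in partition(len(A), n_pus):
        y.extend(matvec_rows(A[lo:hi], x))
    return y

A = [[1, 2], [3, 4], [5, 6]]
x = [1, 1]
y = parallel_matvec(A, x, n_pus=2)
```

Because every PU needs the full vector x but only its own rows of A, the communication cost of this scheme is dominated by distributing x, which is where the interconnection network matters.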
Stability of Ecological Communities and the Architecture of Mutualistic and Trophic Networks

NARCIS (Netherlands)

Thebault, E.M.C.; Fontaine, C.

2010-01-01

Research on the relationship between the architecture of ecological networks and community stability has mainly focused on one type of interaction at a time, making difficult any comparison between different network types. We used a theoretical approach to show that the network architecture favoring
Nonlinear Wave Simulation on the Xeon Phi Knights Landing Processor

Directory of Open Access Journals (Sweden)

Hristov Ivan

2018-01-01

We consider a standing wave simulation that is interesting from a computational point of view, obtained by solving coupled 2D perturbed Sine-Gordon equations. We make an OpenMP realization which exploits both thread and SIMD levels of parallelism. We test the OpenMP program on two different energy-equivalent Intel architectures: 2× Xeon E5-2695 v2 processors (code-named “Ivy Bridge-EP”) in the Hybrilit cluster, and a Xeon Phi 7250 processor (code-named “Knights Landing”, KNL). The results show 2 times better performance on the KNL processor.
Learning, memory, and the role of neural network architecture.

Directory of Open Access Journals (Sweden)

Ann M Hermundstad

2011-06-01

The performance of information processing systems, from artificial neural networks to natural neuronal ensembles, depends heavily on the underlying system architecture. In this study, we compare the performance of parallel and layered network architectures during sequential tasks that require both acquisition and retention of information, thereby identifying tradeoffs between learning and memory processes. During the task of supervised, sequential function approximation, networks produce and adapt representations of external information. Performance is evaluated by statistically analyzing the error in these representations while varying the initial network state, the structure of the external information, and the time given to learn the information. We link performance to complexity in network architecture by characterizing local error landscape curvature. We find that variations in error landscape structure give rise to tradeoffs in performance; these include the ability of the network to maximize accuracy versus minimize inaccuracy and produce specific versus generalizable representations of information. Parallel networks generate smooth error landscapes with deep, narrow minima, enabling them to find highly specific representations given sufficient time. While accurate, however, these representations are difficult to generalize. In contrast, layered networks generate rough error landscapes with a variety of local minima, allowing them to quickly find coarse representations. Although less accurate, these representations are easily adaptable. The presence of measurable performance tradeoffs in both layered and parallel networks has implications for understanding the behavior of a wide variety of natural and artificial learning systems.
Greening radio access networks using distributed base station architectures

DEFF Research Database (Denmark)

Kardaras, Georgios; Soler, José; Dittmann, Lars

2010-01-01

Several actions for developing environmentally friendly technologies have been taken in most industrial fields. Significant resources have also been devoted in the mobile communications industry. Moving towards eco-friendly alternatives is primarily a social responsibility for network operators....... However, besides this, increasing energy efficiency represents a key factor for reducing operating expenses and deploying cost-effective mobile networks. This paper presents how distributed base station architectures can contribute to greening radio access networks. More specifically, the advantages...... energy saving. Different subsystems have to be coordinated in real time, and intelligent network nodes supporting complicated functionalities are necessary. Distributed base station architectures are ideal for this purpose mainly because of their high degree of configurability and self...
Internet of Things Heterogeneous Interoperable Network Architecture Design

DEFF Research Database (Denmark)

Bhalerao, Dipashree M.

2014-01-01

The state of the art of the Internet of Things (IoT) indicates that no mature Internet of Things architecture is available. The thesis contributes the development of an abstract, generic IoT system reference architecture with specifications. The novelties of the thesis are the proposed solutions and implementations....... It is proved that reducing data at the source results in huge vertical scalability and, indirectly, horizontal scalability as well. The second non-functional feature contributes a heterogeneous interoperable network architecture for constrained Things. To eliminate the increasing number of gateways, Wi-Fi access point...... with Bluetooth, Zigbee (the new access point is called BZ-Fi) is proposed. Co-existence of Wi-Fi, Bluetooth, and Zigbee network technologies results in interference. To reduce the interference, orthogonal frequency division multiplexing (OFDM) is proposed to be implemented in Bluetooth and Zigbee. The proposed...
Supertracker: A Programmable Parallel Pipeline Arithmetic Processor For Auto-Cueing Target Processing

Science.gov (United States)

Mack, Harold; Reddi, S. S.

1980-04-01

Supertracker represents a programmable parallel pipeline computer architecture that has been designed to meet the real-time image processing requirements of auto-cueing target data processing. The prototype breadboard currently under development will be designed to perform input video preprocessing and processing for 525-line and 875-line TV-format FLIR video, automatic display gain and contrast control, and automatic target cueing, classification, and tracking. The video preprocessor is capable of performing operations on full frames of video data in real time, e.g., frame integration, storage, 3 x 3 convolution, and neighborhood processing. The processor architecture is being implemented using bit-slice microprogrammable arithmetic processors, operating in parallel. Each processor is capable of up to 20 million operations per second. Multiple frame memories are used for additional flexibility.
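Two of the preprocessing operations named above, frame integration and 3 x 3 convolution, can be sketched in software as follows. This only illustrates the arithmetic, not the bit-slice hardware; treating frame integration as a pixel-wise average and copying border pixels unchanged are assumptions.

```python
def integrate_frames(frames):
    """Average a list of equally sized frames pixel by pixel."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [sum(f[r][c] for f in frames) / n for c in range(cols)]
        for r in range(rows)
    ]

def convolve3x3(frame, kernel):
    """Apply a 3x3 kernel to every interior pixel.

    Border pixels are copied unchanged (an assumption made for brevity).
    The kernel is applied without flipping, which is identical to true
    convolution for symmetric kernels such as the box blur below.
    """
    rows, cols = len(frame), len(frame[0])
    out = [row[:] for row in frame]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = sum(
                frame[r + dr][c + dc] * kernel[dr + 1][dc + 1]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            )
    return out

box_blur = [[1 / 9] * 3 for _ in range(3)]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]

frame = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
blurred = convolve3x3(frame, box_blur)
averaged = integrate_frames([[[0, 2]], [[2, 4]]])
```

Each output pixel of the convolution costs nine multiply-accumulate operations, so the per-frame operation count grows with the frame size, which gives a sense of why the abstract quotes throughput in millions of operations per second.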
The network architecture and site test of DCIS in Lungmen nuclear power station

International Nuclear Information System (INIS)

Lee, C. K.

2006-01-01

The Lungmen Nuclear Power Station (LMNPS) is located on the north-eastern seashore of Taiwan. LMNPS has two units. Each unit generates 1350 Megawatts. It is the first ABWR plant in Taiwan and is now under construction. Due to contractual arrangements, there are seven large I&C suppliers/designers: GE NUMAC, DRS, Invensys, GEIS, Hitachi, MHI, and Stone and Webster. The Distributed Control and Information System (DCIS) in Lungmen is fully integrated with state-of-the-art computer and network technology. General Electric is the lead designer for the integration of the DCIS. This paper presents the network architecture and the site test of the DCIS. The network architectures are as follows. The GE NUMAC system adopts a point-to-point architecture; the DRS system adopts a ring-type architecture with the SCRAMNET protocol; the Invensys system adopts a 1 Giga Byte backbone mesh network with the Rapid Spanning Tree Protocol; GEIS adopts an Ethernet network with the EGD protocol; Hitachi adopts a ring-type network with a proprietary protocol; and MHI adopts an Ethernet network with UDP. Data links are used for connections between different suppliers. The DCIS architecture supports plant automation, alarm prioritization and alarm suppression, and a uniform MMI screen for the entire plant. The test program for the integration of the different network architectures and the initial DCIS architecture setup for 161 kV energization will be discussed. A test tool for improving the site test schedule and lessons learned from the FAT will also be discussed. Conclusions are given at the end of the paper. (authors)
The network architecture and site test of DCIS in Lungmen nuclear power station

Energy Technology Data Exchange (ETDEWEB)

Lee, C. K. [Instrument and Control Section, Lungmen Nuclear Power Station, Taiwan Power Company, Taipei County Taiwan (China)

2006-07-01

The Lungmen Nuclear Power Station (LMNPS) is located on the north-eastern seashore of Taiwan. LMNPS has two units. Each unit generates 1350 Megawatts. It is the first ABWR plant in Taiwan and is now under construction. Due to contractual arrangements, there are seven large I&C suppliers/designers: GE NUMAC, DRS, Invensys, GEIS, Hitachi, MHI, and Stone and Webster. The Distributed Control and Information System (DCIS) in Lungmen is fully integrated with state-of-the-art computer and network technology. General Electric is the lead designer for the integration of the DCIS. This paper presents the network architecture and the site test of the DCIS. The network architectures are as follows. The GE NUMAC system adopts a point-to-point architecture; the DRS system adopts a ring-type architecture with the SCRAMNET protocol; the Invensys system adopts a 1 Giga Byte backbone mesh network with the Rapid Spanning Tree Protocol; GEIS adopts an Ethernet network with the EGD protocol; Hitachi adopts a ring-type network with a proprietary protocol; and MHI adopts an Ethernet network with UDP. Data links are used for connections between different suppliers. The DCIS architecture supports plant automation, alarm prioritization and alarm suppression, and a uniform MMI screen for the entire plant. The test program for the integration of the different network architectures and the initial DCIS architecture setup for 161 kV energization will be discussed. A test tool for improving the site test schedule and lessons learned from the FAT will also be discussed. Conclusions are given at the end of the paper. (authors)

DAPNA: an architectural framework for data processing networks

NARCIS (Netherlands)

Sözer, Hasan; Nouta, Sander; Wombacher, Andreas; Perona, Paolo

2013-01-01

A data processing network is a set of (software) components connected through communication channels to apply a series of operations on data. Realization and maintenance of large-scale data processing networks necessitate an architectural approach that supports analysis, verification,
Architecture for Cognitive Networking within NASA's Future Space Communications Infrastructure

Science.gov (United States)

Clark, Gilbert; Eddy, Wesley M.; Johnson, Sandra K.; Barnes, James; Brooks, David

2016-01-01

Future space mission concepts and designs pose many networking challenges for command, telemetry, and science data applications with diverse end-to-end data delivery needs. For future end-to-end architecture designs, a key challenge is meeting expected application quality of service requirements for multiple simultaneous mission data flows with options to use diverse onboard local data buses, commercial ground networks, and multiple satellite relay constellations in LEO, GEO, MEO, or even deep space relay links. Effectively utilizing a complex network topology requires orchestration and direction that spans the many discrete, individually addressable computer systems, which cause them to act in concert to achieve the overall network goals. The system must be intelligent enough to not only function under nominal conditions, but also adapt to unexpected situations, and reorganize or adapt to perform roles not originally intended for the system or explicitly programmed. This paper describes an architecture enabling the development and deployment of cognitive networking capabilities into the envisioned future NASA space communications infrastructure. We begin by discussing the need for increased automation, including inter-system discovery and collaboration. This discussion frames the requirements for an architecture supporting cognitive networking for future missions and relays, including both existing endpoint-based networking models and emerging information-centric models. From this basis, we discuss progress on a proof-of-concept implementation of this architecture, and results of implementation and initial testing of a cognitive networking on-orbit application on the SCaN Testbed attached to the International Space Station.
Network architectures and protocols for the integration of ACTS and ISDN

Science.gov (United States)

Chitre, D. M.; Lowry, P. A.

1992-01-01

A close integration of satellite networks and the integrated services digital network (ISDN) is essential for satellite networks to carry ISDN traffic effectively. This paper shows how a given (pre-ISDN) satellite network architecture can be enhanced to handle ISDN signaling and provide ISDN services. It also describes the functional architecture and high-level protocols that could be implemented in the NASA Advanced Communications Technology Satellite (ACTS) low burst rate communications system to provide ISDN services.
A new architecture for Fermilab's cryogenic control system

International Nuclear Information System (INIS)

Smolucha, J.; Frank, A.; Seino, K.; Lackey, S.

1992-01-01

In order to achieve design energy in the Tevatron, the magnet system will be operated at lower temperatures. The increased requirements of operating the Tevatron at lower temperatures necessitated a major upgrade to both the hardware and software components of the cryogenic control system. The new architecture is based on a distributed topology which couples Fermilab-designed I/O subsystems to high-performance 80386 execution processors via a variety of networks including Arcnet, iPSB, and token ring. (author)
CPU architecture for a fast and energy-saving calculation of convolution neural networks

Science.gov (United States)

Knoll, Florian J.; Grelcke, Michael; Czymmek, Vitali; Holtorf, Tim; Hussmann, Stephan

2017-06-01

One of the most difficult problems in the use of artificial neural networks is the required computational capacity. Although large search engine companies own specially developed hardware to provide the necessary computing power, the conventional user is left with the state-of-the-art method, which is the use of a graphics processing unit (GPU) as the computational basis. Although these processors are well suited for large matrix computations, they consume a great deal of energy. Therefore, a new processor based on a field programmable gate array (FPGA) has been developed and optimized for deep learning applications. This processor is presented in this paper. The processor can be adapted for a particular application (in this paper, an organic farming application). Its power consumption is only a fraction of that of a GPU implementation, and it should therefore be well suited for energy-saving applications.
Huffman-based code compression techniques for embedded processors

KAUST Repository

Bonny, Mohamed Talal; Henkel, Jörg

2010-01-01

% for ARM and MIPS, respectively. In our compression technique, we have conducted evaluations using a representative set of applications and we have applied each technique to two major embedded processor architectures, namely ARM and MIPS. © 2010 ACM.
Brain inspired hardware architectures - Can they be used for particle physics ?

CERN Multimedia

CERN. Geneva

2016-01-01

After their inception in the 1940s and several decades of moderate success, artificial neural networks have recently demonstrated impressive achievements in analysing big data volumes. Wide and deep network architectures can now be trained using high performance computing systems, graphics card clusters in particular. Despite their successes these state-of-the-art approaches suffer from very long training times and huge energy consumption, in particular during the training phase. The biological brain can perform similar and superior classification tasks in the space and time domains, but at the same time exhibits very low power consumption, rapid unsupervised learning capabilities and fault tolerance. In the talk the differences between classical neural networks and neural circuits in the brain will be presented. Recent hardware implementations of neuromorphic computing systems and their applications will be shown. Finally, some initial ideas to use accelerated neural architectures as trigger processors i...
Very wide register : an asymmetric register file organization for low power embedded processors.

NARCIS (Netherlands)

Raghavan, P.; Lambrechts, A.; Jayapala, M.; Catthoor, F.; Verkest, D.T.M.L.; Corporaal, H.

2007-01-01

In current embedded systems processors, multi-ported register files are one of the most power hungry parts of the processor, even when they are clustered. This paper presents a novel register file architecture, which has single ported cells and asymmetric interfaces to the memory and to the
The performance of an LSI-11/23 with a SKYMNK-Q array processor as a high speed front end processor

International Nuclear Information System (INIS)

Clark, D.L.

1983-01-01

The NSRL has recently installed a VAX-11/750 based data acquisition system which is networked to two LSI-11/23 satellite processors. Each of the LSIs is connected to CAMAC branch drivers. The LSIs have small array processors installed for use in preprocessing data. The objective is to provide an easy-to-use high-speed processor that will relieve the VAX of some of the real-time data analysis tasks. The basic operation of the array processor and some of the results of performance tests are described.
Processor-in-memory-and-storage architecture

Science.gov (United States)

DeBenedictis, Erik

2018-01-02

A method and apparatus for performing reliable general-purpose computing. Each sub-core of a plurality of sub-cores of a processor core processes a same instruction at a same time. A code analyzer receives a plurality of residues that represents a code word corresponding to the same instruction and an indication of whether the code word is a memory address code or a data code from the plurality of sub-cores. The code analyzer determines whether the plurality of residues are consistent or inconsistent. The code analyzer and the plurality of sub-cores perform a set of operations based on whether the code word is a memory address code or a data code and a determination of whether the plurality of residues are consistent or inconsistent.
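One way to picture a consistency check over residues is a redundant residue number system, where a value is carried as remainders modulo several pairwise-coprime numbers and extra moduli act as checks. The abstract does not specify the code it uses, so the moduli, the split into information and check residues, and the function names below are all assumptions made for illustration.

```python
from math import prod

def crt(residues, moduli):
    """Reconstruct the value from residues modulo pairwise-coprime moduli
    using the Chinese remainder theorem."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # pow(..., -1, m) is the modular inverse
    return x % M

def encode(value, moduli):
    """Represent a value as one residue per modulus."""
    return [value % m for m in moduli]

def is_consistent(residues, moduli, n_info):
    """The first n_info residues carry the value; the remaining residues are
    redundant checks that must agree with the reconstructed value."""
    value = crt(residues[:n_info], moduli[:n_info])
    return all(
        value % m == r
        for r, m in zip(residues[n_info:], moduli[n_info:])
    )

MODULI = [7, 11, 13, 17]  # hypothetical: 7 and 11 carry data, 13 and 17 check
word = encode(42, MODULI)

corrupted = list(word)
corrupted[1] = (corrupted[1] + 1) % MODULI[1]  # simulate a fault in one sub-core
```

In this sketch `is_consistent(word, MODULI, 2)` holds for the clean word and fails for the corrupted one; what the processor does after detecting an inconsistency is described in the abstract only in general terms and is not modeled here.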
An architectural model for network interconnection

NARCIS (Netherlands)

van Sinderen, Marten J.; Vissers, C.A.; Kalin, T.

1983-01-01

This paper presents a technique of successive decomposition of a common users' activity to illustrate the problems of network interconnection. The criteria derived from this approach offer a structuring principle which is used to develop an architectural model that embeds heterogeneous subnetworks
A NEW OS ARCHITECTURE FOR IOT

Directory of Open Access Journals (Sweden)

Jean Y. Astier

2018-03-01

Current computer operating system architectures are not well suited for the coming world of connected objects, known as the Internet of Things (IoT), for multiple reasons: poor communication performance in both point-to-point and broadcast cases, poor operational reliability and network security, and excessive requirements in terms of both processor power and memory size, leading to excessive electrical power consumption. We introduce a new computer operating system architecture well adapted to IoT, from the most modest to the most complex, and more generally able to significantly raise the input/output capacities of any communicating computer. This architecture rests on the principles of the Von Neumann hardware model, and is composed of two types of asymmetric distributed containers, which communicate by message passing. We describe the sub-systems of both of these types of containers, where each sub-system has its own scheduler and a dedicated execution level.
The plasma automata network (PAN) architecture

International Nuclear Information System (INIS)

Cameron-Carey, C.M.

1991-01-01

Conventional neural networks consist of processing elements which are interconnected according to a specified topology. Typically, the number of processing elements and the interconnection topology are fixed. A neural network's information processing capability lies mainly in the variability of interconnection strengths, which directly influence activation patterns; these patterns represent entities and their interrelationships. Contrast this architecture, with its fixed topology and variable interconnection strengths, against one having dynamic topology and fixed connection strength. This paper reports on this proposed architecture in which there are no connections between processing elements. Instead, the processing elements form a plasma, exchanging information upon collision. A plasma can be populated with several different types of processing elements, each with their own activation function and self-modification mechanism. The activation patterns that are the plasma's response to stimulation drive natural selection among processing elements which evolve to optimize performance.
High-performance, scalable optical network-on-chip architectures

Science.gov (United States)

Tan, Xianfang

The rapid advance of technology enables a large number of processing cores to be integrated into a single chip, which is called a Chip Multiprocessor (CMP) or a Multiprocessor System-on-Chip (MPSoC) design. The on-chip interconnection network, which is the communication infrastructure for these processing cores, plays a central role in a many-core system. With the continuously increasing complexity of many-core systems, traditional metallic wired electronic networks-on-chip (NoC) have become a bottleneck because of the unbearable latency in data transmission and extremely high on-chip energy consumption. Optical networks-on-chip (ONoC) have been proposed as a promising alternative paradigm to electronic NoC, with the benefits of optical signaling communication such as extremely high bandwidth, negligible latency, and low power consumption. This dissertation focuses on the design of high-performance and scalable ONoC architectures, and the contributions are highlighted as follows: 1. A micro-ring resonator (MRR)-based Generic Wavelength-routed Optical Router (GWOR) is proposed. A method for developing a GWOR of any size is introduced. GWOR is a scalable non-blocking ONoC architecture with simple structure, low cost and high power efficiency compared to existing ONoC designs. 2. To expand the bandwidth and improve the fault tolerance of the GWOR, a redundant GWOR architecture is designed by cascading different types of GWORs into one network. 3. A redundant GWOR built with MRR-based comb switches is proposed. Comb switches can expand the bandwidth while keeping the topology of the GWOR unchanged by replacing the general MRRs with comb switches. 4. A butterfly fat tree (BFT)-based hybrid optoelectronic NoC (HONoC) architecture is developed in which GWORs are used for global communication and electronic routers are used for local communication. The proposed HONoC uses fewer electronic routers and links than its electronic BFT-based NoC counterpart. It takes the advantages of
Mobile opportunistic networks architectures, protocols and applications

CERN Document Server

Denko, Mieso K

2011-01-01

Widespread availability of pervasive and mobile devices coupled with recent advances in networking technologies make opportunistic networks one of the most promising communication technologies for a growing number of future mobile applications. Covering the basics as well as advanced concepts, this book introduces state-of-the-art research findings, technologies, tools, and innovations. Prominent researchers from academia and industry report on communication architectures, network algorithms and protocols, emerging applications, experimental studies, simulation tools, implementation test beds,
Enabling Tussle-Agile Inter-networking Architectures by Underlay Virtualisation

Science.gov (United States)

Dianati, Mehrdad; Tafazolli, Rahim; Moessner, Klaus

In this paper, we propose an underlay inter-network virtualisation framework in order to enable tussle-agile flexible networking over the existing inter-network infrastructures. The functionalities that inter-networking elements (transit nodes, access networks, etc.) need to support in order to enable virtualisation are discussed. We propose the base architectures of each of the abstract elements to support the required inter-network virtualisation functionalities.
Comparison of different artificial neural network architectures in modeling of Chlorella sp. flocculation.

Science.gov (United States)

Zenooz, Alireza Moosavi; Ashtiani, Farzin Zokaee; Ranjbar, Reza; Nikbakht, Fatemeh; Bolouri, Oberon

2017-07-03

Biodiesel production from microalgae feedstock should be performed after growth and harvesting of the cells, and the most feasible method for harvesting and dewatering of microalgae is flocculation. Flocculation modeling can be used for evaluation and prediction of its performance under different influencing parameters. However, the modeling of flocculation in microalgae is not simple and has not yet been performed under all experimental conditions, mostly due to the different behaviors of microalgae cells during the process under different flocculation conditions. In the current study, the modeling of microalgae flocculation is studied with different neural network architectures. The microalgae species Chlorella sp. was flocculated with ferric chloride under different conditions, and the experimental data were then modeled using artificial neural networks. The multilayer perceptron (MLP) and radial basis function architectures failed to predict the targets successfully, whereas modeling was effective with an ensemble architecture of MLP networks. Comparison between the performances of the ensemble and each individual network explains the ability of the ensemble architecture in microalgae flocculation modeling.
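The benefit of an ensemble over its individual members can be seen in a small sketch. The abstract does not say how the MLP outputs are combined, so simple averaging is an assumption here, and the "members" are stand-in functions with fixed biases rather than trained networks; the response curve and all numbers are hypothetical.

```python
def true_response(x):
    """Hypothetical target curve standing in for the measured data."""
    return 2.0 * x

def make_member(bias):
    """Stand-in for one trained network: the true curve plus a fixed bias."""
    return lambda x: true_response(x) + bias

def ensemble_predict(members, x):
    """Combine member predictions by simple averaging (an assumption)."""
    return sum(m(x) for m in members) / len(members)

def mse(model, xs, targets):
    """Mean squared error of a model over the sample points."""
    return sum((model(x) - t) ** 2 for x, t in zip(xs, targets)) / len(xs)

xs = [0.0, 0.5, 1.0, 1.5]
targets = [true_response(x) for x in xs]

members = [make_member(b) for b in (0.5, -0.4, 0.2)]
member_errors = [mse(m, xs, targets) for m in members]
ensemble_error = mse(lambda x: ensemble_predict(members, x), xs, targets)
```

Because the members err in different directions, their biases partly cancel in the average, so the ensemble error is lower than the mean error of the individual members. With squared error this never gets worse than the members' average, though a single member can still beat the ensemble.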
Design of RISC Processor Using VHDL and Cadence

Science.gov (United States)

Moslehpour, Saeid; Puliroju, Chandrasekhar; Abu-Aisheh, Akram

The project deals with the development of a basic RISC processor. The processor is designed with a basic architecture consisting of internal modules such as a clock generator, memory, program counter, instruction register, accumulator, arithmetic and logic unit, and decoder. This processor is mainly intended for simple general-purpose tasks such as arithmetic operations, and it can be further developed into a general-purpose processor by increasing the size of the instruction register. The processor is designed in VHDL using Xilinx version 8.1i. The present project also serves as an application of the knowledge gained from past studies of the PSPICE program. The study will show how PSPICE can be used to simplify massive complex circuits designed in VHDL synthesis. The purpose of the project is to explore the designed RISC model piece by piece, examine and understand the input/output pins, and show how the VHDL synthesis code can be converted to a simplified PSPICE model. The project will also serve as a collection of various research materials about the pieces of the circuit.
Noise limitations in optical linear algebra processors.

Science.gov (United States)

Batsell, S G; Jong, T L; Walkup, J F; Krile, T F

1990-05-10

A general statistical noise model is presented for optical linear algebra processors. A statistical analysis which includes device noise, the multiplication process, and the addition operation is undertaken. We focus on those processes which are architecturally independent. Finally, experimental results which verify the analytical predictions are also presented.
Software/hardware distributed processing network supporting the Ada environment

Science.gov (United States)

Wood, Richard J.; Pryk, Zen

1993-09-01

A high-performance, fault-tolerant, distributed network has been developed, tested, and demonstrated. The network is based on the MIPS Computer Systems, Inc. R3000 RISC for processing, VHSIC ASICs for high-speed, reliable inter-node communications, and compatible commercial memory and I/O boards. The network is an evolution of the Advanced Onboard Signal Processor (AOSP) architecture. It supports Ada application software with an Ada-implemented operating system. A six-node implementation (capable of expansion up to 256 nodes) of the RISC multiprocessor architecture provides 120 MIPS of scalar throughput, 96 Mbytes of RAM and 24 Mbytes of non-volatile memory. The network provides for all ground processing applications, has merit for space-qualified RISC-based networks, and interfaces to advanced Computer Aided Software Engineering (CASE) tools for application software development.

Architecture and dynamics of overlapped RNA regulatory networks.

Science.gov (United States)

Lapointe, Christopher P; Preston, Melanie A; Wilinski, Daniel; Saunders, Harriet A J; Campbell, Zachary T; Wickens, Marvin

2017-11-01

A single protein can bind and regulate many mRNAs. Multiple proteins with similar specificities often bind and control overlapping sets of mRNAs. Yet little is known about the architecture or dynamics of overlapped networks. We focused on three proteins with similar structures and related RNA-binding specificities: Puf3p, Puf4p, and Puf5p of S. cerevisiae. Using RNA Tagging, we identified a "super-network" comprised of four subnetworks: Puf3p, Puf4p, and Puf5p subnetworks, and one controlled by both Puf4p and Puf5p. The architecture of individual subnetworks, and thus the super-network, is determined by competition among particular PUF proteins to bind mRNAs, their affinities for binding elements, and the abundances of the proteins. The super-network responds dramatically: The remaining network can either expand or contract. These strikingly opposite outcomes are determined by an interplay between the relative abundance of the RNAs and proteins, and their affinities for one another. The diverse interplay between overlapping RNA-protein networks provides versatile opportunities for regulation and evolution. © 2017 Lapointe et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Resting state networks' corticotopy: the dual intertwined rings architecture.

Directory of Open Access Journals (Sweden)

Salma Mesmoudi

How does the brain integrate multiple sources of information to support normal sensorimotor and cognitive functions? To investigate this question we present an overall brain architecture (called "the dual intertwined rings architecture") that relates the functional specialization of cortical networks to their spatial distribution over the cerebral cortex (or "corticotopy"). Recent results suggest that the resting state networks (RSNs) are organized into two large families: (1) a sensorimotor family that includes visual, somatic, and auditory areas and (2) a large association family that comprises parietal, temporal, and frontal regions and also includes the default mode network. We used two large databases of resting state fMRI data, from which we extracted 32 robust RSNs. We estimated: (1) the RSN functional roles by using a projection of the results on task based networks (TBNs) as referenced in large databases of fMRI activation studies; and (2) the relationship of the RSNs with the Brodmann Areas. In both classifications, the 32 RSNs are organized into a remarkable architecture of two intertwined rings per hemisphere and so four rings linked by homotopic connections. The first ring forms a continuous ensemble and includes visual, somatic, and auditory cortices, with interspersed bimodal cortices (auditory-visual, visual-somatic and auditory-somatic, abbreviated as VSA ring). The second ring integrates distant parietal, temporal and frontal regions (PTF ring) through a network of association fiber tracts which closes the ring anatomically and ensures a functional continuity within the ring. The PTF ring relates association cortices specialized in attention, language and working memory, to the networks involved in motivation and biological regulation and rhythms. This "dual intertwined architecture" suggests a dual integrative process: the VSA ring performs fast real-time multimodal integration of sensorimotor information whereas the PTF ring performs multi
On Event-Triggered Adaptive Architectures for Decentralized and Distributed Control of Large-Scale Modular Systems.

Science.gov (United States)

Albattat, Ali; Gruenwald, Benjamin C; Yucelen, Tansel

2016-08-16

The last decade has witnessed an increased interest in physical systems controlled over wireless networks (networked control systems). These systems allow the computation of control signals via processors that are not attached to the physical systems, and the feedback loops are closed over wireless networks. The contribution of this paper is to design and analyze event-triggered decentralized and distributed adaptive control architectures for uncertain networked large-scale modular systems; that is, systems consisting of physically interconnected modules controlled over wireless networks. Specifically, the proposed adaptive architectures guarantee overall system stability while reducing wireless network utilization and achieving a given system performance in the presence of system uncertainties that can result from modeling and degraded modes of operation of the modules and their interconnections. In addition to the theoretical findings, including rigorous system stability and boundedness analysis of the closed-loop dynamical system, as well as the characterization of the effect of user-defined event-triggering thresholds and the design parameters of the proposed adaptive architectures on the overall system performance, an illustrative numerical example is provided to demonstrate the efficacy of the proposed decentralized and distributed control approaches.
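The event-triggering idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' adaptive architecture: it assumes a hypothetical scalar plant x' = a*x + u, a fixed (non-adaptive) feedback gain k, forward-Euler integration, and a constant user-defined threshold. All names and numerical values are illustrative.

```python
def simulate(threshold, a=1.0, k=3.0, x0=1.0, dt=0.001, steps=5000):
    """Simulate x' = a*x + u with u = -k * x_sent, where x_sent is the
    last state value transmitted over the (hypothetical) network.

    A transmission happens only when |x - x_sent| exceeds `threshold`.
    Returns (final_state, number_of_transmissions).
    """
    x = x0
    x_sent = x0          # the initial state is transmitted once
    transmissions = 1
    for _ in range(steps):
        if abs(x - x_sent) > threshold:
            x_sent = x   # event: send a fresh sample to the controller
            transmissions += 1
        u = -k * x_sent  # controller only sees the last transmitted state
        x += dt * (a * x + u)   # forward-Euler step of the plant
    return x, transmissions


if __name__ == "__main__":
    for thr in (0.0, 0.01, 0.05):
        x_final, n = simulate(thr)
        print(f"threshold={thr:<5} transmissions={n:<5} |x_final|={abs(x_final):.5f}")
```

In this toy setting, a larger threshold reduces the number of transmissions at the cost of a wider residual band around the origin, which is the kind of trade-off between network utilization and performance that the abstract refers to.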
On Event-Triggered Adaptive Architectures for Decentralized and Distributed Control of Large-Scale Modular Systems

Directory of Open Access Journals (Sweden)

Ali Albattat

2016-08-01

The last decade has witnessed an increased interest in physical systems controlled over wireless networks (networked control systems). These systems allow the computation of control signals via processors that are not attached to the physical systems, and the feedback loops are closed over wireless networks. The contribution of this paper is to design and analyze event-triggered decentralized and distributed adaptive control architectures for uncertain networked large-scale modular systems; that is, systems consisting of physically interconnected modules controlled over wireless networks. Specifically, the proposed adaptive architectures guarantee overall system stability while reducing wireless network utilization and achieving a given system performance in the presence of system uncertainties that can result from modeling and degraded modes of operation of the modules and their interconnections. In addition to the theoretical findings, including rigorous system stability and boundedness analysis of the closed-loop dynamical system, as well as the characterization of the effect of user-defined event-triggering thresholds and the design parameters of the proposed adaptive architectures on the overall system performance, an illustrative numerical example is provided to demonstrate the efficacy of the proposed decentralized and distributed control approaches.
Control structures for high speed processors

Science.gov (United States)

Maki, G. K.; Mankin, R.; Owsley, P. A.; Kim, G. M.

1982-01-01

A special processor was designed to function as a Reed Solomon decoder with throughput data rate in the MHz range. This data rate is significantly greater than is possible with conventional digital architectures. To achieve this rate, the processor design includes sequential, pipelined, distributed, and parallel processing. The processor was designed using a high-level register transfer language (RTL). The RTL can be used to describe how the different processes are implemented by the hardware. One problem of special interest was the development of dependent processes which are analogous to software subroutines. For greater flexibility, the RTL control structure was implemented in ROM. The special purpose hardware required approximately 1000 SSI and MSI components. The data rate throughput is 2.5 megabits/second. This data rate is achieved through the use of pipelined and distributed processing. This data rate can be compared with 800 kilobits/second in a recently proposed very large scale integration design of a Reed Solomon encoder.
Advances in network systems architectures, security, and applications

CERN Document Server

Awad, Ali; Furtak, Janusz; Legierski, Jarosław

2017-01-01

This book provides the reader with a comprehensive selection of cutting-edge algorithms, technologies, and applications. The volume offers new insights into a range of fundamentally important topics in network architectures, network security, and network applications. It serves as a reference for researchers and practitioners by featuring research contributions exemplifying research done in the field of network systems. In addition, the book highlights several key topics in both theoretical and practical aspects of networking. These include wireless sensor networks, performance of TCP connections in mobile networks, photonic data transport networks, security policies, credentials management, data encryption for network transmission, risk management, live TV services, and multicore energy harvesting in distributed systems.
MAP3D: a media processor approach for high-end 3D graphics

Science.gov (United States)

Darsa, Lucia; Stadnicki, Steven; Basoglu, Chris

1999-12-01

Equator Technologies, Inc. has used a software-first approach to produce several programmable and advanced VLIW processor architectures that have the flexibility to run both traditional systems tasks and an array of media-rich applications. For example, Equator's MAP1000A is the world's fastest single-chip programmable signal and image processor targeted for digital consumer and office automation markets. The Equator MAP3D is a proposal for the architecture of the next generation of the Equator MAP family. The MAP3D is designed to achieve high-end 3D performance and a variety of customizable special effects by combining special graphics features with high performance floating-point and media processor architecture. As a programmable media processor, it offers the advantages of a completely configurable 3D pipeline--allowing developers to experiment with different algorithms and to tailor their pipeline to achieve the highest performance for a particular application. With the support of Equator's advanced C compiler and toolkit, MAP3D programs can be written in a high-level language. This allows the compiler to successfully find and exploit any parallelism in a programmer's code, thus decreasing the time to market of a given application. The ability to run an operating system makes it possible to run concurrent applications in the MAP3D chip, such as video decoding while executing the 3D pipelines, so that integration of applications is easily achieved--using real-time decoded imagery for texturing 3D objects, for instance. This novel architecture enables an affordable, integrated solution for high performance 3D graphics.
High performance graphics processors for medical imaging applications

International Nuclear Information System (INIS)

Goldwasser, S.M.; Reynolds, R.A.; Talton, D.A.; Walsh, E.S.

1989-01-01

This paper describes a family of high-performance graphics processors with special hardware for interactive visualization of 3D human anatomy. The basic architecture expands to multiple parallel processors, each processor using pipelined arithmetic and logical units for high-speed rendering of Computed Tomography (CT), Magnetic Resonance (MR) and Positron Emission Tomography (PET) data. User-selectable display alternatives include multiple 2D axial slices, reformatted images in sagittal or coronal planes and shaded 3D views. Special facilities support applications requiring color-coded display of multiple datasets (such as radiation therapy planning), or dynamic replay of time-varying volumetric data (such as cine-CT or gated MR studies of the beating heart). The current implementation is a single processor system which generates reformatted images in true real time (30 frames per second), and shaded 3D views in a few seconds per frame. It accepts full scale medical datasets in their native formats, so that minimal preprocessing delay exists between data acquisition and display.
High-performance reconfigurable hardware architecture for restricted Boltzmann machines.

Science.gov (United States)

Ly, Daniel Le; Chow, Paul

2010-11-01

Despite the popularity and success of neural networks in research, the number of resulting commercial or industrial applications has been limited. A primary cause for this lack of adoption is that neural networks are usually implemented as software running on general-purpose processors. Hence, a hardware implementation that can exploit the inherent parallelism in neural networks is desired. This paper investigates how the restricted Boltzmann machine (RBM), which is a popular type of neural network, can be mapped to a high-performance hardware architecture on field-programmable gate array (FPGA) platforms. The proposed modular framework is designed to reduce the time complexity of the computations through heavily customized hardware engines. A method to partition large RBMs into smaller congruent components is also presented, allowing the distribution of one RBM across multiple FPGA resources. The framework is tested on a platform of four Xilinx Virtex II-Pro XC2VP70 FPGAs running at 100 MHz through a variety of different configurations. The maximum performance was obtained by instantiating an RBM of 256 × 256 nodes distributed across four FPGAs, which resulted in a computational speed of 3.13 billion connection-updates-per-second and a speedup of 145-fold over an optimized C program running on a 2.8-GHz Intel processor.
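The partitioning idea mentioned in the abstract above (splitting one large RBM into smaller congruent pieces) can be sketched in software. The following is a hedged illustration only, not the paper's hardware design: it assumes a plain weight matrix cut into equal square tiles, with each tile contributing a partial sum to the hidden-unit pre-activations. Function names and the tile size are hypothetical.

```python
import random


def hidden_preactivations(weights, visible):
    """Unpartitioned reference: h_j = sum_i v_i * W[i][j]."""
    n_hidden = len(weights[0])
    return [sum(v * row[j] for v, row in zip(visible, weights))
            for j in range(n_hidden)]


def partitioned_preactivations(weights, visible, block):
    """Same result computed tile by tile.

    The weight matrix is cut into block x block tiles (sizes must divide
    evenly in this sketch). Each tile could be handled by a separate
    compute resource; tiles in the same column produce partial sums that
    are accumulated into the same hidden units.
    """
    n_visible, n_hidden = len(weights), len(weights[0])
    assert n_visible % block == 0 and n_hidden % block == 0
    result = [0.0] * n_hidden
    for i0 in range(0, n_visible, block):        # tile row
        for j0 in range(0, n_hidden, block):     # tile column
            for j in range(j0, j0 + block):
                partial = sum(visible[i] * weights[i][j]
                              for i in range(i0, i0 + block))
                result[j] += partial
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    n = 8
    W = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    v = [rng.choice([0.0, 1.0]) for _ in range(n)]
    full = hidden_preactivations(W, v)
    tiled = partitioned_preactivations(W, v, block=4)
    # The two results agree up to floating-point rounding.
    print(max(abs(a - b) for a, b in zip(full, tiled)))
```

The point of the sketch is only that the tiled computation reproduces the unpartitioned one, so the tiles can in principle be distributed across separate devices.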
A fast band–Krylov eigensolver for macromolecular functional motion simulation on multicore architectures and graphics processors

Energy Technology Data Exchange (ETDEWEB)

Aliaga, José I., E-mail: aliaga@uji.es [Depto. Ingeniería y Ciencia de Computadores, Universitat Jaume I, Castellón (Spain); Alonso, Pedro [Departamento de Sistemas Informáticos y Computación, Universitat Politècnica de València (Spain); Badía, José M. [Depto. Ingeniería y Ciencia de Computadores, Universitat Jaume I, Castellón (Spain); Chacón, Pablo [Dept. Biological Chemical Physics, Rocasolano Physics and Chemistry Institute, CSIC, Madrid (Spain); Davidović, Davor [Rudjer Bošković Institute, Centar za Informatiku i Računarstvo – CIR, Zagreb (Croatia); López-Blanco, José R. [Dept. Biological Chemical Physics, Rocasolano Physics and Chemistry Institute, CSIC, Madrid (Spain); Quintana-Ortí, Enrique S. [Depto. Ingeniería y Ciencia de Computadores, Universitat Jaume I, Castellón (Spain)

2016-03-15

We introduce a new iterative Krylov subspace-based eigensolver for the simulation of macromolecular motions on desktop multithreaded platforms equipped with multicore processors and, possibly, a graphics accelerator (GPU). The method consists of two stages, with the original problem first reduced into a simpler band-structured form by means of a high-performance compute-intensive procedure. This is followed by a memory-intensive but low-cost Krylov iteration, which is off-loaded to be computed on the GPU by means of an efficient data-parallel kernel. The experimental results reveal the performance of the new eigensolver. Concretely, when applied to the simulation of macromolecules with a few thousand degrees of freedom, and when the number of eigenpairs to be computed is small to moderate, the new solver outperforms other methods implemented as part of high-performance numerical linear algebra packages for multithreaded architectures.
A fast band–Krylov eigensolver for macromolecular functional motion simulation on multicore architectures and graphics processors

International Nuclear Information System (INIS)

Aliaga, José I.; Alonso, Pedro; Badía, José M.; Chacón, Pablo; Davidović, Davor; López-Blanco, José R.; Quintana-Ortí, Enrique S.

2016-01-01

We introduce a new iterative Krylov subspace-based eigensolver for the simulation of macromolecular motions on desktop multithreaded platforms equipped with multicore processors and, possibly, a graphics accelerator (GPU). The method consists of two stages, with the original problem first reduced into a simpler band-structured form by means of a high-performance compute-intensive procedure. This is followed by a memory-intensive but low-cost Krylov iteration, which is off-loaded to be computed on the GPU by means of an efficient data-parallel kernel. The experimental results reveal the performance of the new eigensolver. Concretely, when applied to the simulation of macromolecules with a few thousand degrees of freedom, and when the number of eigenpairs to be computed is small to moderate, the new solver outperforms other methods implemented as part of high-performance numerical linear algebra packages for multithreaded architectures.
Multicore technology architecture, reconfiguration, and modeling

CERN Document Server

Qadri, Muhammad Yasir

2013-01-01

The saturation of design complexity and clock frequencies for single-core processors has resulted in the emergence of multicore architectures as an alternative design paradigm. Nowadays, multicore/multithreaded computing systems are not only a de-facto standard for high-end applications, they are also gaining popularity in the field of embedded computing. The start of the multicore era has altered the concepts relating to almost all of the areas of computer architecture design, including core design, memory management, thread scheduling, application support, inter-processor communication, debu
High-level language computer architecture

CERN Document Server

Chu, Yaohan

1975-01-01

High-Level Language Computer Architecture offers a tutorial on high-level language computer architecture, including von Neumann architecture and syntax-oriented architecture as well as direct and indirect execution architecture. Design concepts of Japanese-language data processing systems are discussed, along with the architecture of stack machines and the SYMBOL computer system. The conceptual design of a direct high-level language processor is also described. Comprised of seven chapters, this book first presents a classification of high-level language computer architecture according to the pr
Marginally Stable Triangular Recurrent Neural Network Architecture for Time Series Prediction.

Science.gov (United States)

Sivakumar, Seshadri; Sivakumar, Shyamala

2017-09-25

This paper introduces a discrete-time recurrent neural network architecture using triangular feedback weight matrices that allows a simplified approach to ensuring network and training stability. The triangular structure of the weight matrices is exploited to readily ensure that the eigenvalues of the feedback weight matrix represented by the block diagonal elements lie on the unit circle in the complex z-plane by updating these weights based on the differential of the angular error variable. Such placement of the eigenvalues together with the extended close interaction between state variables facilitated by the nondiagonal triangular elements, enhances the learning ability of the proposed architecture. Simulation results show that the proposed architecture is highly effective in time-series prediction tasks associated with nonlinear and chaotic dynamic systems with underlying oscillatory modes. This modular architecture with dual upper and lower triangular feedback weight matrices mimics fully recurrent network architectures, while maintaining learning stability with a simplified training process. While training, the block-diagonal weights (hence the eigenvalues) of the dual triangular matrices are constrained to the same values during weight updates aimed at minimizing the possibility of overfitting. The dual triangular architecture also exploits the benefit of parsing the input and selectively applying the parsed inputs to the two subnetworks to facilitate enhanced learning performance.
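The eigenvalue placement described in the abstract above rests on a standard fact: the eigenvalues of a block-triangular matrix are those of its diagonal blocks, whatever the off-diagonal entries are. The sketch below checks this numerically. It assumes, purely for illustration, that each diagonal block is a 2x2 rotation parametrized by an angle; this parametrization and all names are assumptions and may differ from the authors' formulation.

```python
import cmath
import math


def block_upper_triangular(thetas, coupling=0.5):
    """Build a 2k x 2k matrix with 2x2 rotation blocks on the diagonal and
    an arbitrary constant `coupling` in every entry to the right of them."""
    n = 2 * len(thetas)
    a = [[0.0] * n for _ in range(n)]
    for b, theta in enumerate(thetas):
        c, s = math.cos(theta), math.sin(theta)
        r0 = 2 * b
        a[r0][r0], a[r0][r0 + 1] = c, -s
        a[r0 + 1][r0], a[r0 + 1][r0 + 1] = s, c
        for i in (r0, r0 + 1):
            for j in range(r0 + 2, n):
                a[i][j] = coupling
    return a


def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting
    (works for complex entries)."""
    m = [[complex(x) for x in row] for row in m]
    n = len(m)
    det = 1 + 0j
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-14:
            return 0j                      # numerically singular
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= f * m[col][c]
    return det


def char_poly_at(a, lam):
    """Evaluate det(A - lam*I); this is zero exactly at the eigenvalues."""
    n = len(a)
    return determinant([[a[i][j] - (lam if i == j else 0) for j in range(n)]
                        for i in range(n)])


if __name__ == "__main__":
    thetas = [0.3, 1.1]
    A = block_upper_triangular(thetas)
    for theta in thetas:
        lam = cmath.exp(1j * theta)        # candidate eigenvalue, |lam| = 1
        print(theta, abs(lam), abs(char_poly_at(A, lam)))
```

Changing `coupling` alters the interaction between state variables but leaves the eigenvalues on the unit circle, which is the property the abstract attributes to the triangular structure.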
Separating VNF and Network Control for Hardware‐Acceleration of SDN/NFV Architecture

Directory of Open Access Journals (Sweden)

Tong Duan

2017-08-01

A hardware‐acceleration architecture that separates virtual network functions (VNFs) and network control (called HSN) is proposed to solve the mismatch between the simple flow steering requirements and strong packet processing abilities of software‐defined networking (SDN) forwarding elements (FEs) in SDN/network function virtualization (NFV) architecture, while improving the efficiency of NFV infrastructure and the performance of network‐intensive functions. HSN makes full use of FEs and accelerates VNFs through two mechanisms: (1) separation of traffic steering and packet processing in the FEs; (2) separation of SDN and NFV control in the FEs. Our HSN prototype, built on NetFPGA‐10G, demonstrates that the processing performance can be greatly improved with only a small modification of the traditional SDN/NFV architecture.
Graphics processor efficiency for realization of rapid tabular computations

International Nuclear Information System (INIS)

Dudnik, V.A.; Kudryavtsev, V.I.; Us, S.A.; Shestakov, M.V.

2016-01-01

Capabilities of graphics processing units (GPU) and central processing units (CPU) have been investigated for realization of fast-calculation algorithms with the use of tabulated functions. The realization of tabulated functions is exemplified by the GPU/CPU architecture-based processors. Comparison is made between the operating efficiencies of GPU and CPU, employed for tabular calculations under different conditions of use. Recommendations are formulated for the use of graphical and central processors to speed up scientific and engineering computations through the use of tabulated functions.
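As a rough illustration of what computing with tabulated functions means, the sketch below precomputes a function on a uniform grid and evaluates it by linear interpolation. It is a generic CPU-side example, not the authors' implementation; the grid size, the choice of `math.sin`, and the function names are assumptions.

```python
import math


def build_table(func, lo, hi, n):
    """Precompute func at n+1 equally spaced points on [lo, hi]."""
    step = (hi - lo) / n
    values = [func(lo + i * step) for i in range(n + 1)]
    return values, lo, step


def lookup(table, x):
    """Evaluate the tabulated function at x by linear interpolation.
    Arguments outside the tabulated range are clamped to its ends."""
    values, lo, step = table
    pos = (x - lo) / step
    pos = min(max(pos, 0.0), float(len(values) - 1))
    i = min(int(pos), len(values) - 2)     # index of the left grid point
    frac = pos - i
    return values[i] + frac * (values[i + 1] - values[i])


if __name__ == "__main__":
    table = build_table(math.sin, 0.0, math.pi / 2, 1024)
    worst = max(abs(lookup(table, k * 0.001) - math.sin(k * 0.001))
                for k in range(1571))
    print(f"max abs error on test points: {worst:.2e}")
```

Each lookup costs one index computation and one multiply-add, independent of how expensive the original function is; that property is what makes table-based evaluation attractive on data-parallel hardware.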
Network architecture test-beds as platforms for ubiquitous computing.

Science.gov (United States)

Roscoe, Timothy

2008-10-28

Distributed systems research, and in particular ubiquitous computing, has traditionally assumed the Internet as a basic underlying communications substrate. Recently, however, the networking research community has come to question the fundamental design or 'architecture' of the Internet. This has been led by two observations: first, that the Internet as it stands is now almost impossible to evolve to support new functionality; and second, that modern applications of all kinds now use the Internet rather differently, and frequently implement their own 'overlay' networks above it to work around its perceived deficiencies. In this paper, I discuss recent academic projects to allow disruptive change to the Internet architecture, and also outline a radically different view of networking for ubiquitous computing that such proposals might facilitate.
Design of Network Architectures: Role of Game Theory and Economics

OpenAIRE

Shetty, Nikhil

2010-01-01

The economics of the market that a network architecture enables has an important bearing on its success and eventual adoption. Some of these economic issues are tightly coupled with the design of the network architecture. A poor design could end up making certain markets very difficult to enable, even if they are in the better interest of society. The analysis of these cross-disciplinary problems requires understanding both the technology and the economic aspects. This thesis introduces three m...
A multi-agent system architecture for sensor networks.

Science.gov (United States)

Fuentes-Fernández, Rubén; Guijarro, María; Pajares, Gonzalo

2009-01-01

The design of the control systems for sensor networks presents important challenges. Besides the traditional problems about how to process the sensor data to obtain the target information, engineers need to consider additional aspects such as the heterogeneity and high number of sensors, and the flexibility of these networks regarding topologies and the sensors in them. Although there are partial approaches for resolving these issues, their integration relies on ad hoc solutions requiring significant development effort. In order to provide an effective approach for this integration, this paper proposes an architecture based on the multi-agent system paradigm with a clear separation of concerns. The architecture considers sensors as devices used by an upper layer of manager agents. These agents are able to communicate and negotiate services to achieve the required functionality. Activities are organized according to roles related to the different aspects to integrate, mainly sensor management, data processing, communication and adaptation to changes in the available devices and their capabilities. This organization largely isolates and decouples the data management from the changing network, while encouraging reuse of solutions. The use of the architecture is facilitated by a specific modelling language developed through metamodelling. A case study concerning a generic distributed system for fire fighting illustrates the approach and the comparison with related work.
A DRM Security Architecture for Home Networks

NARCIS (Netherlands)

Popescu, B.C.; Crispo, B.; Kamperman, F.L.A.J.; Tanenbaum, A.S.; Kiayias, A.; Yung, M.

2004-01-01

This paper describes a security architecture allowing digital rights management in home networks consisting of consumer electronic devices. The idea is to allow devices to establish dynamic groups, so-called "Authorized Domains", where legally acquired copyrighted content can seamlessly move from

Network based control point for UPnP QoS architecture

DEFF Research Database (Denmark)

Brewka, Lukasz Jerzy; Wessing, Henrik; Rossello Busquet, Ana

2011-01-01

Enabling coexistence of non-UPnP Devices in a UPnP QoS Architecture is an important issue that might have a major impact on the deployment and usability of UPnP in future home networks. The work presented here shows potential issues of placing a non-UPnP Device in a network managed by UPnP QoS. We...... address this issue by extensions to the UPnP QoS Architecture that can prevent non-UPnP Devices from degrading the overall QoS level. The obtained results show that deploying a Network Based Control Point service with an efficient traffic classifier significantly improves the end-to-end packet delay...
Routing architecture and security for airborne networks

Science.gov (United States)

Deng, Hongmei; Xie, Peng; Li, Jason; Xu, Roger; Levy, Renato

2009-05-01

Airborne networks are envisioned to provide interconnectivity for terrestrial and space networks by interconnecting highly mobile airborne platforms. A number of military applications are expected to be used by the operator, and all these applications require proper routing security support to establish correct routes between communicating platforms in a timely manner. As airborne networks are somewhat different from traditional wired and wireless networks (e.g., Internet, LAN, WLAN, MANET, etc.), security aspects valid in these networks are not fully applicable to airborne networks. Designing an efficient security scheme to protect airborne networks is confronted with new requirements. In this paper, we first identify a candidate routing architecture, which works as an underlying structure for our proposed security scheme. We then investigate the vulnerabilities and attack models against routing protocols in airborne networks. Based on these studies, we propose an integrated security solution to address routing security issues in airborne networks.
UNIBUS processor interface for a FASTBUS data acquisition system

International Nuclear Information System (INIS)

Larwill, M.; Lagerlund, T.D.; Barsotti, E.; Taff, L.M.; Franzen, J.

1981-01-01

Current work on a FASTBUS data acquisition system at Fermilab is described. The system will consist of three pieces of FASTBUS hardware: a UNIBUS processor interface (UPI), a dual-ported bulk memory, and a FASTBUS "event builder" (i.e., data acquisition processor). Primary efforts have been on specifying and constructing a UPI. The present specification includes capability for all basic FASTBUS operations, including list processing of consecutive FASTBUS operations. Some possible FASTBUS data acquisition system architectures employing the UPI are discussed along with some detailed specifications of the UPI itself.
Robust quantum network architectures and topologies for entanglement distribution

Science.gov (United States)

Das, Siddhartha; Khatri, Sumeet; Dowling, Jonathan P.

2018-01-01

Entanglement distribution is a prerequisite for several important quantum information processing and computing tasks, such as quantum teleportation, quantum key distribution, and distributed quantum computing. In this work, we focus on two-dimensional quantum networks based on optical quantum technologies using dual-rail photonic qubits for the building of a fail-safe quantum internet. We lay out a quantum network architecture for entanglement distribution between distant parties using a Bravais lattice topology, with the technological constraint that quantum repeaters equipped with quantum memories are not easily accessible. We provide a robust protocol for simultaneous entanglement distribution between two distant groups of parties on this network. We also discuss a memory-based quantum network architecture that can be implemented on networks with an arbitrary topology. We examine networks with bow-tie lattice and Archimedean lattice topologies and use percolation theory to quantify the robustness of the networks. In particular, we provide figures of merit on the loss parameter of the optical medium that depend only on the topology of the network and quantify the robustness of the network against intermittent photon loss and intermittent failure of nodes. These figures of merit can be used to compare the robustness of different network topologies in order to determine the best topology in a given real-world scenario, which is critical in the realization of the quantum internet.
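The use of percolation theory to quantify robustness, as mentioned in the abstract above, can be illustrated with a small Monte Carlo sketch. It is only a stand-in: it uses a plain square lattice rather than the bow-tie or Archimedean lattices studied in the paper, treats each link as surviving independently with probability p, and asks whether the left and right edges remain connected. All names and parameters are hypothetical.

```python
import random


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def union(parent, i, j):
    ri, rj = find(parent, i), find(parent, j)
    if ri != rj:
        parent[ri] = rj


def spans(size, p, rng):
    """One random trial on a size x size square lattice: keep each link
    with probability p and report whether the left column is still
    connected to the right column."""
    n = size * size
    left, right = n, n + 1                 # two virtual terminal nodes
    parent = list(range(n + 2))
    for r in range(size):
        union(parent, r * size, left)
        union(parent, r * size + size - 1, right)
        for c in range(size):
            node = r * size + c
            if c + 1 < size and rng.random() < p:
                union(parent, node, node + 1)       # horizontal link
            if r + 1 < size and rng.random() < p:
                union(parent, node, node + size)    # vertical link
    return find(parent, left) == find(parent, right)


def spanning_probability(size, p, trials=200, seed=0):
    """Fraction of random trials in which the lattice still spans."""
    rng = random.Random(seed)
    return sum(spans(size, p, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for p in (0.3, 0.45, 0.5, 0.55, 0.7):
        print(p, spanning_probability(20, p))
```

For the square lattice the bond-percolation threshold is known to be 1/2, so the estimated spanning probability should rise sharply around p = 0.5; other lattice topologies have different thresholds, which is the basis for comparing their robustness.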
Quantum perceptron over a field and neural network architecture selection in a quantum computer.

Science.gov (United States)

da Silva, Adenilton José; Ludermir, Teresa Bernarda; de Oliveira, Wilson Rosa

2016-04-01

In this work, we propose a quantum neural network named quantum perceptron over a field (QPF). Quantum computers are not yet a reality and the models and algorithms proposed in this work cannot be simulated in actual (or classical) computers. QPF is a direct generalization of a classical perceptron and solves some drawbacks found in previous models of quantum perceptrons. We also present a learning algorithm named Superposition based Architecture Learning algorithm (SAL) that optimizes the neural network weights and architectures. SAL searches for the best architecture in a finite set of neural network architectures with linear time over the number of patterns in the training set. SAL is the first learning algorithm to determine neural network architectures in polynomial time. This speedup is obtained by the use of quantum parallelism and a non-linear quantum operator. Copyright © 2016 Elsevier Ltd. All rights reserved.
Criteria for Evaluating Alternative Network and Link Layer Protocols for the NASA Constellation Program Communication Architecture

Science.gov (United States)

Benbenek, Daniel; Soloff, Jason; Lieb, Erica

2010-01-01

Selecting a communications and network architecture for future manned space flight requires an evaluation of the varying goals and objectives of the program, development of communications and network architecture evaluation criteria, and assessment of critical architecture trades. This paper uses Cx Program proposed exploration activities as a guideline; lunar sortie, outpost, Mars, and flexible path options are described. A set of communications network architecture criteria is proposed and described. They include: interoperability, security, reliability, and ease of automating topology changes. Finally, a key set of architecture options is traded, including (1) multiplexing data at a common network layer vs. at the data link layer, (2) implementing multiple network layers vs. a single network layer, and (3) the use of a particular network layer protocol, primarily IPv6 vs. Delay Tolerant Networking (DTN). In summary, the protocol options are evaluated against the proposed exploration activities and their relative performance with respect to the criteria is assessed. The assessment favors an architectural approach which includes (a) the capability of multiplexing at both the network layer and the data link layer and (b) a single network layer for operations at each program phase, as these solutions are best suited to respond to the widest array of program needs and meet each of the evaluation criteria.
A single chip pulse processor for nuclear spectroscopy

International Nuclear Information System (INIS)

Hilsenrath, F.; Bakke, J.C.; Voss, H.D.

1985-01-01

A high performance digital pulse processor, integrated into a single gate array microcircuit, has been developed for spaceflight applications. The new approach takes advantage of the latest CMOS high speed A/D flash converters and low-power gated logic arrays. The pulse processor measures pulse height, pulse area and the required timing information (e.g. multi-detector coincidence and pulse pile-up detection). The pulse processor features a high throughput rate (e.g. 0.5 MHz for 2 µs Gaussian pulses) and improved differential linearity (e.g. ±0.2 LSB for a ±1 LSB A/D). Because of the parallel digital architecture of the device, the interface is microprocessor bus compatible. A satellite flight application of this module is presented for use in the X-ray imager and high energy particle spectrometers of the PEM experiment on the Upper Atmospheric Research Satellite.
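The quantities named in the abstract above (pulse height, pulse area, pile-up detection) can be sketched in software on a list of digitized samples. This is an illustrative approximation, not the gate-array logic of the paper: the baseline, the threshold, and the pile-up rule (more than one local maximum above threshold) are assumptions chosen for simplicity.

```python
def analyze_pulse(samples, baseline=0, threshold=5):
    """Return (height, area, pile_up) for one digitized record.

    height  : largest sample value above the baseline
    area    : sum of baseline-subtracted samples that exceed the threshold
    pile_up : True if more than one local maximum exceeds the threshold
              (a crude stand-in for a real pile-up detector)
    """
    above = [s - baseline for s in samples]
    height = max(above)
    area = sum(a for a in above if a > threshold)
    peaks = 0
    for i in range(1, len(above) - 1):
        is_local_max = above[i] > above[i - 1] and above[i] >= above[i + 1]
        if above[i] > threshold and is_local_max:
            peaks += 1
    return height, area, peaks > 1


if __name__ == "__main__":
    single = [0, 1, 4, 20, 60, 90, 60, 20, 4, 1, 0]
    piled = [0, 10, 50, 80, 50, 30, 60, 85, 40, 10, 0]
    print(analyze_pulse(single))   # → (90, 250, False)
    print(analyze_pulse(piled))    # → (85, 415, True)
```

In hardware these three measurements would typically be taken in parallel on the ADC output stream; the sequential loop here is only meant to make the definitions concrete.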
DeepX: Deep Learning Accelerator for Restricted Boltzmann Machine Artificial Neural Networks.

Science.gov (United States)

Kim, Lok-Won

2018-05-01

Although there have been many decades of research and commercial presence on high performance general purpose processors, there are still many applications that require fully customized hardware architectures for further computational acceleration. Recently, deep learning has been successfully used to learn in a wide variety of applications, but its heavy computational demand has considerably limited its practical application. This paper proposes a fully pipelined acceleration architecture to alleviate the high computational demand of restricted Boltzmann machine (RBM) artificial neural networks (ANNs). The implemented RBM ANN accelerator (integrating network size, using 128 input cases per batch, and running at a 303-MHz clock frequency) integrated in a state-of-the-art field-programmable gate array (FPGA) (Xilinx Virtex 7 XC7V-2000T) provides a computational performance of 301-billion connection-updates-per-second and about 193 times higher performance than a software solution running on general purpose processors. Most importantly, the architecture enables over 4 times (12 times in batch learning) higher performance compared with a previous work when both are implemented in an FPGA device (XC2VP70).
SELECTING NEURAL NETWORK ARCHITECTURE FOR INVESTMENT PROFITABILITY PREDICTIONS

Directory of Open Access Journals (Sweden)

Marijana Zekić-Sušac

2012-07-01

After production and operations, finance and investments are one of the most frequent areas of neural network applications in business. The lack of standardized paradigms that can determine the efficiency of certain NN architectures in a particular problem domain is still present. The selection of NN architecture needs to take into consideration the type of the problem, the nature of the data in the model, as well as some strategies based on result comparison. The paper describes previous research in that area and suggests a forward strategy for selecting the best NN algorithm and structure. Since the strategy includes both parameter-based and variable-based testing, it can be used for selecting NN architectures as well as for extracting models. The backpropagation, radial basis, modular, LVQ and probabilistic neural network algorithms were used on two independent sets: stock market and credit scoring data. The results show that neural networks give better accuracy compared to multiple regression and logistic regression models. Since it is model-independent, the strategy can be used by researchers and professionals in other areas of application.
Parallelization of applications for networks with homogeneous and heterogeneous processors

International Nuclear Information System (INIS)

Colombet, L.

1994-01-01

The aim of this thesis is to study and develop efficient methods for parallelization of scientific applications on parallel computers with distributed memory. The first part presents two libraries of PVM (Parallel Virtual Machine) and MPI (Message Passing Interface) communication tools. They allow implementation of programs on most parallel machines, but also on heterogeneous computer networks. This chapter illustrates the problems faced when trying to evaluate the performance of networks with heterogeneous processors. To evaluate such performance, the concepts of speed-up and efficiency have been modified and adapted to account for heterogeneity. The second part deals with a study of parallel application libraries such as ScaLAPACK and with the development of communication masking techniques. The general concept is based on communication anticipation, in particular by pipelining message sending operations. Experimental results on Cray T3D and IBM SP1 machines validate the theoretical studies performed on basic algorithms of the libraries discussed above. Two examples of scientific applications are given: the first is a model of young stars for astrophysics and the other is a model of photon trajectories in the Compton effect. (J.S.). 83 refs., 65 figs., 24 tabs
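As a rough illustration of how speed-up and efficiency might be adapted to heterogeneous processors, the sketch below normalizes by the aggregate relative speed of the machines rather than by their count. This particular definition is an assumption chosen for illustration; the abstract does not state the formulas used in the thesis, and the function names and timings are hypothetical.

```python
def heterogeneous_speedup(t_reference, t_parallel):
    """Speed-up relative to the sequential time on a chosen reference processor."""
    return t_reference / t_parallel


def heterogeneous_efficiency(t_reference, t_parallel, relative_speeds):
    """Efficiency normalized by aggregate computing power (assumed definition).

    relative_speeds[i] is the speed of processor i divided by the speed of
    the reference processor (1.0 for the reference itself). With identical
    processors this reduces to the classical speed-up divided by p.
    """
    return heterogeneous_speedup(t_reference, t_parallel) / sum(relative_speeds)


# Hypothetical numbers: one reference machine plus two machines half as fast.
speeds = [1.0, 0.5, 0.5]
print(heterogeneous_speedup(100.0, 60.0))
print(heterogeneous_efficiency(100.0, 60.0, speeds))
```

Dividing by the summed relative speeds means that three machines worth "two reference processors" are judged against an ideal speed-up of 2 rather than 3.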
Self-powered information measuring wireless networks using the distribution of tasks within multicore processors

Science.gov (United States)

Zhuravska, Iryna M.; Koretska, Oleksandra O.; Musiyenko, Maksym P.; Surtel, Wojciech; Assembay, Azat; Kovalev, Vladimir; Tleshova, Akmaral

2017-08-01

The article presents basic approaches to developing self-powered information measuring wireless networks (SPIM-WN) for critical applications, using the distribution of tasks within multicore processors and based on the interaction of movable components, both for data transmission and for the wireless transfer of energy coming from polymetric sensors. The basic mathematical model of task scheduling in multiprocessor systems was updated to schedule and allocate tasks among the cores of a single-chip computer (SoC) in order to increase the energy efficiency of SPIM-WN objects.
A Multi-Agent System Architecture for Sensor Networks

Directory of Open Access Journals (Sweden)

María Guijarro

2009-12-01

The design of the control systems for sensor networks presents important challenges. Besides the traditional problems about how to process the sensor data to obtain the target information, engineers need to consider additional aspects such as the heterogeneity and high number of sensors, and the flexibility of these networks regarding topologies and the sensors in them. Although there are partial approaches for resolving these issues, their integration relies on ad hoc solutions requiring important development efforts. In order to provide an effective approach for this integration, this paper proposes an architecture based on the multi-agent system paradigm with a clear separation of concerns. The architecture considers sensors as devices used by an upper layer of manager agents. These agents are able to communicate and negotiate services to achieve the required functionality. Activities are organized according to roles related to the different aspects to integrate, mainly sensor management, data processing, communication and adaptation to changes in the available devices and their capabilities. This organization largely isolates and decouples the data management from the changing network, while encouraging reuse of solutions. The use of the architecture is facilitated by a specific modelling language developed through metamodelling. A case study concerning a generic distributed system for fire fighting illustrates the approach and the comparison with related work.
Multipurpose silicon photonics signal processor core.

Science.gov (United States)

Pérez, Daniel; Gasulla, Ivana; Crudgington, Lee; Thomson, David J; Khokhar, Ali Z; Li, Ke; Cao, Wei; Mashanovich, Goran Z; Capmany, José

2017-09-21

Integrated photonics changes the scaling laws of information and communication systems offering architectural choices that combine photonics with electronics to optimize performance, power, footprint, and cost. Application-specific photonic integrated circuits, where particular circuits/chips are designed to optimally perform particular functionalities, require a considerable number of design and fabrication iterations leading to long development times. A different approach inspired by electronic Field Programmable Gate Arrays is the programmable photonic processor, where a common hardware implemented by a two-dimensional photonic waveguide mesh realizes different functionalities through programming. Here, we report the demonstration of such a reconfigurable waveguide mesh in silicon. We demonstrate over 20 different functionalities with a simple seven hexagonal cell structure, which can be applied to different fields including communications, chemical and biomedical sensing, signal processing, multiprocessor networks, and quantum information systems. Our work is an important step toward this paradigm. Integrated optical circuits today are typically designed for a few special functionalities and require complex design and development procedures. Here, the authors demonstrate a reconfigurable but simple silicon waveguide mesh with different functionalities.
A Reference Architecture for Network-Centric Information Systems

National Research Council Canada - National Science Library

Renner, Scott; Schaefer, Ronald

2003-01-01

This paper presents the "C2 Enterprise Reference Architecture" (C2ERA), which is a new technical concept of operations for building information systems better suited to the Network-Centric Warfare (NCW) environment...
Cloud Radio Access Network architecture. Towards 5G mobile networks

DEFF Research Database (Denmark)

Checko, Aleksandra

Cloud Radio Access Network (C-RAN) is a novel mobile network architecture which can address a number of challenges that mobile operators face while trying to support ever-growing end-users’ needs towards 5th generation of mobile networks (5G). The main idea behind C-RAN is to split the base...... stations into radio and baseband parts, and pool the Baseband Units (BBUs) from multiple base stations into a centralized and virtualized BBU Pool. This gives a number of benefits in terms of cost and capacity. However, the challenge is then to find an optimal functionality splitting point as well...... as to design the so-called fronthaul network, interconnecting those parts. This thesis focuses on quantifying those benefits and proposing a flexible and capacity-optimized fronthaul network. It is shown that a C-RAN with a functional split resulting in a variable bit rate on the fronthaul links brings cost...
NATO Human View Architecture and Human Networks

Science.gov (United States)

Handley, Holly A. H.; Houston, Nancy P.

2010-01-01

The NATO Human View is a system architectural viewpoint that focuses on the human as part of a system. Its purpose is to capture the human requirements and to inform on how the human impacts the system design. The viewpoint contains seven static models that include different aspects of the human element, such as roles, tasks, constraints, training and metrics. It also includes a Human Dynamics component to perform simulations of the human system under design. One of the static models, termed Human Networks, focuses on the human-to-human communication patterns that occur as a result of ad hoc or deliberate team formation, especially teams distributed across space and time. Parameters of human teams that affect system performance can be captured in this model. Human-centered aspects of networks, such as differences in operational tempo (sense of urgency), priorities (common goal), and team history (knowledge of the other team members), can be incorporated. The information captured in the Human Network static model can then be included in the Human Dynamics component so that the impact of distributed teams is represented in the simulation. As the NATO militaries transform to a more networked force, the Human View architecture is an important tool that can be used to make recommendations on the proper mix of technological innovations and human interactions.
Optimizing the performance of streaming numerical kernels on the IBM Blue Gene/P PowerPC 450 processor

KAUST Repository

Malas, Tareq Majed Yasin; Ahmadia, Aron; Brown, Jed; Gunnels, John A.; Keyes, David E.

2012-01-01

Several emerging petascale architectures use energy-efficient processors with vectorized computational units and in-order thread processing. On these architectures the sustained performance of streaming numerical kernels, ubiquitous in the solution
Multi-core System Architecture for Safety-critical Control Applications

DEFF Research Database (Denmark)

Li, Gang

and size, and high power consumption. Increasing the frequency of a processor is becoming painful now due to the explosive power consumption. Furthermore, components integrated into a single-core processor have to be certified to the highest SIL, due to that no isolation is provided in a traditional single...... certification cost. Meanwhile, hardware platforms with improved processing power are required to execute the applications of larger size. To tackle the two issues mentioned above, the state of the art approaches are using more Electronic Control Units (ECU) in a federated architecture or increasing......-core processor. A promising alternative to improve processing power and provide isolation is to adopt a multi-core architecture with on-chip isolation. In general, a specific multi-core architecture can facilitate the development and certification of safety-related systems, due to its physical isolation between...
Confabulation Based Real-time Anomaly Detection for Wide-area Surveillance Using Heterogeneous High Performance Computing Architecture

Science.gov (United States)

2015-06-01

Contract number: FA8750-12-1-0251. ...processors including graphic processor units (GPUs) and Intel Xeon Phi processors. Experimental results showed significant speedups, which can enable
Sensitivity Study on Availability of I&C Components Using Bayesian Network

Directory of Open Access Journals (Sweden)

Rahman Khalil Ur

2013-01-01

The objective of this study is to find out the impact of instrumentation and control (I&C) components on the availability of I&C systems in terms of sensitivity analysis using a Bayesian network. The analysis has been performed on the I&C architecture of a reactor protection system (RPS). The analysis results would be applied to develop an I&C architecture which will meet the desired reliability features and save cost. RPS architecture unavailability P(x=0) and availability P(x=1) were estimated at 6.1276E-05 and 9.9994E-01 for the failure (0) and perfect (1) states, respectively. The impact of I&C components on overall system risk has been studied in terms of risk achievement worth (RAW) and risk reduction worth (RRW). It is found that circuit breaker failure (TCB), bi-stable processor (BP), sensor transmitter (TR), and pressure transmitter (PT) have a high impact on risk. The study concludes and recommends that the circuit breaker and bi-stable processor should be given more consideration when designing I&C architecture.

Nonlinear Wave Simulation on the Xeon Phi Knights Landing Processor

OpenAIRE

Hristov Ivan; Goranov Goran; Hristova Radoslava

2018-01-01

We consider a standing wave simulation that is interesting from a computational point of view, obtained by solving coupled 2D perturbed Sine-Gordon equations. We make an OpenMP implementation that exploits both thread and SIMD levels of parallelism. We test the OpenMP program on two different energy-equivalent Intel architectures: 2× Xeon E5-2695 v2 processors (code-named “Ivy Bridge-EP”) in the Hybrilit cluster, and a Xeon Phi 7250 processor (code-named “Knights Landing”, KNL). The results show 2 times better per...
Software and DVFS Tuning for Performance and Energy-Efficiency on Intel KNL Processors

Directory of Open Access Journals (Sweden)

Enrico Calore

2018-06-01

Energy consumption of processors and memories is quickly becoming a limiting factor in the deployment of large computing systems. For this reason, it is important to understand the energy performance of these processors and to study strategies allowing their use in the most efficient way. In this work, we focus on the computing and energy performance of the Knights Landing Xeon Phi, the latest Intel many-core architecture processor for HPC applications. We consider the 64-core Xeon Phi 7230 and profile its performance and energy efficiency using both its on-chip MCDRAM and the off-chip DDR4 memory as the main storage for application data. As a benchmark application, we use a lattice Boltzmann code heavily optimized for this architecture and implemented using several different arrangements of the application data in memory (data-layouts, in short). We also assess the dependence of energy consumption on data-layouts, memory configurations (DDR4 or MCDRAM) and the number of threads per core. We finally consider possible trade-offs between computing performance and energy efficiency, tuning the clock frequency of the processor using the Dynamic Voltage and Frequency Scaling (DVFS) technique.
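To make the notion of a data-layout concrete, the sketch below contrasts two common arrangements for lattice data: array of structures (AoS) and structure of arrays (SoA). These two layouts are an assumption chosen for illustration; the abstract does not say which arrangements were actually benchmarked, and the function names and lattice size are hypothetical.

```python
def aos_index(site, pop, n_sites, n_pops):
    """Array of structures: all populations of one lattice site are contiguous."""
    return site * n_pops + pop


def soa_index(site, pop, n_sites, n_pops):
    """Structure of arrays: one population of all lattice sites is contiguous."""
    return pop * n_sites + site


n_sites, n_pops = 4, 3  # tiny hypothetical lattice
for name, index in (("AoS", aos_index), ("SoA", soa_index)):
    order = [index(s, 0, n_sites, n_pops) for s in range(n_sites)]
    print(name, "offsets of population 0 across sites:", order)
```

In the SoA case the same population of neighbouring sites sits at consecutive offsets, whereas in the AoS case it is strided by the number of populations; which access pattern is cheaper in time and energy depends on the kernel and the memory system.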
Locality-Aware Task Scheduling and Data Distribution for OpenMP Programs on NUMA Systems and Manycore Processors

Directory of Open Access Journals (Sweden)

Ananya Muddukrishna

2015-01-01

Performance degradation due to nonuniform data access latencies has worsened on NUMA systems and can now be felt on-chip in manycore processors. Distributing data across NUMA nodes and manycore processor caches is necessary to reduce the impact of nonuniform latencies. However, techniques for distributing data are error-prone and fragile and require low-level architectural knowledge. Existing task scheduling policies favor quick load-balancing at the expense of locality and ignore NUMA node/manycore cache access latencies while scheduling. Locality-aware scheduling, in conjunction with or as a replacement for existing scheduling, is necessary to minimize NUMA effects and sustain performance. We present a data distribution and locality-aware scheduling technique for task-based OpenMP programs executing on NUMA systems and manycore processors. Our technique relieves the programmer from thinking of NUMA system/manycore processor architecture details by delegating data distribution to the runtime system and uses task data dependence information to guide the scheduling of OpenMP tasks to reduce data stall times. We demonstrate our technique on a four-socket AMD Opteron machine with eight NUMA nodes and on the TILEPro64 processor and identify that data distribution and locality-aware task scheduling improve performance by up to 69% for scientific benchmarks compared to default policies and yet provide an architecture-oblivious approach for programmers.
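The idea of using task data dependence information to guide placement can be sketched as a simple heuristic: run each task on the NUMA node that already holds most of the bytes it depends on. This is a simplified stand-in, not the scheduling policy of the paper; the function, its arguments, and the tie-breaking rule are hypothetical.

```python
from collections import defaultdict


def pick_node(task_deps, data_location, node_load):
    """Choose the NUMA node holding the most bytes of a task's dependences.

    task_deps: list of (data_name, size_in_bytes) the task reads or writes.
    data_location: dict mapping data_name -> NUMA node id.
    node_load: dict mapping node id -> number of queued tasks (tie-breaker).
    """
    bytes_on_node = defaultdict(int)
    for name, size in task_deps:
        bytes_on_node[data_location[name]] += size
    # Prefer the node with the most local bytes; break ties by lower load.
    return max(node_load, key=lambda n: (bytes_on_node[n], -node_load[n]))


location = {"A": 0, "B": 1, "C": 1}
load = {0: 0, 1: 5}
print(pick_node([("A", 100), ("B", 60), ("C", 60)], location, load))
```

A pure load balancer would send this task to the idle node 0; the locality heuristic accepts the longer queue on node 1 because more of the task's data lives there.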
Manned/Unmanned Common Architecture Program (MCAP) net centric flight tests

Science.gov (United States)

Johnson, Dale

2009-04-01

Properly architected avionics systems can reduce the costs of periodic functional improvements, maintenance, and obsolescence. With this in mind, the U.S. Army Aviation Applied Technology Directorate (AATD) initiated the Manned/Unmanned Common Architecture Program (MCAP) in 2003 to develop an affordable, high-performance embedded mission processing architecture for potential application to multiple aviation platforms. MCAP analyzed Army helicopter and unmanned air vehicle (UAV) missions, identified supporting subsystems, surveyed advanced hardware and software technologies, and defined computational infrastructure technical requirements. The project selected a set of modular open systems standards and market-driven commercial-off-the-shelf (COTS) electronics and software, and developed experimental mission processors, network architectures, and software infrastructures supporting the integration of new capabilities, interoperability, and life cycle cost reductions. MCAP integrated the new mission processing architecture into an AH-64D Apache Longbow and participated in Future Combat Systems (FCS) network-centric operations field experiments in 2006 and 2007 at White Sands Missile Range (WSMR), New Mexico and at the Nevada Test and Training Range (NTTR) in 2008. The MCAP Apache also participated in PM C4ISR On-the-Move (OTM) Capstone Experiments 2007 (E07) and 2008 (E08) at Ft. Dix, NJ and conducted Mesa, Arizona local area flight tests in December 2005, February 2006, and June 2008.
A comparison of neural network architectures for the prediction of MRR in EDM

Science.gov (United States)

Jena, A. R.; Das, Raja

2017-11-01

The aim of the research work is to predict the material removal rate of a work-piece in electrical discharge machining (EDM). Here, an effort has been made to predict the material removal rate through a back-propagation neural network (BPN) and a radial basis function neural network (RBFN) for a work-piece of AISI D2 steel. The input parameters for the architecture are discharge current (Ip), pulse duration (Ton), and duty cycle (τ), which are used to obtain the material removal rate of the work-piece as output. It has been observed that the radial basis function neural network is comparatively faster than the back-propagation neural network, but the back-propagation neural network produces more realistic values. Therefore, BPN may be considered the better approach in this architecture for consistent prediction, saving the time and money of conducting experiments.
Bulk-memory processor for data acquisition

International Nuclear Information System (INIS)

Nelson, R.O.; McMillan, D.E.; Sunier, J.W.; Meier, M.; Poore, R.V.

1981-01-01

To meet the diverse needs and data rate requirements at the Van de Graaff and Weapons Neutron Research (WNR) facilities, a bulk memory system has been implemented which includes a fast and flexible processor. This bulk memory processor (BMP) utilizes bit slice and microcode techniques and features a 24-bit-wide internal architecture allowing direct addressing of up to 16 megawords of memory and histogramming up to 16 million counts per channel without overflow. The BMP is interfaced to the MOSTEK MK 8000 bulk memory system and to the standard MODCOMP computer I/O bus. Coding for the BMP both at the microcode level and with macro instructions is supported. The generalized data acquisition system has been extended to support the BMP in a manner transparent to the user.
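The two capacity figures follow directly from the 24-bit word width, as the short arithmetic sketch below shows. The saturating increment is an assumption made for illustration; the abstract does not describe how the BMP behaves once a channel reaches its maximum count, and the names used here are hypothetical.

```python
WORD_BITS = 24
MAX_COUNT = (1 << WORD_BITS) - 1    # 16,777,215: largest value in a 24-bit word
ADDRESSABLE_WORDS = 1 << WORD_BITS  # 16,777,216 words = 16 * 2**20 (16 megawords)


def increment(histogram, channel):
    """Add one count to a channel, saturating instead of wrapping at 24 bits."""
    if histogram[channel] < MAX_COUNT:
        histogram[channel] += 1
    return histogram[channel]


histogram = [0] * 8  # a tiny hypothetical histogram
for event_channel in (3, 3, 5):
    increment(histogram, event_channel)
print(histogram, MAX_COUNT, ADDRESSABLE_WORDS)
```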
A swarm intelligence framework for reconstructing gene networks: searching for biologically plausible architectures.

Science.gov (United States)

Kentzoglanakis, Kyriakos; Poole, Matthew

2012-01-01

In this paper, we investigate the problem of reverse engineering the topology of gene regulatory networks from temporal gene expression data. We adopt a computational intelligence approach comprising swarm intelligence techniques, namely particle swarm optimization (PSO) and ant colony optimization (ACO). In addition, the recurrent neural network (RNN) formalism is employed for modeling the dynamical behavior of gene regulatory systems. More specifically, ACO is used for searching the discrete space of network architectures and PSO for searching the corresponding continuous space of RNN model parameters. We propose a novel solution construction process in the context of ACO for generating biologically plausible candidate architectures. The objective is to concentrate the search effort into areas of the structure space that contain architectures which are feasible in terms of their topological resemblance to real-world networks. The proposed framework is initially applied to the reconstruction of a small artificial network that has previously been studied in the context of gene network reverse engineering. Subsequently, we consider an artificial data set with added noise for reconstructing a subnetwork of the genetic interaction network of S. cerevisiae (yeast). Finally, the framework is applied to a real-world data set for reverse engineering the SOS response system of the bacterium Escherichia coli. Results demonstrate the relative advantage of utilizing problem-specific knowledge regarding biologically plausible structural properties of gene networks over conducting a problem-agnostic search in the vast space of network architectures.
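The continuous half of the search can be illustrated with a minimal, generic particle swarm optimizer. This is a textbook-style sketch, not the authors' implementation: the inertia and acceleration constants are conventional illustrative values, the cost function is a stand-in for an RNN fitting error, and the ACO structure search is not shown.

```python
import random


def pso(cost, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize cost over R^dim; returns (best_position, best_cost)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # personal bests
    best_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_cost[i])
    g_pos, g_cost = best_pos[g][:], best_cost[g]  # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            c = cost(pos[i])
            if c < best_cost[i]:
                best_pos[i], best_cost[i] = pos[i][:], c
                if c < g_cost:
                    g_pos, g_cost = pos[i][:], c
    return g_pos, g_cost


# Stand-in cost: squared distance to a hypothetical target parameter vector.
target = [0.3, -0.2, 0.5]
print(pso(lambda p: sum((a - b) ** 2 for a, b in zip(p, target)), dim=3))
```

In a hybrid scheme of the kind described, an optimizer like this would be run once per candidate topology, with the cost measuring how well the resulting model reproduces the expression data.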
Performance of Artificial Intelligence Workloads on the Intel Core 2 Duo Series Desktop Processors

Directory of Open Access Journals (Sweden)

Abdul Kareem PARCHUR

2010-12-01

As processor architectures became more advanced, Intel introduced its Intel Core 2 Duo series processors. The performance impact on Intel Core 2 Duo processors is analyzed using SPEC CPU INT 2006 performance numbers. This paper studies the behavior of Artificial Intelligence (AI) benchmarks on Intel Core 2 Duo series processors. Moreover, we estimate the task completion time (TCT) at Intel Core 2 Duo series processor frequencies of 1 GHz, 2 GHz and 3 GHz. Our results show the performance scalability of Intel Core 2 Duo series processors. Even though AI benchmarks have similar execution times, they have dissimilar characteristics, which are identified using principal component analysis and a dendrogram. As the processor frequency increased from 1.8 GHz to 3.167 GHz, the execution time decreased by ~370 sec for AI workloads. In the case of Physics/Quantum Computing programs it was ~940 sec.
Agent-based Personal Network (PN) service architecture

DEFF Research Database (Denmark)

Jiang, Bo; Olesen, Henning

2004-01-01

In this paper we propose a new concept for a centralized agent system as the solution for the PN service architecture, which aims to efficiently control and manage the PN resources and enable the PN based services to run seamlessly over different networks and devices. The working principle...
Underwater Sensor Networks: A New Energy Efficient and Robust Architecture

NARCIS (Netherlands)

Climent, Salvador; Capella, Juan Vincente; Meratnia, Nirvana; Serrano, Juan José

2012-01-01

The specific characteristics of underwater environments introduce new challenges for networking protocols. In this paper, a specialized architecture for underwater sensor networks (UWSNs) is proposed and evaluated. Experiments are conducted in order to analyze the suitability of this protocol for
Modal Processor Effects Inspired by Hammond Tonewheel Organs

Directory of Open Access Journals (Sweden)

Kurt James Werner

2016-06-01

In this design study, we introduce a novel class of digital audio effects that extend the recently introduced modal processor approach to artificial reverberation and effects processing. These pitch and distortion processing effects mimic the design and sonics of a classic additive-synthesis-based electromechanical musical instrument, the Hammond tonewheel organ. As a reverb effect, the modal processor simulates a room response as the sum of resonant filter responses. This architecture provides precise, interactive control over the frequency, damping, and complex amplitude of each mode. Into this framework, we introduce two types of processing effects: pitch effects inspired by the Hammond organ’s equal tempered “tonewheels”, “drawbar” tone controls, vibrato/chorus circuit, and distortion effects inspired by the pseudo-sinusoidal shape of its tonewheels and electromagnetic pickup distortion. The result is an effects processor that imprints the Hammond organ’s sonics onto any audio input.
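The "sum of resonant filter responses" can be sketched as a parallel bank of complex one-pole filters, one per mode, each parameterized by a frequency, a damping rate, and a complex amplitude. This is a minimal illustration under those assumptions, not the authors' implementation; the function name, parameter names, and mode values are hypothetical.

```python
import cmath
import math


def modal_process(x, modes, fs):
    """Filter signal x through a parallel bank of resonant modes.

    modes: list of (frequency_hz, decay_per_second, complex_amplitude).
    Each mode is a complex one-pole filter; the output is the real part of
    the sum of all mode outputs.
    """
    poles = [cmath.exp(complex(-decay, 2 * math.pi * f) / fs) for f, decay, _ in modes]
    state = [0j] * len(modes)
    out = []
    for sample in x:
        total = 0j
        for m, (_, _, amp) in enumerate(modes):
            state[m] = poles[m] * state[m] + sample  # one-pole resonator update
            total += amp * state[m]
        out.append(total.real)
    return out


# Impulse response of two hypothetical modes at a 1 kHz sample rate.
impulse = [1.0] + [0.0] * 15
modes = [(100.0, 5.0, 1.0 + 0j), (220.0, 12.0, 0.5j)]
print(modal_process(impulse, modes, fs=1000.0)[:4])
```

For an impulse input, each mode contributes an exponentially decaying sinusoid, so changing a mode's frequency, decay, or amplitude reshapes the response directly. Retuning the mode frequencies is the kind of hook that pitch effects can build on.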
Stepping motor control processor reference manual. Volume I

International Nuclear Information System (INIS)

Holloway, F.W.; VanArsdall, P.J.; Suski, G.J.; Gant, R.G.; Rash, M.

1980-01-01

This manual is intended to serve several purposes. The first goal is to describe the capabilities and operation of the SMC processor package from an operator or user point of view. Secondly, the manual describes in some detail the basic hardware elements and how they can be used effectively to implement a step motor control system. Practical information on the use, installation and checkout of the hardware set is presented in the following sections along with programming suggestions. Available related system software is described in this manual for reference and as an aid in understanding the system architecture. Section two presents an overview and operations manual of the SMC processor describing its composition and functional capabilities. Section three contains hardware descriptions in some detail for the LLL-designed hardware used in the SMC processor. Basic theory of operation and important features are explained.
HONEI: A collection of libraries for numerical computations targeting multiple processor architectures

Science.gov (United States)

van Dyk, Danny; Geveler, Markus; Mallach, Sven; Ribbrock, Dirk; Göddeke, Dominik; Gutwenger, Carsten

2009-12-01

We present HONEI, an open-source collection of libraries offering a hardware oriented approach to numerical calculations. HONEI abstracts the hardware, and applications written on top of HONEI can be executed on a wide range of computer architectures such as CPUs, GPUs and the Cell processor. We demonstrate the flexibility and performance of our approach with two test applications, a Finite Element multigrid solver for the Poisson problem and a robust and fast simulation of shallow water waves. By linking against HONEI's libraries, we achieve a two-fold speedup over straightforward C++ code using HONEI's SSE backend, and additional 3-4 and 4-16 times faster execution on the Cell and a GPU. A second important aspect of our approach is that the full performance capabilities of the hardware under consideration can be exploited by adding optimised application-specific operations to the HONEI libraries. HONEI provides all necessary infrastructure for development and evaluation of such kernels, significantly simplifying their development.
Program summary
Program title: HONEI
Catalogue identifier: AEDW_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEDW_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GPLv2
No. of lines in distributed program, including test data, etc.: 216 180
No. of bytes in distributed program, including test data, etc.: 1 270 140
Distribution format: tar.gz
Programming language: C++
Computer: x86, x86_64, NVIDIA CUDA GPUs, Cell blades and PlayStation 3
Operating system: Linux
RAM: at least 500 MB free
Classification: 4.8, 4.3, 6.1
External routines: SSE: none; [1] for GPU, [2] for Cell backend
Nature of problem: Computational science in general and numerical simulation in particular have reached a turning point. The revolution developers are facing is not primarily driven by a change in (problem-specific) methodology, but rather by the fundamental paradigm shift of the
Development of the network architecture of the Canadian MSAT system

Science.gov (United States)

Davies, N. George; Shoamanesh, Alireza; Leung, Victor C. M.

1988-05-01

A description is given of the present concept for the Canadian Mobile Satellite (MSAT) System and the development of the network architecture which will accommodate the planned family of three categories of service: a mobile radio service (MRS), a mobile telephone service (MTS), and a mobile data service (MDS). The MSAT satellite will have cross-strapped L-band and Ku-band transponders to provide communications services between L-band mobile terminals and fixed base stations supporting dispatcher-type MRS, gateway stations supporting MTS interconnections to the public telephone network, data hub stations supporting the MDS, and the network control center. The currently perceived centralized architecture with demand assignment multiple access for the circuit switched MRS, MTS and permanently assigned channels for the packet switched MDS is discussed.
Selection and integration of a network of parallel processors in the real time acquisition system of the 4π DIAMANT multidetector: modeling, realization and evaluation of the software installed on this network

International Nuclear Information System (INIS)

Guirande, F.

1997-01-01

The increase in sensitivity of 4π arrays such as EUROBALL or DIAMANT has led to an increase in the data flow rate into the data acquisition system. While at the electronic level the data flow has been distributed over several data acquisition buses, the data processing system requires increased processing power. This work concerns the modelling and implementation of the software allocated onto an architecture of parallel processors. Object analysis and formal methods were used; benchmarks and the future evolution of this architecture are presented. The thesis consists of two parts. Part A, devoted to 'Nuclear Spectroscopy with 4 π multidetectors', contains a first chapter entitled 'The Physics of 4π multidetectors' and a second chapter entitled 'Integral architecture of 4π multidetectors'. Part B, devoted to 'Parallel acquisition system of DIAMANT', contains three chapters entitled 'Material architecture', 'Software architecture' and 'Validation and Performances'. Four appendices and a glossary of terms close this work. (author)
Area analysis of interconnection networks implemented on the honeycomb architecture

Energy Technology Data Exchange (ETDEWEB)

Milutinovic, D

1996-12-31

The area utilization of interconnection networks for parallel processing on one form of uniform parallel architecture of cellular type is analyzed. Formulae for the number of cells necessary to realize a network and the efficiency factor of the system are derived. 15 refs.
Framewise phoneme classification with bidirectional LSTM and other neural network architectures.

Science.gov (United States)

Graves, Alex; Schmidhuber, Jürgen

2005-01-01

In this paper, we present bidirectional Long Short Term Memory (LSTM) networks, and a modified, full gradient version of the LSTM learning algorithm. We evaluate Bidirectional LSTM (BLSTM) and several other network architectures on the benchmark task of framewise phoneme classification, using the TIMIT database. Our main findings are that bidirectional networks outperform unidirectional ones, and Long Short Term Memory (LSTM) is much faster and also more accurate than both standard Recurrent Neural Nets (RNNs) and time-windowed Multilayer Perceptrons (MLPs). Our results support the view that contextual information is crucial to speech processing, and suggest that BLSTM is an effective architecture with which to exploit it.
An Energy-Efficient and High-Quality Video Transmission Architecture in Wireless Video-Based Sensor Networks

Directory of Open Access Journals (Sweden)

Yasaman Samei

2008-08-01

Technological progress in the fields of Micro Electro-Mechanical Systems (MEMS) and wireless communications and also the availability of CMOS cameras, microphones and small-scale array sensors, which may ubiquitously capture multimedia content from the field, have fostered the development of low-cost limited resources Wireless Video-based Sensor Networks (WVSN). With regard to the constraints of video-based sensor nodes and wireless sensor networks, a supporting video stream is not easy to implement with the present sensor network protocols. In this paper, a thorough architecture is presented for video transmission over WVSN called Energy-efficient and high-Quality Video transmission Architecture (EQV-Architecture). This architecture influences three layers of the communication protocol stack and considers wireless video sensor node constraints like limited processing and energy resources while video quality is preserved on the receiver side. Application, transport, and network layers are the layers in which the compression protocol, transport protocol, and routing protocol are proposed respectively; a dropping scheme is also presented in the network layer. Simulation results over various environments with dissimilar conditions revealed the effectiveness of the architecture in improving the lifetime of the network as well as preserving the video quality.
An Energy-Efficient and High-Quality Video Transmission Architecture in Wireless Video-Based Sensor Networks.

Science.gov (United States)

Aghdasi, Hadi S; Abbaspour, Maghsoud; Moghadam, Mohsen Ebrahimi; Samei, Yasaman

2008-08-04

Technological progress in the fields of Micro Electro-Mechanical Systems (MEMS) and wireless communications and also the availability of CMOS cameras, microphones and small-scale array sensors, which may ubiquitously capture multimedia content from the field, have fostered the development of low-cost limited resources Wireless Video-based Sensor Networks (WVSN). With regard to the constraints of video-based sensor nodes and wireless sensor networks, a supporting video stream is not easy to implement with the present sensor network protocols. In this paper, a thorough architecture is presented for video transmission over WVSN called Energy-efficient and high-Quality Video transmission Architecture (EQV-Architecture). This architecture influences three layers of the communication protocol stack and considers wireless video sensor node constraints like limited processing and energy resources while video quality is preserved on the receiver side. Application, transport, and network layers are the layers in which the compression protocol, transport protocol, and routing protocol are proposed respectively; a dropping scheme is also presented in the network layer. Simulation results over various environments with dissimilar conditions revealed the effectiveness of the architecture in improving the lifetime of the network as well as preserving the video quality.
A Formally Verified Decentralized Key Management Architecture for Wireless Sensor Networks

NARCIS (Netherlands)

Law, Y.W.; Corin, R.J.; Etalle, Sandro; Hartel, Pieter H.

We present a decentralized key management architecture for wireless sensor networks, covering the aspects of key deployment, key refreshment and key establishment. Our architecture is based on a clear set of assumptions and guidelines. Balance between security and energy consumption is achieved by

Modeling of a 3DTV service in the software-defined networking architecture

Science.gov (United States)

Wilczewski, Grzegorz

2014-11-01

In this article, a newly developed concept for modeling a multimedia service offering stereoscopic motion imagery is presented. The proposed model is based on the Software-defined Networking, or Software Defined Networks, architecture (SDN). The definition of a 3D television service spanning the SDN concept is identified, exposing the basic characteristics of a 3DTV service in a modern networking organization layout. Furthermore, exemplary functionalities of the proposed 3DTV model are depicted. It is indicated that modeling a 3DTV service in the Software-defined Networking architecture leads to a multiplicity of improvements, especially in the flexibility of a service supporting heterogeneous end-user devices.
Energy-aware architecture for multi-rate ad hoc networks

Directory of Open Access Journals (Sweden)

Ahmed Yahya

2010-06-01

The backbone of ad hoc network design is energy performance and bandwidth resource limitations. Multi-rate adaptation architectures have been proposed to reduce the control overhead and to increase bandwidth utilization efficiency. In this paper, we propose a multi-rate protocol to provide the highest network performance under very low control overhead. The efficiency of the proposed auto multi-rate protocol is validated through extensive simulations using the QualNet network simulator. The simulation results demonstrate that our solution significantly improves the overall network performance.
Visual search, visual streams, and visual architectures.

Science.gov (United States)

Green, M

1991-10-01

Most psychological, physiological, and computational models of early vision suggest that retinal information is divided into a parallel set of feature modules. The dominant theories of visual search assume that these modules form a "blackboard" architecture: a set of independent representations that communicate only through a central processor. A review of research shows that blackboard-based theories, such as feature-integration theory, cannot easily explain the existing data. The experimental evidence is more consistent with a "network" architecture, which stresses that: (1) feature modules are directly connected to one another, (2) features and their locations are represented together, (3) feature detection and integration are not distinct processing stages, and (4) no executive control process, such as focal attention, is needed to integrate features. Attention is not a spotlight that synthesizes objects from raw features. Instead, it is better to conceptualize attention as an aperture which masks irrelevant visual information.
'Iconic' tracking algorithms for high energy physics using the TRAX-I massively parallel processor

International Nuclear Information System (INIS)

Vesztergombi, G.

1989-01-01

TRAX-I, a cost-effective parallel microcomputer, applying associative string processor (ASP) architecture with 16 K parallel processing elements, is being built by Aspex Microsystems Ltd. (UK). When applied to the tracking problem of very complex events with several hundred tracks, the large number of processors allows one to dedicate one or more processors to each wire (in MWPC), each pixel (in digitized images from streamer chambers or other visual detectors), or each pad (in TPC) to perform very efficient pattern recognition. Some linear tracking algorithms based on this 'iconic' representation are presented. (orig.)
'Iconic' tracking algorithms for high energy physics using the TRAX-I massively parallel processor

International Nuclear Information System (INIS)

Vesztergombi, G.

1989-11-01

TRAX-I, a cost-effective parallel microcomputer, applying Associative String Processor (ASP) architecture with 16 K parallel processing elements, is being built by Aspex Microsystems Ltd. (UK). When applied to the tracking problem of very complex events with several hundred tracks, the large number of processors allows one to dedicate one or more processors to each wire (in MWPC), each pixel (in digitized images from streamer chambers or other visual detectors), or each pad (in TPC) to perform very efficient pattern recognition. Some linear tracking algorithms based on this 'iconic' representation are presented. (orig.)
gFEX, the ATLAS Calorimeter Level-1 Real Time Processor

CERN Document Server

AUTHOR|(SzGeCERN)759889; The ATLAS collaboration; Begel, Michael; Chen, Hucheng; Lanni, Francesco; Takai, Helio; Wu, Weihao

2016-01-01

The global feature extractor (gFEX) is a component of the Level-1 Calorimeter trigger Phase-I upgrade for the ATLAS experiment. It is intended to identify patterns of energy associated with the hadronic decays of high momentum Higgs, W, & Z bosons, top quarks, and exotic particles in real time at the LHC crossing rate. The single processor board will be packaged in an Advanced Telecommunications Computing Architecture (ATCA) module and implemented as a fast reconfigurable processor based on three Xilinx Virtex Ultra-scale FPGAs. The board will receive coarse-granularity information from all the ATLAS calorimeters on 276 optical fibers with the data transferred at the 40 MHz Large Hadron Collider (LHC) clock frequency. The gFEX will be controlled by a single system-on-chip processor, ZYNQ, that will be used to configure all the processor Field-Programmable Gate Arrays (FPGAs), monitor board health, and interface to external signals. Now, the pre-prototype board which includes one ZYNQ and one Virtex-7 FPGA ...
gFEX, the ATLAS Calorimeter Level 1 Real Time Processor

CERN Document Server

Tang, Shaochun; The ATLAS collaboration

2015-01-01

The global feature extractor (gFEX) is a component of the Level-1 Calorimeter trigger Phase-I upgrade for the ATLAS experiment. It is intended to identify patterns of energy associated with the hadronic decays of high momentum Higgs, W, & Z bosons, top quarks, and exotic particles in real time at the LHC crossing rate. The single processor board will be packaged in an Advanced Telecommunications Computing Architecture (ATCA) module and implemented as a fast reconfigurable processor based on three Xilinx Ultra-scale FPGAs. The board will receive coarse-granularity information from all the ATLAS calorimeters on 264 optical fibers with the data transferred at the 40 MHz LHC clock frequency. The gFEX will be controlled by a single system-on-chip processor, ZYNQ, that will be used to configure all the processor FPGAs, monitor board health, and interface to external signals. Now, the pre-prototype board which includes one ZYNQ and one Virtex-7 FPGA has been designed for testing and verification. The performance ...
Hierarchical Communication Network Architectures for Offshore Wind Power Farms

Directory of Open Access Journals (Sweden)

Mohamed A. Ahmed

2014-05-01

Nowadays, large-scale wind power farms (WPFs) bring new challenges for both electric systems and communication networks. Communication networks are an essential part of WPFs because they provide real-time control and monitoring of wind turbines from a remote location (local control center). However, different wind turbine applications have different requirements in terms of data volume, latency, bandwidth, QoS, etc. This paper proposes a hierarchical communication network architecture that consists of a turbine area network (TAN), farm area network (FAN), and control area network (CAN) for offshore WPFs. The two types of offshore WPFs studied are small-scale WPFs close to the grid and medium-scale WPFs far from the grid. The wind turbines are modelled based on the logical nodes (LN) concepts of the IEC 61400-25 standard. To keep pace with current developments in wind turbine technology, the network design takes into account the extension of the LNs for both the wind turbine foundation and meteorological measurements. The proposed hierarchical communication network is based on Switched Ethernet. Servers at the control center are used to store and process the data received from the WPF. The network architecture is modelled and evaluated via OPNET. We investigated the end-to-end (ETE) delay for different WPF applications. The results are validated by comparing the amount of generated sensing data with that of received traffic at servers. The network performance is evaluated, analyzed and discussed in view of end-to-end (ETE) delay for different link bandwidths.
Unified Compact ECC-AES Co-Processor with Group-Key Support for IoT Devices in Wireless Sensor Networks

Directory of Open Access Journals (Sweden)

Luis Parrilla

2018-01-01

Security is a critical challenge for the effective expansion of all new emerging applications in the Internet of Things paradigm. Therefore, it is necessary to define and implement different mechanisms for guaranteeing security and privacy of data interchanged within the multiple wireless sensor networks being part of the Internet of Things. However, in this context, low power and low area are required, limiting the resources available for security and thus hindering the implementation of adequate security protocols. Group keys can save resources and communications bandwidth, but should be combined with public key cryptography to be really secure. In this paper, a compact and unified co-processor for enabling Elliptic Curve Cryptography along with Advanced Encryption Standard with low area requirements and Group-Key support is presented. The designed co-processor allows securing wireless sensor networks independently of the communications protocols used. With an area occupancy of only 2101 LUTs over Spartan 6 devices from Xilinx, it requires 15% less area while achieving nearly 490% better performance when compared to cryptoprocessors with similar features in the literature.
Unified Compact ECC-AES Co-Processor with Group-Key Support for IoT Devices in Wireless Sensor Networks

Science.gov (United States)

Castillo, Encarnación; López-Ramos, Juan A.; Morales, Diego P.

2018-01-01

Security is a critical challenge for the effective expansion of all new emerging applications in the Internet of Things paradigm. Therefore, it is necessary to define and implement different mechanisms for guaranteeing security and privacy of data interchanged within the multiple wireless sensor networks being part of the Internet of Things. However, in this context, low power and low area are required, limiting the resources available for security and thus hindering the implementation of adequate security protocols. Group keys can save resources and communications bandwidth, but should be combined with public key cryptography to be really secure. In this paper, a compact and unified co-processor for enabling Elliptic Curve Cryptography along with Advanced Encryption Standard with low area requirements and Group-Key support is presented. The designed co-processor allows securing wireless sensor networks independently of the communications protocols used. With an area occupancy of only 2101 LUTs over Spartan 6 devices from Xilinx, it requires 15% less area while achieving nearly 490% better performance when compared to cryptoprocessors with similar features in the literature. PMID:29337921
Unified Compact ECC-AES Co-Processor with Group-Key Support for IoT Devices in Wireless Sensor Networks.

Science.gov (United States)

Parrilla, Luis; Castillo, Encarnación; López-Ramos, Juan A; Álvarez-Bermejo, José A; García, Antonio; Morales, Diego P

2018-01-16

Security is a critical challenge for the effective expansion of all new emerging applications in the Internet of Things paradigm. Therefore, it is necessary to define and implement different mechanisms for guaranteeing security and privacy of data interchanged within the multiple wireless sensor networks being part of the Internet of Things. However, in this context, low power and low area are required, limiting the resources available for security and thus hindering the implementation of adequate security protocols. Group keys can save resources and communications bandwidth, but should be combined with public key cryptography to be really secure. In this paper, a compact and unified co-processor for enabling Elliptic Curve Cryptography along with Advanced Encryption Standard with low area requirements and Group-Key support is presented. The designed co-processor allows securing wireless sensor networks independently of the communications protocols used. With an area occupancy of only 2101 LUTs over Spartan 6 devices from Xilinx, it requires 15% less area while achieving nearly 490% better performance when compared to cryptoprocessors with similar features in the literature.
Reconfiguration of Brain Network Architectures between Resting-State and Complexity-Dependent Cognitive Reasoning.

Science.gov (United States)

Hearne, Luke J; Cocchi, Luca; Zalesky, Andrew; Mattingley, Jason B

2017-08-30

Our capacity for higher cognitive reasoning has a measurable limit. This limit is thought to arise from the brain's capacity to flexibly reconfigure interactions between spatially distributed networks. Recent work, however, has suggested that reconfigurations of task-related networks are modest when compared with intrinsic "resting-state" network architecture. Here we combined resting-state and task-driven functional magnetic resonance imaging to examine how flexible, task-specific reconfigurations associated with increasing reasoning demands are integrated within a stable intrinsic brain topology. Human participants (21 males and 28 females) underwent an initial resting-state scan, followed by a cognitive reasoning task involving different levels of complexity, followed by a second resting-state scan. The reasoning task required participants to deduce the identity of a missing element in a 4 × 4 matrix, and item difficulty was scaled parametrically as determined by relational complexity theory. Analyses revealed that external task engagement was characterized by a significant change in functional brain modules. Specifically, resting-state and null-task demand conditions were associated with more segregated brain-network topology, whereas increases in reasoning complexity resulted in merging of resting-state modules. Further increments in task complexity did not change the established modular architecture, but affected selective patterns of connectivity between frontoparietal, subcortical, cingulo-opercular, and default-mode networks. Larger increases in network efficiency within the newly established task modules were associated with higher reasoning accuracy. Our results shed light on the network architectures that underlie external task engagement, and highlight selective changes in brain connectivity supporting increases in task complexity. SIGNIFICANCE STATEMENT Humans have clear limits in their ability to solve complex reasoning problems. It is thought that
SAFETY ON UNTRUSTED NETWORK DEVICES (SOUND)

Science.gov (United States)

2017-10-10

and LSD-41 labs to show how it can work at scale to protect a ship network. 15. SUBJECT TERMS Communities of trust, SAFE architecture, adaptable... environment. Then, SOUND development would extend the SAFE implementation from the CRASH program to allow SAFE hosts to operate in a heterogeneous...hardware level on a SAFE processor (developed under the DARPA CRASH program). This section summarizes our work; more details can be found in [K+14
Use of communication architecture test bed to evaluate data network performance

International Nuclear Information System (INIS)

Clapp, N.E. Jr.; Swail, B.K.; Naser, J.A.

1994-01-01

Local area networks (LANs) are becoming more prevalent in nuclear power plants. Traditionally, LANs were only used as information highways, providing office automation services. LANs are now being used as data highways for applications in plant data acquisition and control systems. A communication architecture test bed, which contains network simulators, is needed to allow network performance studies and to resolve design issues prior to equipment purchase. Two levels of granularity of simulation are needed to provide the dynamic information about network performance. A coarse-grain simulator is used to estimate the dynamic performance of the network due to major resources such as workstations, gateways, and data acquisition systems. A fine-grain simulator allows a greater level of detail about the underlying network protocol and resources to be simulated. The combination of coarse-grain and fine-grain simulation packages provides the network designer with the required tools to thoroughly understand the behavior of the modeled network. This paper describes the development of a communication architecture test bed using commercial network simulation packages. Network simulators allow the resolution of major design issues in software without the expense of purchasing costly hardware components
Research on two-port network of wavelet transform processor using surface acoustic wavelet devices and its application.

Science.gov (United States)

Liu, Shoubing; Lu, Wenke; Zhu, Changchun

2017-11-01

The goal of this research is to study the two-port network of a wavelet transform processor (WTP) using surface acoustic wave (SAW) devices and its application. The work was motivated by the long research and design cycle and the large research funding required by the traditional method in this field, both caused by the lack of a simulation and emulation method for WTPs using SAW devices. For this reason, we introduce the two-port network analysis tool, which has been widely used in the design and analysis of SAW devices with uniform interdigital transducers (IDTs). Because the formula for calculating the admittance parameters of the two-port network can only be used for SAW devices with uniform IDTs, this analysis tool cannot be directly applied to the design and analysis of a processor using SAW devices whose input interdigital transducer (IDT) is apodization-weighted. Therefore, in this paper, we first propose the channel segmentation method, which converts the WTP using SAW devices into parallel channels, and provide formulas for the number of channels, the number of finger pairs, and the static capacitance of an interdigital period in each parallel channel. From these parameters, we can calculate the admittance parameters of the two-port network for each channel, and thus obtain the admittance parameters of the two-port network of the WTP using SAW devices on the basis of the simplification rule for parallel two-port networks. With this analysis tool, we can obtain not only the impulse response function of the WTP using SAW devices but also its matching circuit. Large numbers of studies show that the parameters of the two-port network obtained in this paper are consistent with those measured by network analyzer E5061A, and the impulse response function obtained by the two-port network analysis tool is also consistent with that measured by network analyzer E5061A, which can meet the
The hardware track finder processor in CMS at CERN

CERN Document Server

Kluge, A

1997-01-01

The work covers the design of the Track Finder Processor in the high energy experiment CMS (Compact Muon Solenoid, planned for 2005) at CERN/Geneva. The task of this processor is to identify muons and measure their transverse momentum. The track finder processor makes it possible to determine the physical relevance of each high-energy collision and to forward only interesting data to the data analysis units. Data from more than two hundred thousand detector cells are used to determine the location of muons and measure their transverse momentum. Every 25 ns a new data set is generated. Measurement of location and transverse momentum of the muons can be completed within 350 ns by using an ASIC (Application Specific Integrated Circuit). A pipeline architecture processes new data sets with the required data rate of 40 MHz to ensure dead-time-free operation. In the framework of this study, specifications and the overall concept of the track finder processor were worked out in detail. Simulations were performed...
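The timing figures quoted in this record can be checked with a little arithmetic. The sketch below assumes, purely for illustration, that the 350 ns measurement latency is covered by a pipeline that accepts one data set every 25 ns; the function names are hypothetical and are not taken from the CMS design.

```python
def data_rate_hz(period_ns: float) -> float:
    """Rate at which new data sets arrive, given the spacing between them."""
    return 1e9 / period_ns


def data_sets_in_flight(latency_ns: float, period_ns: float) -> int:
    """Number of data sets inside the pipeline at once (ceiling division)."""
    return int(-(-latency_ns // period_ns))


rate = data_rate_hz(25.0)                  # one data set every 25 ns
depth = data_sets_in_flight(350.0, 25.0)   # 350 ns measurement latency
print(f"{rate / 1e6:.0f} MHz, {depth} data sets in flight")
```

Under these assumptions a 25 ns spacing corresponds to the 40 MHz rate stated in the abstract, and a pipeline must hold 14 data sets at once to avoid dead time.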
A network architecture supporting consistent rich behavior in collaborative interactive applications.

Science.gov (United States)

Marsh, James; Glencross, Mashhuda; Pettifer, Steve; Hubbold, Roger

2006-01-01

Network architectures for collaborative virtual reality have traditionally been dominated by client-server and peer-to-peer approaches, with peer-to-peer strategies typically being favored where minimizing latency is a priority, and client-server where consistency is key. With increasingly sophisticated behavior models and the demand for better support for haptics, we argue that neither approach provides sufficient support for these scenarios and, thus, a hybrid architecture is required. We discuss the relative performance of different distribution strategies in the face of real network conditions and illustrate the problems they face. Finally, we present an architecture that successfully meets many of these challenges and demonstrate its use in a distributed virtual prototyping application which supports simultaneous collaboration for assembly, maintenance, and training applications utilizing haptics.
3D-Flow processor for a programmable Level-1 trigger (feasibility study)

International Nuclear Information System (INIS)

Crosetto, D.

1992-10-01

A feasibility study has been made to use the 3D-Flow processor in a pipelined programmable parallel processing architecture to identify particles such as electrons, jets, muons, etc., in high-energy physics experiments
Initial explorations of ARM processors for scientific computing

International Nuclear Information System (INIS)

Abdurachmanov, David; Elmer, Peter; Eulisse, Giulio; Muzaffar, Shahzad

2014-01-01

Power efficiency is becoming an ever more important metric for both high performance and high throughput computing. Over the course of the next decade it is expected that flops/watt will be a major driver for the evolution of computer architecture. Servers with large numbers of ARM processors, already ubiquitous in mobile computing, are a promising alternative to traditional x86-64 computing. We present the results of our initial investigations into the use of ARM processors for scientific computing applications. In particular we report the results from our work with a current generation ARMv7 development board to explore ARM-specific issues regarding the software development environment, operating system, performance benchmarks and issues for porting High Energy Physics software
Optimum Neural Network Architecture for Precipitation Prediction of Myanmar

OpenAIRE

Khaing Win Mar; Thinn Thu Naing

2008-01-01

Nowadays, precipitation prediction is required for proper planning and management of water resources. Prediction with neural network models has received increasing interest in various research and application domains. However, it is difficult to determine the best neural network architecture for prediction since it is not immediately obvious how many input or hidden nodes are used in the model. In this paper, neural network model is used as a forecasting tool. The major aim is to evaluate a s...

Building and measuring a high performance network architecture

Energy Technology Data Exchange (ETDEWEB)

Kramer, William T.C.; Toole, Timothy; Fisher, Chuck; Dugan, Jon; Wheeler, David; Wing, William R; Nickless, William; Goddard, Gregory; Corbato, Steven; Love, E. Paul; Daspit, Paul; Edwards, Hal; Mercer, Linden; Koester, David; Decina, Basil; Dart, Eli; Reisinger, Paul; Kurihara, Riki; Zekauskas, Matthew J; Plesset, Eric; Wulf, Julie; Luce, Douglas; Rogers, James; Duncan, Rex; Mauth, Jeffery

2001-04-20

Once a year, the SC conferences present a unique opportunity to create and build one of the most complex and highest performance networks in the world. At SC2000, large-scale and complex local and wide area networking connections were demonstrated, including large-scale distributed applications running on different architectures. This project was designed to use the unique opportunity presented at SC2000 to create a testbed network environment and then use that network to demonstrate and evaluate high performance computational and communication applications. This testbed was designed to incorporate many interoperable systems and services and was designed for measurement from the very beginning. The end results were key insights into how to use novel, high performance networking technologies and to accumulate measurements that will give insights into the networks of the future.
Two-dimensional optoelectronic interconnect-processor and its operational bit error rate

Science.gov (United States)

Liu, J. Jiang; Gollsneider, Brian; Chang, Wayne H.; Carhart, Gary W.; Vorontsov, Mikhail A.; Simonis, George J.; Shoop, Barry L.

2004-10-01

A two-dimensional (2-D) multi-channel 8x8 optical interconnect and processor system was designed and developed using complementary metal-oxide-semiconductor (CMOS) driven 850-nm vertical-cavity surface-emitting laser (VCSEL) arrays and photodetector (PD) arrays with corresponding wavelengths. We performed operation and bit-error-rate (BER) analysis on these free-space integrated 8x8 VCSEL optical interconnects driven by silicon-on-sapphire (SOS) circuits. A pseudo-random bit stream (PRBS) data sequence was used in operation of the interconnects. Eye diagrams were measured from individual channels and analyzed using a digital oscilloscope at data rates from 155 Mb/s to 1.5 Gb/s. Using a statistical model of Gaussian distribution for the random noise in the transmission, we developed a method to compute the BER instantaneously from the digital eye diagrams. Direct measurements on these interconnects were also taken on a standard BER tester for verification. We found that the results of the two methods were of the same order and agreed to within 50%. The integrated interconnects were investigated in an optoelectronic processing architecture of a digital halftoning image processor. Error diffusion networks implemented by the inherently parallel nature of photonics promise to provide high quality digital halftoned images.
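The abstract does not give the authors' formula for deriving BER from an eye diagram. A common textbook estimate under the same Gaussian-noise assumption uses the Q-factor of the eye, and the sketch below shows that estimate only as an illustration. The function names and the sample statistics are hypothetical, not values from the paper.

```python
import math


def q_factor(mean_one, mean_zero, sigma_one, sigma_zero):
    """Q-factor of an eye diagram from the level means and noise std devs."""
    return (mean_one - mean_zero) / (sigma_one + sigma_zero)


def ber_from_q(q):
    """Gaussian-noise BER estimate: BER = 0.5 * erfc(Q / sqrt(2))."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))


# Hypothetical eye-diagram statistics (arbitrary voltage units), not measured data.
q = q_factor(mean_one=1.0, mean_zero=0.0, sigma_one=0.09, sigma_zero=0.07)
print(f"Q = {q:.2f}, estimated BER = {ber_from_q(q):.2e}")
```

Because the estimate needs only the means and standard deviations of the two logic levels, it can be updated as soon as an oscilloscope histogram is available, whereas a BER tester has to count errors over a long bit sequence.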
A discussion of tools and techniques for distributed processor based control systems using CAMAC

International Nuclear Information System (INIS)

Tippie, J.W.; Scandora, A.E.

1985-01-01

This paper describes and analyzes various distributed processor architectures using commercially available CAMAC components. The general orientation is toward distributed control systems using Digital Equipment Corporation LSI11 processors in a CAMAC environment. The paper describes in detail software tools available to simplify the development of applications software and to provide a high-level runtime environment both at the host and the remote processors. Discussion focuses on techniques for downloading of operating systems from a large host and applications tasks written in high-level languages. It also discusses software tools which enable tasks in the remote processors to exchange messages and data with tasks in the host in a simple and elegant way
Security Policy for a Generic Space Exploration Communication Network Architecture

Science.gov (United States)

Ivancic, William D.; Sheehe, Charles J.; Vaden, Karl R.

2016-01-01

This document is one of three. It describes various security mechanisms and a security policy profile for a generic space-based communication architecture. Two other documents accompany this document: an Operations Concept (OpsCon) and a communication architecture document. The OpsCon should be read first, followed by the security policy profile described by this document and then the architecture document. The overall goal is to design a generic space exploration communication network architecture that is affordable, deployable, maintainable, securable, evolvable, reliable, and adaptable. The architecture should also require limited reconfiguration throughout system development and deployment. System deployment includes subsystem development in a factory setting, system integration in a laboratory setting, launch preparation, launch, and deployment and operation in space.
Software architecture for hybrid electrical/optical data center network

DEFF Research Database (Denmark)

Mehmeri, Victor; Vegas Olmos, Juan José; Tafur Monroy, Idelfonso

2016-01-01

This paper presents hardware and software architecture based on Software-Defined Networking (SDN) paradigm and OpenFlow/NETCONF protocols for enabling topology management of hybrid electrical/optical switching data center networks. In particular, a development on top of SDN open-source controller...... OpenDaylight is presented to control an optical switching matrix based on Micro-Electro-Mechanical System (MEMS) technology....
Hubs of Anticorrelation in High-Resolution Resting-State Functional Connectivity Network Architecture.

Science.gov (United States)

Gopinath, Kaundinya; Krishnamurthy, Venkatagiri; Cabanban, Romeo; Crosson, Bruce A

2015-06-01

A major focus of brain research recently has been to map the resting-state functional connectivity (rsFC) network architecture of the normal brain and pathology through functional magnetic resonance imaging. However, the phenomenon of anticorrelations in resting-state signals between different brain regions has not been adequately examined. The preponderance of studies on resting-state fMRI (rsFMRI) have either ignored anticorrelations in rsFC networks or adopted methods in data analysis, which have rendered anticorrelations in rsFC networks uninterpretable. The few studies that have examined anticorrelations in rsFC networks using conventional methods have found anticorrelations to be weak in strength and not very reproducible across subjects. Anticorrelations in rsFC network architecture could reflect mechanisms that subserve a number of important brain processes. In this preliminary study, we examined the properties of anticorrelated rsFC networks by systematically focusing on negative cross-correlation coefficients (CCs) among rsFMRI voxel time series across the brain with graph theory-based network analysis. A number of methods were implemented to enhance the neuronal specificity of resting-state functional connections that yield negative CCs, although at the cost of decreased sensitivity. Hubs of anticorrelation were seen in a number of cortical and subcortical brain regions. Examination of the anticorrelation maps of these hubs indicated that negative CCs in rsFC network architecture highlight a number of regulatory interactions between brain networks and regions, including reciprocal modulations, suppression, inhibition, and neurofeedback.
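As a toy illustration of the graph-theoretic idea in the abstract above, the sketch below links two nodes whenever the cross-correlation of their time series is sufficiently negative and then counts links per node; the node with the most links is the anticorrelation hub. The threshold, the node names and the four-sample series are hypothetical, and none of the study's preprocessing for neuronal specificity is reproduced.

```python
import math

def pearson(a, b):
    """Pearson cross-correlation coefficient of two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def anticorrelation_degree(series, threshold=-0.5):
    """For each node, count the nodes whose correlation with it is <= threshold."""
    names = list(series)
    degree = {name: 0 for name in names}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if pearson(series[u], series[v]) <= threshold:
                degree[u] += 1
                degree[v] += 1
    return degree

# Hypothetical time series standing in for voxel signals.
series = {
    "h": [1, 2, 3, 4],
    "x": [4, 3, 2, 1],
    "y": [8, 6, 4, 2],
    "w": [1, -1, -1, 1],
}
degree = anticorrelation_degree(series)
print(degree, "hub:", max(degree, key=degree.get))
```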
A Software Implementation of a Satellite Interface Message Processor.

Science.gov (United States)

Eastwood, Margaret A.; Eastwood, Lester F., Jr.

A design for network control software for a computer network is described in which some nodes are linked by a communications satellite channel. It is assumed that the network has an ARPANET-like configuration; that is, that specialized processors at each node are responsible for message switching and network control. The purpose of the control…
Design of an ultra-low-power digital processor for passive UHF RFID tags

Energy Technology Data Exchange (ETDEWEB)

Shi Wanggen; Zhuang Yiqi; Li Xiaoming; Wang Xianghua; Jin Zhao; Wang Dan, E-mail: wanggen_shi@163.co [Key Laboratory of the Ministry of Education for Wide Band-Gap Semiconductor Materials and Devices, Institute of Microelectronics, Xidian University, Xi'an 710071 (China)]

2009-04-15

A new architecture of digital processors for passive UHF radio-frequency identification tags is proposed. This architecture is based on ISO/IEC 18000-6C and targeted at ultra-low power consumption. By applying methods like system-level power management, global clock gating and low voltage implementation, the total power of the design is reduced to a few microwatts. In addition, an innovative way for the design of a true RNG is presented, which contributes to both low power and secure data transaction. The digital processor is verified by an integrated FPGA platform and implemented by the Synopsys design kit for ASIC flows. The design fits different CMOS technologies and has been taped out using the 2P4M 0.35 μm process of Chartered Semiconductor.
Distributed computing methodology for training neural networks in an image-guided diagnostic application.

Science.gov (United States)

Plagianakos, V P; Magoulas, G D; Vrahatis, M N

2006-03-01

Distributed computing is a process through which a set of computers connected by a network is used collectively to solve a single problem. In this paper, we propose a distributed computing methodology for training neural networks for the detection of lesions in colonoscopy. Our approach is based on partitioning the training set across multiple processors using a parallel virtual machine. In this way, interconnected computers of varied architectures can be used for the distributed evaluation of the error function and gradient values, and, thus, training neural networks utilizing various learning methods. The proposed methodology has large granularity and low synchronization, and has been implemented and tested. Our results indicate that the parallel virtual machine implementation of the training algorithms developed leads to considerable speedup, especially when large network architectures and training sets are used.
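The partitioning idea in the abstract above can be sketched as follows: each worker evaluates the error function and its gradient on its own slice of the training set, and the partial sums are added to recover the full-batch values. This is a minimal sketch that swaps Python threads in for the parallel virtual machine used in the paper and uses a one-parameter linear model instead of a neural network; all names and data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_error_and_grad(w, chunk):
    """Sum of squared errors, and its derivative w.r.t. w, for y = w * x on one partition."""
    err = sum((w * x - y) ** 2 for x, y in chunk)
    grad = sum(2 * (w * x - y) * x for x, y in chunk)
    return err, grad

def partition(data, n_workers):
    """Deal the training set out round-robin, one slice per worker."""
    return [data[i::n_workers] for i in range(n_workers)]

def distributed_error_and_grad(w, data, n_workers=4):
    """Evaluate the partitions concurrently, then reduce by summation."""
    chunks = partition(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(lambda c: partial_error_and_grad(w, c), chunks))
    return sum(e for e, _ in results), sum(g for _, g in results)

# Hypothetical training set generated by y = 3x.
data = [(x, 3 * x) for x in range(1, 9)]
print(distributed_error_and_grad(2, data))
print(partial_error_and_grad(2, data))  # single-worker reference, same totals
```

Only two numbers per worker travel back to the reducer at each step, which is why this scheme has the large granularity and low synchronization cost the abstract mentions.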
Parallel Processor for 3D Recovery from Optical Flow

Directory of Open Access Journals (Sweden)

Jose Hugo Barron-Zambrano

2009-01-01

3D recovery from motion has received major attention in computer vision systems in recent years. The main problem lies in the number of operations and memory accesses to be performed by the majority of the existing techniques when translated to hardware or software implementations. This paper proposes a parallel processor for 3D recovery from optical flow. Its main feature is the maximum reuse of data and the low number of clock cycles to calculate the optical flow, along with the precision with which 3D recovery is achieved. The results of the proposed architecture as well as those from processor synthesis are presented.
Softwarization of Mobile Network Functions towards Agile and Energy Efficient 5G Architectures: A Survey

Directory of Open Access Journals (Sweden)

Dlamini Thembelihle

2017-01-01

Future mobile networks (MNs) are required to be flexible with minimal infrastructure complexity, unlike current ones that rely on proprietary network elements to offer their services. Moreover, they are expected to make use of renewable energy to decrease their carbon footprint and of virtualization technologies for improved adaptability and flexibility, thus resulting in green and self-organized systems. In this article, we discuss the application of software defined networking (SDN) and network function virtualization (NFV) technologies towards softwarization of the mobile network functions, taking into account different architectural proposals. In addition, we elaborate on whether mobile edge computing (MEC), a new architectural concept that uses NFV techniques, can enhance communication in 5G cellular networks, reducing latency due to its proximity deployment. Besides discussing existing techniques, expounding their pros and cons and comparing state-of-the-art architectural proposals, we examine the role of machine learning and data mining tools, analyzing their use within fully SDN- and NFV-enabled mobile systems. Finally, we outline the challenges and the open issues related to evolved packet core (EPC) and MEC architectures.
The language parallel Pascal and other aspects of the massively parallel processor

Science.gov (United States)

Reeves, A. P.; Bruner, J. D.

1982-01-01

A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.
Design Methodology of a Sensor Network Architecture Supporting Urgent Information and Its Evaluation

Science.gov (United States)

Kawai, Tetsuya; Wakamiya, Naoki; Murata, Masayuki

Wireless sensor networks are expected to become an important social infrastructure that helps make our lives safe, secure, and comfortable. In this paper, we propose a design methodology for an architecture for fast and reliable transmission of urgent information in wireless sensor networks. In this methodology, instead of establishing a single complicated monolithic mechanism, several simple and fully distributed control mechanisms that function at different spatial and temporal levels are incorporated on each node. These mechanisms work autonomously and independently, responding to the surrounding situation. We also show an example of a network architecture designed following the methodology. We evaluated the performance of the architecture by extensive simulation and practical experiments, and our claim was supported by the results of these experiments.
The Square Kilometre Array Science Data Processor. Preliminary compute platform design

International Nuclear Information System (INIS)

Broekema, P.C.; Nieuwpoort, R.V. van; Bal, H.E.

2015-01-01

The Square Kilometre Array is a next-generation radio telescope, to be built in South Africa and Western Australia. It is currently in its detailed design phase, with procurement and construction scheduled to start in 2017. The SKA Science Data Processor is the high-performance computing element of the instrument, responsible for producing science-ready data. This is a major IT project, with the Science Data Processor expected to challenge the computing state of the art even in 2020. In this paper we introduce the preliminary Science Data Processor design and the principles that guide the design process, as well as the constraints on the design. We introduce a highly scalable and flexible system architecture capable of handling the SDP workload.
Computer architecture fundamentals and principles of computer design

CERN Document Server

Dumas II, Joseph D

2005-01-01

Introduction to Computer Architecture; What is Computer Architecture?; Architecture vs. Implementation; Brief History of Computer Systems; The First Generation; The Second Generation; The Third Generation; The Fourth Generation; Modern Computers - The Fifth Generation; Types of Computer Systems; Single Processor Systems; Parallel Processing Systems; Special Architectures; Quality of Computer Systems; Generality and Applicability; Ease of Use; Expandability; Compatibility; Reliability; Success and Failure of Computer Architectures and Implementations; Quality and the Perception of Quality; Cost Issues; Architectural Openness, Market Timi
Navigation Architecture for a Space Mobile Network

Science.gov (United States)

Valdez, Jennifer E.; Ashman, Benjamin; Gramling, Cheryl; Heckler, Gregory W.; Carpenter, Russell

2016-01-01

The Tracking and Data Relay Satellite System (TDRSS) Augmentation Service for Satellites (TASS) is a proposed beacon service to provide a global, space based GPS augmentation service based on the NASA Global Differential GPS (GDGPS) System. The TASS signal will be tied to the GPS time system and usable as an additional ranging and Doppler radiometric source. Additionally, it will provide data vital to autonomous navigation in the near Earth regime, including space weather information, TDRS ephemerides, Earth Orientation Parameters (EOP), and forward commanding capability. TASS benefits include enhancing situational awareness, enabling increased autonomy, and providing near real-time command access for user platforms. As NASA Headquarters' Space Communication and Navigation Office (SCaN) begins to move away from a centralized network architecture and towards a Space Mobile Network (SMN) that allows for user initiated services, autonomous navigation will be a key part of such a system. This paper explores how a TASS beacon service enables the Space Mobile Networking paradigm, what a typical user platform would require, and provides an in-depth analysis of several navigation scenarios and operations concepts. This paper provides an overview of the TASS beacon and its role within the SMN and user community. Supporting navigation analysis is presented for two user mission scenarios: an Earth observing spacecraft in low earth orbit (LEO), and a highly elliptical spacecraft in a lunar resonance orbit. These diverse flight scenarios indicate the breadth of applicability of the TASS beacon for upcoming users within the current network architecture and in the SMN.
Network Architecture: lessons from the past, vision for the future

CERN Multimedia

CERN. Geneva

2004-01-01

The Architectural Principles of the Internet have dominated the past decade. Orthogonal to the telecommunications industry principles, they dramatically changed the networking landscape because they relied on iconoclastic ideas. First, the Internet end-to-end principle, which stipulates that the network should intervene minimally on the end-to-end traffic, pushing the complexity to the end-systems. Second, the ban of centralized functions: all the Internet techniques (routing, DNS, management) are based on distributed, decentralized mechanisms. Third, the absolute domination of connectionless (stateless) protocols (as with IP, HTTP). However, when facing new requirements (multimedia traffic, security, Grid applications), these principles sometimes appear as architectural barriers. Multimedia requires QoS guarantees, but stateless systems are not good at QoS. Security requires active, intelligent networks, but dumb routers or plain end-to-end mail systems are insufficient. Grid applications require...
Quantum chemistry on a superconducting quantum processor

Energy Technology Data Exchange (ETDEWEB)

Kaicher, Michael P.; Wilhelm, Frank K. [Theoretical Physics, Saarland University, 66123 Saarbruecken (Germany); Love, Peter J. [Department of Physics and Astronomy, Tufts University, Medford, MA 02155 (United States)

2016-07-01

Quantum chemistry is the most promising civilian application for quantum processors to date. We study its adaptation to superconducting (sc) quantum systems, computing the ground state energy of LiH through a variational hybrid quantum classical algorithm. We demonstrate how interactions native to sc qubits further reduce the amount of quantum resources needed, pushing sc architectures as a near-term candidate for simulations of more complex atoms/molecules.
Interference control by best-effort process duty-cycling in chip multi-processor systems for real-time medical image processing

NARCIS (Netherlands)

Westmijze, M.; Bekooij, Marco Jan Gerrit; Smit, Gerardus Johannes Maria

2013-01-01

Systems with chip multi-processors are currently used for several applications that have real-time requirements. In chip multi-processor architectures, many hardware resources such as parts of the cache hierarchy are shared between cores and by using such resources, applications can significantly
Ensemble Network Architecture for Deep Reinforcement Learning

Directory of Open Access Journals (Sweden)

Xi-liang Chen

2018-01-01

The popular deep Q-learning algorithm is known to be unstable because of oscillating Q-values and the overestimation of action values under certain conditions. These issues tend to adversely affect its performance. In this paper, we develop an ensemble network architecture for deep reinforcement learning that is based on value function approximation. The temporal ensemble stabilizes the training process by reducing the variance of the target approximation error, and the ensemble of target values reduces overestimation and improves performance by estimating Q-values more accurately. Our results show that this architecture leads to statistically significantly better value evaluation and more stable and better performance on several classical control tasks in the OpenAI Gym environment.

A high-speed analog neural processor

NARCIS (Netherlands)

Masa, Peter; Hoen, Klaas; Wallinga, Hans

1994-01-01

Targeted at high-energy physics research applications, our special-purpose analog neural processor can classify up to 70 dimensional vectors within 50 nanoseconds. The decision-making process of the implemented feedforward neural network enables this type of computation to tolerate weight
Modular Neural Tile Architecture for Compact Embedded Hardware Spiking Neural Network

NARCIS (Netherlands)

Pande, Sandeep; Morgan, Fearghal; Cawley, Seamus; Bruintjes, Tom; Smit, Gerardus Johannes Maria; McGinley, Brian; Carrillo, Snaider; Harkin, Jim; McDaid, Liam

2013-01-01

Biologically-inspired packet switched network on chip (NoC) based hardware spiking neural network (SNN) architectures have been proposed as an embedded computing platform for classification, estimation and control applications. Storage of large synaptic connectivity (SNN topology) information in
Evaluation of existing and proposed computer architectures for future ground-based systems

Science.gov (United States)

Schulbach, C.

1985-01-01

Parallel processing architectures and techniques used in current supercomputers are described and projections are made of future advances. Presently, the von Neumann sequential processing pattern has been accelerated by having separate I/O processors, interleaved memories, wide memories, independent functional units and pipelining. Recent supercomputers have featured single-instruction, multiple data stream architectures, which have different processors for performing various operations (vector or pipeline processors). Multiple-instruction, multiple data stream machines have also been developed. Data flow techniques, wherein program instructions are activated only when data are available, are expected to play a large role in future supercomputers, along with increased parallel processor arrays. The enhanced operational speeds are essential for adequately treating data from future spacecraft remote sensing instruments such as the Thematic Mapper.
Class network routing

Science.gov (United States)

Bhanot, Gyan [Princeton, NJ; Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Takken, Todd E [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY

2009-09-08

Class network routing is implemented in a network such as a computer network comprising a plurality of parallel compute processors at nodes thereof. Class network routing allows a compute processor to broadcast a message to a range (one or more) of other compute processors in the computer network, such as processors in a column or a row. Normally this type of operation requires a separate message to be sent to each processor. With class network routing pursuant to the invention, a single message is sufficient, which generally reduces the total number of messages in the network as well as the latency to do a broadcast. Class network routing is also applied to dense matrix inversion algorithms on distributed memory parallel supercomputers with hardware class function (multicast) capability. This is achieved by exploiting the fact that the communication patterns of dense matrix inversion can be served by hardware class functions, which results in faster execution times.
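The message-count argument in the abstract above can be made concrete with a small sketch. It only counts the messages a source node has to inject to reach the rest of its row on a 2-D grid, with and without a multicast class; it does not model the patented routing hardware, and the function names and the "row-class" tag are hypothetical.

```python
def unicast_row_broadcast(src, n_cols):
    """Messages the source must inject to reach its row without multicast support."""
    row, col = src
    return [(src, (row, c)) for c in range(n_cols) if c != col]

def class_row_broadcast(src, n_cols):
    """With a hardware class function: one injected message, replicated by the network."""
    row, col = src
    receivers = {(row, c) for c in range(n_cols) if c != col}
    return [(src, "row-class")], receivers

src, n_cols = (2, 3), 8
unicast = unicast_row_broadcast(src, n_cols)
multicast, receivers = class_row_broadcast(src, n_cols)
print(len(unicast), "unicast messages versus", len(multicast), "class-routed message")
print("same receivers:", receivers == {dst for _, dst in unicast})
```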
SANDS: an architecture for clinical decision support in a National Health Information Network.

Science.gov (United States)

Wright, Adam; Sittig, Dean F

2007-10-11

A new architecture for clinical decision support called SANDS (Service-oriented Architecture for NHIN Decision Support) is introduced and its performance evaluated. The architecture provides a method for performing clinical decision support across a network, as in a health information exchange. Using the prototype we demonstrated that, first, a number of useful types of decision support can be carried out using our architecture; and, second, that the architecture exhibits desirable reliability and performance characteristics.
Computing on Knights and Kepler Architectures

International Nuclear Information System (INIS)

Bortolotti, G; Caberletti, M; Ferraro, A; Giacomini, F; Manzali, M; Maron, G; Salomoni, D; Crimi, G; Zanella, M

2014-01-01

A recent trend in scientific computing is the increasingly important role of co-processors, originally built to accelerate graphics rendering, and now used for general high-performance computing. The INFN Computing On Knights and Kepler Architectures (COKA) project focuses on assessing the suitability of co-processor boards for scientific computing in a wide range of physics applications, and on studying the best programming methodologies for these systems. Here we present in a comparative way our results in porting a Lattice Boltzmann code on two state-of-the-art accelerators: the NVIDIA K20X, and the Intel Xeon-Phi. We describe our implementations, analyze results and compare with a baseline architecture adopting Intel Sandy Bridge CPUs.
A High-Speed and Low-Energy-Consumption Processor for SVD-MIMO-OFDM Systems

Directory of Open Access Journals (Sweden)

Hiroki Iwaizumi

2013-01-01

A processor design for singular value decomposition (SVD) and compression/decompression of feedback matrices, which are mandatory operations for SVD multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) systems, is proposed and evaluated. SVD-MIMO is a transmission method for suppressing multistream interference and improving communication quality by beamforming. An application specific instruction-set processor (ASIP) architecture is adopted to achieve flexibility in terms of operations and matrix size. The proposed processor realizes a high-speed/low-power design and real-time processing by the parallelization of floating-point units (FPUs) and arithmetic instructions specialized in complex matrix operations.
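Why SVD beamforming suppresses multistream interference can be seen from a short textbook derivation. The symbols are assumptions introduced here, not notation from the paper: H is the channel matrix for one subcarrier, s the vector of transmitted streams, n the receiver noise, and U, Σ, V the factors of the SVD of H.

```latex
\[
H = U \Sigma V^{H}, \qquad x = V s, \qquad y = H x + n
\]
\[
U^{H} y = U^{H} U \, \Sigma \, V^{H} V s + U^{H} n = \Sigma s + U^{H} n
\]
```

Because Σ is diagonal, each stream sees its own scalar gain and no contribution from the other streams. The transmitter needs V to precode, which is why the feedback matrices have to be compressed, sent back and decompressed.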
A Real-Time Sound Field Rendering Processor

Directory of Open Access Journals (Sweden)

Tan Yiyu

2017-12-01

Real-time sound field renderings are computationally intensive and memory-intensive. Traditional rendering systems based on computer simulations are limited by memory bandwidth and arithmetic units. The computation is time-consuming, and the sample rate of the output sound is low because of the long computation time at each time step. In this work, a processor with a hybrid architecture is proposed to speed up computation and improve the sample rate of the output sound, and an interface is developed for system scalability through simply cascading many chips to enlarge the simulated area. To render a three-minute Beethoven wave sound in a small shoe-box room with dimensions of 1.28 m × 1.28 m × 0.64 m, the field-programmable gate array (FPGA)-based prototype machine with the proposed architecture carries out the sound rendering at run-time, while the software simulation with OpenMP parallelization takes about 12.70 min on a personal computer (PC) with 32 GB of random access memory (RAM) and an Intel i7-6800K six-core processor running at 3.4 GHz. The throughput in the software simulation is about 194 M grids/s while it is 51.2 G grids/s in the prototype machine, even though the clock frequency of the prototype machine is much lower than that of the PC. The rendering processor with a processing element (PE) and interfaces consumes about 238,515 gates when fabricated in the 0.18 µm process technology of ROHM Semiconductor Co., Ltd. (Kyoto, Japan), and the power consumption is about 143.8 mW.
Design mobile satellite system architecture as an integral part of the cellular access digital network

Science.gov (United States)

Chien, E. S. K.; Marinho, J. A.; Russell, J. E., Sr.

1988-01-01

The Cellular Access Digital Network (CADN) is the access vehicle through which cellular technology is brought into the mainstream of the evolving integrated telecommunications network. Beyond the integrated end-to-end digital access and per call network services provisioning of the Integrated Services Digital Network (ISDN), the CADN engenders the added capability of mobility freedom via wireless access. One key element of the CADN network architecture is the standard user to network interface that is independent of RF transmission technology. Since the Mobile Satellite System (MSS) is envisioned to not only complement but also enhance the capabilities of the terrestrial cellular telecommunications network, compatibility and interoperability between terrestrial cellular and mobile satellite systems are vitally important to provide an integrated moving telecommunications network of the future. From a network standpoint, there exist very strong commonalities between the terrestrial cellular system and the mobile satellite system. Therefore, the MSS architecture should be designed as an integral part of the CADN. This paper describes the concept of the CADN, the functional architecture of the MSS, and the user-network interface signaling protocols.
Analysis of the computational requirements of a pulse-doppler radar signal processor

CSIR Research Space (South Africa)

Broich, R

2012-05-01

In an attempt to find an optimal processing architecture for radar signal processing applications, the different algorithms that are typically used in a pulse-Doppler radar signal processor are investigated. Radar algorithms are broken down...
Efficient Programming for Multicore Processor Heterogeneity: OpenMP versus OmpSs

OpenAIRE

Butko, Anastasiia; Bruguier, Florent; Gamatié, Abdoulaye; Sassatelli, Gilles

2017-01-01

ARM single-ISA heterogeneous multicore processors combine high-performance big cores with power-efficient small cores. They aim at achieving a suitable balance between performance and energy. However, a main challenge is to program such architectures so as to efficiently exploit their features. In this paper, we study the impact on performance and energy trade-offs of single-ISA architecture according to OpenMP 3.0 and the OmpSs programming models. We consider differ...
Survivable architectures for time and wavelength division multiplexed passive optical networks

Science.gov (United States)

Wong, Elaine

2014-08-01

The increased network reach and customer base of next-generation time and wavelength division multiplexed PONs (TWDM-PONs) have necessitated rapid fault detection and subsequent restoration of services to their users. However, direct application of existing solutions for conventional PONs to TWDM-PONs is unsuitable as these schemes rely on the loss of signal (LOS) of upstream transmissions to trigger protection switching. As TWDM-PONs are required to potentially use sleep/doze mode optical network units (ONUs), the loss of upstream transmission from a sleeping or dozing ONU could erroneously trigger protection switching. Further, TWDM-PONs require their monitoring modules for fiber/device fault detection to be more sensitive than those typically deployed in conventional PONs. To address the above issues, three survivable architectures that are compliant with TWDM-PON specifications are presented in this work. These architectures combine rapid detection and protection switching against multipoint failure, and most importantly do not rely on upstream transmissions for LOS activation. Survivability analyses as well as evaluations of the additional costs incurred to achieve survivability are performed and compared to the unprotected TWDM-PON. Network parameters that impact the maximum achievable network reach, maximum split ratio, connection availability, fault impact, and the incremental reliability costs for each proposed survivable architecture are highlighted.
Applying the roofline performance model to the intel xeon phi knights landing processor

OpenAIRE

Doerfler, D; Deslippe, J; Williams, S; Oliker, L; Cook, B; Kurth, T; Lobet, M; Malas, T; Vay, JL; Vincenti, H

2016-01-01

© Springer International Publishing AG 2016. The Roofline Performance Model is a visually intuitive method used to bound the sustained peak floating-point performance of any given arithmetic kernel on any given processor architecture. In the Roofline, performance is nominally measured in floating-point operations per second as a function of arithmetic intensity (operations per byte of data). In this study we determine the Roofline for the Intel Knights Landing (KNL) processor, determining t...
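The bound described in the abstract above reduces to a single expression: attainable performance is the lower of the compute ceiling and the memory bandwidth multiplied by arithmetic intensity. The sketch below uses a hypothetical machine with round numbers; they are not measured KNL figures, and the function names are chosen here for illustration.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, arithmetic_intensity):
    """Roofline bound: the lower of the compute ceiling and the memory ceiling."""
    return min(peak_gflops, bandwidth_gbs * arithmetic_intensity)

def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (flops/byte) above which a kernel is compute-bound."""
    return peak_gflops / bandwidth_gbs

# Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
PEAK, BANDWIDTH = 1000.0, 100.0
for intensity in (0.5, 2.0, 10.0, 50.0):
    print(intensity, attainable_gflops(PEAK, BANDWIDTH, intensity))
print("ridge point:", ridge_point(PEAK, BANDWIDTH), "flops/byte")
```

Kernels whose intensity lies left of the ridge point are limited by memory traffic, so raising the compute peak does not help them; only more bandwidth or more data reuse does.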
ATLANTIDES: An Architecture for Alert Verification in Network Intrusion Detection Systems

NARCIS (Netherlands)

Bolzoni, D.; Crispo, Bruno; Etalle, Sandro

2007-01-01

We present an architecture designed for alert verification (i.e., to reduce false positives) in network intrusion-detection systems. Our technique is based on a systematic (and automatic) anomaly-based analysis of the system output, which provides useful context information regarding the network
Architecture design of the multi-functional wavelet-based ECG microprocessor for realtime detection of abnormal cardiac events.

Science.gov (United States)

Cheng, Li-Fang; Chen, Tung-Chien; Chen, Liang-Gee

2012-01-01

Most abnormal cardiac events such as myocardial ischemia, acute myocardial infarction (AMI) and fatal arrhythmia can be diagnosed through continuous electrocardiogram (ECG) analysis. According to recent clinical research, early detection and alarming of such cardiac events can reduce the time delay to the hospital, and the clinical outcomes of these individuals can be greatly improved. Therefore, it would be helpful to have a long-term ECG monitoring system with the ability to identify abnormal cardiac events and provide realtime warning for the users. The combination of the wireless body area sensor network (BASN) and the on-sensor ECG processor is a possible solution for this application. In this paper, we aim to design and implement a digital signal processor that is suitable for continuous ECG monitoring and alarming based on the continuous wavelet transform (CWT) through the proposed architectures, using both a programmable RISC processor and application specific integrated circuits (ASIC) for performance optimization. According to the implementation results, the power consumption of the proposed processor integrated with an ASIC for CWT computation is only 79.4 mW. Compared with the single-RISC processor, a power reduction of about 91.6% is achieved.
Design of an ultra-low-power digital processor for passive UHF RFID tags

International Nuclear Information System (INIS)

Shi Wanggen; Zhuang Yiqi; Li Xiaoming; Wang Xianghua; Jin Zhao; Wang Dan

2009-01-01

A new architecture of digital processors for passive UHF radio-frequency identification tags is proposed. This architecture is based on ISO/IEC 18000-6C and targeted at ultra-low power consumption. By applying methods like system-level power management, global clock gating and low voltage implementation, the total power of the design is reduced to a few microwatts. In addition, an innovative way for the design of a true RNG is presented, which contributes to both low power and secure data transaction. The digital processor is verified by an integrated FPGA platform and implemented by the Synopsys design kit for ASIC flows. The design fits different CMOS technologies and has been taped out using the 2P4M 0.35 μm process of Chartered Semiconductor.
Utilizing a multiprocessor architecture - The performance of MIDAS

International Nuclear Information System (INIS)

Maples, C.; Logan, D.; Meng, J.; Rathbun, W.; Weaver, D.

1983-01-01

The MIDAS architecture organizes multiple CPUs into clusters called distributed subsystems. Each subsystem consists of an array of processors controlled by a supervisory CPU. The multiprocessor array is composed of commercial CPUs (with floating point hardware) and specialized processing elements. Interprocessor communication within the array may occur either through switched memory modules or common shared memory. The architecture permits multiple processors to be focused on single problems. A distributed subsystem has been constructed and tested. It currently consists of a supervisor CPU; 16 blocks of independently switchable memory; 9 general purpose, VAX-class CPUs; and 2 specialized pipelined processors to handle I/O. Results on a variety of problems indicate that the subsystem performs 8 to 15 times faster than a standard computer with an identical CPU. The difference in performance represents the effect of differing CPU and I/O requirements
T-CREST: Time-predictable multi-core architecture for embedded systems

DEFF Research Database (Denmark)

Schoeberl, Martin; Abbaspourseyedi, Sahar; Jordan, Alexander

2015-01-01

-core architectures that are optimized for the WCET instead of the average-case execution time. The resulting time-predictable resources (processors, interconnect, memory arbiter, and memory controller) and tools (compiler, WCET analysis) are designed to ease WCET analysis and to optimize WCET performance. Compared... domain shows that the WCET can be reduced for computation-intensive tasks when distributing the tasks on several cores and using the network-on-chip for communication. With three cores the WCET is improved by a factor of 1.8 and with 15 cores by a factor of 5.7. The T-CREST project is the result...
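The reported WCET improvement factors can be put in perspective by dividing each by the number of cores used. The sketch below does only that; "efficiency" here is a hypothetical label for the ratio, not a metric defined by the T-CREST authors.

```python
def parallel_efficiency(improvement_factor, cores):
    """Improvement factor per core (1.0 would mean perfect scaling)."""
    return improvement_factor / cores


# WCET improvement factors quoted in the abstract, keyed by core count.
reported = {3: 1.8, 15: 5.7}

for cores, factor in reported.items():
    eff = parallel_efficiency(factor, cores)
    print(f"{cores:2d} cores: WCET improved {factor}x, {eff:.2f} per core")
```

The ratio drops as cores are added, which is consistent with communication over the network-on-chip taking a growing share of the worst-case time.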
SYS6: Tenet: An Architecture for Tiered Embedded Networks

OpenAIRE

Krishna Chintalapudi; Deborah Estrin; Om Gnawali; Ramesh Govindan; Eddie Kohler; Jeong Paek; Sumit Rangwala; Thanos Sthathopoulos

2005-01-01

Over the last five years, sensor network research has seen significant advances in the development of hardware devices and platforms, and in the design of services and infrastructural elements such as routing, localization, and time synchronization. Deployed systems, however, have lagged behind. In this poster, we will describe an alternative architecture, called Tenet, for sensor networks that constrains placement of application-specific functionality on relatively unconstrained nodes. We w...
Processor tradeoffs in distributed real-time systems

Science.gov (United States)

Krishna, C. M.; Shin, Kang G.; Bhandari, Inderpal S.

1987-01-01

The problem of the optimization of the design of real-time distributed systems is examined with reference to a class of computer architectures similar to the continuously reconfigurable multiprocessor flight control system structure, CM2FCS. Particular attention is given to the impact of processor replacement and the burn-in time on the probability of dynamic failure and mean cost. The solution is obtained numerically and interpreted in the context of real-time applications.

Architecture, design and protection of electrical distribution networks

Energy Technology Data Exchange (ETDEWEB)

Sorrel, J.P. [Schneider electric Industries SA (France)

2000-07-01

Architectures related to the All Electric Ship (AES) require a high level of propulsion power. Merchant ships and, obviously, warships require low vulnerability, high reliability and availability, simple maintainability, and an ordinary mode of operation. These constraints converge to an optimum single line diagram. We will focus on the mode of operation of the network, its constraints, the facilities to use a ring distribution for the ship service distribution system, the earthing of the HV network, as well as future developments. (author)
A Smart Gateway Architecture for Improving Efficiency of Home Network Applications

OpenAIRE

Ding, Fei; Song, Aiguo; Tong, En; Li, Jianqing

2016-01-01

A smart home gateway plays an important role in the Internet of Things (IoT) system, taking responsibility for the connection between the network layer and the ubiquitous sensor network (USN) layer. Even though home network applications are developing rapidly, research on open development architectures based on the home gateway is scarce. This makes it difficult to extend the home network to support new applications, share services, and interoperate with other home network systems. An integr...
Towards a Systematic Exploration of the Optimization Space for Many-Core Processors

NARCIS (Netherlands)

Fang, J.

2014-01-01

The architecture diversity of many-core processors - with their different types of cores, and memory hierarchies - makes the old model of reprogramming every application for every platform infeasible. Therefore, inter-platform portability has become a desirable feature of programming models. While
A Time-Composable Operating System for the Patmos Processor

DEFF Research Database (Denmark)

Ziccardi, Marco; Schoeberl, Martin; Vardanega, Tullio

2015-01-01

In the last couple of decades we have witnessed a steady growth in the complexity and spread of real-time systems. In order to master the rising complexity in the timing behaviour of those systems, rightful attention has been given to the development of time-predictable computer architectures... -composable operating system, on top of a time-composable processor, facilitates incremental development, which is highly desirable for industry. This paper makes a twofold contribution. First, we present enhancements to the Patmos processor to allow achieving time composability at the operating system level. Second, we extend an existing time-composable operating system, TiCOS, to make best use of advanced Patmos hardware features in the pursuit of time composability.
Time Shared Optical Network (TSON): a novel metro architecture for flexible multi-granular services.

Science.gov (United States)

Zervas, Georgios S; Triay, Joan; Amaya, Norberto; Qin, Yixuan; Cervelló-Pastor, Cristina; Simeonidou, Dimitra

2011-12-12

This paper presents the Time Shared Optical Network (TSON) as a metro mesh network architecture for guaranteed, statistically-multiplexed services. TSON proposes a flexible and tunable time-wavelength assignment along with one-way tree-based reservation and node architecture. It delivers guaranteed sub-wavelength and multi-granular network services without wavelength conversion, time-slice interchange and optical buffering. Simulation results demonstrate high network utilization, fast service delivery, and low end-to-end delay on a contention-free sub-wavelength optical transport network. In addition, implementation complexity in terms of Layer 2 aggregation, grooming and optical switching has been evaluated. © 2011 Optical Society of America
Firewall Architectures for High-Speed Networks: Final Report

Energy Technology Data Exchange (ETDEWEB)

Errin W. Fulp

2007-08-20

Firewalls are a key component for securing networks that are vital to government agencies and private industry. They enforce a security policy by inspecting and filtering traffic arriving at or departing from a secure network. While performing these critical security operations, firewalls must be transparent to legitimate users, with little or no effect on the perceived network performance (QoS). Packets must be inspected and compared against increasingly complex rule sets and tables, which is a time-consuming process. As a result, current firewall systems can introduce significant delays and are unable to maintain QoS guarantees. Furthermore, firewalls are susceptible to Denial of Service (DoS) attacks that merely overload/saturate the firewall with illegitimate traffic. Current firewall technology only offers a short-term solution that is not scalable; therefore, the objective of this DOE project was to develop new firewall optimization techniques and architectures that meet these important challenges. Firewall optimization concerns decreasing the number of comparisons required per packet, which reduces processing time and delay. This is done by reorganizing policy rules via special sorting techniques that maintain the original policy integrity. This research is important since it applies to current and future firewall systems. Another method for increasing firewall performance is with new firewall designs. The architectures under investigation consist of multiple firewalls that collectively enforce a security policy. Our innovative distributed systems quickly divide traffic across different levels based on perceived threat, allowing traffic to be processed in parallel (beyond current firewall sandwich technology). Traffic deemed safe is transmitted to the secure network, while remaining traffic is forwarded to lower levels for further examination. The result of this divide-and-conquer strategy is lower delays for legitimate traffic, higher throughput
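The idea of reordering rules without changing the policy can be illustrated with a small sketch. It is not the report's algorithm: it is a simplified adjacent-swap sort over a first-match rule list, in which two neighbouring rules are exchanged only when no packet can match both. The `Rule` fields, the example policy, and the hit probabilities are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    proto: str       # "tcp", "udp", or "any"
    port_lo: int
    port_hi: int
    action: str      # "accept" or "deny"
    hit_prob: float  # assumed fraction of packets first-matched by this rule


def intersects(a, b):
    """True if some packet could match both rules."""
    same_proto = a.proto == b.proto or "any" in (a.proto, b.proto)
    return same_proto and a.port_lo <= b.port_hi and b.port_lo <= a.port_hi


def reorder(rules):
    """Move frequently hit rules forward using only policy-preserving swaps.

    Swapping two adjacent rules that cannot match the same packet never
    changes which rule a packet hits first, so the policy is unchanged.
    """
    rules = list(rules)
    changed = True
    while changed:
        changed = False
        for i in range(len(rules) - 1):
            a, b = rules[i], rules[i + 1]
            if b.hit_prob > a.hit_prob and not intersects(a, b):
                rules[i], rules[i + 1] = b, a
                changed = True
    return rules


def expected_comparisons(rules):
    """Average number of rules examined per packet under first-match."""
    return sum((pos + 1) * r.hit_prob for pos, r in enumerate(rules))


policy = [
    Rule("r1", "tcp", 22, 22, "deny", 0.05),
    Rule("r2", "tcp", 80, 80, "accept", 0.50),
    Rule("r3", "udp", 53, 53, "accept", 0.30),
    Rule("r4", "any", 0, 65535, "deny", 0.15),
]
optimized = reorder(policy)
print([r.name for r in optimized])
print(expected_comparisons(policy), expected_comparisons(optimized))
```

In this toy policy the catch-all rule `r4` overlaps every other rule, so it stays last, while the rules ahead of it are sorted by how often they are hit.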
SCI based data acquisition architectures

International Nuclear Information System (INIS)

Bogaerts, J.A.C.; Divia, R.; Renardy, J.F.

1992-01-01

This paper discusses the Scalable Coherent Interface (SCI), an IEEE proposed standard (P1596) for interconnecting multiprocessor systems. The standard defines point to point connections between nodes, which can be processors, memories or I/O devices. Networks containing a maximum of 64K nodes with a bandwidth of one Gbyte/s between nodes, may be constructed. SCI is an attractive candidate to serve as a backbone for high speed, large volume data acquisition systems such as required by future experiments at the proposed Large Hadron Collider (LHC) at CERN. Work has started to simulate SCI based architectures for data acquisition systems. The simulation program proved to be a useful tool to study SCI systems. First results are reported on a model of a large LHC experiment containing over 1000 nodes
Business architecture for inter-organisational innovation networks: A case study comparison from South Africa and Germany

CSIR Research Space (South Africa)

Gous, H

2011-06-01

systems architectures. An important step towards a deeper understanding of inter-organisational innovation networks is to compare the business architectures of network case studies to identify similarities and differences in terms of scope and context...
Efficient Backprojection-Based Synthetic Aperture Radar Computation with Many-Core Processors

Directory of Open Access Journals (Sweden)

Jongsoo Park

2013-01-01

Tackling computationally challenging problems with high efficiency often requires the combination of algorithmic innovation, advanced architecture, and thorough exploitation of parallelism. We demonstrate this synergy through synthetic aperture radar (SAR) via backprojection, an image reconstruction method that can require hundreds of TFLOPS. Computation cost is significantly reduced by our new algorithm of approximate strength reduction; data movement cost is economized by software locality optimizations facilitated by advanced architecture support; parallelism is fully harnessed in various patterns and granularities. We deliver over 35 billion backprojections per second throughput per compute node on an Intel® Xeon® processor E5-2670-based cluster, equipped with Intel® Xeon Phi™ coprocessors. This corresponds to processing a 3K×3K image within a second using a single node. Our study can be extended to other settings: backprojection is applicable elsewhere including medical imaging, approximate strength reduction is a general code transformation technique, and many-core processors are emerging as a solution to energy-efficient computing.
Using of new possibilities of Fermi architecture by development of GPGPU programs

International Nuclear Information System (INIS)

Dudnik, V.A.; Kudryavtsev, V.I.; Us, S.A.; Shestakov, M.V.

2013-01-01

The additional hardware and software functions provided by NVIDIA's new FERMI graphics processor architecture are described. Recommendations are given for using them when implementing scientific and technical computing algorithms on graphics processors. The application of the new capabilities of the FERMI architecture and of NVIDIA's CUDA technology (Compute Unified Device Architecture, a unified hardware-software solution for parallel computation on GPUs) is described, with the aim of reducing the development time of applications that use GPGPU to accelerate data processing.
Interconnection network architectures based on integrated orbital angular momentum emitters

Science.gov (United States)

Scaffardi, Mirco; Zhang, Ning; Malik, Muhammad Nouman; Lazzeri, Emma; Klitis, Charalambos; Lavery, Martin; Sorel, Marc; Bogoni, Antonella

2018-02-01

Novel architectures for two-layer interconnection networks based on concentric OAM emitters are presented. A scalability analysis is done in terms of device characteristics, power budget and optical signal-to-noise ratio by exploiting experimentally measured parameters. The analysis shows that by exploiting optical amplification, the proposed interconnection networks can support more than 100 ports. The OAM crosstalk-induced penalty, evaluated through an experimental characterization, does not significantly affect the interconnection network performance.
Launching applications on compute and service processors running under different operating systems in scalable network of processor boards with routers

Science.gov (United States)

Tomkins, James L [Albuquerque, NM; Camp, William J [Albuquerque, NM

2009-03-17

A multiple processor computing apparatus includes a physical interconnect structure that is flexibly configurable to support selective segregation of classified and unclassified users. The physical interconnect structure also permits easy physical scalability of the computing apparatus. The computing apparatus can include an emulator which permits applications from the same job to be launched on processors that use different operating systems.
Convolutional networks for fast, energy-efficient neuromorphic computing.

Science.gov (United States)

Esser, Steven K; Merolla, Paul A; Arthur, John V; Cassidy, Andrew S; Appuswamy, Rathinakumar; Andreopoulos, Alexander; Berg, David J; McKinstry, Jeffrey L; Melano, Timothy; Barch, Davis R; di Nolfo, Carmelo; Datta, Pallab; Amir, Arnon; Taba, Brian; Flickner, Myron D; Modha, Dharmendra S

2016-10-11

Deep networks are now able to achieve human-level performance on a broad spectrum of recognition tasks. Independently, neuromorphic computing has now demonstrated unprecedented energy-efficiency through a new chip architecture based on spiking neurons, low precision synapses, and a scalable communication network. Here, we demonstrate that neuromorphic computing, despite its novel architectural primitives, can implement deep convolution networks that (i) approach state-of-the-art classification accuracy across eight standard datasets encompassing vision and speech, (ii) perform inference while preserving the hardware's underlying energy-efficiency and high throughput, running on the aforementioned datasets at between 1,200 and 2,600 frames/s and using between 25 and 275 mW (effectively >6,000 frames/s per Watt), and (iii) can be specified and trained using backpropagation with the same ease-of-use as contemporary deep learning. This approach allows the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors, bringing the promise of embedded, intelligent, brain-inspired computing one step closer.
Peer-to-peer Cooperative Scheduling Architecture for National Grid Infrastructure

Science.gov (United States)

Matyska, Ludek; Ruda, Miroslav; Toth, Simon

For some ten years, the Czech National Grid Infrastructure MetaCentrum has used a single central PBSPro installation to schedule jobs across the country. This centralized approach keeps full track of all the clusters, providing support for jobs spanning several sites, implementation of the fair-share policy and better overall control of the grid environment. Despite steady progress in stability and resilience to intermittent, very short network failures, the growing number of sites and processors makes this architecture, with its single point of failure and scalability limits, obsolete. As a result, a new scheduling architecture is proposed, which relies on higher autonomy of clusters. It is based on a peer-to-peer network of semi-independent schedulers for each site or even cluster. Each scheduler accepts jobs for the whole infrastructure, cooperating with other schedulers on implementation of global policies like central job accounting, fair-share, or submission of jobs across several sites. The scheduling system is integrated with the Magrathea system to support scheduling of virtual clusters, including the setup of their internal network, again possibly spanning several sites. On the other hand, each scheduler is local to one of several clusters and is able to directly control and submit jobs to them even if the connection to other scheduling peers is lost. In parallel with the change of the overall architecture, the scheduling system itself is being replaced. Instead of PBSPro, chosen originally for its declared support of large-scale distributed environments, the new scheduling architecture is based on the open-source Torque system. The implementation and support for the most desired properties in PBSPro and Torque are discussed, and the necessary modifications to Torque to support the MetaCentrum scheduling architecture are presented.
Source-synchronous networks-on-chip circuit and architectural interconnect modeling

CERN Document Server

Mandal, Ayan; Mahapatra, Rabi

2014-01-01

This book describes novel methods for network-on-chip (NoC) design, using source-synchronous high-speed resonant clocks. The authors discuss NoCs from the bottom up, providing circuit level details, before providing architectural simulations. As a result, readers will get a complete picture of how a NoC can be designed and optimized. Using the methods described in this book, readers are enabled to design NoCs that are 5X better than existing approaches in terms of latency and throughput and can also sustain a significantly greater amount of traffic. • Describes novel methods for high-speed network-on-chip (NoC) design; • Enables readers to understand NoC design from both circuit and architectural levels; • Provides circuit-level details of the NoC (including clocking, router design), along with a high-speed, resonant clocking style which is used in the NoC; • Includes architectural simulations of the NoC, demonstrating significantly superior performance over the state-of-the-art.
An Asynchronous Time-Division-Multiplexed Network-on-Chip for Real-Time Systems

DEFF Research Database (Denmark)

Kasapaki, Evangelia

Multi-processor architectures using networks-on-chip (NOCs) for communication are becoming the standard approach in the development of embedded systems and general purpose platforms. Typically, multi-processor platforms follow a globally asynchronous locally synchronous (GALS) timing organization. This thesis focuses on the design of Argo, a NOC targeted at hard real-time multi-processor platforms with a GALS timing organization. To support real-time communication, NOCs establish end-to-end connections and provide latency and throughput guarantees for these connections. Argo uses time division... is an important part of the T-CREST platform and used in a number of configurations. The flexible timing organization of Argo combines asynchronous routers with mesochronous NIs, which are connected to individually clocked cores, supporting a GALS system organization. The mesochronous NIs operate at the same...
System Level Design of Reconfigurable Server Farms Using Elliptic Curve Cryptography Processor Engines

Directory of Open Access Journals (Sweden)

Sangook Moon

2014-01-01

As today’s hardware architecture becomes more and more complicated, it is getting harder to modify or improve the microarchitecture of a design at the register transfer level (RTL). Consequently, traditional methods we have used to develop a design are not capable of coping with complex designs. In this paper, we suggest a way of designing complex digital logic circuits with a soft and advanced type of SystemVerilog at an electronic system level. We apply the concept of design-and-reuse with a high level of abstraction to implement elliptic curve crypto-processor server farms. With the concept of the superior level of abstraction to the RTL used with the traditional HDL design, we successfully achieved the soft implementation of the crypto-processor server farms as well as robust test bench code with trivial effort in the same simulation environment. Otherwise, it could have required error-prone Verilog simulations for the hardware IPs and other time-consuming jobs such as C/SystemC verification for the software, sacrificing more time and effort. In the design of the elliptic curve cryptography processor engine, we propose a 3X faster GF(2^m) serial multiplication architecture.
Effects of various event building techniques on data acquisition system architectures

International Nuclear Information System (INIS)

Barsotti, E.; Booth, A.; Bowden, M.

1990-04-01

The preliminary specifications for various new detectors throughout the world including those at the Superconducting Super Collider (SSC) already make it clear that existing event building techniques will be inadequate for the high trigger and data rates anticipated for these detectors. In the world of high-energy physics many approaches have been taken to solving the problem of reading out data from a whole detector and presenting a complete event to the physicist, while simultaneously keeping deadtime to a minimum. This paper includes a review of multiprocessor and telecommunications interconnection networks and how these networks relate to event building in general, illustrating advantages of the various approaches. It presents a more detailed study of recent research into new event building techniques which incorporate much greater parallelism to better accommodate high data rates. The future in areas such as front-end electronics architectures, high speed data links, event building and online processor arrays is also examined. Finally, details of a scalable parallel data acquisition system architecture being developed at Fermilab are given. 35 refs., 31 figs., 1 tab
Real-time tracking with a 3D-Flow processor array

International Nuclear Information System (INIS)

Crosetto, D.

1993-06-01

The problem of real-time track finding has been addressed to date with CAM (Content Addressable Memories) or with fast coincidence logic, because the processing scheme was thought to have much slower performance. Advances in technology, together with a new architectural approach, make it feasible to also explore the computing technique for real-time track finding. Compared with the CAM approach, this gives the advantage of implementing algorithms that can find more parameters, such as the sagitta, curvature, pt, etc. The report describes real-time track finding using a new computing technique based on the 3D-Flow array processor system. This system consists of a fixed interconnection architecture scheme, allowing flexible algorithm implementation on a scalable platform. The 3D-Flow parallel processing system for track finding is scalable in size and performance by increasing the number of processors, the speed, or the number of pipelined stages. The present article describes the conceptual idea and the design stage of the project.
Modeling, analysis and optimization of network-on-chip communication architectures

CERN Document Server

Ogras, Umit Y

2013-01-01

Traditionally, design space exploration for Systems-on-Chip (SoCs) has focused on the computational aspects of the problem at hand. However, as the number of components on a single chip and their performance continue to increase, the communication architecture plays a major role in the area, performance and energy consumption of the overall system. As a result, a shift from computation-based to communication-based design becomes mandatory. Towards this end, network-on-chip (NoC) communication architectures have emerged recently as a promising alternative to classical bus and point-to-point communication architectures. This book explores outstanding research problems related to modeling, analysis and optimization of NoC communication architectures. More precisely, we present novel design methodologies, software tools and FPGA prototypes to aid the design of application-specific NoCs.

Digital design and computer architecture

CERN Document Server

Harris, David

2010-01-01

Digital Design and Computer Architecture is designed for courses that combine digital logic design with computer organization/architecture or that teach these subjects as a two-course sequence. Digital Design and Computer Architecture begins with a modern approach by rigorously covering the fundamentals of digital logic design and then introducing Hardware Description Languages (HDLs). Featuring examples of the two most widely-used HDLs, VHDL and Verilog, the first half of the text prepares the reader for what follows in the second: the design of a MIPS Processor. By the end of D
T-SDN architecture for space and ground integrated optical transport network

Science.gov (United States)

Nie, Kunkun; Hu, Wenjing; Gao, Shenghua; Chang, Chengwu

2015-11-01

The integrated optical transport network is the development trend of the future space information backbone network. The space and ground integrated optical transport network (SGIOTN) may contain a variety of equipment and systems. Changing the network or accommodating innovative missions in the network will be expensive to implement. Software Defined Networking (SDN) provides a good solution for flexibly adding processing logic, controlling the states and resources of the whole network in a timely manner, and shielding the differences between heterogeneous equipment. According to the characteristics of the SGIOTN, we propose a transport SDN architecture for it, with a hierarchical control plane and a data plane composed of packet networks and optical transport networks.
Comparison of Three Smart Camera Architectures for Real-Time Machine Vision System

Directory of Open Access Journals (Sweden)

Abdul Waheed Malik

2013-12-01

This paper presents a machine vision system for real-time computation of distance and angle of a camera from a set of reference points located on a target board. Three different smart camera architectures were explored to compare performance parameters such as power consumption, frame speed and latency. Architecture 1 consists of hardware machine vision modules modeled at Register Transfer (RT) level and a soft-core processor on a single FPGA chip. Architecture 2 is a commercially available software-based smart camera, the Matrox Iris GT. Architecture 3 is a two-chip solution composed of hardware machine vision modules on an FPGA and an external microcontroller. Results from a performance comparison show that Architecture 2 has higher latency and consumes much more power than Architectures 1 and 3. However, Architecture 2 benefits from an easy programming model. The smart camera system with an FPGA and external microcontroller has lower latency and consumes less power than the single FPGA chip with hardware modules and a soft-core processor.
Fluid and flexible minds: Intelligence reflects synchrony in the brain’s intrinsic network architecture

Directory of Open Access Journals (Sweden)

Michael A. Ferguson

2017-06-01

Human intelligence has been conceptualized as a complex system of dissociable cognitive processes, yet studies investigating the neural basis of intelligence have typically emphasized the contributions of discrete brain regions or, more recently, of specific networks of functionally connected regions. Here we take a broader, systems perspective in order to investigate whether intelligence is an emergent property of synchrony within the brain’s intrinsic network architecture. Using a large sample of resting-state fMRI and cognitive data (n = 830), we report that the synchrony of functional interactions within and across distributed brain networks reliably predicts fluid and flexible intellectual functioning. By adopting a whole-brain, systems-level approach, we were able to reliably predict individual differences in human intelligence by characterizing features of the brain’s intrinsic network architecture. These findings hold promise for the eventual development of neural markers to predict changes in intellectual function that are associated with neurodevelopment, normal aging, and brain disease. In our study, we aimed to understand how individual differences in intellectual functioning are reflected in the intrinsic network architecture of the human brain. We applied statistical methods, known as spectral decompositions, in order to identify individual differences in the synchronous patterns of spontaneous brain activity that reliably predict core aspects of human intelligence. The synchrony of brain activity at rest across multiple discrete neural networks demonstrated positive relationships with fluid intelligence. In contrast, global synchrony within the brain’s network architecture reliably, and inversely, predicted mental flexibility, a core facet of intellectual functioning. The multinetwork systems approach described here represents a methodological and conceptual extension of earlier efforts that related differences in
Parallel processor programs in the Federal Government

Science.gov (United States)

Schneck, P. B.; Austin, D.; Squires, S. L.; Lehmann, J.; Mizell, D.; Wallgren, K.

1985-01-01

In 1982, a report dealing with the nation's research needs in high-speed computing called for increased access to supercomputing resources for the research community, research in computational mathematics, and increased research in the technology base needed for the next generation of supercomputers. Since that time a number of programs addressing future generations of computers, particularly parallel processors, have been started by U.S. government agencies. The present paper provides a description of the largest government programs in parallel processing. Established in fiscal year 1985 by the Institute for Defense Analyses for the National Security Agency, the Supercomputing Research Center will pursue research to advance the state of the art in supercomputing. Attention is also given to the DOE applied mathematical sciences research program, the NYU Ultracomputer project, the DARPA multiprocessor system architectures program, NSF research on multiprocessor systems, ONR activities in parallel computing, and NASA parallel processor projects.
Microlens array processor with programmable weight mask and direct optical input

Science.gov (United States)

Schmid, Volker R.; Lueder, Ernst H.; Bader, Gerhard; Maier, Gert; Siegordner, Jochen

1999-03-01

We present an optical feature extraction system with a microlens array processor. The system is suitable for online implementation of a variety of transforms such as the Walsh transform and DCT. Operating with incoherent light, our processor accepts direct optical input. Employing a sandwich-like architecture, we obtain a very compact design of the optical system. The key elements of the microlens array processor are a square array of 15 × 15 spherical microlenses on acrylic substrate and a spatial light modulator as transmissive mask. The light distribution behind the mask is imaged onto the pixels of a customized a-Si image sensor with adjustable gain. We obtain one output sample for each microlens image and its corresponding weight mask area as summation of the transmitted intensity within one sensor pixel. The resulting architecture is very compact and robust like a conventional camera lens while incorporating a high degree of parallelism. We successfully demonstrate a Walsh transform into the spatial frequency domain as well as the implementation of a discrete cosine transform with digitized gray values. We provide results showing the transformation performance for both synthetic image patterns and images of natural texture samples. The extracted frequency features are suitable for neural classification of the input image. Other transforms and correlations can be implemented in real-time allowing adaptive optical signal processing.
A survey of Tumult, a real-time multi-processor system

International Nuclear Information System (INIS)

Jansen, P.G.

1986-01-01

Tumult (Twente University MULTi processor system) is the name of an ongoing project aiming at the design and implementation of a modular extendible multiprocessor system. All memory is distributed and processors communicate in parallel via a fast and reliable local switching network instead of a shared bus. A distributed real-time operating system is being designed and implemented, consisting of a multi-tasking subsystem per processor. Processes can communicate via a message passing mechanism. Communication links and processes are dynamically created and disposed by the application. In this article a brief description of the system is given; communication aspects are emphasized. (Auth.)
Optimal artificial neural network architecture selection for performance prediction of compact heat exchanger with the EBaLM-OTR technique

Energy Technology Data Exchange (ETDEWEB)

Wijayasekara, Dumidu, E-mail: wija2589@vandals.uidaho.edu [Department of Computer Science, University of Idaho, 1776 Science Center Drive, Idaho Falls, ID 83402 (United States); Manic, Milos [Department of Computer Science, University of Idaho, 1776 Science Center Drive, Idaho Falls, ID 83402 (United States); Sabharwall, Piyush [Idaho National Laboratory, Idaho Falls, ID (United States); Utgikar, Vivek [Department of Chemical Engineering, University of Idaho, Idaho Falls, ID 83402 (United States)

2011-07-15

Highlights: > Performance prediction of PCHE using artificial neural networks. > Evaluating artificial neural network performance for PCHE modeling. > Selection of over-training resilient artificial neural networks. > Artificial neural network architecture selection for modeling problems with small data sets. - Abstract: Artificial Neural Networks (ANN) have been used in the past to predict the performance of printed circuit heat exchangers (PCHE) with satisfactory accuracy. Typically published literature has focused on optimizing ANN using a training dataset to train the network and a testing dataset to evaluate it. Although this may produce outputs that agree with experimental results, there is a risk of over-training or over-learning the network rather than generalizing it, which should be the ultimate goal. An over-trained network is able to produce good results with the training dataset but fails when new datasets with subtle changes are introduced. In this paper we present EBaLM-OTR (error back propagation and Levenberg-Marquardt algorithms for over training resilience) technique, which is based on a previously discussed method of selecting neural network architecture that uses a separate validation set to evaluate different network architectures based on mean square error (MSE), and standard deviation of MSE. The method uses k-fold cross validation. Therefore in order to select the optimal architecture for the problem, the dataset is divided into three parts which are used to train, validate and test each network architecture. Then each architecture is evaluated according to their generalization capability and capability to conform to original data. The method proved to be a comprehensive tool in identifying the weaknesses and advantages of different network architectures. The method also highlighted the fact that the architecture with the lowest training error is not always the most generalized and therefore not the optimal. Using the method the testing
Optimal artificial neural network architecture selection for performance prediction of compact heat exchanger with the EBaLM-OTR technique

International Nuclear Information System (INIS)

Wijayasekara, Dumidu; Manic, Milos; Sabharwall, Piyush; Utgikar, Vivek

2011-01-01

Highlights: → Performance prediction of PCHE using artificial neural networks. → Evaluating artificial neural network performance for PCHE modeling. → Selection of over-training resilient artificial neural networks. → Artificial neural network architecture selection for modeling problems with small data sets. - Abstract: Artificial Neural Networks (ANN) have been used in the past to predict the performance of printed circuit heat exchangers (PCHE) with satisfactory accuracy. Typically published literature has focused on optimizing ANN using a training dataset to train the network and a testing dataset to evaluate it. Although this may produce outputs that agree with experimental results, there is a risk of over-training or over-learning the network rather than generalizing it, which should be the ultimate goal. An over-trained network is able to produce good results with the training dataset but fails when new datasets with subtle changes are introduced. In this paper we present EBaLM-OTR (error back propagation and Levenberg-Marquardt algorithms for over training resilience) technique, which is based on a previously discussed method of selecting neural network architecture that uses a separate validation set to evaluate different network architectures based on mean square error (MSE), and standard deviation of MSE. The method uses k-fold cross validation. Therefore in order to select the optimal architecture for the problem, the dataset is divided into three parts which are used to train, validate and test each network architecture. Then each architecture is evaluated according to their generalization capability and capability to conform to original data. The method proved to be a comprehensive tool in identifying the weaknesses and advantages of different network architectures. The method also highlighted the fact that the architecture with the lowest training error is not always the most generalized and therefore not the optimal. Using the method the
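The selection idea in the abstract above (score each candidate architecture by the mean and standard deviation of its validation MSE under k-fold cross validation) can be illustrated with a minimal sketch. This is not the EBaLM-OTR implementation: the two "architectures" below are simple stand-in models (a constant and a straight line) rather than neural networks, the separate test split is omitted, and every function name is hypothetical.

```python
import random
import statistics

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def score_architecture(fit, xs, ys, k=5):
    """Return (mean, std) of the validation MSE over k folds."""
    errors = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.append(mse(model, [xs[i] for i in held_out],
                          [ys[i] for i in held_out]))
    return statistics.mean(errors), statistics.pstdev(errors)

def select_architecture(candidates, xs, ys, k=5):
    """Pick the candidate with the lowest mean validation MSE (std breaks ties)."""
    scores = {name: score_architecture(fit, xs, ys, k)
              for name, fit in candidates.items()}
    best = min(scores, key=lambda name: scores[name])
    return best, scores

# Stand-in "architectures" (assumptions, not neural networks).
def fit_mean(xs, ys):
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
best, scores = select_architecture({"mean": fit_mean, "line": fit_line}, xs, ys)
```

Ranking by validation error rather than training error reflects the abstract's point that the architecture with the lowest training error is not necessarily the one that generalizes best.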
An eConsent-based System Architecture Supporting Cooperation in Integrated Healthcare Networks.

Science.gov (United States)

Bergmann, Joachim; Bott, Oliver J; Hoffmann, Ina; Pretschner, Dietrich P

2005-01-01

The economic need for efficient healthcare leads to cooperative shared care networks. A virtual electronic health record is required, which integrates patient related information but reflects the distributed infrastructure and restricts access only to those health professionals involved in the care process. Our work aims at the specification and development of a system architecture fulfilling these requirements to be used in concrete regional pilot studies. Methodical analysis and specification have been performed in a healthcare network using the formal method and modelling tool MOSAIK-M. The complexity of the application field was reduced by focusing on the scenario of thyroid disease care, which still includes various forms of interdisciplinary cooperation. The result is an architecture for a secure distributed electronic health record for integrated care networks, specified in terms of a MOSAIK-M-based system model. The architecture proposes business processes, application services, and a sophisticated security concept, providing a platform for distributed document-based, patient-centred, and secure cooperation. A corresponding system prototype has been developed for pilot studies, using advanced application server technologies. The architecture combines a consolidated patient-centred document management with a decentralized system structure without the need for replication management. An eConsent-based approach ensures that access to the distributed health record remains under control of the patient. The proposed architecture replaces message-based communication approaches, because it implements a virtual health record providing complete and current information. Acceptance of the new communication services depends on compatibility with the clinical routine. Unique and cross-institutional identification of a patient is also a challenge, but will lose significance once common patient cards are established.
Cyber-Physical Architecture Assisted by Programmable Networking

OpenAIRE

Rubio-Hernan, Jose; Sahay, Rishikesh; De Cicco, Luca; Garcia-Alfaro, Joaquin

2018-01-01

Cyber-physical technologies are prone to attacks, in addition to faults and failures. The issue of protecting cyber-physical systems should be tackled by jointly addressing security at both cyber and physical domains, in order to promptly detect and mitigate cyber-physical threats. Towards this end, this letter proposes a new architecture combining control-theoretic solutions together with programmable networking techniques to jointly handle crucial threats to cyber-physical systems. The arch...
Improvements to Integrated Tradespace Analysis of Communications Architectures (ITACA) Network Loading Analysis Tool

Science.gov (United States)

Lee, Nathaniel; Welch, Bryan W.

2018-01-01

NASA's SCENIC project aims to simplify and reduce the cost of space mission planning by replicating the analysis capabilities of commercially licensed software which are integrated with relevant analysis parameters specific to SCaN assets and SCaN supported user missions. SCENIC differs from current tools that perform similar analyses in that it 1) does not require any licensing fees, and 2) will provide an all-in-one package for various analysis capabilities that normally require add-ons or multiple tools to complete. As part of SCENIC's capabilities, the ITACA network loading analysis tool will be responsible for assessing the loading on a given network architecture and generating a network service schedule. ITACA will allow users to evaluate the quality of service of a given network architecture and determine whether or not the architecture will satisfy the mission's requirements. ITACA is currently under development, and the following improvements were made during the fall of 2017: optimization of runtime, augmentation of network asset pre-service configuration time, augmentation of Brent's method of root finding, augmentation of network asset FOV restrictions, augmentation of mission lifetimes, and the integration of a SCaN link budget calculation tool. The improvements resulted in (a) a 25% reduction in runtime, (b) more accurate contact window predictions when compared to STK® contact window predictions, and (c) increased fidelity through the use of specific SCaN asset parameters.
Satellite on-board real-time SAR processor prototype

Science.gov (United States)

Bergeron, Alain; Doucet, Michel; Harnisch, Bernd; Suess, Martin; Marchese, Linda; Bourqui, Pascal; Desnoyers, Nicholas; Legros, Mathieu; Guillot, Ludovic; Mercier, Luc; Châteauneuf, François

2017-11-01

A Compact Real-Time Optronic SAR Processor has been successfully developed and tested up to a Technology Readiness Level of 4 (TRL4), the breadboard validation in a laboratory environment. SAR, or Synthetic Aperture Radar, is an active system allowing day and night imaging independent of the cloud coverage of the planet. The SAR raw data is a set of complex data for range and azimuth, which cannot be compressed. Specifically, for planetary missions and unmanned aerial vehicle (UAV) systems with limited communication data rates this is a clear disadvantage. SAR images are typically processed electronically applying dedicated Fourier transformations. This, however, can also be performed optically in real-time. Originally the first SAR images were optically processed. The optical Fourier processor architecture provides inherent parallel computing capabilities allowing real-time SAR data processing and thus the ability for compression and strongly reduced communication bandwidth requirements for the satellite. SAR signal return data are in general complex data. Both amplitude and phase must be combined optically in the SAR processor for each range and azimuth pixel. Amplitude and phase are generated by dedicated spatial light modulators and superimposed by an optical relay set-up. The spatial light modulators display the full complex raw data information over a two-dimensional format, one for the azimuth and one for the range. Since the entire signal history is displayed at once, the processor operates in parallel yielding real-time performances, i.e. without resulting bottleneck. Processing of both azimuth and range information is performed in a single pass. This paper focuses on the onboard capabilities of the compact optical SAR processor prototype that allows in-orbit processing of SAR images. Examples of processed ENVISAT ASAR images are presented. Various SAR processor parameters such as processing capabilities, image quality (point target analysis), weight and
A Novel Buffer Management Architecture for Epidemic Routing in Delay Tolerant Networks (DTNs)

KAUST Repository

Elwhishi, Ahmed; Ho, Pin-Han; Naik, K.; Shihada, Basem

2010-01-01

Delay tolerant networks (DTNs) are wireless networks in which an end-to-end path for a given node pair can never exist for an extended period. Launching multiple message replicas has been reported as a viable approach to increasing the message delivery ratio and reducing message delivery delay. This advantage, nonetheless, comes at the expense of more buffer space at each node. The combination of custody and replication entails high buffer and bandwidth overhead. This paper investigates a new buffer management architecture for epidemic routing in DTNs, which helps each node decide which message should be forwarded or dropped. The proposed buffer management architecture is characterized by a suite of novel functional modules, including Summary Vector Exchange Module (SVEM), Networks State Estimation Module (NSEM), and Utility Calculation Module (UCM). Extensive simulation results show that the proposed buffer management architecture can achieve superb performance against its counterparts in terms of delivery ratio and delivery delay.
A Novel Buffer Management Architecture for Epidemic Routing in Delay Tolerant Networks (DTNs)

KAUST Repository

Elwhishi, Ahmed

2010-11-17

Delay tolerant networks (DTNs) are wireless networks in which an end-to-end path for a given node pair can never exist for an extended period. Launching multiple message replicas has been reported as a viable approach to increasing the message delivery ratio and reducing message delivery delay. This advantage, nonetheless, comes at the expense of more buffer space at each node. The combination of custody and replication entails high buffer and bandwidth overhead. This paper investigates a new buffer management architecture for epidemic routing in DTNs, which helps each node decide which message should be forwarded or dropped. The proposed buffer management architecture is characterized by a suite of novel functional modules, including Summary Vector Exchange Module (SVEM), Networks State Estimation Module (NSEM), and Utility Calculation Module (UCM). Extensive simulation results show that the proposed buffer management architecture can achieve superb performance against its counterparts in terms of delivery ratio and delivery delay.
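As a rough illustration of the mechanism this abstract describes, the sketch below shows a generic epidemic-routing contact: two nodes exchange summary vectors (the sets of message IDs they hold), forward what the peer lacks, and drop the lowest-utility message when a buffer overflows. The `Node` class, the `contact` function and the numeric utilities are all hypothetical; the abstract does not specify how the paper's SVEM, NSEM or UCM modules actually compute anything.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    capacity: int                                # maximum number of buffered messages
    buffer: dict = field(default_factory=dict)   # message id -> utility (assumed given)

    def summary_vector(self):
        """IDs of all messages currently buffered at this node."""
        return set(self.buffer)

    def accept(self, msg_id, utility):
        """Store a replica; on overflow, drop the lowest-utility message."""
        self.buffer[msg_id] = utility
        if len(self.buffer) > self.capacity:
            worst = min(self.buffer, key=self.buffer.get)
            del self.buffer[worst]

def contact(a, b):
    """One encounter: swap summary vectors, then forward missing messages."""
    to_a = {m: b.buffer[m] for m in b.summary_vector() - a.summary_vector()}
    to_b = {m: a.buffer[m] for m in a.summary_vector() - b.summary_vector()}
    # Forward the most useful messages first, since contacts may be short.
    for m, u in sorted(to_a.items(), key=lambda kv: -kv[1]):
        a.accept(m, u)
    for m, u in sorted(to_b.items(), key=lambda kv: -kv[1]):
        b.accept(m, u)

a = Node("a", capacity=3, buffer={"m1": 0.9, "m2": 0.5})
b = Node("b", capacity=3, buffer={"m3": 0.7, "m4": 0.1})
contact(a, b)
```

With a capacity of three, both nodes end up holding the three highest-utility messages, and the low-utility `m4` is the one dropped.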
Development of the brain's functional network architecture.

Science.gov (United States)

Vogel, Alecia C; Power, Jonathan D; Petersen, Steven E; Schlaggar, Bradley L

2010-12-01

A full understanding of the development of the brain's functional network architecture requires not only an understanding of developmental changes in neural processing in individual brain regions but also an understanding of changes in inter-regional interactions. Resting state functional connectivity MRI (rs-fcMRI) is increasingly being used to study functional interactions between brain regions in both adults and children. We briefly review methods used to study functional interactions and networks with rs-fcMRI and how these methods have been used to define developmental changes in network functional connectivity. The developmental rs-fcMRI studies to date have found two general properties. First, regional interactions change from being predominately anatomically local in children to interactions spanning longer cortical distances in young adults. Second, this developmental change in functional connectivity occurs, in general, via mechanisms of segregation of local regions and integration of distant regions into disparate subnetworks.
Associative Memory Design for the Fast TracKer Processor (FTK) at ATLAS

CERN Document Server

Annovi, A; The ATLAS collaboration; Beretta, M; Bossini, E; Crescioli, F; Dell'Orso, M; Giannetti, P; Hoff, J; Liu, T; Liberali, V; Sacco, I; Schoening, A; Soltveit, H K; Stabile, A; Tripiccione, R

2011-01-01

We describe a VLSI processor for pattern recognition based on Content Addressable Memory (CAM) architecture, optimized for on-line track finding in high-energy physics experiments. A large CAM bank stores all trajectories of interest and extracts the ones compatible with a given event. This task is naturally parallelized by a CAM architecture able to output identified trajectories, recognized among a huge amount of possible combinations, in just a few 100 MHz clock cycles. We have developed this device (called the AMchip03 processor), using 180 nm technology, for the Silicon Vertex Trigger (SVT) upgrade at CDF [1] using a standard-cell VLSI design methodology. We propose a new design that introduces a full custom CAM cell and takes advantage of 65 nm technology. The customized design maximizes the pattern density, minimizes the power consumption and implements the functionalities needed for the planned Fast Tracker (FTK) [2], an ATLAS trigger upgrade project at LHC. We introduce a new variable resolution patt...
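The associative lookup described above can be sketched in software, with the caveat that the real device compares every stored pattern against incoming hits in parallel in hardware. In this hypothetical model a pattern is a tuple holding one coarse hit address per detector layer, and a pattern fires when every layer has a matching hit in the event. The data layout and the function name are assumptions for illustration, not the AMchip design.

```python
def match_patterns(pattern_bank, hits_by_layer):
    """Return indices of stored patterns compatible with the event.

    pattern_bank:  list of tuples, one coarse hit address per layer
    hits_by_layer: list of sets, the hit addresses seen in each layer
    """
    matched = []
    for index, pattern in enumerate(pattern_bank):
        # A CAM evaluates all patterns at once; this loop is the serial equivalent.
        if all(address in hits_by_layer[layer]
               for layer, address in enumerate(pattern)):
            matched.append(index)
    return matched

bank = [(1, 2, 3), (1, 5, 3), (4, 4, 4)]
event = [{1, 4}, {2, 4}, {3}]
roads = match_patterns(bank, event)
```

The serial loop costs time proportional to the bank size, which is the cost the parallel CAM comparison avoids.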
Space Network IP Services (SNIS): An Architecture for Supporting Low Earth Orbiting IP Satellite Missions

Science.gov (United States)

Israel, David J.

2005-01-01

The NASA Space Network (SN) supports a variety of missions using the Tracking and Data Relay Satellite System (TDRSS), which includes ground stations in White Sands, New Mexico and Guam. A Space Network IP Services (SNIS) architecture is being developed to support future users with requirements for end-to-end Internet Protocol (IP) communications. This architecture will support all IP protocols, including Mobile IP, over TDRSS Single Access, Multiple Access, and Demand Access Radio Frequency (RF) links. This paper will describe this architecture and how it can enable Low Earth Orbiting IP satellite missions.
Signalling design and architecture for a proposed mobile satellite network

Science.gov (United States)

Yan, T.-Y.; Cheng, U.; Wang, C.

1990-01-01

In a frequency-division/demand-assigned multiple-access (FD/DAMA) architecture, each mobile subscriber must make a connection request to the Network Management Center before transmission for either open-end or closed-end services. Open-end services are for voice calls and long file transfer and are processed on a blocked-call-cleared basis. Closed-end services are for transmitting burst data and are processed on a first-come first-served basis. This paper presents the signalling design and architecture for non-voice services of an FD/DAMA mobile satellite network. The connection requests are made through the recently proposed multiple channel collision resolution scheme which provides a significantly higher throughput than the traditional slotted ALOHA scheme. For non-voice services, it is well known that retransmissions are necessary to ensure the delivery of a message in its entirety from the source to destination. Retransmission protocols for open-end and closed-end data transfer are investigated. The signal structure for the proposed network is derived from X.25 standards with appropriate modifications. The packet types and their usages are described in this paper.
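For context on the baseline mentioned above, the textbook throughput of slotted ALOHA under a Poisson offered load of G attempts per slot is S = G·e^(-G), which peaks at 1/e (about 0.368) when G = 1. The sketch below evaluates only that standard formula; it says nothing about the multiple channel collision resolution scheme itself, whose throughput the abstract does not quantify.

```python
import math

def slotted_aloha_throughput(offered_load):
    """Expected successful packets per slot for Poisson offered load G."""
    # A slot succeeds when exactly one station transmits: P = G * exp(-G).
    return offered_load * math.exp(-offered_load)

peak = slotted_aloha_throughput(1.0)        # maximum, equal to 1/e
light = slotted_aloha_throughput(0.5)       # underloaded channel
overloaded = slotted_aloha_throughput(2.0)  # collisions dominate
```

Throughput falls on either side of G = 1, which is why a scheme that resolves collisions more efficiently can exceed this baseline.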
Demonstration of two-qubit algorithms with a superconducting quantum processor.

Science.gov (United States)

DiCarlo, L; Chow, J M; Gambetta, J M; Bishop, Lev S; Johnson, B R; Schuster, D I; Majer, J; Blais, A; Frunzio, L; Girvin, S M; Schoelkopf, R J

2009-07-09

Quantum computers, which harness the superposition and entanglement of physical states, could outperform their classical counterparts in solving problems with technological impact, such as factoring large numbers and searching databases. A quantum processor executes algorithms by applying a programmable sequence of gates to an initialized register of qubits, which coherently evolves into a final state containing the result of the computation. Building a quantum processor is challenging because of the need to meet simultaneously requirements that are in conflict: state preparation, long coherence times, universal gate operations and qubit readout. Processors based on a few qubits have been demonstrated using nuclear magnetic resonance, cold ion trap and optical systems, but a solid-state realization has remained an outstanding challenge. Here we demonstrate a two-qubit superconducting processor and the implementation of the Grover search and Deutsch-Jozsa quantum algorithms. We use a two-qubit interaction, tunable in strength by two orders of magnitude on nanosecond timescales, which is mediated by a cavity bus in a circuit quantum electrodynamics architecture. This interaction allows the generation of highly entangled states with concurrence up to 94 per cent. Although this processor constitutes an important step in quantum computing with integrated circuits, continuing efforts to increase qubit coherence times, gate performance and register size will be required to fulfil the promise of a scalable technology.
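To make the two-qubit Grover search concrete, here is an idealized, noise-free state-vector sketch in plain Python. It models only the textbook algorithm (uniform superposition, oracle phase flip, inversion about the mean), not the superconducting hardware or its gate sequence, and the function name is hypothetical. For a four-item search space a single Grover iteration finds the marked item with certainty in the ideal case.

```python
def grover_two_qubit(marked):
    """Return measurement probabilities after one Grover iteration on 2 qubits.

    marked: index 0..3 of the basis state the oracle recognizes.
    """
    # Hadamards on |00> give equal amplitude 1/2 on all four basis states.
    state = [0.5, 0.5, 0.5, 0.5]
    # Oracle: flip the sign of the marked amplitude.
    state[marked] = -state[marked]
    # Diffusion: reflect every amplitude about the mean amplitude.
    mean = sum(state) / len(state)
    state = [2 * mean - amplitude for amplitude in state]
    # Born rule (amplitudes are real here).
    return [amplitude ** 2 for amplitude in state]

probabilities = grover_two_qubit(2)
```

A classical search over four items needs more than two queries on average, whereas the ideal quantum version needs one oracle call.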

Researching, building a soft-processor and Ethernet interface circuit using EDK

International Nuclear Information System (INIS)

Tuong Thi Thu Huong; Pham Ngoc Tuan; Truong Van Dat, Dang Lanh; Chau Thi Nhu Quynh

2014-01-01

The processor is an indispensable component in measurement and automatic control systems. This report describes the implementation of a soft-processor (32-bit, 64K on-chip block RAM, 50 MHz clock, internal and peripheral buses) for receiving, sending and processing Ethernet data packets. The processor is built using the XPS component of the EDK (Xilinx) software toolkit and is then configured on a Spartan XC3S500E FPGA. Firmware that controls the interface between the processor and the Ethernet port is written in C and lets the processor act as a host (station) with its own IP address on an Ethernet network. Several further parts are needed: an Ethernet interface controller chip, a suitable cable supporting speeds up to 100 Mb/s, and an application program written in LabVIEW, running under Windows XP, to communicate with the soft-processor. (author)
SNMS: an intelligent transportation system network architecture based on WSN and P2P network

Institute of Scientific and Technical Information of China (English)

LI Li; LIU Yuan-an; TANG Bi-hua

2007-01-01

With the development of city road networks, the question of how to obtain information about the roads is becoming more and more important. In this article, sensor network with mobile station (SNMS), a novel two-tiered intelligent transportation system (ITS) network architecture based on wireless sensor network (WSN) and peer-to-peer (P2P) network, is proposed to provide significant traffic information about the road and thereby assist travelers in making optimal decisions while driving. A detailed explanation of the strategy of each level, as well as the design of the two main components in the network, sensor unit (SU) and mobile station (MS), is presented. Finally, a representative scenario is described to display the operation of the system.
USC orthogonal multiprocessor for image processing with neural networks

Science.gov (United States)

Hwang, Kai; Panda, Dhabaleswar K.; Haddadi, Navid

1990-07-01

This paper presents the architectural features and imaging applications of the Orthogonal MultiProcessor (OMP) system, which is under construction at the University of Southern California with research funding from NSF and assistance from several industrial partners. The prototype OMP is being built with 16 Intel i860 RISC microprocessors and 256 parallel memory modules using custom-designed spanning buses, which are 2-D interleaved and orthogonally accessed without conflicts. The 16-processor OMP prototype is targeted to achieve 430 MIPS and 600 Mflops, which have been verified by simulation experiments based on the design parameters used. The prototype OMP machine will be initially applied for image processing, computer vision, and neural network simulation applications. We summarize important vision and imaging algorithms that can be restructured with neural network models. These algorithms can efficiently run on the OMP hardware with linear speedup. The ultimate goal is to develop a high-performance Visual Computer (Viscom) for integrated low- and high-level image processing and vision tasks.
Initial validation of ATLAS software on the ARM architecture

Energy Technology Data Exchange (ETDEWEB)

Kawamura, Gen; Quadt, Arnulf; Smith, Joshua Wyatt [II. Physikalisches Institut, Georg-August Universitaet Goettingen (Germany); Seuster, Rolf [TRIUMF (Canada); Stewart, Graeme [University of Glasgow (United Kingdom)

2016-07-01

In the early 2000s the introduction of the multi-core era of computing helped industry and experiments such as ATLAS realize even more computing power. This was necessary as the limits of what a single-core processor could do were quickly being reached. Our current model of computing is to increase the number of multi-core nodes in a server farm in order to handle the increased influx of data. As power costs and our need for more computing power increase, this model will eventually become unrealistic. Once again a paradigm shift has to take place. One option is to look at alternative architectures for large scale server farms, and ARM processors are one such example. Making up approximately 95% of the smartphone and tablet market, these processors are widely available, very power efficient and constantly becoming faster. The ATLAS software code base (Athena) is extremely complex, comprising more than 6.5 million lines of code. It has very recently been ported to the ARM 64-bit architecture. The process of our port as well as the first validation plots are presented and compared to the traditional x86 architecture.
Ring-array processor distribution topology for optical interconnects

Science.gov (United States)

Li, Yao; Ha, Berlin; Wang, Ting; Wang, Sunyu; Katz, A.; Lu, X. J.; Kanterakis, E.

1992-01-01

The existing linear and rectangular processor distribution topologies for optical interconnects, although promising in many respects, cannot solve problems such as clock skews, the lack of supporting elements for efficient optical implementation, etc. The use of a ring-array processor distribution topology, however, can overcome these problems. Here, a study of the ring-array topology is conducted with an aim of implementing various fast clock rate, high-performance, compact optical networks for digital electronic multiprocessor computers. Practical design issues are addressed. Some proof-of-principle experimental results are included.
Data center networks topologies, architectures and fault-tolerance characteristics

CERN Document Server

Liu, Yang; Veeraraghavan, Malathi; Lin, Dong; Hamdi, Mounir

2013-01-01

This SpringerBrief presents a survey of data center network designs and topologies and compares several properties in order to highlight their advantages and disadvantages. The brief also explores several routing protocols designed for these topologies and compares the basic algorithms to establish connections, the techniques used to gain better performance, and the mechanisms for fault-tolerance. Readers will be equipped to understand how current research on data center networks enables the design of future architectures that can improve performance and dependability of data centers. This con
Mesh Network Architecture for Enabling Inter-Spacecraft Communication

Science.gov (United States)

Becker, Christopher; Merrill, Garrick

2017-01-01

To enable communication between spacecraft operating in a formation or small constellation, a mesh network architecture was developed and tested using a time division multiple access (TDMA) communication scheme. The network is designed to allow for the exchange of telemetry and other data between spacecraft to enable collaboration between small spacecraft. The system uses a peer-to-peer topology with no central router, so that it does not have a single point of failure. The mesh network is dynamically configurable to allow for addition and subtraction of new spacecraft into the communication network. Flight testing was performed using an unmanned aerial system (UAS) formation acting as a spacecraft analogue and providing a stressing environment to prove mesh network performance. The mesh network was primarily devised to provide low latency, high frequency communication but is flexible and can also be configured to provide higher bandwidth for applications desiring high data throughput. The network includes a relay functionality that extends the maximum range between spacecraft in the network by relaying data from node to node. The mesh network control is implemented completely in software making it hardware agnostic, thereby allowing it to function with a wide variety of existing radios and computing platforms.
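The TDMA idea in this abstract can be sketched as a simple round-robin schedule in which every node owns one transmit slot per frame and no central router decides who talks. The fixed slot length, the round-robin ordering and the function names are assumptions for illustration; the abstract does not describe the actual frame structure.

```python
def frame_length_ms(node_ids, slot_ms):
    """One frame gives each node exactly one transmit slot."""
    return slot_ms * len(node_ids)

def slot_owner(t_ms, node_ids, slot_ms):
    """Node allowed to transmit at integer time t_ms (round-robin TDMA)."""
    position_in_frame = t_ms % frame_length_ms(node_ids, slot_ms)
    return node_ids[position_in_frame // slot_ms]

nodes = [1, 2, 3]
owner_early = slot_owner(0, nodes, 100)    # first slot of the frame
owner_late = slot_owner(250, nodes, 100)   # third slot of the frame

# Adding a spacecraft lengthens the frame; every node can recompute the
# schedule locally from the shared node list, with no central coordinator.
nodes_grown = nodes + [4]
owner_after_join = slot_owner(300, nodes_grown, 100)
```

Shorter slots lower latency at the cost of per-slot throughput, which is one way such a network could trade low latency for higher bandwidth.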
A modular control architecture for real-time synchronous and asynchronous systems

International Nuclear Information System (INIS)

Butler, P.L.; Jones, J.P.

1993-01-01

This paper describes a control architecture for real-time control of complex robotic systems. The Modular Integrated Control Architecture (MICA), which is actually two complementary control systems, recognizes and exploits the differences between asynchronous and synchronous control. The asynchronous control system simulates shared memory on a heterogeneous network. For control information, a portable event-scheme is used. This scheme provides consistent interprocess coordination among multiple tasks on a number of distributed systems. The machines in the network can vary with respect to their native operating systems and the internal representation of numbers they use. The synchronous control system is needed for tight real-time control of complex electromechanical systems such as robot manipulators, and the system uses multiple processors at a specified rate. Both the synchronous and asynchronous portions of MICA have been developed to be extremely modular. MICA presents a simple programming model to code developers and also considers the needs of system integrators and maintainers. MICA has been used successfully in a complex robotics project involving a mobile 7-degree-of-freedom manipulator in a heterogeneous network with a body of software totaling over 100,000 lines of code. MICA has also been used in another robotics system, controlling a commercial long-reach manipulator.
Reversible machine code and its abstract processor architecture

DEFF Research Database (Denmark)

Axelsen, Holger Bock; Glück, Robert; Yokoyama, Tetsuo

2007-01-01

A reversible abstract machine architecture and its reversible machine code are presented and formalized. For machine code to be reversible, both the underlying control logic and each instruction must be reversible. A general class of machine instruction sets was proven to be reversible, building...
mCRAN: A radio access network architecture for 5G indoor communications

NARCIS (Netherlands)

Chandra, Kishor; Cao, Zizheng; Bruintjes, Tom; Prasad, R.V.; Karagiannis, Georgios; Tangdiongga, E.; van den Boom, H.P.A.; Kokkeler, Andre B.J.

2015-01-01

Millimeter wave (mmWave) communication is being seen as a disruptive technology for the 5G era. In particular, the 60 GHz frequency band has emerged as a promising candidate for multi-Gbps connectivity in indoor and hotspot areas. In terms of network architecture, cloud radio access network (CRAN) has
mCRAN : a radio access network architecture for 5G indoor communications

NARCIS (Netherlands)

Chandra, Kishor; Cao, Zizheng; Bruintjes, T. M.; Prasad, R. Venkatesha; Karagiannis, G.; Tangdiongga, Eduward; van den Boom, H.P.A.; Kokkeler, A. B J

2015-01-01

Millimeter wave (mmWave) communication is being seen as a disruptive technology for the 5G era. In particular, the 60 GHz frequency band has emerged as a promising candidate for multi-Gbps connectivity in indoor and hotspot areas. In terms of network architecture, cloud radio access network (CRAN) has
A general model of concurrency and its implementation as many-core dynamic RISC processors

NARCIS (Netherlands)

Bernard, T.; Bousias, K.; Guang, L.; Jesshope, C.R.; Lankamp, M.; van Tol, M.W.; Zhang, L.

2008-01-01

This paper presents a concurrent execution model and its micro-architecture based on in-order RISC processors, which schedules instructions from large pools of contextualised threads. The model admits a strategy for programming chip multiprocessors using parallelising compilers based on existing
HTGR core seismic analysis using an array processor

International Nuclear Information System (INIS)

Shatoff, H.; Charman, C.M.

1983-01-01

A Floating Point Systems array processor performs nonlinear dynamic analysis of the high-temperature gas-cooled reactor (HTGR) core with significant time and cost savings. The graphite HTGR core consists of approximately 8000 blocks of various shapes which are subject to motion and impact during a seismic event. Two-dimensional computer programs (CRUNCH2D, MCOCO) can perform explicit step-by-step dynamic analyses of up to 600 blocks for time-history motions. However, use of two-dimensional codes was limited by the large cost and run times required. Three-dimensional analysis of the entire core, or even a large part of it, had been considered totally impractical. Because of the needs of the HTGR core seismic program, a Floating Point Systems array processor was used to enhance computer performance of the two-dimensional core seismic computer programs, MCOCO and CRUNCH2D. This effort began by converting the computational algorithms used in the codes to a form which takes maximum advantage of the parallel and pipeline processors offered by the architecture of the Floating Point Systems array processor. The subsequent conversion of the vectorized FORTRAN coding to the array processor required a significant programming effort to make the system work on the General Atomic (GA) UNIVAC 1100/82 host. These efforts were quite rewarding, however, since the cost of running the codes has been reduced approximately 50-fold and the time threefold. The core seismic analysis with large two-dimensional models has now become routine and extension to three-dimensional analysis is feasible. These codes simulate the one-fifth-scale full-array HTGR core model. This paper compares the analysis with the test results for sine-sweep motion
Evaluation of an IP Fabric network architecture for CERN's data center

CERN Document Server

AUTHOR|(CDS)2156318; Barceló Ordinas, José M.

CERN has a large-scale data center with over 11500 servers used to analyze massive amounts of data acquired from the physics experiments and to provide IT services to workers. Its current network architecture is based on the classic three-tier design and it uses both IPv4 and IPv6. Between the access and aggregation layers the traffic is switched in Layer 2, while between aggregation and core it is routed using dual-stack OSPF. A new architecture is needed to increase redundancy and to provide virtual machine mobility and traffic isolation. The state-of-the-art architecture IP Fabric with EVPN is evaluated as a possible solution. The evaluation comprises a study of different features and options, including BGP table scalability and autonomous system number distributions. The proposed solution contains eBGP as the routing protocol, a route control policy, fast convergence mechanisms and an EVPN overlay with iBGP routing and VXLAN encapsulation. The solution is tested in the lab with the network equipment curre...
Resting State Networks' Corticotopy: The Dual Intertwined Rings Architecture

Science.gov (United States)

Mesmoudi, Salma; Perlbarg, Vincent; Rudrauf, David; Messe, Arnaud; Pinsard, Basile; Hasboun, Dominique; Cioli, Claudia; Marrelec, Guillaume; Toro, Roberto; Benali, Habib; Burnod, Yves

2013-01-01

How does the brain integrate multiple sources of information to support normal sensorimotor and cognitive functions? To investigate this question we present an overall brain architecture (called “the dual intertwined rings architecture”) that relates the functional specialization of cortical networks to their spatial distribution over the cerebral cortex (or “corticotopy”). Recent results suggest that the resting state networks (RSNs) are organized into two large families: 1) a sensorimotor family that includes visual, somatic, and auditory areas and 2) a large association family that comprises parietal, temporal, and frontal regions and also includes the default mode network. We used two large databases of resting state fMRI data, from which we extracted 32 robust RSNs. We estimated: (1) the RSN functional roles by using a projection of the results on task based networks (TBNs) as referenced in large databases of fMRI activation studies; and (2) relationship of the RSNs with the Brodmann Areas. In both classifications, the 32 RSNs are organized into a remarkable architecture of two intertwined rings per hemisphere and so four rings linked by homotopic connections. The first ring forms a continuous ensemble and includes visual, somatic, and auditory cortices, with interspersed bimodal cortices (auditory-visual, visual-somatic and auditory-somatic, abbreviated as VSA ring). The second ring integrates distant parietal, temporal and frontal regions (PTF ring) through a network of association fiber tracts which closes the ring anatomically and ensures a functional continuity within the ring. The PTF ring relates association cortices specialized in attention, language and working memory, to the networks involved in motivation and biological regulation and rhythms. This “dual intertwined architecture” suggests a dual integrative process: the VSA ring performs fast real-time multimodal integration of sensorimotor information whereas the PTF ring performs multi
Synchronization of faulty processors in coarse-grained TMR protected partially reconfigurable FPGA designs

International Nuclear Information System (INIS)

Kretzschmar, U.; Gomez-Cornejo, J.; Astarloa, A.; Bidarte, U.; Ser, J. Del

2016-01-01

The expansion of FPGA technology in numerous application fields is a fact. Single Event Effects (SEE) are a critical factor for the reliability of FPGA based systems. For this reason, a number of researchers have been studying fault tolerance techniques to harden different elements of FPGA designs. Using Partial Reconfiguration (PR) in conjunction with Triple Modular Redundancy (TMR) is an emerging approach in recent publications dealing with the implementation of fault tolerant processors on SRAM-based FPGAs. While these works pay great attention to the repair of erroneous instances by means of reconfiguration, the essential step of synchronizing the repaired processors is insufficiently addressed. In this context, this paper poses four different synchronization approaches for soft core processors, which balance differently the trade-off between synchronization speed and hardware overhead. All approaches are assessed in practice by synchronizing TMR protected PicoBlaze processors implemented on a Virtex-5 FPGA. Nevertheless, all methods are of a general nature and can be applied to different processor architectures in a straightforward fashion. - Highlights: • Four different synchronization methods for faulty processors are proposed. • The methods balance between synchronization speed and hardware overhead. • They can be applied to TMR-protected reconfigurable FPGA designs. • The proposed schemes are implemented and tested in real hardware.
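The abstract does not describe the four synchronization approaches themselves, so the sketch below only illustrates the generic TMR idea they build on: a bitwise majority voter, detection of the single disagreeing replica, and a naive state copy from a healthy replica after repair. All function names are hypothetical, and the state-copy step is an assumption made for illustration, not one of the paper's methods.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority of three replica outputs."""
    return (a & b) | (a & c) | (b & c)


def faulty_replica(a: int, b: int, c: int):
    """Return the index of the single replica that disagrees with the vote.

    Returns None when all replicas agree or when no single replica
    can be blamed.
    """
    voted = majority_vote(a, b, c)
    mismatched = [i for i, out in enumerate((a, b, c)) if out != voted]
    return mismatched[0] if len(mismatched) == 1 else None


def resynchronize(states: list, bad: int) -> None:
    """Copy the register state of a healthy replica into the repaired one.

    Assumption for illustration: a replica's state is a dict of register
    values and the other two replicas already agree with each other.
    """
    healthy = states[(bad + 1) % 3]
    states[bad] = dict(healthy)


# Replica 2 has suffered a bit flip in its output.
outputs = (0b1010, 0b1010, 0b1000)
bad = faulty_replica(*outputs)

states = [{"pc": 7, "r0": 3}, {"pc": 7, "r0": 3}, {"pc": 5, "r0": 0}]
if bad is not None:
    resynchronize(states, bad)
```

A plain state copy like this would stall the healthy replicas while it runs; the speed versus hardware-overhead trade-off named in the abstract concerns exactly how that cost is managed.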
Cooperative Technique Based on Sensor Selection in Wireless Sensor Network

Directory of Open Access Journals (Sweden)

ISLAM, M. R.

2009-02-01

An energy efficient cooperative technique is proposed for IEEE 1451 based Wireless Sensor Networks. A selected number of Wireless Transducer Interface Modules (WTIMs) are used to form a Multiple Input Single Output (MISO) structure wirelessly connected with a Network Capable Application Processor (NCAP). Energy efficiency and delay of the proposed architecture are derived for different combinations of cluster size and selected number of WTIMs. Optimized constellation parameters are used for evaluating the derived parameters. The results show that the selected MISO structure outperforms the unselected MISO structure and that it is more energy efficient than the SISO structure beyond a certain distance.
System on chip module configured for event-driven architecture

Science.gov (United States)

Robbins, Kevin; Brady, Charles E.; Ashlock, Tad A.

2017-10-17

A system on chip (SoC) module is described herein, wherein the SoC module comprises a processor subsystem and a hardware logic subsystem. The processor subsystem and hardware logic subsystem are in communication with one another, and transmit event messages between one another. The processor subsystem executes software actors, while the hardware logic subsystem includes hardware actors. The software actors and hardware actors conform to an event-driven architecture, such that the software actors receive and generate event messages and the hardware actors receive and generate event messages.
An In-Home Digital Network Architecture for Real-Time and Non-Real-Time Communication

NARCIS (Netherlands)

Scholten, Johan; Jansen, P.G.; Hanssen, F.T.Y.; Hattink, Tjalling

2002-01-01

This paper describes an in-home digital network architecture that supports both real-time and non-real-time communication. The architecture deploys a distributed token mechanism to schedule communication streams and to offer guaranteed quality-of-service. Essentially, the token mechanism prevents
SAD PROCESSOR FOR MULTIPLE MACROBLOCK MATCHING IN FAST SEARCH VIDEO MOTION ESTIMATION

Directory of Open Access Journals (Sweden)

Nehal N. Shah

2015-02-01

Motion estimation is a very important but computationally complex task in video coding. The process of determining motion vectors based on the temporal correlation of consecutive frames is used for video compression. In order to reduce the computational complexity of motion estimation and maintain the quality of encoding during motion compensation, different fast search techniques are available. These block-based motion estimation algorithms use the sum of absolute differences (SAD) between the corresponding macroblock in the current frame and all the candidate macroblocks in the reference frame to identify the best match. Existing implementations can perform SAD between two blocks using a sequential or pipelined approach, but performing multi-operand SAD in a single clock cycle with optimized resources is the state of the art. In this paper, various parallel architectures for computation of the fixed-block-size SAD are evaluated and a fast parallel SAD architecture is proposed with optimized resources. Further, a SAD processor is described with 9 processing elements which can be configured for any existing fast search block matching algorithm. The proposed SAD processor consumes 7% fewer adders compared to the existing implementation for one processing element. Using nine PEs, it can process 84 HD frames per second in the worst case, which is a good outcome for real-time implementation. In the average case, the architecture processes 325 HD frames per second.
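As a software illustration of the SAD criterion described above, the sketch below computes the SAD between a macroblock of the current frame and a set of candidate blocks in the reference frame, then keeps the candidate with the lowest cost. It evaluates candidates one after another, whereas the processor in the record is described as using nine parallel processing elements; the helper names, the 2x2 block size, and the tiny frames are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(
        abs(p - q)
        for row_a, row_b in zip(block_a, block_b)
        for p, q in zip(row_a, row_b)
    )


def get_block(frame, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def best_match(current, reference, top, left, size, candidates):
    """Return (cost, (dy, dx)) for the candidate offset with the lowest SAD.

    `candidates` is a list of (dy, dx) offsets, e.g. the points visited by
    one step of a fast search pattern. Offsets that fall outside the
    reference frame are skipped.
    """
    cur = get_block(current, top, left, size)
    best = None
    for dy, dx in candidates:
        y, x = top + dy, left + dx
        if y < 0 or x < 0 or y + size > len(reference) or x + size > len(reference[0]):
            continue
        cost = sad(cur, get_block(reference, y, x, size))
        if best is None or cost < best[0]:
            best = (cost, (dy, dx))
    return best


reference = [
    [0, 0, 0, 0],
    [0, 9, 8, 0],
    [0, 7, 6, 0],
    [0, 0, 0, 0],
]
current = [
    [9, 8, 0, 0],
    [7, 6, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
result = best_match(current, reference, 0, 0, 2, [(0, 0), (0, 1), (1, 0), (1, 1)])
```

Here the block at the top-left of `current` is found at offset (1, 1) in `reference` with a SAD of zero, so `result` holds the motion vector for that macroblock.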

Onboard Data Processors for Planetary Ice-Penetrating Sounding Radars

Science.gov (United States)

Tan, I. L.; Friesenhahn, R.; Gim, Y.; Wu, X.; Jordan, R.; Wang, C.; Clark, D.; Le, M.; Hand, K. P.; Plaut, J. J.

2011-12-01

Among the many concerns faced by outer planetary missions, science data storage and transmission hold special significance. Such missions must contend with limited onboard storage, brief data downlink windows, and low downlink bandwidths. A potential solution to these issues lies in employing onboard data processors (OBPs) to convert raw data into products that are smaller and closely capture relevant scientific phenomena. In this paper, we present the implementation of two OBP architectures for ice-penetrating sounding radars tasked with exploring Europa and Ganymede. Our first architecture utilizes an unfocused processing algorithm extended from the Mars Advanced Radar for Subsurface and Ionosphere Sounding (MARSIS, Jordan et al. 2009). Compared to downlinking raw data, we are able to reduce data volume by approximately 100 times through OBP usage. To ensure the viability of our approach, we have implemented, simulated, and synthesized this architecture using both VHDL and Matlab models (with fixed-point and floating-point arithmetic) in conjunction with Modelsim. Creation of a VHDL model of our processor is the principal step in transitioning to actual digital hardware, whether in an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit), and successful simulation and synthesis strongly indicate feasibility. In addition, we examined the tradeoffs faced in the OBP between fixed-point accuracy, resource consumption, and data product fidelity. Our second architecture is based upon a focused fast back projection (FBP) algorithm that requires a modest amount of computing power and on-board memory while yielding high along-track resolution and improved slope detection capability. We present an overview of the algorithm and details of our implementation, also in VHDL. With the appropriate tradeoffs, the use of OBPs can significantly reduce data downlink requirements without sacrificing data product fidelity. Through the development
QoS Management and Control for an All-IP WiMAX Network Architecture: Design, Implementation and Evaluation

Directory of Open Access Journals (Sweden)

Thomas Michael Bohnert

2008-01-01

The IEEE 802.16 standard provides a specification for a fixed and mobile broadband wireless access system, offering high data rate transmission of multimedia services with different Quality-of-Service (QoS) requirements through the air interface. The WiMAX Forum, going beyond the air interface, defined an end-to-end WiMAX network architecture, based on an all-IP platform, in order to complete the standards required for a commercial rollout of WiMAX as a broadband wireless access solution. As the WiMAX network architecture is only a functional specification, this paper focuses on an innovative solution for an end-to-end WiMAX network architecture in compliance with the WiMAX Forum specification. To the best of our knowledge, this is the first WiMAX architecture built by a research consortium globally; the work was performed within the framework of the European IST project WEIRD (WiMAX Extension to Isolated Research Data networks). One of the principal features of our architecture is support for end-to-end QoS achieved by the integration of resource control in the WiMAX wireless link and the resource management in the wired domains in the network core. In this paper we present the architectural design of these QoS features in the overall WiMAX all-IP framework and their functional as well as performance evaluation. The presented results can safely be considered as unique and timely for any WiMAX system integrator.
System-level Modeling of Wireless Integrated Sensor Networks

DEFF Research Database (Denmark)

Virk, Kashif M.; Hansen, Knud; Madsen, Jan

2005-01-01

Wireless integrated sensor networks have emerged as a promising infrastructure for a new generation of monitoring and tracking applications. In order to efficiently utilize the extremely limited resources of wireless sensor nodes, accurate modeling of the key aspects of wireless sensor networks...... is necessary so that system-level design decisions can be made about the hardware and the software (applications and real-time operating system) architecture of sensor nodes. In this paper, we present a SystemC-based abstract modeling framework that enables system-level modeling of sensor network behavior...... by modeling the applications, real-time operating system, sensors, processor, and radio transceiver at the sensor node level and environmental phenomena, including radio signal propagation, at the sensor network level. We demonstrate the potential of our modeling framework by simulating and analyzing a small...
Monte Carlo dose calculation using a cell processor based PlayStation 3 system

International Nuclear Information System (INIS)

Chow, James C L; Lam, Phil; Jaffray, David A

2012-01-01

This study investigates the performance of the EGSnrc computer code coupled with a Cell-based hardware in Monte Carlo simulation of radiation dose in radiotherapy. Performance evaluations of two processor-intensive functions namely, HOWNEAR and RANMAR_GET in the EGSnrc code were carried out basing on the 20-80 rule (Pareto principle). The execution speeds of the two functions were measured by the profiler gprof specifying the number of executions and total time spent on the functions. A testing architecture designed for Cell processor was implemented in the evaluation using a PlayStation3 (PS3) system. The evaluation results show that the algorithms examined are readily parallelizable on the Cell platform, provided that an architectural change of the EGSnrc was made. However, as the EGSnrc performance was limited by the PowerPC Processing Element in the PS3, PC coupled with graphics processing units or GPCPU may provide a more viable avenue for acceleration.
Monte Carlo dose calculation using a cell processor based PlayStation 3 system

Science.gov (United States)

Chow, James C. L.; Lam, Phil; Jaffray, David A.

2012-02-01

This study investigates the performance of the EGSnrc computer code coupled with a Cell-based hardware in Monte Carlo simulation of radiation dose in radiotherapy. Performance evaluations of two processor-intensive functions namely, HOWNEAR and RANMAR_GET in the EGSnrc code were carried out basing on the 20-80 rule (Pareto principle). The execution speeds of the two functions were measured by the profiler gprof specifying the number of executions and total time spent on the functions. A testing architecture designed for Cell processor was implemented in the evaluation using a PlayStation3 (PS3) system. The evaluation results show that the algorithms examined are readily parallelizable on the Cell platform, provided that an architectural change of the EGSnrc was made. However, as the EGSnrc performance was limited by the PowerPC Processing Element in the PS3, PC coupled with graphics processing units or GPCPU may provide a more viable avenue for acceleration.
Monte Carlo dose calculation using a cell processor based PlayStation 3 system

Energy Technology Data Exchange (ETDEWEB)

Chow, James C L; Lam, Phil; Jaffray, David A, E-mail: james.chow@rmp.uhn.on.ca [Department of Radiation Oncology, University of Toronto and Radiation Medicine Program, Princess Margaret Hospital, University Health Network, Toronto, Ontario M5G 2M9 (Canada)

2012-02-09

This study investigates the performance of the EGSnrc computer code coupled with a Cell-based hardware in Monte Carlo simulation of radiation dose in radiotherapy. Performance evaluations of two processor-intensive functions namely, HOWNEAR and RANMAR_GET in the EGSnrc code were carried out basing on the 20-80 rule (Pareto principle). The execution speeds of the two functions were measured by the profiler gprof specifying the number of executions and total time spent on the functions. A testing architecture designed for Cell processor was implemented in the evaluation using a PlayStation3 (PS3) system. The evaluation results show that the algorithms examined are readily parallelizable on the Cell platform, provided that an architectural change of the EGSnrc was made. However, as the EGSnrc performance was limited by the PowerPC Processing Element in the PS3, PC coupled with graphics processing units or GPCPU may provide a more viable avenue for acceleration.
Architecture of security management unit for safe hosting of multiple agents

Science.gov (United States)

Gilmont, Tanguy; Legat, Jean-Didier; Quisquater, Jean-Jacques

1999-04-01

In such growing areas as remote applications in large public networks, electronic commerce, digital signature, intellectual property and copyright protection, and even operating system extensibility, the hardware security level offered by existing processors is insufficient. They lack protection mechanisms that prevent the user from tampering with critical data owned by those applications. Some devices are exceptions, but have neither enough processing power nor enough memory to support such applications (e.g. smart cards). This paper proposes a secure processor architecture in which the classical memory management unit is extended into a new security management unit. It allows ciphered code execution and ciphered data processing. An internal permanent memory can store cipher keys and critical data for several client agents simultaneously. The ordinary supervisor privilege scheme is replaced by a privilege inheritance mechanism that is more suited to operating system extensibility. The result is a secure processor that has hardware support for extensible multitask operating systems, and can be used for both general applications and critical applications needing strong protection. The security management unit and the internal permanent memory can be added to an existing CPU core without loss of performance, and do not require it to be modified.
A soft-core processor architecture optimised for radar signal processing applications

CSIR Research Space (South Africa)

Broich, R

2013-12-01

...-performance soft-core processing architecture is proposed. To develop such a processing architecture, data and signal-flow characteristics of common radar signal processing algorithms are analysed. Each algorithm is broken down into signal processing...
FASTBUS Standard Routines implementation for Fermilab embedded processor boards

International Nuclear Information System (INIS)

Pangburn, J.; Patrick, J.; Kent, S.; Oleynik, G.; Pordes, R.; Votava, M.; Heyes, G.; Watson, W.A. III

1992-10-01

In collaboration with CEBAF, Fermilab's Online Support Department and the CDF experiment have produced a new implementation of the IEEE FASTBUS Standard Routines for two embedded processor FASTBUS boards: the Fermilab Smart Crate Controller (FSCC) and the FASTBUS Readout Controller (FRC). Features of this implementation include: portability (to other embedded processor boards), remote source-level debugging, high speed, optional generation of very high-speed code for readout applications, and built-in Sun RPC support for execution of FASTBUS transactions and lists over the network
The hardware implementation of the CERN SPS ultrafast feedback processor demonstrator

CERN Document Server

Dusakto, J E; Fox, J D; Olsen, J; Rivetta, C H; Höfle, W

2013-01-01

An ultrafast 4GSa/s transverse feedback processor has been developed for proof-of-concept studies of feedback control of e-cloud driven and transverse mode coupled intra-bunch instabilities in the CERN SPS. This system consists of a high-speed ADC on the front end and equally fast DAC on the back end. All control and signal processing is implemented in FPGA logic. This system is capable of taking up to 16 sample slices across a single SPS bunch and processing each slice individually within a reconfigurable signal processor. This demonstrator system is a rapidly developed prototype, consisting of both commercial and custom-design components. It can stabilize the motion of a single particle bunch using closed loop feedback. The system can also run open loop as a high-speed arbitrary waveform generator and contains diagnostic features including a special ADC snapshot capture memory. This paper describes the overall system, the feedback processor and focuses on the hardware architecture, design ...
Parallel processing data network of master and slave transputers controlled by a serial control network

Science.gov (United States)

Crosetto, Dario B.

1996-01-01

The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor (100) to a plurality of slave processors (200) to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer (104), a digital signal processor (114), a parallel transfer controller (106), and two three-port memory devices. A communication switch (108) within each node (100) connects it to a fast parallel hardware channel (70) through which all high density data arrives or leaves the node.
HEP - A semaphore-synchronized multiprocessor with central control. [Heterogeneous Element Processor

Science.gov (United States)

Gilliland, M. C.; Smith, B. J.; Calvert, W.

1976-01-01

The paper describes the design concept of the Heterogeneous Element Processor (HEP), a system tailored to the special needs of scientific simulation. In order to achieve high-speed computation required by simulation, HEP features a hierarchy of processes executing in parallel on a number of processors, with synchronization being largely accomplished by hardware. A full-empty-reserve scheme of synchronization is realized by zero-one-valued hardware semaphores. A typical system has, besides the control computer and the scheduler, an algebraic module, a memory module, a first-in first-out (FIFO) module, an integrator module, and an I/O module. The architecture of the scheduler and the algebraic module is examined in detail.
Nested dissection on a mesh-connected processor array

International Nuclear Information System (INIS)

Worley, P.H.; Schreiber, R.

1986-01-01

The authors present a parallel implementation of Gaussian elimination without pivoting using the nested dissection ordering for solving Ax=b where A is an N x N symmetric positive definite matrix. If the graph of A is a √N x √N finite element mesh then a parallel complexity of O(√N) can be achieved for Gaussian elimination with the nested dissection ordering. The authors' implementation achieves this parallel complexity on a two dimensional MIMD processor array with N processors and nearest neighbors interconnections. Thus nested dissection is a near optimal algorithm for this problem on this interconnection topology. The parallel implementation on this architecture requires 158√N + O(log₂(√N)) parallel floating point multiplications. It is faster than a Kung-Leiserson systolic array for banded matrices for N≥961, and faster than a serial implementation for N as small as 9.
Predicting Electrocardiogram and Arterial Blood Pressure Waveforms with Different Echo State Network Architectures

Science.gov (United States)

2014-11-01

Predicting Electrocardiogram and Arterial Blood Pressure Waveforms with Different Echo State Network Architectures Allan Fong, MS1,3, Ranjeev...the medical staff in Intensive Care Units. The ability to predict electrocardiogram and arterial blood pressure waveforms can potentially help the...type of neural network for mining, understanding, and predicting electrocardiogram and arterial blood pressure waveforms. Several network
DART: A Functional-Level Reconfigurable Architecture for High Energy Efficiency

Directory of Open Access Journals (Sweden)

David Raphaël

2008-01-01

Flexibility becomes a major concern for the development of multimedia and mobile communication systems, as well as classical high-performance and low-energy consumption constraints. The use of general-purpose processors solves flexibility problems but fails to cope with the increasing demand for energy efficiency. This paper presents the DART architecture based on the functional-level reconfiguration paradigm which allows a significant improvement in energy efficiency. DART is built around a hierarchical interconnection network allowing high flexibility while keeping the power overhead low. To enable specific optimizations, DART supports two modes of reconfiguration. The compilation framework is built using compilation and high-level synthesis techniques. A 3G mobile communication application has been implemented as a proof of concept. The energy distribution within the architecture and the physical implementation are also discussed. Finally, the VLSI design of a 0.13 μm CMOS SoC implementing a specialized DART cluster is presented.
DART: A Functional-Level Reconfigurable Architecture for High Energy Efficiency

Directory of Open Access Journals (Sweden)

Sébastien Pillement

2007-12-01

Flexibility becomes a major concern for the development of multimedia and mobile communication systems, as well as classical high-performance and low-energy consumption constraints. The use of general-purpose processors solves flexibility problems but fails to cope with the increasing demand for energy efficiency. This paper presents the DART architecture based on the functional-level reconfiguration paradigm which allows a significant improvement in energy efficiency. DART is built around a hierarchical interconnection network allowing high flexibility while keeping the power overhead low. To enable specific optimizations, DART supports two modes of reconfiguration. The compilation framework is built using compilation and high-level synthesis techniques. A 3G mobile communication application has been implemented as a proof of concept. The energy distribution within the architecture and the physical implementation are also discussed. Finally, the VLSI design of a 0.13 μm CMOS SoC implementing a specialized DART cluster is presented.
Convolutional networks for fast, energy-efficient neuromorphic computing

Science.gov (United States)

Esser, Steven K.; Merolla, Paul A.; Arthur, John V.; Cassidy, Andrew S.; Appuswamy, Rathinakumar; Andreopoulos, Alexander; Berg, David J.; McKinstry, Jeffrey L.; Melano, Timothy; Barch, Davis R.; di Nolfo, Carmelo; Datta, Pallab; Amir, Arnon; Taba, Brian; Flickner, Myron D.; Modha, Dharmendra S.

2016-01-01

Deep networks are now able to achieve human-level performance on a broad spectrum of recognition tasks. Independently, neuromorphic computing has now demonstrated unprecedented energy-efficiency through a new chip architecture based on spiking neurons, low precision synapses, and a scalable communication network. Here, we demonstrate that neuromorphic computing, despite its novel architectural primitives, can implement deep convolution networks that (i) approach state-of-the-art classification accuracy across eight standard datasets encompassing vision and speech, (ii) perform inference while preserving the hardware’s underlying energy-efficiency and high throughput, running on the aforementioned datasets at between 1,200 and 2,600 frames/s and using between 25 and 275 mW (effectively >6,000 frames/s per Watt), and (iii) can be specified and trained using backpropagation with the same ease-of-use as contemporary deep learning. This approach allows the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors, bringing the promise of embedded, intelligent, brain-inspired computing one step closer. PMID:27651489
Research on the Architecture of a Basic Reconfigurable Information Communication Network

Directory of Open Access Journals (Sweden)

Ruimin Wang

2013-01-01

The current information network cannot fundamentally meet some urgent requirements, such as providing ubiquitous information services and various types of heterogeneous network, supporting diverse and comprehensive network services, possessing high quality communication effects, ensuring the security and credibility of information interaction, and implementing effective supervisory control. This paper provides the theory system for the basic reconfigurable information communication network based on the analysis of present problems on the Internet and summarizes the root of these problems. It also provides an in-depth discussion about the related technologies and the prime components of the architecture.
A Design Methodology for Efficient Implementation of Deconvolutional Neural Networks on an FPGA

OpenAIRE

Zhang, Xinyu; Das, Srinjoy; Neopane, Ojash; Kreutz-Delgado, Ken

2017-01-01

In recent years deep learning algorithms have shown extremely high performance on machine learning tasks such as image classification and speech recognition. In support of such applications, various FPGA accelerator architectures have been proposed for convolutional neural networks (CNNs) that enable high performance for classification tasks at lower power than CPU and GPU processors. However, to date, there has been little research on the use of FPGA implementations of deconvolutional neural...
Parallel protein secondary structure prediction based on neural networks.

Science.gov (United States)

Zhong, Wei; Altun, Gulsah; Tian, Xinmin; Harrison, Robert; Tai, Phang C; Pan, Yi

2004-01-01

Protein secondary structure prediction has a fundamental influence on today's bioinformatics research. In this work, binary and tertiary classifiers of protein secondary structure prediction are implemented on the Denoeux belief neural network (DBNN) architecture. Hydrophobicity matrix, orthogonal matrix, BLOSUM62 and PSSM (position specific scoring matrix) are tested separately as the encoding schemes for DBNN. The experimental results contribute to the design of new encoding schemes. A new binary classifier for Helix versus not Helix (~H) for DBNN produces a prediction accuracy of 87% when PSSM is used for the input profile. The performance of the DBNN binary classifier is comparable to that of the best other prediction methods. The good test results for binary classifiers open a new approach for protein structure prediction with neural networks. Because training the neural networks is time consuming, Pthread and OpenMP are employed to parallelize DBNN on the hyperthreading-enabled Intel architecture. Speedup for 16 Pthreads is 4.9 and speedup for 16 OpenMP threads is 4 on the 4-processor shared-memory architecture. The speedups of both OpenMP and Pthread are superior to those reported in other research. With the new parallel training algorithm, thousands of amino acids can be processed in a reasonable amount of time. Our research also shows that hyperthreading technology for Intel architecture is efficient for parallel biological algorithms.

Thinking in networks: artistic–architectural responses to ubiquitous information

Directory of Open Access Journals (Sweden)

Yvonne Spielmann

2011-12-01

The article discusses creative practices that intervene, in aesthetic-technical ways, in computer-networked communication systems. I am interested in artistic practices that use networks in different ways to make us aware of the possibilities for rethinking media-cultural environments. I use the example of the Japanese art-architectural group Double Negative Architecture to give an example of creatively thinking in networks. Yvonne Spielmann (Ph.D., Dr. habil.) is presently Research Professor and Chair of New Media at The University of the West of Scotland. Her work focuses on inter-relationships between media and culture, technology, art, science and communication, and in particular on Western/European and non-Western/South-East Asian interaction. Milestones of published research output are four authored monographs and about 90 single-authored articles. Her book, “Video, the Reflexive Medium” (published by MIT Press 2008, Japanese edition by Sangen-sha Press 2011), was awarded the 2009 Lewis Mumford Award for Outstanding Scholarship in the Ecology of Technics. Her most recent book, “Hybrid Cultures”, was published in German by Suhrkamp Press in 2010, with an English edition from MIT Press in 2012. Spielmann's work has been published in German and English and has been translated into French, Polish, Croatian, Swedish, Japanese, and Korean. She holds the 2011 Swedish Prize for Swedish–German scientific co-operation.
Shifts in the architecture of the Nationwide Health Information Network.

Science.gov (United States)

Lenert, Leslie; Sundwall, David; Lenert, Michael Edward

2012-01-01

In the midst of a US $30 billion investment in the Nationwide Health Information Network (NwHIN) and electronic health records systems, a significant change in the architecture of the NwHIN is taking place. Prior to 2010, the focus of information exchange in the NwHIN was the Regional Health Information Organization (RHIO). Since 2010, the Office of the National Coordinator (ONC) has been sponsoring policies that promote an internet-like architecture that encourages point-to-point information exchange and private health information exchange networks. The net effect of these activities is to undercut the limited business model for RHIOs, decreasing the likelihood of their success, while making the NwHIN dependent on nascent technologies for community level functions such as record locator services. These changes may impact the health of patients and communities. Independent, scientifically focused debate is needed on the wisdom of ONC's proposed changes in its strategy for the NwHIN.
Extraction of fibre network architecture by X-ray tomography and prediction of elastic properties using an affine analytical model

International Nuclear Information System (INIS)

Tsarouchas, D.; Markaki, A.E.

2011-01-01

This paper proposes a method for extracting reliable architectural characteristics from complex porous structures using micro-computed tomography (μCT) images. The work focuses on a highly porous material composed of a network of fibres bonded together. The segmentation process, allowing separation of the fibres from the remainder of the image, is the most critical step in constructing an accurate representation of the network architecture. Segmentation methods, based on local and global thresholding, were investigated and evaluated by a quantitative comparison of the architectural parameters they yielded, such as the fibre orientation and segment length (sections between joints) distributions and the number of inter-fibre crossings. To improve segmentation accuracy, a deconvolution algorithm was proposed to restore the original images. The efficacy of the proposed method was verified by comparing μCT network architectural characteristics with those obtained using high resolution CT scans (nanoCT). The results indicate that this approach resolves the architecture of these complex networks and produces results approaching the quality of nanoCT scans. The extracted architectural parameters were used in conjunction with an affine analytical model to predict the axial and transverse stiffnesses of the fibre network. Transverse stiffness predictions were compared with experimentally measured values obtained by vibration testing.
JIST: Just-In-Time Scheduling Translation for Parallel Processors

Directory of Open Access Journals (Sweden)

Giovanni Agosta

2005-01-01

The application fields of bytecode virtual machines and VLIW processors overlap in the area of embedded and mobile systems, where the two technologies offer different benefits, namely high code portability, low power consumption and reduced hardware cost. Dynamic compilation makes it possible to bridge the gap between the two technologies, but special attention must be paid to software instruction scheduling, a must for VLIW architectures. We have implemented JIST, a Virtual Machine and JIT compiler for Java Bytecode targeted to a VLIW processor. We show the impact of various optimizations on the performance of code compiled with JIST through an experimental study on a set of benchmark programs. We report significant speedups, and increases of up to 50% in the number of instructions issued per cycle with respect to the non-scheduling version of the JIT compiler. Further optimizations are discussed.
A SECURE MESSAGE TRANSMISSION SYSTEM ARCHITECTURE FOR COMPUTER NETWORKS EMPLOYING SMART CARDS

Directory of Open Access Journals (Sweden)

Geylani KARDAŞ

2008-01-01

In this study, we introduce a mobile system architecture which employs smart cards for secure message transmission in computer networks. In our design, the use of smart cards provides two security services: authentication and confidentiality. The security of the system is provided by asymmetric encryption. Hence, smart cards are used to store personal account information as well as the private key of each user for encryption/decryption operations. This offers further security, authentication and mobility to the system architecture. A real implementation of the proposed architecture which utilizes the JavaCard technology is also discussed in this study.
Optical home network based on an N×N multimode fiber architecture and CWDM technology

NARCIS (Netherlands)

Richard, F.; Guignard, P.; Pizzinat, A.; Guillo, L.; Guillory, J.; Charbonnier, B; Koonen, A.M.J.; Martinez, E.O.; Tanguy, E.; Li, H.W.

2011-01-01

With this optical home network solution associating an N×N multimode architecture and CWDM technology, various applications and network topologies are supported by a unique multiformat infrastructure. Issues related to the use of MMF are discussed.
Reference Architecture for Multi-Layer Software Defined Optical Data Center Networks

Directory of Open Access Journals (Sweden)

Casimer DeCusatis

2015-09-01

As cloud computing data centers grow larger and networking devices proliferate, many complex issues arise in the network management architecture. We propose a framework for multi-layer, multi-vendor optical network management using open standards-based software defined networking (SDN). Experimental results are demonstrated in a test bed consisting of three data centers interconnected by a 125 km metropolitan area network, running OpenStack with KVM and VMware components. Use cases include inter-data center connectivity via a packet-optical metropolitan area network, intra-data center connectivity using an optical mesh network, and SDN coordination of networking equipment within and between multiple data centers. We create and demonstrate original software to implement virtual network slicing and affinity policy-as-a-service offerings. Enhancements to synchronous storage backup, cloud exchanges, and Fibre Channel over Ethernet topologies are also discussed.
Implementation of the DPM Monte Carlo code on a parallel architecture for treatment planning applications.

Science.gov (United States)

Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J

2004-09-01

We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan-Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudo random number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster consisting of 800 MHz Intel Pentium III processor shows an almost linear speedup up to 32 processors for simulating 1 × 10^8 or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors based on availability) which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors on increasing the problem size up to 8 × 10^8 histories. For a smaller number of histories (1 × 10^8) the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1 × 10^8 histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron Cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of the 64-bit architecture Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster as compared to the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy.
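The relationship between the speedup and efficiency figures quoted above can be illustrated with a small sketch. The abstract does not define efficiency, so this assumes the conventional definition (speedup divided by the number of processors); the function names are hypothetical and are not taken from the DPM code.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Conventional definition (an assumption here): efficiency = speedup / processors."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


def speedup_from_efficiency(efficiency: float, processors: int) -> float:
    """Invert the definition above to recover the implied speedup."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return efficiency * processors


# Figures quoted in the abstract: 83.9% efficiency on 24 Athlon processors.
implied_speedup = speedup_from_efficiency(0.839, 24)
print(f"implied speedup on 24 processors: {implied_speedup:.1f}")
```

Under that assumed definition, an efficiency of 83.9% on 24 processors corresponds to a speedup of roughly 20 rather than the ideal 24.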
Implementation of the DPM Monte Carlo code on a parallel architecture for treatment planning applications

International Nuclear Information System (INIS)

Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J.

2004-01-01

We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan-Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudo random number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster consisting of 800 MHz Intel Pentium III processor shows an almost linear speedup up to 32 processors for simulating 1 × 10^8 or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors based on availability) which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors on increasing the problem size up to 8 × 10^8 histories. For a smaller number of histories (1 × 10^8) the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1 × 10^8 histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron Cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of the 64-bit architecture Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster as compared to the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy.
Online Fastbus processor for LEP

International Nuclear Information System (INIS)

Mueller, H.

1986-01-01

The author describes the online computing aspects of Fastbus systems using a processor module which has been developed at CERN and is now available commercially. These General Purpose Master/Slaves (GPMS) are based on 68000/10 (or optionally 68020/68881) processors. Applications include use as event-filters (DELPHI), supervisory controllers, Fastbus stand-alone diagnostic tools, and multiprocessor array components. The direct mapping of single, 32-bit assembly instructions to execute Fastbus protocols makes the use of a GPM both simple and flexible. Loosely coupled processing in Fastbus networks is possible between GPMs as they support access semaphores and use a two-port memory as I/O buffer for Fastbus. Both master and slave-ports support block transfers up to 20 Mbytes/s. The CERN standard Fastbus software and the MoniCa symbolic debugging monitor are available on the GPM with real time, multiprocessing support. (Auth.)
Opto-VLSI-based reconfigurable free-space optical interconnects architecture

DEFF Research Database (Denmark)

Aljada, Muhsen; Alameh, Kamal; Chung, Il-Sug

2007-01-01

is the Opto-VLSI processor which can be driven by digital phase steering and multicasting holograms that reconfigure the optical interconnects between the input and output ports. The optical interconnects architecture is experimentally demonstrated at 2.5 Gbps using high-speed 1×3 VCSEL array and 1......×3 photoreceiver array in conjunction with two 1×4096 pixel Opto-VLSI processors. The minimisation of the crosstalk between the output ports is achieved by appropriately aligning the VCSEL and PD elements with respect to the Opto-VLSI processors and driving the latter with optimal steering phase holograms....
Computer architecture a quantitative approach

CERN Document Server

Hennessy, John L

2019-01-01

Computer Architecture: A Quantitative Approach, Sixth Edition has been considered essential reading by instructors, students and practitioners of computer design for over 20 years. The sixth edition of this classic textbook is fully revised with the latest developments in processor and system architecture. It now features examples from the RISC-V (RISC Five) instruction set architecture, a modern RISC instruction set developed and designed to be a free and openly adoptable standard. It also includes a new chapter on domain-specific architectures and an updated chapter on warehouse-scale computing that features the first public information on Google's newest WSC. True to its original mission of demystifying computer architecture, this edition continues the longstanding tradition of focusing on areas where the most exciting computing innovation is happening, while always keeping an emphasis on good engineering design.
A modular architecture for transparent computation in recurrent neural networks.

Science.gov (United States)

Carmantini, Giovanni S; Beim Graben, Peter; Desroches, Mathieu; Rodrigues, Serafim

2017-01-01

Computation is classically studied in terms of automata, formal languages and algorithms; yet, the relation between neural dynamics and symbolic representations and operations is still unclear in traditional eliminative connectionism. Therefore, we suggest a unique perspective on this central issue, to which we would like to refer as transparent connectionism, by proposing accounts of how symbolic computation can be implemented in neural substrates. In this study we first introduce a new model of dynamics on a symbolic space, the versatile shift, showing that it supports the real-time simulation of a range of automata. We then show that the Gödelization of versatile shifts defines nonlinear dynamical automata, dynamical systems evolving on a vectorial space. Finally, we present a mapping between nonlinear dynamical automata and recurrent artificial neural networks. The mapping defines an architecture characterized by its granular modularity, where data, symbolic operations and their control are not only distinguishable in activation space, but also spatially localizable in the network itself, while maintaining a distributed encoding of symbolic representations. The resulting networks simulate automata in real-time and are programmed directly, in the absence of network training. To discuss the unique characteristics of the architecture and their consequences, we present two examples: (i) the design of a Central Pattern Generator from a finite-state locomotive controller, and (ii) the creation of a network simulating a system of interactive automata that supports the parsing of garden-path sentences as investigated in psycholinguistics experiments. Copyright © 2016 Elsevier Ltd. All rights reserved.
Programming massively parallel processors a hands-on approach

CERN Document Server

Kirk, David B

2010-01-01

Programming Massively Parallel Processors discusses basic concepts about parallel programming and GPU architecture. "Massively parallel" refers to the use of a large number of processors to perform a set of computations in a coordinated parallel way. The book details various techniques for constructing parallel programs. It also discusses the development process, performance level, floating-point format, parallel patterns, and dynamic parallelism. The book serves as a teaching guide where parallel programming is the main topic of the course. It builds on the basics of C programming for CUDA, a parallel programming environment that is supported on NVIDIA GPUs. Composed of 12 chapters, the book begins with basic information about the GPU as a parallel computer source. It also explains the main concepts of CUDA, data parallelism, and the importance of memory access efficiency using CUDA. The target audience of the book is graduate and undergraduate students from all science and engineering disciplines who ...
Point and track-finding processors for multiwire chambers

CERN Document Server

Hansroul, M

1973-01-01

The hardware processors described below are designed to be used in conjunction with multi-wire chambers. They have the characteristic of being based on computational methods in contrast to analogue procedures. In a sense, they are hardware implementations of computer programs. But, being specially designed for their purpose, they are free of the restrictions imposed by the architecture of the computer on which the equivalent program is to run. The parallelism inherent in the algorithms can thus be fully exploited. Combined with the use of fast access scratch-pad memories and the non-sequential nature of the control program, the parallelism accounts for the fact that these processors are expected to execute 2-3 orders of magnitude faster than the equivalent Fortran programs on a CDC 7600 or 6600. As a consequence, methods which are simple and straightforward, but which are impractical because they require an exorbitant amount of computer time can on the contrary be very attractive for hardware implementation. ...
Quantifying sleep architecture dynamics and individual differences using big data and Bayesian networks.

Science.gov (United States)

Yetton, Benjamin D; McDevitt, Elizabeth A; Cellini, Nicola; Shelton, Christian; Mednick, Sara C

2018-01-01

The pattern of sleep stages across a night (sleep architecture) is influenced by biological, behavioral, and clinical variables. However, traditional measures of sleep architecture, such as stage proportions, fail to capture sleep dynamics. Here we quantify the impact of individual differences on the dynamics of sleep architecture and determine which factors or set of factors best predict the next sleep stage from current stage information. We investigated the influence of age, sex, body mass index, time of day, and sleep time on static (e.g. minutes in stage, sleep efficiency) and dynamic measures of sleep architecture (e.g. transition probabilities and stage duration distributions) using a large dataset of 3202 nights from a non-clinical population. Multi-level regressions show that sex affects the duration of all Non-Rapid Eye Movement (NREM) stages, and age has a curvilinear relationship with Wake After Sleep Onset (WASO) and slow wave sleep (SWS) minutes. Bayesian network modeling reveals that sleep architecture depends on time of day, total sleep time, age and sex, but not BMI. Older adults, and particularly males, have shorter bouts (more fragmentation) of Stage 2 and SWS, and they transition less frequently to these stages. Additionally, we showed that the next sleep stage and its duration can be optimally predicted by the prior 2 stages and age. Our results demonstrate the potential benefit of big data and Bayesian network approaches in quantifying static and dynamic architecture of normal sleep.
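The idea of predicting the next sleep stage from the prior two stages can be illustrated with a simple frequency estimate of second-order transition probabilities. This is a minimal sketch, not the Bayesian network model used in the study: it ignores age, time of day and stage durations, and the stage labels and the example night are invented for illustration.

```python
from collections import Counter, defaultdict


def second_order_transitions(stages):
    """Estimate P(next stage | previous two stages) by counting triples."""
    counts = defaultdict(Counter)
    for a, b, c in zip(stages, stages[1:], stages[2:]):
        counts[(a, b)][c] += 1
    probs = {}
    for context, following in counts.items():
        total = sum(following.values())
        probs[context] = {stage: n / total for stage, n in following.items()}
    return probs


# Hypothetical sequence of stage bouts for one night (not data from the study).
night = ["W", "N1", "N2", "SWS", "N2", "REM", "N2", "SWS", "N2", "REM"]
model = second_order_transitions(night)
print(model[("SWS", "N2")])
```

In practice such counts would be pooled over many nights, and covariates such as age would condition the estimates, which is where a Bayesian network becomes useful.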
Time analysis of interconnection network implemented on the honeycomb architecture

Energy Technology Data Exchange (ETDEWEB)

Milutinovic, D [Inst. Michael Pupin, Belgrade (Yugoslavia)

1996-12-31

Problems of time-domain analysis of the mapping of interconnection networks for parallel processing onto one form of uniform, massively parallel architecture of the cellular type are considered. The results of the time analysis are discussed. It is found that changing the technology results in changing the mapping rules. 17 refs.
Trade-Off Exploration for Target Tracking Application in a Customized Multiprocessor Architecture

Directory of Open Access Journals (Sweden)

Yassin El-Hillali

2009-01-01

This paper presents the design of an FPGA-based multiprocessor-system-on-chip (MPSoC) architecture optimized for Multiple Target Tracking (MTT) in automotive applications. An MTT system uses an automotive radar to track the speed and relative position of all the vehicles (targets) within its field of view. As the number of targets increases, the computational needs of the MTT system also increase, making it difficult for a single processor to handle it alone. Our implementation distributes the computational load among multiple soft processor cores optimized for executing specific computational tasks. The paper explains how we designed and profiled the MTT application to partition it among different processors. It also explains how we applied different optimizations to customize the individual processor cores to their assigned tasks and to assess their impact on performance and FPGA resource utilization. The result is a complete MTT application running on an optimized MPSoC architecture that fits in a contemporary medium-sized FPGA and that meets the application's real-time constraints.
Evaluation of MERIS Chlorophyll-a Retrieval Processors in a Complex Turbid Lake Kasumigaura over a 10-Year Mission

Directory of Open Access Journals (Sweden)

Salem Ibrahim Salem

2017-10-01

The chlorophyll-a (Chla) products of seven processors developed for the Medium Resolution Imaging Spectrometer (MERIS) sensor were evaluated. The seven processors, based on a neural network and band height, were assessed over an optically complex water body with Chla concentrations of 8.10–187.40 mg∙m−3 using 10-year MERIS archival data. These processors were adopted for the Ocean and Land Color Instrument (OLCI) sensor. Results indicated that the four processors of band height (i.e., the Maximum Chlorophyll Index (MCI_L1) and Fluorescence Line Height (FLH_L1)) and neural network (i.e., Eutrophic Lake (EUL) and Case 2 Regional (C2R)) possessed reasonable retrieval accuracy, with a coefficient of determination (R2) in the range of 0.42–0.65. However, these processors underestimated the retrieved Chla > 100 mg∙m−3, reflecting the limitation of the band height processors to eliminate the influence of non-phytoplankton matter and highlighting the need to train the neural network for highly turbid waters. MCI_L1 outperformed other processors during the calibration and validation stages (R2 = 0.65, root mean square error (RMSE) = 22.18 mg∙m−3, mean absolute relative error (MARE) = 36.88%). In contrast, the results from the Boreal Lake (BOL) and Free University of Berlin (FUB) processors demonstrated their inadequacy to accurately retrieve Chla concentration > 50 mg∙m−3, mainly due to the limitation of the training datasets that resulted in a high MARE for BOL (56.20%) and FUB (57.00%). Mapping the spatial distribution of Chla concentrations across Lake Kasumigaura using the seven processors showed that all processors, except for BOL and FUB, were able to accurately capture the Chla distribution for moderate and high Chla concentrations. In addition, the MCI_L1 and C2R processors were evaluated over 10 years of monthly measured Chla, as they demonstrated the best retrieval accuracy from both groups (i.e., band height and neural network).
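The two error measures named above can be sketched as follows. The abstract does not spell out its formulas, so this assumes the usual definitions: RMSE as the square root of the mean squared difference, and MARE as the mean of the absolute differences relative to the measured value, expressed in percent. The sample values are invented and are not from the Lake Kasumigaura dataset.

```python
import math


def rmse(retrieved, measured):
    """Root mean square error, in the units of the inputs (e.g. mg/m^3)."""
    if len(retrieved) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - m) ** 2 for r, m in zip(retrieved, measured)]
    return math.sqrt(sum(squared) / len(squared))


def mare(retrieved, measured):
    """Mean absolute relative error in percent (assumed definition)."""
    if len(retrieved) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    relative = [abs(r - m) / m for r, m in zip(retrieved, measured)]
    return 100.0 * sum(relative) / len(relative)


# Hypothetical Chla values in mg/m^3, for illustration only.
measured = [10.0, 20.0, 40.0]
retrieved = [12.0, 18.0, 40.0]
print(rmse(retrieved, measured), mare(retrieved, measured))
```

RMSE weights large absolute misses most heavily, whereas MARE normalizes by the measured concentration, which is why a processor can look acceptable on one measure and poor on the other across a range as wide as 8.10–187.40 mg∙m−3.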
Analysis of the packet formation process in packet-switched networks

Science.gov (United States)

Meditch, J. S.

Two new queueing system models for the packet formation process in packet-switched telecommunication networks are developed, and their applications in process stability, performance analysis, and optimization studies are illustrated. The first, an M/M/1 queueing system characterization of the process, is a highly aggregated model which is useful for preliminary studies. The second, a marked extension of an earlier M/G/1 model, permits one to investigate stability, performance characteristics, and design of the packet formation process in terms of the details of processor architecture, and hardware and software implementations with processor structure and as many parameters as desired as variables. The two new models together with the earlier M/G/1 characterization span the spectrum of modeling complexity for the packet formation process from basic to advanced.
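For orientation, the standard steady-state results for an M/M/1 queue can be sketched as below. How the paper maps the packet formation process onto arrival and service rates is not stated in the abstract, so the rates used here are purely illustrative, and the function name is hypothetical.

```python
def mm1_metrics(arrival_rate: float, service_rate: float) -> dict:
    """Textbook steady-state metrics for an M/M/1 queue.

    arrival_rate (lambda) and service_rate (mu) must use the same time unit.
    The queue is stable only when lambda < mu.
    """
    if arrival_rate <= 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    rho = arrival_rate / service_rate  # server utilization
    if rho >= 1:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return {
        "utilization": rho,
        "mean_in_system": rho / (1 - rho),                          # L
        "mean_time_in_system": 1 / (service_rate - arrival_rate),   # W
        "mean_in_queue": rho ** 2 / (1 - rho),                      # Lq
        "mean_wait_in_queue": rho / (service_rate - arrival_rate),  # Wq
    }


# Illustrative rates only: 8 arrivals and 10 services per unit time.
metrics = mm1_metrics(arrival_rate=8.0, service_rate=10.0)
print(metrics)
```

The stability check mirrors the process-stability question raised in the abstract: as utilization approaches 1, the mean number in the system grows without bound.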

The Serial Link Processor for the Fast TracKer (FTK) processor at ATLAS

CERN Document Server

Biesuz, Nicolo Vladi; The ATLAS collaboration; Luciano, Pierluigi; Magalotti, Daniel; Rossi, Enrico

2015-01-01

The Associative Memory (AM) system of the Fast Tracker (FTK) processor has been designed to perform pattern matching using the hit information of the ATLAS experiment silicon tracker. The AM is the heart of FTK and is mainly based on the use of ASICs (AM chips) designed on purpose to execute pattern matching with a high degree of parallelism. It finds track candidates at low resolution that are seeds for a full resolution track fitting. To solve the very challenging data traffic problems inside FTK, multiple board and chip designs have been performed. The currently proposed solution is named the “Serial Link Processor” and is based on an extremely powerful network of 2 Gb/s serial links. This paper reports on the design of the Serial Link Processor consisting of two types of boards, the Local Associative Memory Board (LAMB), a mezzanine where the AM chips are mounted, and the Associative Memory Board (AMB), a 9U VME board which holds and exercises four LAMBs. We report on the performance of the intermedia...
The Serial Link Processor for the Fast TracKer (FTK) processor at ATLAS

CERN Document Server

Andreani, A; The ATLAS collaboration; Beccherle, R; Beretta, M; Cipriani, R; Citraro, S; Citterio, M; Colombo, A; Crescioli, F; Dimas, D; Donati, S; Giannetti, P; Kordas, K; Lanza, A; Liberali, V; Luciano, P; Magalotti, D; Neroutsos, P; Nikolaidis, S; Piendibene, M; Sakellariou, A; Shojaii, S; Sotiropoulou, C-L; Stabile, A

2014-01-01

The Associative Memory (AM) system of the FTK processor has been designed to perform pattern matching using the hit information of the ATLAS silicon tracker. The AM is the heart of the FTK and it finds track candidates at low resolution that are seeds for a full resolution track fitting. To solve the very challenging data traffic problems inside the FTK, multiple designs and tests have been performed. The currently proposed solution is named the “Serial Link Processor” and is based on an extremely powerful network of 2 Gb/s serial links. This paper reports on the design of the Serial Link Processor consisting of the AM chip, an ASIC designed and optimized to perform pattern matching, and two types of boards, the Local Associative Memory Board (LAMB), a mezzanine where the AM chips are mounted, and the Associative Memory Board (AMB), a 9U VME board which holds and exercises four LAMBs. Special relevance will be given to the AMchip design that includes two custom cells optimized for low consumption. We repo...
The Serial Link Processor for the Fast TracKer (FTK) processor at ATLAS

CERN Document Server

Biesuz, Nicolo Vladi; The ATLAS collaboration; Luciano, Pierluigi; Magalotti, Daniel; Rossi, Enrico

2015-01-01

The Associative Memory (AM) system of the Fast Tracker (FTK) processor has been designed to perform pattern matching using the hit information of the ATLAS experiment silicon tracker. The AM is the heart of FTK and is mainly based on the use of ASICs (AM chips) designed to execute pattern matching with a high degree of parallelism. The AM system finds track candidates at low resolution that are seeds for a full resolution track fitting. To solve the very challenging data traffic problems inside FTK, multiple board and chip designs have been performed. The currently proposed solution is named the “Serial Link Processor” and is based on an extremely powerful network of 828 2 Gbit/s serial links for a total in/out bandwidth of 56 Gb/s. This paper reports on the design of the Serial Link Processor consisting of two types of boards, the Local Associative Memory Board (LAMB), a mezzanine where the AM chips are mounted, and the Associative Memory Board (AMB), a 9U VME board which holds and exercises four LAMBs. ...
Software defined network architecture based research on load balancing strategy

Science.gov (United States)

You, Xiaoqian; Wu, Yang

2018-05-01

As a new type of network architecture, software defined networking has the key idea of separating the control plane of the network from the data (forwarding) plane, so that the network is managed and controlled in a centralized way; in addition, network interfaces are opened on the control layer and the data layer, so as to achieve programmable control of the network. Considering that traditional network data flow transmission takes only the single shortest route into account, and ignores the congestion and resource consumption caused by excessive link load, this article proposes a link-load-based QoS guarantee system for streaming media services that divides the flows in the network into ordinary data flows and QoS flows. The controller monitors the link load so as to calculate a reasonable route rapidly and issue the flow table to the switch, completing rapid data transmission. In addition, a simulation platform is established to obtain optimized results through simulation experiments.
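The idea of routing QoS flows by link load rather than by hop count alone can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the abstract does not give a cost formula, so the `1 / (1 - utilization)` link cost, the function names and the five-switch topology are all assumptions made for the example.

```python
import heapq

def shortest_path(graph, src, dst):
    """Dijkstra over graph = {node: {neighbor: cost}}; returns (cost, path)."""
    queue = [(0.0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, weight in graph.get(node, {}).items():
            if nxt not in visited:
                heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return float("inf"), []

def build_costs(links, qos):
    """links = {(u, v): (capacity, load)} for undirected links.

    Ordinary flows pay 1 per hop (plain shortest path).
    QoS flows pay 1 / (1 - utilization): a hypothetical cost that
    grows steeply as a link approaches saturation.
    """
    graph = {}
    for (u, v), (capacity, load) in links.items():
        utilization = min(load / capacity, 0.99)  # cap to avoid division by zero
        cost = 1.0 / (1.0 - utilization) if qos else 1.0
        graph.setdefault(u, {})[v] = cost
        graph.setdefault(v, {})[u] = cost
    return graph

def load_aware_path(links, src, dst, qos):
    """Path the controller would install for an ordinary or a QoS flow."""
    return shortest_path(build_costs(links, qos), src, dst)[1]

# Hypothetical topology: a short but congested route and a longer idle one.
links = {
    ("s1", "s2"): (100.0, 90.0),
    ("s2", "s4"): (100.0, 90.0),
    ("s1", "s3"): (100.0, 10.0),
    ("s3", "s5"): (100.0, 10.0),
    ("s5", "s4"): (100.0, 10.0),
}

print(load_aware_path(links, "s1", "s4", qos=False))
print(load_aware_path(links, "s1", "s4", qos=True))
```

In this toy topology the ordinary flow takes the two-hop route through the congested links, while the QoS flow is steered onto the longer, lightly loaded route.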
Modeling, realization and evaluation of a parallel architecture for the data acquisition in multidetectors

International Nuclear Information System (INIS)

Guirande, Ph.; Aleonard, M-M.; Dien, Q-T.; Pedroza, J-L.

1997-01-01

The increase in efficiency of 4π multidetectors (EUROGAM, EUROBALL, DIAMANT) is achieved by an increase in granularity, and hence in the event counting rate in the acquisition system. Consequently, an evolution of the architecture of readout systems, coding and software is necessary. To achieve the required evaluation we have implemented a parallel architecture to check the quality of the events. The first application of this architecture was to make available an improved data acquisition system for the DIAMANT multidetector. The data acquisition system of DIAMANT is based on a set of VME cards which must manage the event readout, the storage of events on magnetic media and histogram construction. The set consists of processors distributed in a network, a workstation to control the experiment and a display system for spectra and arrays. In such an architecture the VME bus quickly becomes a limitation on performance, not only for data transfer but also for the coordination of the different processors. The parallel architecture used eases VME bus operation. It is based on three C40 DSPs (Digital Signal Processors) installed on a commercial (LSI) VME card. It is provided with an external bus used to read the raw data from an interface card (ROCVI) between the 32-bit ECL bus reading the real-time VME-based encoders. The tests performed revealed blocking after data exchanges between the processors using two communication lines. The analysis of this problem indicated the need for dynamic changes of tasks to avoid this blocking. An intrinsic evaluation (i.e. without transfer on the VME bus) has been carried out for two parallel topologies (processor farm and tree). The simulation software permitted the generation of event packets. The rates obtained are roughly equivalent (6 MB/s), independent of topology. The farm topology has been chosen because it is simple to implement. The load evaluation has reduced the rate in 'simplex' communication mode to 5.3 MB/s and
Design and Architecture of SST-1 basic plasma control system

Energy Technology Data Exchange (ETDEWEB)

Patel, Kirit, E-mail: kpatel@ipr.res.in; Raju, D.; Dhongde, J.; Mahajan, K.; Chudasama, H.; Gulati, H.; Chauhan, A.; Masand, H.; Bhandarkar, M.; Pradhan, S.

2016-11-15

Highlights: • Reflective Memory network. • FPGA based Timing system for trigger distribution. • IRIG-B network for GPS time synchronization. • PMC based Digital Signal Processors and VME. • Simultaneous sampling ADC. - Abstract: The primary objective of the SST-1 plasma control system is to achieve plasma position, shape and current profile control. The architecture of the control system for SST-1 is distributed in nature. The fastest control loop time requirement of 100 μs is achieved using VME based simultaneous sampling ADCs, a PMC based quad core DSP, a Reflective Memory (RFM) based real-time network, a VME based real-time trigger distribution network and an Ethernet network. All the control loops for shape control, position control and current profile control share common signals from the magnetic diagnostics, so it is planned to accommodate all the algorithms on the same PMC based quad core DSP module, TS C-43. The RFM based real-time data network replicates data from one node to the next node in a ring network topology at a sustained throughput rate of 13.4 MBps. The real-time Timing System network provides guaranteed trigger distribution in 3.8 μs from one node to all nodes of the network. Monitoring and configuration of the different systems participating in the operation of SST-1 is done over the Ethernet network. Magnetic sensor data is acquired using a Pentek 6802 simultaneously sampling ADC card at a rate of 10 kSPS. All the real-time raw data along with the control data will be archived using the RFM network and a SCSI HDD for the experiment duration of 1000 s. The RFM network is also planned for real-time plotting of key plasma parameters during long experiments. After the experiment this data is transferred to a central storage server for archival purposes. This paper discusses the architecture and hardware implementation of the control system by describing all the involved hardware and software along with future plans for upgrades.
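The figures quoted in the abstract allow a rough per-cycle budget to be worked out. The sketch below is only back-of-envelope arithmetic on those numbers; it assumes that "MBps" means 10^6 bytes per second, and the function and variable names are made up for the illustration.

```python
def loop_budget(loop_time_s, throughput_bytes_per_s, trigger_latency_s):
    """Return (bytes the network can move per control cycle,
    fraction of the cycle spent on trigger distribution)."""
    bytes_per_cycle = throughput_bytes_per_s * loop_time_s
    trigger_fraction = trigger_latency_s / loop_time_s
    return bytes_per_cycle, trigger_fraction

# Values taken from the abstract; 1 MB = 10**6 bytes is an assumption.
bytes_per_cycle, trigger_fraction = loop_budget(
    loop_time_s=100e-6,             # fastest control loop: 100 microseconds
    throughput_bytes_per_s=13.4e6,  # sustained RFM throughput: 13.4 MBps
    trigger_latency_s=3.8e-6,       # trigger distribution: 3.8 microseconds
)

# Samples per ADC channel over one experiment: 10 kSPS for 1000 s.
samples_per_channel = 10_000 * 1000

print(f"RFM bytes per 100 us cycle: {bytes_per_cycle:.0f}")
print(f"Trigger share of the cycle: {trigger_fraction:.1%}")
print(f"Samples per channel per experiment: {samples_per_channel}")
```

Under these assumptions the reflective memory ring can move roughly 1340 bytes per control cycle, trigger distribution takes about 3.8% of the cycle, and each ADC channel produces ten million samples per 1000 s experiment.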
Multi-Softcore Architecture on FPGA

Directory of Open Access Journals (Sweden)

Mouna Baklouti

2014-01-01

To meet the high performance demands of embedded multimedia applications, embedded systems are integrating multiple processing units. However, they are mostly based on custom-logic design methodology. Designing parallel multicore systems using available standard intellectual property (IP) blocks while maintaining high performance is also a challenging issue. Softcore processors and field programmable gate arrays (FPGAs) are a cheap and fast option to develop and test such systems. This paper describes an FPGA-based design methodology to implement a rapid prototype of parametric multicore systems. A study of the viability of making the SoC using the NIOS II soft-processor core from Altera is also presented. The NIOS II features a general-purpose RISC CPU architecture designed to address a wide range of applications. The performance of the implemented architecture is discussed, and some parallel applications are used to test the speedup and efficiency of the system. Experimental results demonstrate the performance of the proposed multicore system, which achieves better speedup than the GPU (29.5% faster for the FIR filter and 23.6% faster for the matrix-matrix multiplication).
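Speedup and efficiency, the two metrics named in the abstract, have standard definitions: speedup is the single-core run time divided by the parallel run time, and efficiency is the speedup divided by the number of cores. A minimal sketch follows; the timings are invented placeholders, not measurements from the paper.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_1 / T_p."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_cores):
    """Parallel efficiency E = S / p, where p is the number of cores."""
    return speedup(t_serial, t_parallel) / n_cores

# Hypothetical timings (seconds) for the same kernel on 1 core and on 8 cores.
t1, t8 = 8.0, 2.0
print(speedup(t1, t8))        # how many times faster the 8-core run is
print(efficiency(t1, t8, 8))  # fraction of ideal linear scaling achieved
```

An efficiency near 1 means the cores are well used; values well below 1 usually point to communication or synchronization overhead between the soft cores.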
Design concepts for a virtualizable embedded MPSoC architecture enabling virtualization in embedded multi-processor systems

CERN Document Server

Biedermann, Alexander

2014-01-01

Alexander Biedermann presents a generic hardware-based virtualization approach, which may transform an array of any off-the-shelf embedded processors into a multi-processor system with high execution dynamism. Based on this approach, he highlights concepts for the design of energy aware systems, self-healing systems as well as parallelized systems. For the latter, the novel so-called Agile Processing scheme is introduced by the author, which enables a seamless transition between sequential and parallel execution schemes. The design of such virtualizable systems is further aided by introduction
GEYSERS: a novel architecture for virtualization and co-provisioning of dynamic optical networks and IT services

NARCIS (Netherlands)

Escalona, E.; Peng, S.; Nejabati, R.; Simeonidou, D.; García-Espín, J.A.; Ferrer, J.; Figuerola, S.; Landi, G.; Ciulli, N.; Jiménez, J.; Belter, B.; Demchenko, Y.; de Laat, C.; Chen, X.; Yukan, A.; Soudan, S.; Vicat-Blanc, P.; Buysse, J.; de Leenheer, M.; Develder, C.; Tzanakaki, A.; Robinson, P.; Brogle, M.; Bohnert, T.M.

2011-01-01

GEYSERS aims at defining an end-to-end network architecture that offers a novel planning, provisioning and operational framework for optical network and IT infrastructure providers and operators. In this framework, physical infrastructure resources (network and IT) are dynamically partitioned to
Computer Sciences and Data Systems, volume 2

Science.gov (United States)

1987-01-01

Topics addressed include: data storage; information network architecture; VHSIC technology; fiber optics; laser applications; distributed processing; spaceborne optical disk controller; massively parallel processors; and advanced digital SAR processors.
SANDS: a service-oriented architecture for clinical decision support in a National Health Information Network.

Science.gov (United States)

Wright, Adam; Sittig, Dean F

2008-12-01

In this paper, we describe and evaluate a new distributed architecture for clinical decision support called SANDS (Service-oriented Architecture for NHIN Decision Support), which leverages current health information exchange efforts and is based on the principles of a service-oriented architecture. The architecture allows disparate clinical information systems and clinical decision support systems to be seamlessly integrated over a network according to a set of interfaces and protocols described in this paper. The architecture described is fully defined and developed, and six use cases have been developed and tested using a prototype electronic health record which links to one of the existing prototype National Health Information Networks (NHIN): drug interaction checking, syndromic surveillance, diagnostic decision support, inappropriate prescribing in older adults, information at the point of care and a simple personal health record. Some of these use cases utilize existing decision support systems, which are either commercially or freely available at present, and developed outside of the SANDS project, while other use cases are based on decision support systems developed specifically for the project. Open source code for many of these components is available, and an open source reference parser is also available for comparison and testing of other clinical information systems and clinical decision support systems that wish to implement the SANDS architecture. The SANDS architecture for decision support has several significant advantages over other architectures for clinical decision support. The most salient of these are:
Simulating Hydrologic Flow and Reactive Transport with PFLOTRAN and PETSc on Emerging Fine-Grained Parallel Computer Architectures

Science.gov (United States)

Mills, R. T.; Rupp, K.; Smith, B. F.; Brown, J.; Knepley, M.; Zhang, H.; Adams, M.; Hammond, G. E.

2017-12-01

As the high-performance computing community pushes towards the exascale horizon, power and heat considerations have driven the increasing importance and prevalence of fine-grained parallelism in new computer architectures. High-performance computing centers have become increasingly reliant on GPGPU accelerators and "manycore" processors such as the Intel Xeon Phi line, and 512-bit SIMD registers have even been introduced in the latest generation of Intel's mainstream Xeon server processors. The high degree of fine-grained parallelism and more complicated memory hierarchy considerations of such "manycore" processors present several challenges to existing scientific software. Here, we consider how the massively parallel, open-source hydrologic flow and reactive transport code PFLOTRAN - and the underlying Portable, Extensible Toolkit for Scientific Computation (PETSc) library on which it is built - can best take advantage of such architectures. We will discuss some key features of these novel architectures and our code optimizations and algorithmic developments targeted at them, and present experiences drawn from working with a wide range of PFLOTRAN benchmark problems on these architectures.
The Potential of the Cell Processor for Scientific Computing

Energy Technology Data Exchange (ETDEWEB)

Williams, Samuel; Shalf, John; Oliker, Leonid; Husbands, Parry; Kamil, Shoaib; Yelick, Katherine

2005-10-14

The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists. As a result, the high performance computing community is examining alternative architectures that address the limitations of modern cache-based designs. In this work, we examine the potential of using the forthcoming STI Cell processor as a building block for future high-end computing systems. Our work contains several novel contributions. We are the first to present quantitative Cell performance data on scientific kernels and show direct comparisons against leading superscalar (AMD Opteron), VLIW (Intel Itanium2), and vector (Cray X1) architectures. Since neither Cell hardware nor cycle-accurate simulators are currently publicly available, we develop both analytical models and simulators to predict kernel performance. Our work also explores the complexity of mapping several important scientific algorithms onto the Cell's unique architecture. Additionally, we propose modest microarchitectural modifications that could significantly increase the efficiency of double-precision calculations. Overall results demonstrate the tremendous potential of the Cell architecture for scientific computations in terms of both raw performance and power efficiency.
Electromagnetic Physics Models for Parallel Computing Architectures

Science.gov (United States)

Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.

2016-10-01

The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well.
A Router Architecture for Connection-Oriented Service Guarantees in the MANGO Clockless Network-on-Chip

DEFF Research Database (Denmark)

Bjerregaard, Tobias; Sparsø, Jens

2005-01-01

On-chip networks for future system-on-chip designs need simple, high performance implementations. In order to promote system-level integrity, guaranteed services (GS) need to be provided. We propose a network-on-chip (NoC) router architecture to support this, and demonstrate with a CMOS standard...... cell design. Our implementation is based on clockless circuit techniques, and thus inherently supports a modular, GALS-oriented design flow. Our router exploits virtual channels to provide connection-oriented GS, as well as connection-less best-effort (BE) routing. The architecture is highly flexible...
Ensuring Data Storage Security in Tree cast Routing Architecture for Sensor Networks

Science.gov (United States)

Kumar, K. E. Naresh; Sagar, U. Vidya; Waheed, Mohd. Abdul

2010-10-01

Recent advances in technology have made possible low-cost, low-power wireless sensors with efficient energy consumption. A network of such nodes can coordinate among themselves for distributed sensing and processing of certain data. For such networks, we propose an architecture that provides a stateless solution for efficient routing in wireless sensor networks. This type of architecture is known as Tree Cast. We propose a unique method of address allocation, building up multiple disjoint trees which are geographically intertwined and rooted at the data sink. Using these trees, routing messages to and from the sink node without maintaining any routing state in the sensor nodes is possible. In contrast to traditional solutions, where the IT services are under proper physical, logical and personnel controls, this routing architecture moves the application software and databases to large data centers, where the management of the data and services may not be fully trustworthy. This unique attribute, however, poses many new security challenges which have not been well understood. In this paper, we focus on data storage security, which has always been an important aspect of quality of service. To ensure the correctness of users' data in this architecture, we propose an effective and flexible distributed scheme with two salient features, in contrast to its predecessors. By utilizing the homomorphic token with distributed verification of erasure-coded data, our scheme achieves the integration of storage correctness insurance and data error localization, i.e., the identification of misbehaving server(s). Unlike most prior works, the new scheme further supports secure and efficient dynamic operations on data blocks, including: data update, delete and append. Extensive security and performance analysis shows that the proposed scheme is highly efficient and resilient against Byzantine failure, malicious data modification attack, and even server
A Novel Architectural Concept for Enhanced 5G Network Facilities

Directory of Open Access Journals (Sweden)

Chochliouros Ioannis P.

2017-01-01

The 5G ESSENCE project’s context is based on the concept of Edge Cloud Computing and Small Cell-as-a-Service (SCaaS), as both have been previously identified in the SESAME 5G-PPP project of phase 1, and further “promotes” their role and/or influences within the related 5G vertical markets. 5G ESSENCE’s core innovation is focused upon the development/provision of a highly flexible and scalable platform, offering benefits to the involved market actors. The present work identifies a variety of challenges to be met by 5G ESSENCE, in the scope of an enhanced architectural framework. The proposed technical approach exploits the benefits of centralizing Small Cell functions as scale grows through an edge cloud environment, based on a two-tier architecture in which the first, distributed tier offers low latency services and the second, centralized tier provides high processing power for computing-intensive network applications. This permits decoupling the control and user planes of the Radio Access Network (RAN) and achieving the advantages of Cloud-RAN without the enormous fronthaul latency restrictions. The use of end-to-end network slicing mechanisms allows for sharing the related infrastructure among multiple operators/vertical industries and customizing its capabilities on a per-tenant basis, creating a neutral host market and reducing operational costs.
An Architecture to Manage Incoming Traffic of Inter-Domain Routing Using OpenFlow Networks

Directory of Open Access Journals (Sweden)

Walber José Adriano Silva

2018-04-01

The Border Gateway Protocol (BGP) is the current state of the art for inter-domain routing between Autonomous Systems (ASes). Although BGP has different mechanisms to manage outbound traffic in an AS domain, it lacks an efficient tool for inbound traffic control from transit ASes such as Internet Service Providers (ISPs). For inter-domain routing, BGP’s destination-based forwarding paradigm limits the granularity of distributing the network traffic among the multiple paths of the current Internet topology. Thus, this work offers a new architecture to manage incoming traffic in the inter-domain using OpenFlow networks. The architecture explores direct inter-domain communication to exchange control information and the functionalities of the OpenFlow protocol. Based on the results obtained for the size of the exchanged messages, the proposed architecture is not only scalable, but also capable of performing load balancing for inbound traffic using different strategies.
An Architecture for Performance Optimization in a Collaborative Knowledge-Based Approach for Wireless Sensor Networks

Directory of Open Access Journals (Sweden)

Juan Ramon Velasco

2011-09-01

Over the past few years, Intelligent Spaces (ISs) have received the attention of many Wireless Sensor Network researchers. Recently, several studies have been devoted to identifying their common capacities and to setting up ISs over these networks. However, little attention has been paid to integrating Fuzzy Rule-Based Systems into collaborative Wireless Sensor Networks for the purpose of implementing ISs. This work presents a distributed architecture proposal for collaborative Fuzzy Rule-Based Systems embedded in Wireless Sensor Networks, which has been designed to optimize the implementation of ISs. This architecture includes the following: (a) an optimized design for the inference engine; (b) a visual interface; (c) a module to reduce the redundancy and complexity of the knowledge bases; (d) a module to evaluate the accuracy of the new knowledge base; (e) a module to adapt the format of the rules to the structure used by the inference engine; and (f) a communications protocol. As a real-world application of this architecture and the proposed methodologies, we show an application to the problem of modeling two plagues of the olive tree: prays (olive moth, Prays oleae Bern.) and repilo (caused by the fungus Spilocaea oleagina). The results show that the architecture presented in this paper significantly decreases the consumption of resources (memory, CPU and battery) without a substantial decrease in the accuracy of the inferred values.
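To make the notion of a fuzzy rule-based inference engine concrete, here is a minimal sketch of one common scheme: zero-order Sugeno inference with triangular membership functions. The abstract does not say which inference method, variables or rules the authors use, so the scheme, the fuzzy sets, the rule base and the "risk" output below are all illustrative assumptions.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy sets for two sensor readings.
SETS = {
    "temp": {"mild": (10, 20, 30), "hot": (20, 35, 50)},
    "hum": {"dry": (0, 20, 60), "wet": (40, 80, 120)},
}

# Hypothetical rule base: (antecedent, crisp consequent = pest risk in [0, 1]).
RULES = [
    ({"temp": "mild", "hum": "wet"}, 0.9),
    ({"temp": "mild", "hum": "dry"}, 0.4),
    ({"temp": "hot", "hum": "wet"}, 0.5),
    ({"temp": "hot", "hum": "dry"}, 0.1),
]

def infer(inputs):
    """Weighted average of rule consequents, weighted by firing strength.

    The firing strength of a rule is the minimum membership degree of
    its antecedent terms (the usual fuzzy AND). Returns None when no
    rule fires at all.
    """
    numerator = denominator = 0.0
    for antecedent, consequent in RULES:
        strength = min(
            tri(inputs[var], *SETS[var][label])
            for var, label in antecedent.items()
        )
        numerator += strength * consequent
        denominator += strength
    return numerator / denominator if denominator > 0 else None

print(infer({"temp": 20, "hum": 80}))
print(infer({"temp": 25, "hum": 50}))
```

A rule base of this shape is small enough to evaluate on a sensor node, which is why reducing the number and complexity of rules, as the architecture's knowledge-base module aims to do, directly saves memory, CPU time and battery.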
The TMS34010 graphic processor - an architecture for image visualization in NMR tomography

International Nuclear Information System (INIS)

Slaets, Jan Frans Willem; Paiva, Maria Stela Veludo de; Almeida, Lirio O.B.

1989-01-01

This abstract presents a description of the minimum system implemented with the graphic processor TMS34010, which will be used in the reconstruction, treatment and interpretation of images obtained by NMR tomography. The project is being developed in the LIE (Electronic Instrumentation Laboratory), of the Sao Carlos Chemistry and Physical Institute, S P, Brazil and is already in operation.

Performance analysis of general purpose and digital signal processor kernels for heterogeneous systems-on-chip

Directory of Open Access Journals (Sweden)

T. von Sydow

2003-01-01

Various reasons like technology progress, flexibility demands, shortened product cycle time and shortened time to market have brought up the possibility and necessity to integrate different architecture blocks on one heterogeneous System-on-Chip (SoC). Architecture blocks like programmable processor cores (DSP and GPP kernels), embedded FPGAs as well as dedicated macros will be integral parts of such a SoC. Programmable architecture blocks and associated optimization techniques in particular are discussed in this contribution. Design space exploration, and thus the choice of which architecture blocks should be integrated in a SoC, is a challenging task. Crucial to this exploration is the evaluation of the application domain characteristics and the costs caused by individual architecture blocks integrated on a SoC. An ATE-cost function has been applied to examine the performance of the aforementioned programmable architecture blocks. To this end, representative discrete devices have been analyzed. Furthermore, several architecture dependent optimization steps and their effects on the cost ratios are presented.
High performance deformable image registration algorithms for manycore processors

CERN Document Server

Shackleford, James; Sharp, Gregory

2013-01-01

High Performance Deformable Image Registration Algorithms for Manycore Processors develops highly data-parallel image registration algorithms suitable for use on modern multi-core architectures, including graphics processing units (GPUs). Focusing on deformable registration, we show how to develop data-parallel versions of the registration algorithm suitable for execution on the GPU. Image registration is the process of aligning two or more images into a common coordinate frame and is a fundamental step to be able to compare or fuse data obtained from different sensor measurements. E
Probabilistic programmable quantum processors

International Nuclear Information System (INIS)

Buzek, V.; Ziman, M.; Hillery, M.

2004-01-01

We analyze how to improve the performance of probabilistic programmable quantum processors. We show how the probability of success of the probabilistic processor can be enhanced by using the processor in loops. In addition, we show that arbitrary SU(2) transformations of qubits can be encoded in the program state of a universal programmable probabilistic quantum processor. The probability of success of this processor can be enhanced by a systematic correction of errors via conditional loops. Finally, we show that all our results can also be generalized to qudits. (Abstract Copyright [2004], Wiley Periodicals, Inc.)
A FD/DAMA network architecture for the first generation land mobile satellite services

Science.gov (United States)

Yan, T.-Y.; Wang, C.; Cheng, U.; Dessouky, K.; Rafferty, W.

1989-01-01

A frequency division/demand assigned multiple access (FD/DAMA) network architecture for the first-generation land mobile satellite services is presented. Rationales and technical approaches are described. In this architecture, each mobile subscriber must follow a channel access protocol to make a service request to the network management center before transmission for either open-end or closed-end services. Open-end service requests will be processed on a blocked call cleared basis, while closed-end requests will be processed on a first-come-first-served basis. Two channel access protocols are investigated, namely, a recently proposed multiple channel collision resolution scheme which provides a significantly higher useful throughput, and the traditional slotted Aloha scheme. The number of channels allocated for either open-end or closed-end services can be adaptively changed according to aggregated traffic requests. Both theoretical and simulation results are presented. Theoretical results have been verified by simulation on the JPL network testbed.
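Two textbook formulas are commonly used as reference points for the mechanisms named above, and a short sketch shows how they behave. They are not taken from the paper: the slotted Aloha throughput S = G·exp(-G) assumes Poisson-distributed offered load, and the Erlang B formula is the standard model for a blocked-calls-cleared discipline. The function names and the sample loads are illustrative.

```python
import math

def slotted_aloha_throughput(offered_load):
    """Expected successful packets per slot, S = G * exp(-G),
    for an offered load of G packets per slot (Poisson arrivals)."""
    return offered_load * math.exp(-offered_load)

def erlang_b(offered_erlangs, channels):
    """Blocking probability for blocked-calls-cleared service,
    computed with the numerically stable Erlang B recursion
    B(0) = 1, B(m) = A*B(m-1) / (m + A*B(m-1))."""
    blocking = 1.0
    for m in range(1, channels + 1):
        blocking = offered_erlangs * blocking / (m + offered_erlangs * blocking)
    return blocking

# Slotted Aloha peaks at G = 1 with throughput 1/e.
for load in (0.5, 1.0, 2.0):
    print(load, round(slotted_aloha_throughput(load), 4))

# Blocking seen by open-end requests for a hypothetical 2 erlangs on 2 channels.
print(erlang_b(2.0, 2))
```

The 1/e ceiling of slotted Aloha (about 37% of slots carrying a successful request) is the baseline against which a collision resolution scheme claiming higher useful throughput would be compared.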
Comparison of Processor Performance of SPECint2006 Benchmarks of some Intel Xeon Processors

OpenAIRE

Abdul Kareem PARCHUR; Ram Asaray SINGH

2012-01-01

High performance is a critical requirement for all microprocessor manufacturers. The present paper describes the comparison of performance in two main Intel Xeon series processors (Type A: Intel Xeon X5260, X5460, E5450 and L5320 and Type B: Intel Xeon X5140, 5130, 5120 and E5310). The microarchitecture of these processors is implemented using the basis of a new family of processors from Intel starting with the Pentium 4 processor. These processors can provide a performance boost for many ke...
The LASS hardware processor

International Nuclear Information System (INIS)

Kunz, P.F.

1976-01-01

The problems of data analysis with hardware processors are reviewed and a description is given of a programmable processor. This processor, the 168/E, has been designed for use in the LASS multi-processor system; it has an execution speed comparable to the IBM 370/168 and uses the subset of IBM 370 instructions appropriate to the LASS analysis task. (Auth.)
Compilation Techniques Specific for a Hardware Cryptography-Embedded Multimedia Mobile Processor

Directory of Open Access Journals (Sweden)

Masa-aki FUKASE

2007-12-01

The development of single chip VLSI processors is the key technology of ever growing pervasive computing to answer overall demands for usability, mobility, speed, security, etc. We have so far developed a hardware cryptography-embedded multimedia mobile processor architecture, HCgorilla. Since HCgorilla integrates a wide range of techniques from architectures to applications and languages, a one-sided design approach is not always useful. HCgorilla needs a more complicated strategy, that is, hardware/software (H/S) codesign. Thus, we exploit the software support of HCgorilla, composed of a Java interface and parallelizing compilers. They are assumed to be installed in servers in order to reduce the load and increase the performance of HCgorilla-embedded clients. Since compilers are the essence of the software's responsibility, we focus in this article on our recent results about the design, specifications, and prototyping of parallelizing compilers for HCgorilla. The parallelizing compilers are composed of a multicore compiler and a LIW compiler. They are specified to extract parallelism from executable serial codes or the Java interface output and to output codes executable in parallel by HCgorilla. The prototype compilers are written in Java. The evaluation using an arithmetic test program shows the reasonableness of the prototype compilers compared with hand compilation.
Convolutional neural networks for event-related potential detection: impact of the architecture.

Science.gov (United States)

Cecotti, H

2017-07-01

The detection of brain responses at the single-trial level in the electroencephalogram (EEG), such as event-related potentials (ERPs), is a difficult problem that requires different processing steps to extract relevant discriminant features. While most of the signal processing and classification techniques for the detection of brain responses are based on linear algebra, other pattern recognition techniques such as the convolutional neural network (CNN), a type of deep learning technique, have attracted interest as they are able to process the signal after limited pre-processing. In this study, we propose to investigate the performance of CNNs in relation to their architecture and in relation to how they are evaluated: a single system for each subject, or a system for all the subjects. More particularly, we want to address the change in performance that can be observed between training a neural network for a single subject and training a neural network for a group of subjects, taking advantage of a larger number of trials from different subjects. The results support the conclusion that a convolutional neural network trained on different subjects can lead to an AUC above 0.9 by using an appropriate architecture with spatial filtering and shift-invariant layers.
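The AUC reported above is the area under the ROC curve, which equals the probability that a randomly chosen target trial receives a higher classifier score than a randomly chosen non-target trial. The sketch below computes it directly from that definition; the scores and labels are invented for illustration and have no connection to the study's data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Counts, over all (positive, negative) pairs, how often the positive
    trial scores higher; ties count as half. Labels are 1 (target,
    e.g. ERP present) or 0 (non-target).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        (p > n) + 0.5 * (p == n)
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Hypothetical classifier outputs for four single trials.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(auc(scores, labels))
```

Because the AUC depends only on the ranking of scores, it is insensitive to the strong class imbalance typical of ERP detection, where non-target trials far outnumber targets.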
Implementation of an EPICS IOC on an Embedded Soft Core Processor Using Field Programmable Gate Arrays

International Nuclear Information System (INIS)

Douglas Curry; Alicia Hofler; Hai Dong; Trent Allison; J. Hovater; Kelly Mahoney

2005-01-01

At Jefferson Lab, we have been evaluating soft core processors running an EPICS IOC over μClinux on our custom hardware. A soft core processor is a flexible CPU architecture that is configured in the FPGA, as opposed to a hard core processor, which is fixed in silicon. Combined with an on-board Ethernet port, the technology incorporates the IOC and digital control hardware within a single FPGA. By eliminating the general purpose computer IOC, the designer is no longer tied to a specific platform, e.g. PC, VME, or VXI, to serve as the intermediary between the high level controls and the field hardware. This paper will discuss the design and development process as well as specific applications for JLab's next generation low-level RF controls and Machine Protection Systems.
Optical RAM-enabled cache memory and optical routing for chip multiprocessors: technologies and architectures

Science.gov (United States)

Pleros, Nikos; Maniotis, Pavlos; Alexoudi, Theonitsa; Fitsios, Dimitris; Vagionas, Christos; Papaioannou, Sotiris; Vyrsokinos, K.; Kanellos, George T.

2014-03-01

The processor-memory performance gap, commonly referred to as the "Memory Wall" problem, is due to the speed mismatch between processor and electronic RAM clock frequencies, forcing current Chip Multiprocessor (CMP) configurations to consume more than 50% of the chip real-estate for caching purposes. In this article, we present our recent work spanning from Si-based integrated optical RAM cell architectures up to complete optical cache memory architectures for Chip Multiprocessor configurations. Moreover, we discuss e/o router subsystems with up to Tb/s routing capacity for cache interconnection purposes within CMP configurations, currently pursued within the FP7 PhoxTrot project.
Floating point only SIMD instruction set architecture including compare, select, Boolean, and alignment operations

Science.gov (United States)

Gschwind, Michael K [Chappaqua, NY

2011-03-01

Mechanisms for implementing a floating point only single instruction multiple data instruction set architecture are provided. A processor is provided that comprises an issue unit, an execution unit coupled to the issue unit, and a vector register file coupled to the execution unit. The execution unit has logic that implements a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA). The floating point vector registers of the vector register file store both scalar and floating point values as vectors having a plurality of vector elements. The processor may be part of a data processing system.
TopoGen: A Network Topology Generation Architecture with application to automating simulations of Software Defined Networks

CERN Document Server

Laurito, Andres; The ATLAS collaboration

2017-01-01

Simulation is an important tool to validate the performance impact of control decisions in Software Defined Networks (SDN). Yet, the manual modeling of complex topologies that may change often during a design process can be a tedious, error-prone task. We present TopoGen, a general purpose architecture and tool for systematic translation and generation of network topologies. TopoGen can be used to generate network simulation models automatically by querying information available at diverse sources, notably SDN controllers. The DEVS modeling and simulation framework facilitates a systematic translation of structured knowledge about a network topology into a formal modular and hierarchical coupling of preexisting or new models of network entities (physical or logical). TopoGen can be flexibly extended with new parsers and generators to grow its scope of applicability. This permits the design of arbitrary workflows of topology transformations. We tested TopoGen in a network engineering project for the ATLAS detector ...
TopoGen: A Network Topology Generation Architecture with application to automating simulations of Software Defined Networks

CERN Document Server

Laurito, Andres; The ATLAS collaboration

2018-01-01

Simulation is an important tool to validate the performance impact of control decisions in Software Defined Networks (SDN). Yet, the manual modeling of complex topologies that may change often during a design process can be a tedious, error-prone task. We present TopoGen, a general purpose architecture and tool for systematic translation and generation of network topologies. TopoGen can be used to generate network simulation models automatically by querying information available at diverse sources, notably SDN controllers. The DEVS modeling and simulation framework facilitates a systematic translation of structured knowledge about a network topology into a formal modular and hierarchical coupling of preexisting or new models of network entities (physical or logical). TopoGen can be flexibly extended with new parsers and generators to grow its scope of applicability. This permits the design of arbitrary workflows of topology transformations. We tested TopoGen in a network engineering project for the ATLAS detector ...
High speed vision processor with reconfigurable processing element array based on full-custom distributed memory

Science.gov (United States)

Chen, Zhe; Yang, Jie; Shi, Cong; Qin, Qi; Liu, Liyuan; Wu, Nanjian

2016-04-01

In this paper, a hybrid vision processor based on a compact full-custom distributed memory for near-sensor high-speed image processing is proposed. The proposed processor consists of a reconfigurable processing element (PE) array, a row processor (RP) array, and a dual-core microprocessor. The PE array includes two-dimensional processing elements with a compact full-custom distributed memory. It supports real-time reconfiguration between the PE array and the self-organized map (SOM) neural network. The vision processor is fabricated using a 0.18 µm CMOS technology. The circuit area of the distributed memory is reduced markedly to 1/3 of that of the conventional memory, so that the circuit area of the vision processor is reduced by 44.2%. Experimental results demonstrate that the proposed design functions correctly.
Heterogeneous System Architectures from APUs to discrete GPUs

CERN Multimedia

CERN. Geneva

2013-01-01

We will present the Heterogeneous System Architectures that new AMD processors are bringing with the new GCN-based GPUs and the new APUs. We will show how together they represent a huge step forward in programming flexibility and performance efficiency for compute.
A Real-Time Marker-Based Visual Sensor Based on a FPGA and a Soft Core Processor.

Science.gov (United States)

Tayara, Hilal; Ham, Woonchul; Chong, Kil To

2016-12-15

This paper introduces a real-time marker-based visual sensor architecture for mobile robot localization and navigation. A hardware acceleration architecture for a post video processing system was implemented on a field-programmable gate array (FPGA). The pose calculation algorithm was implemented in a System on Chip (SoC) with an Altera Nios II soft-core processor. For every frame, single pass image segmentation and Feature Accelerated Segment Test (FAST) corner detection were used in the FPGA for extracting the predefined markers with known geometries. The Coplanar PosIT algorithm was implemented on the Nios II soft-core processor, supplied with floating point hardware for accelerating floating point operations. Trigonometric functions have been approximated using Taylor series and cubic approximation using Lagrange polynomials. An inverse square root method has been implemented for approximating square root computations. Real time results have been achieved, and pixel streams have been processed on the fly without any need to buffer the input frame for further implementation.
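The two numerical approximations mentioned above can be sketched as follows. This is an illustrative Python sketch only: the abstract does not give the series order, the range reduction, or the particular inverse square root method used on the Nios II, so the truncated Taylor series with `terms=8`, the Newton-Raphson refinement, and the names `sin_taylor`, `inv_sqrt`, and `sqrt_via_inv_sqrt` are all assumptions.

```python
import math


def sin_taylor(x, terms=8):
    """Approximate sin(x) with a truncated Taylor series about 0.

    The number of terms is an assumed value, not taken from the paper.
    """
    # Reduce the argument to [-pi, pi] so the truncated series stays accurate.
    x = math.remainder(x, 2.0 * math.pi)
    term = x
    total = x
    for k in range(1, terms):
        # Each term follows from the previous one: multiply by -x^2 / ((2k)(2k+1)).
        term *= -x * x / ((2 * k) * (2 * k + 1))
        total += term
    return total


def inv_sqrt(x, iterations=6):
    """Approximate 1/sqrt(x) for x > 0 with Newton-Raphson iterations.

    Newton-Raphson is an assumed choice; the paper's method may differ.
    """
    if x <= 0.0:
        raise ValueError("x must be positive")
    # Initial guess from the binary exponent: x = m * 2**e with 0.5 <= m < 1.
    _, e = math.frexp(x)
    y = math.ldexp(1.0, -(e // 2))
    for _ in range(iterations):
        # Newton step for f(y) = 1/y^2 - x; uses only multiplies and subtracts.
        y = y * (1.5 - 0.5 * x * y * y)
    return y


def sqrt_via_inv_sqrt(x):
    """Recover sqrt(x) as x * (1/sqrt(x)), avoiding a division."""
    return x * inv_sqrt(x)


if __name__ == "__main__":
    print(sin_taylor(1.0), math.sin(1.0))
    print(sqrt_via_inv_sqrt(2.0), math.sqrt(2.0))
```

Formulating the square root through the inverse square root keeps the inner loop free of divisions, which is the usual motivation for this approach on small embedded cores.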
Architecture and performance of neural networks for efficient A/C control in buildings

International Nuclear Information System (INIS)

Mahmoud, Mohamed A.; Ben-Nakhi, Abdullatif E.

2003-01-01

The feasibility of using neural networks (NNs) for optimizing air conditioning (AC) setback scheduling in public buildings was investigated. The main focus is on optimizing the network architecture in order to achieve the best performance. To save energy, the temperature inside public buildings is allowed to rise after business hours by setting back the thermostat. The objective is to predict the time of the end of thermostat setback (EoS) such that the design temperature inside the building is restored in time for the start of business hours. State-of-the-art building simulation software, ESP-r, was used to generate a database that covered the years 1995-1999. The software was used to calculate the EoS for two office buildings using the climate records in Kuwait. The EoS data for 1995 and 1996 were used for training and testing the NNs. The robustness of the trained NNs was tested by applying them to a 'production' data set (1997-1999), which the networks had never 'seen' before. For each of the six different NN architectures evaluated, parametric studies were performed to determine the network parameters that best predict the EoS. External hourly temperature readings were used as network inputs, and the thermostat end of setback (EoS) is the output. The NN predictions were improved by developing a neural control scheme (NC). This scheme is based on using the temperature readings as they become available. For each NN architecture considered, six NNs were designed and trained for this purpose. The performance of the NN analysis was evaluated using a statistical indicator (the coefficient of multiple determination) and by statistical analysis of the error patterns, including ANOVA (analysis of variance). The results show that the NC, when used with a properly designed NN, is a powerful instrument for optimizing AC setback scheduling based only on external temperature records.
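The statistical indicator named above, the coefficient of multiple determination, is commonly computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes that standard definition; the function name `r_squared` and the sample EoS values are hypothetical and are not data from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of multiple determination, R^2 = 1 - SS_res / SS_tot.

    observed:  reference values (e.g. simulated end-of-setback times).
    predicted: model outputs for the same cases.
    Assumes the standard definition; the study's exact formula is not given.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left unexplained by the model.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations about their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0.0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up EoS times in hours after midnight, for illustration only.
    eos_reference = [4.0, 4.5, 5.0, 5.5]
    eos_predicted = [4.1, 4.4, 5.2, 5.4]
    print(r_squared(eos_reference, eos_predicted))
```

A value of 1 indicates that the predictions match the reference values exactly, while a value of 0 means the model does no better than always predicting the mean.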
Metrics of brain network architecture capture the impact of disease in children with epilepsy

Directory of Open Access Journals (Sweden)

Michael J. Paldino

2017-01-01

Conclusions: We observed that a machine learning algorithm accurately predicted epilepsy duration based on global metrics of network architecture derived from resting state fMRI. These findings suggest that network metrics have the potential to form the basis for statistical models that translate quantitative imaging data into patient-level markers of cognitive deterioration.
The breaking point of modern processor and platform technology

CERN Document Server

Nowak, A; Lazzaro, A; Leduc, J

2011-01-01

This work is an overview of state-of-the-art processors used in High Energy Physics, their architecture, and an extensive outline of the forthcoming technologies. Silicon process science and hardware design are making constant and rapid progress, and a solid grasp of these developments is imperative to the understanding of their possible future applications, which might include software strategy, optimizations, computing center operations and hardware acquisitions. In particular, the current issue of software and platform scalability is becoming more and more noticeable, and will develop in the near future with the growing core count of single chips and the approach of certain x86 architectural limits. Other topics brought forward include the hard, physical limits of innovation, the applicability of tried and tested computing formulas to modern technologies, as well as an analysis of viable alternate choices for continued development.
Electromagnetic Physics Models for Parallel Computing Architectures

International Nuclear Information System (INIS)

Amadio, G; Bianchini, C; Iope, R; Ananya, A; Apostolakis, J; Aurora, A; Bandieramonte, M; Brun, R; Carminati, F; Gheata, A; Gheata, M; Goulas, I; Nikitina, T; Bhattacharyya, A; Mohanty, A; Canal, P; Elvira, D; Jun, S Y; Lima, G; Duhem, L

2016-01-01

The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well.
