WorldWideScience

Sample records for architecture parallele pour

  1. Acoustic simulation in architecture with parallel algorithm

    Science.gov (United States)

    Li, Xiaohong; Zhang, Xinrong; Li, Dan

    2004-03-01

    To address the complexity of architectural environments and the need for real-time simulation of architectural acoustics, a parallel radiosity algorithm was developed. The distribution of sound energy in a scene is solved with this method. The impulse responses between sources and receivers in each frequency segment, which are calculated by multiple processes, are then combined into the whole frequency response. The numerical experiment shows that the parallel algorithm can improve the efficiency of acoustic simulation for complex scenes.
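
    The combining step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' radiosity code: `band_response` is a hypothetical stand-in (a decaying sinusoid) for the per-band impulse response, the sample rate, length, and band centers are assumed values, and a thread pool is used for brevity where the abstract speaks of multiple processes.

```python
import math
from concurrent.futures import ThreadPoolExecutor

SAMPLE_RATE = 8000  # Hz (assumed value for illustration)
NUM_SAMPLES = 64    # samples per impulse response (assumed)


def band_response(center_hz):
    """Hypothetical per-band impulse response: a decaying sinusoid."""
    return [
        math.exp(-40.0 * n / SAMPLE_RATE)
        * math.sin(2.0 * math.pi * center_hz * n / SAMPLE_RATE)
        for n in range(NUM_SAMPLES)
    ]


def combined_response(band_centers, max_workers=4):
    """Compute each band's response concurrently, then sum sample by sample."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of band_centers in its results.
        per_band = list(pool.map(band_response, band_centers))
    return [sum(samples) for samples in zip(*per_band)]


BANDS = [125, 250, 500, 1000, 2000]  # assumed band centers in Hz
response = combined_response(BANDS)
```

    Because the bands are independent, the only shared step is the final sample-wise sum; `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface if separate processes are wanted.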

  2. Exploiting Symmetry on Parallel Architectures.

    Science.gov (United States)

    Stiller, Lewis Benjamin

    1995-01-01

    This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry-exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs, and it discovered a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.

  3. Advanced parallel processing with supercomputer architectures

    International Nuclear Information System (INIS)

    Hwang, K.

    1987-01-01

    This paper investigates advanced parallel processing techniques and innovative hardware/software architectures that can be applied to boost the performance of supercomputers. Critical issues in architectural choices, parallel languages, compiling techniques, resource management, concurrency control, programming environments, parallel algorithms, and performance enhancement methods are examined and the best answers are presented. The authors cover advanced processing techniques suitable for supercomputers, high-end mainframes, minisupers, and array processors. The coverage emphasizes vectorization, multitasking, multiprocessing, and distributed computing. To achieve these operation modes, parallel languages, smart compilers, synchronization mechanisms, load balancing methods, the mapping of parallel algorithms, operating system functions, application libraries, and multidiscipline interactions are investigated to ensure high performance. At the end, they assess the potential of optical and neural technologies for developing future supercomputers.

  4. Distributed Parallel Architecture for "Big Data"

    Directory of Open Access Journals (Sweden)

    Catalin BOJA

    2012-01-01

    This paper is an extension of the "Distributed Parallel Architecture for Storing and Processing Large Datasets" paper presented at the WSEAS SEPADS’12 conference in Cambridge. In its original version the paper went over the benefits of using a distributed parallel architecture to store and process large datasets. This paper analyzes the problem of storing, processing and retrieving meaningful insight from petabytes of data. It provides a survey of current distributed and parallel data processing technologies and, based on them, proposes an architecture that can be used to solve the analyzed problem. In this version, more emphasis is put on distributed file systems and the ETL processes involved in a distributed environment.

  5. Electromagnetic Physics Models for Parallel Computing Architectures

    International Nuclear Information System (INIS)

    Amadio, G; Bianchini, C; Iope, R; Ananya, A; Apostolakis, J; Aurora, A; Bandieramonte, M; Brun, R; Carminati, F; Gheata, A; Gheata, M; Goulas, I; Nikitina, T; Bhattacharyya, A; Mohanty, A; Canal, P; Elvira, D; Jun, S Y; Lima, G; Duhem, L

    2016-01-01

    The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well.

  6. Electromagnetic Physics Models for Parallel Computing Architectures

    Science.gov (United States)

    Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.

    2016-10-01

    The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well.

  7. Algorithmes et architectures pour ordinateurs quantiques supraconducteurs

    Science.gov (United States)

    Blais, A.

    2003-09-01

    Algorithms and architectures for superconducting quantum computers. Since its formulation, information theory has been based, implicitly, on the laws of classical physics. Such a formulation is, however, incomplete because it does not take quantum reality into account. Over the last twenty years, the extension of information theory to include quantum effects has attracted growing interest. The practical realization of a quantum information processing system, a quantum computer, nevertheless presents many challenges. In this book, we examine various aspects of these challenges. We start by presenting algorithmic concepts such as the optimization of quantum computations and geometric quantum computation. We then consider various designs and aspects of qubits based on Josephson junctions. In particular, an original approach to the interaction between superconducting qubits is presented. This approach is very general since it can be applied to various qubit designs. Finally, we consider the read-out of superconducting flux qubits. The detector suggested here has the advantage that it can be decoupled from the qubit when no measurement is in progress.

  8. Capital Architecture: Situating symbolism parallel to architectural methods and technology

    Science.gov (United States)

    Daoud, Bassam

    Capital Architecture is a symbol of a nation's global presence and the cultural and social focal point of its inhabitants. Since the advent of High-Modernism in Western cities, and subsequently decolonised capitals, civic architecture no longer seems to be strictly grounded in the philosophy that national buildings shape the legacy of government and the way a nation is regarded through its built environment. Amidst an exceedingly globalized architectural practice and with the growing concern of key heritage foundations over the shortcomings of international modernism in representing its immediate socio-cultural context, the contextualization of public architecture within its sociological, cultural and economic framework in capital cities became the key denominator of this thesis. Civic architecture in capital cities is essential to confront the challenges of symbolizing a nation and demonstrating the legitimacy of the government. In today's dominantly secular Western societies, governmental architecture, especially where the seat of political power lies, is the ultimate form of architectural expression in conveying a sense of identity and underlining a nation's status. Departing from these convictions, this thesis investigates the embodied symbolic power, the representative capacity, and the inherent permanence in contemporary architecture, and in its modes of production. Through a vast study on Modern architectural ideals and heritage -- in parallel to methodologies -- the thesis stimulates the future of large scale governmental building practices and aims to identify and index the key constituents that may respond to the lack of representation in civic architecture in capital cities.

  9. Introduction to parallel algorithms and architectures arrays, trees, hypercubes

    CERN Document Server

    Leighton, F Thomson

    1991-01-01

    Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes provides an introduction to the expanding field of parallel algorithms and architectures. This book focuses on parallel computation involving the most popular network architectures, namely, arrays, trees, hypercubes, and some closely related networks. Organized into three chapters, this book begins with an overview of the simplest architectures of arrays and trees. This text then presents the structures and relationships between the dominant network architectures, as well as the most efficient parallel algorithms for

  10. The new landscape of parallel computer architecture

    Energy Technology Data Exchange (ETDEWEB)

    Shalf, John [NERSC Division, Lawrence Berkeley National Laboratory 1 Cyclotron Road, Berkeley California, 94720 (United States)

    2007-07-15

    The past few years have seen a sea change in computer architecture that will impact every facet of our society as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models.

  11. The new landscape of parallel computer architecture

    International Nuclear Information System (INIS)

    Shalf, John

    2007-01-01

    The past few years have seen a sea change in computer architecture that will impact every facet of our society as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models.

  12. Kalman Filter Tracking on Parallel Architectures

    International Nuclear Information System (INIS)

    Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava; Lantz, Steven; Lefebvre, Matthieu; McDermott, Kevin; Riley, Daniel; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

    2016-01-01

    Power density constraints are limiting the performance improvements of modern CPUs. To address this we have seen the introduction of lower-power, multi-core processors such as GPGPU, ARM and Intel MIC. In order to achieve the theoretical performance gains of these processors, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specialized functions like large vector units. Track finding and fitting is one of the most computationally challenging problems for event reconstruction in particle physics. At the High-Luminosity Large Hadron Collider (HL-LHC), for example, this will be by far the dominant problem. The need for greater parallelism has driven investigations of very different track finding techniques such as Cellular Automata or Hough Transforms. The most common track finding techniques in use today, however, are those based on a Kalman filter approach. Significant experience has been accumulated with these techniques on real tracking detector systems, both in the trigger and offline. They are known to provide high physics performance, are robust, and are in use today at the LHC. Given the utility of the Kalman filter in track finding, we have begun to port these algorithms to parallel architectures, namely Intel Xeon and Xeon Phi. We report here on our progress towards an end-to-end track reconstruction algorithm fully exploiting vectorization and parallelization techniques in a simplified experimental environment.

  13. High Efficiency EBCOT with Parallel Coding Architecture for JPEG2000

    Directory of Open Access Journals (Sweden)

    Chiang Jen-Shiun

    2006-01-01

    This work presents a parallel context-modeling coding architecture and a matching arithmetic coder (MQ-coder) for the embedded block coding (EBCOT) unit of the JPEG2000 encoder. Tier-1 of the EBCOT consumes most of the computation time in a JPEG2000 encoding system. The proposed parallel architecture can increase the throughput rate of the context modeling. To match the high throughput rate of the parallel context-modeling architecture, an efficient pipelined architecture for the context-based adaptive arithmetic encoder is proposed. This JPEG2000 encoder can work at 180 MHz to encode one symbol each cycle. Compared with previous context-modeling architectures, our parallel architectures can improve the throughput rate by up to 25%.

  14. Programming parallel architectures - The BLAZE family of languages

    Science.gov (United States)

    Mehrotra, Piyush

    1989-01-01

    This paper gives an overview of the various approaches to programming multiprocessor architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive, since they remove much of the burden of exploiting parallel architectures from the user. This paper also describes recent work in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described.

  15. Parallel Architectures for Planetary Exploration Requirements (PAPER)

    Science.gov (United States)

    Cezzar, Ruknet

    1993-01-01

    The project's main contributions have been in the area of student support. Throughout the project, at least one, in some cases two, undergraduate students have been supported. By working with the project, these students gained valuable knowledge involving the scientific research project, including the not-so-pleasant reporting requirements to the funding agencies. The other important contribution was towards the establishment of a graduate program in computer science at Hampton University. Primarily, the PAPER project has served as the main research basis in seeking funds from other agencies, such as the National Science Foundation, for establishing a research infrastructure in the department. In technical areas, especially in the first phase, we believe the trip to the Jet Propulsion Laboratory, and gathering together all the pertinent information involving experimental computer architectures aimed at planetary exploration, was very helpful. Indeed, if this effort is to be revived in the future due to congressional funding for planetary exploration, say an unmanned mission to Mars, our interim report will be an important starting point. In other technical areas, our simulator has pinpointed and highlighted several important performance issues related to the design of operating system kernels for MIMD machines. In particular, the critical issue of how the kernel itself will run in parallel on a multiple-processor system has been addressed through the various ready list organization and access policies. In the area of neural computing, our main contribution was an introductory tutorial package to familiarize the researchers at NASA with this new and promising field. Finally, we have introduced the notion of reversibility in programming systems, which may find applications in various areas of space research.

  16. Programming parallel architectures: The BLAZE family of languages

    Science.gov (United States)

    Mehrotra, Piyush

    1988-01-01

    Programming multiprocessor architectures is a critical research issue. An overview is given of the various approaches to programming these architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive since they remove much of the burden of exploiting parallel architectures from the user. Also described is recent work by the author in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described, as well as the relations of this work to other current language research projects.

  17. Mapping robust parallel multigrid algorithms to scalable memory architectures

    Science.gov (United States)

    Overman, Andrea; Vanrosendale, John

    1993-01-01

    The convergence rate of standard multigrid algorithms degenerates on problems with stretched grids or anisotropic operators. The usual cure for this is the use of line or plane relaxation. However, multigrid algorithms based on line and plane relaxation have limited and awkward parallelism and are quite difficult to map effectively to highly parallel architectures. Newer multigrid algorithms that overcome anisotropy through the use of multiple coarse grids rather than relaxation are better suited to massively parallel architectures because they require only simple point-relaxation smoothers. In this paper, we look at the parallel implementation of a V-cycle multiple semicoarsened grid (MSG) algorithm on distributed-memory architectures such as the Intel iPSC/860 and Paragon computers. The MSG algorithms provide two levels of parallelism: parallelism within the relaxation or interpolation on each grid and across the grids on each multigrid level. Both levels of parallelism must be exploited to map these algorithms effectively to parallel architectures. This paper describes a mapping of an MSG algorithm to distributed-memory architectures that demonstrates how both levels of parallelism can be exploited. The result is a robust and effective multigrid algorithm for distributed-memory machines.

  18. Parallel Architectures and Parallel Algorithms for Integrated Vision Systems. Ph.D. Thesis

    Science.gov (United States)

    Choudhary, Alok Nidhi

    1989-01-01

    Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to perform a high-level application (e.g., object recognition). An IVS normally involves algorithms from low-level, intermediate-level, and high-level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. Several issues are addressed in parallel architectures and parallel algorithms for integrated vision systems.

  19. Customizable Memory Schemes for Data Parallel Architectures

    NARCIS (Netherlands)

    Gou, C.

    2011-01-01

    Memory system efficiency is crucial for any processor to achieve high performance, especially in the case of data parallel machines. Processing capabilities of parallel lanes will be wasted, when data requests are not accomplished in a sustainable and timely manner. Irregular vector memory accesses

  20. Parallel generation of architecture on the GPU

    KAUST Repository

    Steinberger, Markus; Kenzel, Michael; Kainz, Bernhard K.; Müller, Jörg; Wonka, Peter; Schmalstieg, Dieter

    2014-01-01

    they can take advantage of, or both, our method supports state of the art procedural modeling including stochasticity and context-sensitivity. To increase parallelism, we explicitly express independence in the grammar, reduce inter-rule dependencies

  1. Kalman filter tracking on parallel architectures

    Science.gov (United States)

    Cerati, G.; Elmer, P.; Krutelyov, S.; Lantz, S.; Lefebvre, M.; McDermott, K.; Riley, D.; Tadel, M.; Wittich, P.; Wurthwein, F.; Yagil, A.

    2017-10-01

    We report on the progress of our studies towards a Kalman filter track reconstruction algorithm with optimal performance on manycore architectures. The combinatorial structure of these algorithms is not immediately compatible with an efficient SIMD (or SIMT) implementation; the challenge for us is to recast the existing software so it can readily generate hundreds of shared-memory threads that exploit the underlying instruction set of modern processors. We show how the data and associated tasks can be organized in a way that is conducive to both multithreading and vectorization. We demonstrate very good performance on Intel Xeon and Xeon Phi architectures, as well as promising first results on Nvidia GPUs.
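
    The idea of organizing track data so that one operation is applied to many tracks at once can be sketched as follows. This is a minimal illustration under strong assumptions, not the authors' code: each track carries a scalar state and variance (real track fits use multi-dimensional states and covariance matrices), the function name `kalman_update_batch` is hypothetical, and plain Python lists stand in for the contiguous arrays that a vectorizing compiler would operate on.

```python
def kalman_update_batch(x, p, z, r):
    """Apply one scalar Kalman measurement update to every track.

    x, p, z are parallel lists (a structure-of-arrays layout): entry i of
    each list belongs to track i. r is the shared measurement variance.
    """
    # Each comprehension performs the same arithmetic on every track,
    # which is the access pattern SIMD units are built for.
    gain = [pi / (pi + r) for pi in p]
    x_new = [xi + k * (zi - xi) for xi, k, zi in zip(x, gain, z)]
    p_new = [(1.0 - k) * pi for k, pi in zip(gain, p)]
    return x_new, p_new


states = [0.0, 10.0]        # predicted state per track
variances = [1.0, 3.0]      # predicted variance per track
measurements = [2.0, 10.0]  # one measurement per track
updated_states, updated_variances = kalman_update_batch(
    states, variances, measurements, r=1.0
)
```

    Keeping each quantity in its own array, rather than one object per track, is what lets the same update step run across all tracks without branching.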

  2. Parallel generation of architecture on the GPU

    KAUST Repository

    Steinberger, Markus

    2014-05-01

    In this paper, we present a novel approach for the parallel evaluation of procedural shape grammars on the graphics processing unit (GPU). Unlike previous approaches that are either limited in the kind of shapes they allow, the amount of parallelism they can take advantage of, or both, our method supports state of the art procedural modeling including stochasticity and context-sensitivity. To increase parallelism, we explicitly express independence in the grammar, reduce inter-rule dependencies required for context-sensitive evaluation, and introduce intra-rule parallelism. Our rule scheduling scheme avoids unnecessary back and forth between CPU and GPU and reduces round trips to slow global memory by dynamically grouping rules in on-chip shared memory. Our GPU shape grammar implementation is multiple orders of magnitude faster than the standard in CPU-based rule evaluation, while offering equal expressive power. In comparison to the state of the art in GPU shape grammar derivation, our approach is nearly 50 times faster, while adding support for geometric context-sensitivity.

  3. Urban Combat Advanced Training Technology Architecture (Architecture de technologie avancee pour l’entrainement au combat urbain)

    Science.gov (United States)

    2018-01-01

    NORTH ATLANTIC TREATY ORGANIZATION SCIENCE AND TECHNOLOGY ORGANIZATION AC/323(MSG-098)TP/740 www.sto.nato.int STO TECHNICAL...REPORT TR-MSG-098 Urban Combat Advanced Training Technology Architecture (Architecture de technologie avancée pour l’entraînement au combat urbain...

  4. Parallel algorithms and architecture for computation of manipulator forward dynamics

    Science.gov (United States)

    Fijany, Amir; Bejczy, Antal K.

    1989-01-01

    Parallel computation of manipulator forward dynamics is investigated. Considering three classes of algorithms for the solution of the problem, that is, the $O(n)$, the $O(n^2)$, and the $O(n^3)$ algorithms, parallelism in the problem is analyzed. It is shown that the problem belongs to the class NC and that the time and processor bounds are $O(\log_2^2 n)$ and $O(n^4)$, respectively. However, the fastest stable parallel algorithms achieve a computation time of $O(n)$ and can be derived by parallelization of the $O(n^3)$ serial algorithms. Parallel computation of the $O(n^3)$ algorithms requires the development of parallel algorithms for a set of fundamentally different problems, that is, the Newton-Euler formulation, the computation of the inertia matrix, decomposition of the symmetric, positive definite matrix, and the solution of triangular systems. Parallel algorithms for this set of problems are developed which can be efficiently implemented on a unique architecture, a triangular array of $n(n+2)/2$ processors with a simple nearest-neighbor interconnection. This architecture is particularly suitable for VLSI and WSI implementations. The developed parallel algorithm, compared to the best serial $O(n)$ algorithm, achieves an asymptotic speedup of more than two orders of magnitude in the computation of the forward dynamics.
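
    One of the subproblems named in this abstract, the solution of triangular systems, has a well-known column-sweep formulation in which each of the n steps exposes independent work. The sketch below illustrates that general technique in serial Python; it is not the authors' algorithm or their processor-array mapping, and the function name is hypothetical.

```python
def forward_substitution_column_sweep(lower, rhs):
    """Solve lower * x = rhs for a lower-triangular matrix, column by column."""
    n = len(rhs)
    b = list(rhs)  # working copy of the right-hand side
    x = [0.0] * n
    for j in range(n):
        x[j] = b[j] / lower[j][j]
        # The updates below touch distinct rows i, so on a parallel
        # machine they could all be performed in the same time step.
        for i in range(j + 1, n):
            b[i] -= lower[i][j] * x[j]
    return x


L_MATRIX = [
    [2.0, 0.0, 0.0],
    [1.0, 3.0, 0.0],
    [4.0, 5.0, 6.0],
]
solution = forward_substitution_column_sweep(L_MATRIX, [2.0, 7.0, 32.0])
```

    With one processor per row, the inner loop collapses to a single parallel step, so the whole solve takes a number of steps proportional to n rather than to n squared.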

  5. Construction Morphology and the Parallel Architecture of Grammar

    Science.gov (United States)

    Booij, Geert; Audring, Jenny

    2017-01-01

    This article presents a systematic exposition of how the basic ideas of Construction Grammar (CxG) (Goldberg, 2006) and the Parallel Architecture (PA) of grammar (Jackendoff, 2002) provide the framework for a proper account of morphological phenomena, in particular word formation. This framework is referred to as Construction Morphology (CxM). As…

  6. A parallel architecture for digital filtering using Fermat number transforms

    Science.gov (United States)

    Truong, T. K.; Reed, I. S.; Yeh, C.-S.; Shao, H. M.

    1983-01-01

    In this correspondence, a parallel architecture is developed to compute the linear convolution of two sequences of arbitrary lengths using the Fermat number transform (FNT). In particular, a pipeline structure is designed to compute a 128-point FNT. In this FNT, only additions and bit rotations are required. The overlap-save method is generalized for the FNT to realize a digital filter of arbitrary length. The generalized overlap-save method alleviates the usual dynamic range limitation of FNT's of long transform lengths. A parallel architecture is developed to realize this type of overlap-save method using one FNT and several inverse FNT's of 128 points. Its architecture is regular, simple, and flexible, and therefore naturally suitable for VLSI implementation.
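
    A small worked example may help show why a Fermat number transform yields exact convolutions. The sketch below is an assumption-laden illustration, not the architecture from the paper: it uses a 16-point transform modulo the Fermat number 2^16 + 1 instead of the 128-point transform described, evaluates the transform directly rather than through a pipelined fast algorithm, and uses `pow` for clarity even though every twiddle factor here is a power of 2 and could be applied in hardware as a bit shift with modular reduction. It requires Python 3.8 or later for modular inverses via `pow`.

```python
M = 2**16 + 1  # Fermat number F_4 = 65537, which is prime
N = 16         # transform length (assumed for this sketch)
OMEGA = 4      # 2 has order 32 modulo F_4, so 4 = 2**2 has order 16


def fnt(x, root=OMEGA):
    """Direct N-point number-theoretic transform modulo M."""
    return [
        sum(x[n] * pow(root, n * k, M) for n in range(N)) % M
        for k in range(N)
    ]


def inverse_fnt(spectrum):
    """Inverse transform: use the inverse root and scale by 1/N modulo M."""
    inv_n = pow(N, -1, M)
    inv_root = pow(OMEGA, -1, M)
    return [(inv_n * v) % M for v in fnt(spectrum, inv_root)]


def linear_convolution(a, b):
    """Exact linear convolution of non-negative integer sequences.

    Valid when the output fits in N points and every output value is
    smaller than M; zero padding turns cyclic into linear convolution.
    """
    out_len = len(a) + len(b) - 1
    assert out_len <= N, "sequences too long for a single transform"
    pa = list(a) + [0] * (N - len(a))
    pb = list(b) + [0] * (N - len(b))
    product = [(u * v) % M for u, v in zip(fnt(pa), fnt(pb))]
    return inverse_fnt(product)[:out_len]


result = linear_convolution([1, 2, 3], [4, 5, 6])
```

    All arithmetic is on integers modulo M, so there is no rounding error; the price is the dynamic range limit on output values, which is the limitation the abstract's generalized overlap-save method is said to alleviate for long filters.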

  7. High-Efficient Parallel CAVLC Encoders on Heterogeneous Multicore Architectures

    Directory of Open Access Journals (Sweden)

    H. Y. Su

    2012-04-01

    This article presents two highly efficient parallel realizations of context-based adaptive variable length coding (CAVLC) on heterogeneous multicore processors. By optimizing the architecture of the CAVLC encoder, three kinds of dependences are eliminated or weakened, including the context-based data dependence, the memory accessing dependence and the control dependence. The CAVLC pipeline is divided into three stages: two scans, coding, and lag packing, and is implemented on two typical heterogeneous multicore architectures. One is a block-based SIMD parallel CAVLC encoder on the multicore stream processor STORM. The other is a component-oriented SIMT parallel encoder on a massively parallel GPU architecture. Both of them exploit rich data-level parallelism. Experimental results show that, compared with the CPU version, a speedup of more than 70 times can be obtained for STORM and over 50 times for the GPU. The encoder implementation on STORM can process 1080p @30fps in real time, and the GPU-based version can satisfy the requirements for 720p real-time encoding. The throughput of the presented CAVLC encoders is more than 10 times higher than that of published software encoders on DSP and multicore platforms.

  8. A Parallel Saturation Algorithm on Shared Memory Architectures

    Science.gov (United States)

    Ezekiel, Jonathan; Siminiceanu

    2007-01-01

    Symbolic state-space generators are notoriously hard to parallelize. However, the Saturation algorithm implemented in the SMART verification tool differs from other sequential symbolic state-space generators in that it exploits the locality of firing events in asynchronous system models. This paper explores whether event locality can be utilized to efficiently parallelize Saturation on shared-memory architectures. Conceptually, we propose to parallelize the firing of events within a decision diagram node, which is technically realized via a thread pool. We discuss the challenges involved in our parallel design and conduct experimental studies on its prototypical implementation. On a dual-processor dual core PC, our studies show speed-ups for several example models, e.g., of up to 50% for a Kanban model, when compared to running our algorithm only on a single core.

  9. Directions in parallel processor architecture, and GPUs too

    CERN Multimedia

    CERN. Geneva

    2014-01-01

    Modern computing is power-limited in every domain. Performance increments extracted from instruction-level parallelism (ILP) are no longer power-efficient; they haven't been for some time. Thread-level parallelism (TLP) is a more easily exploited form of parallelism, at the expense of programmer effort to expose it in the program. In this talk, I will introduce you to disparate topics in parallel processor architecture that will impact programming models (and you) in both the near and far future. About the speaker: Olivier is a senior GPU (SM) architect at NVIDIA and an active participant in the concurrency working group of the ISO C++ committee. He has also worked on very large diesel engines as a mechanical engineer, and taught at McGill University (Canada) as a faculty instructor.

  10. Parallel PDE-Based Simulations Using the Common Component Architecture

    International Nuclear Information System (INIS)

    McInnes, Lois C.; Allan, Benjamin A.; Armstrong, Robert; Benson, Steven J.; Bernholdt, David E.; Dahlgren, Tamara L.; Diachin, Lori; Krishnan, Manoj Kumar; Kohl, James A.; Larson, J. Walter; Lefantzi, Sophia; Nieplocha, Jarek; Norris, Boyana; Parker, Steven G.; Ray, Jaideep; Zhou, Shujia

    2006-01-01

    The complexity of parallel PDE-based simulations continues to increase as multimodel, multiphysics, and multi-institutional projects become widespread. A goal of component-based software engineering in such large-scale simulations is to help manage this complexity by enabling better interoperability among various codes that have been independently developed by different groups. The Common Component Architecture (CCA) Forum is defining a component architecture specification to address the challenges of high-performance scientific computing. In addition, several execution frameworks, supporting infrastructure, and general-purpose components are being developed. Furthermore, this group is collaborating with others in the high-performance computing community to design suites of domain-specific component interface specifications and underlying implementations. This chapter discusses recent work on leveraging these CCA efforts in parallel PDE-based simulations involving accelerator design, climate modeling, combustion, and accidental fires and explosions. We explain how component technology helps to address the different challenges posed by each of these applications, and we highlight how component interfaces built on existing parallel toolkits facilitate the reuse of software for parallel mesh manipulation, discretization, linear algebra, integration, optimization, and parallel data redistribution. We also present performance data to demonstrate the suitability of this approach, and we discuss strategies for applying component technologies to both new and existing applications.

  11. Multiscale Architectures and Parallel Algorithms for Video Object Tracking

    Science.gov (United States)

    2011-10-01

    larger number of cores using the IBM QS22 Blade for handling higher video processing workloads (but at higher cost per core), low power consumption and...Cell/B.E. Blade processors which have a lot more main memory but also higher power consumption. More detailed performance figures for HD and SD video...Parallelism in Algorithms and Architectures, pages 289–298, 2007. [3] S. Ali and M. Shah. COCOA - Tracking in aerial imagery. In Daniel J. Henry

  12. Application of parallelized software architecture to an autonomous ground vehicle

    Science.gov (United States)

    Shakya, Rahul; Wright, Adam; Shin, Young Ho; Momin, Orko; Petkovsek, Steven; Wortman, Paul; Gautam, Prasanna; Norton, Adam

    2011-01-01

    This paper presents improvements made to Q, an autonomous ground vehicle designed to participate in the Intelligent Ground Vehicle Competition (IGVC). For the 2010 IGVC, Q was upgraded with a new parallelized software architecture and a new vision processor. Improvements were made to the power system, reducing the number of batteries required for operation from six to one. In previous years, a single state machine was used to execute the bulk of processing activities including sensor interfacing, data processing, path planning, navigation algorithms and motor control. This inefficient approach led to poor software performance and made the software difficult to maintain or modify. For IGVC 2010, the team implemented a modular parallel architecture using the National Instruments (NI) LabVIEW programming language. The new architecture divides all the necessary tasks (motor control, navigation, sensor data collection, etc.) into well-organized components that execute in parallel, providing considerable flexibility and facilitating efficient use of processing power. Computer vision is used to detect white lines on the ground and determine their location relative to the robot. With the new vision processor and some optimization of the image processing algorithm used last year, two frames can be acquired and processed in 70 ms. With all these improvements, Q placed 2nd in the autonomous challenge.

  13. Hybrid parallel computing architecture for multiview phase shifting

    Science.gov (United States)

    Zhong, Kai; Li, Zhongwei; Zhou, Xiaohui; Shi, Yusheng; Wang, Congjun

    2014-11-01

    The multiview phase-shifting method shows powerful capability in achieving high-resolution three-dimensional (3-D) shape measurement. Unfortunately, this ability results in very high computation costs, and 3-D computations have to be processed offline. To realize real-time 3-D shape measurement, a hybrid parallel computing architecture is proposed for multiview phase shifting. In this architecture, the central processing unit can co-operate with the graphics processing unit (GPU) to achieve hybrid parallel computing. The high computation cost procedures, including lens distortion rectification, phase computation, correspondence, and 3-D reconstruction, are implemented on the GPU, and a three-layer kernel function model is designed to simultaneously realize coarse-grained and fine-grained parallel computing. Experimental results verify that the developed system can perform 50 fps (frames per second) real-time 3-D measurement with 260 K 3-D points per frame. A speedup of up to 180 times is obtained for the proposed technique using an NVIDIA GT560Ti graphics card rather than a sequential C implementation on a 3.4 GHz Intel Core i7 3770.

  14. Enhanced memory architecture for massively parallel vision chip

    Science.gov (United States)

    Chen, Zhe; Yang, Jie; Liu, Liyuan; Wu, Nanjian

    2015-04-01

    Local memory architecture plays an important role in high-performance massively parallel vision chips. In this paper, we propose an enhanced memory architecture with compact circuit area designed in a full-custom flow. The memory consists of separate master-stage static latches and shared slave-stage dynamic latches. We use split transmission transistors on the input data path to enhance tolerance for charge sharing and to achieve random read/write capabilities. The memory is designed in a 0.18 μm CMOS process. The area overhead of the memory is 16.6 μm²/bit. Simulation results show that the maximum operating frequency reaches 410 MHz and the corresponding peak dynamic power consumption for a 64-bit memory unit is 190 μW under a 1.8 V supply voltage.

  15. Parallel k-means++ for Multiple Shared-Memory Architectures

    Energy Technology Data Exchange (ETDEWEB)

    Mackey, Patrick S.; Lewis, Robert R.

    2016-09-22

    In recent years k-means++ has become a popular initialization technique for improved k-means clustering. To date, most of the work done to improve its performance has involved parallelizing algorithms that are only approximations of k-means++. In this paper we present a parallelization of the exact k-means++ algorithm, with a proof of its correctness. We develop implementations for three distinct shared-memory architectures: multicore CPU, high performance GPU, and the massively multithreaded Cray XMT platform. We demonstrate the scalability of the algorithm on each platform. In addition we present a visual approach for showing which platform performed k-means++ the fastest for varying data sizes.
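
    For orientation, the seeding step that the abstract refers to can be sketched serially as below. This is a minimal sketch of standard k-means++ (D² sampling), not the authors' parallel implementation; the function names `squared_distance` and `kmeanspp_seed` are illustrative assumptions. The per-point distance update at the end of each round is independent across points, which is the kind of loop a parallel version would typically distribute.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers with k-means++ (D^2-weighted) sampling.

    Hypothetical helper for illustration; `points` is a list of tuples.
    """
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # First center: chosen uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] holds the squared distance from points[i] to its nearest center.
    d2 = [squared_distance(p, centers[0]) for p in points]

    while len(centers) < k:
        total = sum(d2)
        if total == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            chosen = rng.randrange(len(points))
        else:
            # Sample index i with probability d2[i] / total.
            r = rng.random() * total
            # Default guards against floating-point round-off in the scan.
            chosen = max(range(len(d2)), key=lambda i: d2[i])
            cumulative = 0.0
            for i, weight in enumerate(d2):
                cumulative += weight
                if r < cumulative:
                    chosen = i
                    break
        centers.append(points[chosen])
        newest = centers[-1]
        # Independent per-point update: a natural candidate for parallelism.
        d2 = [min(old, squared_distance(p, newest)) for old, p in zip(d2, points)]
    return centers
```

    A call such as `kmeanspp_seed(points, 3, random.Random(0))` returns three of the input points; passing a seeded `random.Random` makes the choice reproducible.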

  16. A novel parallel architecture for local histogram equalization

    Science.gov (United States)

    Ohannessian, Mesrob I.; Choueiter, Ghinwa F.; Diab, Hassan

    2005-07-01

    Local histogram equalization is an image enhancement algorithm that has found wide application in the pre-processing stage of areas such as computer vision, pattern recognition and medical imaging. The computationally intensive nature of the procedure, however, is a main limitation when real time interactive applications are in question. This work explores the possibility of performing parallel local histogram equalization, using an array of special purpose elementary processors, through an HDL implementation that targets FPGA or ASIC platforms. A novel parallelization scheme is presented and the corresponding architecture is derived. The algorithm is reduced to pixel-level operations. Processing elements are assigned image blocks, to maintain a reasonable performance-cost ratio. To further simplify both processor and memory organizations, a bit-serial access scheme is used. A brief performance assessment is provided to illustrate and quantify the merit of the approach.
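
    As a reference point for the algorithm being parallelized, the sketch below shows a plain serial form of local histogram equalization in which each output pixel is obtained by counting comparisons within its neighborhood. It is an illustrative sketch only: the function name, the square window of half-width `radius`, and the clipping of the window at image borders are assumptions, and nothing here reflects the HDL design or bit-serial scheme of the paper.

```python
def local_histogram_equalize(image, radius=1, levels=256):
    """Equalize each pixel against the histogram of its local window.

    `image` is a list of rows of integer gray levels in [0, levels - 1].
    Each output value is the local cumulative distribution at the center
    pixel's gray level, scaled to the full range. Windows are clipped at
    the image borders (an assumption made for this sketch).
    """
    height = len(image)
    width = len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            center = image[y][x]
            count = 0        # pixels in the (clipped) window
            not_greater = 0  # window pixels with gray level <= center
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    count += 1
                    if image[ny][nx] <= center:
                        not_greater += 1
            # Local CDF value at `center`, mapped to [0, levels - 1].
            out[y][x] = (levels - 1) * not_greater // count
    return out
```

    Every output pixel depends only on its own window, so the two outer loops can be split across processing elements without coordination.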

  17. Analysis of Parallel Burn Without Crossfeed TSTO RLV Architectures and Comparison to Parallel Burn With Crossfeed and Series Burn Architectures

    Science.gov (United States)

    Smith, Garrett; Phillips, Alan

    2002-01-01

    There are currently three dominant TSTO class architectures. These are Series Burn (SB), Parallel Burn with crossfeed (PBw/cf), and Parallel Burn without crossfeed (PBncf). The goal of this study was to determine what factors uniquely affect PBncf architectures, how these factors interact, and to determine from a performance perspective whether a PBncf vehicle could be competitive with a PBw/cf or SB vehicle using equivalent technology and assumptions. In all cases, performance was evaluated on a relative basis for a fixed payload and mission by comparing gross and dry vehicle masses of a closed vehicle. Propellant combinations studied were LOX: LH2 propelled orbiter and booster (HH) and LOX: Kerosene booster with LOX: LH2 orbiter (KH). The study conclusions were: 1) a PBncf orbiter should be throttled as deeply as possible after launch until the staging point. 2) a detailed structural model is essential to accurate architecture analysis and evaluation. 3) a PBncf TSTO architecture is feasible for systems that stage at Mach 7. 3a) HH architectures can achieve a mass growth relative to PBw/cf of ratio and to the position of the orbiter required to align the nozzle heights at liftoff. 5) thrust to weight ratios of 1.3 at liftoff and between 1.0 and 0.9 when staging at Mach 7 appear to be close to ideal for PBncf vehicles. 6) performance for all vehicles studied is better when staged at Mach 7 instead of Mach 5. The study showed that a Series Burn architecture has the lowest gross mass for HH cases, and has the lowest dry mass for KH cases. The potential disadvantages of SB are the required use of an air-start for the orbiter engines and potential CG control issues. A Parallel Burn with crossfeed architecture solves both these problems, but the mechanics of a large bipropellant crossfeed system pose significant technical difficulties. Parallel Burn without crossfeed vehicles start both booster and orbiter engines on the ground and thus avoid both the risk of

  18. Centaure: an heterogeneous parallel architecture for computer vision

    International Nuclear Information System (INIS)

    Peythieux, Marc

    1997-01-01

    This dissertation deals with the architecture of parallel computers dedicated to computer vision. In the first chapter, the problem to be solved is presented, as well as the architecture of the Sympati and Symphonie computers, on which this work is based. The second chapter is about the state of the art of computers and integrated processors that can execute computer vision and image processing codes. The third chapter contains a description of the architecture of Centaure. It has a heterogeneous structure: it is composed of a multiprocessor system based on the Analog Devices ADSP21060 Sharc digital signal processor, and of a set of Symphonie computers working in a multi-SIMD fashion. Centaure also has a modular structure. Its basic node is composed of one Symphonie computer, tightly coupled to a Sharc thanks to a dual-ported memory. The nodes of Centaure are linked together by the Sharc communication links. The last chapter deals with a performance validation of Centaure. The execution times on Symphonie and on Centaure of a benchmark which is typical of industrial vision are presented and compared. In the first place, these results show that the basic node of Centaure allows a faster execution than Symphonie, and that increasing the size of the tested computer leads to a better speed-up with Centaure than with Symphonie. In the second place, these results validate the choice of running the low-level structure of Centaure in a multi-SIMD fashion. (author)

  19. QR-decomposition based SENSE reconstruction using parallel architecture.

    Science.gov (United States)

    Ullah, Irfan; Nisar, Habab; Raza, Haseeb; Qasim, Malik; Inam, Omair; Omer, Hammad

    2018-04-01

    Magnetic Resonance Imaging (MRI) is a powerful medical imaging technique that provides essential clinical information about the human body. One major limitation of MRI is its long scan time. Implementation of advanced MRI algorithms on a parallel architecture (to exploit inherent parallelism) has great potential to reduce the scan time. Sensitivity Encoding (SENSE) is a Parallel Magnetic Resonance Imaging (pMRI) algorithm that utilizes receiver coil sensitivities to reconstruct MR images from the acquired under-sampled k-space data. At the heart of SENSE lies inversion of a rectangular encoding matrix. This work presents a novel implementation of a GPU-based SENSE algorithm, which employs QR decomposition for the inversion of the rectangular encoding matrix. For a fair comparison, the performance of the proposed GPU-based SENSE reconstruction is evaluated against single-core and multicore CPU implementations using OpenMP. Several experiments against various acceleration factors (AFs) are performed using multichannel (8, 12 and 30) phantom and in-vivo human head and cardiac datasets. Experimental results show that the GPU significantly reduces the computation time of SENSE reconstruction as compared to multi-core CPU (approximately 12x speedup) and single-core CPU (approximately 53x speedup) without any degradation in the quality of the reconstructed images. Copyright © 2018 Elsevier Ltd. All rights reserved.

  20. A parallel-pipelined architecture for a multi carrier demodulator

    Science.gov (United States)

    Kwatra, S. C.; Jamali, M. M.; Eugene, Linus P.

    1991-03-01

    Analog devices have been used for processing the information on board satellites. Presently, digital devices are being used because they are economical and flexible as compared to their analog counterparts. Several schemes of digital transmission can be used depending on the data rate requirement of the user. An economical scheme of transmission for small earth stations uses single channel per carrier/frequency division multiple access (SCPC/FDMA) on the uplink and time division multiplexing (TDM) on the downlink. This is a typical communication service offered to low data rate users in the commercial mass market. These channels usually pertain to either voice or data transmission. An efficient digital demodulator architecture is provided for a large number of low data rate users. A demodulator primarily consists of carrier, clock, and data recovery modules. This design uses principles of parallel processing, pipelining, and time sharing schemes to process large numbers of voice or data channels. It maintains the optimum throughput which is derived from the designed architecture and from the use of high speed components. The design is optimized for reduced power and area requirements. This is essential for satellite applications. The design is also flexible in processing a group of a varying number of channels. The algorithms that are used are verified by the use of a computer aided software engineering (CASE) tool called the Block Oriented System Simulator. The data flow, control circuitry, and interface of the hardware design are simulated in C language. Also, a multiprocessor approach is provided to map, model, and simulate the demodulation algorithms mainly from a speed viewpoint. A hypercube based architecture implementation is provided for such a scheme of operation. The hypercube structure and the demodulation models on hypercubes are simulated in Ada.

  1. A multimodal parallel architecture: A cognitive framework for multimodal interactions.

    Science.gov (United States)

    Cohn, Neil

    2016-01-01

    Human communication is naturally multimodal, and substantial focus has examined the semantic correspondences in speech-gesture and text-image relationships. However, visual narratives, like those in comics, provide an interesting challenge to multimodal communication because the words and/or images can guide the overall meaning, and both modalities can appear in complicated "grammatical" sequences: sentences use a syntactic structure and sequential images use a narrative structure. These dual structures create complexity beyond that typically addressed by theories of multimodality where only a single form uses combinatorial structure, and also pose challenges for models of the linguistic system that focus on single modalities. This paper outlines a broad theoretical framework for multimodal interactions by expanding on Jackendoff's (2002) parallel architecture for language. Multimodal interactions are characterized in terms of their component cognitive structures: whether a particular modality (verbal, bodily, visual) is present, whether it uses a grammatical structure (syntax, narrative), and whether it "dominates" the semantics of the overall expression. Altogether, this approach integrates multimodal interactions into an existing framework of language and cognition, and characterizes interactions between varying complexity in the verbal, bodily, and graphic domains. The resulting theoretical model presents an expanded consideration of the boundaries of the "linguistic" system and its involvement in multimodal interactions, with a framework that can benefit research on corpus analyses, experimentation, and the educational benefits of multimodality. Copyright © 2015.

  2. Design of a Load-Balancing Architecture For Parallel Firewalls

    National Research Council Canada - National Science Library

    Joyner, William

    1999-01-01

    .... This thesis proposes a load-balancing firewall architecture to meet the Navy's needs. It first conducts an architectural analysis of the problem and then presents a high-level system design as a solution...

  3. Parallel Application Development Using Architecture View Driven Model Transformations

    NARCIS (Netherlands)

    Arkin, E.; Tekinerdogan, B.

    2015-01-01

    To realize the increased need for computing performance, the current trend is towards applying parallel computing in which the tasks are run in parallel on multiple nodes. In turn, we can observe the rapid increase of the scale of parallel computing platforms. This situation has led to a complexity

  4. Méthodes numériques pour les plasmas sur architectures multicoeurs

    OpenAIRE

    Massaro, Michel

    2016-01-01

    This thesis deals with the resolution of the Magneto-Hydro-Dynamic (MHD) system on massively parallel architectures. This problem is a hyperbolic system of conservation laws. For cost reasons in terms of time and space, we use the finite volume method. These criteria are particularly important in the case of MHD because the solutions obtained may have many shock waves and be very turbulent. The approach of a physical phenomenon requires working on a fine mesh which involves a large quantity ...

  5. Performance evaluation for compressible flow calculations on five parallel computers of different architectures

    International Nuclear Information System (INIS)

    Kimura, Toshiya.

    1997-03-01

    A two-dimensional explicit Euler solver has been implemented for five MIMD parallel computers of different machine architectures at the Center for Promotion of Computational Science and Engineering of the Japan Atomic Energy Research Institute. These parallel computers are the Fujitsu VPP300, NEC SX-4, CRAY T94, IBM SP2, and Hitachi SR2201. The code was parallelized by several parallelization methods, and a typical compressible flow problem has been calculated for different grid sizes while changing the number of processors. Their effective performances for parallel calculations, such as calculation speed, speed-up ratio and parallel efficiency, have been investigated and evaluated. The communication time among processors has also been measured and evaluated. As a result, the differences in performance and characteristics between vector-parallel and scalar-parallel computers can be pointed out, and the results provide basic data for efficient use of parallel computers and for large-scale CFD simulations on parallel computers. (author)
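
    The two metrics named in the abstract have standard definitions, shown in the short sketch below: speed-up is serial time divided by parallel time, and parallel efficiency is speed-up divided by the processor count. The function names and the timings in the usage note are hypothetical and are not taken from the report.

```python
def speedup(t_serial, t_parallel):
    """Speed-up ratio: serial run time divided by parallel run time."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def parallel_efficiency(t_serial, t_parallel, processors):
    """Parallel efficiency: speed-up divided by the number of processors."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return speedup(t_serial, t_parallel) / processors
```

    With made-up timings of 100 s on one processor and 25 s on eight, `speedup(100, 25)` gives 4.0 and `parallel_efficiency(100, 25, 8)` gives 0.5, meaning half of the ideal eight-fold gain is realized.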

  6. Parallel-Architecture Simulator Development Using Hardware Transactional Memory

    OpenAIRE

    Armejach Sanosa, Adrià

    2009-01-01

    To address the need for a simpler parallel programming model, Transactional Memory (TM) has been developed and promises good parallel performance with easy-to-write parallel code. Unlike lock-based approaches, with TM, programmers do not need to explicitly specify and manage the synchronization among threads. Instead, programmers simply mark code segments as transactions, and the TM system manages the concurrency control for them. TM can be implemented either in software (STM) or hardware (HT...

  7. Enhancing parallelism of tile bidiagonal transformation on multicore architectures using tree reduction

    KAUST Repository

    Ltaief, Hatem; Luszczek, Piotr R.; Dongarra, Jack

    2012-01-01

    The objective of this paper is to enhance the parallelism of the tile bidiagonal transformation using tree reduction on multicore architectures. First introduced by Ltaief et al. [LAPACK Working Note #247, 2011], the bidiagonal transformation using

  8. Migrating to a real-time distributed parallel simulator architecture

    CSIR Research Space (South Africa)

    Duvenhage, B

    2007-07-01

    A legacy non-distributed logical time simulator is migrated to a distributed architecture to parallelise execution. The existing Discrete Time System Specification (DTSS) modelling formalism is retained to simplify the reuse of existing models...

  9. Parallel, Asynchronous Executive (PAX): System concepts, facilities, and architecture

    Science.gov (United States)

    Jones, W. H.

    1983-01-01

    The Parallel, Asynchronous Executive (PAX) is a software operating system simulation that allows many computers to work on a single problem at the same time. PAX is currently implemented on a UNIVAC 1100/42 computer system. Independent UNIVAC runstreams are used to simulate independent computers. Data are shared among independent UNIVAC runstreams through shared mass-storage files. PAX has achieved the following: (1) applied several computing processes simultaneously to a single, logically unified problem; (2) resolved most parallel processor conflicts by careful work assignment; (3) resolved by means of worker requests to PAX all conflicts not resolved by work assignment; (4) provided fault isolation and recovery mechanisms to meet the problems of an actual parallel, asynchronous processing machine. Additionally, one real-life problem has been constructed for the PAX environment. This is CASPER, a collection of aerodynamic and structural dynamic problem simulation routines. CASPER is not discussed in this report except to provide examples of parallel-processing techniques.

  10. Parallel implementation of DNA sequences matching algorithms using PWM on GPU architecture.

    Science.gov (United States)

    Sharma, Rahul; Gupta, Nitin; Narang, Vipin; Mittal, Ankush

    2011-01-01

    Positional Weight Matrices (PWMs) are widely used in the representation and detection of Transcription Factor Binding Sites (TFBSs) on DNA. We implement an online PWM search algorithm on a parallel architecture. Large PWM data sets can be processed in parallel on Graphics Processing Unit (GPU) systems, which helps match sequences at a faster rate. Our method makes extensive use of the highly multithreaded architecture and shared memory of the multi-core GPU. An efficient use of shared memory is required to optimise parallel reduction in CUDA. Our optimised method achieves a speedup of 230-280x over a linear implementation on a GeForce GTX 280 GPU.
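
    To make the matching step concrete, the sketch below shows a serial sliding-window PWM scan: each window of the sequence is scored by summing one matrix entry per position, and windows at or above a threshold are reported. This is a generic illustration, not the paper's CUDA implementation; the function name `pwm_scan`, the list-of-dictionaries matrix layout, and the scores in the usage note are assumptions.

```python
def pwm_scan(sequence, pwm, threshold):
    """Return (start, score) for every window scoring at least `threshold`.

    `pwm` is a list with one dict per motif position, mapping each base
    ('A', 'C', 'G', 'T') to its score at that position. Hypothetical
    layout chosen for readability.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        # Sum of per-position scores; on a GPU this sum is where a
        # parallel reduction would typically be applied.
        score = sum(pwm[offset][sequence[start + offset]]
                    for offset in range(width))
        if score >= threshold:
            hits.append((start, score))
    return hits
```

    For a made-up two-position matrix that rewards "AC", scanning "GACAC" with a threshold of 3 reports the two windows starting at positions 1 and 3. Each window is scored independently, so windows map naturally onto separate threads.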

  11. Analysis of multigrid methods on massively parallel computers: Architectural implications

    Science.gov (United States)

    Matheson, Lesley R.; Tarjan, Robert E.

    1993-01-01

    We study the potential performance of multigrid algorithms running on massively parallel computers with the intent of discovering whether presently envisioned machines will provide an efficient platform for such algorithms. We consider the domain parallel version of the standard V cycle algorithm on model problems, discretized using finite difference techniques in two and three dimensions on block structured grids of size 10^6 and 10^9, respectively. Our models of parallel computation were developed to reflect the computing characteristics of the current generation of massively parallel multicomputers. These models are based on an interconnection network of 256 to 16,384 message passing, 'workstation size' processors executing in an SPMD mode. The first model accomplishes interprocessor communications through a multistage permutation network. The communication cost is a logarithmic function which is similar to the costs in a variety of different topologies. The second model allows single stage communication costs only. Both models were designed with information provided by machine developers and utilize implementation derived parameters. With the medium grain parallelism of the current generation and the high fixed cost of an interprocessor communication, our analysis suggests an efficient implementation requires the machine to support the efficient transmission of long messages (up to 1000 words), or the high initiation cost of a communication must be significantly reduced through an alternative optimization technique. Furthermore, with variable length message capability, our analysis suggests the low diameter multistage networks provide little or no advantage over a simple single stage communications network.

  12. Parallel Programming using OpenCL on Modern Architectures

    DEFF Research Database (Denmark)

    Nielsen, Allan Svejstrup; Engsig-Karup, Allan Peter; Dammann, Bernd

    as they are at graphics. To conclude the presentation of OpenCL as a language for compute, a matrix-matrix multiplication example is devised and optimized for the VLIW4, Tesla and Fermi architectures. The performance is measured as a function of both matrix and work-group size and results are discussed. Where applicable...

  13. Network Coding Parallelization Based on Matrix Operations for Multicore Architectures

    DEFF Research Database (Denmark)

    Wunderlich, Simon; Cabrera, Juan; Fitzek, Frank

    2015-01-01

    such as the Raspberry Pi2 with four cores in the order of up to one full magnitude. The speed increase gain is even higher than the number of cores of the Raspberry Pi2 since the newly introduced approach exploits the cache architecture way better than by-the-book matrix operations. Copyright © 2015 by the Institute...

  14. Particle In Cell Codes on Highly Parallel Architectures

    Science.gov (United States)

    Tableman, Adam

    2014-10-01

    We describe strategies and examples of Particle-In-Cell Codes running on Nvidia GPU and Intel Phi architectures. This includes basic implementations in skeleton codes and full-scale development versions (encompassing 1D, 2D, and 3D codes) in Osiris. Both the similarities and differences between Intel's and Nvidia's hardware will be examined. Work supported by grants NSF ACI 1339893, DOE DE SC 000849, DOE DE SC 0008316, DOE DE NA 0001833, and DOE DE FC02 04ER 54780.

  15. PEM-PCA: A Parallel Expectation-Maximization PCA Face Recognition Architecture

    Directory of Open Access Journals (Sweden)

    Kanokmon Rujirakul

    2014-01-01

    Principal component analysis or PCA has been traditionally used as one of the feature extraction techniques in face recognition systems yielding high accuracy when requiring a small number of features. However, the covariance matrix and eigenvalue decomposition stages cause high computational complexity, especially for a large database. Thus, this research presents an alternative approach utilizing an Expectation-Maximization algorithm to reduce the determinant matrix manipulation resulting in the reduction of the stages’ complexity. To improve the computational time, a novel parallel architecture was employed to utilize the benefits of parallelization of matrix computation during feature extraction and classification stages including parallel preprocessing, and their combinations, a so-called Parallel Expectation-Maximization PCA architecture. Compared with a traditional PCA and its derivatives, the results indicate lower complexity with an insignificant difference in recognition precision leading to high speed face recognition systems, that is, speed-ups of over nine and three times relative to PCA and Parallel PCA, respectively.

  16. PEM-PCA: a parallel expectation-maximization PCA face recognition architecture.

    Science.gov (United States)

    Rujirakul, Kanokmon; So-In, Chakchai; Arnonkijpanich, Banchar

    2014-01-01

    Principal component analysis or PCA has been traditionally used as one of the feature extraction techniques in face recognition systems yielding high accuracy when requiring a small number of features. However, the covariance matrix and eigenvalue decomposition stages cause high computational complexity, especially for a large database. Thus, this research presents an alternative approach utilizing an Expectation-Maximization algorithm to reduce the determinant matrix manipulation resulting in the reduction of the stages' complexity. To improve the computational time, a novel parallel architecture was employed to utilize the benefits of parallelization of matrix computation during feature extraction and classification stages including parallel preprocessing, and their combinations, a so-called Parallel Expectation-Maximization PCA architecture. Compared with a traditional PCA and its derivatives, the results indicate lower complexity with an insignificant difference in recognition precision leading to high speed face recognition systems, that is, speed-ups of over nine and three times relative to PCA and Parallel PCA, respectively.

  17. A portable implementation of ARPACK for distributed memory parallel architectures

    Energy Technology Data Exchange (ETDEWEB)

    Maschhoff, K.J.; Sorensen, D.C.

    1996-12-31

    ARPACK is a package of Fortran 77 subroutines which implement the Implicitly Restarted Arnoldi Method used for solving large sparse eigenvalue problems. A parallel implementation of ARPACK is presented which is portable across a wide range of distributed memory platforms and requires minimal changes to the serial code. The communication layers used for message passing are the Basic Linear Algebra Communication Subprograms (BLACS) developed for the ScaLAPACK project and Message Passing Interface (MPI).

  18. Algorithms for parallel flow solvers on message passing architectures

    Science.gov (United States)

    Vanderwijngaart, Rob F.

    1995-01-01

    The purpose of this project has been to identify and test suitable technologies for implementation of fluid flow solvers -- possibly coupled with structures and heat equation solvers -- on MIMD parallel computers. In the course of this investigation much attention has been paid to efficient domain decomposition strategies for ADI-type algorithms. Multi-partitioning derives its efficiency from the assignment of several blocks of grid points to each processor in the parallel computer. A coarse-grain parallelism is obtained, and a near-perfect load balance results. In uni-partitioning every processor receives responsibility for exactly one block of grid points instead of several. This necessitates fine-grain pipelined program execution in order to obtain a reasonable load balance. Although fine-grain parallelism is less desirable on many systems, especially high-latency networks of workstations, uni-partition methods are still in wide use in production codes for flow problems. Consequently, it remains important to achieve good efficiency with this technique that has essentially been superseded by multi-partitioning for parallel ADI-type algorithms. Another reason for the concentration on improving the performance of pipeline methods is their applicability in other types of flow solver kernels with stronger implied data dependence. Analytical expressions can be derived for the size of the dynamic load imbalance incurred in traditional pipelines. From these, the optimal first-processor retardation that leads to the shortest total completion time for the pipeline process can be determined. Theoretical predictions of pipeline performance with and without optimization match experimental observations on the iPSC/860 very well. Analysis of pipeline performance also highlights the effect of careless grid partitioning in flow solvers that employ pipeline algorithms. If grid blocks at boundaries are not at least as large in the wall-normal direction as those

  19. Fully Pipelined Parallel Architecture for Candidate Block and Pixel-Subsampling-Based Motion Estimation

    Directory of Open Access Journals (Sweden)

    Reeba Korah

    2008-01-01

    This paper presents a low power and high speed architecture for motion estimation with the Candidate Block and Pixel Subsampling (CBPS) algorithm. A coarse-to-fine search approach is employed to find the motion vector so that the local minima problem is totally eliminated. Pixel subsampling is performed in the selected candidate blocks, which significantly reduces computational cost with low quality degradation. The architecture developed is a fully pipelined parallel design with 9 processing elements. Two different methods are deployed to reduce the power consumption: parallel and pipelined implementation, and parallel access to memory. For processing 30 CIF frames per second our architecture requires a clock frequency of 4.5 MHz.

  20. Applications of parallel computer architectures to the real-time simulation of nuclear power systems

    International Nuclear Information System (INIS)

    Doster, J.M.; Sills, E.D.

    1988-01-01

    In this paper the authors report on efforts to utilize parallel computer architectures for the thermal-hydraulic simulation of nuclear power systems and on current research efforts toward the development of advanced reactor operator aids and control systems based on this new technology. Many aspects of reactor thermal-hydraulic calculations are inherently parallel, and the computationally intensive portions of these calculations can be effectively implemented on modern computers. Timing studies indicate that faster-than-real-time, high-fidelity physics models can be developed when the computational algorithms are designed to take advantage of the computer's architecture. These capabilities allow for the development of novel control systems and advanced reactor operator aids. Coupled with an integral real-time data acquisition system, evolving parallel computer architectures can provide operators and control room designers improved control and protection capabilities. Research efforts are currently under way in this area.

  1. Time-dependent deterministic transport on parallel architectures using PARTISN

    International Nuclear Information System (INIS)

    Alcouffe, R.E.; Baker, R.S.

    1998-01-01

    In addition to the ability to solve the static transport equation, the authors have also incorporated time dependence into the parallel SN code PARTISN. Using a semi-implicit scheme, PARTISN is capable of performing time-dependent calculations for both fissioning and pure source driven problems. They have applied this to various types of problems such as shielding and prompt fission experiments. This paper describes the form of the time-dependent equations implemented, their solution strategies in PARTISN including iteration acceleration, and the strategies used for time-step control. Results are presented for an iron-water shielding calculation and a criticality excursion in a uranium solution configuration.

  2. Traditional Tracking with Kalman Filter on Parallel Architectures

    Science.gov (United States)

    Cerati, Giuseppe; Elmer, Peter; Lantz, Steven; MacNeill, Ian; McDermott, Kevin; Riley, Dan; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

    2015-05-01

    Power density constraints are limiting the performance improvements of modern CPUs. To address this, we have seen the introduction of lower-power, multi-core processors, but the future will be even more exciting. In order to stay within the power density limits but still obtain Moore's Law performance/price gains, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specialized functions like large vector units. Example technologies today include Intel's Xeon Phi and GPGPUs. Track finding and fitting is one of the most computationally challenging problems for event reconstruction in particle physics. At the High Luminosity LHC, for example, this will be by far the dominant problem. The most common track finding techniques in use today are however those based on the Kalman Filter. Significant experience has been accumulated with these techniques on real tracking detector systems, both in the trigger and offline. We report the results of our investigations into the potential and limitations of these algorithms on the new parallel hardware.

  3. An efficient implementation of parallel molecular dynamics method on SMP cluster architecture

    International Nuclear Information System (INIS)

    Suzuki, Masaaki; Okuda, Hiroshi; Yagawa, Genki

    2003-01-01

    The authors have applied the MPI/OpenMP hybrid parallel programming model to parallelize a molecular dynamics (MD) method on a symmetric multiprocessor (SMP) cluster architecture. On that architecture, the hybrid parallel programming model, which uses a message passing library such as MPI for inter-SMP node communication and loop directives such as OpenMP for intra-SMP node parallelization, can be expected to be the most effective one. In this study, the parallel performance of the hybrid style has been compared with that of the conventional flat parallel programming style, which uses only MPI, both when the fast multipole method (FMM) is employed for computing long-distance interactions and when it is not. The computer environment used here is the Hitachi SR8000/MPP at the University of Tokyo. The results of calculation are as follows. Without FMM, the parallel efficiency using 16 SMP nodes (128 PEs) is 90% with the hybrid style and 75% with the flat-MPI style for an MD simulation with 33,402 atoms. With FMM, the parallel efficiency using 16 SMP nodes (128 PEs) is 60% with the hybrid style and 48% with the flat-MPI style for an MD simulation with 117,649 atoms. (author)
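
    To put the reported efficiencies in perspective, the short sketch below converts them into implied speedups. It assumes the usual definition of parallel efficiency, E = S / p (speedup divided by the number of processing elements); the record itself does not state which definition was used, and the helper name `implied_speedup` is hypothetical.

```python
def implied_speedup(efficiency, processing_elements):
    """Speedup implied by a parallel efficiency, assuming E = S / p."""
    return efficiency * processing_elements

PES = 128  # 16 SMP nodes, as reported in the abstract

# (case, programming style) -> parallel efficiency reported in the abstract
reported = {
    ("without FMM", "hybrid"): 0.90,
    ("without FMM", "flat MPI"): 0.75,
    ("with FMM", "hybrid"): 0.60,
    ("with FMM", "flat MPI"): 0.48,
}

for (case, style), eff in reported.items():
    print(f"{case:12s} {style:9s} implied speedup ~ {implied_speedup(eff, PES):.1f}")
```

    Under that assumption, the gap between the hybrid and flat-MPI styles corresponds to roughly 15 to 19 PEs' worth of useful work on the 128-PE configuration.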

  4. A parallel 3-D discrete wavelet transform architecture using pipelined lifting scheme approach for video coding

    Science.gov (United States)

    Hegde, Ganapathi; Vaya, Pukhraj

    2013-10-01

    This article presents a parallel architecture for 3-D discrete wavelet transform (3-DDWT). The proposed design is based on the 1-D pipelined lifting scheme. The architecture is fully scalable beyond the present coherent Daubechies filter bank (9, 7). This 3-DDWT architecture has advantages such as no group of pictures restriction and reduced memory referencing. It offers low power consumption, low latency and high throughput. The computing technique is based on the concept that lifting scheme minimises the storage requirement. The application specific integrated circuit implementation of the proposed architecture is done by synthesising it using 65 nm Taiwan Semiconductor Manufacturing Company standard cell library. It offers a speed of 486 MHz with a power consumption of 2.56 mW. This architecture is suitable for real-time video compression even with large frame dimensions.
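
    The storage argument for lifting can be seen in a very small example. The sketch below uses the integer Haar (S-transform) lifting step as a stand-in, not the Daubechies (9, 7) filter bank of the record above, and it is a software illustration rather than the pipelined hardware design; the function names are hypothetical.

```python
def lift_forward(signal):
    """One level of integer Haar lifting; len(signal) must be even.

    Returns (approximation, detail) coefficient lists.
    """
    even, odd = signal[0::2], signal[1::2]
    # Predict step: each odd sample is predicted by its even neighbour.
    detail = [o - e for e, o in zip(even, odd)]
    # Update step: adjust the even samples so they track the local average.
    approx = [e + (d >> 1) for e, d in zip(even, detail)]
    return approx, detail

def lift_inverse(approx, detail):
    """Undo lift_forward exactly by running the steps in reverse order."""
    even = [s - (d >> 1) for s, d in zip(approx, detail)]
    odd = [d + e for e, d in zip(even, detail)]
    signal = [0] * (2 * len(even))
    signal[0::2], signal[1::2] = even, odd
    return signal
```

    Each step overwrites one half of the samples using only the other half, so the transform can be computed in place, and applying the same 1-D step along rows, columns, and frames is one common way to build a separable 3-D transform.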

  5. Transformation-based exploration of data parallel architecture for customizable hardware : a JPEG encoder case study

    NARCIS (Netherlands)

    Corvino, R.; Diken, E.; Gamatié, A.; Jozwiak, L.

    2012-01-01

    In this paper, we present a method for the design of MPSoCs for complex data-intensive applications. This method aims at a combined exploration of the communication, the memory system architecture and the computation resource parallelism. The proposed method is exemplified on a JPEG encoder case study.

  6. A Hybrid FPGA/Coarse Parallel Processing Architecture for Multi-modal Visual Feature Descriptors

    DEFF Research Database (Denmark)

    Jensen, Lars Baunegaard With; Kjær-Nielsen, Anders; Alonso, Javier Díaz

    2008-01-01

    This paper describes the hybrid architecture developed for speeding up the processing of so-called multi-modal visual primitives which are sparse image descriptors extracted along contours. In the system, the first stages of visual processing are implemented on FPGAs due to their highly parallel...

  7. A learnable parallel processing architecture towards unity of memory and computing.

    Science.gov (United States)

    Li, H; Gao, B; Chen, Z; Zhao, Y; Huang, P; Ye, H; Liu, L; Liu, X; Kang, J

    2015-08-14

    Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need for efficient information processing in data-driven applications such as big data and the Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named "iMemComp", where memory and logic are unified with single-type devices. Leveraging the nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped "iMemComp" with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such an architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on "iMemComp" can improve the speed by 76.8% and the power dissipation by 60.3%, together with an aggressive 700-fold reduction in circuit area.

  8. A learnable parallel processing architecture towards unity of memory and computing

    Science.gov (United States)

    Li, H.; Gao, B.; Chen, Z.; Zhao, Y.; Huang, P.; Ye, H.; Liu, L.; Liu, X.; Kang, J.

    2015-08-01

    Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need for efficient information processing in data-driven applications such as big data and the Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named “iMemComp”, where memory and logic are unified with single-type devices. Leveraging the nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped “iMemComp” with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such an architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on “iMemComp” can improve the speed by 76.8% and the power dissipation by 60.3%, together with an aggressive 700-fold reduction in circuit area.

  9. Demonstration of an optoelectronic interconnect architecture for a parallel modified signed-digit adder and subtracter

    Science.gov (United States)

    Sun, Degui; Wang, Na-Xin; He, Li-Ming; Weng, Zhao-Heng; Wang, Daheng; Chen, Ray T.

    1996-06-01

    A space-position-logic-encoding scheme is proposed and demonstrated. This encoding scheme not only makes the best use of the convenience of binary logic operation, but is also suitable for the trinary property of modified signed-digit (MSD) numbers. Based on the space-position-logic-encoding scheme, a fully parallel modified signed-digit adder and subtractor is built using optoelectronic switch technologies in conjunction with fiber-multistage 3D optoelectronic interconnects. Thus an effective combination of a parallel algorithm and a parallel architecture is implemented. In addition, the performance of the optoelectronic switches used in this system is experimentally studied and verified. Both the 3-bit experimental model and the experimental results of a parallel addition and a parallel subtraction are provided and discussed. Finally, the speed ratio between the MSD adder and binary adders is discussed and the advantage of the MSD in operating speed is demonstrated.

  10. A parallel VLSI architecture for a digital filter of arbitrary length using Fermat number transforms

    Science.gov (United States)

    Truong, T. K.; Reed, I. S.; Yeh, C. S.; Shao, H. M.

    1982-01-01

    A parallel architecture for computation of the linear convolution of two sequences of arbitrary lengths using the Fermat number transform (FNT) is described. In particular, a pipeline structure is designed to compute a 128-point FNT. In this FNT, only additions and bit rotations are required. A standard barrel shifter circuit is modified so that it performs the required bit rotation operation. The overlap-save method is generalized for the FNT to compute a linear convolution of arbitrary length. A parallel architecture is developed to realize this type of overlap-save method using one FNT and several inverse FNTs of 128 points. The generalized overlap-save method alleviates the usual dynamic range limitation in FNTs of long transform lengths. Its architecture is regular, simple, and expandable, and therefore naturally suitable for VLSI implementation.
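
    A small software sketch may help show why an FNT needs no general multiplications in the transform itself. The example below is an assumption-laden illustration, not the 128-point pipeline of the record above: it uses the Fermat number F_4 = 65537 with a 32-point transform and root 2, so every twiddle factor is a power of two (a shift or rotation in hardware, computed here with `pow` for brevity). All names are hypothetical.

```python
F = 2**16 + 1   # Fermat number F_4 = 65537
N = 32          # transform length: 2 has multiplicative order 32 modulo F_4
ROOT = 2        # every twiddle factor ROOT**m is a power of two

def fnt(x, root=ROOT):
    """Naive O(N^2) number-theoretic transform modulo F."""
    return [sum(x[n] * pow(root, (n * k) % N, F) for n in range(N)) % F
            for k in range(N)]

def inverse_fnt(X):
    """Inverse transform: use the inverse root and scale by 1/N modulo F."""
    inv_root = pow(ROOT, N - 1, F)   # ROOT**(N-1) is ROOT**(-1) because ROOT**N == 1
    inv_n = pow(N, F - 2, F)         # modular inverse of N (F_4 is prime)
    return [(inv_n * v) % F for v in fnt(X, inv_root)]

def linear_convolution(a, b):
    """Exact linear convolution via the FNT.

    Valid when len(a) + len(b) - 1 <= N and every output value lies in [0, F).
    """
    size = len(a) + len(b) - 1
    assert size <= N, "zero-padded length must fit in one transform"
    pa = list(a) + [0] * (N - len(a))
    pb = list(b) + [0] * (N - len(b))
    product = [(u * v) % F for u, v in zip(fnt(pa), fnt(pb))]
    return inverse_fnt(product)[:size]
```

    The two validity conditions in `linear_convolution` are the software counterpart of the length and dynamic range limits that the abstract's generalized overlap-save method is meant to work around for longer sequences.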

  11. Implementation of the DPM Monte Carlo code on a parallel architecture for treatment planning applications.

    Science.gov (United States)

    Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J

    2004-09-01

    We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan-Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudo random number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster consisting of 800 MHz Intel Pentium III processors shows an almost linear speedup up to 32 processors for simulating 1 x 10^8 or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors based on availability) which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors on increasing the problem size up to 8 x 10^8 histories. For a smaller number of histories (1 x 10^8) the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1 x 10^8 histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron Cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of the 64-bit architecture Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster as compared to the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy.

  12. Implementation of the DPM Monte Carlo code on a parallel architecture for treatment planning applications

    International Nuclear Information System (INIS)

    Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J.

    2004-01-01

    We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan-Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudo random number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster consisting of 800 MHz Intel Pentium III processors shows an almost linear speedup up to 32 processors for simulating 1 x 10^8 or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors based on availability) which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors on increasing the problem size up to 8 x 10^8 histories. For a smaller number of histories (1 x 10^8) the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1 x 10^8 histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron Cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of the 64-bit architecture Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster as compared to the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy.

  13. Application of Raptor-M3G to reactor dosimetry problems on massively parallel architectures - 026

    International Nuclear Information System (INIS)

    Longoni, G.

    2010-01-01

    The solution of complex 3-D radiation transport problems requires significant resources both in terms of computation time and memory availability. Therefore, parallel algorithms and multi-processor architectures are required to efficiently solve large 3-D radiation transport problems. This paper presents the application of RAPTOR-M3G (Rapid Parallel Transport Of Radiation - Multiple 3D Geometries) to reactor dosimetry problems. RAPTOR-M3G is a newly developed parallel computer code designed to solve the discrete ordinates (SN) equations on multi-processor computer architectures. This paper presents the results for a reactor dosimetry problem using a 3-D model of a commercial 2-loop pressurized water reactor (PWR). The accuracy and performance of RAPTOR-M3G will be analyzed and the numerical results obtained from the calculation will be compared directly to measurements of the neutron field in the reactor cavity air gap. The parallel performance of RAPTOR-M3G on massively parallel architectures, where the number of computing nodes is in the order of hundreds, will be analyzed up to four hundred processors. The performance results will be presented based on two supercomputing architectures: the POPLE supercomputer operated by the Pittsburgh Supercomputing Center and the Westinghouse computer cluster. The Westinghouse computer cluster is equipped with a standard Ethernet network connection and an InfiniBand interconnect capable of a bandwidth in excess of 20 Gbit/s. Therefore, the impact of the network architecture on RAPTOR-M3G performance will be analyzed as well. (authors)

  14. The effect of earthquake on architecture geometry with non-parallel system irregularity configuration

    Science.gov (United States)

    Teddy, Livian; Hardiman, Gagoek; Nuroji; Tudjono, Sri

    2017-12-01

    Indonesia is an earthquake-prone area, and earthquakes may cause casualties and damage to buildings. Most deaths and injuries are caused not by the earthquake itself but by building collapse. The collapse of a building results from the building's behaviour under the earthquake, which depends on many factors, such as architectural design, geometry configuration of structural elements in horizontal and vertical plans, earthquake zone, geographical location (distance to earthquake center), soil type, material quality, and construction quality. One of the geometry configurations that may lead to the collapse of a building is the irregular configuration of a non-parallel system. In accordance with FEMA-451B, an irregular configuration in a non-parallel system is defined to exist if the vertical lateral force-resisting elements are neither parallel nor symmetric with the main orthogonal axes of the earthquake-resisting axis system. Such a configuration may lead to torque, diagonal translation and local damage to buildings. This does not mean that a non-parallel irregular configuration should never be used in architectural design; however, the designer must know the consequences of earthquake behaviour for buildings with an irregular configuration of a non-parallel system. The objective of the present research is to identify earthquake behaviour in architectural geometry with an irregular configuration of a non-parallel system. The research was quantitative, using a simulation experimental method. It consisted of 5 models, where architectural data and model structure data were input and analyzed using the software SAP2000 in order to determine performance, and ETAB2015 to determine the eccentricity that occurred. The output of the software analysis was tabulated, graphed, compared and analyzed with relevant theories. For areas in strong earthquake zones, avoid designing buildings which wholly form an irregular configuration of a non-parallel system. If it is inevitable to design a

  15. An information-theoretic approach to motor action decoding with a reconfigurable parallel architecture.

    Science.gov (United States)

    Craciun, Stefan; Brockmeier, Austin J; George, Alan D; Lam, Herman; Príncipe, José C

    2011-01-01

    Methods for decoding movements from neural spike counts using adaptive filters often rely on minimizing the mean-squared error. However, for non-Gaussian distribution of errors, this approach is not optimal for performance. Therefore, rather than using probabilistic modeling, we propose an alternate non-parametric approach. In order to extract more structure from the input signal (neuronal spike counts) we propose using minimum error entropy (MEE), an information-theoretic approach that minimizes the error entropy as part of an iterative cost function. However, the disadvantage of using MEE as the cost function for adaptive filters is the increase in computational complexity. In this paper we present a comparison between the decoding performance of the analytic Wiener filter and a linear filter trained with MEE, which is then mapped to a parallel architecture in reconfigurable hardware tailored to the computational needs of the MEE filter. We observe considerable speedup from the hardware design. The adaptation of filter weights for the multiple-input, multiple-output linear filters, necessary in motor decoding, is a highly parallelizable algorithm. It can be decomposed into many independent computational blocks with a parallel architecture readily mapped to a field-programmable gate array (FPGA) and scales to large numbers of neurons. By pipelining and parallelizing independent computations in the algorithm, the proposed parallel architecture has sublinear increases in execution time with respect to both window size and filter order.

  16. Analysis of clinical complication data for radiation hepatitis using a parallel architecture model

    International Nuclear Information System (INIS)

    Jackson, A.; Haken, R.K. ten; Robertson, J.M.; Kessler, M.L.; Kutcher, G.J.; Lawrence, T.S.

    1995-01-01

    Purpose: The detailed knowledge of dose volume distributions available from the three-dimensional (3D) conformal radiation treatment of tumors in the liver (reported elsewhere) offers new opportunities to quantify the effect of volume on the probability of producing radiation hepatitis. We aim to test a new parallel architecture model of normal tissue complication probability (NTCP) with these data. Methods and Materials: Complication data and dose volume histograms from a total of 93 patients with normal liver function, treated on a prospective protocol with 3D conformal radiation therapy and intraarterial hepatic fluorodeoxyuridine, were analyzed with a new parallel architecture model. Patient treatment fell into six categories differing in doses delivered and volumes irradiated. By modeling the radiosensitivity of liver subunits, we are able to use dose volume histograms to calculate the fraction of the liver damaged in each patient. A complication results if this fraction exceeds the patient's functional reserve. To determine the patient distribution of functional reserves and the subunit radiosensitivity, the maximum likelihood method was used to fit the observed complication data. Results: The parallel model fit the complication data well, although uncertainties on the functional reserve distribution and subunit radiosensitivity are highly correlated. Conclusion: The observed radiation hepatitis complications show a threshold effect that can be described well with a parallel architecture model. However, additional independent studies are required to better determine the parameters defining the functional reserve distribution and subunit radiosensitivity.
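
    The two-step logic of the parallel model (damaged fraction from the dose volume histogram, then comparison with a functional reserve) can be sketched as follows. The sigmoid form of the subunit response, the parameter names `d50` and `k`, and the use of a single reserve value instead of a fitted patient distribution are all assumptions made for illustration; none of them is taken from the record above.

```python
def subunit_damage_probability(dose, d50, k):
    """Assumed sigmoid response: probability that one subunit is damaged at `dose`.

    d50 is the dose giving 50% damage probability; k sets the steepness.
    """
    if dose <= 0:
        return 0.0
    return 1.0 / (1.0 + (d50 / dose) ** k)

def fraction_damaged(dvh, d50, k):
    """dvh: list of (dose, fractional_volume) bins whose volumes sum to 1."""
    return sum(volume * subunit_damage_probability(dose, d50, k)
               for dose, volume in dvh)

def complication_predicted(dvh, d50, k, functional_reserve):
    """A complication is predicted when the damaged fraction exceeds the reserve."""
    return fraction_damaged(dvh, d50, k) > functional_reserve
```

    Because the damaged fraction is compared with a reserve, the sketch reproduces the threshold behaviour the abstract describes: irradiating a small volume, even to a high dose, cannot by itself exceed a reserve larger than that volume.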

  17. Parallel processing algorithms for hydrocodes on a computer with MIMD architecture (DENELCOR's HEP)

    International Nuclear Information System (INIS)

    Hicks, D.L.

    1983-11-01

    In real time simulation/prediction of complex systems such as water-cooled nuclear reactors, if reactor operators had fast simulator/predictors to check the consequences of their operations before implementing them, events such as the incident at Three Mile Island might be avoided. However, existing simulator/predictors such as RELAP run slower than real time on serial computers. It appears that the only way to overcome the barrier to higher computing rates is to use computers with architectures that allow concurrent computations or parallel processing. The computer architecture with the greatest degree of parallelism is labeled Multiple Instruction Stream, Multiple Data Stream (MIMD). An example of a machine of this type is the HEP computer by DENELCOR. It appears that hydrocodes are very well suited for parallelization on the HEP. It is a straightforward exercise to parallelize explicit, one-dimensional Lagrangean hydrocodes in a zone-by-zone parallelization. Similarly, implicit schemes can be parallelized in a zone-by-zone fashion via an a priori, symbolic inversion of the tridiagonal matrix that arises in an implicit scheme. These techniques are extended to Eulerian hydrocodes by using Harlow's rezone technique. The extension from single-phase Eulerian to two-phase Eulerian is straightforward. This step-by-step extension leads to hydrocodes with zone-by-zone parallelization that are capable of two-phase flow simulation. Extensions to two and three spatial dimensions can be achieved by operator splitting. It appears that a zone-by-zone parallelization is the best way to utilize the capabilities of an MIMD machine. 40 references

  18. Parallel processing architecture for H.264 deblocking filter on multi-core platforms

    Science.gov (United States)

    Prasad, Durga P.; Sonachalam, Sekar; Kunchamwar, Mangesh K.; Gunupudi, Nageswara Rao

    2012-03-01

    Massively parallel computing (multi-core) chips offer outstanding new solutions that satisfy the increasing demand for high resolution and high quality video compression technologies such as H.264. Such solutions not only provide exceptional quality but also efficiency, low power, and low latency, previously unattainable in software based designs. While custom hardware and Application Specific Integrated Circuit (ASIC) technologies may achieve low latency, low power, and real-time performance in some consumer devices, many applications require a flexible and scalable software-defined solution. The deblocking filter in H.264 encoder/decoder poses difficult implementation challenges because of heavy data dependencies and the conditional nature of the computations. Deblocking filter implementations tend to be fixed and difficult to reconfigure for different needs. The ability to scale up for higher quality requirements such as 10-bit pixel depth or a 4:2:2 chroma format often reduces the throughput of a parallel architecture designed for a lower feature set. A scalable architecture for deblocking filtering, created with a massively parallel processor based solution, means that the same encoder or decoder will be deployed in a variety of applications, at different video resolutions, for different power requirements, and at higher bit-depths and better color subsampling patterns like YUV, 4:2:2, or 4:4:4 formats. Low power, software-defined encoders/decoders may be implemented using a massively parallel processor array, like that found in HyperX technology, with 100 or more cores and distributed memory. The large number of processor elements allows the silicon device to operate more efficiently than conventional DSP or CPU technology. This software programming model for massively parallel processors offers a flexible implementation and a power efficiency close to that of ASIC solutions.
This work describes a scalable parallel architecture for an H.264 compliant deblocking

  19. An FPGA-Based Quantum Computing Emulation Framework Based on Serial-Parallel Architecture

    Directory of Open Access Journals (Sweden)

    Y. H. Lee

    2016-01-01

    Hardware emulation of quantum systems can mimic the parallel behaviour of quantum computations more efficiently, thus allowing a higher processing speed-up than software simulations. In this paper, an efficient hardware emulation method that employs a serial-parallel hardware architecture targeted for field programmable gate array (FPGA) is proposed. Quantum Fourier transform and Grover’s search are chosen as case studies in this work since they are the core of many useful quantum algorithms. Experimental work shows that, with the proposed emulation architecture, a linear reduction in resource utilization is attained against the pipeline implementations proposed in prior works. The proposed work contributes to the formulation of a proof-of-concept baseline FPGA emulation framework with optimization on datapath designs that can be extended to emulate practical large-scale quantum circuits.
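
    As a point of reference for what such an emulator computes, the quantum Fourier transform of an N-amplitude state vector can be written as a plain software simulation. The sketch below is not the FPGA design from the record; the function name `qft` and the two-qubit input are illustrative assumptions, and the sign convention exp(+2*pi*i*j*k/N) is the usual textbook one.

```python
import cmath
import math


def qft(state):
    """Apply the quantum Fourier transform to a state vector of length N = 2**n."""
    n_amp = len(state)
    if n_amp == 0 or n_amp & (n_amp - 1):
        raise ValueError("state length must be a power of two")
    norm = 1.0 / math.sqrt(n_amp)
    out = []
    for k in range(n_amp):
        # Each output amplitude is a phase-weighted sum over all input amplitudes.
        acc = 0j
        for j, amp in enumerate(state):
            acc += amp * cmath.exp(2j * cmath.pi * j * k / n_amp)
        out.append(norm * acc)
    return out


# Hypothetical input: two qubits prepared in the basis state with index 1.
state = [0, 1, 0, 0]
transformed = qft(state)
```

    Every output amplitude depends on every input amplitude, so the direct software loop costs N*N operations; this is the kind of inherent parallelism that the abstract says hardware emulation can mimic more efficiently.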

  20. Direct kinematics solution architectures for industrial robot manipulators: Bit-serial versus parallel

    Science.gov (United States)

    Lee, J.; Kim, K.

    1991-01-01

    A Very Large Scale Integration (VLSI) architecture for robot direct kinematic computation suitable for industrial robot manipulators was investigated. The Denavit-Hartenberg transformations are reviewed in order to exploit a suitable processing element, namely an augmented CORDIC. Specifically, two distinct implementations, bit-serial and parallel, are elaborated on. The performance of each scheme is analyzed with respect to the time to compute one location of the end-effector of a 6-link manipulator and the number of transistors required.
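
    For orientation, the direct kinematics computation that such hardware accelerates can be sketched in software by chaining one homogeneous transform per link. The sketch below uses the standard Denavit-Hartenberg parameter convention (theta, d, a, alpha); the function names and the two-link planar arm are illustrative assumptions, and nothing here reflects the CORDIC-based design of the record.

```python
import math


def dh_matrix(theta, d, a, alpha):
    """Homogeneous transform for one link from standard Denavit-Hartenberg parameters."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [
        [ct, -st * ca, st * sa, a * ct],
        [st, ct * ca, -ct * sa, a * st],
        [0.0, sa, ca, d],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul4(x, y):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def end_effector_position(dh_rows):
    """Chain the link transforms and return the (x, y, z) position of the last frame."""
    pose = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for theta, d, a, alpha in dh_rows:
        pose = matmul4(pose, dh_matrix(theta, d, a, alpha))
    return pose[0][3], pose[1][3], pose[2][3]


# Hypothetical planar two-link arm: unit link lengths, first joint at 90 degrees.
planar_arm = [(math.pi / 2, 0.0, 1.0, 0.0), (0.0, 0.0, 1.0, 0.0)]
x, y, z = end_effector_position(planar_arm)
```

    With the first joint at 90 degrees and the second at 0, both unit links point along the world y axis, so the end-effector sits at approximately (0, 2, 0). A 6-link manipulator would simply pass six parameter rows.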

  2. A Pipelined and Parallel Architecture for Quantum Monte Carlo Simulations on FPGAs

    Directory of Open Access Journals (Sweden)

    Akila Gothandaraman

    2010-01-01

    Recent advances in Field-Programmable Gate Array (FPGA) technology make reconfigurable computing using FPGAs an attractive platform for accelerating scientific applications. We develop a deeply pipelined and parallel architecture for Quantum Monte Carlo simulations using FPGAs. Quantum Monte Carlo simulations enable us to obtain the structural and energetic properties of atomic clusters. We experiment with different pipeline structures for each component of the design and develop a deeply pipelined architecture that provides the best performance in terms of achievable clock rate, while at the same time making modest use of the FPGA resources. We discuss the details of the pipelined and generic architecture that is used to obtain the potential energy and wave function of a cluster of atoms.

  3. Design and simulation of parallel and distributed architectures for images processing

    International Nuclear Information System (INIS)

    Pirson, Alain

    1990-01-01

    The exploitation of visual information requires special computers. The diversity of operations and the computing power involved lead to structures founded on the concepts of concurrency and distributed processing. This work treats a vision computer as an association of dedicated intelligent entities exchanging messages according to the model of parallelism introduced by the Occam language. It puts forward an architecture of the 'enriched processor network' type. It consists of a classical multiprocessor structure where each node is provided with specific devices. These devices perform processing tasks as well as inter-node communication. Such an architecture benefits from the homogeneity of multiprocessor networks and the power of dedicated resources. Its implementation corresponds to that of a distributed structure, with tasks being allocated to each computing element. This approach culminates in an original architecture called ATILA. This modular structure is based on a transputer network supplied with vision-dedicated co-processors and powerful communication devices. (author) [fr

  4. Optimal task mapping in safety-critical real-time parallel systems; Placement optimal de taches pour les systemes paralleles temps-reel critiques

    Energy Technology Data Exchange (ETDEWEB)

    Aussagues, Ch

    1998-12-11

    This PhD thesis deals with the correct design of safety-critical real-time parallel systems. Such systems constitute a fundamental part of high-performance systems for command and control that can be found in the nuclear domain or, more generally, in parallel embedded systems. The verification of their temporal correctness is the core of this thesis. Our contribution lies mainly in the following three points: the analysis and extension of a programming model for such real-time parallel systems; the proposal of an original method based on a new operator, the synchronized product of state-machine task graphs; and the validation of the approach by its implementation and evaluation. The work addresses in particular the main problem of optimal task mapping on a parallel architecture, such that the temporal constraints are globally guaranteed, i.e. the timeliness property is valid. The results also incorporate optimality criteria for the sizing and correct dimensioning of a parallel system, for instance in the number of processing elements. These criteria are connected with operational constraints of the application domain. Our approach is based on the off-line analysis of the feasibility of the deadline-driven dynamic scheduling that is used to schedule tasks inside one processor. This leads us to define the synchronized product, from which a system of linear constraints is automatically generated; this system makes it possible to calculate the maximum load of a group of tasks and then to verify their timeliness constraints. The communications, their timeliness verification and their incorporation into the mapping problem form the second main contribution of this thesis. Finally, the global solving technique dealing with both task and communication aspects has been implemented and evaluated in the framework of the OASIS project in the LETI research center at the CEA/Saclay. (author) 96 refs.
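
    To make the idea of an off-line feasibility check for deadline-driven scheduling concrete, the sketch below uses the classic Liu and Layland utilization test for earliest-deadline-first scheduling of independent periodic tasks whose deadlines equal their periods, combined with a simple first-fit mapping. This is a swapped-in textbook technique, not the synchronized-product method of the thesis, and first-fit is a heuristic rather than an optimal mapping; the task set and all function names are hypothetical.

```python
from fractions import Fraction


def utilization(tasks):
    """Total processor utilization of (wcet, period) tasks."""
    return sum(Fraction(wcet, period) for wcet, period in tasks)


def edf_feasible(tasks):
    """Liu and Layland test: EDF meets all deadlines on one processor iff U <= 1."""
    return utilization(tasks) <= 1


def first_fit_mapping(tasks, n_processors):
    """Assign each task to the first processor that stays EDF-feasible; None if no fit."""
    processors = [[] for _ in range(n_processors)]
    for task in tasks:
        for assigned in processors:
            if edf_feasible(assigned + [task]):
                assigned.append(task)
                break
        else:
            # No processor can take this task without exceeding full utilization.
            return None
    return processors


# Hypothetical task set: (worst-case execution time, period) in milliseconds.
tasks = [(2, 5), (3, 10), (4, 8), (1, 4)]
mapping = first_fit_mapping(tasks, n_processors=2)
```

    The total utilization of this task set is 1.45, so a single processor cannot host it, while two processors can. A check of this kind also hints at how a sizing criterion such as the minimum number of processing elements could be derived, although the thesis additionally accounts for communications.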

  5. Design of a real-time wind turbine simulator using a custom parallel architecture

    Science.gov (United States)

    Hoffman, John A.; Gluck, R.; Sridhar, S.

    1995-01-01

    The design of a new parallel-processing digital simulator is described. The new simulator has been developed specifically for analysis of wind energy systems in real time. The new processor has been named the Wind Energy System Time-domain simulator, version 3 (WEST-3). Like previous WEST versions, WEST-3 performs many computations in parallel. The modules in WEST-3 are pure digital processors, however. These digital processors can be programmed individually and operated in concert to achieve real-time simulation of wind turbine systems. Because of this programmability, WEST-3 is much more flexible and general than its two predecessors. The design features of WEST-3 are described to show how the system produces high-speed solutions of nonlinear time-domain equations. WEST-3 has two very fast Computational Units (CU's) that use minicomputer technology plus special architectural features that make them many times faster than a microcomputer. These CU's are needed to perform the complex computations associated with the wind turbine rotor system in real time. The parallel architecture of the CU causes several tasks to be done in each cycle, including an IO operation and the combination of a multiply, add, and store. The WEST-3 simulator can be expanded at any time for additional computational power. This is possible because the CU's are interfaced to each other and to other portions of the simulation using special serial buses. These buses can be 'patched' together in essentially any configuration (in a manner very similar to the programming methods used in analog computation) to balance the input/output requirements. CU's can be added in any number to share a given computational load. This flexible bus feature is very different from many other parallel processors which usually have a throughput limit because of rigid bus architecture.

  6. Real-time objects development: Study and proposal for a parallel scheduling architecture

    International Nuclear Information System (INIS)

    Rioux, Laurent

    1997-01-01

    This thesis contributes to the programming and execution control of real-time object-oriented applications. Real-time objects are well suited to programming real-time applications, because this model combines concurrency with encapsulation, modularity and reusability while taking the real-time constraints of the application into account. One essential quality of this approach is that parallelism and real-time constraints can be specified directly at the model level of the application. An annotation system for C++ has been defined to describe the real-time specifications in the model (or in the source code) of the application. It supplies the execution support with the information it needs for control. In this approach to multitasking, control is distributed and encapsulated inside each real-time object. Three complementary levels of control have been defined: the state level (defining the capability of an object to handle an operation), the concurrency level (ensuring the consistency of the object attributes) and the scheduling level (allocating processor resources to the object by taking real-time constraints into account). The proposed control architecture, named OROS, manages access to the attributes of each object individually, so it can run in parallel operations that do not access the same data. This architecture provides dynamic control of an application and can benefit from the parallelism of new machines both for execution and for the control itself. It uses only the simplest primitives of industrial real-time operating systems, which ensures its feasibility and portability. (author) [fr

  7. Time-dependent density-functional theory in massively parallel computer architectures: the OCTOPUS project.

    Science.gov (United States)

    Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A; Oliveira, Micael J T; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A L

    2012-06-13

    Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.

  8. Parallel eigenanalysis of finite element models in a completely connected architecture

    Science.gov (United States)

    Akl, F. A.; Morel, M. R.

    1989-01-01

    A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi) = (M)(phi)(omega), where (K) and (M) are of order N, and (omega) is of order q. The concurrent solution of the eigenproblem is based on the multifrontal/modified subspace method and is achieved in a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm was successfully implemented on a tightly coupled multiple-instruction multiple-data parallel processing machine, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macrotasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effects of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. A parallel finite element dynamic analysis program, p-feda, is documented and the performance of its subroutines in a parallel environment is analyzed.

  9. Time-dependent density-functional theory in massively parallel computer architectures: the octopus project

    Science.gov (United States)

    Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A.; Oliveira, Micael J. T.; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G.; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A. L.

    2012-06-01

    Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.

  10. Time-dependent density-functional theory in massively parallel computer architectures: the octopus project

    International Nuclear Information System (INIS)

    Andrade, Xavier; Aspuru-Guzik, Alán; Alberdi-Rodriguez, Joseba; Rubio, Angel; Strubbe, David A; Louie, Steven G; Oliveira, Micael J T; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Marques, Miguel A L

    2012-01-01

    Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures. (topical review)

  11. GRAPES: a software for parallel searching on biological graphs targeting multi-core architectures.

    Directory of Open Access Journals (Sweden)

    Rosalba Giugno

    Biological applications, from genomics to ecology, deal with graphs that represent the structure of interactions. Analyzing such data requires searching for subgraphs in collections of graphs. This task is computationally expensive. Even though multicore architectures, from commodity computers to more advanced symmetric multiprocessing (SMP), offer scalable computing power, currently published software implementations for indexing and graph matching are fundamentally sequential. As a consequence, such software implementations (i) do not fully exploit available parallel computing power and (ii) do not scale with respect to the size of graphs in the database. We present GRAPES, software for parallel searching on databases of large biological graphs. GRAPES implements a parallel version of well-established graph searching algorithms, and introduces new strategies which naturally lead to a faster parallel searching system especially for large graphs. GRAPES decomposes graphs into subcomponents that can be efficiently searched in parallel. We show the performance of GRAPES on representative biological datasets containing antiviral chemical compounds, DNA, RNA, proteins, protein contact maps and protein interaction networks.

  12. Reply to "Comments on Techniques and Architectures for Hazard-Free Semi-Parallel Decoding of LDPC Codes"

    Directory of Open Access Journals (Sweden)

    Rovini Massimo

    2009-01-01

    This is a reply to the comments by Gunnam et al. "Comments on 'Techniques and architectures for hazard-free semi-parallel decoding of LDPC codes'", EURASIP Journal on Embedded Systems, vol. 2009, Article ID 704174 on our recent work "Techniques and architectures for hazard-free semi-parallel decoding of LDPC codes", EURASIP Journal on Embedded Systems, vol. 2009, Article ID 723465.

  13. A Case Study of a Hybrid Parallel 3D Surface Rendering Graphics Architecture

    DEFF Research Database (Denmark)

    Holten-Lund, Hans Erik; Madsen, Jan; Pedersen, Steen

    1997-01-01

    This paper presents a case study in the design strategy used in building a graphics computer, for drawing very complex 3D geometric surfaces. The goal is to build a PC based computer system capable of handling surfaces built from about 2 million triangles, and to be able to render a perspective view...... of these on a computer display at interactive frame rates, i.e. processing around 50 million triangles per second. The paper presents a hardware/software architecture called HPGA (Hybrid Parallel Graphics Architecture) which is likely to be able to carry out this task. The case study focuses on techniques to increase...

  14. Modeling, realization and evaluation of a parallel architecture for the data acquisition in multidetectors

    International Nuclear Information System (INIS)

    Guirande, Ph.; Aleonard, M-M.; Dien, Q-T.; Pedroza, J-L.

    1997-01-01

    The increase in efficiency of 4π multidetectors (EUROGAM, EUROBALL, DIAMANT) is achieved by an increase in granularity, and hence in the event counting rate in the acquisition system. Consequently, the architecture of the readout systems, the coding and the software must evolve. To carry out the required evaluation we implemented a parallel architecture to check the quality of the events. The first application of this architecture was to provide an improved data acquisition system for the DIAMANT multidetector. The data acquisition system of DIAMANT is based on a set of VME cards which must manage event readout, event storage on magnetic media and histogram construction. The set consists of processors distributed in a network, a workstation to control the experiment and a display system for spectra and arrays. In such an architecture the load on the VME bus quickly becomes a performance limitation, not only for data transfer but also for the coordination of the different processors. The parallel architecture used eases the operation of the VME bus. It is based on three C40 DSPs (Digital Signal Processors) installed on a commercial (LSI) VME card. It is provided with an external bus used to read the raw data from an interface card (ROCVI) to the 32-bit ECL bus that reads the real-time VME-based encoders. Tests revealed blocking after data exchanges between processors using two communication links. Analysis of this problem showed that tasks must be changed dynamically to avoid this blocking. An intrinsic evaluation (i.e. without transfer on the VME bus) has been carried out for two parallel topologies (processor farm and tree). The simulation software permitted the generation of event packets. The rates obtained are roughly equivalent (6 MB/s) regardless of topology. The farm topology was chosen because it is simple to implement. The load evaluation reduced the rate in 'simplex' communication mode to 5.3 MB/s and

  15. Lamb wave propagation modelling and simulation using parallel processing architecture and graphical cards

    International Nuclear Information System (INIS)

    Paćko, P; Bielak, T; Staszewski, W J; Uhl, T; Spencer, A B; Worden, K

    2012-01-01

    This paper demonstrates new parallel computation technology and an implementation for Lamb wave propagation modelling in complex structures. A graphics processing unit (GPU) and compute unified device architecture (CUDA), available in low-cost graphics cards in standard PCs, are used for Lamb wave propagation numerical simulations. The local interaction simulation approach (LISA) wave propagation algorithm has been implemented as an example. Other algorithms suitable for parallel discretization can also be used in practice. The method is illustrated using examples related to damage detection. The results demonstrate good accuracy and effective computational performance of very large models. The wave propagation modelling presented in the paper can be used in many practical applications of science and engineering. (paper)

  16. Efficient high-precision matrix algebra on parallel architectures for nonlinear combinatorial optimization

    KAUST Repository

    Gunnels, John; Lee, Jon; Margulies, Susan

    2010-01-01

    We provide a first demonstration of the idea that matrix-based algorithms for nonlinear combinatorial optimization problems can be efficiently implemented. Such algorithms were mainly conceived by theoretical computer scientists for proving efficiency. We are able to demonstrate the practicality of our approach by developing an implementation on a massively parallel architecture, and exploiting scalable and efficient parallel implementations of algorithms for ultra high-precision linear algebra. Additionally, we have delineated and implemented the necessary algorithmic and coding changes required in order to address problems several orders of magnitude larger, dealing with the limits of scalability from memory footprint, computational efficiency, reliability, and interconnect perspectives. © Springer and Mathematical Programming Society 2010.

  17. Efficient high-precision matrix algebra on parallel architectures for nonlinear combinatorial optimization

    KAUST Repository

    Gunnels, John

    2010-06-01

    We provide a first demonstration of the idea that matrix-based algorithms for nonlinear combinatorial optimization problems can be efficiently implemented. Such algorithms were mainly conceived by theoretical computer scientists for proving efficiency. We are able to demonstrate the practicality of our approach by developing an implementation on a massively parallel architecture, and exploiting scalable and efficient parallel implementations of algorithms for ultra high-precision linear algebra. Additionally, we have delineated and implemented the necessary algorithmic and coding changes required in order to address problems several orders of magnitude larger, dealing with the limits of scalability from memory footprint, computational efficiency, reliability, and interconnect perspectives. © Springer and Mathematical Programming Society 2010.

  18. Eigensolution of finite element problems in a completely connected parallel architecture

    Science.gov (United States)

    Akl, Fred A.; Morel, Michael R.

    1989-01-01

    A parallel algorithm for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi)=(M)(phi)(omega), where (K) and (M) are of order N, and (omega) is of order q, is presented. The parallel algorithm is based on a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm has been successfully implemented on a tightly coupled multiple-instruction-multiple-data (MIMD) parallel processing computer, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor, or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macro-tasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effects of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. For a 64-element rectangular plate, speed-ups of 1.86, 3.13, 3.18 and 3.61 are achieved on two, four, six and eight processors, respectively.
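
    The abstract names speed-up and efficiency as its effectiveness measures but reports only the speed-ups. Assuming the usual definition of parallel efficiency as speed-up divided by processor count, the corresponding efficiencies can be worked out from the reported figures; the variable and function names below are illustrative.

```python
# Speed-ups reported in the abstract, keyed by processor count.
reported_speedup = {2: 1.86, 4: 3.13, 6: 3.18, 8: 3.61}


def parallel_efficiency(speedup, processors):
    """Efficiency is the speed-up divided by the number of processors used."""
    return speedup / processors


efficiency = {p: parallel_efficiency(s, p) for p, s in reported_speedup.items()}

for p in sorted(efficiency):
    print(f"{p} processors: speed-up {reported_speedup[p]:.2f}, efficiency {efficiency[p]:.0%}")
```

    Under that definition, efficiency falls from about 93% on two processors to about 45% on eight, which shows how quickly the gains flatten for a model as small as the 64-element plate.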

  19. Verification of Electromagnetic Physics Models for Parallel Computing Architectures in the GeantV Project

    Energy Technology Data Exchange (ETDEWEB)

    Amadio, G.; et al.

    2017-11-22

    An intensive R&D and programming effort is required to accomplish new challenges posed by future experimental high-energy particle physics (HEP) programs. The GeantV project aims to narrow the gap between the performance of the existing HEP detector simulation software and the ideal performance achievable, exploiting latest advances in computing technology. The project has developed a particle detector simulation prototype capable of transporting in parallel particles in complex geometries exploiting instruction level microparallelism (SIMD and SIMT), task-level parallelism (multithreading) and high-level parallelism (MPI), leveraging both the multi-core and the many-core opportunities. We present preliminary verification results concerning the electromagnetic (EM) physics models developed for parallel computing architectures within the GeantV project. In order to exploit the potential of vectorization and accelerators and to make the physics model effectively parallelizable, advanced sampling techniques have been implemented and tested. In this paper we introduce a set of automated statistical tests in order to verify the vectorized models by checking their consistency with the corresponding Geant4 models and to validate them against experimental data.

  20. Hardware system of parallel processing for fast CT image reconstruction based on circular shifting float memory architecture

    International Nuclear Information System (INIS)

    Wang Shi; Kang Kejun; Wang Jingjin

    1995-01-01

    Computerized Tomography (CT) is expected to become an indispensable diagnostic technique in the future. However, the long time required to reconstruct an image has been one of the major drawbacks associated with this technique. Parallel processing is one of the best ways to solve this problem. This paper gives the architecture and hardware design of PIRS-4 (4-processor Parallel Image Reconstruction System), a parallel processing system for fast 3D-CT image reconstruction based on a circular shifting float memory architecture. It covers the structure and components of the system, the design of the crossbar switch and details of the control model. The test results are described.

  1. Implementation of an Agent-Based Parallel Tissue Modelling Framework for the Intel MIC Architecture

    Directory of Open Access Journals (Sweden)

    Maciej Cytowski

    2017-01-01

    Timothy is a novel large scale modelling framework that allows simulating biological processes involving different cellular colonies growing and interacting with a variable environment. Timothy was designed for execution on massively parallel High Performance Computing (HPC) systems. The high parallel scalability of the implementation allows for simulations of up to 10^9 individual cells (i.e., simulations at tissue spatial scales of up to 1 cm^3 in size). With the recent advancements of the Timothy model, it has become critical to ensure an appropriate performance level on emerging HPC architectures. For instance, the introduction of blood vessels supplying nutrients to the tissue is a very important step towards realistic simulations of complex biological processes, but it greatly increased the computational complexity of the model. In this paper, we describe the process of modernization of the application in order to achieve high computational performance on HPC hybrid systems based on modern Intel® MIC architecture. Experimental results on the Intel Xeon Phi™ coprocessor x100 and the Intel Xeon Phi processor x200 are presented.

  2. Techniques and Architectures for Hazard-Free Semi-Parallel Decoding of LDPC Codes

    Directory of Open Access Journals (Sweden)

    Rovini Massimo

    2009-01-01

    The layered decoding algorithm has recently been proposed as an efficient means for the decoding of low-density parity-check (LDPC) codes, thanks to the remarkable improvement in the convergence speed (2x) of the decoding process. However, pipelined semi-parallel decoders suffer from violations or "hazards" between consecutive updates, which not only violate the layered principle but also enforce the loops in the code, thus spoiling the error correction performance. This paper describes three different techniques to properly reschedule the decoding updates, based on the careful insertion of "idle" cycles, to prevent the hazards of the pipeline mechanism. Also, different semi-parallel architectures of a layered LDPC decoder suitable for use with such techniques are analyzed. Then, taking the LDPC codes for the wireless local area network (IEEE 802.11n) as a case study, a detailed analysis of the performance attained with the proposed techniques and architectures is reported, and results of the logic synthesis on a 65 nm low-power CMOS technology are shown.
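
    The notion of inserting idle cycles to avoid pipeline hazards can be illustrated with a deliberately simplified model. In the sketch below, each layer is reduced to the set of variable nodes it updates, a layer issued at cycle t is assumed to write its results back at cycle t + latency, and a later layer touching the same nodes is stalled until that write-back. The model, the function name and the four-layer example are all hypothetical and do not reproduce any of the three techniques of the paper.

```python
def schedule_with_idles(layers, latency):
    """Return (issue_cycles, idle_count) for layers given as sets of variable-node ids.

    Toy pipeline model: a layer issued at cycle t writes its results back at
    cycle t + latency; a later layer that touches any of the same nodes must
    not be issued before that write-back.
    """
    ready_at = {}      # node id -> cycle at which its latest update is written back
    issue_cycles = []
    next_free = 0      # earliest cycle at which the pipeline can accept a layer
    idle_count = 0
    for nodes in layers:
        earliest = max([next_free] + [ready_at.get(n, 0) for n in nodes])
        idle_count += earliest - next_free  # idle cycles inserted to avoid the hazard
        issue_cycles.append(earliest)
        for n in nodes:
            ready_at[n] = earliest + latency
        next_free = earliest + 1
    return issue_cycles, idle_count


# Hypothetical code with four layers; layers 0 and 1 share node 2, layers 2 and 3 share node 5.
layers = [{0, 1, 2}, {2, 3}, {4, 5}, {5, 6}]
issue_cycles, idle_count = schedule_with_idles(layers, latency=3)

# The same layers processed in the order 0, 2, 1, 3.
reordered = [layers[0], layers[2], layers[1], layers[3]]
reordered_cycles, reordered_idles = schedule_with_idles(reordered, latency=3)
```

    In this toy model the original order needs four idle cycles, while the reordered sequence needs only one, which suggests why the order of the updates matters as much as the idle cycles themselves.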

  3. A study of objective functions for organs with parallel and serial architecture

    International Nuclear Information System (INIS)

    Stavrev, P.V.; Stavreva, N.A.; Round, W.H.

    1997-01-01

    An objective function analysis has been carried out for external beam radiotherapy in which target volumes are deliberately enlarged to account for tumour mobility and the consequent uncertainty in the tumour position. The dose distribution inside the tumour is assumed to have a logarithmic dependence on the tumour cell density, which assures an iso-local tumour control probability. The normal tissue immediately surrounding the tumour is irradiated homogeneously at a dose level equal to the dose D(R) delivered at the edge of the tumour. The normal tissue in the high dose field is modelled as being organized in identical functional subunits (FSUs) composed of a relatively large number of cells. Two types of organs, having serial and parallel architecture, are considered. Implicit averaging over intrapatient normal tissue radiosensitivity variations is done. A function describing the normal tissue survival probability S0 is constructed. The objective function is given as the product of the total tumour control probability (TCP) and the normal tissue survival probability S0. The values of the dose D(R) which result in a maximum of the objective function are obtained for different combinations of tumour and normal tissue parameters, such as tumour and normal tissue radiosensitivities, number of cells constituting a normal tissue functional unit, total number of normal cells under high dose (D(R)) exposure and functional reserve for organs having parallel architecture. The corresponding TCP and S0 values are computed and discussed. (authors)

  4. Architecture and VHDL behavioural validation of a parallel processor dedicated to computer vision

    International Nuclear Information System (INIS)

    Collette, Thierry

    1992-01-01

    Speeding up image processing is mainly obtained using parallel computers; SIMD processors (single instruction stream, multiple data stream) have been developed, and have proven highly efficient for low-level image processing operations. Nevertheless, their performance drops for most intermediate or high level operations, mainly when random data reorganisations in processor memories are involved. The aim of this thesis was to extend the SIMD computer capabilities to allow it to perform more efficiently at the intermediate level of image processing. The study of some representative algorithms of this class points out the limits of this computer. Nevertheless, these limits can be removed by architectural modifications. This leads us to propose SYMPATIX, a new SIMD parallel computer. To validate its new concept, a behavioural model written in VHDL (a hardware description language) has been developed. With this model, the performance of the new computer has been estimated by running image processing algorithm simulations. The VHDL modelling approach makes it possible to carry out the top-down electronic design of the system, giving an easy coupling between architectural modifications of the system and their electronic cost. The results obtained show SYMPATIX to be an efficient computer for low and intermediate level image processing. It can be connected to a high level computer, opening up the development of new computer vision applications. This thesis also presents a top-down design method, based on VHDL, intended for electronic system architects. (author)

  5. Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Architectures

    Energy Technology Data Exchange (ETDEWEB)

    Cerati, Giuseppe [Fermilab; Elmer, Peter [Princeton U.; Krutelyov, Slava [UC, San Diego; Lantz, Steven [Cornell U., Phys. Dept.; Lefebvre, Matthieu [Princeton U.; Masciovecchio, Mario [UC, San Diego; McDermott, Kevin [Cornell U., Phys. Dept.; Riley, Daniel [Cornell U., Phys. Dept.; Tadel, Matevž [UC, San Diego; Wittich, Peter [Cornell U., Phys. Dept.; Würthwein, Frank [UC, San Diego; Yagil, Avi [UC, San Diego

    2017-11-16

    Faced with physical and energy density limitations on clock speed, contemporary microprocessor designers have increasingly turned to on-chip parallelism for performance gains. Examples include the Intel Xeon Phi, GPGPUs, and similar technologies. Algorithms should accordingly be designed with ample amounts of fine-grained parallelism if they are to realize the full performance of the hardware. This requirement can be challenging for algorithms that are naturally expressed as a sequence of small-matrix operations, such as the Kalman filter methods widely in use in high-energy physics experiments. In the High-Luminosity Large Hadron Collider (HL-LHC), for example, one of the dominant computational problems is expected to be finding and fitting charged-particle tracks during event reconstruction; today, the most common track-finding methods are those based on the Kalman filter. Experience at the LHC, both in the trigger and offline, has shown that these methods are robust and provide high physics performance. Previously we reported the significant parallel speedups that resulted from our efforts to adapt Kalman-filter-based tracking to many-core architectures such as Intel Xeon Phi. Here we report on how effectively those techniques can be applied to more realistic detector configurations and event complexity.

  6. Evaluating the performance of the particle finite element method in parallel architectures

    Science.gov (United States)

    Gimenez, Juan M.; Nigro, Norberto M.; Idelsohn, Sergio R.

    2014-05-01

    This paper presents a high performance implementation for the particle-mesh based method called particle finite element method two (PFEM-2). It consists of a material derivative based formulation of the equations with a hybrid spatial discretization which uses an Eulerian mesh and Lagrangian particles. The main aim of PFEM-2 is to solve transport equations as fast as possible while keeping some level of accuracy. The method was found to be competitive with classical Eulerian alternatives for these targets, even in their range of optimal application. To evaluate the goodness of the method with large simulations, it is imperative to use parallel environments. Parallel strategies for the Finite Element Method have been widely studied and many libraries can be used to solve the Eulerian stages of PFEM-2. However, Lagrangian stages, such as streamline integration, must be developed considering the parallel strategy selected. The main drawback of PFEM-2 is the large amount of memory needed, which limits its application to large problems with only one computer. Therefore, a distributed-memory implementation is urgently needed. Unlike a shared-memory approach, using domain decomposition the memory is automatically isolated, thus avoiding race conditions; however, new issues appear due to data distribution over the processes. Thus, a domain decomposition strategy for both particles and mesh is adopted, which minimizes the communication between processes. Finally, performance analyses running over multicore and multinode architectures are presented. The Courant-Friedrichs-Lewy number used influences the efficiency of the parallelization and, in some cases, a weighted partitioning can be used to improve the speed-up. However, the total CPU time for the cases presented is lower than that obtained when using classical Eulerian strategies.

  7. Parallel Conjugate Gradient: Effects of Ordering Strategies, Programming Paradigms, and Architectural Platforms

    Science.gov (United States)

    Oliker, Leonid; Heber, Gerd; Biswas, Rupak

    2000-01-01

    The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multi-threaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.
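
    The relationship between CG and SPMV described above can be illustrated with a minimal serial sketch. It is not the code evaluated in the paper: the CSR layout, the function names (`spmv`, `conjugate_gradient`) and the small test matrix are assumptions chosen for illustration, and no ordering, partitioning, or parallelism is applied.

```python
def spmv(indptr, indices, data, x):
    """Sparse matrix-vector product y = A @ x for A stored in CSR form."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y


def dot(u, v):
    """Dense dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite CSR matrix, from x = 0."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x, with x = 0
    p = list(r)              # initial search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = spmv(indptr, indices, data, p)   # the single SPMV of the iteration
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Hypothetical 3x3 SPD matrix [[4,-1,0],[-1,4,-1],[0,-1,4]] in CSR form.
INDPTR = [0, 2, 5, 7]
INDICES = [0, 1, 0, 1, 2, 1, 2]
DATA = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
```

    Each iteration performs one `spmv` call plus a handful of vector updates and dot products, which is why the memory access pattern of the sparse product (and hence the ordering of rows and columns) tends to dominate the cost.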

  8. HTMT-class Latency Tolerant Parallel Architecture for Petaflops Scale Computation

    Science.gov (United States)

    Sterling, Thomas; Bergman, Larry

    2000-01-01

    Computational Aero Sciences and other numeric intensive computation disciplines demand computing throughputs substantially greater than the Teraflops scale systems only now becoming available. The related fields of fluids, structures, thermal, combustion, and dynamic controls are among the interdisciplinary areas that in combination with sufficient resolution and advanced adaptive techniques may force performance requirements towards Petaflops. This will be especially true for compute intensive models such as Navier-Stokes or when such system models are only part of a larger design optimization computation involving many design points. Yet recent experience with conventional MPP configurations comprising commodity processing and memory components has shown that larger scale frequently results in higher programming difficulty and lower system efficiency. While important advances in system software and algorithms techniques have had some impact on efficiency and programmability for certain classes of problems, in general it is unlikely that software alone will resolve the challenges to higher scalability. As in the past, future generations of high-end computers may require a combination of hardware architecture and system software advances to enable efficient operation at a Petaflops level. The NASA-led HTMT project has engaged the talents of a broad interdisciplinary team to develop a new strategy in high-end system architecture to deliver petaflops scale computing in the 2004/5 timeframe. The Hybrid-Technology, MultiThreaded parallel computer architecture incorporates several advanced technologies in combination with an innovative dynamic adaptive scheduling mechanism to provide unprecedented performance and efficiency within practical constraints of cost, complexity, and power consumption. The emerging superconductor Rapid Single Flux Quantum electronics can operate at 100 GHz (the record is 770 GHz) and one percent of the power required by convention

  9. Reliable and Efficient Parallel Processing Algorithms and Architectures for Modern Signal Processing. Ph.D. Thesis

    Science.gov (United States)

    Liu, Kuojuey Ray

    1990-01-01

    Least-squares (LS) estimations and spectral decomposition algorithms constitute the heart of modern signal processing and communication problems. Implementations of recursive LS and spectral decomposition algorithms onto parallel processing architectures such as systolic arrays with efficient fault-tolerant schemes are the major concerns of this dissertation. There are four major results in this dissertation. First, we propose the systolic block Householder transformation with application to the recursive least-squares minimization. It is successfully implemented on a systolic array with a two-level pipelined implementation at the vector level as well as at the word level. Second, a real-time algorithm-based concurrent error detection scheme based on the residual method is proposed for the QRD RLS systolic array. The fault diagnosis, order degraded reconfiguration, and performance analysis are also considered. Third, the dynamic range, stability, error detection capability under finite-precision implementation, order degraded performance, and residual estimation under faulty situations for the QRD RLS systolic array are studied in detail. Finally, we propose the use of multi-phase systolic algorithms for spectral decomposition based on the QR algorithm. Two systolic architectures, one based on a triangular array and another based on a rectangular array, are presented for the multi-phase operations with fault-tolerant considerations. Eigenvectors and singular vectors can be easily obtained by using the multi-phase operations. Performance issues are also considered.

  10. Decreasing Data Analytics Time: Hybrid Architecture MapReduce-Massive Parallel Processing for a Smart Grid

    Directory of Open Access Journals (Sweden)

    Abdeslam Mehenni

    2017-03-01

    As our populations grow in a world of limited resources, enterprises seek ways to lighten our load on the planet. The idea of modifying consumer behavior appears as a foundation for smart grids. Enterprises demonstrate the value available from deep analysis of electricity consumption histories, consumers' messages, outage alerts, etc. Enterprises mine massive structured and unstructured data. In a nutshell, smart grids result in a flood of data that needs to be analyzed in order to better adjust to demand and give customers more ability to delve into their power consumption. Simply put, smart grids will increasingly have a flexible data warehouse attached to them. The key driver for the adoption of data management strategies is clearly the need to handle and analyze the large amounts of information utilities are now faced with. New approaches to data integration are gaining momentum; Hadoop is in fact now being used by utilities to help manage the huge growth in data whilst maintaining coherence of the data warehouse. In this paper we define a new Meter Data Management System (MDMS) architecture repository that differs from three leading MDMSs, in which we use the MapReduce programming model for ETL and a parallel DBMS for query statements (Massive Parallel Processing, MPP).
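
    As a rough illustration of the MapReduce programming model mentioned above for the ETL stage, the following single-process sketch aggregates meter readings per meter. The record layout (`meter_id,timestamp,kwh`) and all function names are hypothetical; a real deployment would run the map and reduce steps on a Hadoop cluster rather than in one Python process.

```python
from collections import defaultdict


def map_reading(record):
    """Map step: turn one 'meter_id,timestamp,kwh' line into (key, value) pairs."""
    meter_id, _timestamp, kwh = record.split(",")
    yield meter_id, float(kwh)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_consumption(meter_id, values):
    """Reduce step: total consumption for one meter."""
    return meter_id, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce over an iterable of raw records."""
    pairs = (pair for record in records for pair in map_reading(record))
    return dict(reduce_consumption(k, v) for k, v in shuffle(pairs).items())
```

    Because each record is mapped independently and each key is reduced independently, both phases can be spread over many workers, which is the property the hybrid architecture relies on before handing the cleaned data to the parallel DBMS.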

  11. NETRA: A parallel architecture for integrated vision systems 2: Algorithms and performance evaluation

    Science.gov (United States)

    Choudhary, Alok N.; Patel, Janak H.; Ahuja, Narendra

    1989-01-01

    In part 1 architecture of NETRA is presented. A performance evaluation of NETRA using several common vision algorithms is also presented. Performance of algorithms when they are mapped on one cluster is described. It is shown that SIMD, MIMD, and systolic algorithms can be easily mapped onto processor clusters, and almost linear speedups are possible. For some algorithms, analytical performance results are compared with implementation performance results. It is observed that the analysis is very accurate. Performance analysis of parallel algorithms when mapped across clusters is presented. Mappings across clusters illustrate the importance and use of shared as well as distributed memory in achieving high performance. The parameters for evaluation are derived from the characteristics of the parallel algorithms, and these parameters are used to evaluate the alternative communication strategies in NETRA. Furthermore, the effect of communication interference from other processors in the system on the execution of an algorithm is studied. Using the analysis, performance of many algorithms with different characteristics is presented. It is observed that if communication speeds are matched with the computation speeds, good speedups are possible when algorithms are mapped across clusters.

  12. Enhancing parallelism of tile bidiagonal transformation on multicore architectures using tree reduction

    KAUST Repository

    Ltaief, Hatem

    2012-01-01

    The objective of this paper is to enhance the parallelism of the tile bidiagonal transformation using tree reduction on multicore architectures. First introduced by Ltaief et al. [LAPACK Working Note #247, 2011], the bidiagonal transformation using tile algorithms with a two-stage approach has shown very promising results on square matrices. However, for tall and skinny matrices, the inherent problem of processing the panel in a domino-like fashion generates unnecessary sequential tasks. By using tree reduction, the panel is horizontally split, which creates another dimension of parallelism and engenders many concurrent tasks to be dynamically scheduled on the available cores. The results reported in this paper are very encouraging. The new tile bidiagonal transformation, targeting tall and skinny matrices, outperforms the state-of-the-art numerical linear algebra libraries LAPACK V3.2 and Intel MKL ver. 10.3 by up to 29-fold speedup and the standard two-stage PLASMA BRD by up to 20-fold speedup, on an eight-socket hexa-core AMD Opteron multicore shared-memory system. © 2012 Springer-Verlag.
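
    The difference between domino-like panel processing and tree reduction can be sketched generically. In the snippet below, `combine` is a hypothetical stand-in for the kernel that merges two panel blocks and is assumed to be associative; the code only counts dependent steps and is not the PLASMA implementation.

```python
def flat_reduce(blocks, combine):
    """Domino-style reduction: every step depends on the previous one."""
    acc, steps = blocks[0], 0
    for block in blocks[1:]:
        acc = combine(acc, block)
        steps += 1
    return acc, steps


def tree_reduce(blocks, combine):
    """Binary-tree reduction: all pairs on one level are independent tasks."""
    level, depth = list(blocks), 0
    while len(level) > 1:
        nxt = [combine(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])    # odd block is carried up unchanged
        level, depth = nxt, depth + 1
    return level[0], depth
```

    For a panel split into 8 blocks, the flat version needs 7 dependent steps while the tree needs only 3 levels, and the combines within a level could be scheduled concurrently on the available cores. Both functions assume a non-empty list of blocks.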

  13. Architecture of top down, parallel pattern recognition system TOPS and its application to the MR head images

    International Nuclear Information System (INIS)

    Matsunoshita, Jun-ichi; Akamatsu, Shigeo; Yamamoto, Shinji.

    1993-01-01

    This paper describes the system architecture of a new image recognition system, TOPS (top-down parallel pattern recognition system), and its application to the automatic extraction of brain organs (cerebrum, cerebellum, brain stem) from 3D-MRI images. The main concepts of TOPS are as follows: (1) TOPS is a top-down type recognition system, which allows parallel models at each level of the hierarchy structure. (2) TOPS allows parallel image processing algorithms for one purpose (for example, for the extraction of one particular organ). This results in multiple candidates for one purpose, and the judgment to obtain a unique solution is made at the upper level of the hierarchy structure. (author)

  14. A hybrid parallel architecture for electrostatic interactions in the simulation of dissipative particle dynamics

    Science.gov (United States)

    Yang, Sheng-Chun; Lu, Zhong-Yuan; Qian, Hu-Jun; Wang, Yong-Lei; Han, Jie-Ping

    2017-11-01

    , which approximately take up most of the total simulation time. Although the GPU-based parallel method CU-ENUF (Yang et al., 2016) has achieved a qualitative leap over previous methods in computing electrostatic interactions, its computation capability is limited to the throughput capacity of a single GPU for a super-scale simulation system. Therefore, an effective method is needed to handle the calculation of electrostatic interactions efficiently for a simulation system of super-scale size. Solution method: We constructed a hybrid parallel architecture, in which CPU and GPU are combined to accelerate the electrostatic computation effectively. Firstly, the simulation system is divided into many subtasks via the domain-decomposition method. Then MPI (Message Passing Interface) is used to implement the CPU-parallel computation, with each computer node corresponding to a particular subtask, and furthermore each subtask in one computer node is executed efficiently in parallel on the GPU. In this hybrid parallel method, the most critical technical problem is how to parallelize a CUNFFT (nonequispaced fast Fourier transform based on CUDA) in the parallel strategy, which is solved through an in-depth study of the basic principles and some algorithmic techniques. Restrictions: The HP-ENUF is mainly oriented to super-scale system simulations, in which its performance advantage is shown adequately. However, for a small simulation system containing fewer than 10^6 particles, the multiple-node mode has no apparent efficiency advantage over the single-node mode, or is even less efficient, due to the serious network delay among computer nodes. References: (1) S.-C. Yang, H.-J. Qian, Z.-Y. Lu, Appl. Comput. Harmon. Anal. 2016, http://dx.doi.org/10.1016/j.acha.2016.04.009. (2) S.-C. Yang, Y.-L. Wang, G.-S. Jiao, H.-J. Qian, Z.-Y. Lu, J. Comput. Chem. 37 (2016) 378. (3) S.-C. Yang, Y.-L. Zhu, H.-J. Qian, Z.-Y. Lu, Appl. Chem. Res. Chin. Univ

  15. Parallel point-multiplication architecture using combined group operations for high-speed cryptographic applications.

    Directory of Open Access Journals (Sweden)

    Md Selim Hossain

    In this paper, we propose a novel parallel architecture for fast hardware implementation of elliptic curve point multiplication (ECPM), which is the key operation of an elliptic curve cryptography processor. The point multiplication over binary fields is synthesized on both FPGA and ASIC technology by designing fast elliptic curve group operations in Jacobian projective coordinates. A novel combined point doubling and point addition (PDPA) architecture is proposed for group operations to achieve high speed and low hardware requirements for ECPM. It has been implemented over the binary field which is recommended by the National Institute of Standards and Technology (NIST). The proposed ECPM supports two Koblitz and random curves for the key sizes 233 and 163 bits. For group operations, a finite-field arithmetic operation, e.g. multiplication, is designed on a polynomial basis. The delay of a 233-bit point multiplication is only 3.05 and 3.56 μs, in a Xilinx Virtex-7 FPGA, for Koblitz and random curves, respectively, and 0.81 μs in an ASIC 65-nm technology, which are the fastest hardware implementation results reported in the literature to date. In addition, a 163-bit point multiplication is also implemented in FPGA and ASIC for fair comparison, which takes around 0.33 and 0.46 μs, respectively. The area-time product of the proposed point multiplication is very low compared to similar designs. The performance ([Formula: see text]) and Area × Time × Energy (ATE) product of the proposed design are far better than the most significant studies found in the literature.
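
    To make the role of point doubling and point addition in ECPM concrete, here is a minimal software sketch of double-and-add scalar multiplication. It deliberately swaps in a toy curve over a small prime field in affine coordinates, whereas the paper works over binary fields in Jacobian projective coordinates in hardware; the curve parameters, the base point (0, 10), and the function names are all illustrative assumptions with no cryptographic strength.

```python
P_MOD = 97          # toy prime modulus (hypothetical, far too small for real use)
A, B = 2, 3         # toy curve y^2 = x^3 + A*x + B over GF(P_MOD)
INF = None          # point at infinity, the group identity
G = (0, 10)         # a point on the toy curve: 10^2 = 100 = 3 (mod 97)


def point_add(p, q):
    """Add two affine points; covers identity, inverse, and doubling cases."""
    if p is INF:
        return q
    if q is INF:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF                      # p and q are inverses of each other
    if p == q:                          # tangent slope for point doubling
        slope = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:                               # chord slope for point addition
        slope = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (slope * slope - x1 - x2) % P_MOD
    y3 = (slope * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def point_multiply(k, p):
    """Left-to-right double-and-add: one doubling per bit, one add per set bit."""
    result = INF
    for bit in bin(k)[2:]:
        result = point_add(result, result)     # point doubling
        if bit == "1":
            result = point_add(result, p)      # point addition
    return result
```

    Every key bit triggers a doubling and, for set bits, an addition, so the two group operations sit on the critical path of the whole multiplication. The modular inverse via `pow(x, -1, m)` needs Python 3.8 or later.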

  16. Benefits of a parallel hybrid electric architecture on medium commercial vehicles

    Energy Technology Data Exchange (ETDEWEB)

    Boot, Marco Aimo; Consano, Ludovico [Iveco S.p.A, Turin (Italy)

    2009-07-01

    Hybrid electric technology is becoming an increasingly interesting solution for medium and heavy trucks involved in urban and suburban missions. The increasing demand for gas and oil, consequent price rises and environmental concerns are driving a market that is in need of alternative solutions. For these reasons, the growth in the global hybrid market significantly exceeded all the hybrid sales forecasts. The parallel hybrid electric vehicle (PHEV) employs an additional power source (electric motor-generator) in combination with the conventional diesel engine. This architecture exploits the benefits of both power sources in order to reduce the fuel consumption, increase the overall power, and above all, decrease CO2 emissions. Moreover, the emissions reduction target is led by EU regulations and local initiatives for traffic limitations, but the real drivers for the growth in the market are demonstrable fuel economy improvements and productivity cost optimization (global efficiency). This paper presents the results achieved by Iveco in the development and testing of parallel hybrid systems applied to medium range commercial vehicles, with the intent of evaluating functionality and driveability performance and achieving the best reduction in fuel consumption and emissions in different real-world missions. The system architecture foresees one electric motor/generator and a single clutch unit. An external electrical power source for recharging the battery is not necessary. The chosen configuration makes it possible to implement the following functional modes: Stop and Start with Electric Launch, Hybrid Mode, Regenerative Braking Mode, Inertial Start and Creeping Mode. The software contained in the supervisor control unit has been tuned to the customer-specific missions, taking into account on-road data acquisition, in order to demonstrate the reliability, driveability and overall efficiency of the hybrid system. The field tests carried out in collaboration with

  17. The parallel processing system for fast 3D-CT image reconstruction by circular shifting float memory architecture

    International Nuclear Information System (INIS)

    Wang Shi; Kang Kejun; Wang Jingjin

    1996-01-01

    Computerized Tomography (CT) is expected to become an inevitable diagnostic technique in the future. However, the long time required to reconstruct an image has been one of the major drawbacks associated with this technique. Parallel processing is one of the best ways to solve this problem. This paper gives the architecture and the hardware and software design of PIRS-4 (4-processor Parallel Image Reconstruction System), a parallel processing system for fast 3D-CT image reconstruction based on a circular shifting float memory architecture. It covers the structure and components of the system, the design of the crossbar switch and details of the control model, the description of RPBP image reconstruction, the choice of OS (operating system) and language, the principle of imitating EMS, direct memory read/write of floats, and programming in protected mode. Finally, the test results are given.

  18. BLAST in Grid (BiG): A Grid-Enabled Software Architecture and Implementation of Parallel and Sequential BLAST

    International Nuclear Information System (INIS)

    Aparicio, G.; Blanquer, I.; Hernandez, V.; Segrelles, D.

    2007-01-01

    The integration of high-performance computing tools is a key issue in biomedical research. Many computer-based applications, such as BLAST, have been migrated to high-performance computers to deal with their computing and storage needs. However, the use of clusters and computing farms presents problems in scalability. The use of a higher layer of parallelism that splits the task into highly independent long jobs that can be executed in parallel can improve the performance while maintaining the efficiency. Grid technologies combined with parallel computing resources are an important enabling technology. This work presents a software architecture for executing BLAST in an international Grid infrastructure that guarantees security, scalability and fault tolerance. The software architecture is modular and adaptable to many other high-throughput applications, both inside the field of biocomputing and outside. (Author)

  19. Thread-level parallelization and optimization of NWChem for the Intel MIC architecture

    Energy Technology Data Exchange (ETDEWEB)

    Shan, Hongzhang [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Williams, Samuel [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); de Jong, Wibe [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Oliker, Leonid [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

    2015-01-01

    In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism coupled with a reduced memory capacity demand an altogether different approach. In this paper we explore augmenting two NWChem modules, triples correction of the CCSD(T) and Fock matrix construction, with OpenMP in order that they might run efficiently on future manycore architectures. As the next NERSC machine will be a self-hosted Intel MIC (Xeon Phi) based supercomputer, we leverage an existing MIC testbed at NERSC to evaluate our experiments. In order to proxy the fact that future MIC machines will not have a host processor, we run all of our experiments in native mode. We found that while straightforward application of OpenMP to the deep loop nests associated with the tensor contractions of CCSD(T) was sufficient in attaining high performance, significant effort was required to safely and efficiently thread the TEXAS integral package when constructing the Fock matrix. Ultimately, our new MPI+OpenMP hybrid implementations attain up to 65× better performance for the triples part of the CCSD(T) due in large part to the fact that the limited on-card memory limits the existing MPI implementation to a single process per card. Additionally, we obtain up to 1.6× better performance on Fock matrix constructions when compared with the best MPI implementations running multiple processes per card.

  20. Thread-Level Parallelization and Optimization of NWChem for the Intel MIC Architecture

    Energy Technology Data Exchange (ETDEWEB)

    Shan, Hongzhang; Williams, Samuel; Jong, Wibe de; Oliker, Leonid

    2014-10-10

    In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism coupled with a reduced memory capacity demand an altogether different approach. In this paper we explore augmenting two NWChem modules, triples correction of the CCSD(T) and Fock matrix construction, with OpenMP in order that they might run efficiently on future manycore architectures. As the next NERSC machine will be a self-hosted Intel MIC (Xeon Phi) based supercomputer, we leverage an existing MIC testbed at NERSC to evaluate our experiments. In order to proxy the fact that future MIC machines will not have a host processor, we run all of our experiments in native mode. We found that while straightforward application of OpenMP to the deep loop nests associated with the tensor contractions of CCSD(T) was sufficient in attaining high performance, significant effort was required to safely and efficiently thread the TEXAS integral package when constructing the Fock matrix. Ultimately, our new MPI+OpenMP hybrid implementations attain up to 65x better performance for the triples part of the CCSD(T) due in large part to the fact that the limited on-card memory limits the existing MPI implementation to a single process per card. Additionally, we obtain up to 1.6x better performance on Fock matrix constructions when compared with the best MPI implementations running multiple processes per card.

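  1. Nouvelle architecture électromagnétique à réluctance variable excitée pour accumulateur électromécanique d'énergie [New excited variable-reluctance electromagnetic architecture for an electromechanical energy storage system]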

    OpenAIRE

    Gergaud, Olivier; Ben Ahmed, Hamid; Multon, Bernard; Bernard, Nicolas

    2001-01-01

    International audience; This article deals with an original motor-generator intended for electromechanical energy storage. It is a synchronous machine with fixed homopolar excitation, an armature winding located in the air gap, and a disc-type architecture for better integration with the flywheel. It must not disturb the magnetic suspensions of the device; to that end, we carried out a detailed study of the magnetic forces, in particular the parasitic ones, generated in this structure. A method ...

  2. Molecular architecture of human prion protein amyloid: a parallel, in-register beta-structure.

    Science.gov (United States)

    Cobb, Nathan J; Sönnichsen, Frank D; McHaourab, Hassane; Surewicz, Witold K

    2007-11-27

    Transmissible spongiform encephalopathies (TSEs) represent a group of fatal neurodegenerative diseases that are associated with conformational conversion of the normally monomeric and alpha-helical prion protein, PrP(C), to the beta-sheet-rich PrP(Sc). This latter conformer is believed to constitute the main component of the infectious TSE agent. In contrast to high-resolution data for the PrP(C) monomer, structures of the pathogenic PrP(Sc) or synthetic PrP(Sc)-like aggregates remain elusive. Here we have used site-directed spin labeling and EPR spectroscopy to probe the molecular architecture of the recombinant PrP amyloid, a misfolded form recently reported to induce transmissible disease in mice overexpressing an N-terminally truncated form of PrP(C). Our data show that, in contrast to earlier, largely theoretical models, the conformational conversion of PrP(C) involves major refolding of the C-terminal alpha-helical region. The core of the amyloid maps to C-terminal residues from approximately 160-220, and these residues form single-molecule layers that stack on top of one another with parallel, in-register alignment of beta-strands. This structural insight has important implications for understanding the molecular basis of prion propagation, as well as hereditary prion diseases, most of which are associated with point mutations in the region found to undergo a refolding to beta-structure.

  3. FPGAs and parallel architectures for aerospace applications soft errors and fault-tolerant design

    CERN Document Server

    Rech, Paolo

    2016-01-01

    This book introduces the concepts of soft errors in FPGAs, as well as the motivation for using commercial, off-the-shelf (COTS) FPGAs in mission-critical and remote applications, such as aerospace. The authors describe the effects of radiation in FPGAs, present a large set of soft-error mitigation techniques that can be applied in these circuits, as well as methods for qualifying these circuits under radiation. Coverage includes radiation effects in FPGAs, fault-tolerant techniques for FPGAs, use of COTS FPGAs in aerospace applications, experimental data of FPGAs under radiation, FPGA embedded processors under radiation, and fault injection in FPGAs. Since dedicated parallel processing architectures such as GPUs have become more desirable in aerospace applications due to high computational power, GPU analysis under radiation is also discussed. · Discusses features and drawbacks of reconfigurability methods for FPGAs, focused on aerospace applications; · Explains how radia...

  4. Simulating Hydrologic Flow and Reactive Transport with PFLOTRAN and PETSc on Emerging Fine-Grained Parallel Computer Architectures

    Science.gov (United States)

    Mills, R. T.; Rupp, K.; Smith, B. F.; Brown, J.; Knepley, M.; Zhang, H.; Adams, M.; Hammond, G. E.

    2017-12-01

    As the high-performance computing community pushes towards the exascale horizon, power and heat considerations have driven the increasing importance and prevalence of fine-grained parallelism in new computer architectures. High-performance computing centers have become increasingly reliant on GPGPU accelerators and "manycore" processors such as the Intel Xeon Phi line, and 512-bit SIMD registers have even been introduced in the latest generation of Intel's mainstream Xeon server processors. The high degree of fine-grained parallelism and more complicated memory hierarchy considerations of such "manycore" processors present several challenges to existing scientific software. Here, we consider how the massively parallel, open-source hydrologic flow and reactive transport code PFLOTRAN - and the underlying Portable, Extensible Toolkit for Scientific Computation (PETSc) library on which it is built - can best take advantage of such architectures. We will discuss some key features of these novel architectures and our code optimizations and algorithmic developments targeted at them, and present experiences drawn from working with a wide range of PFLOTRAN benchmark problems on these architectures.

  5. Development of imaging and reconstructions algorithms on parallel processing architectures for applications in non-destructive testing

    International Nuclear Information System (INIS)

    Pedron, Antoine

    2013-01-01

    This thesis work is placed between the scientific domain of ultrasound non-destructive testing and algorithm-architecture adequation. Ultrasound non-destructive testing includes a group of analysis techniques used in science and industry to evaluate the properties of a material, component, or system without causing damage. In order to characterise possible defects, determining their position, size and shape, imaging and reconstruction tools have been developed at CEA-LIST, within the CIVA software platform. The evolution of acquisition sensors implies a continuous growth of datasets, and consequently more and more computing power is needed to maintain interactive reconstructions. General purpose processors (GPP) evolving towards parallelism and emerging architectures such as GPU offer large acceleration possibilities that can be applied to these algorithms. The main goal of the thesis is to evaluate the acceleration that can be obtained for two reconstruction algorithms on these architectures. These two algorithms differ in their parallelization scheme. The first one can be properly parallelized on GPP, whereas on GPU an intensive use of atomic instructions is required. Within the second algorithm, parallelism is easier to express, but loop ordering on GPP, as well as thread scheduling and a good use of shared memory on GPU, are necessary in order to obtain efficient results. Different APIs or libraries, such as OpenMP, CUDA and OpenCL, are evaluated through chosen benchmarks. An integration of both algorithms in the CIVA software platform is proposed and different issues related to code maintenance and durability are discussed. (author)

  6. Exploring Hardware-Based Primitives to Enhance Parallel Security Monitoring in a Novel Computing Architecture

    National Research Council Canada - National Science Library

    Mott, Stephen

    2007-01-01

    .... In doing this, we propose a novel computing architecture, derived from a contemporary shared memory architecture, that facilitates efficient security-related monitoring in real-time, while keeping...

  7. A Real-Time Early Cognitive Vision System based on a Hybrid coarse and fine grained Parallel Architecture

    DEFF Research Database (Denmark)

    Jensen, Lars Baunegaard With

    . The current top model GPUs from NVIDIA possess up to 240 homogeneous cores. In the past, GPUs have been hard to program, forcing the programmer to map the algorithm to the graphics processing pipeline and think in terms of vertex and fragment shaders, imposing a limiting factor in the implementation of non......-graphics applications. This, however, has changed with the introduction of the Compute Unified Device Architecture (CUDA) framework from NVIDIA. The EV and ECV stages have different parallel properties. The regular, pixel-based processing of EV fits the GPU architecture very well, and parts of ECV, on the other hand...

  8. Parallel computing for homogeneous diffusion and transport equations in neutronics; Calcul parallele pour les equations de diffusion et de transport homogenes en neutronique

    Energy Technology Data Exchange (ETDEWEB)

    Pinchedez, K

    1999-06-01

    Parallel computing meets the ever-increasing requirements for neutronic computer code speed and accuracy. In this work, two different approaches have been considered. We first parallelized the sequential algorithm used by the neutronics code CRONOS developed at the French Atomic Energy Commission. The algorithm computes the dominant eigenvalue associated with PN simplified transport equations by a mixed finite element method. Several parallel algorithms have been developed on distributed memory machines. The performance of the parallel algorithms has been studied experimentally by implementation on a Cray T3D and theoretically by complexity models. A comparison of various parallel algorithms has confirmed the chosen implementations. We next applied a domain sub-division technique to the two-group diffusion eigenproblem. In the modal synthesis-based method, the global spectrum is determined from the partial spectra associated with sub-domains. Then the eigenproblem is expanded on a family composed, on the one hand, of eigenfunctions associated with the sub-domains and, on the other hand, of functions corresponding to the contribution from the interface between the sub-domains. For a 2-D homogeneous core, this modal method has been validated and its accuracy has been measured. (author)
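
    As a rough illustration of what "computing the dominant eigenvalue" involves, the sketch below applies plain power iteration to a small dense matrix. It is not the CRONOS algorithm and does not model the mixed finite element discretization; the function name `power_iteration`, the tolerance and the starting vector are assumptions chosen for illustration only.

```python
def power_iteration(matrix, tol=1e-12, max_iter=10_000):
    """Estimate the dominant eigenvalue of a square matrix given as a list of rows."""
    n = len(matrix)

    def matvec(v):
        return [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]

    vec = [1.0] * n  # hypothetical starting guess
    eigenvalue = 0.0
    for _ in range(max_iter):
        prod = matvec(vec)
        # Normalise by the largest-magnitude component to avoid overflow.
        norm = max(abs(x) for x in prod)
        if norm == 0.0:
            raise ValueError("iteration collapsed to the zero vector")
        new_vec = [x / norm for x in prod]
        # Rayleigh quotient gives the current eigenvalue estimate.
        a_new = matvec(new_vec)
        new_eigenvalue = (sum(x * y for x, y in zip(new_vec, a_new))
                          / sum(x * x for x in new_vec))
        if abs(new_eigenvalue - eigenvalue) < tol:
            return new_eigenvalue, new_vec
        eigenvalue, vec = new_eigenvalue, new_vec
    return eigenvalue, vec


if __name__ == "__main__":
    value, _ = power_iteration([[2.0, 1.0], [1.0, 3.0]])
    print(value)
```

    The matrix-vector product dominates the cost of each iteration, which is why it is the natural part to distribute across processors in a parallel version.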

  9. The DANTE Boltzmann transport solver: An unstructured mesh, 3-D, spherical harmonics algorithm compatible with parallel computer architectures

    International Nuclear Information System (INIS)

    McGhee, J.M.; Roberts, R.M.; Morel, J.E.

    1997-01-01

    A spherical harmonics research code (DANTE) has been developed which is compatible with parallel computer architectures. DANTE provides 3-D, multi-material, deterministic, transport capabilities using an arbitrary finite element mesh. The linearized Boltzmann transport equation is solved in a second order self-adjoint form utilizing a Galerkin finite element spatial differencing scheme. The core solver utilizes a preconditioned conjugate gradient algorithm. Other distinguishing features of the code include options for discrete-ordinates and simplified spherical harmonics angular differencing, an exact Marshak boundary treatment for arbitrarily oriented boundary faces, in-line matrix construction techniques to minimize memory consumption, and an effective diffusion based preconditioner for scattering dominated problems. Algorithm efficiency is demonstrated for a massively parallel SIMD architecture (CM-5), and compatibility with MPP multiprocessor platforms or workstation clusters is anticipated.
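
    The general shape of a preconditioned conjugate gradient solver can be sketched as follows. This is a generic textbook form, not the DANTE solver: it swaps the diffusion-based preconditioner described above for a simple Jacobi (diagonal) preconditioner, and the name `preconditioned_cg` and the dense list-of-rows matrix format are assumptions made for illustration.

```python
def preconditioned_cg(a, b, tol=1e-10, max_iter=1000):
    """Solve a x = b for a symmetric positive definite matrix `a` (list of rows)."""
    n = len(b)

    def matvec(v):
        return [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    # Jacobi preconditioner: the inverse of the diagonal (an assumption here,
    # standing in for a problem-specific preconditioner).
    inv_diag = [1.0 / a[i][i] for i in range(n)]

    x = [0.0] * n
    r = list(b)  # residual b - a x for the initial guess x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        ap = matvec(p)
        alpha = rz / dot(p, ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x


if __name__ == "__main__":
    print(preconditioned_cg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```

    Each iteration needs one matrix-vector product and a few dot products, so a parallel version mostly has to distribute the product and reduce the dot products across processors.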

  10. Comments on “Techniques and Architectures for Hazard-Free Semi-Parallel Decoding of LDPC Codes”

    Directory of Open Access Journals (Sweden)

    Mark B. Yeary

    2009-01-01

    This is a comment article on the publication “Techniques and Architectures for Hazard-Free Semi-Parallel Decoding of LDPC Codes” by Rovini et al. (2009). We note that similar work had been reported in the literature before, and that this previous work, for example Gunnam et al. (2006, 2007), was not cited correctly. This brief note serves to clarify these issues.

  11. Spatial data analytics on heterogeneous multi- and many-core parallel architectures using python

    Science.gov (United States)

    Laura, Jason R.; Rey, Sergio J.

    2017-01-01

    Parallel vector spatial analysis concerns the application of parallel computational methods to facilitate vector-based spatial analysis. The history of parallel computation in spatial analysis is reviewed, and this work is placed into the broader context of high-performance computing (HPC) and parallelization research. The rise of cyber infrastructure and its manifestation in spatial analysis as CyberGIScience is seen as a main driver of renewed interest in parallel computation in the spatial sciences. Key problems in spatial analysis that have been the focus of parallel computing are covered. Chief among these are spatial optimization problems, computational geometric problems including polygonization and spatial contiguity detection, the use of Monte Carlo Markov chain simulation in spatial statistics, and parallel implementations of spatial econometric methods. Future directions for research on parallelization in computational spatial analysis are outlined.

  12. INVESTIGATION OF FLIP-FLOP PERFORMANCE ON DIFFERENT TYPE AND ARCHITECTURE IN SHIFT REGISTER WITH PARALLEL LOAD APPLICATIONS

    Directory of Open Access Journals (Sweden)

    Dwi Purnomo

    2015-08-01

    The register is one of the computer components that play a key role in computer organisation. Every computer contains millions of registers, which are implemented with flip-flops. This research investigates flip-flop performance based on type (D, T, S-R, and J-K) and architecture (structural, behavioural, and hybrid). Each type of flip-flop on each architecture is tested in shift registers of different bit widths with parallel load applications. The criteria assessed are power consumption, resources required, memory required, latency, and efficiency. The experiments show that the D flip-flop and the hybrid architecture gave the best performance in required memory, latency, power consumption, and efficiency. In addition, the results show that the greater the number of registers, the less efficient the system is.
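
    For readers unfamiliar with the circuit under test, the following is a minimal behavioural sketch of a shift register with parallel load, written as a software model rather than in a hardware description language. It is not taken from the paper; the class name `ShiftRegister`, the `clock` method and the bit ordering are assumptions made for illustration.

```python
class ShiftRegister:
    """Behavioural model of an n-bit shift register with parallel load."""

    def __init__(self, width):
        self.width = width
        self.bits = [0] * width  # bits[0] is the serial-input end

    def clock(self, load=False, parallel_in=None, serial_in=0):
        """Advance one clock edge: load all bits at once, or shift by one."""
        if load:
            if parallel_in is None or len(parallel_in) != self.width:
                raise ValueError("parallel_in must supply exactly `width` bits")
            self.bits = [int(bool(b)) for b in parallel_in]
        else:
            # Each stage takes its neighbour's value; the last bit drops out.
            self.bits = [int(bool(serial_in))] + self.bits[:-1]
        return self.bits[-1]  # serial output after the edge


if __name__ == "__main__":
    reg = ShiftRegister(4)
    reg.clock(load=True, parallel_in=[1, 0, 1, 1])
    reg.clock()
    print(reg.bits)
```

    Each element of `bits` stands in for one flip-flop, so widening the register in this model corresponds to adding flip-flops in hardware.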

  13. Truly nested data-parallelism: compiling SaC for the Microgrid architecture

    NARCIS (Netherlands)

    Herhut, S.; Joslin, C.; Scholz, S.-B.; Grelck, C.; Morazan, M.

    2009-01-01

    Data-parallel programming facilitates elegant specification of concurrency. However, the composability of data-parallel operations so far has been constrained by the requirement to have only flat data-parallel operations at runtime. In this paper, we present early results on our work to exploit

  14. A Parallel Algorithm for Connected Component Labelling of Gray-scale Images on Homogeneous Multicore Architectures

    International Nuclear Information System (INIS)

    Niknam, Mehdi; Thulasiraman, Parimala; Camorlinga, Sergio

    2010-01-01

    Connected component labelling is an essential step in image processing. We provide a parallel version of Suzuki's sequential connected component algorithm in order to speed up the labelling process. Also, we modify the algorithm to enable labelling gray-scale images. Due to the data dependencies in the algorithm we used a method similar to pipeline to exploit parallelism. The parallel algorithm method achieved a speedup of 2.5 for image size of 256 x 256 pixels using 4 processing threads.
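
    The reported figures can be read as a parallel efficiency, defined as speedup divided by the number of workers. The short sketch below works this out for the numbers quoted above; the helper name `parallel_efficiency` is an assumption made for illustration.

```python
def parallel_efficiency(speedup, workers):
    """Parallel efficiency: achieved speedup divided by the number of workers."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers


if __name__ == "__main__":
    # Speedup of 2.5 on 4 threads, as reported in the abstract.
    print(f"{parallel_efficiency(2.5, 4):.3f}")  # prints 0.625
```

    An efficiency of 0.625 means each of the four threads contributes about 62.5% of what a perfectly scaling thread would, which is plausible for a pipelined scheme limited by data dependencies.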

  15. Migrating to a real-time distributed parallel simulator architecture- An update

    CSIR Research Space (South Africa)

    Duvenhage, B

    2007-09-01

    A legacy non-distributed logical time simulator was previously migrated to a distributed architecture to parallelise execution. The existing Discrete Time System Specification (DTSS) modelling formalism was retained to simplify the reuse of existing...

  16. High-Performance Control of Paralleled Three-Phase Inverters for Residential Microgrid Architectures Based on Online Uninterruptable Power Systems

    DEFF Research Database (Denmark)

    Zhang, Chi; Guerrero, Josep M.; Vasquez, Juan Carlos

    2015-01-01

    In this paper, a control strategy for the parallel operation of three-phase inverters forming an online uninterruptible power system (UPS) is presented. The UPS system consists of a cluster of paralleled inverters with LC filters directly connected to an AC critical bus and an AC/DC forming a DC...... bus. The proposed control scheme is performed on two layers: (i) a local layer that contains a “reactive power vs phase” in order to synchronize the phase angle of each inverter and a virtual resistance loop that guarantees equal power sharing among inverters; (ii) a central controller that guarantees...... synchronization with an external real/fictitious utility, and critical bus voltage restoration. Constant transient and steady-state frequency, active, reactive and harmonic power sharing, and global phase-locked loop resynchronization capability are achieved. Detailed system topology and control architecture...

  17. A Parallel Implementation of a Smoothed Particle Hydrodynamics Method on Graphics Hardware Using the Compute Unified Device Architecture

    International Nuclear Information System (INIS)

    Wong Unhong; Wong Honcheng; Tang Zesheng

    2010-01-01

    Smoothed particle hydrodynamics (SPH), a class of meshfree particle methods (MPMs), has a wide range of applications from micro-scale to macro-scale as well as from discrete systems to continuum systems. Graphics hardware, originally designed for computer graphics, now provides unprecedented computational power for scientific computation. Particle systems need a huge amount of computation in physical simulation. In this paper, an efficient parallel implementation of an SPH method on graphics hardware using the Compute Unified Device Architecture is developed for fluid simulation. Compared to the corresponding CPU implementation, our experimental results show that the new approach allows significant speedups of fluid simulation by handling a huge amount of computation in parallel on graphics hardware.

  18. Parallel-hierarchical processing and classification of laser beam profile images based on the GPU-oriented architecture

    Science.gov (United States)

    Yarovyi, Andrii A.; Timchenko, Leonid I.; Kozhemiako, Volodymyr P.; Kokriatskaia, Nataliya I.; Hamdi, Rami R.; Savchuk, Tamara O.; Kulyk, Oleksandr O.; Surtel, Wojciech; Amirgaliyev, Yedilkhan; Kashaganova, Gulzhan

    2017-08-01

    The paper deals with the insufficient productivity of existing computer means for large image processing, which do not meet modern requirements posed by the resource-intensive computing tasks of laser beam profiling. The research concentrated on one of the profiling problems, namely, real-time processing of spot images of the laser beam profile. Development of a theory of parallel-hierarchic transformation made it possible to produce models for high-performance parallel-hierarchical processes, as well as algorithms and software for their implementation based on the GPU-oriented architecture using GPGPU technologies. The analyzed performance of the suggested computerized tools for processing and classification of laser beam profile images allows real-time processing of dynamic images of various sizes.

  19. Une approche de coloriage d’arrêtes pour la conception d’architectures parallèles d’entrelaceurs matériels

    OpenAIRE

    Awais Hussein, Sani

    2012-01-01

    Nowadays, Turbo and LDPC codes are two families of codes that are extensively used in current communication standards due to their excellent error correction capabilities. However, hardware design of coders and decoders for high data rate applications is not a straightforward process. For high data rates, decoders are implemented on parallel architectures in which more than one processing elements decode the received data. To achieve high memory bandwidth, the main memory is divided into smal...

  20. An Approach Using Parallel Architecture to Storage DICOM Images in Distributed File System

    International Nuclear Information System (INIS)

    Soares, Tiago S; Prado, Thiago C; Dantas, M A R; De Macedo, Douglas D J; Bauer, Michael A

    2012-01-01

    Telemedicine is an important area of medicine that is expanding daily, driven by many researchers interested in improving medical applications. In Brazil, work began in 2005 in the State of Santa Catarina on a server called CyclopsDCMServer, whose purpose is to adopt HDF for the manipulation of medical images (DICOM) using a distributed file system. Since then, much research has been initiated in search of better performance. Our approach for this server adds a parallel implementation of I/O operations, since HDF version 5 has a feature essential to our work: support for parallel I/O based on the MPI paradigm. Early experiments using four parallel nodes show good performance compared to the serial HDF implementation in CyclopsDCMServer.

  1. A Novel Algorithm for Solving the Multidimensional Neutron Transport Equation on Massively Parallel Architectures

    Energy Technology Data Exchange (ETDEWEB)

    Azmy, Yousry

    2014-06-10

    We employ the Integral Transport Matrix Method (ITMM) as the kernel of new parallel solution methods for the discrete ordinates approximation of the within-group neutron transport equation. The ITMM abandons the repetitive mesh sweeps of the traditional source iterations (SI) scheme in favor of constructing stored operators that account for the direct coupling factors among all the cells' fluxes and between the cells' and boundary surfaces' fluxes. The main goals of this work are to develop the algorithms that construct these operators and employ them in the solution process, determine the most suitable way to parallelize the entire procedure, and evaluate the behavior and parallel performance of the developed methods with increasing number of processes, P. The fastest observed parallel solution method, Parallel Gauss-Seidel (PGS), was used in a weak scaling comparison with the PARTISN transport code, which uses the source iteration (SI) scheme parallelized with the Koch-Baker-Alcouffe (KBA) method. Compared to the state-of-the-art SI-KBA with diffusion synthetic acceleration (DSA), this new method, even without acceleration/preconditioning, is competitive for optically thick problems as P is increased to the tens of thousands range. For the most optically thick cells tested, PGS reduced execution time by an approximate factor of three for problems with more than 130 million computational cells on P = 32,768. Moreover, the trend of the SI-DSA execution times generally rises more steeply with increasing P than the PGS trend. Furthermore, the PGS method outperforms SI for the periodic heterogeneous layers (PHL) configuration problems. The PGS method outperforms SI and SI-DSA on as few as P = 16 for PHL problems and reduces execution time by a factor of ten or more for all problems considered with more than 2 million computational cells on P = 4,096.

  2. A 0.13-µm implementation of 5 Gb/s and 3-mW folded parallel architecture for AES algorithm

    Science.gov (United States)

    Rahimunnisa, K.; Karthigaikumar, P.; Kirubavathy, J.; Jayakumar, J.; Kumar, S. Suresh

    2014-02-01

    A new architecture for encrypting and decrypting confidential data using the Advanced Encryption Standard algorithm is presented in this article. This structure combines the folded structure with a parallel architecture to increase the throughput. The whole architecture achieved high throughput with less power. The proposed architecture is implemented in 0.13-µm complementary metal-oxide-semiconductor (CMOS) technology. The proposed structure is compared with different existing structures, and the results show that it gives higher throughput and lower power than existing works.

  3. Resource optimised reconfigurable modular parallel pipelined stochastic approximation-based self-tuning regulator architecture with reduced latency

    Directory of Open Access Journals (Sweden)

    Varghese Mathew Vaidyan

    2015-09-01

    Present self-tuning regulator architectures based on recursive least-square estimation are computationally expensive and require large amounts of resources and time to generate the first control signal. The bottlenecks come from the calculations in the estimation stage, the successive stages of matrix multiplication, and the number of intermediate variables at each iteration. This precludes their use in applications that require fast response times and in those that run on embedded computing platforms with low-power or low-cost requirements and constraints on resource usage. This study proposes a new modular parallel pipelined stochastic approximation-based self-tuning regulator architecture that reduces the time required to generate the first control signal, the resource usage, and the number of intermediate variables. Fast matrix multiplication, pipelining and high-speed arithmetic function implementations were used to improve performance. Implementation results demonstrate that the proposed architecture improves control signal generation time by 38% and reduces resource usage by 41% in terms of multipliers and 44.4% in terms of adders compared with the best existing related work, opening up new possibilities for the application of online embedded self-tuning regulators.

  4. Architecture

    OpenAIRE

    Clear, Nic

    2014-01-01

    When discussing science fiction’s relationship with architecture, the usual practice is to look at the architecture “in” science fiction—in particular, the architecture in SF films (see Kuhn 75-143) since the spaces of literary SF present obvious difficulties as they have to be imagined. In this essay, that relationship will be reversed: I will instead discuss science fiction “in” architecture, mapping out a number of architectural movements and projects that can be viewed explicitly as scien...

  5. AutoCAD 3D pour l'architecture et le design : conception d'une maison et de son mobilier

    CERN Document Server

    Riccio, Michel

    2010-01-01

    AutoCAD 3D, the 3D module of AutoCAD, the leading computer-aided drafting software, is an indispensable tool for architects who want to present their work in three dimensions. This highly instructional book introduces its main features (versions 2006 to 2010) through a design project for a contemporary house. With 700 plans, diagrams and drawings, it explains which tools to use to model the facades of a villa, its swimming pool and terrace, as well as its interior architecture and furniture. From the very first pages, the reader is immersed in practice, watching a complex three-dimensional building take shape over the course of 17 workshops, with the satisfaction of having created it personally. Who is this book for? Architecture firms that want to present their projects in 3D; all students in schools of architecture; 2D AutoCAD users who want to learn the software's 3D features.

  6. Design of a Simple and Modular 2-DOF Ankle Physiotherapy Device Relying on a Hybrid Serial-Parallel Robotic Architecture

    Directory of Open Access Journals (Sweden)

    Christos E. Syrseloudis

    2011-01-01

    The aim of this work is to propose a new 2-DOF robotic platform with hybrid parallel-serial structure and to undertake its parametric design so that it can follow the whole range of ankle related foot movements. This robot can serve as a human ankle rehabilitation device. The existing ankle rehabilitation devices typically present one or more of the following shortcomings: redundancy, large size, or high cost, hence the need for a device that could offer simplicity, modularity, and low cost of construction and maintenance. In addition, our targeted device must be safe during operation and disallow undesirable movements of the foot, while remaining adaptable to any human foot. Our detailed study of foot kinematics has led us to a new hybrid architecture, which strikes a balance among all aforementioned goals. It consists of a passive serial kinematics chain with two adjustable screws so that the axes of the chain match the two main ankle-axes of typical feet. An active parallel chain, which consists of two prismatic actuators, provides the movement of the platform. Thus, the platform can follow the foot movements, thanks to the passive chain, and also possesses the advantages of parallel robots, including rigidity, high stiffness and force capabilities. The lack of redundancy yields a simpler device with lower size and cost. The paper describes the kinematics modelling of the platform and analyses the force and velocity transmission. The parametric design of the platform is carried out; our simulations confirm the platform's suitability for ankle rehabilitation.

  7. Numeric algorithms for parallel processors computer architectures with applications to the few-groups neutron diffusion equations

    International Nuclear Information System (INIS)

    Zee, S.K.

    1987-01-01

    A numeric algorithm and an associated computer code were developed for the rapid solution of the finite-difference method representation of the few-group neutron-diffusion equations on parallel computers. Applications of the numeric algorithm on both SIMD (vector pipeline) and MIMD/SIMD (multi-CPU/vector pipeline) architectures were explored. The algorithm was successfully implemented in the two-group, 3-D neutron diffusion computer code named DIFPAR3D (DIFfusion PARallel 3-Dimension). Numerical-solution techniques used in the code include the Chebyshev polynomial acceleration technique in conjunction with the power method of outer iteration. For inner iterations, a parallel form of red-black (cyclic) line SOR with automated determination of group dependent relaxation factors and iteration numbers required to achieve specified inner iteration error tolerance is incorporated. The code employs a macroscopic depletion model with trace capability for selected fission products' transients and critical boron. In addition to this, moderator and fuel temperature feedback models are also incorporated into the DIFPAR3D code, for realistic simulation of power reactor cores. The physics models used were proven acceptable in separate benchmarking studies.
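
    The red-black idea behind the inner iterations can be sketched on a much simpler problem. The example below is not DIFPAR3D: it uses point SOR rather than line SOR, a fixed relaxation factor rather than an automatically determined one, and Laplace's equation on a small 2-D grid rather than the few-group diffusion equations. The name `red_black_sor` and all parameter values are assumptions made for illustration.

```python
def red_black_sor(grid, omega=1.5, tol=1e-10, max_sweeps=10_000):
    """Solve Laplace's equation in place on a 2-D grid with fixed boundary values.

    Interior points are split into two colours by the parity of (i + j).
    A point of one colour depends only on points of the other colour, so
    every point within a half-sweep could be updated in parallel.
    Returns the number of sweeps performed.
    """
    rows, cols = len(grid), len(grid[0])
    for sweep in range(1, max_sweeps + 1):
        max_change = 0.0
        for colour in (0, 1):  # 0 = "red", 1 = "black"
            for i in range(1, rows - 1):
                for j in range(1, cols - 1):
                    if (i + j) % 2 != colour:
                        continue
                    # Gauss-Seidel value from the four neighbours.
                    target = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                     + grid[i][j - 1] + grid[i][j + 1])
                    # Over-relax the correction by the factor omega.
                    change = omega * (target - grid[i][j])
                    grid[i][j] += change
                    max_change = max(max_change, abs(change))
        if max_change < tol:
            return sweep
    return max_sweeps


if __name__ == "__main__":
    # Boundary held at 1.0, interior started at 0.0.
    demo = [[1.0] * 6 for _ in range(6)]
    for i in range(1, 5):
        for j in range(1, 5):
            demo[i][j] = 0.0
    print(red_black_sor(demo))
```

    In this serial sketch the two colours are simply visited one after the other; on a vector or multiprocessor machine, all points of one colour would be updated together before moving to the other colour.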

  8. Solution of the within-group multidimensional discrete ordinates transport equations on massively parallel architectures

    Science.gov (United States)

    Zerr, Robert Joseph

    2011-12-01

    The integral transport matrix method (ITMM) has been used as the kernel of new parallel solution methods for the discrete ordinates approximation of the within-group neutron transport equation. The ITMM abandons the repetitive mesh sweeps of the traditional source iterations (SI) scheme in favor of constructing stored operators that account for the direct coupling factors among all the cells and between the cells and boundary surfaces. The main goals of this work were to develop the algorithms that construct these operators and employ them in the solution process, determine the most suitable way to parallelize the entire procedure, and evaluate the behavior and performance of the developed methods for increasing number of processes. This project compares the effectiveness of the ITMM with the SI scheme parallelized with the Koch-Baker-Alcouffe (KBA) method. The primary parallel solution method involves a decomposition of the domain into smaller spatial sub-domains, each with their own transport matrices, and coupled together via interface boundary angular fluxes. Each sub-domain has its own set of ITMM operators and represents an independent transport problem. Multiple iterative parallel solution methods have been investigated, including parallel block Jacobi (PBJ), parallel red/black Gauss-Seidel (PGS), and parallel GMRES (PGMRES). The fastest observed parallel solution method, PGS, was used in a weak scaling comparison with the PARTISN code. Compared to the state-of-the-art SI-KBA with diffusion synthetic acceleration (DSA), this new method without acceleration/preconditioning is not competitive for any problem parameters considered. The best comparisons occur for problems that are difficult for SI DSA, namely highly scattering and optically thick. SI DSA execution time curves are generally steeper than the PGS ones.
However, until further testing is performed it cannot be concluded that SI DSA does not outperform the ITMM with PGS even on several thousand or tens of

  9. Real-time hypothesis driven feature extraction on parallel processing architectures

    DEFF Research Database (Denmark)

    Granmo, O.-C.; Jensen, Finn Verner

    2002-01-01

    the problem of higher-order feature-content/feature-feature correlation, causally complexly interacting features are identified through Bayesian network d-separation analysis and combined into joint features. When used on a moderately complex object-tracking case, the technique is able to select...... extraction, which selectively extract relevant features one-by-one, have in some cases achieved real-time performance on single processing element architectures. In this paper we propose a novel technique which combines the above two approaches. Features are selectively extracted in parallelizable sets...

  10. Analysis of IDR(s) Family of Solvers for Reservoir Simulations on Different Parallel Architectures

    Directory of Open Access Journals (Sweden)

    Seignole Vincent

    2016-09-01

    The present contribution provides a detailed analysis of several realizations of the IDR(s) family of solvers from different angles: robustness, performance and implementation on different parallel environments, with respect to a sequential IDR(s) resolution implementation tested on several industrial, geologically and structurally coherent 3D-field case reservoir models. This work is the result of continuous efforts towards time-response improvement of Storengy’s reservoir three-dimensional simulator named Multi, dedicated to gas-storage applications.

  11. The genetic architecture of parallel armor plate reduction in threespine sticklebacks.

    Directory of Open Access Journals (Sweden)

    Pamela F Colosimo

    2004-05-01

    How many genetic changes control the evolution of new traits in natural populations? Are the same genetic changes seen in cases of parallel evolution? Despite long-standing interest in these questions, they have been difficult to address, particularly in vertebrates. We have analyzed the genetic basis of natural variation in three different aspects of the skeletal armor of threespine sticklebacks (Gasterosteus aculeatus): the pattern, number, and size of the bony lateral plates. A few chromosomal regions can account for variation in all three aspects of the lateral plates, with one major locus contributing to most of the variation in lateral plate pattern and number. Genetic mapping and allelic complementation experiments show that the same major locus is responsible for the parallel evolution of armor plate reduction in two widely separated populations. These results suggest that a small number of genetic changes can produce major skeletal alterations in natural populations and that the same major locus is used repeatedly when similar traits evolve in different locations.

  12. From variability tolerance to approximate computing in parallel integrated architectures and accelerators

    CERN Document Server

    Rahimi, Abbas; Gupta, Rajesh K

    2017-01-01

    This book focuses on computing devices and their design at various levels to combat variability. The authors provide a review of key concepts with particular emphasis on timing errors caused by various variability sources. They discuss methods to predict and prevent, detect and correct, and finally conditions under which such errors can be accepted; they also consider their implications on cost, performance and quality. Coverage includes a comparative evaluation of methods for deployment across various layers of the system from circuits, architecture, to application software. These can be combined in various ways to achieve specific goals related to observability and controllability of the variability effects, providing means to achieve cross layer or hybrid resilience. · Covers challenges and opportunities in identifying microelectronic variability and the resulting errors at various layers in the system abstraction; · Enables readers to assess how various levels of circuit and system design can mitigate t...

  13. An action selection architecture for autonomous virtual humans in persistent worlds; Une architecture de sélection de l'action pour des humains virtuels autonomes dans des mondes persistants

    OpenAIRE

    De Sevin, Etienne

    2006-01-01

    Nowadays, virtual humans such as non-player characters in computer games need to have a strong autonomy in order to live their own life in persistent virtual worlds. When designing autonomous virtual humans, the action selection problem needs to be considered, as it is responsible for decision making at each moment in time. Indeed action selection architectures for autonomous virtual humans need to be reactive, proactive, motivational, and emotional to obtain a high degree of autonomy and ind...

  14. Performance and scalability analysis of teraflop-scale parallel architectures using multidimensional wavefront applications

    International Nuclear Information System (INIS)

    Hoisie, A.; Lubeck, O.; Wasserman, H.

    1998-01-01

    The authors develop a model for the parallel performance of algorithms that consist of concurrent, two-dimensional wavefronts implemented in a message passing environment. The model, based on a LogGP machine parameterization, combines the separate contributions of computation and communication wavefronts. They validate the model on three important supercomputer systems, on up to 500 processors. They use data from a deterministic particle transport application taken from the ASCI workload, although the model is general to any wavefront algorithm implemented on a 2-D processor domain. They also use the validated model to make estimates of performance and scalability of wavefront algorithms on 100-TFLOPS computer systems expected to be in existence within the next decade as part of the ASCI program and elsewhere. In this context, they analyze two problem sizes. The model shows that on the largest such problem (1 billion cells), inter-processor communication performance is not the bottleneck. Single-node efficiency is the dominant factor
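    The way computation and communication combine in a pipelined wavefront can be illustrated with a very small estimate. The sketch below is not the authors' LogGP-based model; it is a simplified pipeline calculation under stated assumptions: a px-by-py processor grid, a wavefront that starts in one corner, identical work blocks, and a fixed per-step message cost. All function and parameter names are hypothetical.

```python
def wavefront_sweep_time(px, py, blocks_per_proc, t_block, t_msg):
    """Estimate the time of one pipelined wavefront sweep (simplified sketch).

    px, py           -- processor grid dimensions (assumed 2-D decomposition)
    blocks_per_proc  -- number of work blocks each processor handles per sweep
    t_block          -- time to compute one block (assumed uniform)
    t_msg            -- communication time charged to each pipeline step
    """
    # Steps that pass before the processor in the far corner receives its
    # first block: the wavefront must cross (px - 1) + (py - 1) processors.
    fill_steps = (px - 1) + (py - 1)
    # The far-corner processor then works through all of its own blocks.
    total_steps = fill_steps + blocks_per_proc
    return total_steps * (t_block + t_msg)


def pipeline_efficiency(px, py, blocks_per_proc, t_block, t_msg):
    """Fraction of the sweep time that one processor spends on useful computation."""
    useful = blocks_per_proc * t_block
    return useful / wavefront_sweep_time(px, py, blocks_per_proc, t_block, t_msg)


if __name__ == "__main__":
    # 4 x 4 processors, 10 blocks each, no communication cost
    print(wavefront_sweep_time(4, 4, 10, 1.0, 0.0))
    print(pipeline_efficiency(4, 4, 10, 1.0, 0.0))
```

    In this simplified picture, the pipeline-fill term grows with the processor grid while the useful work per processor grows with the number of blocks, so larger problems per processor tend to hide the fill cost.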

  15. Neptune: An astrophysical smooth particle hydrodynamics code for massively parallel computer architectures

    Science.gov (United States)

    Sandalski, Stou

    Smooth particle hydrodynamics is an efficient method for modeling the dynamics of fluids. It is commonly used to simulate astrophysical processes such as binary mergers. We present a newly developed GPU accelerated smooth particle hydrodynamics code for astrophysical simulations. The code is named neptune after the Roman god of water. It is written in OpenMP parallelized C++ and OpenCL and includes octree based hydrodynamic and gravitational acceleration. The design relies on object-oriented methodologies in order to provide a flexible and modular framework that can be easily extended and modified by the user. Several pre-built scenarios for simulating collisions of polytropes and black-hole accretion are provided. The code is released under the MIT Open Source license and publicly available at http://code.google.com/p/neptune-sph/.

  16. Implementation of a cell-wise block-Gauss-Seidel iterative method for SN transport on a hybrid parallel computer architecture

    International Nuclear Information System (INIS)

    Rosa, Massimiliano; Warsa, James S.; Perks, Michael

    2011-01-01

    We have implemented a cell-wise, block-Gauss-Seidel (bGS) iterative algorithm, for the solution of the S_n transport equations on the Roadrunner hybrid, parallel computer architecture. A compute node of this massively parallel machine comprises AMD Opteron cores that are linked to a Cell Broadband Engine™ (Cell/B.E.). LAPACK routines have been ported to the Cell/B.E. in order to make use of its parallel Synergistic Processing Elements (SPEs). The bGS algorithm is based on the LU factorization and solution of a linear system that couples the fluxes for all S_n angles and energy groups on a mesh cell. For every cell of a mesh that has been parallel decomposed on the higher-level Opteron processors, a linear system is transferred to the Cell/B.E. and the parallel LAPACK routines are used to compute a solution, which is then transferred back to the Opteron, where the rest of the computations for the S_n transport problem take place. Compared to standard parallel machines, a hundred-fold speedup of the bGS was observed on the hybrid Roadrunner architecture. Numerical experiments with strong and weak parallel scaling demonstrate the bGS method is viable and compares favorably to full parallel sweeps (FPS) on two-dimensional, unstructured meshes when it is applied to optically thick, multi-material problems. As expected, however, it is not as efficient as FPS in optically thin problems. (author)

  17. Evolution of Parallel Spindles Like genes in plants and highlight of unique domain architecture

    Directory of Open Access Journals (Sweden)

    Consiglio Federica M

    2011-03-01

    Background: Polyploidy has long been recognized as playing an important role in plant evolution. In flowering plants, the major route of polyploidization is suggested to be sexual, through gametes with the somatic chromosome number (2n). The Parallel Spindle1 gene in Arabidopsis thaliana (AtPS1) was recently demonstrated to control spindle orientation in the second division of meiosis and, when mutated, to induce 2n pollen. Interestingly, AtPS1 encodes a protein with an FHA domain and a PINc domain putatively involved in RNA decay (i.e., Nonsense-Mediated mRNA Decay). In potato, 2n pollen depending on parallel spindles was described long ago, but the responsible gene has never been isolated. The knowledge derived from AtPS1, together with the availability of genome sequences, makes it possible to isolate potato PSLike (PSL) genes and to highlight the evolution of the PSL family in plants. Results: Our work, leading to the first characterization of PSLs in potato, showed greater PSL complexity in this species than in Arabidopsis thaliana. Indeed, a genomic PSL locus and seven cDNAs affected by alternative splicing have been cloned. In addition, the occurrence of at least two other PSL loci in potato was suggested by the sequence comparison of alternatively spliced transcripts. Phylogenetic analysis on 20 Viridaeplantae showed the wide distribution of PSLs throughout the species and the occurrence of multiple copies only in potato and soybean. The analysis of the PSLFHA and PSLPINc domains showed that, in terms of secondary structure, a greater degree of variability occurred in the PINc domain than in the FHA domain. In terms of specific active sites, both domains showed diversification among plant species that could be related to a functional diversification among PSL genes. In addition, some specific active sites were strongly conserved among plants, as supported by sequence alignment and by evidence of negative selection evaluated as difference between non-synonymous and

  18. Parallel Algorithms for Monte Carlo Particle Transport Simulation on Exascale Computing Architectures

    Science.gov (United States)

    Romano, Paul Kollath

    Monte Carlo particle transport methods are being considered as a viable option for high-fidelity simulation of nuclear reactors. While Monte Carlo methods offer several potential advantages over deterministic methods, there are a number of algorithmic shortcomings that would prevent their immediate adoption for full-core analyses. In this thesis, algorithms are proposed both to ameliorate the degradation in parallel efficiency typically observed for large numbers of processors and to offer a means of decomposing large tally data that will be needed for reactor analysis. A nearest-neighbor fission bank algorithm was proposed and subsequently implemented in the OpenMC Monte Carlo code. A theoretical analysis of the communication pattern shows that the expected cost is O(√N) whereas traditional fission bank algorithms are O(N) at best. The algorithm was tested on two supercomputers, the Intrepid Blue Gene/P and the Titan Cray XK7, and demonstrated nearly linear parallel scaling up to 163,840 processor cores on a full-core benchmark problem. An algorithm for reducing network communication arising from tally reduction was analyzed and implemented in OpenMC. The proposed algorithm groups only particle histories on a single processor into batches for tally purposes; in doing so, it prevents all network communication for tallies until the very end of the simulation. The algorithm was tested, again on a full-core benchmark, and shown to reduce network communication substantially. A model was developed to predict the impact of load imbalances on the performance of domain decomposed simulations. The analysis demonstrated that load imbalances in domain decomposed simulations arise from two distinct phenomena: non-uniform particle densities and non-uniform spatial leakage. The dominant performance penalty for domain decomposition was shown to come from these physical effects rather than insufficient network bandwidth or high latency. The model predictions were verified with
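    The idea behind a nearest-neighbor fission bank exchange can be sketched without any message-passing library. The code below is not taken from OpenMC; it is a serial illustration under stated assumptions: each rank holds a known number of fission sites, the sites are numbered globally in rank order, and every rank should end up owning an equal contiguous slice of that numbering. The function name `redistribution_plan` is hypothetical. Because per-rank counts usually differ only slightly from the average, the resulting transfers tend to involve adjacent ranks only.

```python
from itertools import accumulate


def redistribution_plan(counts):
    """Return, for each rank, the (destination_rank, n_sites) transfers that
    rebalance the fission bank so every rank owns an equal contiguous slice.

    counts -- number of fission sites currently held by each rank
    """
    p = len(counts)
    total = sum(counts)
    # Target ownership: rank r should hold global site indices
    # [bounds[r], bounds[r + 1]).
    bounds = [r * total // p for r in range(p + 1)]
    # Current ownership: rank r holds [starts[r], starts[r + 1]),
    # obtained from a prefix sum over the counts.
    starts = [0] + list(accumulate(counts))
    plan = []
    for src in range(p):
        lo, hi = starts[src], starts[src + 1]
        sends = []
        # A simple O(p) scan per rank; fine for a sketch.
        for dst in range(p):
            overlap = min(hi, bounds[dst + 1]) - max(lo, bounds[dst])
            if overlap > 0 and dst != src:
                sends.append((dst, overlap))
        plan.append(sends)
    return plan


if __name__ == "__main__":
    # Four ranks with slightly unbalanced banks (40 sites in total).
    print(redistribution_plan([12, 8, 9, 11]))
```

    In the example, rank 0 passes its two surplus sites to rank 1 and rank 3 passes one site to rank 2, so no site travels farther than one rank.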

  19. A proposed scalable parallel open architecture data acquisition system for low to high rate experiments, test beams and all SSC detectors

    International Nuclear Information System (INIS)

    Barsotti, E.; Booth, A.; Bowden, M.; Swoboda, C.; Lockyer, N.; Vanberg, R.

    1990-01-01

    A new era of high-energy physics research is beginning requiring accelerators with much higher luminosities and interaction rates in order to discover new elementary particles. As a consequence, both orders of magnitude higher data rates from the detector and online processing power, well beyond the capabilities of current high energy physics data acquisition systems, are required. This paper describes a proposed new data acquisition system architecture which draws heavily from the communications industry, is totally parallel (i.e., without any bottlenecks), is capable of data rates of hundreds of Gigabytes per second from the detector and into an array of online processors (i.e., processor farm), and uses an open systems architecture to guarantee compatibility with future commercially available online processor farms. The main features of the proposed Scalable Parallel Open Architecture data acquisition system are standard interface ICs to detector subsystems wherever possible, fiber optic digital data transmission from the near-detector electronics, a self-routing parallel event builder, and the use of industry-supported and high-level language programmable processors in the proposed BCD system for both triggers and online filters. A brief status report of an ongoing project at Fermilab to build a prototype of the proposed data acquisition system architecture is given in the paper. The major component of the system, a self-routing parallel event builder, is described in detail

  20. A task-based parallelism and vectorized approach to 3D Method of Characteristics (MOC) reactor simulation for high performance computing architectures

    Science.gov (United States)

    Tramm, John R.; Gunow, Geoffrey; He, Tim; Smith, Kord S.; Forget, Benoit; Siegel, Andrew R.

    2016-05-01

    In this study we present and analyze a formulation of the 3D Method of Characteristics (MOC) technique applied to the simulation of full core nuclear reactors. Key features of the algorithm include a task-based parallelism model that allows independent MOC tracks to be assigned to threads dynamically, ensuring load balancing, and a wide vectorizable inner loop that takes advantage of modern SIMD computer architectures. The algorithm is implemented in a set of highly optimized proxy applications in order to investigate its performance characteristics on CPU, GPU, and Intel Xeon Phi architectures. Speed, power, and hardware cost efficiencies are compared. Additionally, performance bottlenecks are identified for each architecture in order to determine the prospects for continued scalability of the algorithm on next generation HPC architectures.

  1. Parallel VLSI Architecture

    Science.gov (United States)

    Truong, T. K.; Reed, I.; Yeh, C.; Shao, H.

    1985-01-01

    A Fermat number transform (FNT) is used to convolve two digital data sequences. In very-large-scale integration (VLSI) applications such as image and radar signal processing, X-ray reconstruction, and spectrum shaping, the linear convolution of two digital data sequences of arbitrary lengths is accomplished using the FNT.
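    A small software sketch may help show how a Fermat number transform yields a linear convolution. It is not the VLSI design described in the record; it is a naive O(N^2) illustration under stated assumptions: the modulus is the Fermat prime 2^16 + 1 = 65537, the root is 2 (which has order 32 modulo 65537, since 2^16 ≡ -1), the inputs are non-negative integers, and every output value is smaller than the modulus. The function names are hypothetical, and `pow(x, -1, m)` requires Python 3.8 or later.

```python
FERMAT_MOD = 2**16 + 1  # F_4 = 65537, a Fermat prime
ROOT = 2                # 2**16 = -1 (mod F_4), so 2 has order 32
LENGTH = 32             # transform length equals the order of ROOT


def fnt(seq, root=ROOT, mod=FERMAT_MOD):
    """Naive number-theoretic transform of seq modulo a Fermat number."""
    n = len(seq)
    return [
        sum(x * pow(root, j * k, mod) for j, x in enumerate(seq)) % mod
        for k in range(n)
    ]


def inverse_fnt(spec, root=ROOT, mod=FERMAT_MOD):
    """Inverse transform: use the inverse root and scale by 1/N (mod F_4)."""
    inv_root = pow(root, -1, mod)
    inv_n = pow(len(spec), -1, mod)
    return [(inv_n * y) % mod for y in fnt(spec, inv_root, mod)]


def linear_convolution(a, b):
    """Linear convolution via zero-padding and a length-32 cyclic FNT."""
    out_len = len(a) + len(b) - 1
    if out_len > LENGTH:
        raise ValueError("sequences too long for a length-32 transform")
    pa = list(a) + [0] * (LENGTH - len(a))
    pb = list(b) + [0] * (LENGTH - len(b))
    # Pointwise product in the transform domain equals cyclic convolution.
    product = [(x * y) % FERMAT_MOD for x, y in zip(fnt(pa), fnt(pb))]
    return inverse_fnt(product)[:out_len]


if __name__ == "__main__":
    print(linear_convolution([1, 2, 3], [4, 5, 6]))  # [4, 13, 28, 27, 18]
```

    Choosing 2 as the root means every twiddle factor is a power of two, so in hardware the multiplications inside the transform reduce to bit shifts; all arithmetic is exact integer arithmetic, with no rounding error.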

  2. A scalable parallel open architecture data acquisition system for low to high rate experiments, test beams and all SSC [Superconducting Super Collider] detectors

    International Nuclear Information System (INIS)

    Barsotti, E.; Booth, A.; Bowden, M.; Swoboda, C.; Lockyer, N.; VanBerg, R.

    1989-12-01

    A new era of high-energy physics research is beginning requiring accelerators with much higher luminosities and interaction rates in order to discover new elementary particles. As a consequence, both orders of magnitude higher data rates from the detector and online processing power, well beyond the capabilities of current high energy physics data acquisition systems, are required. This paper describes a new data acquisition system architecture which draws heavily from the communications industry, is totally parallel (i.e., without any bottlenecks), is capable of data rates of hundreds of GigaBytes per second from the detector and into an array of online processors (i.e., processor farm), and uses an open systems architecture to guarantee compatibility with future commercially available online processor farms. The main features of the system architecture are standard interface ICs to detector subsystems wherever possible, fiber optic digital data transmission from the near-detector electronics, a self-routing parallel event builder, and the use of industry-supported and high-level language programmable processors in the proposed BCD system for both triggers and online filters. A brief status report of an ongoing project at Fermilab to build the self-routing parallel event builder will also be given in the paper. 3 figs., 1 tab

  3. Highly Parallel Computing Architectures by using Arrays of Quantum-dot Cellular Automata (QCA): Opportunities, Challenges, and Recent Results

    Science.gov (United States)

    Fijany, Amir; Toomarian, Benny N.

    2000-01-01

    -based architectures for highly parallel and systolic computation of signal/image processing applications, such as FFT and Wavelet and Walsh-Hadamard Transforms.

  4. A tool for promoting 20th-century architecture: local publishing and its products; Un outil pour la mise en valeur de l’architecture du XXe siècle : l’édition de proximité et ses produits

    Directory of Open Access Journals (Sweden)

    Gérard Monnier

    2011-06-01

    The architecture guide has become a reference work whose scholarly content is beyond dispute. But it is reaching its limits: now a heavy book of 300 to 400 pages, sometimes split into several volumes, it is a guide in name only. Other forms of publication are possible, which we group under the generic term "products of local publishing" (produits de l’édition de proximité, PEP); in the form of the data sheet, the notice, the leaflet, or the mini-brochure, PEPs have established themselves since the 1990s. Monographic content, a combination of image and text, concise writing, unity of place, practical information, free distribution: the PEP is a lightweight tool, but one capable of high performance. It is designed with a view to communication (which brings it close to the printed press), to intervention (which makes it a tool for a local event), and to service (which makes it a set of instructions for use: opening hours and visit itinerary); it generally does not belong to the book-industry circuit and escapes (wrongly, in most cases) the principle of legal deposit. Its distribution departs from the usual distribution structures; the PEP is distributed exclusively locally: handed out on site and made freely available to the public, outside commercial channels.

  5. An overview of the activities of the OECD/NEA Task Force on adapting computer codes in nuclear applications to parallel architectures

    Energy Technology Data Exchange (ETDEWEB)

    Kirk, B.L. [Oak Ridge National Lab., TN (United States); Sartori, E. [OCDE/OECD NEA Data Bank, Issy-les-Moulineaux (France); Viedma, L.G. de [Consejo de Seguridad Nuclear, Madrid (Spain)

    1997-06-01

    Subsequent to the introduction of High Performance Computing in the developed countries, the Organization for Economic Cooperation and Development/Nuclear Energy Agency (OECD/NEA) created the Task Force on Adapting Computer Codes in Nuclear Applications to Parallel Architectures (under the guidance of the Nuclear Science Committee's Working Party on Advanced Computing) to study the growth area in supercomputing and its applicability to the nuclear community's computer codes. The result has been four years of investigation for the Task Force in different subject fields - deterministic and Monte Carlo radiation transport, computational mechanics and fluid dynamics, nuclear safety, atmospheric models and waste management.

  6. An overview of the activities of the OECD/NEA Task Force on adapting computer codes in nuclear applications to parallel architectures

    International Nuclear Information System (INIS)

    Kirk, B.L.; Sartori, E.; Viedma, L.G. de

    1997-01-01

    Subsequent to the introduction of High Performance Computing in the developed countries, the Organization for Economic Cooperation and Development/Nuclear Energy Agency (OECD/NEA) created the Task Force on Adapting Computer Codes in Nuclear Applications to Parallel Architectures (under the guidance of the Nuclear Science Committee's Working Party on Advanced Computing) to study the growth area in supercomputing and its applicability to the nuclear community's computer codes. The result has been four years of investigation for the Task Force in different subject fields - deterministic and Monte Carlo radiation transport, computational mechanics and fluid dynamics, nuclear safety, atmospheric models and waste management

  7. Parallelization of applications for networks with homogeneous and heterogeneous processors; Parallelisation d'applications pour des reseaux de processeurs homogenes ou heterogenes

    Energy Technology Data Exchange (ETDEWEB)

    Colombet, L

    1994-10-07

    The aim of this thesis is to study and develop efficient methods for parallelization of scientific applications on parallel computers with distributed memory. The first part presents two libraries of PVM (Parallel Virtual Machine) and MPI (Message Passing Interface) communication tools. They allow implementation of programs on most parallel machines, but also on heterogeneous computer networks. This chapter illustrates the problems faced when trying to evaluate performances of networks with heterogeneous processors. To evaluate such performances, the concepts of speed-up and efficiency have been modified and adapted to account for heterogeneity. The second part deals with a study of parallel application libraries such as ScaLAPACK and with the development of communication masking techniques. The general concept is based on communication anticipation, in particular by pipelining message sending operations. Experimental results on Cray T3D and IBM SP1 machines validate the theoretical studies performed on basic algorithms of the libraries discussed above. Two examples of scientific applications are given: the first is a model of young stars for astrophysics and the other is a model of photon trajectories in the Compton effect. (J.S.). 83 refs., 65 figs., 24 tabs.

  8. Parallel rendering

    Science.gov (United States)

    Crockett, Thomas W.

    1995-01-01

    This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.

  9. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques Held 24-26 August 1994 in Montreal, Canada

    Science.gov (United States)

    1994-08-26

    International Symposium on Computer Architecture, April 1994. [16] D. Nagle, R. Uhlig, T. Stanley, S. Sechrest, T. Mudge, and Richard Brown, "Design... F. Catthoor, G. Goossens, et al.: Open-ended System for High-Level Synthesis of Flexible Signal Processors, Proc. European Conference on Design

  10. Hybrid MPI/OpenMP parallelization of the explicit Volterra integral equation solver for multi-core computer architectures

    KAUST Repository

    Al Jarro, Ahmed

    2011-08-01

    A hybrid MPI/OpenMP scheme for efficiently parallelizing the explicit marching-on-in-time (MOT)-based solution of the time-domain volume (Volterra) integral equation (TD-VIE) is presented. The proposed scheme equally distributes tested field values and operations pertinent to the computation of tested fields among the nodes using the MPI standard, while the source field values are stored in all nodes. Within each node, the OpenMP standard is used to further accelerate the computation of the tested fields. Numerical results demonstrate that the proposed parallelization scheme scales well for problems involving three million or more spatial discretization elements. © 2011 IEEE.

  11. Energy efficiency for the multiport power converters architectures of series and parallel hybrid power source type used in plug-in/V2G fuel cell vehicles

    International Nuclear Information System (INIS)

    Bizon, Nicu

    2013-01-01

    Highlights: ► The series and parallel Hybrid Power Source (HPS) topologies for the plug-in Fuel Cell Vehicle (PFCV) are analyzed. ► An energy efficiency analysis of the Multiport Power Converter (MPC) of both HPSs is performed. ► The MPC energy efficiency features are shown by analytical calculation in all PFCV regimes. -- Abstract: This paper presents a mathematical analysis of the energy efficiency of the Multiport Power Converter (MPC) used in series and parallel Hybrid Power Source (HPS) architectures for plug-in Fuel Cell Vehicles (PFCVs). The aim of the analysis is to provide general conclusions for a wide range of PFCV operating regimes that are chosen for efficient use of the MPC architecture on each particular drive cycle. In relation to the FC system of the PFCV, the Energy Storage System (ESS) can operate in the following regimes: (1) Charge-Sustaining (CS), (2) Charge-Depleting (CD), and (3) Charge-Increasing (CI). Considering the imposed window for the ESS State-Of-Charge (SOC), the MPC can be connected to renewable plug-in Charging Stations (PCSs) to exchange power with the Electric Power (EP) system when this is necessary for both. The Energy Management Unit (EMU) that communicates with the EP system will establish the moments at which to match the PFCV power demand with the supply availability of the EP grid, stabilizing it. The MPC energy efficiency of the PFCVs is studied when the ESS is charged from, or discharged to, the home/PCS/EP system. Comparative results are shown for both PFCV architectures through the analytical calculations performed and the Matlab/Simulink® simulations presented.

  12. Hybrid MPI/OpenMP parallelization of the explicit Volterra integral equation solver for multi-core computer architectures

    KAUST Repository

    Al Jarro, Ahmed; Bagci, Hakan

    2011-01-01

    A hybrid MPI/OpenMP scheme for efficiently parallelizing the explicit marching-on-in-time (MOT)-based solution of the time-domain volume (Volterra) integral equation (TD-VIE) is presented. The proposed scheme equally distributes tested field values

  13. NDL-v2.0: A new version of the numerical differentiation library for parallel architectures

    Science.gov (United States)

    Hadjidoukas, P. E.; Angelikopoulos, P.; Voglis, C.; Papageorgiou, D. G.; Lagaris, I. E.

    2014-07-01

    We present a new version of the numerical differentiation library (NDL) used for the numerical estimation of first and second order partial derivatives of a function by finite differencing. In this version we have restructured the serial implementation of the code so as to achieve optimal task-based parallelization. The pure shared-memory parallelization of the library has been based on the lightweight OpenMP tasking model allowing for the full extraction of the available parallelism and efficient scheduling of multiple concurrent library calls. On multicore clusters, parallelism is exploited by means of TORC, an MPI-based multi-threaded tasking library. The new MPI implementation of NDL provides optimal performance in terms of function calls and, furthermore, supports asynchronous execution of multiple library calls within legacy MPI programs. In addition, a Python interface has been implemented for all cases, exporting the functionality of our library to sequential Python codes. Catalog identifier: AEDG_v2_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEDG_v2_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 63036 No. of bytes in distributed program, including test data, etc.: 801872 Distribution format: tar.gz Programming language: ANSI Fortran-77, ANSI C, Python. Computer: Distributed systems (clusters), shared memory systems. Operating system: Linux, Unix. Has the code been vectorized or parallelized?: Yes. RAM: The library uses O(N) internal storage, N being the dimension of the problem. It can use up to O(N^2) internal storage for Hessian calculations, if a task throttling factor has not been set by the user. Classification: 4.9, 4.14, 6.5. Catalog identifier of previous version: AEDG_v1_0 Journal reference of previous version: Comput. Phys. Comm. 180
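    To make the finite-differencing idea concrete, here is a minimal serial sketch of first and second order partial derivatives by central differences. It does not use the NDL interface, whose function names and signatures are not given in this record; the names `gradient` and `hessian`, the step sizes, and the choice of central differences are assumptions made for illustration. Each call to `f` is independent of the others, which is the property that makes a task-based parallel evaluation possible.

```python
def gradient(f, x, h=1e-5):
    """Central-difference estimate of the gradient of f at point x (a list)."""
    grad = []
    for i in range(len(x)):
        forward, backward = list(x), list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2.0 * h))
    return grad


def hessian(f, x, h=1e-4):
    """Central-difference estimate of the Hessian of f at point x (a list)."""
    n = len(x)

    def shifted(i, si, j, sj):
        # Evaluate f at x displaced by si*h along axis i and sj*h along axis j.
        y = list(x)
        y[i] += si * h
        y[j] += sj * h
        return f(y)

    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            value = (
                shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
                - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)
            ) / (4.0 * h * h)
            hess[i][j] = hess[j][i] = value  # the Hessian is symmetric
    return hess


if __name__ == "__main__":
    f = lambda v: v[0] ** 2 * v[1] + 3.0 * v[1]
    print(gradient(f, [1.0, 2.0]))  # analytic gradient: [4, 4]
    print(hessian(f, [1.0, 2.0]))   # analytic Hessian: [[4, 2], [2, 0]]
```

    The Hessian loop performs on the order of N^2 function evaluations for an N-dimensional problem, which is consistent with the O(N^2) storage figure quoted for Hessian calculations when all evaluations are issued as tasks at once.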

  14. Optimizing transformations of stencil operations for parallel object-oriented scientific frameworks on cache-based architectures

    Energy Technology Data Exchange (ETDEWEB)

    Bassetti, F.; Davis, K.; Quinlan, D.

    1998-12-31

    High-performance scientific computing relies increasingly on high-level large-scale object-oriented software frameworks to manage both algorithmic complexity and the complexities of parallelism: distributed data management, process management, inter-process communication, and load balancing. This encapsulation of data management, together with the prescribed semantics of a typical fundamental component of such object-oriented frameworks--a parallel or serial array-class library--provides an opportunity for increasingly sophisticated compile-time optimization techniques. This paper describes two optimizing transformations suitable for certain classes of numerical algorithms, one for reducing the cost of inter-processor communication, and one for improving cache utilization; demonstrates and analyzes the resulting performance gains; and indicates how these transformations are being automated.

  15. Temporal locality optimizations for stencil operations for parallel object-oriented scientific frameworks on cache-based architectures

    Energy Technology Data Exchange (ETDEWEB)

    Bassetti, F.; Davis, K.; Quinlan, D.

    1998-12-01

    High-performance scientific computing relies increasingly on high-level large-scale object-oriented software frameworks to manage both algorithmic complexity and the complexities of parallelism: distributed data management, process management, inter-process communication, and load balancing. This encapsulation of data management, together with the prescribed semantics of a typical fundamental component of such object-oriented frameworks--a parallel or serial array-class library--provides an opportunity for increasingly sophisticated compile-time optimization techniques. This paper describes a technique for introducing cache blocking suitable for certain classes of numerical algorithms, demonstrates and analyzes the resulting performance gains, and indicates how this optimization transformation is being automated.
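    Cache blocking for a stencil operation can be illustrated with a tiled loop nest. The sketch below is not the transformation developed in the paper, which targets array-class libraries and compile-time automation; it is the simplest spatial tiling of a 5-point averaging sweep, written in plain Python under stated assumptions (a rectangular grid stored as a list of lists, fixed boundary values, a hypothetical `tile` parameter). Python itself will not show the cache benefit; the point is only that the tiled loop order visits the same points and produces the same result as the plain sweep.

```python
def stencil_sweep(grid):
    """One 5-point averaging sweep over the interior, row by row."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # boundary values are copied unchanged
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = 0.25 * (
                grid[i - 1][j] + grid[i + 1][j] + grid[i][j - 1] + grid[i][j + 1]
            )
    return out


def stencil_sweep_blocked(grid, tile=4):
    """The same sweep, visiting the interior one tile-by-tile block at a time."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i0 in range(1, rows - 1, tile):          # loop over tile origins
        for j0 in range(1, cols - 1, tile):
            # Work on one small block so its data can stay in cache
            # (in a compiled language) while it is being reused.
            for i in range(i0, min(i0 + tile, rows - 1)):
                for j in range(j0, min(j0 + tile, cols - 1)):
                    out[i][j] = 0.25 * (
                        grid[i - 1][j] + grid[i + 1][j]
                        + grid[i][j - 1] + grid[i][j + 1]
                    )
    return out


if __name__ == "__main__":
    grid = [[float(i * j) for j in range(10)] for i in range(9)]
    print(stencil_sweep(grid) == stencil_sweep_blocked(grid, tile=3))
```

    Because each output point reads only the input grid, the two loop orders are interchangeable; a blocking transformation relies on exactly this kind of guarantee about the semantics of the array operation.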

  16. Experimental study of heat transfer for parallel flow in tube bundles with constant heat flux and for medium Prandtl numbers; Etude experimentale du transfert de chaleur dans des faisceaux tubulaires en ecoulement parallele pour une densite de flux thermique constante dans le domaine des nombres de Prandtl moyens

    Energy Technology Data Exchange (ETDEWEB)

    Rieger, M [Commissariat a l'Energie Atomique, 91 - Saclay (France). Centre d'Etudes Nucleaires]

    1968-06-01

    The heat transfer parameters were determined experimentally in electrically heated tube bundles for turbulent flow parallel to the axis. The tubes were arranged in a pattern of equilateral triangles. The ratios of the distance between the tube axes to their external diameter were 1.60 and 1.25 in the two test sections studied. The experiments were carried out with distilled water and with a mixture of 60 per cent ethylene glycol and 40 per cent water. The resulting Prandtl numbers fell within the range from 2.3 to 18. The Reynolds numbers were varied between $10^{4}$ and $2 \times 10^{5}$. The relation between the mean heat transfer coefficients and the friction factor in the tube bundles was found from the experiments as: $\mathrm{Nu} = \dfrac{\mathrm{Re}\,\mathrm{Pr}\,\zeta/8}{1 + \sqrt{\zeta/8}\;8.8\,(\mathrm{Pr} - 1.3)\,\mathrm{Pr}^{-0.22}}$. The experimentally determined mean Nusselt numbers were also given by the following function: $\mathrm{Nu} = (0.0122 + 0.00245\,p/d)\,\mathrm{Re}^{0.86}\,\mathrm{Pr}^{0.4}$, with a maximum deviation of $\pm 4$ per cent. For certain local Nusselt numbers, deviations of up to 20 per cent with respect to the relations given were observed. (author)
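    The two correlations quoted in the abstract can be evaluated directly. The sketch below is only a transcription of those formulas; the function and argument names are hypothetical, `zeta` is taken to be the friction factor, and `pitch_to_diameter` is assumed to be the ratio p/d of the distance between tube axes to the external tube diameter (1.60 or 1.25 in the test sections). The range check reflects the Prandtl and Reynolds ranges reported for the experiments.

```python
from math import sqrt


def nusselt_from_friction(re, pr, zeta):
    """Mean Nusselt number from the friction-factor relation in the abstract."""
    numerator = re * pr * zeta / 8.0
    denominator = 1.0 + sqrt(zeta / 8.0) * 8.8 * (pr - 1.3) * pr ** -0.22
    return numerator / denominator


def nusselt_power_law(re, pr, pitch_to_diameter):
    """Mean Nusselt number from the power-law fit in the abstract."""
    return (0.0122 + 0.00245 * pitch_to_diameter) * re ** 0.86 * pr ** 0.4


def in_tested_range(re, pr):
    """True if (Re, Pr) lies inside the ranges covered by the experiments."""
    return 1.0e4 <= re <= 2.0e5 and 2.3 <= pr <= 18.0
```

    For example, `nusselt_power_law(5.0e4, 7.0, 1.25)` evaluates the fit for one of the two tested geometries; values outside `in_tested_range` are extrapolations that the abstract does not support.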

  17. Performance and advantages of a soft-core based parallel architecture for energy peak detection in the calorimeter Level 0 trigger for the NA62 experiment at CERN

    International Nuclear Information System (INIS)

    Ammendola, R.; Barbanera, M.; Bizzarri, M.; Bonaiuto, V.; Ceccucci, A.; Simone, N. De; Fantechi, R.; Fucci, A.; Lupi, M.; Ryjov, V.; Checcucci, B.; Papi, A.; Piccini, M.; Federici, L.; Paoluzzi, G.; Salamon, A.; Salina, G.; Sargeni, F.; Venditti, S.

    2017-01-01

    The NA62 experiment at the CERN SPS has started data-taking. Its aim is to measure the branching ratio of the ultra-rare decay K+ → π+νν̅. In this context, background rejection is crucial. One of the main backgrounds to the measurement is the K+ → π+π0 decay. In the 1-8.5 mrad decay region this background is rejected by the calorimetric trigger processor (Cal-L0). In this work we present the performance of a soft-core based parallel architecture built on FPGAs for energy peak reconstruction, as an alternative to an implementation written entirely in VHDL.

  18. Performance and advantages of a soft-core based parallel architecture for energy peak detection in the calorimeter Level 0 trigger for the NA62 experiment at CERN

    Science.gov (United States)

    Ammendola, R.; Barbanera, M.; Bizzarri, M.; Bonaiuto, V.; Ceccucci, A.; Checcucci, B.; De Simone, N.; Fantechi, R.; Federici, L.; Fucci, A.; Lupi, M.; Paoluzzi, G.; Papi, A.; Piccini, M.; Ryjov, V.; Salamon, A.; Salina, G.; Sargeni, F.; Venditti, S.

    2017-03-01

    The NA62 experiment at the CERN SPS has started data-taking. Its aim is to measure the branching ratio of the ultra-rare decay K+ → π+νν̅. In this context, background rejection is crucial. One of the main backgrounds to the measurement is the K+ → π+π0 decay. In the 1-8.5 mrad decay region this background is rejected by the calorimetric trigger processor (Cal-L0). In this work we present the performance of a soft-core based parallel architecture built on FPGAs for energy peak reconstruction, as an alternative to an implementation written entirely in VHDL.

  19. An Energy-Efficient and Scalable Deep Learning/Inference Processor With Tetra-Parallel MIMD Architecture for Big Data Applications.

    Science.gov (United States)

    Park, Seong-Wook; Park, Junyoung; Bong, Kyeongryeol; Shin, Dongjoo; Lee, Jinmook; Choi, Sungpill; Yoo, Hoi-Jun

    2015-12-01

    Deep learning algorithms are widely used for pattern recognition applications such as text recognition, object recognition and action recognition because of their best-in-class recognition accuracy compared to hand-crafted and shallow-learning based algorithms. The long learning time caused by their complex structure, however, has so far limited their use to high-cost servers or many-core GPU platforms. On the other hand, the demand for customized pattern recognition within personal devices will grow gradually as more deep learning applications are developed. This paper presents an SoC implementation that enables deep learning applications to run on low-cost platforms such as mobile or portable devices. Unlike conventional works, which have adopted massively parallel architectures, this work adopts a task-flexible architecture and exploits multiple forms of parallelism to cover the complex functions of the convolutional deep belief network, one of the popular deep learning/inference algorithms. In this paper, we implement the most energy-efficient deep learning and inference processor for wearable systems. The implemented 2.5 mm × 4.0 mm deep learning/inference processor is fabricated using 65 nm 8-metal CMOS technology for a battery-powered platform with real-time deep inference and deep learning operation. It consumes 185 mW average power and 213.1 mW peak power at 200 MHz operating frequency and 1.2 V supply voltage. It achieves 411.3 GOPS peak performance and 1.93 TOPS/W energy efficiency, which is 2.07× higher than the state-of-the-art.
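    The abstract does not say how the energy-efficiency figure was obtained. As a quick consistency check, and under the assumption that it is peak performance divided by peak power, the quoted numbers agree:

```latex
\[
\frac{411.3\ \text{GOPS}}{213.1\ \text{mW}}
  = \frac{411.3\ \text{GOPS}}{0.2131\ \text{W}}
  \approx 1930\ \text{GOPS/W}
  = 1.93\ \text{TOPS/W}
\]
```

    Dividing by the 185 mW average power instead would give roughly 2.22 TOPS/W, which does not match the quoted value, so the peak-power reading seems the more likely one.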

  20. Reconfigurable multi-DSP parallel computing architecture based on DSM; 基于DSM的可重构多DSP并行处理架构

    Institute of Scientific and Technical Information of China (English)

    程鑫; 吴华春

    2012-01-01

    A design for an online-reconfigurable multi-digital signal processor (DSP) parallel computing architecture based on distributed shared memory (DSM) is proposed. A message-passing service based on a user-defined internal bus (IB) implements a uniformly addressed shared memory model on physically distributed memory, which decreases the data transmission overhead between DSPs. An online reconfiguration mechanism based on the VME bus allows the message-passing service to be redefined, which increases the generality of the parallel architecture. Experiments show that adopting this DSM reduces the overhead of synchronized access to shared data by the parallel DSPs, satisfying the requirements of an ultra-precise multi-axis synchronous motion control system.

  1. Contribution to the algorithmic and efficient programming of new parallel architectures including accelerators for neutron physics and shielding computations

    International Nuclear Information System (INIS)

    Dubois, J.

    2011-01-01

    In science, simulation is a key process for research or validation. Modern computer technology allows faster numerical experiments, which are cheaper than real models. In the field of neutron simulation, the calculation of eigenvalues is one of the key challenges. The complexity of these problems is such that a lot of computing power may be necessary. The work of this thesis is first the evaluation of new computing hardware, such as graphics cards or massively multi-core chips, and their application to eigenvalue problems for neutron simulation. Then, in order to address the massive parallelism of national supercomputers, we also study the use of asynchronous hybrid methods for solving eigenvalue problems with this very high level of parallelism. We then test this work on several national supercomputers, such as the Titane hybrid machine of the Computing Center, Research and Technology (CCRT), the Curie machine of the Very Large Computing Centre (TGCC), currently being installed, and the Hopper machine at the Lawrence Berkeley National Laboratory (LBNL). We also run our experiments on local workstations to illustrate the value of this research in everyday use with local computing resources. (author)

  2. Design and implementation of an integrated architecture for massive parallel data treatment of analogue signals supplied by silicon detectors of very high spatial resolution

    International Nuclear Information System (INIS)

    Michel, J.

    1993-02-01

    This doctoral thesis studies an integrated architecture designed for massively parallel processing of analogue signals supplied by silicon detectors of very high spatial resolution. The first chapter is an introduction presenting the general outline and the triggering conditions of the spectrometer. Chapter two describes the operational structure of a microvertex detector made of Si micro-plates associated with the measuring chains. Information preconditioning is related to the pre-amplification stage, to the pile-up effects and to the reduction in the time characteristic due to the high counting rates. Chapter three describes the architecture of the analogue delay buffer, analyses the intrinsic noise and presents the operational tests and input/output control operations. The fourth chapter is devoted to the description of the analogue pulse shape processor and also gives the tests and the corresponding measurements on the circuit. Finally, chapter five deals with the simplest modeling of the entire conditioning chain. The testing and measuring procedures are also discussed there. In conclusion the author presents some prospects for improving the signal-to-noise ratio by summation of the de-convoluted micro-paths. 78 refs., 78 figs., 1 annexe

  3. ISP: an optimal out-of-core image-set processing streaming architecture for parallel heterogeneous systems.

    Science.gov (United States)

    Ha, Linh Khanh; Krüger, Jens; Dihl Comba, João Luiz; Silva, Cláudio T; Joshi, Sarang

    2012-06-01

    Image population analysis is the class of statistical methods that plays a central role in understanding the development, evolution, and disease of a population. However, these techniques often require excessive computational power and memory, demands that are compounded by a large number of volumetric inputs. Restricted access to supercomputing power limits its influence in general research and practical applications. In this paper we introduce ISP, an Image-Set Processing streaming framework that harnesses the processing power of commodity heterogeneous CPU/GPU systems and attempts to solve this computational problem. In ISP, we introduce specially designed streaming algorithms and data structures that provide an optimal solution for out-of-core multi-image processing problems both in terms of memory usage and computational efficiency. ISP makes use of the asynchronous execution mechanism supported by parallel heterogeneous systems to efficiently hide the inherent latency of the processing pipeline of out-of-core approaches. Consequently, with computationally intensive problems, the ISP out-of-core solution can achieve the same performance as the in-core solution. We demonstrate the efficiency of the ISP framework on synthetic and real datasets.

  4. Algorithmically specialized parallel computers

    CERN Document Server

    Snyder, Lawrence; Gannon, Dennis B

    1985-01-01

    Algorithmically Specialized Parallel Computers focuses on the concept and characteristics of an algorithmically specialized computer. This book discusses algorithmically specialized computers, algorithmic specialization using VLSI, and innovative architectures. The architectures and algorithms for digital signal, speech, and image processing and specialized architectures for numerical computations are also elaborated. Other topics include the model for analyzing generalized inter-processor, pipelined architecture for search tree maintenance, and specialized computer organization for raster

  5. Parallel, Multigrid Finite Element Simulator for Fractured/Faulted and Other Complex Reservoirs based on Common Component Architecture (CCA)

    Energy Technology Data Exchange (ETDEWEB)

    Milind Deo; Chung-Kan Huang; Huabing Wang

    2008-08-31

    volume of injection at lower rates. However, if oil production can be continued at high water cuts, the discounted cumulative production usually favors higher production rates. The workflow developed during the project was also used to perform multiphase simulations in heterogeneous, fracture-matrix systems. Compositional and thermal-compositional simulators were developed for fractured reservoirs using the generalized framework. The thermal-compositional simulator was based on a novel 'equation-alignment' approach that helped choose the correct variables to solve depending on the number of phases present and the prescribed component partitioning. The simulators were used in steamflooding and in in situ combustion applications. The framework was constructed to be inherently parallel. The partitioning routines employed in the framework allowed generalized partitioning on highly complex fractured reservoirs and in instances when wells (incorporated in these models as line sources) were divided between two or more processors.

  6. Parallel computing works!

    CERN Document Server

    Fox, Geoffrey C; Messina, Guiseppe C

    2014-01-01

    A clear illustration of how parallel computers can be successfully applied to large-scale scientific computations. This book demonstrates how a variety of applications in physics, biology, mathematics and other sciences were implemented on real parallel computers to produce new scientific results. It investigates issues of fine-grained parallelism relevant for future supercomputers with particular emphasis on hypercube architecture. The authors describe how they used an experimental approach to configure different massively parallel machines, design and implement basic system software, and develop

  7. Practical parallel computing

    CERN Document Server

    Morse, H Stephen

    1994-01-01

    Practical Parallel Computing provides information pertinent to the fundamental aspects of high-performance parallel processing. This book discusses the development of parallel applications on a variety of equipment. Organized into three parts encompassing 12 chapters, this book begins with an overview of the technology trends that converge to favor massively parallel hardware over traditional mainframes and vector machines. This text then gives a tutorial introduction to parallel hardware architectures. Other chapters provide worked-out examples of programs using several parallel languages. Thi

  8. Scalable Distributed Architectures for Information Retrieval

    National Research Council Canada - National Science Library

    Lu, Zhihong

    1999-01-01

    .... Our distributed architectures exploit parallelism in information retrieval on a cluster of parallel IR servers using symmetric multiprocessors, and use partial collection replication and selection...

  9. Parallel Monte Carlo reactor neutronics

    International Nuclear Information System (INIS)

    Blomquist, R.N.; Brown, F.B.

    1994-01-01

    The issues affecting implementation of parallel algorithms for large-scale engineering Monte Carlo neutron transport simulations are discussed. For nuclear reactor calculations, these include load balancing, recoding effort, reproducibility, domain decomposition techniques, I/O minimization, and strategies for different parallel architectures. Two codes were parallelized and tested for performance. The architectures employed include SIMD, MIMD-distributed memory, and a workstation network with uneven interactive load. Speedups linear with the number of nodes were achieved.

  10. Parallel computing works

    Energy Technology Data Exchange (ETDEWEB)

    1991-10-23

    An account of the Caltech Concurrent Computation Program (C³P), a five-year project that focused on answering the question: "Can parallel computers be used to do large-scale scientific computations?" As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C³P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C³P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

  11. Parallelism in matrix computations

    CERN Document Server

    Gallopoulos, Efstratios; Sameh, Ahmed H

    2016-01-01

    This book is primarily intended as a research monograph that could also be used in graduate courses for the design of parallel algorithms in matrix computations. It assumes general but not extensive knowledge of numerical linear algebra, parallel architectures, and parallel programming paradigms. The book consists of four parts: (I) Basics; (II) Dense and Special Matrix Computations; (III) Sparse Matrix Computations; and (IV) Matrix functions and characteristics. Part I deals with parallel programming paradigms and fundamental kernels, including reordering schemes for sparse matrices. Part II is devoted to dense matrix computations such as parallel algorithms for solving linear systems, linear least squares, the symmetric algebraic eigenvalue problem, and the singular-value decomposition. It also deals with the development of parallel algorithms for special linear systems such as banded, Vandermonde, Toeplitz, and block Toeplitz systems. Part III addresses sparse matrix computations: (a) the development of pa...

  12. Sud du Sahara | Page 100 | CRDI - Centre de recherches pour le ...

    International Development Research Centre (IDRC) Digital Library (Canada)

    open information for health services in Africa. Language: French. Read more about Open Architecture, Standards and Information Systems (OASIS) for Healthcare in Africa. Language: English. Read more about Communication for the purpose of ...

  13. Architecture for the senses

    DEFF Research Database (Denmark)

    Ryhl, Camilla

    2009-01-01

    Accommodating sensory disabilities in architectural design requires specific design considerations. These are different from the ones included by the existing design concept 'accessibility', which primarily accommodates physical disabilities. Hence a new design concept, 'sensory accessibility', is presented as a parallel and complementary concept to the existing one. Sensory accessibility accommodates sensory disabilities and describes architectural design requirements needed to ensure access to the sensory experiences and architectural quality of a given space. The article is based on research...

  14. Massively parallel mathematical sieves

    Energy Technology Data Exchange (ETDEWEB)

    Montry, G.R.

    1989-01-01

    The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
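    One common way to expose parallelism in the Sieve of Eratosthenes is to split the integer range into blocks that are sieved independently once the primes up to the square root of the limit are known. The sketch below shows that block decomposition in serial Python; it is not the scattered decomposition studied in the report, and the function names and the `workers` parameter are assumptions made for illustration. Each call to `sieve_block` touches only its own block, so in a real implementation the blocks could be handed to separate processing elements.

```python
from math import isqrt


def base_primes(limit):
    """Serial Sieve of Eratosthenes: all primes <= limit."""
    if limit < 2:
        return []
    flags = [True] * (limit + 1)
    flags[0] = flags[1] = False
    for p in range(2, isqrt(limit) + 1):
        if flags[p]:
            for multiple in range(p * p, limit + 1, p):
                flags[multiple] = False
    return [i for i, is_prime in enumerate(flags) if is_prime]


def sieve_block(lo, hi, primes):
    """Primes in [lo, hi), using only the shared list of base primes."""
    flags = [True] * (hi - lo)
    for p in primes:
        if p * p >= hi:
            break  # larger primes cannot strike anything in this block
        # First multiple of p inside the block, but never below p*p.
        start = max(p * p, ((lo + p - 1) // p) * p)
        for multiple in range(start, hi, p):
            flags[multiple - lo] = False
    return [lo + i for i, ok in enumerate(flags) if ok and lo + i >= 2]


def blocked_sieve(n, workers=4):
    """All primes <= n, computed block by block.

    `workers` only sets the number of blocks here; the loop below is
    serial and stands in for a parallel map over independent blocks.
    """
    primes = base_primes(isqrt(n))
    size = -(-(n + 1) // workers)  # ceiling division
    result = []
    for lo in range(0, n + 1, size):
        result.extend(sieve_block(lo, min(lo + size, n + 1), primes))
    return result
```

    With equal-sized contiguous blocks the work per block is uneven, because small primes strike every block while large ones strike only the later ones; load balance of this kind is one of the trade-offs the abstract alludes to.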

  15. Parallel k-means++

    Energy Technology Data Exchange (ETDEWEB)

    2017-04-04

    A parallelization of the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms allow people to cluster multidimensional data, by attempting to minimize the mean distance of data points within a cluster. K-means++ improved upon traditional k-means by using a more intelligent approach to selecting the initial seeds for the clustering process. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique. We have developed original C++ code for parallelizing the algorithm on three unique hardware architectures: GPU using NVidia's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. By parallelizing the process for these platforms, we are able to perform k-means++ clustering much more quickly than it could be done before.
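    For readers unfamiliar with the seeding step being parallelized, here is a minimal serial sketch of k-means++ seed selection in Python. It is not the C++ code described in the record; the function names and the `rng` argument are assumptions for illustration. The first seed is drawn uniformly, and each later seed is drawn with probability proportional to a point's squared distance to its nearest existing seed.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points, k, rng=None):
    """Choose k initial seeds from `points` using k-means++ weighting."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = rng or random.Random()
    seeds = [rng.choice(points)]
    # nearest[i] holds the squared distance from points[i] to its closest seed.
    nearest = [squared_distance(p, seeds[0]) for p in points]
    while len(seeds) < k:
        if sum(nearest) == 0.0:
            # Every point coincides with a seed; fall back to a uniform draw.
            new_seed = rng.choice(points)
        else:
            new_seed = rng.choices(points, weights=nearest, k=1)[0]
        seeds.append(new_seed)
        # This per-point update is independent for each point, which is the
        # natural place to apply data parallelism (GPU threads, OpenMP loops).
        nearest = [min(d, squared_distance(p, new_seed))
                   for d, p in zip(nearest, points)]
    return seeds
```

    Passing a seeded generator, for example `kmeanspp_seeds(points, 3, random.Random(0))`, makes the selection repeatable for testing.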

  16. Systematic approach for deriving feasible mappings of parallel algorithms to parallel computing platforms

    NARCIS (Netherlands)

    Arkin, Ethem; Tekinerdogan, Bedir; Imre, Kayhan M.

    2017-01-01

    The need for high-performance computing together with the increasing trend from single processor to parallel computer architectures has leveraged the adoption of parallel computing. To benefit from parallel computing power, usually parallel algorithms are defined that can be mapped and executed

  17. Enterprise Architecture Integration in E-government

    NARCIS (Netherlands)

    Janssen, M.F.W.H.A.; Cresswell, A.

    2005-01-01

    Achieving goals of better integrated and responsive government services requires moving away from stand alone applications toward more comprehensive, integrated architectures. As a result there is mounting pressure to move from disparate systems operating in parallel toward a shared architecture

  18. Architectural prototyping

    DEFF Research Database (Denmark)

    Bardram, Jakob Eyvind; Christensen, Henrik Bærbak; Hansen, Klaus Marius

    2004-01-01

    A major part of software architecture design is learning how specific architectural designs balance the concerns of stakeholders. We explore the notion of "architectural prototypes", correspondingly architectural prototyping, as a means of using executable prototypes to investigate stakeholders...

  19. Patterns for Parallel Software Design

    CERN Document Server

    Ortega-Arjona, Jorge Luis

    2010-01-01

    Essential reading to understand patterns for parallel programming. Software patterns have revolutionized the way we think about how software is designed, built, and documented, and the design of parallel software requires you to consider other particular design aspects and special skills. From clusters to supercomputers, success heavily depends on the design skills of software developers. Patterns for Parallel Software Design presents a pattern-oriented software architecture approach to parallel software design. This approach is not a design method in the classic sense, but a new way of managin

  20. Architecture on Architecture

    DEFF Research Database (Denmark)

    Olesen, Karen

    2016-01-01

    that is not scientific or academic but is more like a latent body of data that we find embedded in existing works of architecture. This information, it is argued, is not limited by the historical context of the work. It can be thought of as a virtual capacity – a reservoir of spatial configurations that can ... correlation between the study of existing architectures and the training of competences to design for present-day realities. ... This paper will discuss the challenges faced by architectural education today. It takes as its starting point the double commitment of any school of architecture: on the one hand the task of preserving the particular knowledge that belongs to the discipline of architecture, and on the other hand ...

  1. Parallel hierarchical radiosity rendering

    Energy Technology Data Exchange (ETDEWEB)

    Carter, Michael [Iowa State Univ., Ames, IA (United States)

    1993-07-01

    In this dissertation, the step-by-step development of a scalable parallel hierarchical radiosity renderer is documented. First, a new look is taken at the traditional radiosity equation, and a new form is presented in which the matrix of linear system coefficients is transformed into a symmetric matrix, thereby simplifying the problem and enabling a new solution technique to be applied. Next, the state-of-the-art hierarchical radiosity methods are examined for their suitability to parallel implementation, and scalability. Significant enhancements are also discovered which both improve their theoretical foundations and improve the images they generate. The resultant hierarchical radiosity algorithm is then examined for sources of parallelism, and for an architectural mapping. Several architectural mappings are discussed. A few key algorithmic changes are suggested during the process of making the algorithm parallel. Next, the performance, efficiency, and scalability of the algorithm are analyzed. The dissertation closes with a discussion of several ideas which have the potential to further enhance the hierarchical radiosity method, or provide an entirely new forum for the application of hierarchical methods.

  2. Massively parallel multicanonical simulations

    Science.gov (United States)

    Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard

    2018-03-01

    Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with on the order of $10^{4}$ parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as a starting point and reference for practitioners in the field.

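  3. Study of optical architectures for the home local area network, based on multimode fiber (polymer and silica) and wavelength-division multiplexing; Etude des architectures optiques pour le réseau local domestique, basées sur la fibre multimode (polymère et silice) et le multiplexage en longueur d’onde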

    OpenAIRE

    Richard, Francis

    2012-01-01

    Only optical fiber is able to meet capacity and heterogeneity requirements of the future home network. Plastic or silica multimode fiber would be preferred for lower system costs. Two main multiformat architectures have been identified, based on an active or a passive star. Silica multimode fiber is suitable for an active star but plastic fiber could also be eligible, with its very low cost. Unfortunately, its bandwidth is far too limited. Solutions to increase its capacity, such as wavelengt...

  4. Parallel integer sorting with medium and fine-scale parallelism

    Science.gov (United States)

    Dagum, Leonardo

    1993-01-01

    Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort is designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128 processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.

  5. Parallel computations

    CERN Document Server

    1982-01-01

    Parallel Computations focuses on parallel computation, with emphasis on algorithms used in a variety of numerical and physical applications and for many different types of parallel computers. Topics covered range from vectorization of fast Fourier transforms (FFTs) and of the incomplete Cholesky conjugate gradient (ICCG) algorithm on the Cray-1 to calculation of table lookups and piecewise functions. Single tridiagonal linear systems and vectorized computation of reactive flow are also discussed. Comprised of 13 chapters, this volume begins by classifying parallel computers and describing techn

  6. Optical Neural Network Classifier Architectures

    National Research Council Canada - National Science Library

    Getbehead, Mark

    1998-01-01

    We present an adaptive opto-electronic neural network hardware architecture capable of exploiting parallel optics to realize real-time processing and classification of high-dimensional data for Air...

  7. Parallel processing for fluid dynamics applications

    International Nuclear Information System (INIS)

    Johnson, G.M.

    1989-01-01

    The impact of parallel processing on computational science and, in particular, on computational fluid dynamics is growing rapidly. In this paper, particular emphasis is given to developments which have occurred within the past two years. Parallel processing is defined and the reasons for its importance in high-performance computing are reviewed. Parallel computer architectures are classified according to the number and power of their processing units, their memory, and the nature of their connection scheme. Architectures which show promise for fluid dynamics applications are emphasized. Fluid dynamics problems are examined for parallelism inherent at the physical level. CFD algorithms and their mappings onto parallel architectures are discussed. Several examples are presented to document the performance of fluid dynamics applications on present-generation parallel processing devices.

  8. High temporal resolution magnetic resonance imaging: development of a parallel three dimensional acquisition method for functional neuroimaging; Imagerie par resonance magnetique a haute resolution temporelle: developpement d'une methode d'acquisition parallele tridimensionnelle pour l'imagerie fonctionnelle cerebrale

    Energy Technology Data Exchange (ETDEWEB)

    Rabrait, C

    2007-11-15

    Echo Planar Imaging is widely used to perform data acquisition in functional neuroimaging. This sequence allows the acquisition of a set of about 30 slices, covering the whole brain, at a spatial resolution ranging from 2 to 4 mm, and a temporal resolution ranging from 1 to 2 s. It is thus well adapted to the mapping of activated brain areas but does not allow precise study of the brain dynamics. Moreover, temporal interpolation is needed in order to correct for inter-slice delays, and 2-dimensional acquisition is subject to vascular inflow artifacts. To improve the estimation of the hemodynamic response functions associated with activation, this thesis aimed at developing a 3-dimensional high temporal resolution acquisition method. To do so, Echo Volume Imaging was combined with reduced field-of-view acquisition and parallel imaging. Indeed, E.V.I. allows the acquisition of a whole volume in Fourier space following a single excitation, but it requires very long echo trains. Parallel imaging and field-of-view reduction are used to reduce the echo train durations by a factor of 4, which allows the acquisition of a 3-dimensional brain volume with limited susceptibility-induced distortions and signal losses, in 200 ms. All imaging parameters have been optimized in order to reduce echo train durations and to maximize S.N.R., so that cerebral activation can be detected with a high level of confidence. Robust detection of brain activation was demonstrated with both visual and auditory paradigms. High temporal resolution hemodynamic response functions could be estimated through selective averaging of the response to the different trials of the stimulation. To further improve S.N.R., the matrix inversions required in parallel reconstruction were regularized, and the impact of the level of regularization on activation detection was investigated. Finally, potential applications of parallel E.V.I. such as the study of non-stationary effects in the B.O.L.D. response

  9. Offer for our members

    CERN Multimedia

    Staff Association

    2016-01-01

    Walibi Rhône-Alpes hosts its Halloween FreakShow event on the weekend of 15 and 16 October, then every day from 20 October to 2 November 2016! Extended opening until 7 p.m. and fireworks every evening on 29, 30 and 31 October! Werewolf show; 1 maze; treasure hunt in the park (with many prizes to win); activities (pumpkin carving and face painting) and other surprises! Prices for our members: "Zone terrestre" admission: 23 € instead of 29 €. Free admission for children under 3, with limited access to attractions. Free parking.

  10. Reconfigurable Parallel Computer Architectures for Space Applications

    Science.gov (United States)

    2012-08-07

    B-1. Dependency diagram of the hardware blocks implemented with VHDL ... The CU has been fully implemented in an FPGA using VHDL. The CU hardware design is depicted in Figure 12. It consists of a main ... the hardware design implemented in the FPGA using VHDL. The block diagram shows the dependency of all the VHDL blocks included in the design. Each

  11. Architectural heritage or theme park

    Directory of Open Access Journals (Sweden)

    Ignasi Solà-Morales

    1998-04-01

    The growing parallelism between the perception and the consumer use of theme parks and architectural heritage gives rise to a reflection on the fact that the architectural object has been turned into a museum piece, stripped of its original value and its initial cultural substance to become images exposed to multiple gazes, thus producing what the author calls the "Theme Park effect", with consequences for protected architecture.

  12. Parallel algorithms

    CERN Document Server

    Casanova, Henri; Robert, Yves

    2008-01-01

    "…The authors of the present book, who have extensive credentials in both research and instruction in the area of parallelism, present a sound, principled treatment of parallel algorithms. … This book is very well written and extremely well designed from an instructional point of view. … The authors have created an instructive and fascinating text. The book will serve researchers as well as instructors who need a solid, readable text for a course on parallelism in computing. Indeed, for anyone who wants an understandable text from which to acquire a current, rigorous, and broad vi

  13. Massively parallel quantum computer simulator

    NARCIS (Netherlands)

    De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.

    2007-01-01

    We describe portable software to simulate universal quantum computers on massively parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, a Cray

  14. Parallel Polarization State Generation.

    Science.gov (United States)

    She, Alan; Capasso, Federico

    2016-05-17

    The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially-separated polarization components of a laser using a digital micromirror device that are subsequently beam combined. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security.
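
The contrast between the two architectures in this abstract can be written out schematically. The notation below is an assumption made for illustration (the abstract does not say whether a Jones or a Mueller formalism is used): $\mathbf{S}_{\text{in}}$ and $\mathbf{S}_{\text{out}}$ denote the input and output polarization states, $M_k$ a fixed polarization element, and $w_k(t)$ a hypothetical time-varying intensity weight applied to branch $k$.

```latex
% Serial architecture: N elements traversed in sequence act as a product of matrices
\[
  \mathbf{S}_{\text{out}} = M_N \cdots M_2 M_1 \, \mathbf{S}_{\text{in}}
\]
% Parallel architecture: K spatially separated branches, each weighted by an
% intensity modulator and then recombined, act as a sum of matrices
\[
  \mathbf{S}_{\text{out}}(t) = \Bigl( \sum_{k=1}^{K} w_k(t) \, M_k \Bigr) \mathbf{S}_{\text{in}}
\]
```

In the second form the time dependence sits only in the scalar weights, which is consistent with the abstract's remark that performance is dictated by the intensity-modulation technology.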

  15. Architectural slicing

    DEFF Research Database (Denmark)

    Christensen, Henrik Bærbak; Hansen, Klaus Marius

    2013-01-01

    Architectural prototyping is a widely used practice, concerned with taking architectural decisions through experiments with lightweight implementations. However, many architectural decisions are only taken when systems are already (partially) implemented. This is problematic in the context of architectural prototyping since experiments with full systems are complex and expensive and thus architectural learning is hindered. In this paper, we propose a novel technique for harvesting architectural prototypes from existing systems, "architectural slicing", based on dynamic program slicing. Given a system and a slicing criterion, architectural slicing produces an architectural prototype that contains the elements in the architecture that are dependent on the elements in the slicing criterion. Furthermore, we present an initial design and implementation of an architectural slicer for Java.

  16. Parallelization of the FLAPW method

    International Nuclear Information System (INIS)

    Canning, A.; Mannstadt, W.; Freeman, A.J.

    1999-01-01

    The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about one hundred atoms due to a lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel computer

  17. Parallelization of the FLAPW method

    Science.gov (United States)

    Canning, A.; Mannstadt, W.; Freeman, A. J.

    2000-08-01

    The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining structural, electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work, we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel supercomputer.

  18. Computing: all for one ... project

    CERN Multimedia

    Delétraz, F; Requin, J-M

    2004-01-01

    "For reasons of cost and efficiency, researchers are increasingly making computers scattered across all continents work together. To advance science, all means and all networks are good" (1 page)

  19. Ultrascalable petaflop parallel supercomputer

    Science.gov (United States)

    Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Chiu, George [Cross River, NY; Cipolla, Thomas M [Katonah, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Hall, Shawn [Pleasantville, NY; Haring, Rudolf A [Cortlandt Manor, NY; Heidelberger, Philip [Cortlandt Manor, NY; Kopcsay, Gerard V [Yorktown Heights, NY; Ohmacht, Martin [Yorktown Heights, NY; Salapura, Valentina [Chappaqua, NY; Sugavanam, Krishnan [Mahopac, NY; Takken, Todd [Brewster, NY

    2010-07-20

    A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.
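
As an illustration of the torus network mentioned in this abstract, the sketch below computes the nearest neighbours of a node with wraparound links. The three-dimensional shape and the function name `torus_neighbors` are assumptions made for the example; the abstract names a Torus but does not state its dimensions or addressing scheme.

```python
from typing import List, Tuple


def torus_neighbors(coord: Tuple[int, ...], dims: Tuple[int, ...]) -> List[Tuple[int, ...]]:
    """Return the nearest neighbours of `coord` in a torus of shape `dims`.

    Each axis contributes two neighbours (one step down, one step up),
    and the modulo makes the edges wrap around.
    """
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size  # wraparound link
            neighbors.append(tuple(n))
    return neighbors


# Hypothetical 4 x 4 x 4 torus: a corner node still has six neighbours.
print(torus_neighbors((0, 0, 0), (4, 4, 4)))
```

Because every node has the same number of links regardless of position, nearest-neighbour message passing costs the same at the edge of the machine as in the middle.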

  20. METRIC context unit architecture

    Energy Technology Data Exchange (ETDEWEB)

    Simpson, R.O.

    1988-01-01

    METRIC is an architecture for a simple but powerful Reduced Instruction Set Computer (RISC). Its speed comes from the simultaneous processing of several instruction streams, with instructions from the various streams being dispatched into METRIC's execution pipeline as they become available for execution. The pipeline is thus kept full, with a mix of instructions for several contexts in execution at the same time. True parallel programming is supported within a single execution unit, the METRIC Context Unit. METRIC's architecture provides for expansion through the addition of multiple Context Units and of specialized Functional Units. The architecture thus spans a range of size and performance from a single-chip microcomputer up through large and powerful multiprocessors. This research concentrates on the specification of the METRIC Context Unit at the architectural level. Performance tradeoffs made during METRIC's design are discussed, and projections of METRIC's performance are made based on simulation studies.

  1. Parallel Processing and Applied Mathematics. 10th International Conference, PPAM 2013. Revised Selected Papers

    DEFF Research Database (Denmark)

    The following topics are dealt with: parallel scientific computing; numerical algorithms; parallel nonnumerical algorithms; cloud computing; evolutionary computing; metaheuristics; applied mathematics; GPU computing; multicore systems; hybrid architectures; hierarchical parallelism; HPC systems; power monitoring; energy monitoring; and distributed computing.

  2. Mechanical design of a free-wheel clutch for the thermal engine of a parallel hybrid vehicle with thermal and electrical power-train; Conception mecanique d'un accouplement a roue libre pour le moteur thermique d'un vehicule hybride parallele thermique et electrique

    Energy Technology Data Exchange (ETDEWEB)

    Santin, J.J.

    2001-07-01

    This thesis deals with the design of a free-wheel clutch. This unit is intended to replace the automated dry single-plate clutch of a parallel hybrid car with thermal and electric power-train. Furthermore, the car is a single shaft zero emission vehicle fitted with a controlled gearbox. Chapter one focuses on the type of hybrid vehicle studied. It shows the need to isolate the engine from the rest of the drive train, depending on the driving conditions. Chapter two presents and compares the two alternatives: automated clutch and free-wheel. In order to develop the free-wheel option, the torsional vibrations in the automotive drive line had to be closely studied. It required the design of a specific modular tool, as presented in chapter three, with the help of MATLAB SIMULINK. Lastly, chapter four shows how this tool was used during the design stage and specifies the way to build it. The free-wheel is then to be fitted to a prototype hybrid vehicle, constructed by both the LAMIH and PSA. (author)

  3. Research and development of a gaseous detector PIM (parallel ionization multiplier) dedicated to particle tracking under high hadron rates; Recherche et developpement d'un detecteur gazeux PIM (Parallel Ionization Multiplier) pour la trajectographie de particules sous un haut flux de hadrons

    Energy Technology Data Exchange (ETDEWEB)

    Beucher, J

    2007-10-15

    PIM (Parallel Ionization Multiplier) is a multi-stage micro-pattern gaseous detector using micro-mesh technology. This new device, based on the operating principle of the Micromegas (micro-mesh gaseous structure) detector, offers good characteristics for track detection of minimum ionizing particles. However, detectors of this kind placed in a hadron environment suffer from discharges that appreciably degrade the detection efficiency and pose a hazard to the front-end electronics. In order to minimize these strong events, it is convenient to perform charge multiplication in several successive steps. Within the framework of a European hadron physics project we have investigated the multi-stage PIM detector for high hadron flux applications. For this part of the research and development, a systematic study of many geometrical configurations of two amplification stages separated by a transfer space, operated with the gaseous mixture Ne + 10% CO2, has been performed. Beam tests carried out with high-energy hadrons at the CERN facility showed that the discharge probability could be strongly reduced with a suitable PIM device. A discharge rate lower than 10^9 per incident hadron and a spatial resolution of 51 μm have been measured at the operating point at the beginning of the efficiency plateau (>96%). (author)

  4. A procedure for the evaluation of 2D radiographic texture analysis to assess 3D bone micro-architecture; Evaluation de l'analyse de la texture de radiographies 2D pour evaluer les micro architecture 3D d'os

    Energy Technology Data Exchange (ETDEWEB)

    Apostol, L.; Peyrin, F.; Yot, S.; Basset, O.; Odet, Ch. [CREATIS - Centre National de la Recherche Scientifique (UMR CNRS 5515), 69 - Villeurbanne (France); Apostal, L.; Boller, E. [European Synchrotron Radiation Facility (ESRF), 38 - Grenoble (France); Tabary, J.; Dinten, J.M. [CEA Grenoble, Lab. d' Electronique et de Technologie de l' Informatique (LETI), 38 (France); Boudousq, V.; Kotzki, P.O. [Faculte de Medecine, Lab. de Biophysique Medicale, 30 - Nimes (France)

    2004-07-01

    Although the diagnosis of osteoporosis is mainly based on Dual X-ray Absorptiometry, it has been shown that trabecular bone micro-architecture is also an important factor with regard to fracture risk, which can be efficiently assessed in vitro using three-dimensional x-ray microtomography (μCT). In vivo, techniques based on high-resolution x-ray radiography combined with texture analysis have been proposed to investigate bone micro-architecture, but their relevance for giving pertinent 3D information is unclear. The purpose of this work was to develop a method for evaluating the relationships between 3D micro-architecture and 2D texture parameters, and for optimizing the conditions for radiographic imaging. Bone sample images taken from cortical to cortical were acquired using 3D synchrotron x-ray μCT at the ESRF. The 3D digital images were further used for two purposes: 1) quantification of three-dimensional bone micro-architecture, 2) simulation of realistic x-ray radiographs under different acquisition conditions. Texture analysis was then applied to these 2D radiographs using a large variety of methods (co-occurrence, spectrum, fractal...). First results of the statistical analysis between 2D and 3D parameters made it possible to identify the most relevant 2D texture parameters. (authors)

  5. Parallel algorithms for mapping pipelined and parallel computations

    Science.gov (United States)

    Nicol, David M.

    1988-01-01

    Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm^3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm^2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.
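
To make the mapping problem in this abstract concrete, the sketch below assigns a chain of modules to processors as contiguous blocks so that the most heavily loaded processor is as light as possible. It is a generic textbook approach (binary search on the bottleneck with a greedy feasibility probe), not the O(nm log m) algorithm the abstract refers to. The integer module weights, the absence of communication costs, and the function names `fits` and `min_bottleneck` are all assumptions made for the example.

```python
from typing import List


def fits(weights: List[int], n_procs: int, cap: int) -> bool:
    """Greedy probe: can the chain be cut into at most n_procs contiguous
    blocks such that no block's total weight exceeds cap?"""
    blocks, load = 1, 0
    for w in weights:
        if w > cap:
            return False          # a single module already exceeds the cap
        if load + w > cap:
            blocks += 1           # start a new block on the next processor
            load = w
        else:
            load += w
    return blocks <= n_procs


def min_bottleneck(weights: List[int], n_procs: int) -> int:
    """Smallest achievable maximum block load, found by binary search
    over candidate bottleneck values."""
    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        if fits(weights, n_procs, mid):
            hi = mid              # feasible: try a tighter cap
        else:
            lo = mid + 1          # infeasible: the cap must grow
    return lo


# Hypothetical module weights for a six-stage pipeline on three processors.
print(min_bottleneck([4, 2, 7, 1, 3, 5], 3))
```

Each probe is a single pass over the m modules, so the whole search costs O(m log W), where W is the sum of the weights. That bound depends on the weights rather than on n, which is one reason the literature looks for bounds such as the ones quoted in the abstract.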

  6. Massively Parallel QCD

    International Nuclear Information System (INIS)

    Soltz, R; Vranas, P; Blumrich, M; Chen, D; Gara, A; Giampap, M; Heidelberger, P; Salapura, V; Sexton, J; Bhanot, G

    2007-01-01

    The theory of the strong nuclear force, Quantum Chromodynamics (QCD), can be numerically simulated from first principles on massively-parallel supercomputers using the method of Lattice Gauge Theory. We describe the special programming requirements of lattice QCD (LQCD) as well as the optimal supercomputer hardware architectures that it suggests. We demonstrate these methods on the BlueGene massively-parallel supercomputer and argue that LQCD and the BlueGene architecture are a natural match. This can be traced to the simple fact that LQCD is a regular lattice discretization of space into lattice sites while the BlueGene supercomputer is a discretization of space into compute nodes, and that both are constrained by requirements of locality. This simple relation is both technologically important and theoretically intriguing. The main result of this paper is the speedup of LQCD using up to 131,072 CPUs on the largest BlueGene/L supercomputer. The speedup is perfect with sustained performance of about 20% of peak. This corresponds to a maximum of 70.5 sustained TFlop/s. At these speeds LQCD and BlueGene are poised to produce the next generation of strong interaction physics theoretical results
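
The performance figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the three numbers given in the abstract and treats "about 20% of peak" as exactly 0.20, which is an assumption, so the derived values are approximate. The function names are hypothetical.

```python
def implied_peak_tflops(sustained_tflops: float, fraction_of_peak: float) -> float:
    """Peak rate implied by a sustained rate and the fraction of peak it represents."""
    return sustained_tflops / fraction_of_peak


def per_cpu_gflops(total_tflops: float, n_cpus: int) -> float:
    """Convert an aggregate TFlop/s figure into GFlop/s per CPU."""
    return total_tflops * 1000.0 / n_cpus


SUSTAINED_TFLOPS = 70.5    # from the abstract
FRACTION_OF_PEAK = 0.20    # "about 20%" in the abstract, taken as exact here
N_CPUS = 131_072           # from the abstract

peak = implied_peak_tflops(SUSTAINED_TFLOPS, FRACTION_OF_PEAK)
print(f"implied peak: about {peak:.1f} TFlop/s")
print(f"sustained per CPU: about {per_cpu_gflops(SUSTAINED_TFLOPS, N_CPUS):.2f} GFlop/s")
print(f"implied peak per CPU: about {per_cpu_gflops(peak, N_CPUS):.2f} GFlop/s")
```

Under these assumptions the machine's implied peak is roughly 350 TFlop/s, and each CPU sustains a little over half a GFlop/s.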

  7. Parallel processing from applications to systems

    CERN Document Server

    Moldovan, Dan I

    1993-01-01

    This text provides one of the broadest presentations of parallel processing available, including the structure of parallel processors and parallel algorithms. The emphasis is on mapping algorithms to highly parallel computers, with extensive coverage of array and multiprocessor architectures. Early chapters provide insightful coverage on the analysis of parallel algorithms and program transformations, effectively integrating a variety of material previously scattered throughout the literature. Theory and practice are well balanced across diverse topics in this concise presentation. For exceptional cla

  8. Parallel kinematics type, kinematics, and optimal design

    CERN Document Server

    Liu, Xin-Jun

    2014-01-01

    Parallel Kinematics: Type, Kinematics, and Optimal Design presents the results of 15 years' research on parallel mechanisms and parallel kinematics machines. This book covers the systematic classification of parallel mechanisms (PMs) and provides a large number of mechanical architectures of PMs available for use in practical applications. It focuses on the kinematic design of parallel robots. One successful application of parallel mechanisms in the field of machine tools, also called parallel kinematics machines, has been the emerging trend in advanced machine tools. The book describes not only the main aspects and important topics in parallel kinematics, but also references novel concepts and approaches, i.e. type synthesis based on evolution, performance evaluation and optimization based on screw theory, singularity model taking into account motion and force transmissibility, and others. This book is intended for researchers, scientists, engineers and postgraduates or above with interes...

  9. Parallel computation

    International Nuclear Information System (INIS)

    Jejcic, A.; Maillard, J.; Maurel, G.; Silva, J.; Wolff-Bacha, F.

    1997-01-01

    The work in the field of parallel processing has developed as research activities using several numerical Monte Carlo simulations related to basic or applied current problems of nuclear and particle physics. For the applications using the GEANT code, development or improvement work was done on the parts simulating low-energy physical phenomena such as radiation, transport and interaction. The problem of actinide burning by means of accelerators was approached using a simulation with the GEANT code. A program for neutron tracking in the range of low energies up to the thermal region has been developed. It is coupled to the GEANT code and permits, in a single pass, the simulation of a hybrid reactor core receiving a proton burst. Other work in this field concerns simulations for nuclear medicine applications, for instance the development of biological probes, the evaluation and characterization of gamma cameras (collimators, crystal thickness), and the method for dosimetric calculations. In particular, these calculations are suited to a geometrical parallelization approach especially adapted to parallel machines of the TN310 type. Further work in the same field concerns simulation of electron channelling in crystals and simulation of the beam-beam interaction effect in colliders. The GEANT code was also used to simulate the operation of germanium detectors designed for monitoring natural and artificial radioactivity in the environment.

  10. Robotic architectures

    CSIR Research Space (South Africa)

    Mtshali, M

    2010-01-01

    In the development of mobile robotic systems, a robotic architecture plays a crucial role in interconnecting all the sub-systems and controlling the system. The design of robotic architectures for mobile autonomous robots is a challenging...

  11. PSHED: a simplified approach to developing parallel programs

    International Nuclear Information System (INIS)

    Mahajan, S.M.; Ramesh, K.; Rajesh, K.; Somani, A.; Goel, M.

    1992-01-01

    This paper presents a simplified approach in the form of a tree-structured computational model for parallel application programs. An attempt is made to provide a standard user interface to execute programs on the BARC Parallel Processing System (BPPS), a scalable distributed-memory multiprocessor. The interface package, called PSHED, provides a basic framework for representing and executing parallel programs on different parallel architectures. The PSHED package incorporates concepts from a broad range of previous research in programming environments and parallel computations. (author). 6 refs

  12. Architecture & Environment

    Science.gov (United States)

    Erickson, Mary; Delahunt, Michael

    2010-01-01

    Most art teachers would agree that architecture is an important form of visual art, but they do not always include it in their curriculums. In this article, the authors share core ideas from "Architecture and Environment," a teaching resource that they developed out of a long-term interest in teaching architecture and their fascination with the…

  13. : all projects | Page 491 | CRDI - Centre de recherches pour le ...

    International Development Research Centre (IDRC) Digital Library (Canada)

    Subject: INFORMATION TECHNOLOGY, FINANCIAL MANAGEMENT, FINANCIAL INSTITUTIONS, CREDIT COOPERATIVES. Region: Kenya, Tanzania, Uganda, North of Sahara, South of Sahara. Total funding: CA$ 250,400.00. OASIS: open architecture, standards, and information systems for services ...

  14. : all projects | Page 489 | CRDI - Centre de recherches pour le ...

    International Development Research Centre (IDRC) Digital Library (Canada)

    Subject: INFORMATION TECHNOLOGY, FINANCIAL MANAGEMENT, FINANCIAL INSTITUTIONS, CREDIT COOPERATIVES. Region: Kenya, Tanzania, Uganda, North of Sahara, South of Sahara. Total funding: CA$ 250,400.00. OASIS: open architecture, standards, and information systems for services ...

  15. Parallel R

    CERN Document Server

    McCallum, Ethan

    2011-01-01

    It's tough to argue with R as a high-quality, cross-platform, open source statistical software product, unless you're in the business of crunching Big Data. This concise book introduces you to several strategies for using R to analyze large datasets. You'll learn the basics of Snow, Multicore, Parallel, and some Hadoop-related tools, including how to find them, how to use them, when they work well, and when they don't. With these packages, you can overcome R's single-threaded nature by spreading work across multiple CPUs, or offloading work to multiple machines to address R's memory barrier.

  16. Architectures of prototypes and architectural prototyping

    DEFF Research Database (Denmark)

    Hansen, Klaus Marius; Christensen, Michael; Sandvad, Elmer

    1998-01-01

    This paper reports from experience obtained through development of a prototype of a global customer service system in a project involving a large shipping company and a university research group. The research group had no previous knowledge of the complex business of shipping and had never worked together as a team, but developed a prototype that more than fulfilled the expectations of the shipping company. The prototype should: - complete the first major phase within 10 weeks, - be highly vertical illustrating future work practice, - continuously live up to new requirements from prototyping sessions with users, - evolve over a long period of time to contain more functionality - allow for 6-7 developers working intensively in parallel. Explicit focus on the software architecture and letting the architecture evolve with the prototype played a major role in resolving these conflicting...

  17. Parallel Computing in SCALE

    International Nuclear Information System (INIS)

    DeHart, Mark D.; Williams, Mark L.; Bowman, Stephen M.

    2010-01-01

    The SCALE computational architecture has remained basically the same since its inception 30 years ago, although constituent modules and capabilities have changed significantly. This SCALE concept was intended to provide a framework whereby independent codes can be linked to provide a more comprehensive capability than possible with the individual programs - allowing flexibility to address a wide variety of applications. However, the current system was designed originally for mainframe computers with a single CPU and with significantly less memory than today's personal computers. It has been recognized that the present SCALE computation system could be restructured to take advantage of modern hardware and software capabilities, while retaining many of the modular features of the present system. Preliminary work is being done to define specifications and capabilities for a more advanced computational architecture. This paper describes the state of current SCALE development activities and plans for future development. With the release of SCALE 6.1 in 2010, a new phase of evolutionary development will be available to SCALE users within the TRITON and NEWT modules. The SCALE (Standardized Computer Analyses for Licensing Evaluation) code system developed by Oak Ridge National Laboratory (ORNL) provides a comprehensive and integrated package of codes and nuclear data for a wide range of applications in criticality safety, reactor physics, shielding, isotopic depletion and decay, and sensitivity/uncertainty (S/U) analysis. Over the last three years, since the release of version 5.1 in 2006, several important new codes have been introduced within SCALE, and significant advances applied to existing codes. Many of these new features became available with the release of SCALE 6.0 in early 2009. However, beginning with SCALE 6.1, a first generation of parallel computing is being introduced. In addition to near-term improvements, a plan for longer term SCALE enhancement

  18. Compiler Technology for Parallel Scientific Computation

    Directory of Open Access Journals (Sweden)

    Can Özturan

    1994-01-01

    There is a need for compiler technology that, given the source program, will generate efficient parallel codes for different architectures with minimal user involvement. Parallel computation is becoming indispensable in solving large-scale problems in science and engineering. Yet, the use of parallel computation is limited by the high costs of developing the needed software. To overcome this difficulty we advocate a comprehensive approach to the development of scalable architecture-independent software for scientific computation based on our experience with equational programming language (EPL). Our approach is based on a program decomposition, parallel code synthesis, and run-time support for parallel scientific computation. The program decomposition is guided by the source program annotations provided by the user. The synthesis of parallel code is based on configurations that describe the overall computation as a set of interacting components. Run-time support is provided by the compiler-generated code that redistributes computation and data during object program execution. The generated parallel code is optimized using techniques of data alignment, operator placement, wavefront determination, and memory optimization. In this article we discuss annotations, configurations, parallel code generation, and run-time support suitable for parallel programs written in the functional parallel programming language EPL and in Fortran.

  19. Wakefield calculations on parallel computers

    International Nuclear Information System (INIS)

    Schoessow, P.

    1990-01-01

    The use of parallelism in the solution of wakefield problems is illustrated for two different computer architectures (SIMD and MIMD). Results are given for finite difference codes which have been implemented on a Connection Machine and an Alliant FX/8 and which are used to compute wakefields in dielectric loaded structures. Benchmarks on code performance are presented for both cases. 4 refs., 3 figs., 2 tabs

  20. Automatic Management of Parallel and Distributed System Resources

    Science.gov (United States)

    Yan, Jerry; Ngai, Tin Fook; Lundstrom, Stephen F.

    1990-01-01

    Viewgraphs on automatic management of parallel and distributed system resources are presented. Topics covered include: parallel applications; intelligent management of multiprocessing systems; performance evaluation of parallel architecture; dynamic concurrent programs; compiler-directed system approach; lattice gaseous cellular automata; and sparse matrix Cholesky factorization.

  1. Heat transfer in smooth tubes, between parallel plates, along a semi-infinite plate, in annular spaces and along tube bundles for exponential distribution of the heat flux in forced, laminar or turbulent flow; Transfert de chaleur dans des tubes lisses, entre des plaques planes paralleles, le long d'une plaque plane, dans des espaces annulaires et le long de faisceaux tubulaires pour une repartition exponentielle du flux de chaleur en ecoulement force, laminaire ou turbulent

    Energy Technology Data Exchange (ETDEWEB)

    Graber, H [Commissariat a l' Energie Atomique, 91 - Saclay (France). Centre d' Etudes Nucleaires

    1969-04-01

    By introducing an additional parameter $F_0$, the methods known hitherto for calculating heat transfer are extended to heat flux distributions following an exponential law $q_w = \exp(mx)$, which give a heat transfer coefficient independent of position for laminar and turbulent flow with a linear pressure drop. For laminar flow along a semi-infinite plate, a heat flux distribution following the law $q_w = x^m$ leads to a Nusselt number that is independent of position. Nu is then determined by the thickness of the thermal boundary layer. For the annular space, the equations for explicit calculation of the temperature field will be given, as well as the Nusselt number in laminar flow and constant heat flux. In turbulent flow, the laws of distribution of eddy diffusivity for momentum in a tube, established by H. Reichardt and adapted for the annular space and the tube bundle, give the velocity field and the coefficient of friction and thus permit solution of the heat transfer equations. The results of the numerical calculation are given in tables and diagrams for an extended range of the various parameters and compared with the experimental results. A simple process to determine the lower limit of the thermal entry length will be described. (author)

  2. Heat transfer in smooth tubes, between parallel plates, along a semi-infinite plate, in annular spaces and along tube bundles for exponential distribution of the heat flux in forced, laminar or turbulent flow; Transfert de chaleur dans des tubes lisses, entre des plaques planes paralleles, le long d'une plaque plane, dans des espaces annulaires et le long de faisceaux tubulaires pour une repartition exponentielle du flux de chaleur en ecoulement force, laminaire ou turbulent

    Energy Technology Data Exchange (ETDEWEB)

    Graber, H. [Commissariat a l' Energie Atomique, 91 - Saclay (France). Centre d' Etudes Nucleaires

    1969-04-01

    By introducing an additional parameter $F_0$, the methods known hitherto for calculating heat transfer are extended to heat flux distributions following an exponential law $q_w = \exp(mx)$, which give a heat transfer coefficient independent of position for laminar and turbulent flow with a linear pressure drop. For laminar flow along a semi-infinite plate, a heat flux distribution following the law $q_w = x^m$ leads to a Nusselt number that is independent of position. Nu is then determined by the thickness of the thermal boundary layer. For the annular space, the equations for explicit calculation of the temperature field will be given, as well as the Nusselt number in laminar flow and constant heat flux. In turbulent flow, the laws of distribution of eddy diffusivity for momentum in a tube, established by H. Reichardt and adapted for the annular space and the tube bundle, give the velocity field and the coefficient of friction and thus permit solution of the heat transfer equations. The results of the numerical calculation are given in tables and diagrams for an extended range of the various parameters and compared with the experimental results. A simple process to determine the lower limit of the thermal entry length will be described. (author)

  3. Parallel Lines

    Directory of Open Access Journals (Sweden)

    James G. Worner

    2017-05-01

    James Worner is an Australian-based writer and scholar currently pursuing a PhD at the University of Technology Sydney. His research seeks to expose masculinities lost in the shadow of Australia’s Anzac hegemony while exploring new opportunities for contemporary historiography. He is the recipient of the Doctoral Scholarship in Historical Consciousness at the university’s Australian Centre of Public History and will be hosted by the University of Bologna during 2017 on a doctoral research writing scholarship. ‘Parallel Lines’ is one of a collection of stories, The Shapes of Us, exploring liminal spaces of modern life: class, gender, sexuality, race, religion and education. It looks at lives, like lines, that do not meet but which travel in proximity, simultaneously attracted and repelled. James’ short stories have been published in various journals and anthologies.

  4. Dvorak. Concerto pour violoncelle / Francis Dresel

    Index Scriptorium Estoniae

    Dresel, Francis

    1992-01-01

    About the new recording "Dvorak. Concerto pour violoncelle; Schumann: Concerto pour violoncelle. Orchestre Symphonique d'Estonie, Orchestre Symphonique de la Radio TV d'URSS, Neeme Järvi", Vogue "Archives Sovietiques" 651033, 1978.

  5. MulticoreBSP for C : A high-performance library for shared-memory parallel programming

    NARCIS (Netherlands)

    Yzelman, A. N.; Bisseling, R. H.; Roose, D.; Meerbergen, K.

    2014-01-01

    The bulk synchronous parallel (BSP) model, as well as parallel programming interfaces based on BSP, classically target distributed-memory parallel architectures. In earlier work, Yzelman and Bisseling designed a MulticoreBSP for Java library specifically for shared-memory architectures. In the

  6. Parallel consensual neural networks.

    Science.gov (United States)

    Benediktsson, J A; Sveinsson, J R; Ersoy, O K; Swain, P H

    1997-01-01

    A new type of neural-network architecture, the parallel consensual neural network (PCNN), is introduced and applied in classification/data fusion of multisource remote sensing and geographic data. The PCNN architecture is based on statistical consensus theory and involves using stage neural networks with transformed input data. The input data are transformed several times and the different transformed data are used as if they were independent inputs. The independent inputs are first classified using the stage neural networks. The output responses from the stage networks are then weighted and combined to make a consensual decision. In this paper, optimization methods are used to weight the outputs from the stage networks. Two approaches are proposed to compute the data transforms for the PCNN, one for binary data and another for analog data. The analog approach uses wavelet packets. The experimental results obtained with the proposed approach show that the PCNN outperforms both a conjugate-gradient backpropagation neural network and conventional statistical methods in terms of overall classification accuracy of test data.
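
    As a rough illustration of the consensual decision step described in this abstract, the sketch below combines per-class outputs from several stage networks with a weighted sum and picks the class with the largest combined score. The function name `consensual_decision`, the example outputs, and the fixed weights are hypothetical; the abstract states that the weights come from optimization methods, which are not reproduced here.

```python
def consensual_decision(stage_outputs, weights):
    """Combine per-class scores from several stage networks.

    stage_outputs: one list of class scores per stage network.
    weights: one weight per stage network (assumed to be given;
             the paper derives them by optimization).
    Returns (combined_scores, index_of_winning_class).
    """
    if len(stage_outputs) != len(weights):
        raise ValueError("need exactly one weight per stage network")
    n_classes = len(stage_outputs[0])
    combined = [0.0] * n_classes
    for scores, w in zip(stage_outputs, weights):
        for k in range(n_classes):
            combined[k] += w * scores[k]
    # The consensual decision is the class with the largest weighted score.
    winner = max(range(n_classes), key=lambda k: combined[k])
    return combined, winner


# Hypothetical outputs of three stage networks for a three-class problem.
outputs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.7, 0.2],
]
combined, winner = consensual_decision(outputs, weights=[0.5, 0.25, 0.25])
print(combined, winner)
```

    In this toy case the first stage network prefers class 0, but the weighted combination of all three selects class 1.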

  7. The language parallel Pascal and other aspects of the massively parallel processor

    Science.gov (United States)

    Reeves, A. P.; Bruner, J. D.

    1982-01-01

    A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.

  8. Parallel algorithms for continuum dynamics

    International Nuclear Information System (INIS)

    Hicks, D.L.; Liebrock, L.M.

    1987-01-01

    Simply porting existing parallel programs to a new parallel processor may not achieve the full speedup possible; achieving maximum efficiency may require redesigning the parallel algorithms for the specific architecture. The authors discuss here parallel algorithms that were developed first for the HEP processor and then ported to the CRAY X-MP/4, the ELXSI/10, and the Intel iPSC/32. Focus is mainly on the most recent parallel processing results produced, i.e., those on the Intel Hypercube. The applications are simulations of continuum dynamics in which the momentum and stress gradients are important. Examples of these are inertial confinement fusion experiments, severe breaks in the coolant system of a reactor, weapons physics, and shock-wave physics. Speedup efficiencies on the Intel iPSC Hypercube are very sensitive to the ratio of communication to computation. Great care must be taken in designing algorithms for this machine to avoid global communication. This is much more critical on the iPSC than it was on the three previous parallel processors.
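
    The sensitivity to the communication-to-computation ratio can be made concrete with a small model. The sketch below assumes, purely for illustration, that each node spends a time t on computation and r * t on communication, with no overlap between the two; under that assumption efficiency is 1 / (1 + r). The function names and the model itself are assumptions, not taken from the paper.

```python
def parallel_efficiency(comm_to_comp_ratio):
    """Efficiency under the assumed model T_node = t_comp * (1 + r)."""
    if comm_to_comp_ratio < 0:
        raise ValueError("ratio must be non-negative")
    return 1.0 / (1.0 + comm_to_comp_ratio)


def speedup(n_processors, comm_to_comp_ratio):
    """Speedup is the processor count scaled by the efficiency."""
    return n_processors * parallel_efficiency(comm_to_comp_ratio)


# A 32-node machine at three different communication/computation ratios.
for r in (0.0, 0.25, 1.0):
    print(r, parallel_efficiency(r), speedup(32, r))
```

    Under this model, a ratio of 1.0 already halves the achievable speedup, which is consistent with the abstract's advice to avoid global communication.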

  9. Dessiner ses plans avec QCad le DAO pour tous

    CERN Document Server

    Pascual, André

    2009-01-01

    QCad is free computer-aided drafting (CAD) software that makes it possible to produce rigorous, standardized drawings in every field (architecture, industrial drawing, schematics...) in a format understood by all graphics software. Far more accessible than AutoCAD in terms of ease of use (and price!), it runs on Windows and Mac OS X as well as on Linux, and combines user-friendliness with productivity to suit the novice as well as the more seasoned draftsperson.

  10. Fast parallel event reconstruction

    CERN Multimedia

    CERN. Geneva

    2010-01-01

    On-line processing of the large data volumes produced in modern HEP experiments requires using the maximum capabilities of modern and future many-core CPU and GPU architectures. One such powerful feature is the SIMD instruction set, which allows several data items to be packed into one register and operated on together, thus achieving more operations per clock cycle. Motivated by the idea of using the SIMD unit of modern processors, the KF-based track fit has been adapted for parallelism, including memory optimization, numerical analysis, vectorization with inline operator overloading, and optimization using SDKs. The speed of the algorithm has been increased 120000 times, with 0.1 ms/track, running in parallel on 16 SPEs of a Cell Blade computer. Running on a Nehalem CPU with 8 cores, it shows a processing speed of 52 ns/track using the Intel Threading Building Blocks. The same KF algorithm running on an Nvidia GTX 280 in the CUDA framework provi...
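
    For readers unfamiliar with the Kalman filter (KF) behind the track fit, the following is a minimal scalar sketch of the predict-free update step: each measurement refines an estimate and shrinks its variance. It is not the multi-dimensional, SIMD-vectorized track fit of the abstract; the function name `kalman_1d`, the measurement values, and the noise variance are all hypothetical.

```python
def kalman_1d(measurements, meas_var, init_estimate=0.0, init_var=1e6):
    """Scalar Kalman filter for a constant quantity (no process noise).

    Returns a list of (estimate, variance) pairs, one per measurement.
    """
    x, p = init_estimate, init_var
    history = []
    for z in measurements:
        k = p / (p + meas_var)   # Kalman gain: trust in the new measurement
        x = x + k * (z - x)      # move the estimate toward the measurement
        p = (1.0 - k) * p        # the variance shrinks after every update
        history.append((x, p))
    return history


# Hypothetical noisy measurements of a quantity whose true value is 1.0.
history = kalman_1d([1.1, 0.9, 1.0, 1.2, 0.8], meas_var=0.04)
final_estimate, final_variance = history[-1]
print(final_estimate, final_variance)
```

    Each update involves only a handful of arithmetic operations on independent data, which is the kind of work that can be packed into SIMD registers when many tracks are fitted at once.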

  11. Programming massively parallel processors a hands-on approach

    CERN Document Server

    Kirk, David B

    2010-01-01

    Programming Massively Parallel Processors discusses basic concepts about parallel programming and GPU architecture. "Massively parallel" refers to the use of a large number of processors to perform a set of computations in a coordinated parallel way. The book details various techniques for constructing parallel programs. It also discusses the development process, performance level, floating-point format, parallel patterns, and dynamic parallelism. The book serves as a teaching guide where parallel programming is the main topic of the course. It builds on the basics of C programming for CUDA, a parallel programming environment that is supported on NVIDIA GPUs. Composed of 12 chapters, the book begins with basic information about the GPU as a parallel computer source. It also explains the main concepts of CUDA, data parallelism, and the importance of memory access efficiency using CUDA. The target audience of the book is graduate and undergraduate students from all science and engineering disciplines who ...

  12. Architectural Contestation

    NARCIS (Netherlands)

    Merle, J.

    2012-01-01

    This dissertation addresses the reductive reading of Georges Bataille's work done within the field of architectural criticism and theory which tends to set aside the fundamental ‘broken’ totality of Bataille's oeuvre and also to narrowly interpret it as a mere critique of architectural form,

  13. Architecture Sustainability

    NARCIS (Netherlands)

    Avgeriou, Paris; Stal, Michael; Hilliard, Rich

    2013-01-01

    Software architecture is the foundation of software system development, encompassing a system's architects' and stakeholders' strategic decisions. A special issue of IEEE Software is intended to raise awareness of architecture sustainability issues and increase interest and work in the area. The

  14. Memory architecture

    NARCIS (Netherlands)

    2012-01-01

    A memory architecture is presented. The memory architecture comprises a first memory and a second memory. The first memory has at least a bank with a first width addressable by a single address. The second memory has a plurality of banks of a second width, said banks being addressable by components

  15. The BLAZE language - A parallel language for scientific programming

    Science.gov (United States)

    Mehrotra, Piyush; Van Rosendale, John

    1987-01-01

    A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.

  16. The BLAZE language: A parallel language for scientific programming

    Science.gov (United States)

    Mehrotra, P.; Vanrosendale, J.

    1985-01-01

    A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described and it is shown how this language would be used in typical scientific programming.

  17. Architectural Narratives

    DEFF Research Database (Denmark)

    Kiib, Hans

    2010-01-01

    In this essay, I focus on the combination of programs and the architecture of cultural projects that have emerged within the last few years. These projects are characterized as “hybrid cultural projects,” because they intend to combine experience with entertainment, play, and learning. This essay... a functional framework for these concepts, but tries increasingly to endow the main idea of the cultural project with a spatially aesthetic expression - a shift towards “experience architecture.” A great number of these projects typically recycle and reinterpret narratives related to historical buildings and architectural heritage; another group tries to embed new performative technologies in expressive architectural representation. Finally, this essay provides a theoretical framework for the analysis of the political rationales of these projects and for the architectural representation bridges the gap between...

  18. Computer programming and architecture the VAX

    CERN Document Server

    Levy, Henry

    2014-01-01

    Takes a unique systems approach to programming and architecture of the VAX. Using the VAX as a detailed example, the first half of this book offers a complete course in assembly language programming. The second describes higher-level systems issues in computer architecture. Highlights include the VAX assembler and debugger, other modern architectures such as RISCs, multiprocessing and parallel computing, microprogramming, caches and translation buffers, and an appendix on the Berkeley UNIX assembler.

  19. Power-efficient computer architectures recent advances

    CERN Document Server

    Själander, Magnus; Kaxiras, Stefanos

    2014-01-01

    As Moore's Law and Dennard scaling trends have slowed, the challenges of building high-performance computer architectures while maintaining acceptable power efficiency levels have heightened. Over the past ten years, architecture techniques for power efficiency have shifted from primarily focusing on module-level efficiencies, toward more holistic design styles based on parallelism and heterogeneity. This work highlights and synthesizes recent techniques and trends in power-efficient computer architecture. Table of Contents: Introduction / Voltage and Frequency Management / Heterogeneity and Sp

  20. Iterative algorithms for large sparse linear systems on parallel computers

    Science.gov (United States)

    Adams, L. M.

    1982-01-01

    Algorithms are developed for assembling in parallel the sparse systems of linear equations that result from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering. Parallel linear stationary iterative algorithms and parallel preconditioned conjugate gradient algorithms are developed for solving these systems. In addition, a model for comparing parallel algorithms on array architectures is developed and results of this model for the algorithms are given.
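
    As a small illustration of a linear stationary iterative method, the sketch below uses Jacobi iteration, chosen here only as a familiar example; the abstract does not say which stationary methods the author used. Every component of the new iterate depends only on the previous iterate, so the per-row updates are independent and could be distributed across processors. The function name `jacobi` and the 3x3 test system are hypothetical.

```python
def jacobi(a, b, iterations=50):
    """Jacobi iteration for a x = b, with a given as a dense list of rows.

    Assumes a is strictly diagonally dominant, which guarantees convergence.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        # Each entry uses only the previous iterate x, so the n updates
        # are mutually independent (the source of the parallelism).
        x = [
            (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
    return x


# Small tridiagonal system of the kind produced by a 1D finite difference
# discretization; its exact solution is [1, 2, 3].
a = [
    [4.0, -1.0, 0.0],
    [-1.0, 4.0, -1.0],
    [0.0, -1.0, 4.0],
]
b = [2.0, 4.0, 10.0]
solution = jacobi(a, b)
print(solution)
```

    A sparse storage format would replace the dense rows in practice; the dense form is used here only to keep the sketch short.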

  1. When It Rains, It Pours

    Science.gov (United States)

    Mills, Linda

    2012-01-01

    "It's raining, it's pouring, the old man is snoring!" "The itsy, bitsy spider crawled up the waterspout, down came the rain and washed the spider out. Out came the sun and dried up all the rain, and the itsy, bitsy spider went up the spout again." What do children's nursery rhymes have to do with the school library? The author begins by telling a…

  2. Design for scalability in 3D computer graphics architectures

    DEFF Research Database (Denmark)

    Holten-Lund, Hans Erik

    2002-01-01

    This thesis describes useful methods and techniques for designing scalable hybrid parallel rendering architectures for 3D computer graphics. Various techniques for utilizing parallelism in a pipelined system are analyzed. During the Ph.D. study a prototype 3D graphics architecture named Hybris has...

  3. Parallel Computing Strategies for Irregular Algorithms

    Science.gov (United States)

    Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
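
    The abstract does not describe its load-balancing techniques in detail. As a generic illustration of the idea, the sketch below uses a greedy longest-processing-time heuristic: work items are sorted by decreasing cost and each is assigned to the currently least-loaded processor. The function name `greedy_balance` and the cost values are hypothetical, and this is not the method used in the paper.

```python
import heapq


def greedy_balance(costs, n_procs):
    """Assign work items to processors, largest item first.

    Returns (assignment, loads), where assignment[p] lists the item
    indices given to processor p and loads[p] is their total cost.
    """
    heap = [(0, p) for p in range(n_procs)]  # (current load, processor id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_procs)]
    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    for idx in order:
        load, p = heapq.heappop(heap)        # least-loaded processor
        assignment[p].append(idx)
        heapq.heappush(heap, (load + costs[idx], p))
    loads = [sum(costs[i] for i in part) for part in assignment]
    return assignment, loads


# Hypothetical per-element work estimates for an irregular mesh.
costs = [7, 5, 4, 3, 3, 2]
assignment, loads = greedy_balance(costs, n_procs=2)
print(assignment, loads)
```

    Irregular, dynamic applications typically need to repeat such a step as the workload changes, which is part of what makes them hard to parallelize efficiently.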

  4. Architectural technology

    DEFF Research Database (Denmark)

    2005-01-01

    The booklet offers an overall introduction to the Institute of Architectural Technology and its projects and activities, and an invitation to the reader to contact the institute or the individual researcher for further information. The research, which takes place at the Institute of Architectural Technology at the Royal Danish Academy of Fine Arts, School of Architecture, reflects a spread between strategic, goal-oriented pilot projects, commissioned by a ministry, a fund or a private company, and on the other hand projects which originate from strong personal interests and enthusiasm of individual...

  5. Systemic Architecture

    DEFF Research Database (Denmark)

    Poletto, Marco; Pasquero, Claudia

    ...-up or tactical design, behavioural space and the boundary of the natural and the artificial realms within the city and architecture. A new kind of "real-time world-city" is illustrated in the form of an operational design manual for the assemblage of proto-architectures, the incubation of proto-gardens and the coding of proto-interfaces. These prototypes of machinic architecture materialize as synthetic hybrids embedded with biological life (proto-gardens), computational power, behavioural responsiveness (cyber-gardens), spatial articulation (coMachines and fibrous structures), remote sensing (FUNclouds...

  6. Humanizing Architecture

    DEFF Research Database (Denmark)

    Toft, Tanya Søndergaard

    2015-01-01

    The article proposes the urban digital gallery as an opportunity to explore the relationship between ‘human’ and ‘technology,’ through the programming of media architecture. It takes a curatorial perspective when proposing an ontological shift from considering media facades as visual spectacles ... agency and a sense of being by way of dematerializing architecture. This is achieved by way of programming the symbolic to provide new emotional realizations and situations of enlightenment in the public audience. This reflects a greater potential to humanize the digital in media architecture...

  7. MT-ADRES: Multithreading on Coarse-Grained Reconfigurable Architecture

    DEFF Research Database (Denmark)

    Wu, Kehuai; Kanstein, Andreas; Madsen, Jan

    2007-01-01

    The coarse-grained reconfigurable architecture ADRES (Architecture for Dynamically Reconfigurable Embedded Systems) and its compiler offer high instruction-level parallelism (ILP) to applications by means of a sparsely interconnected array of functional units and register files. As high-ILP architectures achieve only low parallelism when executing partially sequential code segments, which is also known as Amdahl’s law, this paper proposes to extend ADRES to MT-ADRES (Multi-Threaded ADRES) to also exploit thread-level parallelism. On MT-ADRES architectures, the array can be partitioned in multiple...
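
    Amdahl's law, which the abstract invokes, can be stated in a few lines. The sketch below computes the overall speedup when only a fraction of the execution benefits from parallel resources; the function name `amdahl_speedup` and the sample fractions are illustrative and are not figures from the paper.

```python
def amdahl_speedup(parallel_fraction, n_units):
    """Overall speedup when parallel_fraction of the run time is
    spread over n_units and the remainder stays sequential."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    sequential_fraction = 1.0 - parallel_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / n_units)


# With 8 functional units, partially sequential code gains far less than 8x.
for fraction in (0.5, 0.9, 1.0):
    print(fraction, amdahl_speedup(fraction, 8))
```

    With half of the code sequential, eight units yield a speedup below 2, which is the motivation for running several threads on separate partitions of the array.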

  8. Parallelism and Scalability in an Image Processing Application

    DEFF Research Database (Denmark)

    Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

    2008-01-01

    The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chip. This means that parallel processing is required in application areas that traditionally have not used parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately...

  9. Parallelism and Scalability in an Image Processing Application

    DEFF Research Database (Denmark)

    Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

    2009-01-01

    The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chips. This means that parallel processing is required in application areas that traditionally have not used parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately...

  10. Development of a parallel DBMS on the basis of PostgreSQL

    OpenAIRE

    Pan, C.

    2011-01-01

    The paper describes the architecture and the design of PargreSQL parallel database management system (DBMS) for distributed memory multiprocessors. PargreSQL is based upon PostgreSQL open-source DBMS and exploits partitioned parallelism.
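
    Partitioned parallelism can be pictured as splitting a table into fragments, letting each node filter its own fragment, and merging the partial results. The sketch below simulates this sequentially in plain Python. It is not the PargreSQL or PostgreSQL API; the function names, the modulo partitioning rule, and the sample table are all hypothetical.

```python
def partition_rows(rows, key, n_nodes):
    """Split rows into n_nodes fragments by the integer key modulo n_nodes."""
    fragments = [[] for _ in range(n_nodes)]
    for row in rows:
        fragments[row[key] % n_nodes].append(row)
    return fragments


def parallel_select(fragments, predicate):
    """Apply the same selection to every fragment, then merge the results.

    Each fragment is processed independently, so on a real system the
    inner loops would run on different nodes at the same time.
    """
    partial_results = [[r for r in frag if predicate(r)] for frag in fragments]
    return [row for part in partial_results for row in part]


# Hypothetical six-row table distributed over three nodes.
rows = [{"id": i, "amount": 10 * i} for i in range(1, 7)]
fragments = partition_rows(rows, key="id", n_nodes=3)
result = parallel_select(fragments, lambda r: r["amount"] > 30)
print(sorted(r["id"] for r in result))
```

    The merged rows arrive in fragment order rather than key order, so a query that needs ordering must sort after the merge.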

  11. The 2nd Symposium on the Frontiers of Massively Parallel Computations

    Science.gov (United States)

    Mills, Ronnie (Editor)

    1988-01-01

    Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.

  12. Architectural Theatricality

    DEFF Research Database (Denmark)

    Tvedebrink, Tenna Doktor Olsen

    environments and a knowledge gap therefore exists in present hospital designs. Consequently, the purpose of this thesis has been to investigate whether any research-based knowledge exists supporting the hypothesis that the interior architectural qualities of eating environments influence patient food intake, health and well-being, as well as to outline a set of basic design principles ‘predicting’ the future interior architectural qualities of patient eating environments. Methodologically the thesis is based on an explorative study employing an abductive approach and hermeneutic-interpretative strategy utilizing tactics ... and food intake, as well as a series of references exist linking the interior architectural qualities of healthcare environments with the health and wellbeing of patients. On the basis of these findings, the thesis presents the concept of Architectural Theatricality as well as a set of design principles...

  13. Des semences pour vivre | CRDI - Centre de recherches pour le ...

    International Development Research Centre (IDRC) Digital Library (Canada)

    12 July 2011 ... Across Canada, the market for organic food keeps growing, ... of the International Development Research Centre (IDRC) and of Inter ... traditional seed varieties and farming methods, in the face of ... Protecting access to water against the effects of urban sprawl and ...

  14. Architectural design and analysis of a programmable image processor

    International Nuclear Information System (INIS)

    Siyal, M.Y.; Chowdhry, B.S.; Rajput, A.Q.K.

    2003-01-01

    In this paper we present an architectural design and analysis of a programmable image processor, nicknamed Snake. The processor was designed with a high degree of parallelism to speed up a range of image processing operations. Data parallelism found in array processors has been included into the architecture of the proposed processor. The implementation of commonly used image processing algorithms and their performance evaluation are also discussed. The performance of Snake is also compared with other types of processor architectures. (author)

  15. Implementations of BLAST for parallel computers.

    Science.gov (United States)

    Jülich, A

    1995-02-01

    The BLAST sequence comparison programs have been ported to a variety of parallel computers: the shared-memory machine Cray Y-MP 8/864 and the distributed-memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799 residue protein query sequence and the protein database PIR were used.

  16. Parallel imaging microfluidic cytometer.

    Science.gov (United States)

    Ehrlich, Daniel J; McKenna, Brian K; Evans, James G; Belkina, Anna C; Denis, Gerald V; Sherr, David H; Cheung, Man Ching

    2011-01-01

    By adding a degree of freedom from multichannel flow, the parallel microfluidic cytometer (PMC) combines some of the best features of fluorescence-activated flow cytometry (FCM) and microscope-based high-content screening (HCS). The PMC (i) lends itself to fast processing of large numbers of samples, (ii) adds a 1D imaging capability for intracellular localization assays (HCS), (iii) has a high rare-cell sensitivity, and (iv) has an unusual capability for time-synchronized sampling. An inability to practically handle large sample numbers has restricted applications of conventional flow cytometers and microscopes in combinatorial cell assays, network biology, and drug discovery. The PMC promises to relieve a bottleneck in these previously constrained applications. The PMC may also be a powerful tool for finding rare primary cells in the clinic. The multichannel architecture of current PMC prototypes allows 384 unique samples for a cell-based screen to be read out in ∼6-10 min, about 30 times the speed of most current FCM systems. In 1D intracellular imaging, the PMC can obtain protein localization using HCS marker strategies at many times the sample throughput of charge-coupled device (CCD)-based microscopes or CCD-based single-channel flow cytometers. The PMC also permits the signal integration time to be varied over a larger range than is practical in conventional flow cytometers. The signal-to-noise advantages are useful, for example, in counting rare positive cells in the most difficult early stages of genome-wide screening. We review the status of parallel microfluidic cytometry and discuss some of the directions the new technology may take. Copyright © 2011 Elsevier Inc. All rights reserved.

  17. Lock-free parallel garbage collection

    NARCIS (Netherlands)

    H. Gao; J.F. Groote (Jan Friso); W.H. Hesselink (Wim)

    2005-01-01

    This paper presents a lock-free parallel algorithm for mark&sweep garbage collection (GC) in a realistic model using synchronization primitives compare-and-swap (CAS) and load-linked/store-conditional (LL/SC) offered by machine architectures. Mutators and collectors can simultaneously

  18. Heuristic framework for parallel sorting computations | Nwanze ...

    African Journals Online (AJOL)

    Parallel sorting techniques have become of practical interest with the advent of new multiprocessor architectures. The decreasing cost of these processors will probably make the solutions derived from them more appealing in the future. Efficient algorithms for sorting schemes that are encountered in a number of ...

  19. Experience with a clustered parallel reduction machine

    NARCIS (Netherlands)

    Beemster, M.; Hartel, Pieter H.; Hertzberger, L.O.; Hofman, R.F.H.; Langendoen, K.G.; Li, L.L.; Milikowski, R.; Vree, W.G.; Barendregt, H.P.; Mulder, J.C.

    A clustered architecture has been designed to exploit divide and conquer parallelism in functional programs. The programming methodology developed for the machine is based on explicit annotations and program transformations. It has been successfully applied to a number of algorithms resulting in a

  20. Architectural freedom and industrialized architecture

    DEFF Research Database (Denmark)

    Vestergaard, Inge

    2012-01-01

    to explain that architecture can be thought of as a complex and diverse design through customization, telling exactly the revitalized story about the change to a contemporary sustainable and better performing expression in direct relation to the given context. Through the last couple of years we have...... proportions, to organize the process on site choosing either one room wall components or several rooms wall components – either horizontally or vertically. Combined with the seamless joint the playing with these possibilities the new industrialized architecture can deliver variations in choice of solutions...... for retrofit design. If we add the question of the installations e.g. ventilation to this systematic thinking of building technique we get a diverse and functional architecture, thereby creating a new and clearer story telling about new and smart system based thinking behind architectural expression....

  1. Architectural freedom and industrialized architecture

    DEFF Research Database (Denmark)

    Vestergaard, Inge

    2012-01-01

    to explain that architecture can be thought of as a complex and diverse design through customization, telling exactly the revitalized story about the change to a contemporary sustainable and better performing expression in direct relation to the given context. Through the last couple of years we have...... expression in the specific housing area. It is the aim of this article to expand the different design strategies which architects can use – to give the individual project attitudes and designs with architectural quality. Through the customized component production it is possible to choose different...... for retrofit design. If we add the question of the installations e.g. ventilation to this systematic thinking of building technique we get a diverse and functional architecture, thereby creating a new and clearer story telling about new and smart system based thinking behind architectural expression....

  2. Architectural freedom and industrialised architecture

    DEFF Research Database (Denmark)

    Vestergaard, Inge

    2012-01-01

    Architectural freedom and industrialized architecture. Inge Vestergaard, Associate Professor, Cand. Arch. Aarhus School of Architecture, Denmark Noerreport 20, 8000 Aarhus C Telephone +45 89 36 0000 E-mail inge.vestergaard@aarch.dk Based on the repetitive architecture from the "building boom" 1960...... customization, telling exactly the revitalized story about the change to a contemporary sustainable and better performed expression in direct relation to the given context. Through the last couple of years we have in Denmark been focusing on a more sustainable and low energy building technique, which also include...... to the building physics problems a new industrialized period has started based on light weight elements basically made of wooden structures, faced with different suitable materials meant for individual expression for the specific housing area. It is the purpose of this article to widen the different design...

  3. PICNIC Architecture.

    Science.gov (United States)

    Saranummi, Niilo

    2005-01-01

    The PICNIC architecture aims at supporting inter-enterprise integration and facilitating collaboration between healthcare organisations. The concept of a Regional Health Economy (RHE) is introduced to illustrate the varying nature of inter-enterprise collaboration between healthcare organisations collaborating in providing health services to citizens and patients in a regional setting. The PICNIC architecture comprises a number of PICNIC IT Services and the interfaces between them, and presents a way to assemble these into a functioning Regional Health Care Network (RHCN) meeting the needs and concerns of its stakeholders. The PICNIC architecture is presented through a number of views relevant to different stakeholder groups. The stakeholders of the first view are national and regional health authorities and policy makers. The view describes how the architecture enables the implementation of national and regional health policies, strategies and organisational structures. The stakeholders of the second view, the service viewpoint, are the care providers, health professionals, patients and citizens. The view describes how the architecture supports and enables regional care delivery and process management including continuity of care (shared care) and citizen-centred health services. The stakeholders of the third view, the engineering view, are those that design, build and implement the RHCN. The view comprises four sub views: software engineering, IT services engineering, security and data. The proposed architecture is grounded in the mainstream of how distributed computing environments are evolving. The architecture is realised using the web services approach. A number of well established technology platforms and generic standards exist that can be used to implement the software components. The software components that are specified in PICNIC are implemented in Open Source.

  4. Architectural freedom and industrialised architecture

    DEFF Research Database (Denmark)

    Vestergaard, Inge

    2012-01-01

    to the building physics problems a new industrialized period has started based on light weight elements basically made of wooden structures, faced with different suitable materials meant for individual expression for the specific housing area. It is the purpose of this article to widen the different design...... to this systematic thinking of the building technique we get a diverse and functional architecture, creating a new and clearer story telling about new and smart system based thinking behind the architectural expression....

  5. Monument et espace urbain. Pour une Sémiotique des parcours et des structures de la ville

    Directory of Open Access Journals (Sweden)

    Ruggero Ragonese

    2012-11-01

    However, starting from this slim bibliographic corpus, one can look for foundations on which to begin work on the architectural text, based on the study of monuments, that is capable of describing the process of urban transformation and the forms of built space.

  6. Parallel evolutionary computation in bioinformatics applications.

    Science.gov (United States)

    Pinho, Jorge; Sobral, João Luis; Rocha, Miguel

    2013-05-01

    A large number of optimization problems within the field of Bioinformatics require methods able to handle their inherent complexity (e.g. NP-hard problems) and also demand increased computational effort. In this context, the use of parallel architectures is a necessity. In this work, we propose ParJECoLi, a Java-based library that offers a large set of metaheuristic methods (such as Evolutionary Algorithms) and also addresses the issue of their efficient execution on a wide range of parallel architectures. The proposed approach focuses on ease of use, making the adaptation to distinct parallel environments (multicore, cluster, grid) transparent to the user. Indeed, this work shows how the development of the optimization library can proceed independently of its adaptation for several architectures, making use of Aspect-Oriented Programming. The pluggable nature of parallelism-related modules allows the user to easily configure the environment, adding parallelism modules to the base source code when needed. The performance of the platform is validated with two case studies within biological model optimization. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  7. Multiprocessor architecture: Synthesis and evaluation

    Science.gov (United States)

    Standley, Hilda M.

    1990-01-01

    Multiprocessor computer architecture evaluation for structural computations is the focus of the research effort described. Results obtained are expected to lead to more efficient use of existing architectures and to suggest designs for new, application-specific architectures. The brief descriptions given outline a number of related efforts directed toward this purpose. The difficulty in analyzing an existing architecture or in designing a new computer architecture lies in the fact that the performance of a particular architecture, within the context of a given application, is determined by a number of factors. These include, but are not limited to, the efficiency of the computation algorithm, the programming language and support environment, the quality of the program written in the programming language, the multiplicity of the processing elements, the characteristics of the individual processing elements, the interconnection network connecting processors and non-local memories, and the shared memory organization covering the spectrum from no shared memory (all local memory) to one global access memory. These performance determiners may be loosely classified as being software or hardware related. This distinction is not clear or even appropriate in many cases. The effect of the choice of algorithm is ignored by assuming that the algorithm is specified as given. Effort directed toward the removal of the effect of the programming language and program resulted in the design of a high-level parallel programming language. Two characteristics of the fundamental structure of the architecture (memory organization and interconnection network) are examined.

  8. Architectural geometry

    KAUST Repository

    Pottmann, Helmut; Eigensatz, Michael; Vaxman, Amir; Wallner, Johannes

    2014-01-01

    Around 2005 it became apparent in the geometry processing community that freeform architecture contains many problems of a geometric nature to be solved, and many opportunities for optimization which however require geometric understanding. This area of research, which has been called architectural geometry, meanwhile contains a great wealth of individual contributions which are relevant in various fields. For mathematicians, the relation to discrete differential geometry is significant, in particular the integrable system viewpoint. Besides, new application contexts have become available for quite some old-established concepts. Regarding graphics and geometry processing, architectural geometry yields interesting new questions but also new objects, e.g. replacing meshes by other combinatorial arrangements. Numerical optimization plays a major role but in itself would be powerless without geometric understanding. Summing up, architectural geometry has become a rewarding field of study. We here survey the main directions which have been pursued, we show real projects where geometric considerations have played a role, and we outline open problems which we think are significant for the future development of both theory and practice of architectural geometry.

  9. Architectural geometry

    KAUST Repository

    Pottmann, Helmut

    2014-11-26

    Around 2005 it became apparent in the geometry processing community that freeform architecture contains many problems of a geometric nature to be solved, and many opportunities for optimization which however require geometric understanding. This area of research, which has been called architectural geometry, meanwhile contains a great wealth of individual contributions which are relevant in various fields. For mathematicians, the relation to discrete differential geometry is significant, in particular the integrable system viewpoint. Besides, new application contexts have become available for quite some old-established concepts. Regarding graphics and geometry processing, architectural geometry yields interesting new questions but also new objects, e.g. replacing meshes by other combinatorial arrangements. Numerical optimization plays a major role but in itself would be powerless without geometric understanding. Summing up, architectural geometry has become a rewarding field of study. We here survey the main directions which have been pursued, we show real projects where geometric considerations have played a role, and we outline open problems which we think are significant for the future development of both theory and practice of architectural geometry.

  10. Shared Variable Oriented Parallel Precompiler for SPMD Model

    Institute of Scientific and Technical Information of China (English)

    1995-01-01

    At present, commercial parallel computer systems with distributed memory architecture are usually provided with parallel FORTRAN or parallel C compilers, which are just traditional sequential FORTRAN or C compilers extended with communication statements. Programmers suffer from having to write parallel programs with explicit communication statements. The Shared Variable Oriented Parallel Precompiler (SVOPP) proposed in this paper can automatically generate appropriate communication statements based on shared variables for the SPMD (Single Program Multiple Data) computation model and greatly eases parallel programming while keeping communication efficiency high. The core function of the parallel C precompiler has been successfully verified on a transputer-based parallel computer. Its strong performance suggests that SVOPP is probably a breakthrough in parallel programming technique.

  11. Parallel computing of physical maps--a comparative study in SIMD and MIMD parallelism.

    Science.gov (United States)

    Bhandarkar, S M; Chirravuri, S; Arnold, J

    1996-01-01

    Ordering clones from a genomic library into physical maps of whole chromosomes presents a central computational problem in genetics. Chromosome reconstruction via clone ordering is usually isomorphic to the NP-complete Optimal Linear Arrangement problem. Parallel SIMD and MIMD algorithms for simulated annealing based on Markov chain distribution are proposed and applied to the problem of chromosome reconstruction via clone ordering. Perturbation methods and problem-specific annealing heuristics are proposed and described. The SIMD algorithms are implemented on a 2048 processor MasPar MP-2 system which is an SIMD 2-D toroidal mesh architecture whereas the MIMD algorithms are implemented on an 8 processor Intel iPSC/860 which is an MIMD hypercube architecture. A comparative analysis of the various SIMD and MIMD algorithms is presented in which the convergence, speedup, and scalability characteristics of the various algorithms are analyzed and discussed. On a fine-grained, massively parallel SIMD architecture with a low synchronization overhead such as the MasPar MP-2, a parallel simulated annealing algorithm based on multiple periodically interacting searches performs the best. For a coarse-grained MIMD architecture with high synchronization overhead such as the Intel iPSC/860, a parallel simulated annealing algorithm based on multiple independent searches yields the best results. In either case, distribution of clonal data across multiple processors is shown to exacerbate the tendency of the parallel simulated annealing algorithm to get trapped in a local optimum.
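    The "multiple independent searches" strategy mentioned in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the swap-based perturbation, the geometric cooling schedule, the parameter values, and the function names `arrangement_cost`, `anneal`, and `independent_searches` are all assumptions made for illustration, and Python threads merely stand in for the separate processors of an MIMD machine.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def arrangement_cost(order, edges):
    """Linear arrangement cost: sum of |pos(u) - pos(v)| over all edges."""
    pos = {node: i for i, node in enumerate(order)}
    return sum(abs(pos[u] - pos[v]) for u, v in edges)


def anneal(n_nodes, edges, seed, steps=2000, t0=2.0, cooling=0.995):
    """One annealing search with its own random stream (hypothetical parameters)."""
    rng = random.Random(seed)
    order = list(range(n_nodes))
    rng.shuffle(order)
    cost = arrangement_cost(order, edges)
    best_order, best_cost = order[:], cost
    temp = t0
    for _ in range(steps):
        # Perturbation: swap two positions in the ordering.
        i, j = rng.sample(range(n_nodes), 2)
        order[i], order[j] = order[j], order[i]
        new_cost = arrangement_cost(order, edges)
        delta = new_cost - cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            if cost < best_cost:
                best_order, best_cost = order[:], cost
        else:
            order[i], order[j] = order[j], order[i]  # reject: undo the swap
        temp *= cooling
    return best_cost, best_order


def independent_searches(n_nodes, edges, seeds):
    """Run one search per seed with no interaction, then keep the best result."""
    with ThreadPoolExecutor(max_workers=len(seeds)) as pool:
        results = list(pool.map(lambda s: anneal(n_nodes, edges, s), seeds))
    return min(results, key=lambda r: r[0])
```

    Because the searches never exchange state, the only synchronization is the final reduction to the lowest cost, which is consistent with the abstract's observation that this variant suits machines with high synchronization overhead. A periodically interacting variant would additionally exchange the best ordering between searches at fixed intervals.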

  12. Relational Architecture

    DEFF Research Database (Denmark)

    Reeh, Henrik

    2018-01-01

    in a scholarly institution (element #3), as well as the certified PhD scholar (element #4) and the architectural profession, notably its labour market (element #5). This first layer outlines the contemporary context which allows architectural research to take place in a dynamic relationship to doctoral education...... a human and institutional development going on since around 1990 when the present PhD institution was first implemented in Denmark. To be sure, the model is centred around the PhD dissertation (element #1). But it involves four more components: the PhD candidate (element #2), his or her supervisor...... and interrelated fields in which history, place, and sound come to emphasize architecture’s relational qualities rather than the apparent three-dimensional solidity of constructed space. A third layer of relational architecture is at stake in the professional experiences after the defence of the authors...

  13. Architectural Anthropology

    DEFF Research Database (Denmark)

    Stender, Marie

    Architecture and anthropology have always had a common focus on dwelling, housing, urban life and spatial organisation. Current developments in both disciplines make it even more relevant to explore their boundaries and overlaps. Architects are inspired by anthropological insights and methods......, while recent material and spatial turns in anthropology have also brought an increasing interest in design, architecture and the built environment. Understanding the relationship between the social and the physical is at the heart of both disciplines, and they can obviously benefit from further...... collaboration: How can qualitative anthropological approaches contribute to contemporary architecture? And just as importantly: What can anthropologists learn from architects’ understanding of spatial and material surroundings? Recent theoretical developments in anthropology stress the role of materials...

  14. Architectural Engineers

    DEFF Research Database (Denmark)

    Petersen, Rikke Premer

    engineering is addressed from two perspectives – as an educational response and an occupational constellation. Architecture and engineering are two of the traditional design professions and they frequently meet in the occupational setting, but at educational institutions they remain largely estranged....... The paper builds on a multi-sited study of an architectural engineering program at the Technical University of Denmark and an architectural engineering team within an international engineering consultancy based in Denmark. They are both responding to new tendencies within the building industry where...... the role of engineers and architects increasingly overlap during the design process, but their approaches reflect different perceptions of the consequences. The paper discusses some of the challenges that design education, not only within engineering, is facing today: young designers must be equipped...

  15. Reframing Architecture

    DEFF Research Database (Denmark)

    Riis, Søren

    2013-01-01

    I would like to thank Prof. Stephen Read (2011) and Prof. Andrew Benjamin (2011) for both giving inspiring and elaborate comments on my article “Dwelling in-between walls: the architectural surround”. As I will try to demonstrate below, their two different responses not only supplement my article...... focuses on how the absence of an initial distinction might threaten the endeavour of my paper. In my reply to Read and Benjamin, I will discuss their suggestions and arguments, while at the same time hopefully clarifying the postphenomenological approach to architecture....

  16. Experimental high energy physics and modern computer architectures

    International Nuclear Information System (INIS)

    Hoek, J.

    1988-06-01

    The paper examines how experimental High Energy Physics can use modern computer architectures efficiently. In this connection parallel and vector architectures are investigated, and the types available at the moment for general use are discussed. A separate section briefly describes some architectures that are either a combination of both, or exemplify other architectures. In an appendix some directions in which computing seems to be developing in the USA are mentioned. (author)

  17. PRISMA/DB: A Parallel Main-Memory Relational DBMS

    NARCIS (Netherlands)

    Apers, Peter M.G.; Flokstra, Jan; van den Berg, Carel A.; Grefen, P.W.P.J.; Wilschut, A.N.; Kersten, Martin L.; van den Berg, C.A.

    1992-01-01

    PRISMA/DB, a full-fledged parallel, main memory relational database management system (DBMS) is described. PRISMA/DB's high performance is obtained by the use of parallelism for query processing and main memory storage of the entire database. A flexible architecture for experimenting with

  18. From parallel to distributed computing for reactive scattering calculations

    International Nuclear Information System (INIS)

    Lagana, A.; Gervasi, O.; Baraglia, R.

    1994-01-01

    Some reactive scattering codes have been ported to different innovative computer architectures ranging from massively parallel machines to clustered workstations. The porting has required a drastic restructuring of the codes to single out computationally decoupled, CPU-intensive subsections. The suitability of different theoretical approaches for parallel and distributed computing restructuring is discussed and the efficiency of related algorithms is evaluated.

  19. Comparison of some parallelization strategies of thermalhydraulic codes on GPUs

    International Nuclear Information System (INIS)

    Jendoubi, T.; Bergeaud, V.; Geay, A.

    2013-01-01

    Modern supercomputer architecture is now often based on hybrid concepts combining distributed-memory parallelism, shared-memory parallelism, and GPUs (Graphics Processing Units). In this work, we propose a new approach to take advantage of these graphics cards in thermohydraulics algorithms. (authors)

  20. Computer architecture fundamentals and principles of computer design

    CERN Document Server

    Dumas II, Joseph D

    2005-01-01

    Introduction to Computer Architecture; What is Computer Architecture?; Architecture vs. Implementation; Brief History of Computer Systems; The First Generation; The Second Generation; The Third Generation; The Fourth Generation; Modern Computers - The Fifth Generation; Types of Computer Systems; Single Processor Systems; Parallel Processing Systems; Special Architectures; Quality of Computer Systems; Generality and Applicability; Ease of Use; Expandability; Compatibility; Reliability; Success and Failure of Computer Architectures and Implementations; Quality and the Perception of Quality; Cost Issues; Architectural Openness, Market Timi

  1. Textile Architecture

    DEFF Research Database (Denmark)

    Heimdal, Elisabeth Jacobsen

    2010-01-01

    Textiles can be used as building skins, adding new aesthetic and functional qualities to architecture. Just like we as humans can put on a coat, buildings can also get dressed. Depending on our mood, or on the weather, we can change coat, and so can the building. But the idea of using textiles...

  2. A high performance parallel approach to medical imaging

    International Nuclear Information System (INIS)

    Frieder, G.; Frieder, O.; Stytz, M.R.

    1988-01-01

    Research into medical imaging using general purpose parallel processing architectures is described and a review of the performance of previous medical imaging machines is provided. Results demonstrating that general purpose parallel architectures can achieve performance comparable to other, specialized, medical imaging machine architectures are presented. A new back-to-front hidden-surface removal algorithm is described. Results demonstrating the computational savings obtained by using the modified back-to-front hidden-surface removal algorithm are presented. Performance figures for forming a full-scale medical image on a mesh-interconnected multiprocessor are presented.

  3. Parallel Programming with Intel Parallel Studio XE

    CERN Document Server

    Blair-Chappell, Stephen

    2012-01-01

    Optimize code for multi-core processors with Intel's Parallel Studio. Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write programs for multicore and leverage the power of multicore in your programs. Sharing hands-on case studies and real-world examples, the

  4. Collision detection of convex polyhedra on the NVIDIA GPU architecture for the discrete element method

    CSIR Research Space (South Africa)

    Govender, Nicolin

    2015-09-01

    consideration due to the architectural differences between CPU and GPU platforms. This paper describes the DEM algorithms and heuristics that are optimized for the parallel NVIDIA Kepler GPU architecture in detail. This includes a GPU optimized collision...

  5. Parallel Ada benchmarks for the SVMS

    Science.gov (United States)

    Collard, Philippe E.

    1990-01-01

    The use of the parallel processing paradigm to design and develop faster and more reliable computers appears to clearly mark the future of information processing. NASA started the development of such an architecture: the Spaceborne VHSIC Multi-processor System (SVMS). Ada will be one of the languages used to program the SVMS. One of the unique characteristics of Ada is that it supports parallel processing at the language level through the tasking constructs. It is important for the SVMS project team to assess how efficiently the SVMS architecture will be implemented, as well as how efficiently the Ada environment will be ported to the SVMS. AUTOCLASS II, a Bayesian classifier written in Common Lisp, was selected as one of the benchmarks for SVMS configurations. The purpose of the R and D effort was to provide the SVMS project team with a version of AUTOCLASS II, written in Ada, that would make use of Ada tasking constructs as much as possible so as to constitute a suitable benchmark. Additionally, a set of programs was developed that would measure Ada tasking efficiency on parallel architectures as well as determine the critical parameters influencing tasking efficiency. All this was designed to provide the SVMS project team with a set of suitable tools for the development of the SVMS architecture.

  6. From green architecture to architectural green

    DEFF Research Database (Denmark)

    Earon, Ofri

    2011-01-01

    that describes the architectural exclusivity of this particular architecture genre. The adjective green expresses architectural qualities differentiating green architecture from non-green architecture. Currently, adding trees and vegetation to the building’s facade is the main architectural characteristics...... they have overshadowed the architectural potential of green architecture. The paper questions how a green space should perform, look and function. Two examples are chosen to demonstrate thorough integrations between green and space. The examples are public buildings categorized as pavilions. One......The paper investigates the topic of green architecture from an architectural point of view and not an energy point of view. The purpose of the paper is to establish a debate about the architectural language and spatial characteristics of green architecture. In this light, green becomes an adjective...

  7. Array processor architecture

    Science.gov (United States)

    Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)

    1983-01-01

    A high speed parallel array data processing architecture fashioned under a computational envelope approach includes a data base memory for secondary storage of programs and data, and a plurality of memory modules interconnected to a plurality of processing modules by a connection network of the Omega gender. Programs and data are fed from the data base memory to the plurality of memory modules, and from there the programs are fed through the connection network to the array of processors (one copy of each program for each processor). Execution of the programs occurs with the processors normally operating quite independently of each other in a multiprocessing fashion. For data dependent operations and other suitable operations, all processors are instructed to finish one given task or program branch before all are instructed to proceed in parallel processing fashion on the next instruction. Even when functioning in the parallel processing mode, however, the processors are not in lock-step but execute their own copy of the program individually unless or until another overall processor array synchronization instruction is issued.
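    The synchronization behaviour described in this abstract (processors run independently, then all finish one task before any proceeds to the next) resembles a software barrier. The following Python sketch illustrates that idea only; it does not model the patented hardware. The function name `run_phases` and the two-phase structure are hypothetical, and threads stand in for processing modules.

```python
import threading


def run_phases(n_workers, phase_one, phase_two):
    """Run phase_one on every worker, wait for all to finish, then run phase_two."""
    barrier = threading.Barrier(n_workers)  # plays the role of the array-wide sync instruction
    log = []
    log_lock = threading.Lock()

    def worker(rank):
        # Each worker runs its own copy of the program at its own pace.
        first = phase_one(rank)
        with log_lock:
            log.append(("phase1", rank, first))
        # No worker continues until every worker has finished phase one.
        barrier.wait()
        second = phase_two(rank, first)
        with log_lock:
            log.append(("phase2", rank, second))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

    Within a phase the order of log entries varies from run to run, which mirrors the "not in lock-step" behaviour; only the boundary between phases is enforced.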

  8. Chip architecture - A revolution brewing

    Science.gov (United States)

    Guterl, F.

    1983-07-01

    Techniques being explored by microchip designers and manufacturers to speed up both memory access and instruction execution while protecting memory are discussed. Attention is given to hardwiring control logic, pipelining for parallel processing, devising orthogonal instruction sets for interchangeable instruction fields, and the development of hardware for implementation of virtual memory and multiuser systems to provide memory management and protection. The inclusion of microcode in mainframes eliminated logic circuits that control timing and gating of the CPU. However, improvements in memory architecture have reduced access time to below that needed for instruction execution. Hardwiring the functions as a virtual memory enhances memory protection. Parallelism involves a redundant architecture, which allows identical operations to be performed simultaneously, and can be directed with microcode to avoid abortion of intermediate instructions once one set of instructions has been completed.

  9. Bayer image parallel decoding based on GPU

    Science.gov (United States)

    Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua

    2012-11-01

    In photoelectrical tracking systems, Bayer images are traditionally decompressed on the CPU. However, this is too slow when the images become large, for example, 2K×2K×16bit. In order to accelerate Bayer image decoding, this paper introduces a parallel speedup method for NVIDIA's Graphics Processing Unit (GPU), which supports the CUDA architecture. The decoding procedure can be divided into three parts: the first is a serial part, the second is a task-parallel part, and the last is a data-parallel part including inverse quantization, inverse discrete wavelet transform (IDWT) and image post-processing. To reduce the execution time, the task-parallel part is optimized with OpenMP techniques. The data-parallel part gains efficiency by executing on the GPU as a CUDA parallel program. The optimization techniques include instruction optimization, shared memory access optimization, coalesced memory access optimization and texture memory optimization. In particular, the IDWT can be sped up significantly by rewriting the 2D (two-dimensional) serial IDWT as a 1D parallel IDWT. In experiments with a 1K×1K×16bit Bayer image, the data-parallel part is more than 10 times faster than the CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental results show that it achieves a 3 to 5 times speed increase compared to the CPU serial method.

  10. MUF architecture /art London

    DEFF Research Database (Denmark)

    Svenningsen Kajita, Heidi

    2009-01-01

    About MUF architecture, with an interview with Liza Fior and Katherine Clarke, partners in muf architecture/art...

  11. Data parallel sorting for particle simulation

    Science.gov (United States)

    Dagum, Leonardo

    1992-01-01

    Sorting on a parallel architecture is a communications intensive event which can incur a high penalty in applications where it is required. In the case of particle simulation, only integer sorting is necessary, and sequential implementations easily attain the minimum performance bound of O(N) for N particles. Parallel implementations, however, have to cope with the parallel sorting problem which, in addition to incurring a heavy communications cost, can make the minimum performance bound difficult to attain. This paper demonstrates how the sorting problem in a particle simulation can be reduced to a merging problem, and describes an efficient data parallel algorithm to solve this merging problem in a particle simulation. The new algorithm is shown to be optimal under conditions usual for particle simulation, and its fieldwise implementation on the Connection Machine is analyzed in detail. The new algorithm is about four times faster than a fieldwise implementation of radix sort on the Connection Machine.
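
    The O(N) sequential bound mentioned in this abstract follows from the keys being small integers (for example, cell indices). The sketch below shows only that sequential baseline as a counting sort; it is not the paper's data parallel merging algorithm, and the function name and the choice of cell indices as keys are assumptions made for illustration.

```python
def sort_particles_by_cell(cells, num_cells):
    """Return particle indices ordered by cell index in O(N + num_cells) time."""
    # counts[c + 1] first holds the number of particles in cell c.
    counts = [0] * (num_cells + 1)
    for c in cells:
        counts[c + 1] += 1
    # Prefix sum: counts[c] becomes the first output slot for cell c.
    for k in range(num_cells):
        counts[k + 1] += counts[k]
    order = [0] * len(cells)
    for i, c in enumerate(cells):
        order[counts[c]] = i      # stable: earlier particles keep their order
        counts[c] += 1
    return order
```

    Each particle is touched a constant number of times, so the cost grows linearly with N as long as the number of cells is at most proportional to N.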

  12. Parallel computation of nondeterministic algorithms in VLSI

    Energy Technology Data Exchange (ETDEWEB)

    Hortensius, P D

    1987-01-01

    This work examines parallel VLSI implementations of nondeterministic algorithms. It is demonstrated that conventional pseudorandom number generators are unsuitable for highly parallel applications. Efficient parallel pseudorandom sequence generation can be accomplished using certain classes of elementary one-dimensional cellular automata. The pseudorandom numbers appear in parallel on each clock cycle. Extensive study of the properties of these new pseudorandom number generators is made using standard empirical random number tests, cycle length tests, and implementation considerations. Furthermore, it is shown these particular cellular automata can form the basis of efficient VLSI architectures for computations involved in the Monte Carlo simulation of both the percolation and Ising models from statistical mechanics. Finally, a variation on a Built-In Self-Test technique based upon cellular automata is presented. These Cellular Automata-Logic-Block-Observation (CALBO) circuits improve upon conventional design for testability circuitry.
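
    A generator built from an elementary one-dimensional cellular automaton can be sketched in a few lines. The abstract does not say which rules were used, so rule 30 on a ring of cells is assumed here purely as an illustration; `ca_step` and `ca_random_words` are hypothetical names.

```python
def ca_step(cells, rule=30):
    """Advance a ring of 0/1 cells by one step of an elementary CA rule."""
    n = len(cells)
    return [
        # The (left, centre, right) neighbourhood selects one bit of the rule number.
        (rule >> ((cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

def ca_random_words(seed_cells, count, rule=30):
    """Return `count` integers, one per clock step, reading all cells at once."""
    cells = list(seed_cells)
    words = []
    for _ in range(count):
        cells = ca_step(cells, rule)
        words.append(int("".join(map(str, cells)), 2))
    return words
```

    Every cell depends only on its two neighbours, so in hardware all bits of a word can be updated in the same clock cycle, which matches the abstract's remark that the numbers appear in parallel on each cycle. Statistical quality depends strongly on the rule and register width, and this sketch makes no claim about it.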

  13. Domain decomposition methods and parallel computing

    International Nuclear Information System (INIS)

    Meurant, G.

    1991-01-01

    In this paper, we show how to efficiently solve large linear systems on parallel computers. These linear systems arise from discretization of scientific computing problems described by systems of partial differential equations. We show how to get a discrete finite dimensional system from the continuous problem and the chosen conjugate gradient iterative algorithm is briefly described. Then, the different kinds of parallel architectures are reviewed and their advantages and deficiencies are emphasized. We sketch the problems found in programming the conjugate gradient method on parallel computers. For this algorithm to be efficient on parallel machines, domain decomposition techniques are introduced. We give results of numerical experiments showing that these techniques allow a good rate of convergence for the conjugate gradient algorithm as well as computational speeds in excess of a billion floating point operations per second. (author). 5 refs., 11 figs., 2 tabs., 1 inset
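
    For readers unfamiliar with the method named in this abstract, the following is a minimal, unpreconditioned conjugate gradient sketch for a small dense symmetric positive definite system. It is a textbook formulation written for illustration, not the paper's code; the dense list-of-rows matrix format is an assumption.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite matrix A (list of rows)."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = list(b)            # residual b - A x for the initial guess x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

    The matrix-vector product and the two dot products dominate each iteration. On a parallel machine these are the operations that get distributed, and the dot products require a global reduction, which is one reason the choice of decomposition matters.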

  14. 6th International Parallel Tools Workshop

    CERN Document Server

    Brinkmann, Steffen; Gracia, José; Resch, Michael; Nagel, Wolfgang

    2013-01-01

    The latest advances in High Performance Computing hardware have significantly raised the level of available compute performance. At the same time, the growing hardware capabilities of modern supercomputing architectures have caused an increasing complexity of parallel application development. Despite numerous efforts to improve and simplify parallel programming, there is still a lot of manual debugging and tuning work required. This process is supported by special software tools, facilitating debugging, performance analysis, and optimization and thus making a major contribution to the development of robust and efficient parallel software. This book introduces a selection of the tools, which were presented and discussed at the 6th International Parallel Tools Workshop, held in Stuttgart, Germany, 25-26 September 2012.

  15. Parallel processor programs in the Federal Government

    Science.gov (United States)

    Schneck, P. B.; Austin, D.; Squires, S. L.; Lehmann, J.; Mizell, D.; Wallgren, K.

    1985-01-01

    In 1982, a report dealing with the nation's research needs in high-speed computing called for increased access to supercomputing resources for the research community, research in computational mathematics, and increased research in the technology base needed for the next generation of supercomputers. Since that time a number of programs addressing future generations of computers, particularly parallel processors, have been started by U.S. government agencies. The present paper provides a description of the largest government programs in parallel processing. Established in fiscal year 1985 by the Institute for Defense Analyses for the National Security Agency, the Supercomputing Research Center will pursue research to advance the state of the art in supercomputing. Attention is also given to the DOE applied mathematical sciences research program, the NYU Ultracomputer project, the DARPA multiprocessor system architectures program, NSF research on multiprocessor systems, ONR activities in parallel computing, and NASA parallel processor projects.

  16. Architectural fragments

    DEFF Research Database (Denmark)

    Bang, Jacob Sebastian

    2018-01-01

    I have created a large collection of plaster models: a collection of obstructions, errors and opportunities that may develop into architecture. The models are fragments of different complex shapes as well as more simple circular models with different profiling and diameters. In this context I have....... I try to invent the ways of drawing the models - that decode and unfold them into architectural fragments - into future buildings or constructions in the landscape. [1] Luigi Moretti: Italian architect, 1907 - 1973 [2] Man Ray: American artist, 1890 - 1976. In 2015, I saw the wonderful exhibition...... "Man Ray - Human Equations" at the Glyptotek in Copenhagen, organized by the Phillips Collection in Washington D.C. and the Israel Museum in Jerusalem (in 2013). See also: "Man Ray - Human Equations" catalogue published by Hatje Cantz Verlag, Germany, 2014....

  17. Kosmos = architecture

    Directory of Open Access Journals (Sweden)

    Tine Kurent

    1985-12-01

    The old Greek word "kosmos" means not only "cosmos", but also "the beautiful order", "the way of building", "building", "scenography", "mankind", and, in the time of the New Testament, also "pagans". The word "arhitekton", meaning first the "master of theatrical scenography", acquired the meaning of "builder" when the words "kosmos" and "kosmetes" became pejorative. The fear that architecture was not considered one of the arts before the Renaissance, since none of the Muses supervised the art of building, results from a misunderstanding of the word "kosmos". Urania was the Goddess of the activity implied in the verb "kosmein", meaning "to put in the beautiful order" - everything, from the universe to the man-made space, i.e. the architecture.

  18. Metabolistic Architecture

    DEFF Research Database (Denmark)

    2013-01-01

    Textile Spaces presents different approaches to using textile as a spatial definer and artistic medium. The publication collages images and text, art and architecture, science, philosophy and literature, process and product, past, present and future. It forms an insight into soft materials' functional and poetic potentials, linking the disciplines through fragments that aim to inspire a further look into the artists' and architects' practices, while simultaneously framing these textile visions in a wider context.

  19. Monte Carlo simulations on SIMD computer architectures

    International Nuclear Information System (INIS)

    Burmester, C.P.; Gronsky, R.; Wille, L.T.

    1992-01-01

    In this paper algorithmic considerations regarding the implementation of various materials science applications of the Monte Carlo technique to single instruction multiple data (SIMD) computer architectures are presented. In particular, implementation of the Ising model with nearest, next nearest, and long range screened Coulomb interactions on the SIMD architecture MasPar MP-1 (DEC mpp-12000) series of massively parallel computers is demonstrated. Methods of code development which optimize processor array use and minimize inter-processor communication are presented, including lattice partitioning and the use of processor array spanning tree structures for data reduction. Both geometric and algorithmic parallel approaches are utilized. Benchmarks in terms of Monte Carlo updates per second for the MasPar architecture are presented and compared to values reported in the literature from comparable studies on other architectures.
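
    One common form of lattice partitioning for the nearest-neighbour Ising model is the checkerboard (two-colour) decomposition. Whether the paper uses exactly this scheme is an assumption. The sketch below runs serially in Python, with coupling J = 1 and temperature in units where the Boltzmann constant is 1, and only shows why same-colour sites can be updated together.

```python
import math
import random

def checkerboard_sweep(spins, temperature, rng):
    """One Metropolis sweep of an L x L periodic Ising lattice, colour by colour.

    L is assumed even so that the two-colouring is consistent across the
    periodic boundary.
    """
    L = len(spins)
    for colour in (0, 1):
        # Sites of one colour have no nearest neighbours of the same colour,
        # so on SIMD hardware all of them could be updated in the same step.
        for i in range(L):
            for j in range(L):
                if (i + j) % 2 != colour:
                    continue
                nb = (spins[(i - 1) % L][j] + spins[(i + 1) % L][j]
                      + spins[i][(j - 1) % L] + spins[i][(j + 1) % L])
                d_energy = 2 * spins[i][j] * nb      # energy cost of flipping
                if d_energy <= 0 or rng.random() < math.exp(-d_energy / temperature):
                    spins[i][j] = -spins[i][j]
    return spins
```

    A call such as `checkerboard_sweep([[1] * 8 for _ in range(8)], 2.5, random.Random(0))` performs one sweep. The next-nearest and long-range interactions mentioned in the abstract would need more colours or a different decomposition, which this sketch does not attempt.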

  20. Performative Architecture and Urban Spaces

    DEFF Research Database (Denmark)

    Kiib, Hans

    2008-01-01

    3 workshops, one exhibition. Three conceptual architectural workshops took place in parallel from August 16th - 22nd 2008. Each workshop carried a specific methodology, and the goal is to come up with conceptual proposals that could be further developed for selected sites in the city of Aalb...... This workshop focuses on temporary architecture and urban catalysts. Informal spaces and the interface between the built and the void are foremost in the development of performative urban environments and cultural interaction. ...... The workshop model includes an open workshop where a handful of international architects are invited to spend five days with local architects, engineers and scholars contributing to a work of architectural vision and quality. The workshop includes presentations and discussions and development of projects...

  1. Parallel sorting algorithms

    CERN Document Server

    Akl, Selim G

    1985-01-01

    Parallel Sorting Algorithms explains how to use parallel algorithms to sort a sequence of items on a variety of parallel computers. The book reviews the sorting problem, the parallel models of computation, parallel algorithms, and the lower bounds on the parallel sorting problems. The text also presents twenty different algorithms for architectures such as linear arrays, mesh-connected computers, and cube-connected computers. Another example where an algorithm can be applied is on shared-memory SIMD (single instruction stream multiple data stream) computers in which the whole sequence to be sorted can fit in the
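
    As an illustration of sorting on a linear array of processors, the following sketch simulates odd-even transposition sort serially. It is a standard textbook algorithm chosen as an example; the record does not state which algorithms the book presents for this architecture.

```python
def odd_even_transposition_sort(values):
    """Sort by n rounds of neighbour compare-exchange, as on a linear array."""
    a = list(values)
    n = len(a)
    for round_no in range(n):
        # Even rounds pair (0,1), (2,3), ...; odd rounds pair (1,2), (3,4), ...
        # The pairs within a round are disjoint, so with one processor per
        # element all compare-exchanges of a round could happen simultaneously.
        for i in range(round_no % 2, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

    With n processors each round takes constant time, so the parallel running time is proportional to n, while the only communication is between adjacent processors.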

  2. Hypergraph partitioning implementation for parallelizing matrix-vector multiplication using CUDA GPU-based parallel computing

    Science.gov (United States)

    Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.

    2017-07-01

    Calculation of matrix-vector multiplication in real-world problems often involves large matrices of arbitrary size. Therefore, parallelization is needed to speed up a calculation that usually takes a long time. Graph partitioning techniques discussed in previous studies cannot be used to parallelize matrix-vector multiplication for matrices of arbitrary size, because graph partitioning techniques assume a square and symmetric matrix. Hypergraph partitioning techniques overcome this shortcoming. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (compute unified device architecture) is a parallel computing platform and programming model that was created by NVIDIA and is implemented on the GPU (graphics processing unit).
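
    To make the hypergraph idea concrete, the sketch below scores a given row partition of a sparse, possibly rectangular matrix with the usual column-net model: each column is a net connecting the rows that have a nonzero in it, and a column touched by k parts costs k - 1 transfers of the corresponding vector entry. That the paper uses this particular model is an assumption, and the sketch only evaluates a partition; it does not compute one and involves no CUDA code.

```python
def communication_volume(rows, part_of_row):
    """Connectivity-minus-one cost of a row partition in a column-net model.

    rows[i] lists the column indices holding a nonzero in row i;
    part_of_row[i] is the processor that owns row i.
    """
    parts_touching = {}                      # column -> set of parts needing x[column]
    for i, cols in enumerate(rows):
        for col in cols:
            parts_touching.setdefault(col, set()).add(part_of_row[i])
    # A column used by k parts must have its x entry sent to k - 1 of them.
    return sum(len(parts) - 1 for parts in parts_touching.values())
```

    For the 3 x 4 pattern `rows = [[0, 1], [1, 2], [2, 3]]`, keeping rows 0 and 1 together costs one transfer, while separating them costs two. Nothing in the model requires the matrix to be square or symmetric.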

  3. Optimizing Engineering Tools Using Modern Ground Architectures

    Science.gov (United States)

    2017-12-01

    OPTIMIZING ENGINEERING TOOLS USING MODERN GROUND ARCHITECTURES. Master’s thesis by Ryan P. McArdle, December 2017. Thesis Advisor: Marc Peters; Co-Advisor: I.M. Ross. ... engineering tools. First, the effectiveness of MathWorks’ Parallel Computing Toolkit is assessed when performing somewhat basic computations in

  4. Parallel algorithms for numerical linear algebra

    CERN Document Server

    van der Vorst, H

    1990-01-01

    This is the first in a new series of books presenting research results and developments concerning the theory and applications of parallel computers, including vector, pipeline, array, fifth/future generation computers, and neural computers. All aspects of high-speed computing fall within the scope of the series, e.g. algorithm design, applications, software engineering, networking, taxonomy, models and architectural trends, performance, peripheral devices. Papers in Volume One cover the main streams of parallel linear algebra: systolic array algorithms, message-passing systems, algorithms for p

  5. Parallel phase model : a programming model for high-end parallel machines with manycores.

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Junfeng (Syracuse University, Syracuse, NY); Wen, Zhaofang; Heroux, Michael Allen; Brightwell, Ronald Brian

    2009-04-01

    This paper presents a parallel programming model, Parallel Phase Model (PPM), for next-generation high-end parallel machines based on a distributed memory architecture consisting of a networked cluster of nodes with a large number of cores on each node. PPM has a unified high-level programming abstraction that facilitates the design and implementation of parallel algorithms to exploit both the parallelism of the many cores and the parallelism at the cluster level. The programming abstraction will be suitable for expressing both fine-grained and coarse-grained parallelism. It includes a few high-level parallel programming language constructs that can be added as an extension to an existing (sequential or parallel) programming language such as C; and the implementation of PPM also includes a light-weight runtime library that runs on top of an existing network communication software layer (e.g. MPI). Design philosophy of PPM and details of the programming abstraction are also presented. Several unstructured applications that inherently require high-volume random fine-grained data accesses have been implemented in PPM with very promising results.

  6. Introduction to parallel programming

    CERN Document Server

    Brawer, Steven

    1989-01-01

    Introduction to Parallel Programming focuses on the techniques, processes, methodologies, and approaches involved in parallel programming. The book first offers information on Fortran, hardware and operating system models, and processes, shared memory, and simple parallel programs. Discussions focus on processes and processors, joining processes, shared memory, time-sharing with multiple processors, hardware, loops, passing arguments in function/subroutine calls, program structure, and arithmetic expressions. The text then elaborates on basic parallel programming techniques, barriers and race

  7. Parallel Evolutionary Optimization for Neuromorphic Network Training

    Energy Technology Data Exchange (ETDEWEB)

    Schuman, Catherine D [ORNL; Disney, Adam [University of Tennessee (UT); Singh, Susheela [North Carolina State University (NCSU), Raleigh; Bruer, Grant [University of Tennessee (UT); Mitchell, John Parker [University of Tennessee (UT); Klibisz, Aleksander [University of Tennessee (UT); Plank, James [University of Tennessee (UT)

    2016-01-01

    One of the key impediments to the success of current neuromorphic computing architectures is the issue of how best to program them. Evolutionary optimization (EO) is one promising programming technique; in particular, its wide applicability makes it especially attractive for neuromorphic architectures, which can have many different characteristics. In this paper, we explore different facets of EO on a spiking neuromorphic computing model called DANNA. We focus on the performance of EO in the design of our DANNA simulator, and on how to structure EO on both multicore and massively parallel computing systems. We evaluate how our parallel methods impact the performance of EO on Titan, the U.S.'s largest open science supercomputer, and BOB, a Beowulf-style cluster of Raspberry Pi's. We also focus on how to improve the EO by evaluating commonality in higher performing neural networks, and present the result of a study that evaluates the EO performed by Titan.
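
    The natural place to parallelize evolutionary optimization is fitness evaluation, since each candidate can be scored independently. The sketch below assumes that structure and uses a thread pool from the Python standard library as a stand-in for the multicore and cluster settings in the abstract. The function `evolve`, its parameters, and the selection and mutation scheme are hypothetical and are not the authors' DANNA tooling.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def evolve(fitness, genome_len, pop_size=20, generations=30, workers=4, seed=0):
    """Minimal generational EO; fitness evaluation is farmed out to a pool."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            scores = list(pool.map(fitness, pop))        # parallel evaluation
            ranked = [g for _, g in sorted(zip(scores, pop),
                                           key=lambda t: t[0], reverse=True)]
            parents = ranked[: pop_size // 2]             # truncation selection
            children = [
                [x + rng.gauss(0, 0.1) for x in rng.choice(parents)]
                for _ in range(pop_size - len(parents))   # Gaussian mutation
            ]
            pop = parents + children                      # parents survive (elitism)
        scores = list(pool.map(fitness, pop))
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]
```

    For example, `evolve(lambda g: -sum(x * x for x in g), genome_len=3)` searches for a vector near the origin. In CPython, threads do not speed up CPU-bound fitness functions, so a real deployment would use processes or a message-passing layer; the control flow would stay the same.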

  8. Connecting Architecture and Implementation

    Science.gov (United States)

    Buchgeher, Georg; Weinreich, Rainer

    Software architectures are still typically defined and described independently from implementation. To avoid architectural erosion and drift, architectural representation needs to be continuously updated and synchronized with system implementation. Existing approaches for architecture representation like informal architecture documentation, UML diagrams, and Architecture Description Languages (ADLs) provide only limited support for connecting architecture descriptions and implementations. Architecture management tools like Lattix, SonarJ, and Sotoarc and UML-tools tackle this problem by extracting architecture information directly from code. This approach works for low-level architectural abstractions like classes and interfaces in object-oriented systems but fails to support architectural abstractions not found in programming languages. In this paper we present an approach for linking and continuously synchronizing a formalized architecture representation to an implementation. The approach is a synthesis of functionality provided by code-centric architecture management and UML tools and higher-level architecture analysis approaches like ADLs.

  9. Architectures for single-chip image computing

    Science.gov (United States)

    Gove, Robert J.

    1992-04-01

    This paper will focus on the architectures of VLSI programmable processing components for image computing applications. TI, the maker of industry-leading RISC, DSP, and graphics components, has developed an architecture for a new-generation of image processors capable of implementing a plurality of image, graphics, video, and audio computing functions. We will show that the use of a single-chip heterogeneous MIMD parallel architecture best suits this class of processors--those which will dominate the desktop multimedia, document imaging, computer graphics, and visualization systems of this decade.

  10. Bernard Tschumi Draws Architecture!

    Directory of Open Access Journals (Sweden)

    Gevork Hartoonian

    2014-08-01

    Bernard Tschumi’s delineation prepared for the Museu de Arte Contemporânea provides the starting point for this essay, which discusses the historicity of drawing and highlights the horizontality and the verticality that structure architecture’s contrast with the pictorial realm. Juxtaposing a freehand sketch with the digital image of the same project, Tschumi moves to address the paradox concerning the position of the body and drawing. This drawing also speaks for the reversal in the position of the body brought about by digital reproductivity. The reversal alludes to Tschumi’s theorization of architecture in terms of space and event. These, I will argue, are anticipated in The Manhattan Transcripts (1981), where a set of freehand drawings is used to evoke a filmic mood wherein the image is projected parallel to the spectator’s seated position. The essay goes further, suggesting that the theatricality permeating the present architecture is part of the shift from horizontality to the painterly, and yet the phenomenon is not merely a technical issue. Rather, it alludes to architecture’s dialogical rapport with painting at work since the Renaissance.

  11. Materials science and architecture

    Science.gov (United States)

    Bechthold, Martin; Weaver, James C.

    2017-12-01

    Materiality — the use of various materials in architecture — has been fundamental to the design and construction of buildings, and materials science has traditionally responded to needs formulated by design, engineering and construction professionals. Material properties and processes are shaping buildings and influencing how they perform. The advent of technologies such as digital fabrication, robotics and 3D printing have not only accelerated the development of new construction solutions, but have also led to a renewed interest in materials as a catalyst for novel architectural design. In parallel, materials science has transformed from a field that explains materials to one that designs materials from the bottom up. The conflation of these two trends is giving rise to materials-based design research in which architects, engineers and materials scientists work as partners in the conception of new materials systems and their applications. This Review surveys this development for different material classes (wood, ceramics, metals, concrete, glass, synthetic composites and polymers), with an emphasis on recent trends and innovations.

  12. Lemon : An MPI parallel I/O library for data encapsulation using LIME

    NARCIS (Netherlands)

    Deuzeman, Albert; Reker, Siebren; Urbach, Carsten

    We introduce Lemon, an MPI parallel I/O library that provides efficient parallel I/O of both binary and metadata on massively parallel architectures. Motivated by the demands of the lattice Quantum Chromodynamics community, the data is stored in the SciDAC Lattice QCD Interchange Message

  13. Parallel Atomistic Simulations

    Energy Technology Data Exchange (ETDEWEB)

    HEFFELFINGER,GRANT S.

    2000-01-18

    Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed, the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories, those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains are discussed.

  14. Massive hybrid parallelism for fully implicit multiphysics

    International Nuclear Information System (INIS)

    Gaston, D. R.; Permann, C. J.; Andrs, D.; Peterson, J. W.

    2013-01-01

    As hardware advances continue to modify the supercomputing landscape, traditional scientific software development practices will become more outdated, ineffective, and inefficient. The process of rewriting/retooling existing software for new architectures is a Sisyphean task, and results in substantial hours of development time, effort, and money. Software libraries which provide an abstraction of the resources provided by such architectures are therefore essential if the computational engineering and science communities are to continue to flourish in this modern computing environment. The Multiphysics Object Oriented Simulation Environment (MOOSE) framework enables complex multiphysics analysis tools to be built rapidly by scientists, engineers, and domain specialists, while also allowing them to both take advantage of current HPC architectures, and efficiently prepare for future supercomputer designs. MOOSE employs a hybrid shared-memory and distributed-memory parallel model and provides a complete and consistent interface for creating multiphysics analysis tools. In this paper, a brief discussion of the mathematical algorithms underlying the framework and the internal object-oriented hybrid parallel design is given. Representative massively parallel results from several application areas are presented, and a brief discussion of future areas of research for the framework is provided. (authors)

  15. Massive hybrid parallelism for fully implicit multiphysics

    Energy Technology Data Exchange (ETDEWEB)

    Gaston, D. R.; Permann, C. J.; Andrs, D.; Peterson, J. W. [Idaho National Laboratory, 2525 N. Fremont Ave., Idaho Falls, ID 83415 (United States)

    2013-07-01

    As hardware advances continue to modify the supercomputing landscape, traditional scientific software development practices will become more outdated, ineffective, and inefficient. The process of rewriting/retooling existing software for new architectures is a Sisyphean task, and results in substantial hours of development time, effort, and money. Software libraries which provide an abstraction of the resources provided by such architectures are therefore essential if the computational engineering and science communities are to continue to flourish in this modern computing environment. The Multiphysics Object Oriented Simulation Environment (MOOSE) framework enables complex multiphysics analysis tools to be built rapidly by scientists, engineers, and domain specialists, while also allowing them to both take advantage of current HPC architectures, and efficiently prepare for future supercomputer designs. MOOSE employs a hybrid shared-memory and distributed-memory parallel model and provides a complete and consistent interface for creating multiphysics analysis tools. In this paper, a brief discussion of the mathematical algorithms underlying the framework and the internal object-oriented hybrid parallel design is given. Representative massively parallel results from several application areas are presented, and a brief discussion of future areas of research for the framework is provided. (authors)

  16. A multitransputer parallel processing system (MTPPS)

    International Nuclear Information System (INIS)

    Jethra, A.K.; Pande, S.S.; Borkar, S.P.; Khare, A.N.; Ghodgaonkar, M.D.; Bairi, B.R.

    1993-01-01

    This report describes the design and implementation of a 16-node Multi Transputer Parallel Processing System (MTPPS), which is a platform for parallel program development. It is a MIMD machine based on the message passing paradigm. The basic compute engine is an Inmos Transputer IMS T800-20. A transputer with local memory constitutes the processing element (NODE) of this MIMD architecture. Multiple NODES can be connected to each other in an identifiable network topology through the high speed serial links of the transputer. A Network Configuration Unit (NCU) incorporates the necessary hardware to provide software controlled network configuration. The system is modularly expandable, and more NODES can be added to achieve the required processing power. The system is a backend to an IBM PC, which has been integrated into the system to provide the user I/O interface. PC resources are available to the programmer. The interface hardware between the PC and the network of transputers is INMOS compatible. Therefore, all the commercially available development software compatible with INMOS products can run on this system. While giving the details of design and implementation, this report briefly summarises MIMD architectures, transputer architecture and parallel processing software development issues. LINPACK performance evaluation of the system and solutions of neutron physics and plasma physics problems are discussed along with results. (author). 12 refs., 22 figs., 3 tabs., 3 appendixes

  17. MASSIVE HYBRID PARALLELISM FOR FULLY IMPLICIT MULTIPHYSICS

    Energy Technology Data Exchange (ETDEWEB)

    Cody J. Permann; David Andrs; John W. Peterson; Derek R. Gaston

    2013-05-01

    As hardware advances continue to modify the supercomputing landscape, traditional scientific software development practices will become more outdated, ineffective, and inefficient. The process of rewriting/retooling existing software for new architectures is a Sisyphean task, and results in substantial hours of development time, effort, and money. Software libraries which provide an abstraction of the resources provided by such architectures are therefore essential if the computational engineering and science communities are to continue to flourish in this modern computing environment. The Multiphysics Object Oriented Simulation Environment (MOOSE) framework enables complex multiphysics analysis tools to be built rapidly by scientists, engineers, and domain specialists, while also allowing them to both take advantage of current HPC architectures, and efficiently prepare for future supercomputer designs. MOOSE employs a hybrid shared-memory and distributed-memory parallel model and provides a complete and consistent interface for creating multiphysics analysis tools. In this paper, a brief discussion of the mathematical algorithms underlying the framework and the internal object-oriented hybrid parallel design is given. Representative massively parallel results from several application areas are presented, and a brief discussion of future areas of research for the framework is provided.

  18. Architectural Drawing

    DEFF Research Database (Denmark)

    Steinø, Nicolai

    2018-01-01

    In a time of computer-aided design, computer graphics and parametric design tools, the art of architectural drawing is in a state of neglect. But design and drawing are inseparably linked in ways which often go unnoticed. Essentially, it is very difficult, if not impossible, to conceive of a design...... is that computers can represent graphic ideas both faster and better than most medium-skilled draftsmen, drawing in design is not only about representing final designs. In fact, several steps involving the capacity to draw lie before the representation of a final design. Not only are drawing skills an important...... prerequisite for learning about the nature of existing objects and spaces, and thus to build a vocabulary of design. It is also a prerequisite for both reflecting and communicating about design ideas. In this paper, a taxonomy of notation, reflection, communication and presentation drawing is presented...

  19. Architectural Theatricality

    DEFF Research Database (Denmark)

    Tvedebrink, Tenna Doktor Olsen; Fisker, Anna Marie; Kirkegaard, Poul Henning

    2013-01-01

    In the attempt to improve patient treatment and recovery, researchers focus on applying concepts of hospitality to hospitals. Often these concepts are dominated by hotel-metaphors focusing on host–guest relationships or concierge services. Motivated by a project trying to improve patient treatment...... is known for his writings on theatricality, understood as a holistic design approach emphasizing the contextual, cultural, ritual and social meanings rooted in architecture. Relative hereto, the International Food Design Society recently argued, in a similar holistic manner, that the methodology used...... to provide an aesthetic eating experience includes knowledge on both food and design. Based on a hermeneutic reading of Semper’s theory, our thesis is that this holistic design approach is important when debating concepts of hospitality in hospitals. We use this approach to argue for how ‘food design...

  20. Lab architecture

    Science.gov (United States)

    Crease, Robert P.

    2008-04-01

    There are few more dramatic illustrations of the vicissitudes of laboratory architecture than the contrast between Building 20 at the Massachusetts Institute of Technology (MIT) and its replacement, the Ray and Maria Stata Center. Building 20 was built hurriedly in 1943 as temporary housing for MIT's famous Rad Lab, the site of wartime radar research, and it remained a productive laboratory space for over half a century. A decade ago it was demolished to make way for the Stata Center, an architecturally striking building designed by Frank Gehry to house MIT's computer science and artificial intelligence labs. But in 2004 - just two years after the Stata Center officially opened - the building was criticized for being unsuitable for research and became the subject of still ongoing lawsuits alleging design and construction failures.

  1. Three-Dimensional Nanobiocomputing Architectures With Neuronal Hypercells

    Science.gov (United States)

    2007-06-01

    Neumann architectures, and CMOS fabrication. Novel solutions of massive parallel distributed computing and processing (pipelined due to systolic... and processing platforms utilizing molecular hardware within an enabling organization and architecture. The design technology is based on utilizing a...Microsystems and Nanotechnologies investigated a novel 3D3 (Hardware Software Nanotechnology) technology to design super-high performance computing

  2. Becoming and Disappearing: Between Art, Architecture and Research

    Science.gov (United States)

    Beinart, Katy

    2014-01-01

    This paper examines some parallels and differences in pursuing practice-based research in art or architecture. Using a series of different headlines and examples, I examine the potential of working "between" art and architecture, which I argue could generate new, hybridised methodologies of practice through interrogating the…

  3. Automated Design of Application-Specific Smart Camera Architectures

    NARCIS (Netherlands)

    Caarls, W.

    2008-01-01

    Parallel heterogeneous multiprocessor systems are often shunned in embedded system design, not only because of their design complexity but because of the programming burden. Programs for such systems are architecture-dependent: the application developer needs architecture-specific knowledge to

  4. Design strategies for irregularly adapting parallel applications

    International Nuclear Information System (INIS)

    Oliker, Leonid; Biswas, Rupak; Shan, Hongzhang; Sing, Jaswinder Pal

    2000-01-01

    Achieving scalable performance for dynamic irregular applications is eminently challenging. Traditional message-passing approaches have been making steady progress towards this goal; however, they suffer from complex implementation requirements. The use of a global address space greatly simplifies the programming task, but can degrade the performance of dynamically adapting computations. In this work, we examine two major classes of adaptive applications, under five competing programming methodologies and four leading parallel architectures. Results indicate that it is possible to achieve message-passing performance using shared-memory programming techniques by carefully following the same high level strategies. Adaptive applications have computational work loads and communication patterns which change unpredictably at runtime, requiring dynamic load balancing to achieve scalable performance on parallel machines. Efficient parallel implementations of such adaptive applications are therefore a challenging task. This work examines the implementation of two typical adaptive applications, Dynamic Remeshing and N-Body, across various programming paradigms and architectural platforms. We compare several critical factors of the parallel code development, including performance, programmability, scalability, algorithmic development, and portability

  5. Distributed and parallel approach for handle and perform huge datasets

    Science.gov (United States)

    Konopko, Joanna

    2015-12-01

    Big Data refers to dynamic, large and disparate volumes of data that come from many different sources (tools, machines, sensors, mobile devices) and are uncorrelated with each other. It requires new, innovative and scalable technology to collect, host and analytically process the vast amount of data. A proper architecture for a system that processes huge data sets is needed. In this paper, distributed and parallel system architectures are compared using the examples of the MapReduce (MR) Hadoop platform and a parallel database platform (DBMS). The paper also analyzes the problem of processing and handling valuable information from petabytes of data. Both paradigms, MapReduce and parallel DBMS, are described and compared. A hybrid architecture approach is also proposed that could be used to solve the analyzed problem of storing and processing Big Data.
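
    As a rough illustration of the MapReduce paradigm named in this record, the sketch below runs the three classic phases (map, shuffle, reduce) in a single Python process. It is not the Hadoop API and is not taken from the paper; the function names and the word-count task are assumptions chosen only to show the data flow.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (key, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(records):
    # Each record could be mapped on a different worker; here it is sequential.
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input records, invented for this example.
records = ["sensor data from machines", "data from mobile devices"]
counts = word_count(records)
```

    In a real cluster the map and reduce calls would run on separate nodes and the shuffle would move data across the network; only the shape of the computation carries over from this sketch.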

  6. Parallel optoelectronic trinary signed-digit division

    Science.gov (United States)

    Alam, Mohammad S.

    1999-03-01

    The trinary signed-digit (TSD) number system has been found to be very useful for parallel addition and subtraction of arbitrary-length operands in constant time. Using the TSD addition and multiplication modules as the basic building blocks, we develop an efficient algorithm for performing parallel TSD division in constant time. The proposed division technique uses one TSD subtraction and two TSD multiplication steps. An optoelectronic correlator-based architecture is suggested for implementation of the proposed TSD division algorithm, which fully exploits the parallelism and high processing speed of optics. An efficient spatial encoding scheme is used to ensure better utilization of the space-bandwidth product of the spatial light modulators used in the optoelectronic implementation.
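
    The abstract does not define the number system, so the sketch below assumes the usual TSD convention: radix 3 with digits drawn from -2..2. It only illustrates the representation and its redundancy (several digit strings for one value), which is the property that signed-digit schemes rely on for carry-limited parallel arithmetic. It does not reproduce the paper's division algorithm, and both function names are hypothetical.

```python
def tsd_value(digits):
    """Value of a most-significant-first digit list in radix 3, digits in -2..2."""
    value = 0
    for d in digits:
        if d not in (-2, -1, 0, 1, 2):
            raise ValueError("TSD digits must lie in -2..2")
        value = 3 * value + d
    return value


def to_tsd(n):
    """Return one valid TSD encoding of the integer n.

    Balanced ternary (digits -1, 0, 1) is used because it is a subset of the
    assumed TSD digit set; other encodings of the same value also exist.
    """
    if n == 0:
        return [0]
    digits = []
    while n != 0:
        r = n % 3          # Python's % yields 0, 1 or 2, also for negative n
        if r == 2:
            r = -1         # write digit -1 and carry one into the next position
        digits.append(r)
        n = (n - r) // 3
    return digits[::-1]


# Redundancy: 5 = 1*3 + 2 = 2*3 - 1, so both digit strings are valid.
same_value = tsd_value([1, 2]) == tsd_value([2, -1]) == 5
```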

  7. R-GPU : A reconfigurable GPU architecture

    NARCIS (Netherlands)

    van den Braak, G.J.; Corporaal, H.

    2016-01-01

    Over the last decade, Graphics Processing Unit (GPU) architectures have evolved from a fixed-function graphics pipeline to a programmable, energy-efficient compute accelerator for massively parallel applications. The compute power arises from the GPU's Single Instruction/Multiple Threads

  8. Parallelization in Modern C++

    CERN Multimedia

    CERN. Geneva

    2016-01-01

    The traditionally used and well established parallel programming models OpenMP and MPI are both targeting lower level parallelism and are meant to be as language agnostic as possible. For a long time, those models were the only widely available portable options for developing parallel C++ applications beyond using plain threads. This has strongly limited the optimization capabilities of compilers, has inhibited extensibility and genericity, and has restricted the use of those models together with other, modern higher level abstractions introduced by the C++11 and C++14 standards. The recent revival of interest in the industry and wider community for the C++ language has also spurred a remarkable amount of standardization proposals and technical specifications being developed. Those efforts however have so far failed to build a vision on how to seamlessly integrate various types of parallelism, such as iterative parallel execution, task-based parallelism, asynchronous many-task execution flows, continuation s...

  9. A parallel buffer tree

    DEFF Research Database (Denmark)

    Sitchinava, Nodar; Zeh, Norbert

    2012-01-01

    We present the parallel buffer tree, a parallel external memory (PEM) data structure for batched search problems. This data structure is a non-trivial extension of Arge's sequential buffer tree to a private-cache multiprocessor environment and reduces the number of I/O operations by the number of...... in the optimal O(psort(N) + K/(PB)) parallel I/O complexity, where K is the size of the output reported in the process and psort(N) is the parallel I/O complexity of sorting N elements using P processors....

  10. Parallel MR imaging.

    Science.gov (United States)

    Deshmane, Anagha; Gulani, Vikas; Griswold, Mark A; Seiberlich, Nicole

    2012-07-01

    Parallel imaging is a robust method for accelerating the acquisition of magnetic resonance imaging (MRI) data, and has made possible many new applications of MR imaging. Parallel imaging works by acquiring a reduced amount of k-space data with an array of receiver coils. These undersampled data can be acquired more quickly, but the undersampling leads to aliased images. One of several parallel imaging algorithms can then be used to reconstruct artifact-free images from either the aliased images (SENSE-type reconstruction) or from the undersampled data (GRAPPA-type reconstruction). The advantages of parallel imaging in a clinical setting include faster image acquisition, which can be used, for instance, to shorten breath-hold times resulting in fewer motion-corrupted examinations. In this article the basic concepts behind parallel imaging are introduced. The relationship between undersampling and aliasing is discussed and two commonly used parallel imaging methods, SENSE and GRAPPA, are explained in detail. Examples of artifacts arising from parallel imaging are shown and ways to detect and mitigate these artifacts are described. Finally, several current applications of parallel imaging are presented and recent advancements and promising research in parallel imaging are briefly reviewed. Copyright © 2012 Wiley Periodicals, Inc.
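
    The relationship between undersampling and aliasing mentioned in this record can be checked on a toy case. The sketch below is a one-dimensional, single-coil example with invented data: it keeps every second k-space sample and shows that the reconstructed half-size image is the original folded onto itself. It does not implement SENSE or GRAPPA, which additionally use coil sensitivity information to undo the folding.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (image space to k-space)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform (k-space to image space)."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * j * k / n) for k in range(n)) / n
            for j in range(n)]


image = [0, 1, 2, 3, 0, 0, 5, 0]    # toy 1-D "object" with N = 8 pixels
kspace = dft(image)                 # fully sampled k-space
R = 2                               # acceleration (undersampling) factor
undersampled = kspace[::R]          # keep every R-th k-space sample

# Reconstructing from the reduced data yields an image of N/R pixels.
aliased = [v.real for v in idft(undersampled)]

# Prediction: pixel j and pixel j + N/R fold on top of each other.
half = len(image) // R
folded = [image[j] + image[j + half] for j in range(half)]  # [0, 1, 7, 3]
```

    Up to floating-point rounding, `aliased` equals `folded`, which is the fold-over artifact that parallel imaging reconstructions are designed to remove.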

  11. Parallel Algorithms and Patterns

    Energy Technology Data Exchange (ETDEWEB)

    Robey, Robert W. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2016-06-16

    This is a powerpoint presentation on parallel algorithms and patterns. A parallel algorithm is a well-defined, step-by-step computational procedure that emphasizes concurrency to solve a problem. Examples of problems include: Sorting, searching, optimization, matrix operations. A parallel pattern is a computational step in a sequence of independent, potentially concurrent operations that occurs in diverse scenarios with some frequency. Examples are: Reductions, prefix scans, ghost cell updates. We only touch on parallel patterns in this presentation. It really deserves its own detailed discussion which Gabe Rockefeller would like to develop.
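
    Two of the patterns named in this record, reductions and prefix scans, can be sketched as follows. The code is a sequential Python model of the parallel structure, not material from the presentation: within each pass every output element depends only on the previous pass, so a pass could be executed concurrently. The scan follows the well-known Hillis-Steele scheme; the function names are illustrative.

```python
import operator


def tree_reduce(values, op):
    """Pairwise (tree) reduction: each pass combines independent pairs."""
    items = list(values)
    if not items:
        raise ValueError("cannot reduce an empty sequence")
    while len(items) > 1:
        paired = [op(items[i], items[i + 1])
                  for i in range(0, len(items) - 1, 2)]
        if len(items) % 2:              # odd element is carried to the next pass
            paired.append(items[-1])
        items = paired
    return items[0]


def inclusive_scan(values, op):
    """Hillis-Steele inclusive prefix scan in about log2(n) passes."""
    out = list(values)
    step = 1
    while step < len(out):
        # Every element of the new list reads only the previous list.
        out = [out[i] if i < step else op(out[i - step], out[i])
               for i in range(len(out))]
        step *= 2
    return out


total = tree_reduce(range(1, 11), operator.add)             # 55
prefix = inclusive_scan([1, 2, 3, 4, 5], operator.add)      # [1, 3, 6, 10, 15]
```

    Both functions assume an associative operator; the tree reduction finishes in about log2(n) passes instead of n - 1 sequential steps.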

  12. Application Portable Parallel Library

    Science.gov (United States)

    Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott

    1995-01-01

    Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also includes heterogeneous collection of networked computers.) Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.

  13. SUSTAINABLE ARCHITECTURE : WHAT ARCHITECTURE STUDENTS THINK

    OpenAIRE

    SATWIKO, PRASASTO

    2013-01-01

    Sustainable architecture has become a hot issue lately as the impacts of climate change become more intense. Architecture education has responded by integrating knowledge of sustainable design into the curriculum. However, in real life, new buildings keep appearing with designs that completely ignore sustainable principles. This paper discusses the results of two national competitions on sustainable architecture targeted at architecture students (conducted in 2012 and 2013). The results a...

  14. Time analysis of interconnection network implemented on the honeycomb architecture

    Energy Technology Data Exchange (ETDEWEB)

    Milutinovic, D [Inst. Michael Pupin, Belgrade (Yugoslavia)

    1996-12-31

    Problems of time-domain analysis of the mapping of interconnection networks for parallel processing onto one form of uniform massively parallel architecture of the cellular type are considered. The results of the time analysis are discussed. It is found that changing the technology results in changing the mapping rules. 17 refs.

  15. Area analysis of interconnection networks implemented on the honeycomb architecture

    Energy Technology Data Exchange (ETDEWEB)

    Milutinovic, D

    1996-12-31

    The area utilization of interconnection networks for parallel processing on one form of uniform parallel architecture of the cellular type is analyzed. Formulae for the number of cells necessary to realize a network and for the efficiency factor of the system are derived. 15 refs.

  16. Vectorization and parallelization of a production reactor assembly code

    International Nuclear Information System (INIS)

    Vujic, J.L.; Martin, W.R.; Michigan Univ., Ann Arbor, MI

    1991-01-01

    In order to use efficiently the new features of supercomputers, production codes, usually written 10-20 years ago, must be tailored for modern computer architectures. We have chosen to optimize the CPM-2 code, a production reactor assembly code based on the collision probability transport method. Substantial speedup in the execution times was obtained with the parallel/vector version of the CPM-2 code. In addition, we have developed a new transfer probability method, which removes some of the modelling limitations of the collision probability method encoded in the CPM-2 code, and can fully utilize the parallel/vector architecture of a multiprocessor IBM 3090. (author)

  17. Vectorization and parallelization of a production reactor assembly code

    International Nuclear Information System (INIS)

    Vujic, J.L.; Martin, W.R.

    1991-01-01

    In order to efficiently use new features of supercomputers, production codes, usually written 10 - 20 years ago, must be tailored for modern computer architectures. We have chosen to optimize the CPM-2 code, a production reactor assembly code based on the collision probability transport method. Substantial speedups in the execution times were obtained with the parallel/vector version of the CPM-2 code. In addition, we have developed a new transfer probability method, which removes some of the modelling limitations of the collision probability method encoded in the CPM-2 code, and can fully utilize parallel/vector architecture of a multiprocessor IBM 3090. (author)

  18. Modeling Architectural Patterns Using Architectural Primitives

    NARCIS (Netherlands)

    Zdun, Uwe; Avgeriou, Paris

    2005-01-01

    Architectural patterns are a key point in architectural documentation. Regrettably, there is poor support for modeling architectural patterns, because the pattern elements are not directly matched by elements in modeling languages, and, at the same time, patterns support an inherent variability that

  19. Software architecture 2

    CERN Document Server

    Oussalah, Mourad Chabane

    2014-01-01

    Over the past 20 years, software architectures have significantly contributed to the development of complex and distributed systems. Nowadays, it is recognized that one of the critical problems in the design and development of any complex software system is its architecture, i.e. the organization of its architectural elements. Software Architecture presents the software architecture paradigms based on objects, components, services and models, as well as the various architectural techniques and methods, the analysis of architectural qualities, models of representation of architectural templa

  20. Lightweight enterprise architectures

    CERN Document Server

    Theuerkorn, Fenix

    2004-01-01

    STATE OF ARCHITECTURE; Architectural Chaos; Relation of Technology and Architecture; The Many Faces of Architecture; The Scope of Enterprise Architecture; The Need for Enterprise Architecture; The History of Architecture; The Current Environment; Standardization Barriers; The Need for Lightweight Architecture in the Enterprise; The Cost of Technology; The Benefits of Enterprise Architecture; The Domains of Architecture; The Gap between Business and IT; Where Does LEA Fit?; LEA's Framework; Frameworks, Methodologies, and Approaches; The Framework of LEA; Types of Methodologies; Types of Approaches; Actual System Environmen

  1. Software architecture 1

    CERN Document Server

    Oussalah, Mourad Chabane

    2014-01-01

    Over the past 20 years, software architectures have significantly contributed to the development of complex and distributed systems. Nowadays, it is recognized that one of the critical problems in the design and development of any complex software system is its architecture, i.e. the organization of its architectural elements. Software Architecture presents the software architecture paradigms based on objects, components, services and models, as well as the various architectural techniques and methods, the analysis of architectural qualities, models of representation of architectural template

  2. A parallel architecture system dedicated to fast numerical calculus

    International Nuclear Information System (INIS)

    Harmanci, A.E.

    1982-04-01

    The project described here is the first result of careful reflection aimed at implementing a machine intended for fast scientific computation, with applications in the field of nuclear reactor safety in mind. The selected structure is a data processing system of the MIMD type (Multiple Instruction, Multiple Data Stream). It is built by generalizing a basic cell that associates a host processor with one or several processors dedicated to numerical computation, both operating alternately on two areas of a common memory block. The principle of simultaneous operation of a large number of identical resources is used at every level of the structure. The system described here is hence modular and reconfigurable. The number of cells and the size and number of memory blocks may be chosen according to need. Communication between processors is carried out by switching the allocation of memory blocks. Moreover, the numerical processors make the best use of private interconnections for synchronisation and fast data interchange. The present study, devoted to the definition of the main hardware structures, will be followed by a simulation phase while suitable software tools are developed.

  3. A universal bilateral manual controller utilizing a unique parallel architecture

    International Nuclear Information System (INIS)

    Bevill, P.J.; Lovett, J.T.

    1990-01-01

    Since 1987, the Advanced Technology Division of the US Department of Energy has sponsored a team composed of four universities and Oak Ridge National Laboratory to pursue research leading to the development and deployment of an advanced robotic system. The tasks to be performed by this system will be those tasks that are hazardous to humans, that generate significant occupational radiation exposure, and those which can be performed more rapidly and more accurately by an automated system. An essential component of the program plan is the annual technology demonstration performed at the Center for Engineering Systems Advanced Research (CESAR) at Oak Ridge National Laboratory. The scenario selected for the 1989 technology demonstration involved the cleanup of a spill of a simulated hazardous material. The demonstration utilized the seven-degrees-of-freedom CESARm manipulator, which is mounted aboard the HERMIES III transporter. The transporter traveled to the site of the simulated spill through an obstacle-strewn environment both under direct human control and autonomously, navigating by reference to the computer-stored world model. After arriving at the site of the spill, the vision system scanned and located the spill, thus determining its position in local and global coordinate frames. This information was used to generate a manipulator sweep pattern that resulted in the removal of the spilled material by a vacuum device

  4. Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

    National Research Council Canada - National Science Library

    Grama, Ananth; Manguoglu, Murat; Koyuturk, Mehmet; Naumov, Maxim; Sameh, Ahmed

    2008-01-01

    .... The study was motivated primarily by the lack of robustness of Krylov subspace iterative schemes with generic, black-box, pre-conditioners such as approximate (or incomplete) LU-factorizations...

  5. Design of a Load-Balancing Architecture For Parallel Firewalls

    National Research Council Canada - National Science Library

    Joyner, William

    1999-01-01

    Because firewalls can become a potential choke point as network speeds and loads increase, the Navy needs a cost-effective means of increasing data rate through firewalls by placing several machines...

  6. Algorithm-Architecture Affinity - Parallelism Changes the Picture

    DEFF Research Database (Denmark)

    Abildgren, Rasmus; Šaramentovas, Aleksandras; Ruzgys, Paulius

    Reducing the time-to-market factor is a challenge for many embedded systems designers. In that respect, hardware-software partitioning is a key issue which has been studied during the last two decades. In this paper we present an extension to recent works dealing with metrics for guiding the hardw...

  7. Parallel-Computing Architecture for JWST Wavefront-Sensing Algorithms

    Science.gov (United States)

    2011-09-01

    results due to the increasing cost and complexity of each test. 2. ALGORITHM OVERVIEW Phase retrieval is an image-based wavefront-sensing...broadband illumination problems we have found that hand-tuning the right matrix sizes can account for a speedup of 86x faster. This comes from hand-picking...Wavefront Sensing and Control”. Proceedings of SPIE (2007) vol. 6687 (08). [5] Greenhouse, M. A., Drury , M. P., Dunn, J. L., Glazer, S. D., Greville, E

  8. Gyrokinetic simulation of finite-β plasmas on parallel architectures

    International Nuclear Information System (INIS)

    Reynders, J.V.W.

    1993-01-01

    Much research exists on the linear and non-linear properties of plasma microinstabilities induced by density and temperature gradients. There has been an interest in the electromagnetic or finite-β effects on these microinstabilities. This thesis focuses on the finite-β modification of an ion temperature gradient (ITG) driven microinstability in two-dimensional shearless and sheared-slab geometries. A gyrokinetic model is employed in the numerical and analytic studies of this instability. Chapter 1 introduces the electromagnetic gyrokinetic model employed in the numerical and analytic studies of the ITG instability. Some discussion of the Klimontovich particle representation of the gyrokinetic Vlasov equation and a multiple scale model of the background plasma gradient is presented. Chapter 2 details the computational issues facing an electromagnetic gyrokinetic particle simulation of the ITG mode. An electromagnetic extension of the partially linearized algorithm is presented with a comparison of quiet particle initialization routines. Chapter 3 presents and compares algorithms for the gyrokinetic particle simulation technique on SIMD and MIMD computing platforms. Chapter 4 discusses electromagnetic gyrokinetic fluctuation theory and provides a comparison of analytic and numerical results. Chapter 5 contains a linear and a non-linear three-wave coupling analysis of the finite-β modified ITG mode in a shearless slab geometry. Comparisons are made with linear and partially linearized gyrokinetic simulation results. Chapter 6 presents results from a finite-β modified ITG mode in a sheared slab geometry. The linear dispersion relation is derived and results from an integral eigenvalue code are presented. Comparisons are made with the gyrokinetic particle code in a variety of limits with both adiabatic and non-adiabatic electrons. Evidence of ITG driven microtearing is presented.

  9. Introduction to massively-parallel computing in high-energy physics

    CERN Document Server

    AUTHOR|(CDS)2083520

    1993-01-01

    Ever since computers were first used for scientific and numerical work, there has existed an "arms race" between the technical development of faster computing hardware, and the desires of scientists to solve larger problems in shorter time-scales. However, the vast leaps in processor performance achieved through advances in semiconductor science have reached a hiatus as the technology comes up against the physical limits of the speed of light and quantum effects. This has led all high performance computer manufacturers to turn towards a parallel architecture for their new machines. In these lectures we will introduce the history and concepts behind parallel computing, and review the various parallel architectures and software environments currently available. We will then introduce programming methodologies that allow efficient exploitation of parallel machines, and present case studies of the parallelization of typical High Energy Physics codes for the two main classes of parallel computing architecture (S...

  10. Parallel pic plasma simulation through particle decomposition techniques

    International Nuclear Information System (INIS)

    Briguglio, S.; Vlad, G.; Di Martino, B.; Naples, Univ. 'Federico II'

    1998-02-01

    Particle-in-cell (PIC) codes are among the major candidates to yield a satisfactory description of the detail of kinetic effects, such as the resonant wave-particle interaction, relevant in determining the transport mechanism in magnetically confined plasmas. A significant improvement in the simulation performance of such codes can be expected from parallelization, e.g., by distributing the particle population among several parallel processors. Parallelization of a hybrid magnetohydrodynamic-gyrokinetic code has been accomplished within the High Performance Fortran (HPF) framework, and tested on the IBM SP2 parallel system, using a 'particle decomposition' technique. The adopted technique requires a moderate effort in porting the code to parallel form and results in intrinsic load balancing and modest interprocessor communication. The performance tests confirm the hypothesis of high effectiveness of the strategy, if targeted towards moderately parallel architectures. Optimal use of resources is also discussed with reference to a specific physics problem.

  11. The EPOS ICT Architecture

    Science.gov (United States)

    Jeffery, Keith; Harrison, Matt; Bailo, Daniele

    2016-04-01

    parallel the ICT team is tracking developments in ICT for relevance to EPOS-IP. In particular, the potential utilisation of e-Is (e-Infrastructures) such as GEANT (network), AARC (security), EGI (GRID computing), EUDAT (data curation), PRACE (High Performance Computing) and HELIX-Nebula / Open Science Cloud (Cloud computing) is being assessed. Similarly, relationships to other e-RIs (e-Research Infrastructures) such as ENVRI+, EXCELERATE and other ESFRI (European Strategic Forum for Research Infrastructures) projects are being developed to share experience and technology and to promote interoperability. EPOS ICT team members are also involved in VRE4EIC, a project developing a reference architecture and component software services for a Virtual Research Environment to be superimposed on EPOS-ICS. The challenge now being tackled is therefore to keep consistency and interoperability among the different modules, initiatives and actors which participate in the process of running the EPOS platform. It implies both continuous updating on the IT aspects of the mentioned initiatives and a refinement of the e-architecture designed so far. One major aspect of EPOS-IP is the ICT support for legalistic, financial and governance aspects of the EPOS ERIC to be initiated during EPOS-IP. This implies a sophisticated AAAI (Authentication, authorization, accounting infrastructure) with consistency throughout the software, communications and data stack.

  12. ARQUITECTURAS INVISIBLES / Invisible architecture

    Directory of Open Access Journals (Sweden)

    Francisco Javier Montero Fernández

    2012-11-01

    questions that go beyond quality. Political or commercial interests tend to confuse the opportunity for promotion with the merit of the project. It would be possible to number many competitions which have supposed a significant advance in architecture in the last century; a certain nostalgia for those times of debate which, together with other similar formulas such as entries in biennials, triennials or exhibitions, were capable of making empirical research from direct practice. On the contrary, today the policy framework of the architectural project is often defined in a manner foreign to real architecture, administratively dominated by a process that prioritizes legal and technical compliance over the quality of the project itself, confusing the means with the end. Criticism always aims at the author; it forgets the notable procedures which armed a jury gratuitously exonerated of responsibility or that legal framework that asphyxiates both architecture and the spirit of inquiry which should always prevail. This issue of the journal considers the assimilation of the competition into a research process parallel to an architecture united with quality and progress, an advanced architecture.

  13. MT-ADRES: multi-threading on coarse-grained reconfigurable architecture

    DEFF Research Database (Denmark)

    Wu, Kehuai; Kanstein, Andreas; Madsen, Jan

    2008-01-01

    The coarse-grained reconfigurable architecture ADRES (architecture for dynamically reconfigurable embedded systems) and its compiler offer high instruction-level parallelism (ILP) to applications by means of a sparsely interconnected array of functional units and register files. As high-ILP architectures achieve only low parallelism when executing partially sequential code segments, which is also known as Amdahl's law, this article proposes to extend ADRES to MT-ADRES (multi-threaded ADRES) to also exploit thread-level parallelism. On MT-ADRES architectures, the array can be partitioned...

  14. Parallel discrete event simulation

    NARCIS (Netherlands)

    Overeinder, B.J.; Hertzberger, L.O.; Sloot, P.M.A.; Withagen, W.J.

    1991-01-01

    In simulating applications for execution on specific computing systems, the simulation performance figures must be known in a short period of time. One basic approach to the problem of reducing the required simulation time is the exploitation of parallelism. However, in parallelizing the simulation

  15. Parallel reservoir simulator computations

    International Nuclear Information System (INIS)

    Hemanth-Kumar, K.; Young, L.C.

    1995-01-01

    The adaptation of a reservoir simulator for parallel computations is described. The simulator was originally designed for vector processors. It performs approximately 99% of its calculations in vector/parallel mode and, relative to scalar calculations, it achieves speedups of 65 and 81 for black oil and EOS simulations, respectively, on the CRAY C-90
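
    The figures quoted above can be put in context with Amdahl's law. The sketch below is an illustration only: it assumes that "approximately 99%" is exactly the fraction of scalar run time that benefits from vector/parallel execution, which the abstract does not state. The function names are hypothetical and are not taken from the simulator.

```python
def amdahl_speedup(parallel_fraction: float, unit_speedup: float) -> float:
    """Overall speedup when `parallel_fraction` of the work runs `unit_speedup` times faster."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / unit_speedup)


def implied_unit_speedup(parallel_fraction: float, overall_speedup: float) -> float:
    """Invert Amdahl's law: the speedup the accelerated part must reach."""
    serial_fraction = 1.0 - parallel_fraction
    remaining = 1.0 / overall_speedup - serial_fraction
    if remaining <= 0.0:
        raise ValueError("overall speedup exceeds the Amdahl bound")
    return parallel_fraction / remaining


if __name__ == "__main__":
    p = 0.99  # assumption: "approximately 99%" treated as an exact run-time fraction
    print("upper bound on speedup:", amdahl_speedup(p, float("inf")))
    for reported in (65.0, 81.0):  # speedups quoted in the abstract
        print(reported, "requires accelerated-part speedup of",
              implied_unit_speedup(p, reported))
```

    Under this assumption the overall speedup cannot exceed roughly 100, so the reported values of 65 and 81 are consistent with the stated fraction.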

  16. Solving the Stokes problem on a massively parallel computer

    DEFF Research Database (Denmark)

    Axelsson, Owe; Barker, Vincent A.; Neytcheva, Maya

    2001-01-01

    boundary value problem for each velocity component, are solved by the conjugate gradient method with a preconditioning based on the algebraic multi‐level iteration (AMLI) technique. The velocity is found from the computed pressure. The method is optimal in the sense that the computational work is proportional to the number of unknowns. Further, it is designed to exploit a massively parallel computer with distributed memory architecture. Numerical experiments on a Cray T3E computer illustrate the parallel performance of the method.

  17. Microwave tomography global optimization, parallelization and performance evaluation

    CERN Document Server

    Noghanian, Sima; Desell, Travis; Ashtari, Ali

    2014-01-01

    This book provides a detailed overview of the use of global optimization and parallel computing in microwave tomography techniques. The book focuses on techniques that are based on global optimization and electromagnetic numerical methods. The authors provide parallelization techniques on homogeneous and heterogeneous computing architectures on high performance and general purpose futuristic computers. The book also discusses the multi-level optimization technique, hybrid genetic algorithm and its application in breast cancer imaging.

  18. Totally parallel multilevel algorithms

    Science.gov (United States)

    Frederickson, Paul O.

    1988-01-01

    Four totally parallel algorithms for the solution of a sparse linear system have common characteristics which become quite apparent when they are implemented on a highly parallel hypercube such as the CM2. These four algorithms are Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, Robust Multigrid (RMG) of Hackbusch, the FFT based Spectral Algorithm, and Parallel Cyclic Reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, which is referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is recognized to be a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance.

  19. Computer architecture evaluation for structural dynamics computations: Project summary

    Science.gov (United States)

    Standley, Hilda M.

    1989-01-01

    The intent of the proposed effort is the examination of the impact of the elements of parallel architectures on the performance realized in a parallel computation. To this end, three major projects are developed: a language for the expression of high level parallelism, a statistical technique for the synthesis of multicomputer interconnection networks based upon performance prediction, and a queueing model for the analysis of shared memory hierarchies.

  20. Decoupled Vector-Fetch Architecture with a Scalarizing Compiler

    OpenAIRE

    Lee, Yunsup

    2016-01-01

    As we approach the end of conventional technology scaling, computer architects are forced to incorporate specialized and heterogeneous accelerators into general-purpose processors for greater energy efficiency. Among the prominent accelerators that have recently become more popular are data-parallel processing units, such as classic vector units, SIMD units, and graphics processing units (GPUs). Surveying a wide range of data-parallel architectures and their parallel programming models and ...

  1. Proposed hardware architectures of particle filter for object tracking

    Science.gov (United States)

    Abd El-Halym, Howida A.; Mahmoud, Imbaby Ismail; Habib, SED

    2012-12-01

    In this article, efficient hardware architectures for the particle filter (PF) are presented. We propose three different architectures for Sequential Importance Resampling Filter (SIRF) implementation. The first architecture is a two-step sequential PF machine, where particle sampling, weight, and output calculations are carried out in parallel during the first step, followed by sequential resampling in the second step. For the weight computation step, a piecewise linear function is used instead of the classical exponential function. This decreases the complexity of the architecture without degrading the results. The second architecture speeds up the resampling step via a parallel, rather than a serial, architecture. This second architecture targets a balance between hardware resources and the speed of operation. The third architecture implements the SIRF as a distributed PF composed of several processing elements and a central unit. All the proposed architectures are captured in VHDL, synthesized using the Xilinx environment, and verified using the ModelSim simulator. Synthesis results confirmed the resource reduction and speed-up advantages of our architectures.
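
    The two-step structure described above (independent per-particle work, then resampling) can be illustrated in software. The following is a minimal sketch under loud assumptions: a hypothetical one-dimensional random-walk state observed directly with Gaussian noise, the classical exponential weight rather than the piecewise linear approximation the article proposes, and plain multinomial resampling. It is not the authors' hardware design.

```python
import math
import random


def sir_step(particles, measurement, rng, process_std=1.0, meas_std=1.0):
    """One Sequential Importance Resampling step for a 1-D random-walk model (assumed)."""
    # Step 1: sampling, weighting and output estimate.
    # Each particle is handled independently, so this part could run in parallel.
    proposed = [x + rng.gauss(0.0, process_std) for x in particles]
    weights = [math.exp(-0.5 * ((measurement - x) / meas_std) ** 2) for x in proposed]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed: fall back to uniform weights.
        weights = [1.0 / len(proposed)] * len(proposed)
    else:
        weights = [w / total for w in weights]
    estimate = sum(w * x for w, x in zip(weights, proposed))

    # Step 2: resampling, drawing particles in proportion to their weights.
    resampled = rng.choices(proposed, weights=weights, k=len(proposed))
    return resampled, estimate


if __name__ == "__main__":
    rng = random.Random(0)
    particles = [0.0] * 200
    for z in (0.5, 1.0, 1.5):  # hypothetical measurements
        particles, estimate = sir_step(particles, z, rng)
        print("measurement", z, "estimate", estimate)
```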

  2. Secure Storage Architectures

    Energy Technology Data Exchange (ETDEWEB)

    Aderholdt, Ferrol [Tennessee Technological University; Caldwell, Blake A [ORNL; Hicks, Susan Elaine [ORNL; Koch, Scott M [ORNL; Naughton, III, Thomas J [ORNL; Pogge, James R [Tennessee Technological University; Scott, Stephen L [Tennessee Technological University; Shipman, Galen M [ORNL; Sorrillo, Lawrence [ORNL

    2015-01-01

    The purpose of this report is to clarify the challenges associated with storage for secure enclaves. The major focus areas for the report are: - review of relevant parallel filesystem technologies to identify assets and gaps; - review of filesystem isolation/protection mechanisms, to include native filesystem capabilities and auxiliary/layered techniques; - definition of storage architectures that can be used for customizable compute enclaves (i.e., clarification of use-cases that must be supported for shared storage scenarios); - investigation of vendor products related to secure storage. This study provides technical details on the storage and filesystems used for HPC, with particular attention to elements that contribute to creating secure storage. We outline the pieces for a shared storage architecture that balances protection and performance by leveraging the isolation capabilities available in filesystems and virtualization technologies to maintain the integrity of the data. Key Points: There are a few existing and in-progress protection features in Lustre related to secure storage, which are discussed in Chapter 3.1. These include authentication capabilities like GSSAPI/Kerberos and the in-progress work for GSSAPI/Host-keys. The GPFS filesystem provides native support for encryption, which is not directly available in Lustre. Additionally, GPFS includes authentication/authorization mechanisms for inter-cluster sharing of filesystems (Chapter 3.2). The limitations of key importance for secure storage/filesystems are: (i) restricting sub-tree mounts for parallel filesystems (which is not directly supported in Lustre or GPFS), and (ii) segregation of hosts on the storage network and practical complications with dynamic additions to the storage network, e.g., LNET. A challenge for VM based use cases will be to provide efficient IO forwarding of the parallel filesystem from the host to the guest (VM). There are promising options like para-virtualized filesystems to

  3. Architectural design decisions

    NARCIS (Netherlands)

    Jansen, Antonius Gradus Johannes

    2008-01-01

    A software architecture can be considered as the collection of key decisions concerning the design of the software of a system. Knowledge about this design, i.e. architectural knowledge, is key for understanding a software architecture and thus the software itself. Architectural knowledge is mostly

  4. Information Integration Architecture Development

    OpenAIRE

    Faulkner, Stéphane; Kolp, Manuel; Nguyen, Duy Thai; Coyette, Adrien; Do, Thanh Tung; 16th International Conference on Software Engineering and Knowledge Engineering

    2004-01-01

    Multi-Agent Systems (MAS) architectures are gaining popularity for building open, distributed, and evolving software required by systems such as information integration applications. Unfortunately, despite considerable work in software architecture during the last decade, few research efforts have aimed at truly defining patterns and languages for designing such multiagent architectures. We propose a modern approach based on organizational structures and architectural description lan...

  5. Fragments of Architecture

    DEFF Research Database (Denmark)

    Bang, Jacob Sebastian

    2016-01-01

    Topic 3: “Case studies dealing with the artistic and architectural work of architects worldwide, and the ties between specific artistic and architectural projects, methodologies and products”...

  6. Lunar architecture

    Science.gov (United States)

    Malek, Shahin

    Climatic conditions on Earth and the human drive to explore space raise the questions of how a camp can be designed on the Moon as a base for space travel, how people can live in such conditions, and what kind of camp is possible there. The first step in this direction was the creation of the International Space Station in Earth orbit (International Space Station, 2001). Settlement on the Moon has been proposed as knowledge about it has grown. With new technologies, architects, in parallel with engineers, are trying to design and invent new ways for humans to settle on the Moon because of its suitable conditions. Proposed habitats range from the actual spacecraft landers or their used fuel tanks to inflatable modules of various shapes. According to research to date, the first requirement for living on other planets is the existence of water, for human breathing and for fuel; after that, the differences in air pressure and gravity must be addressed (Colonization of the Moon, 2004). The goal of this research is to answer the question of how to design a camp on the Moon. Reaching this goal requires further thought and study of the subject and its factors. Using a qualitative and comparative methodology, the conditions of the Earth and the Moon are compared in categories such as nature, human and design. I think that, following the discovery of water, the use of local materials and appropriate building design, on the surface or underground, together with new sciences, makes it possible to plan for long-term living on the Moon. The important point is to consider function, form and structure together when designing for the Moon. References: Colonization of the Moon. (2004). Retrieved December 14, 2009, from Wikipedia: http://en.wikipedia.org/wiki/Colonization_of_the_Moon Structure, International Space Station. (2001). Retrie http://en.wikipedia.org/wiki/International_Space_Station

  7. Guides et formulaires / Guides and forms | CRDI - Centre de recherches pour le ...

    International Development Research Centre (IDRC) Digital Library (Canada)

    IDRC research grant application · Proposal budget · IDRC guidelines for preparing the interim technical report · IDRC guidelines for preparing the final technical report · IDRC guidelines for eligible project expenses · Guidelines for the ...

  8. A memory-array architecture for computer vision

    Energy Technology Data Exchange (ETDEWEB)

    Balsara, P.T.

    1989-01-01

    With the fast advances in the area of computer vision and robotics there is a growing need for machines that can understand images at a very high speed. A conventional von Neumann computer is not suited for this purpose because it takes a tremendous amount of time to solve most typical image processing problems. Exploiting the inherent parallelism present in various vision tasks can significantly reduce the processing time. Fortunately, parallelism is increasingly affordable as hardware gets cheaper. Thus it is now imperative to study computer vision in a parallel processing framework. The author should first design a computational structure which is well suited for a wide range of vision tasks and then develop parallel algorithms which can run efficiently on this structure. Recent advances in VLSI technology have led to several proposals for parallel architectures for computer vision. In this thesis he demonstrates that a memory array architecture with efficient local and global communication capabilities can be used for high speed execution of a wide range of computer vision tasks. This architecture, called the Access Constrained Memory Array Architecture (ACMAA), is efficient for VLSI implementation because of its modular structure, simple interconnect and limited global control. Several parallel vision algorithms have been designed for this architecture. The choice of vision problems demonstrates the versatility of ACMAA for a wide range of vision tasks. These algorithms were simulated on a high level ACMAA simulator running on the Intel iPSC/2 hypercube, a parallel architecture. The results of this simulation are compared with those of sequential algorithms running on a single hypercube node. Details of the ACMAA processor architecture are also presented.

  9. Penjadwalan Produksi Garment Menggunakan Algoritma Heuristic Pour / Garment production scheduling using the Heuristic Pour algorithm

    Directory of Open Access Journals (Sweden)

    Rizal Rachman

    2018-04-01

    Scheduling is the allocation of limited resources to carry out a number of jobs. The need for scheduling arises when available resources are limited. The company currently uses a manual scheduling system under which some products are still missed, causing delays in the production process. This rule is often disadvantageous for orders with short processing times, because an order at the back of the queue must wait a long time before it is processed, which lengthens the completion time of all orders; the existing resources therefore need to be managed efficiently. The scheduling calculation is based on the Heuristic Pour algorithm. The research stages consist of data collection, calculation of standard times, calculation of total processing time per job, scheduling with the company's initial method, and scheduling with the Heuristic Pour method. Scheduling with Heuristic Pour yields savings compared with the company's current method, so it can be used as an alternative method for scheduling production at the garment company. Keywords: production scheduling, algorithm, Heuristic Pour.

  10. Estimating emissions from grout pouring operations

    International Nuclear Information System (INIS)

    Ballinger, M.Y.; Hendrickson, D.W.

    1993-08-01

    Grouting is a method for disposal of low-level radioactive waste in which a contaminated solution is mixed into a slurry, poured into a large storage vault, then dried, fixing the contaminants within a stable solid matrix. A model (RELEASE) has been developed to estimate the quantity of aerosol created during the pouring process. Information and equations derived from spill experiments were used in the model to determine release fractions. This paper discusses the derivation of the release fraction equation used in the code and the model used to account for gravity settling of particles in the vault. The input and results for a base case application are shown

  11. Generic method for deriving the general shaking force balance conditions of parallel manipulators with application to a redundant planar 4-RRR parallel manipulator

    NARCIS (Netherlands)

    van der Wijk, V.; Krut, S.; Pierrot, F.; Herder, Justus Laurens

    2011-01-01

    This paper proposes a generic method for deriving the general shaking force balance conditions of parallel manipulators. Instead of considering the balancing of a parallel manipulator link-by-link or leg-by-leg, the architecture is considered altogether. The first step is to write the linear

  12. Medical Data Architecture Project Status

    Science.gov (United States)

    Krihak, M.; Middour, C.; Gurram, M.; Wolfe, S.; Marker, N.; Winther, S.; Ronzano, K.; Bolles, D.; Toscano, W.; Shaw, T.

    2018-01-01

    The Medical Data Architecture (MDA) project supports the Exploration Medical Capability (ExMC) risk to minimize or reduce the risk of adverse health outcomes and decrements in performance due to in-flight medical capabilities on human exploration missions. To mitigate this risk, the ExMC MDA project addresses the technical limitations identified in ExMC Gap Med 07: We do not have the capability to comprehensively process medically-relevant information to support medical operations during exploration missions. This gap identifies that the current in-flight medical data management includes a combination of data collection and distribution methods that are minimally integrated with on-board medical devices and systems. Furthermore, there are a variety of data sources and methods of data collection. For an exploration mission, the seamless management of such data will enable a more medically autonomous crew than the current paradigm. The medical system requirements are being developed in parallel with the exploration mission architecture and vehicle design. ExMC has recognized that in order to make informed decisions about a medical data architecture framework, current methods for medical data management must not only be understood, but an architecture must also be identified that provides the crew with actionable insight to medical conditions. This medical data architecture will provide the necessary functionality to address the challenges of executing a self-contained medical system that approaches crew health care delivery without assistance from ground support. Hence, the products supported by current prototype development will directly inform exploration medical system requirements.

  13. Parallel computation for distributed parameter system-from vector processors to Adena computer

    Energy Technology Data Exchange (ETDEWEB)

    Nogi, T

    1983-04-01

    Research on advanced parallel hardware and software architectures for very high-speed computation deserves and needs more support and attention to fulfil its promise. Novel architectures for parallel processing are being made ready. Architectures for parallel processing can be roughly divided into two groups. One is a vector processor in which a single central processing unit involves multiple vector-arithmetic registers. The other is a processor array in which slave processors are connected to a host processor to perform parallel computation. In this review, the concept and data structure of the Adena (alternating-direction edition nexus array) architecture, which is conformable to distributed-parameter simulation algorithms, are described. 5 references.

  14. Domain Specific Language for Geant4 Parallelization for Space-based Applications, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — A major limiting factor in HPC growth is the requirement to parallelize codes to leverage emerging architectures, especially as single core performance has plateaued...

  15. Research in Parallel Algorithms and Software for Computational Aerosciences

    Science.gov (United States)

    Domel, Neal D.

    1996-01-01

    Phase 1 is complete for the development of a computational fluid dynamics (CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.

  16. Algorithms for parallel computers

    International Nuclear Information System (INIS)

    Churchhouse, R.F.

    1985-01-01

    Until relatively recently almost all the algorithms for use on computers had been designed on the (usually unstated) assumption that they were to be run on single processor, serial machines. With the introduction of vector processors, array processors and interconnected systems of mainframes, minis and micros, however, various forms of parallelism have become available. The advantage of parallelism is that it offers increased overall processing speed, but it also raises some fundamental questions, including: (i) Which, if any, of the existing 'serial' algorithms can be adapted for use in the parallel mode? (ii) How close to optimal can such adapted algorithms be and, where relevant, what are the convergence criteria? (iii) How can we design new algorithms specifically for parallel systems? (iv) For multi-processor systems, how can we handle the software aspects of the interprocessor communications? Aspects of these questions, illustrated by examples, are considered in these lectures. (orig.)

  17. Parallelism and array processing

    International Nuclear Information System (INIS)

    Zacharov, V.

    1983-01-01

    Modern computing, as well as the historical development of computing, has been dominated by sequential monoprocessing. Yet there is the alternative of parallelism, where several processes may be in concurrent execution. This alternative is discussed in a series of lectures, in which the main developments involving parallelism are considered, both from the standpoint of computing systems and that of applications that can exploit such systems. The lectures seek to discuss parallelism in a historical context, and to identify all the main aspects of concurrency in computation right up to the present time. Included will be consideration of the important question as to what use parallelism might be in the field of data processing. (orig.)

  18. A Parallel Encryption Algorithm Based on Piecewise Linear Chaotic Map

    Directory of Open Access Journals (Sweden)

    Xizhong Wang

    2013-01-01

    We introduce a parallel chaos-based encryption algorithm for taking advantage of multicore processors. The chaotic cryptosystem is generated by the piecewise linear chaotic map (PWLCM). The parallel algorithm is designed with a master/slave communication model with the Message Passing Interface (MPI). The algorithm is suitable not only for multicore processors but also for the single-processor architecture. The experimental results show that the chaos-based cryptosystem possesses good statistical properties. The parallel algorithm provides much better performance than the serial ones and would be useful for encrypting and decrypting large files or multimedia.
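
    For readers unfamiliar with the PWLCM, the sketch below shows the commonly used form of the map and a toy XOR keystream built from it. It is an illustration only, not the authors' cryptosystem and not a vetted cipher: the byte extraction, the burn-in length and the names `pwlcm`, `keystream` and `xor_cipher` are assumptions introduced here, and the MPI master/slave distribution is omitted.

```python
def pwlcm(x: float, p: float) -> float:
    """One iteration of the piecewise linear chaotic map on [0, 1], with 0 < p < 0.5."""
    if x >= 0.5:
        x = 1.0 - x  # the map is symmetric about x = 0.5
    if x < p:
        return x / p
    return (x - p) / (0.5 - p)


def keystream(x0: float, p: float, n: int, burn_in: int = 100) -> bytes:
    """Derive n pseudo-random bytes from the orbit of the map (toy extraction rule)."""
    x = x0
    for _ in range(burn_in):  # discard the transient part of the orbit
        x = pwlcm(x, p)
    out = bytearray()
    for _ in range(n):
        x = pwlcm(x, p)
        out.append(min(255, int(x * 256)))
    return bytes(out)


def xor_cipher(data: bytes, x0: float, p: float) -> bytes:
    """XOR data with the keystream; applying it twice restores the input."""
    ks = keystream(x0, p, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))


if __name__ == "__main__":
    key = (0.3141592653, 0.2718281828)  # hypothetical key: initial value and parameter
    message = b"parallel chaos-based encryption"
    cipher = xor_cipher(message, *key)
    print(cipher.hex())
    print(xor_cipher(cipher, *key))
```

    In a parallel setting, one plausible arrangement is for each worker to encrypt its own block of the file with its own initial value, so blocks can be processed independently; whether the article does this is not stated in the abstract.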

  19. MINARET: Towards a time-dependent neutron transport parallel solver

    International Nuclear Information System (INIS)

    Baudron, A.M.; Lautard, J.J.; Maday, Y.; Mula, O.

    2013-01-01

    We present the newly developed time-dependent 3D multigroup discrete ordinates neutron transport solver that has recently been implemented in the MINARET code. The solver is the support for a study about computing acceleration techniques that involve parallel architectures. In this work, we will focus on the parallelization of two of the variables involved in our equation: the angular directions and the time. This last variable has been parallelized by a (time) domain decomposition method called the para-real in time algorithm. (authors)
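
    The parareal idea mentioned above can be shown on a toy problem. The sketch below applies the standard parareal update to the scalar equation y' = λy with explicit Euler as both coarse and fine propagator; this choice of problem and propagators is an assumption made for illustration and has nothing to do with the transport solver in MINARET.

```python
def euler(y: float, lam: float, dt: float, steps: int) -> float:
    """Advance y' = lam * y over an interval dt with `steps` explicit Euler steps."""
    h = dt / steps
    for _ in range(steps):
        y = y + h * lam * y
    return y


def parareal(y0, lam, t_end, n_slices, n_iters, fine_steps=100):
    """Return the solution at the time-slice boundaries after n_iters parareal iterations."""
    dt = t_end / n_slices

    def coarse(y):  # cheap propagator G: a single Euler step per slice
        return euler(y, lam, dt, 1)

    def fine(y):    # expensive propagator F: many Euler steps per slice
        return euler(y, lam, dt, fine_steps)

    # Iteration 0: a serial sweep with the coarse propagator only.
    u = [y0]
    for n in range(n_slices):
        u.append(coarse(u[n]))

    for _ in range(n_iters):
        # The fine solves depend only on the previous iterate, so each
        # time slice could be handled by a different processor.
        f_old = [fine(u[n]) for n in range(n_slices)]
        g_old = [coarse(u[n]) for n in range(n_slices)]
        # Serial correction sweep: U[n+1] = G(U_new[n]) + F(U_old[n]) - G(U_old[n]).
        new = [y0]
        for n in range(n_slices):
            new.append(coarse(new[n]) + f_old[n] - g_old[n])
        u = new
    return u


if __name__ == "__main__":
    for k in (0, 1, 2, 4):
        print(k, "iterations ->", parareal(1.0, -1.0, 2.0, 10, k)[-1])
```

    After as many iterations as there are time slices, the result coincides with the serial fine solution; the method is useful when far fewer iterations are enough.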

  20. Parallel Execution of Multi Set Constraint Rewrite Rules

    DEFF Research Database (Denmark)

    Sulzmann, Martin; Lam, Edmund Soon Lee

    2008-01-01

    Multi-set constraint rewriting allows for a highly parallel computational model and has been used in a multitude of application domains such as constraint solving, agent specification etc. Rewriting steps can be applied simultaneously as long as they do not interfere with each other. We wish that the underlying constraint rewrite implementation executes rewrite steps in parallel on increasingly popular multi-core architectures. We design and implement efficient algorithms which allow for the parallel execution of multi-set constraint rewrite rules. Our experiments show that we obtain some...

  1. Parallel magnetic resonance imaging

    International Nuclear Information System (INIS)

    Larkman, David J; Nunes, Rita G

    2007-01-01

    Parallel imaging has been the single biggest innovation in magnetic resonance imaging in the last decade. The use of multiple receiver coils to augment the time consuming Fourier encoding has reduced acquisition times significantly. This increase in speed comes at a time when other approaches to acquisition time reduction were reaching engineering and human limits. A brief summary of spatial encoding in MRI is followed by an introduction to the problem parallel imaging is designed to solve. There are a large number of parallel reconstruction algorithms; this article reviews a cross-section, SENSE, SMASH, g-SMASH and GRAPPA, selected to demonstrate the different approaches. Theoretical (the g-factor) and practical (coil design) limits to acquisition speed are reviewed. The practical implementation of parallel imaging is also discussed, in particular coil calibration. How to recognize potential failure modes and their associated artefacts are shown. Well-established applications including angiography, cardiac imaging and applications using echo planar imaging are reviewed and we discuss what makes a good application for parallel imaging. Finally, active research areas where parallel imaging is being used to improve data quality by repairing artefacted images are also reviewed. (invited topical review)

  2. Accelerated radiotherapy planners calculated by parallelization with GPUs

    International Nuclear Information System (INIS)

    Reinado, D.; Cozar, J.; Alonso, S.; Chinillach, N.; Cortina, T.; Ricos, B.; Diez, S.

    2011-01-01

    In this paper we have developed and tested a subroutine parallelized for graphics processing unit (GPU) architectures, applied to calculations with standard algorithms whose code is known. The experience acquired during these tests can also be applied to Monte Carlo (MC) calculations in radiotherapy, provided the code is available.

  3. Computation of watersheds based on parallel graph algorithms

    NARCIS (Netherlands)

    Meijster, A.; Roerdink, J.B.T.M.; Maragos, P; Schafer, RW; Butt, MA

    1996-01-01

    In this paper the implementation of a parallel watershed algorithm is described. The algorithm has been implemented on a Cray J932, which is a shared memory architecture with 32 processors. The watershed transform has generally been considered to be inherently sequential, but recently a few research

  4. A Parallel Algebraic Multigrid Solver on Graphics Processing Units

    KAUST Repository

    Haase, Gundolf; Liebmann, Manfred; Douglas, Craig C.; Plank, Gernot

    2010-01-01

    -vector multiplication scheme underlying the PCG-AMG algorithm is presented for the many-core GPU architecture. A performance comparison of the parallel solver shows that a single Nvidia Tesla C1060 GPU board delivers the performance of a sixteen node Infiniband cluster

  5. A development framework for parallel CFD applications: TRIOU project

    International Nuclear Information System (INIS)

    Calvin, Ch.

    2003-01-01

    We present in this paper the parallel structure of a thermal-hydraulic framework: Trio-U. This development platform has been designed in order to solve large 3-dimensional structured or unstructured CFD (computational fluid dynamics) problems. The code is intrinsically parallel, and an object-oriented design, UML, is used. The implementation language chosen is C++. All the parallelism management and the communication routines have been encapsulated. Parallel I/O and communication classes over the standard I/O streams of C++ have been defined, which allows the developer to use the different modules of the application easily without dealing with basic parallel process management and communications. Moreover, the encapsulation of the communication routines guarantees the portability of the application and allows efficient tuning of basic communication methods in order to achieve the best performance on the target architecture. The speed-up of parallel applications designed using the Trio-U framework is very good: for instance, on complex turbulent flow Large Eddy Simulations (LES) we obtained an efficiency of up to 90% on 20 processors. The efficiencies obtained on direct numerical simulations of two-phase flows are similar, since the speed-up is nearly equal to 7.5 for a 3-dimensional simulation using a one million element mesh on 8 processors. The purpose of this paper is to focus on the main concepts, and their implementation, that guided the design of the parallel architecture of the code. (author)
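
    The two performance figures quoted above are related by the usual definition of parallel efficiency, efficiency = speed-up / number of processors. The short sketch below only restates that arithmetic; the function names are hypothetical.

```python
def efficiency(speedup: float, processors: int) -> float:
    """Parallel efficiency: the fraction of ideal linear speed-up actually achieved."""
    return speedup / processors


def speedup_from_efficiency(eff: float, processors: int) -> float:
    """Speed-up implied by a given efficiency on a given processor count."""
    return eff * processors


if __name__ == "__main__":
    # LES case from the abstract: up to 90% efficiency on 20 processors.
    print("implied speed-up:", speedup_from_efficiency(0.90, 20))
    # Two-phase flow case from the abstract: speed-up of about 7.5 on 8 processors.
    print("implied efficiency:", efficiency(7.5, 8))
```

    On these definitions, 90% efficiency on 20 processors corresponds to a speed-up of about 18, and a speed-up of 7.5 on 8 processors corresponds to an efficiency of about 94%, which supports the abstract's statement that the two cases are similar.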

  6. Better jobs for Asia (De meilleurs emplois pour l'Asie)

    International Development Research Centre (IDRC) Digital Library (Canada)

    Providing better jobs in Asia will require creative interventions from governments, employers and entrepreneurs. IDRC helps research institutions find .... dollars in 2014, led to a major expansion of jobs for Bangladeshi women. It is expected that ...

  7. The ongoing investigation of high performance parallel computing in HEP

    CERN Document Server

    Peach, Kenneth J; Böck, R K; Dobinson, Robert W; Hansroul, M; Norton, Alan Robert; Willers, Ian Malcolm; Baud, J P; Carminati, F; Gagliardi, F; McIntosh, E; Metcalf, M; Robertson, L; CERN. Geneva. Detector Research and Development Committee

    1993-01-01

    Past and current exploitation of parallel computing in High Energy Physics is summarized and a list of R & D projects in this area is presented. The applicability of new parallel hardware and software to physics problems is investigated, in the light of the requirements for computing power of LHC experiments and the current trends in the computer industry. Four main themes are discussed (possibilities for a finer grain of parallelism; fine-grain communication mechanism; usable parallel programming environment; different programming models and architectures, using standard commercial products). Parallel computing technology is potentially of interest for offline and vital for real time applications in LHC. A substantial investment in applications development and evaluation of state of the art hardware and software products is needed. A solid development environment is required at an early stage, before mainline LHC program development begins.

  8. SMARTS: Exploiting Temporal Locality and Parallelism through Vertical Execution

    International Nuclear Information System (INIS)

    Beckman, P.; Crotinger, J.; Karmesin, S.; Malony, A.; Oldehoeft, R.; Shende, S.; Smith, S.; Vajracharya, S.

    1999-01-01

    In the solution of large-scale numerical problems, parallel computing is becoming simultaneously more important and more difficult. The complex organization of today's multiprocessors with several memory hierarchies has forced the scientific programmer to make a choice between simple but unscalable code and scalable but extremely complex code that does not port to other architectures. This paper describes how the SMARTS runtime system and the POOMA C++ class library for high-performance scientific computing work together to exploit data parallelism in scientific applications while hiding the details of managing parallelism and data locality from the user. We present innovative algorithms, based on the macro-dataflow model, for detecting data parallelism and efficiently executing data-parallel statements on shared-memory multiprocessors. We also describe how these algorithms can be implemented on clusters of SMPs

  9. Reliability allocation problem in a series-parallel system

    International Nuclear Information System (INIS)

    Yalaoui, Alice; Chu, Chengbin; Chatelet, Eric

    2005-01-01

    In order to improve system reliability, designers may introduce in a system different technologies in parallel. When each technology is composed of components in series, the configuration belongs to the series-parallel systems. This type of system has not been studied as much as the parallel-series architecture. There exist no methods dedicated to the reliability allocation in series-parallel systems with different technologies. We propose in this paper theoretical and practical results for the allocation problem in a series-parallel system. Two resolution approaches are developed. Firstly, a one stage problem is studied and the results are exploited for the multi-stages problem. A theoretical condition for obtaining the optimal allocation is developed. Since this condition is too restrictive, we secondly propose an alternative approach based on an approximated function and the results of the one-stage study. This second approach is applied to numerical examples
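    The configuration described above (several technologies in parallel, each a chain of components in series) has a standard closed-form reliability when component failures are assumed independent. That independence assumption, the function names and the numbers below are illustrative and are not taken from the paper. A minimal sketch:

```python
from math import prod
from typing import Sequence


def series_reliability(components: Sequence[float]) -> float:
    """A series chain works only if every one of its components works."""
    return prod(components)


def series_parallel_reliability(branches: Sequence[Sequence[float]]) -> float:
    """Parallel branches, each a series chain: the system fails only if
    every branch fails (independent failures assumed)."""
    all_branches_fail = prod(1.0 - series_reliability(b) for b in branches)
    return 1.0 - all_branches_fail


# Two hypothetical technologies in parallel, with made-up component reliabilities.
example = series_parallel_reliability([[0.9, 0.9], [0.8, 0.95]])
```

    An allocation method such as the one the abstract proposes would choose the component reliabilities; this function only evaluates a given choice.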

  10. SMARTS: Exploiting Temporal Locality and Parallelism through Vertical Execution

    Energy Technology Data Exchange (ETDEWEB)

    Beckman, P.; Crotinger, J.; Karmesin, S.; Malony, A.; Oldehoeft, R.; Shende, S.; Smith, S.; Vajracharya, S.

    1999-01-04

    In the solution of large-scale numerical problems, parallel computing is becoming simultaneously more important and more difficult. The complex organization of today's multiprocessors with several memory hierarchies has forced the scientific programmer to make a choice between simple but unscalable code and scalable but extremely complex code that does not port to other architectures. This paper describes how the SMARTS runtime system and the POOMA C++ class library for high-performance scientific computing work together to exploit data parallelism in scientific applications while hiding the details of managing parallelism and data locality from the user. We present innovative algorithms, based on the macro-dataflow model, for detecting data parallelism and efficiently executing data-parallel statements on shared-memory multiprocessors. We also describe how these algorithms can be implemented on clusters of SMPs.

  11. Parallel execution of chemical software on EGEE Grid

    CERN Document Server

    Sterzel, Mariusz

    2008-01-01

    The chemical community's constant interest in studying ever larger molecules forces the parallelization of existing computational methods in chemistry and the development of new ones. These are the main reasons for frequent port updates and for requests from the community for Grid ports of new packages to satisfy their computational demands. Unfortunately, some parallelization schemes used by chemical packages cannot be used directly in a Grid environment. Here we present a solution for the Gaussian package. The current state of development of Grid middleware allows easy parallel execution of software that uses any MPI flavour. Unfortunately, many chemical packages do not use MPI for parallelization, so special treatment is needed. Gaussian can be executed in parallel on an SMP architecture or via Linda. These require reservation of a certain number of processors/cores on a given WN and of an equal number of processors/cores on each WN, respectively. The current implementation of EGEE middleware does not offer such f...

  12. The STAPL Parallel Graph Library

    KAUST Repository

    Harshvardhan; Fidel, Adam; Amato, Nancy M.; Rauchwerger, Lawrence

    2013-01-01

    This paper describes the STAPL Parallel Graph Library, a high-level framework that abstracts the user from data-distribution and parallelism details and allows them to concentrate on parallel graph algorithm development. It includes a customizable

  13. Parallelized Seeded Region Growing Using CUDA

    Directory of Open Access Journals (Sweden)

    Seongjin Park

    2014-01-01

    This paper presents a novel method for parallelizing the seeded region growing (SRG) algorithm using Compute Unified Device Architecture (CUDA) technology, with the intention of overcoming the theoretical weakness of the SRG algorithm, namely that its computation time is directly proportional to the size of the segmented region. The segmentation performance of the proposed CUDA-based SRG is compared with SRG implementations on single-core CPUs, quad-core CPUs, and shader language programming, using synthetic datasets and 20 body CT scans. Based on the experimental results, the CUDA-based SRG outperforms the other three implementations, suggesting that it can substantially assist segmentation during massive CT screening tests.
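    To make the cost argument concrete, here is a minimal serial sketch of seeded region growing. It is not the paper's CUDA method. It assumes a 2-D image given as a list of rows, 4-connectivity, and a simplified homogeneity test that compares each pixel with the seed value (other formulations compare with a running region mean). All names are illustrative.

```python
from collections import deque


def seeded_region_grow(image, seed, tolerance):
    """Return the set of 4-connected pixels whose value lies within
    `tolerance` of the seed pixel's value, grown outward from `seed`."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    frontier = deque([seed])
    while frontier:
        r, c = frontier.popleft()
        # Visit the four direct neighbours of the current pixel.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_value) <= tolerance):
                region.add((nr, nc))
                frontier.append((nr, nc))
    return region
```

    Each accepted pixel enters the queue exactly once, so the serial running time grows with the size of the segmented region, which is the weakness a parallel version aims to reduce.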

  14. Parallel interactive data analysis with PROOF

    International Nuclear Information System (INIS)

    Ballintijn, Maarten; Biskup, Marek; Brun, Rene; Canal, Philippe; Feichtinger, Derek; Ganis, Gerardo; Kickinger, Guenter; Peters, Andreas; Rademakers, Fons

    2006-01-01

    The Parallel ROOT Facility, PROOF, enables the analysis of much larger data sets on a shorter time scale. It exploits the inherent parallelism in data of uncorrelated events via a multi-tier architecture that optimizes I/O and CPU utilization in heterogeneous clusters with distributed storage. The system provides transparent and interactive access to gigabytes today. Being part of the ROOT framework PROOF inherits the benefits of a performant object storage system and a wealth of statistical and visualization tools. This paper describes the data analysis model of ROOT and the latest developments on closer integration of PROOF into that model and the ROOT user environment, e.g. support for PROOF-based browsing of trees stored remotely, and the popular TTree::Draw() interface. We also outline the ongoing developments aimed to improve the flexibility and user-friendliness of the system

  15. Parallelization of a blind deconvolution algorithm

    Science.gov (United States)

    Matson, Charles L.; Borelli, Kathy J.

    2006-09-01

    Often it is of interest to deblur imagery in order to obtain higher-resolution images. Deblurring requires knowledge of the blurring function - information that is often not available separately from the blurred imagery. Blind deconvolution algorithms overcome this problem by jointly estimating both the high-resolution image and the blurring function from the blurred imagery. Because blind deconvolution algorithms are iterative in nature, they can take minutes to days to deblur an image, depending on how many frames of data are used for the deblurring and on the platforms on which the algorithms are executed. Here we present our progress in parallelizing a blind deconvolution algorithm to increase its execution speed. This progress includes sub-frame parallelization and a code structure that is not specialized to a specific computer hardware architecture.

  16. A parallel robot to assist vitreoretinal surgery

    Energy Technology Data Exchange (ETDEWEB)

    Nakano, Taiga; Sugita, Naohiko; Mitsuishi, Mamoru [University of Tokyo, School of Engineering, Tokyo (Japan); Ueta, Takashi; Tamaki, Yasuhiro [University of Tokyo, Graduate School of Medicine, Tokyo (Japan)

    2009-11-15

    This paper describes the development and evaluation of a parallel prototype robot for vitreoretinal surgery, where physiological hand tremor limits performance. The manipulator was specifically designed to meet requirements such as size, precision, and sterilization; it has a six-degree-of-freedom parallel architecture and provides positioning accuracy with micrometer resolution within the eye. The manipulator is controlled by an operator with a "master manipulator" consisting of multiple joints. Results of the in vitro experiments revealed that, compared to the manual procedure, higher stability and accuracy of tool positioning could be achieved using the prototype robot. The microsurgical system that we have developed has superior operability compared to the traditional manual procedure and has sufficient potential to be used clinically for vitreoretinal surgery. (orig.)

  17. Multi-petascale highly efficient parallel supercomputer

    Science.gov (United States)

    Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen-Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng

    2018-05-15

    A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaflop-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes are interconnected by a five-dimensional torus network that maximizes the throughput of packet communications between nodes and minimizes latency. The network implements a collective network and a global asynchronous network that provides global barrier and notification functions. Integrated in the node design is a list-based prefetcher. The memory system implements transactional memory, thread-level speculation, and a multiversioning cache that improves the soft error rate at the same time, and it supports DMA functionality allowing for parallel-processing message passing.

  18. Ultrasonographic Findings of Mammographic Architectural Distortion

    International Nuclear Information System (INIS)

    Ma, Jeong Hyun; Kang, Bong Joo; Cha, Eun Suk; Hwangbo, Seol; Kim, Hyeon Sook; Park, Chang Suk; Kim, Sung Hun; Choi, Jae Jeong; Chung, Yong An

    2008-01-01

    To review the sonographic findings of various diseases showing architectural distortion depicted under mammography. We collected and reviewed architectural distortions observed under mammography at our health institution between 1 March 2004, and 28 February 2007. We collected 23 cases of sonographically-detected mammographic architectural distortions that confirmed lesions after surgical resection. The sonographic findings of mammographic architectural distortion were analyzed by use of the BI-RADS lexicon for shape, margin, lesion boundary, echo pattern, posterior acoustic feature and orientation. There were various diseases that showed architectural distortion depicted under mammography. Fibrocystic disease was the most common presentation (n = 6), followed by adenosis (n = 2), stromal fibrosis (n = 2), radial scar (n = 3), usual ductal hyperplasia (n = 1), atypical ductal hyperplasia (n = 1) and mild fibrosis with microcalcification (n = 1). Malignant lesions such as ductal carcinoma in situ (DCIS) (n = 2), lobular carcinoma in situ (LCIS) (n = 2), invasive ductal carcinoma (n = 2) and invasive lobular carcinoma (n = 1) were observed. As observed by sonography, shape was divided as irregular (n = 22) and round (n = 1). Margin was divided as circumscribed (n = 1), indistinct (n = 7), angular (n = 1), microlobulated (n = 1) and spiculated (n = 13). Lesion boundary was divided as abrupt interface (n = 11) and echogenic halo (n = 12). Echo pattern was divided as hypoechoic (n = 20), anechoic (n = 1), hyperechoic (n = 1) and isoechoic (n = 1). Posterior acoustic feature was divided as posterior acoustic feature (n = 7), posterior acoustic shadow (n = 15) and complex posterior acoustic feature (n = 1). Orientation was divided as parallel (n = 12) and not parallel (n = 11). There were no differential sonographic findings between benign and malignant lesions. This study presented various sonographic findings of mammographic architectural distortion and that it is

  19. Fluid dynamics parallel computer development at NASA Langley Research Center

    Science.gov (United States)

    Townsend, James C.; Zang, Thomas A.; Dwoyer, Douglas L.

    1987-01-01

    To accomplish more detailed simulations of highly complex flows, such as the transition to turbulence, fluid dynamics research requires computers much more powerful than any available today. Only parallel processing on multiple-processor computers offers hope for achieving the required effective speeds. Looking ahead to the use of these machines, the fluid dynamicist faces three issues: algorithm development for near-term parallel computers, architecture development for future computer power increases, and assessment of possible advantages of special purpose designs. Two projects at NASA Langley address these issues. Software development and algorithm exploration is being done on the FLEX/32 Parallel Processing Research Computer. New architecture features are being explored in the special purpose hardware design of the Navier-Stokes Computer. These projects are complementary and are producing promising results.

  20. Modeling Architectural Patterns’ Behavior Using Architectural Primitives

    NARCIS (Netherlands)

    Waqas Kamal, Ahmad; Avgeriou, Paris

    2008-01-01

    Architectural patterns have an impact on both the structure and the behavior of a system at the architecture design level. However, it is challenging to model patterns’ behavior in a systematic way because modeling languages do not provide the appropriate abstractions and because each pattern

  1. Parallel algorithms on the ASTRA SIMD machine

    International Nuclear Information System (INIS)

    Odor, G.; Rohrbach, F.; Vesztergombi, G.; Varga, G.; Tatrai, F.

    1996-01-01

    In view of the tremendous jump in computing power of modern RISC processors, interest in parallel computing seems to be thinning out. Why use a complicated system of parallel processors if the problem can be solved by a single powerful micro-chip? It is a general law, however, that exponential growth always ends in some kind of saturation, and then parallelism will again become a hot topic. We try to prepare ourselves for this eventuality. The MPPC project started in 1990, in the heyday of parallelism, and produced four ASTRA machines (presented at CHEP'92) with 4k processors (expandable to 16k) based on yesterday's chip technology (chip presented at CHEP'91). These machines now provide excellent test-beds for algorithmic developments in a complete, real environment. We are developing, for example, fast pattern-recognition algorithms which could be used in high-energy physics experiments at the LHC (planned to be operational after 2004 at CERN) for triggering and data reduction. The basic feature of our ASP (Associative String Processor) approach is to use extremely simple (thus very cheap) processor elements, but in huge quantities (up to millions of processors), connected together by a very simple string-like communication chain. In this paper we present powerful algorithms based on this architecture, indicating the performance perspectives if the hardware quality reaches present or even future technology levels. (author)

  2. Scalable parallel prefix solvers for discrete ordinates transport

    International Nuclear Information System (INIS)

    Pautz, S.; Pandya, T.; Adams, M.

    2009-01-01

    The well-known 'sweep' algorithm for inverting the streaming-plus-collision term in first-order deterministic radiation transport calculations has some desirable numerical properties. However, it suffers from parallel scaling issues caused by a lack of concurrency. The maximum degree of concurrency, and thus the maximum parallelism, grows more slowly than the problem size for sweeps-based solvers. We investigate a new class of parallel algorithms that involves recasting the streaming-plus-collision problem in prefix form and solving via cyclic reduction. This method, although computationally more expensive at low levels of parallelism than the sweep algorithm, offers better theoretical scalability properties. Previous work has demonstrated this approach for one-dimensional calculations; we show how to extend it to multidimensional calculations. Notably, for multiple dimensions it appears that this approach is limited to long-characteristics discretizations; other discretizations cannot be cast in prefix form. We implement two variants of the algorithm within the radlib/SCEPTRE transport code library at Sandia National Laboratories and show results on two different massively parallel systems. Both the 'forward' and 'symmetric' solvers behave similarly, scaling well to larger degrees of parallelism than sweeps-based solvers. We do observe some issues at the highest levels of parallelism (relative to the system size) and discuss possible causes. We conclude that this approach shows good potential for future parallel systems, but the parallel scalability will depend heavily on the architecture of the communication networks of these systems. (authors)
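    The idea of recasting a sweep in prefix form can be illustrated on a toy problem. The sketch below is not the radlib/SCEPTRE implementation: it assumes a scalar one-dimensional recurrence x_i = a_i * x_(i-1) + b_i as a stand-in for the transport sweep, and it uses a recursive-doubling scan rather than the cyclic reduction the abstract mentions. All names are illustrative.

```python
def compose(earlier, later):
    """Combine two affine maps stored as (a, b), meaning x -> a*x + b,
    into the single map 'apply earlier, then later'."""
    a1, b1 = earlier
    a2, b2 = later
    return (a2 * a1, a2 * b1 + b2)


def sweep(maps, x0):
    """Sequential reference: each value depends on the previous one."""
    xs, x = [], x0
    for a, b in maps:
        x = a * x + b
        xs.append(x)
    return xs


def prefix_solve(maps, x0):
    """Inclusive scan over `compose` by recursive doubling. All updates within
    one round are independent of each other, so a round could run
    concurrently; here the rounds are simply executed serially."""
    prefix = list(maps)
    step = 1
    while step < len(prefix):
        prefix = [
            compose(prefix[i - step], prefix[i]) if i >= step else prefix[i]
            for i in range(len(prefix))
        ]
        step *= 2
    # prefix[i] now maps x0 directly to x_i.
    return [a * x0 + b for a, b in prefix]
```

    Because `compose` is associative, the scan needs only about log2(n) rounds instead of n dependent steps, at the price of more total arithmetic. That matches the trade-off noted above: more work at low levels of parallelism, better theoretical scalability.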

  3. Religious architecture: anthropological perspectives

    NARCIS (Netherlands)

    Verkaaik, O.

    2013-01-01

    Religious Architecture: Anthropological Perspectives develops an anthropological perspective on modern religious architecture, including mosques, churches and synagogues. Borrowing from a range of theoretical perspectives on space-making and material religion, this volume looks at how religious

  4. Avionics Architecture for Exploration

    Data.gov (United States)

    National Aeronautics and Space Administration — The goal of the AES Avionics Architectures for Exploration (AAE) project is to develop a reference architecture that is based on standards and that can be scaled and...

  5. RATS: Reactive Architectures

    National Research Council Canada - National Science Library

    Christensen, Marc

    2004-01-01

    This project had two goals: To build an emulation prototype board for a tiled architecture and to demonstrate the utility of a global inter-chip free-space photonic interconnection fabric for polymorphous computer architectures (PCA...

  6. Rhein-Ruhr architecture

    DEFF Research Database (Denmark)

    2002-01-01

    Catalogue for the exhibition 'Rhein - Ruhr architecture', Meldahls Smedie, 15 March - 28 April 2002. 99 pages...

  7. SPINning parallel systems software

    International Nuclear Information System (INIS)

    Matlin, O.S.; Lusk, E.; McCune, W.

    2002-01-01

    We describe our experiences in using Spin to verify parts of the Multi Purpose Daemon (MPD) parallel process management system. MPD is a distributed collection of processes connected by Unix network sockets. MPD is dynamic: processes and the connections among them are created and destroyed as MPD is initialized, runs user processes, recovers from faults, and terminates. This dynamic nature is easily expressible in the Spin/Promela framework but poses performance and scalability challenges. We present here the results of expressing some of the parallel algorithms of MPD and executing both simulation and verification runs with Spin

  8. Parallel programming with Python

    CERN Document Server

    Palach, Jan

    2014-01-01

    A fast, easy-to-follow and clear tutorial to help you develop parallel computing systems using Python. Along with explaining the fundamentals, the book will also introduce you to slightly advanced concepts and will help you implement these techniques in the real world. If you are an experienced Python programmer and want to make use of the available computing resources by parallelizing applications in a simple way, then this book is for you. You need a basic knowledge of Python development to get the most out of this book.

  9. Architecture and Film

    OpenAIRE

    Mohammad Javaheri, Saharnaz

    2016-01-01

    Film does not exist without architecture. In every movie that has ever been made throughout history, the cinematic image of architecture is embedded within the picture. Throughout my studies and research, I began to see that there is no director who can consciously or unconsciously deny the use of architectural elements in his or her movies. Architecture offers a strong profile to distinguish characters and story. In the early days, films were shot in streets surrounde...

  10. Elements of Architecture

    DEFF Research Database (Denmark)

    Elements of Architecture explores new ways of engaging architecture in archaeology. It conceives of architecture both as the physical evidence of past societies and as existing beyond the physical environment, considering how people in the past have not just dwelled in buildings but have existed...

  11. Modular architectures for quantum networks

    Science.gov (United States)

    Pirker, A.; Wallnöfer, J.; Dür, W.

    2018-05-01

    We consider the problem of generating multipartite entangled states in a quantum network upon request. We follow a top-down approach, where the required entanglement is initially present in the network in form of network states shared between network devices, and then manipulated in such a way that the desired target state is generated. This minimizes generation times, and allows for network structures that are in principle independent of physical links. We present a modular and flexible architecture, where a multi-layer network consists of devices of varying complexity, including quantum network routers, switches and clients, that share certain resource states. We concentrate on the generation of graph states among clients, which are resources for numerous distributed quantum tasks. We assume minimal functionality for clients, i.e. they do not participate in the complex and distributed generation process of the target state. We present architectures based on shared multipartite entangled Greenberger–Horne–Zeilinger states of different size, and fully connected decorated graph states, respectively. We compare the features of these architectures to an approach that is based on bipartite entanglement, and identify advantages of the multipartite approach in terms of memory requirements and complexity of state manipulation. The architectures can handle parallel requests, and are designed in such a way that the network state can be dynamically extended if new clients or devices join the network. For generation or dynamical extension of the network states, we propose a quantum network configuration protocol, where entanglement purification is used to establish high fidelity states. The latter also allows one to show that the entanglement generated among clients is private, i.e. the network is secure.

  12. Cross-Circulating Current Suppression Method for Parallel Three-Phase Two-Level Inverters

    DEFF Research Database (Denmark)

    Wei, Baoze; Guerrero, Josep M.; Guo, Xiaoqiang

    2015-01-01

    The parallel architecture is very popular for power inverters to increase the power level. This paper presents a method for the parallel operation of inverters in an ac-distributed system, to suppress the cross-circulating current based on virtual impedance without current-sharing bus...

  13. Algorithms for computational fluid dynamics on parallel processors

    International Nuclear Information System (INIS)

    Van de Velde, E.F.

    1986-01-01

    A study of parallel algorithms for the numerical solution of partial differential equations arising in computational fluid dynamics is presented. The actual implementation on parallel processors of shared and nonshared memory design is discussed. The performance of these algorithms is analyzed in terms of machine efficiency, communication time, bottlenecks and software development costs. For elliptic equations, a parallel preconditioned conjugate gradient method is described, which has been used to solve pressure equations discretized with high order finite elements on irregular grids. A parallel full multigrid method and a parallel fast Poisson solver are also presented. Hyperbolic conservation laws were discretized with parallel versions of finite difference methods like the Lax-Wendroff scheme and with the Random Choice method. Techniques are developed for comparing the behavior of an algorithm on different architectures as a function of problem size and local computational effort. Effective use of these advanced architecture machines requires the use of machine dependent programming. It is shown that the portability problems can be minimized by introducing high level operations on vectors and matrices structured into program libraries

  14. A parallel solution for high resolution histological image analysis.

    Science.gov (United States)

    Bueno, G; González, R; Déniz, O; García-Rojo, M; González-García, J; Fernández-Carrobles, M M; Vállez, N; Salido, J

    2012-10-01

    This paper describes a general methodology for developing parallel image processing algorithms based on message passing for high resolution images (on the order of several Gigabytes). These algorithms have been applied to histological images and must be executed on massively parallel processing architectures. Advances in new technologies for complete slide digitalization in pathology have been combined with developments in biomedical informatics. However, the efficient use of these digital slide systems is still a challenge. The image processing that these slides are subject to is still limited both in terms of data processed and processing methods. The work presented here focuses on the need to design and develop parallel image processing tools capable of obtaining and analyzing the entire gamut of information included in digital slides. Tools have been developed to assist pathologists in image analysis and diagnosis, and they cover low and high-level image processing methods applied to histological images. Code portability, reusability and scalability have been tested by using the following parallel computing architectures: distributed memory with massive parallel processors and two networks, InfiniBand and Myrinet, composed of 17 and 1024 nodes respectively. The proposed parallel framework is a flexible, high-performance solution, and it shows that the efficient processing of digital microscopic images is possible and may offer important benefits to pathology laboratories. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  15. Expressing Parallelism with ROOT

    Energy Technology Data Exchange (ETDEWEB)

    Piparo, D. [CERN; Tejedor, E. [CERN; Guiraud, E. [CERN; Ganis, G. [CERN; Mato, P. [CERN; Moneta, L. [CERN; Valls Pla, X. [CERN; Canal, P. [Fermilab

    2017-11-22

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.

  16. Expressing Parallelism with ROOT

    Science.gov (United States)

    Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.

    2017-10-01

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.

  17. Parallel Fast Legendre Transform

    NARCIS (Netherlands)

    Alves de Inda, M.; Bisseling, R.H.; Maslen, D.K.

    1998-01-01

    We discuss a parallel implementation of a fast algorithm for the discrete polynomial Legendre transform. We give an introduction to the Driscoll-Healy algorithm using polynomial arithmetic and present experimental results on the efficiency and accuracy of our implementation. The algorithms were

  18. Practical parallel programming

    CERN Document Server

    Bauer, Barr E

    2014-01-01

    This is the book that will teach programmers to write faster, more efficient code for parallel processors. The reader is introduced to a vast array of procedures and paradigms on which actual coding may be based. Examples and real-life simulations using these devices are presented in C and FORTRAN.

  19. Parallel universes beguile science

    CERN Multimedia

    2007-01-01

    A staple of mind-bending science fiction, the possibility of multiple universes has long intrigued hard-nosed physicists, mathematicians and cosmologists too. We may not be able -- at least not yet -- to prove they exist, many serious scientists say, but there are plenty of reasons to think that parallel dimensions are more than figments of eggheaded imagination.

  20. Parallel plate detectors

    International Nuclear Information System (INIS)

    Gardes, D.; Volkov, P.

    1981-01-01

    Two parallel plate avalanche counters (PPAC), one of 5 x 3 cm² (timing only) and one of 15 x 5 cm² (timing and position), are considered. The theory of operation and timing resolution is given. The measurement set-up and the curves of experimental results illustrate the possibilities of the two counters [fr

  1. Parallel hierarchical global illumination

    Energy Technology Data Exchange (ETDEWEB)

    Snell, Quinn O. [Iowa State Univ., Ames, IA (United States)

    1997-10-08

    Solving the global illumination problem is equivalent to determining the intensity of every wavelength of light in all directions at every point in a given scene. The complexity of the problem has led researchers to use approximation methods for solving the problem on serial computers. Rather than using an approximation method, such as backward ray tracing or radiosity, the authors have chosen to solve the Rendering Equation by direct simulation of light transport from the light sources. This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed memory or shared memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recent published methods involving bidirectional ray tracing or hybrid treatments of diffuse and specular surfaces. Like progressive radiosity methods, it dynamically refines the geometry decomposition where required, but does so without the excessive storage requirements for ray histories. The algorithm, called Photon, produces a scene which converges to the global illumination solution. This amounts to a huge task for a 1997-vintage serial computer, but using the power of a parallel supercomputer significantly reduces the time required to generate a solution. Currently, Photon can be run on most parallel environments from a shared memory multiprocessor to a parallel supercomputer, as well as on clusters of heterogeneous workstations.

  2. Vital architecture, slow momentum policy

    DEFF Research Database (Denmark)

    Braae, Ellen Marie

    2010-01-01

    A reflection on the relation between Danish landscape architecture policy and the statements made through current landscape architectural project....

  3. Parallelization and automatic data distribution for nuclear reactor simulations

    Energy Technology Data Exchange (ETDEWEB)

    Liebrock, L.M. [Liebrock-Hicks Research, Calumet, MI (United States)

    1997-07-01

    Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.

  4. Parallelization and automatic data distribution for nuclear reactor simulations

    International Nuclear Information System (INIS)

    Liebrock, L.M.

    1997-01-01

    Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed

  5. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects

    International Nuclear Information System (INIS)

    Agullo, Emmanuel; Demmel, Jim; Dongarra, Jack; Hadri, Bilel; Kurzak, Jakub; Langou, Julien; Ltaief, Hatem; Luszczek, Piotr; Tomov, Stanimire

    2009-01-01

    The emergence and continuing use of multi-core architectures and graphics processing units require changes in the existing software and sometimes even a redesign of the established algorithms in order to take advantage of the now prevailing parallelism. Parallel Linear Algebra for Scalable Multi-core Architectures (PLASMA) and Matrix Algebra on GPU and Multicore Architectures (MAGMA) are two projects that aim to achieve high performance and portability across a wide range of multi-core architectures and hybrid systems, respectively. We present in this document a comparative study of PLASMA's performance against established linear algebra packages and some preliminary results of MAGMA on hybrid multi-core and GPU systems.

  6. QDP++: Data Parallel Interface for QCD

    Energy Technology Data Exchange (ETDEWEB)

    Robert Edwards

    2003-03-01

    This is a user's guide for the C++ binding for the QDP Data Parallel Applications Programmer Interface developed under the auspices of the US Department of Energy Scientific Discovery through Advanced Computing (SciDAC) program. The QDP Level 2 API has the following features: (1) Provides data parallel operations (logically SIMD) on all sites across the lattice or subsets of these sites. (2) Operates on lattice objects, which have an implementation-dependent data layout that is not visible above this API. (3) Hides details of how the implementation maps onto a given architecture, namely how the logical problem grid (i.e., lattice) is mapped onto the machine architecture. (4) Allows asynchronous (non-blocking) shifts of lattice level objects over any permutation map of sites onto sites. However, from the user's view these instructions appear blocking and in fact may be so in some implementations. (5) Provides broadcast operations (filling a lattice quantity from a scalar value(s)), global reduction operations, and lattice-wide operations on various data-type primitives, such as matrices, vectors, and tensor products of matrices (propagators). (6) Provides operator syntax that supports complex expression constructions.

  7. Parallel asynchronous systems and image processing algorithms

    Science.gov (United States)

    Coon, D. D.; Perera, A. G. U.

    1989-01-01

    A new hardware approach to implementation of image processing algorithms is described. The approach is based on silicon devices which would permit an independent analog processing channel to be dedicated to every pixel. A laminar architecture consisting of a stack of planar arrays of the device would form a two-dimensional array processor with a 2-D array of inputs located directly behind a focal plane detector array. A 2-D image data stream would propagate in neuronlike asynchronous pulse coded form through the laminar processor. Such systems would integrate image acquisition and image processing. Acquisition and processing would be performed concurrently as in natural vision systems. The research is aimed at implementation of algorithms, such as the intensity dependent summation algorithm and pyramid processing structures, which are motivated by the operation of natural vision systems. Implementation of natural vision algorithms would benefit from the use of neuronlike information coding and the laminar, 2-D parallel, vision system type architecture. Besides providing a neural network framework for implementation of natural vision algorithms, a 2-D parallel approach could eliminate the serial bottleneck of conventional processing systems. Conversion to serial format would occur only after raw intensity data has been substantially processed. An interesting challenge arises from the fact that the mathematical formulation of natural vision algorithms does not specify the means of implementation, so that hardware implementation poses intriguing questions involving vision science.

  8. Pitié pour les grandes villes !

    Directory of Open Access Journals (Sweden)

    Jérôme Monnet

    1997-02-01

    Roger Caillois said in 1938 that there exists "a representation of the big city, powerful enough over the imagination that the question of its accuracy is never raised in practice, created entirely by books, yet widespread enough to be part of the collective mental atmosphere and consequently to possess a certain force of constraint" (Le mythe et l'homme, p. 156 [his emphasis]). In 1996, the French press devoted features and articles to "Habitat II...

  9. Plutonium Immobilization Program cold pour tests

    International Nuclear Information System (INIS)

    Hovis, G.L.; Stokes, M.W.; Smith, M.E.; Wong, J.W.

    1999-01-01

    The Plutonium Immobilization Program (PIP) is a joint venture between the Savannah River Site, Lawrence Livermore National Laboratory, Argonne National Laboratory, and Pacific Northwest National Laboratory to carry out the disposition of excess weapons-grade plutonium. This program uses the can-in-canister (CIC) approach. CIC involves encapsulating plutonium in ceramic forms (or pucks), placing the pucks in sealed stainless steel cans, placing the cans in long cylindrical magazines, latching the magazines to racks inside Defense Waste Processing Facility (DWPF) canisters, and filling the DWPF canisters with high-level waste glass. This process puts the plutonium in a stable form and makes it unattractive for reuse. At present, the DWPF pours glass into empty canisters. In the CIC approach, the addition of a stainless steel rack, magazines, cans, and ceramic pucks to the canisters introduces a new set of design and operational challenges: All of the hardware installed in the canisters must maintain structural integrity at elevated (molten-glass) temperatures. This suggests that a robust design is needed. However, the amount of material added to the DWPF canister must be minimized to prevent premature glass cooling and excessive voiding caused by a large internal thermal mass. High metal temperatures, minimizing thermal mass, and glass flow paths are examples of the types of technical considerations of the equipment design process. To determine the effectiveness of the design in terms of structural integrity and glass-flow characteristics, full-scale testing will be conducted. A cold (nonradioactive) pour test program is planned to assist in the development and verification of a baseline design for the immobilization canister to be used in the PIP process. The baseline design resulting from the cold pour test program and CIC equipment development program will provide input to Title 1 design for second-stage immobilization. The cold pour tests will be conducted in two

  10. Exporting Humanist Architecture

    DEFF Research Database (Denmark)

    Nielsen, Tom

    2016-01-01

    The article is a chapter in the catalogue for the Danish exhibition at the 2016 Architecture Biennale in Venice. The catalogue is conceived as an independent book exploring the theme Art of Many - The Right to Space. The chapter is an essay in this anthology tracing and discussing the different values and ethical stands involved in the export of Danish Architecture. Abstract: Danish architecture has, in a sense, been driven by an unwritten contract between the architects and the democratic state and its institutions. This contract may be viewed as an ethos – an architectural tradition with inherent aesthetic and moral values. Today, however, Danish architecture is also an export commodity. That raises questions, which should be debated as openly as possible. What does it mean for architecture and architects to practice in cultures and under political systems that do not use architecture...

  11. Software architecture evolution

    DEFF Research Database (Denmark)

    Barais, Olivier; Le Meur, Anne-Francoise; Duchien, Laurence

    2008-01-01

    Software architectures must frequently evolve to cope with changing requirements, and this evolution often implies integrating new concerns. Unfortunately, when the new concerns are crosscutting, existing architecture description languages provide little or no support for this kind of evolution. The software architect must modify multiple elements of the architecture manually, which risks introducing inconsistencies. This chapter provides an overview, comparison and detailed treatment of the various state-of-the-art approaches to describing and evolving software architectures. Furthermore, we discuss one particular framework named TranSAT, which addresses the above problems of software architecture evolution. TranSAT provides a new element in the software architecture description language, called an architectural aspect, for describing new concerns and their integration into an existing...

  12. Parallel protein secondary structure prediction based on neural networks.

    Science.gov (United States)

    Zhong, Wei; Altun, Gulsah; Tian, Xinmin; Harrison, Robert; Tai, Phang C; Pan, Yi

    2004-01-01

    Protein secondary structure prediction has a fundamental influence on today's bioinformatics research. In this work, binary and tertiary classifiers of protein secondary structure prediction are implemented on Denoeux belief neural network (DBNN) architecture. Hydrophobicity matrix, orthogonal matrix, BLOSUM62 and PSSM (position specific scoring matrix) are tested separately as the encoding schemes for DBNN. The experimental results contribute to the design of new encoding schemes. A new binary classifier for Helix versus not Helix (~H) for DBNN produces a prediction accuracy of 87% when PSSM is used for the input profile. The performance of the DBNN binary classifier is comparable to that of the other best prediction methods. The good test results for binary classifiers open a new approach for protein structure prediction with neural networks. Because training the neural networks is time consuming, Pthread and OpenMP are employed to parallelize DBNN on the hyperthreading-enabled Intel architecture. Speedup for 16 Pthreads is 4.9 and speedup for 16 OpenMP threads is 4 on the 4-processor shared memory architecture. The speedup of both OpenMP and Pthread is superior to that reported in other research. With the new parallel training algorithm, thousands of amino acids can be processed in a reasonable amount of time. Our research also shows that hyperthreading technology for Intel architecture is efficient for parallel biological algorithms.

  13. A Statistical Treatment of Bioassay Pour Fractions

    Science.gov (United States)

    Barengoltz, Jack; Hughes, David W.

    2014-01-01

    The binomial probability distribution is used to treat the statistics of a microbiological sample that is split into two parts, with only one part evaluated for spore count. One wishes to estimate the total number of spores in the sample based on the counts obtained from the part that is evaluated (pour fraction). Formally, the binomial distribution is recharacterized as a function of the observed counts (successes), with the total number (trials) an unknown. The pour fraction is the probability of success per spore (trial). This distribution must be renormalized in terms of the total number. Finally, the new renormalized distribution is integrated and mathematically inverted to yield the maximum estimate of the total number as a function of a desired level of confidence ( P(fraction. The extension to recovery efficiency corrections is also presented. Now the product of recovery efficiency and pour fraction may be small enough that the likely value may be much larger than the usual calculation: the number of spores divided by that product. The use of this analysis would not be limited to microbiological data.
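One possible reading of the procedure described in this abstract, sketched in Python. The sketch assumes a flat prior on the total count, replaces the integration mentioned in the abstract with a discrete sum over the total, and uses hypothetical function names; it is not taken from the paper. Summing the binomial probability of the observed count over all possible totals gives 1/p, so multiplying by the pour fraction p renormalizes it into a distribution over the total.

```python
from math import comb


def posterior_total(n_total, k_observed, pour_fraction):
    """Renormalized binomial, read as a distribution over the unknown total.

    Assumption: flat prior on n_total. The sum over n_total >= k_observed of
    C(n, k) p^k (1-p)^(n-k) equals 1/p, hence the extra factor of p.
    """
    p = pour_fraction
    return (comb(n_total, k_observed)
            * p ** (k_observed + 1)
            * (1 - p) ** (n_total - k_observed))


def upper_bound_total(k_observed, pour_fraction, confidence):
    """Smallest total N whose cumulative probability reaches `confidence`."""
    if not (0 < pour_fraction <= 1 and 0 < confidence < 1):
        raise ValueError("need 0 < pour_fraction <= 1 and 0 < confidence < 1")
    cumulative = 0.0
    n_total = k_observed  # the total cannot be smaller than the observed count
    while True:
        cumulative += posterior_total(n_total, k_observed, pour_fraction)
        if cumulative >= confidence:
            return n_total
        n_total += 1
```

For example, `upper_bound_total(0, 0.5, 0.95)` asks how many spores could be present, at 95% confidence, when half the sample was assayed and no spores were counted. The loop is only suitable for moderate counts, since very large binomial coefficients overflow when converted to floating point.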

  14. Comparison of multihardware parallel implementations for a phase unwrapping algorithm

    Science.gov (United States)

    Hernandez-Lopez, Francisco Javier; Rivera, Mariano; Salazar-Garibay, Adan; Legarda-Sáenz, Ricardo

    2018-04-01

    Phase unwrapping is an important problem in the areas of optical metrology, synthetic aperture radar (SAR) image analysis, and magnetic resonance imaging (MRI) analysis. These images are becoming larger in size and, particularly, the availability and need for processing of SAR and MRI data have increased significantly with the acquisition of remote sensing data and the popularization of magnetic resonators in clinical diagnosis. Therefore, it is important to develop faster and accurate phase unwrapping algorithms. We propose a parallel multigrid algorithm of a phase unwrapping method named accumulation of residual maps, which builds on a serial algorithm that consists of the minimization of a cost function, achieved by means of a serial Gauss-Seidel kind algorithm. Our algorithm also optimizes the original cost function, but unlike the original work, our algorithm is a parallel Jacobi class with alternated minimizations. This strategy is known as the chessboard type, where red pixels can be updated in parallel in the same iteration since they are independent. Similarly, black pixels can be updated in parallel in an alternating iteration. We present parallel implementations of our algorithm for different parallel multicore architectures such as CPU-multicore, Xeon Phi coprocessor, and Nvidia graphics processing unit. In all the cases, we obtain a superior performance of our parallel algorithm when compared with the original serial version. In addition, we present a detailed comparative performance of the developed parallel versions.
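The chessboard (red-black) ordering mentioned in this abstract can be illustrated with a small Python sketch. It does not use the authors' cost function: as a stand-in, it relaxes a plain Laplace problem, where each interior cell is replaced by the average of its four neighbours. All function names are hypothetical. The point it shows is that cells of one colour depend only on cells of the other colour, so a sweep over one colour gives the same result in any order and could be run in parallel.

```python
def make_grid(n, boundary=1.0):
    """n x n grid with fixed boundary values and zero interior."""
    return [[boundary if i in (0, n - 1) or j in (0, n - 1) else 0.0
             for j in range(n)] for i in range(n)]


def sweep_color(u, color, reverse=False):
    """Update, in place, every interior cell with (i + j) % 2 == color.

    All four neighbours of such a cell have the opposite colour, so no cell
    updated in this sweep reads another cell updated in this sweep.
    """
    n, m = len(u), len(u[0])
    cells = [(i, j) for i in range(1, n - 1) for j in range(1, m - 1)
             if (i + j) % 2 == color]
    if reverse:
        cells.reverse()  # a different visiting order, same result
    for i, j in cells:
        u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])


def red_black_iterations(u, n_iter):
    """Alternate red (colour 0) and black (colour 1) sweeps."""
    for _ in range(n_iter):
        sweep_color(u, 0)
        sweep_color(u, 1)
    return u
```

In this sketch the sweeps are still executed serially; the order independence within one colour is what a multicore or GPU implementation would exploit.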

  15. Performance modeling of parallel algorithms for solving neutron diffusion problems

    International Nuclear Information System (INIS)

    Azmy, Y.Y.; Kirk, B.L.

    1995-01-01

    Neutron diffusion calculations are the most common computational methods used in the design, analysis, and operation of nuclear reactors and related activities. Here, mathematical performance models are developed for the parallel algorithm used to solve the neutron diffusion equation on message passing and shared memory multiprocessors represented by the Intel iPSC/860 and the Sequent Balance 8000, respectively. The performance models are validated through several test problems, and these models are used to estimate the performance of each of the two considered architectures in situations typical of practical applications, such as fine meshes and a large number of participating processors. While message passing computers are capable of producing speedup, the parallel efficiency deteriorates rapidly as the number of processors increases. Furthermore, the speedup fails to improve appreciably for massively parallel computers so that only small- to medium-sized message passing multiprocessors offer a reasonable platform for this algorithm. In contrast, the performance model for the shared memory architecture predicts very high efficiency over a wide range of number of processors reasonable for this architecture. Furthermore, the model efficiency of the Sequent remains superior to that of the hypercube if its model parameters are adjusted to make its processors as fast as those of the iPSC/860. It is concluded that shared memory computers are better suited for this parallel algorithm than message passing computers

  16. Physics Detector Simulation Facility (PDSF) architecture/utilization

    International Nuclear Information System (INIS)

    Scipioni, B.

    1993-05-01

    The current systems architecture for the SSCL's Physics Detector Simulation Facility (PDSF) is presented. Systems analysis data is presented and discussed. In particular, these data disclose the effectiveness of utilization of the facility for meeting the needs of physics computing, especially as concerns parallel architecture and processing. Detailed design plans for the highly networked, symmetric, parallel, UNIX workstation-based facility are given and discussed in light of the design philosophy. Included are network, CPU, disk, router, concentrator, tape, user and job capacities and throughput

  17. Myanmar : tous les projets | CRDI - Centre de recherches pour le ...

    International Development Research Centre (IDRC) Digital Library (Canada)

    Programme: Governance and Justice ... The creation of border economic zones is an important industrialization strategy for Thailand and opens up new prospects for two ... Una Hakika: Scaling up digital solutions for conflict management in Kenya and Burma.

  18. Inde | Page 77 | CRDI - Centre de recherches pour le ...

    International Development Research Centre (IDRC) Digital Library (Canada)

    It is no secret that traditional agricultural products such as millets and grain legumes are highly nutritious. That is why researchers are currently working with women in India and Ethiopia to facilitate their use for personal purposes (for preparing healthy meals) and ...

  19. Parallel grid population

    Science.gov (United States)

    Wald, Ingo; Ize, Santiago

    2015-07-28

    Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.
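The two-phase scheme described in this abstract can be sketched as follows. This is a simplified illustration, not the patented method: it assumes a one-dimensional grid of unit cells, objects given as inclusive cell ranges `(lo, hi)`, contiguous grid portions, and Python threads as stand-ins for processors. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def portion_bounds(num_cells, n):
    """Split cells 0..num_cells-1 into n contiguous portions [start, stop)."""
    step = -(-num_cells // n)  # ceiling division
    return [(min(p * step, num_cells), min((p + 1) * step, num_cells))
            for p in range(n)]


def classify(objects, bounds):
    """Phase 1, one worker: find which portion(s) bound each of its objects."""
    bins = [[] for _ in bounds]
    for lo, hi in objects:
        for p, (start, stop) in enumerate(bounds):
            if lo < stop and hi >= start:  # object overlaps this portion
                bins[p].append((lo, hi))
    return bins


def populate(portion, candidates):
    """Phase 2, one worker: fill the cells of its own portion only."""
    start, stop = portion
    cells = {c: [] for c in range(start, stop)}
    for lo, hi in candidates:
        for c in range(max(lo, start), min(hi, stop - 1) + 1):
            cells[c].append((lo, hi))
    return cells


def parallel_populate(objects, num_cells, n):
    """Populate a 1-D grid using n workers in two phases."""
    bounds = portion_bounds(num_cells, n)
    object_sets = [objects[w::n] for w in range(n)]  # n distinct object sets
    with ThreadPoolExecutor(max_workers=n) as pool:
        per_worker = list(pool.map(lambda s: classify(s, bounds), object_sets))
        candidates = [[o for bins in per_worker for o in bins[p]]
                      for p in range(n)]
        filled = list(pool.map(populate, bounds, candidates))
    grid = {}
    for cells in filled:
        grid.update(cells)
    return grid
```

Because each worker writes only to its own portion in the second phase, no locking is needed on the grid cells. CPython threads will not speed up this pure-Python loop; they are used here only to show the division of work.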

  20. More parallel please

    DEFF Research Database (Denmark)

    Gregersen, Frans; Josephson, Olle; Kristoffersen, Gjert

    Abstract [en] More parallel, please is the result of the work of an Inter-Nordic group of experts on language policy financed by the Nordic Council of Ministers 2014-17. The book presents all that is needed to plan, practice and revise a university language policy which takes as its point of departure that English may be used in parallel with the various local, in this case Nordic, languages. As such, the book integrates the challenge of internationalization faced by any university with the wish to improve quality in research, education and administration based on the local language(s). There are three layers in the text: First, you may read the extremely brief version of the in total 11 recommendations for best practice. Second, you may acquaint yourself with the extended version of the recommendations and finally, you may study the reasoning behind each of them. At the end of the text, we give...

  1. PARALLEL MOVING MECHANICAL SYSTEMS

    Directory of Open Access Journals (Sweden)

    Florian Ion Tiberius Petrescu

    2014-09-01

    Moving mechanical systems with parallel structures are solid, fast, and accurate. Among parallel systems, Stewart platforms stand out as the oldest: fast, solid, and precise. The work outlines a few main elements of Stewart platforms. It begins with the geometry of the platform and its kinematic elements, and then presents a few items of dynamics. The primary dynamic element is the determination of the kinetic energy of the entire Stewart platform mechanism. The kinematics of the mobile platform is then recorded by a rotation matrix method. If a structural motoelement consists of two moving elements that translate relative to each other, it is more convenient, for the drive train and especially for the dynamics, to represent the motoelement as a single moving component. We thus have seven moving parts (the six motoelements, or feet, to which the mobile platform 7 is added) and one fixed.

  2. Construction of a digital elevation model: methods and parallelization

    International Nuclear Information System (INIS)

    Mazzoni, Christophe

    1995-01-01

    The aim of this work is to reduce the computation time needed to produce Digital Elevation Models (DEM) by using a parallel machine. It was carried out in collaboration between the French 'Institut Geographique National' (IGN) and the Laboratoire d'Electronique de Technologie et d'Instrumentation (LETI) of the French Atomic Energy Commission (CEA). The IGN has developed a system that provides the DEMs used to produce topographic maps. The kernel of this system is the correlator, software that automatically matches pairs of homologous points in a stereo pair of photographs. However, the correlator is expensive in computing time. To reduce computation time while producing DEMs with the same accuracy as the existing system, we have parallelized the IGN's correlator on the OPENVISION system. This hardware solution uses the SIMD (Single Instruction, Multiple Data) parallel machine SYMPATI-2, developed by the LETI, which works on parallel architectures and image processing. Our analysis of the implementation has demonstrated the difficulty of coupling scalar and parallel structures efficiently, so we propose solutions to reinforce this coupling. To accelerate the processing further, we evaluate SYMPHONIE, a SIMD calculator and the successor of SYMPATI-2. In addition, we developed a multi-agent approach for which a MIMD (Multiple Instruction, Multiple Data) architecture is available. Finally, we describe a Multi-SIMD architecture that reconciles our two approaches. This architecture can handle multi-level image processing efficiently. It is flexible thanks to its modularity, and its communication network provides the reliability required by sensitive systems. (author) [fr

  3. Xyce parallel electronic simulator.

    Energy Technology Data Exchange (ETDEWEB)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.

    2010-05-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.

  4. Stability of parallel flows

    CERN Document Server

    Betchov, R

    2012-01-01

    Stability of Parallel Flows provides information pertinent to hydrodynamical stability. This book explores the stability problems that occur in various fields, including electronics, mechanics, oceanography, administration, economics, as well as naval and aeronautical engineering. Organized into two parts encompassing 10 chapters, this book starts with an overview of the general equations of a two-dimensional incompressible flow. This text then explores the stability of a laminar boundary layer and presents the equation of the inviscid approximation. Other chapters present the general equation

  5. Evolutionary design assistants for architecture

    Directory of Open Access Journals (Sweden)

    N. Onur Sönmez

    2015-04-01

    In its parallel pursuit of an increased competitiveness for design offices and more pleasurable and easier workflows for designers, artificial design intelligence is a technical, intellectual, and political challenge. While human-machine cooperation has become commonplace through Computer Aided Design (CAD) tools, a more improved collaboration and better support appear possible only through an endeavor into a kind of artificial design intelligence, which is more sensitive to the human perception of affairs. Considered as part of the broader Computational Design studies, the research program of this quest can be called Artificial / Autonomous / Automated Design (AD). The current available level of Artificial Intelligence (AI) for design is limited and a viable aim for current AD would be to develop design assistants that are capable of producing drafts for various design tasks. Thus, the overall aim of this thesis is the development of approaches, techniques, and tools towards artificial design assistants that offer a capability for generating drafts for sub-tasks within design processes. The main technology explored for this aim is Evolutionary Computation (EC), and the target design domain is architecture. The two connected research questions of the study concern, first, the investigation of the ways to develop an architectural design assistant, and secondly, the utilization of EC for the development of such assistants. While developing approaches, techniques, and computational tools for such an assistant, the study also carries out a broad theoretical investigation into the main problems, challenges, and requirements towards such assistants on a rather overall level. Therefore, the research is shaped as a parallel investigation of three main threads interwoven along several levels, moving from a more general level to specific applications. The three research threads comprise, first, theoretical discussions and speculations with regard to both

  6. Reducing Concurrency Bottlenecks in Parallel I/O Workloads

    Energy Technology Data Exchange (ETDEWEB)

    Manzanares, Adam C. [Los Alamos National Laboratory]; Bent, John M. [Los Alamos National Laboratory]; Wingate, Meghan [Los Alamos National Laboratory]

    2011-01-01

    To enable high performance parallel checkpointing we introduced the Parallel Log Structured File System (PLFS). PLFS is middleware interposed on the file system stack to transform concurrent writing of one application file into many non-concurrently written component files. The promising effectiveness of PLFS makes it important to examine its performance for workloads other than checkpoint capture, notably the different ways that state snapshots may be later read, to make the case for using PLFS in the Exascale I/O stack. Reading a PLFS file involved reading each of its component files. In this paper we identify performance limitations on broader workloads in an early version of PLFS, specifically the need to build and distribute an index for the overall file, and the pressure on the underlying parallel file system's metadata server, and show how PLFS's decomposed components architecture can be exploited to alleviate bottlenecks in the underlying parallel file system.
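
    The decomposition described in this abstract can be illustrated with a minimal sketch. It is not the PLFS implementation or its API: the function names, the `data.<writer_id>` file naming, and the in-memory index are all hypothetical, chosen only to show how one logical file can be split into per-writer append-only logs plus an index that a reader must consult.

```python
import os
import tempfile


def plfs_write(container, writer_id, logical_offset, data, index):
    """Append data to this writer's own log and record where it landed.

    Hypothetical helper: each writer only ever appends to its own file,
    so no two writers touch the same component file.
    """
    log_path = os.path.join(container, f"data.{writer_id}")
    with open(log_path, "ab") as log:
        log.seek(0, os.SEEK_END)
        physical_offset = log.tell()
        log.write(data)
    # Index entry: where the bytes belong logically, and where they are stored.
    index.append((logical_offset, len(data), writer_id, physical_offset))


def plfs_read(container, index, size):
    """Rebuild the logical file by replaying index entries in write order."""
    out = bytearray(size)
    for logical_offset, length, writer_id, physical_offset in index:
        with open(os.path.join(container, f"data.{writer_id}"), "rb") as log:
            log.seek(physical_offset)
            out[logical_offset:logical_offset + length] = log.read(length)
    return bytes(out)


def demo():
    """Two writers interleave their writes into one 12-byte logical file."""
    with tempfile.TemporaryDirectory() as container:
        index = []
        plfs_write(container, 0, 0, b"AAAA", index)
        plfs_write(container, 1, 4, b"BBBB", index)
        plfs_write(container, 0, 8, b"CCCC", index)
        return plfs_read(container, index, 12)
```

    The sketch also shows why reads are the harder case: every read has to go through the index and may touch every component file, which corresponds to the index-building cost and metadata pressure that the abstract identifies.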

  7. The effect of pouring time on the dimensional stability of casts made from conventional and extended-pour irreversible hydrocolloids by 3D modelling

    Directory of Open Access Journals (Sweden)

    Hasan Ö. Gümüş

    2015-09-01

    Conclusion: All of the conventional and extended-pour impression materials tested in this study can be poured up to 24 hours with accuracy, if impressions are correctly stored. Extended-pour impression materials (ColorChange, Hydrogum 5, and Hydrocolor 5) can be poured up to 120 hours, if stored correctly.

  8. Parallel Monte Carlo Search for Hough Transform

    Science.gov (United States)

    Lopes, Raul H. C.; Franqueira, Virginia N. L.; Reid, Ivan D.; Hobson, Peter R.

    2017-10-01

    We investigate the problem of line detection in digital image processing and in particular how state of the art algorithms behave in the presence of noise and whether CPU efficiency can be improved by the combination of a Monte Carlo Tree Search, hierarchical space decomposition, and parallel computing. The starting point of the investigation is the method introduced in 1962 by Paul Hough for detecting lines in binary images. Extended in the 1970s to the detection of space forms, what came to be known as Hough Transform (HT) has been proposed, for example, in the context of track fitting in the LHC ATLAS and CMS projects. The Hough Transform transfers the problem of line detection, for example, into one of optimization of the peak in a vote counting process for cells which contain the possible points of candidate lines. The detection algorithm can be computationally expensive both in the demands made upon the processor and on memory. Additionally, it can have a reduced effectiveness in detection in the presence of noise. Our first contribution consists in an evaluation of the use of a variation of the Radon Transform as a form of improving the effectiveness of line detection in the presence of noise. Then, parallel algorithms for variations of the Hough Transform and the Radon Transform for line detection are introduced. An algorithm for Parallel Monte Carlo Search applied to line detection is also introduced. Their algorithmic complexities are discussed. Finally, implementations on multi-GPU and multicore architectures are discussed.
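
    The vote counting process mentioned in this abstract can be sketched as follows. This is a plain, serial textbook Hough Transform in the (theta, rho) parameterization, not the authors' Monte Carlo or parallel variants; the function names and the default cell sizes (`n_theta`, `rho_step`) are assumptions made for the illustration.

```python
import math


def hough_votes(points, n_theta=180, rho_step=1.0):
    """Let every point vote for all (theta, rho) cells of lines through it.

    A line is written as rho = x*cos(theta) + y*sin(theta).
    """
    votes = {}
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            cell = (t, round(rho / rho_step))
            votes[cell] = votes.get(cell, 0) + 1
    return votes


def best_line(points, n_theta=180, rho_step=1.0):
    """Return (theta, rho, vote_count) for the accumulator peak."""
    votes = hough_votes(points, n_theta, rho_step)
    (t, r), count = max(votes.items(), key=lambda item: item[1])
    return math.pi * t / n_theta, r * rho_step, count
```

    The accumulator grows with the number of points times the number of angle cells, which gives a feel for the processor and memory cost the abstract refers to.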

  9. Fast image processing on parallel hardware

    International Nuclear Information System (INIS)

    Bittner, U.

    1988-01-01

    Current digital imaging modalities in the medical field incorporate parallel hardware which is heavily used in the image formation stage, as in CT/MR image reconstruction or DSA real-time subtraction. In order to make image post-processing as efficient as image acquisition, new software approaches have to be found which take full advantage of the parallel hardware architecture. This paper describes the implementation of a two-dimensional median filter which can serve as an example for the development of such an algorithm. The algorithm is analyzed by viewing it as a complete parallel sort of the k pixel values in the chosen window, which leads to a generalization to rank order operators and other closely related filters reported in the literature. A section on the theoretical basis of the algorithm gives hints on how to characterize operations suitable for implementation on pipeline processors and how to find the appropriate algorithms. Finally, some results on the computation time and usefulness of median filtering in radiographic imaging are given.
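
    The view of the median filter as a sort of the k pixel values in a window, and its generalization to rank order operators, can be sketched in a few lines. This is a serial illustration only, not the pipeline-processor implementation the paper describes; the function name, the `radius` and `rank` parameters, and the border handling (clipping the window to the image) are assumptions.

```python
def rank_filter(image, radius=1, rank=None):
    """Apply a rank-order filter over a (2*radius+1) x (2*radius+1) window.

    rank=None selects the middle of the sorted window, i.e. the median.
    rank=0 gives a minimum filter; a large rank gives a maximum filter.
    Border pixels use only the part of the window inside the image.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Sort the pixel values that fall inside the window.
            window = sorted(
                image[rr][cc]
                for rr in range(max(0, r - radius), min(height, r + radius + 1))
                for cc in range(max(0, c - radius), min(width, c + radius + 1))
            )
            k = len(window) // 2 if rank is None else min(rank, len(window) - 1)
            out[r][c] = window[k]
    return out
```

    Each output pixel depends only on its own window, so the per-pixel sorts are independent of one another, which is the property a parallel implementation can exploit.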

  10. Xyce parallel electronic simulator : reference guide.

    Energy Technology Data Exchange (ETDEWEB)

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2011-05-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is to list exhaustively (to the extent possible) the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide. The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. It is targeted specifically to run on large-scale parallel computing platforms but also runs well on a variety of architectures including single processor workstations. It also aims to support a variety of devices and models specific to Sandia needs. This document is intended to complement the Xyce Users Guide. It contains comprehensive, detailed information about a number of topics pertinent to the usage of Xyce. Included in this document is a netlist reference for the input-file commands and elements supported within Xyce; a command line reference, which describes the available command line arguments for Xyce; and quick-references for users of other circuit codes, such as Orcad's PSpice and Sandia's ChileSPICE.

  11. Parallel Task Processing on a Multicore Platform in a PC-based Control System for Parallel Kinematics

    Directory of Open Access Journals (Sweden)

    Harald Michalik

    2009-02-01

    Full Text Available Multicore platforms have one physical processor chip with multiple cores interconnected via a chip-level bus. Because they deliver greater computing power through concurrency and offer greater system density, multicore platforms are well qualified to address the performance bottleneck encountered in PC-based control systems for parallel kinematic robots with heavy CPU load. Heavy-load control tasks are generated by new control approaches that include features like singularity prediction, structure control algorithms, vision data integration and similar tasks. In this paper we introduce the parallel task scheduling extension of a communication architecture specially tailored for the development of PC-based control of parallel kinematics. The scheduling is specially designed for processing on a multicore platform. It breaks down the serial task processing of the robot control cycle and extends it with parallel task processing paths in order to enhance the overall control performance.

  12. Enterprise architecture management

    DEFF Research Database (Denmark)

    Rahimi, Fatemeh; Gøtze, John; Møller, Charles

    2017-01-01

    Despite the growing interest in enterprise architecture management, researchers and practitioners lack a shared understanding of its applications in organizations. Building on findings from a literature review and eight case studies, we develop a taxonomy that categorizes applications of enterprise architecture management based on three classes of enterprise architecture scope. Organizations may adopt enterprise architecture management to help form, plan, and implement IT strategies; help plan and implement business strategies; or to further complement the business strategy-formation process. The findings challenge the traditional IT-centric view of enterprise architecture management application and suggest enterprise architecture management as an approach that could support the consistent design and evolution of an organization as a whole.

  13. Can You Hear Architecture

    DEFF Research Database (Denmark)

    Ryhl, Camilla

    2016-01-01

    Taking its point of departure in the understanding of architectural quality as being based on multisensory architecture, the paper aims to discuss the current acoustic discourse in inclusive design and its implications for the integration of inclusive design in architectural discourse and practice as well as the understanding of user needs. The paper further points to the need to elaborate and nuance the discourse much more, in order to assure inclusion for the many users living with a hearing impairment or, for other reasons, with a high degree of auditory sensitivity. Using the authors’ own research on inclusive design and architectural quality for people with a hearing disability and a newly conducted qualitative evaluation research in Denmark as well as architectural theories on multisensory aspects of architectural experiences, the paper uses examples of existing Nordic building cases to discuss the role...

  14. Enterprise architecture management

    DEFF Research Database (Denmark)

    Rahimi, Fatemeh; Gøtze, John; Møller, Charles

    2017-01-01

    Despite the growing interest in enterprise architecture management, researchers and practitioners lack a shared understanding of its applications in organizations. Building on findings from a literature review and eight case studies, we develop a taxonomy that categorizes applications of enterprise architecture management based on three classes of enterprise architecture scope. Organizations may adopt enterprise architecture management to help form, plan, and implement IT strategies; help plan and implement business strategies; or to further complement the business strategy-formation process. The findings challenge the traditional IT-centric view of enterprise architecture management application and suggest enterprise architecture management as an approach that could support the consistent design and evolution of an organization as a whole.

  15. PLAST: parallel local alignment search tool for database comparison

    Directory of Open Access Journals (Sweden)

    Lavenier Dominique

    2009-10-01

    Full Text Available Abstract Background Sequence similarity searching is an important and challenging task in molecular biology and next-generation sequencing should further strengthen the need for faster algorithms to process such vast amounts of data. At the same time, the internal architecture of current microprocessors is tending towards more parallelism, leading to the use of chips with two, four and more cores integrated on the same die. The main purpose of this work was to design an effective algorithm to fit with the parallel capabilities of modern microprocessors. Results A parallel algorithm for comparing large genomic banks and targeting middle-range computers has been developed and implemented in PLAST software. The algorithm exploits two key parallel features of existing and future microprocessors: the SIMD programming model (SSE instruction set) and the multithreading concept (multicore). Compared to multithreaded BLAST software, tests performed on an 8-processor server have shown speedup ranging from 3 to 6 with a similar level of accuracy. Conclusion A parallel algorithmic approach driven by the knowledge of the internal microprocessor architecture allows significant speedup to be obtained while preserving standard sensitivity for similarity search problems.

  16. Scientific programming on massively parallel processor CP-PACS

    International Nuclear Information System (INIS)

    Boku, Taisuke

    1998-01-01

    The massively parallel processor CP-PACS targets a range of problems in computational physics, and its architecture has been designed to support various kinds of numerical processing. This report outlines the CP-PACS and gives a programming example using the Kernel CG benchmark from the NAS Parallel Benchmarks, version 1. It describes the two principal features of the CP-PACS architecture: the pseudo vector processing mechanism, and the tuning of parallel processing for scientific and technical computation using the three-dimensional hyper crossbar network. The CP-PACS uses PUs based on a RISC processor extended with a pseudo vector processor. Pseudo vector processing is realized as loop processing with scalar instructions. The features of the network connecting the PUs are explained. The algorithm of the NPB version 1 Kernel CG is shown. The most time-consuming part of the main loop is the product of matrix and vector (matvec), and the parallel processing of the matvec is explained. The CPU computation time is determined. The performance evaluation covers execution time, short vector processing on the pseudo vector processor based on the slide window, and a comparison with other parallel computers. (K.I.)

  17. An Architecture of Reconciliation

    OpenAIRE

    Bolton, Carlton Robert

    2001-01-01

    The reconciliation of architectural idea and built form is accomplished by the materialization of the idea through the use of specific materials with their inherent qualities and restrictions. The learning begins when one sees these restrictions not as a hindrance to the idea, but that which can reveal the very essence of Architecture. The virtue of this architecture of reconciliation lies in its ability to help Man understand his surroundings and place in the world at large. This is acc...

  18. Flexible weapons architecture design

    OpenAIRE

    Pyant, William C.

    2015-01-01

    Present day air-delivered weapons are of a closed architecture, with little to no ability to tailor the weapon for the individual engagement. The closed architectures require weaponeers to make the target fit the weapon instead of fitting the individual weapons to a target. The concept of flexible weapons aims to modularize weapons design using an open architecture shell into which different modules are inserted to achieve the desired target fractional damage while reducing cost and civilia...

  19. Architecture humanitarian emergencies

    DEFF Research Database (Denmark)

    Gomez-Guillamon, Maria; Eskemose Andersen, Jørgen; Contreras, Jorge Lobos

    2013-01-01

    Introduced by scientific articles concerning architecture and human rights in light of cultures, emergencies, social equality and sustainability, democracy, economy, artistic development and science in architecture. Concluding in a definition of needs for new roles, processes and education of arc......, Architettura di Alghero in Italy, Architecture and Design of Kocaeli University in Turkey, University of Aguascalientes in Mexico, Architectura y Urbanismo of University of Chile and Escuela de Architectura of Universidad Austral in Chile....

  20. Architecture in Everyday Life

    OpenAIRE

    Costa Agarez, Ricardo

    2015-01-01

    For most architects, architecture is not only art, craft, passion and engagement; it is their ‘bread-and-butter’, too, and has long been so. Architecture, consciously or unconsciously, is also the ‘bread-and-butter’ of communities across the world: successfully or unsuccessfully it is part of the daily lives of ordinary women and men. Yet practitioners, theoreticians and historians of architecture often disregard the more quotidian side of the discipline, a neglect that is inversely pro...

  1. The ATLAS Analysis Architecture

    International Nuclear Information System (INIS)

    Cranmer, K.S.

    2008-01-01

    We present an overview of the ATLAS analysis architecture including the relevant aspects of the computing model and the major architectural aspects of the Athena framework. Emphasis will be given to the interplay between the analysis use cases and the technical aspects of the architecture including the design of the event data model, transient-persistent separation, data reduction strategies, analysis tools, and ROOT interoperability

  2. Architecture for Data Management

    OpenAIRE

    Vukolic, Marko

    2015-01-01

    In this document we present the preliminary architecture of the SUPERCLOUD data management and storage. We start by defining the design requirements of the architecture, motivated by use cases and then review the state-of-the-art. We survey security and dependability technologies and discuss designs for the overall unifying architecture for data management that serves as an umbrella for different security and dependability data management features. Specifically the document lays out the archi...

  3. Real-time FPGA architectures for computer vision

    Science.gov (United States)

    Arias-Estrada, Miguel; Torres-Huitzil, Cesar

    2000-03-01

    This paper presents an architecture for real-time generic convolution of a mask and an image. The architecture is intended for fast low level image processing. The FPGA-based architecture takes advantage of the availability of registers in FPGAs to implement an efficient and compact module to process the convolutions. The architecture is designed to minimize the number of accesses to the image memory and is based on parallel modules with internal pipeline operation in order to improve its performance. The architecture is prototyped in a FPGA, but it can be implemented on a dedicated VLSI to reach higher clock frequencies. Complexity issues, FPGA resources utilization, FPGA limitations, and real time performance are discussed. Some results are presented and discussed.
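
    The operation this architecture accelerates, the convolution of a mask with an image, can be written out directly. The following is a serial reference sketch only; it does not model the register-based window or the pipelined parallel modules of the FPGA design, and the function name and the choice of "valid" output size (no padding) are assumptions.

```python
def convolve2d(image, mask):
    """'Valid' 2D convolution: flip the mask, then slide it over the image.

    The output shrinks by (mask height - 1) rows and (mask width - 1) columns
    because no padding is applied at the borders.
    """
    image_h, image_w = len(image), len(image[0])
    mask_h, mask_w = len(mask), len(mask[0])
    # Convolution (as opposed to correlation) flips the mask in both axes.
    flipped = [row[::-1] for row in mask[::-1]]
    out = []
    for r in range(image_h - mask_h + 1):
        out_row = []
        for c in range(image_w - mask_w + 1):
            acc = 0
            for i in range(mask_h):
                for j in range(mask_w):
                    acc += image[r + i][c + j] * flipped[i][j]
            out_row.append(acc)
        out.append(out_row)
    return out
```

    In this naive form every output pixel re-reads its whole window from the image; keeping the window in registers, as the abstract describes, is one way to cut those repeated memory accesses.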

  4. Options for Parallelizing a Planning and Scheduling Algorithm

    Science.gov (United States)

    Clement, Bradley J.; Estlin, Tara A.; Bornstein, Benjamin D.

    2011-01-01

    Space missions have a growing interest in putting multi-core processors onboard spacecraft. For many missions, limited processing power significantly slows operations. We investigate how continual planning and scheduling algorithms can exploit multi-core processing and outline different potential design decisions for a parallelized planning architecture. This organization of choices and challenges helps us with an initial design for parallelizing the CASPER planning system for a mesh multi-core processor. This work extends that presented at another workshop with some preliminary results.

  5. Badlands: A parallel basin and landscape dynamics model

    Directory of Open Access Journals (Sweden)

    T. Salles

    2016-01-01

    Full Text Available Over more than three decades, a number of numerical landscape evolution models (LEMs) have been developed to study the combined effects of climate, sea-level, tectonics and sediments on Earth surface dynamics. Most of them are written in efficient programming languages, but often cannot be used on parallel architectures. Here, I present a LEM which ports a common core of accepted physical principles governing landscape evolution into a distributed memory parallel environment. Badlands (acronym for BAsin anD LANdscape DynamicS) is an open-source, flexible, TIN-based landscape evolution model, built to simulate topography development at various space and time scales.

  6. Heterogeneous Multicore Parallel Programming for Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Francois Bodin

    2009-01-01

    Full Text Available Hybrid parallel multicore architectures based on graphics processing units (GPUs) can provide tremendous computing power. Current NVIDIA and AMD Graphics Product Group hardware displays a peak performance of hundreds of gigaflops. However, exploiting GPUs from existing applications is a difficult task that requires non-portable rewriting of the code. In this paper, we present HMPP, a Heterogeneous Multicore Parallel Programming workbench with compilers, developed by CAPS entreprise, that allows the integration of heterogeneous hardware accelerators in an unintrusive manner while preserving the legacy code.

  7. A Parallel Algebraic Multigrid Solver on Graphics Processing Units

    KAUST Repository

    Haase, Gundolf

    2010-01-01

    The paper presents a multi-GPU implementation of the preconditioned conjugate gradient algorithm with an algebraic multigrid preconditioner (PCG-AMG) for an elliptic model problem on a 3D unstructured grid. An efficient parallel sparse matrix-vector multiplication scheme underlying the PCG-AMG algorithm is presented for the many-core GPU architecture. A performance comparison of the parallel solver shows that a single Nvidia Tesla C1060 GPU board delivers the performance of a sixteen-node Infiniband cluster and a multi-GPU configuration with eight GPUs is about 100 times faster than a typical server CPU core. © 2010 Springer-Verlag.
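
    The structure of a preconditioned conjugate gradient iteration can be sketched on a small dense system. Note the substitutions: this sketch uses a simple Jacobi (diagonal) preconditioner as a stand-in for algebraic multigrid, dense Python lists instead of sparse GPU storage, and it runs serially. The function name and the `tol` and `max_iter` parameters are assumptions.

```python
def pcg(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for a symmetric positive definite matrix A.

    Jacobi preconditioning is used here purely for illustration; the paper
    uses an algebraic multigrid preconditioner instead.
    """
    n = len(b)

    def matvec(v):
        # The matrix-vector product is the dominant cost of each iteration.
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    def precondition(r):
        return [r[i] / A[i][i] for i in range(n)]

    x = [0.0] * n
    r = list(b)                 # residual b - A x for the zero initial guess
    z = precondition(r)
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = precondition(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

    Apart from the preconditioner, each iteration consists of one matrix-vector product and a few vector updates and dot products, which is why the abstract singles out the sparse matrix-vector multiplication scheme.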

  8. Architecture and Stages

    DEFF Research Database (Denmark)

    Kiib, Hans

    2009-01-01

    as "experiencescape" - a space between tourism, culture, learning and economy. Strategies related to these challenges involve new architectural concepts and art as ‘engines’ for a change. New expressive architecture and old industrial buildings are often combined into hybrid narratives, linking the past with the future. But this is not enough. The agenda is to develop architectural spaces, where social interaction and learning are enhanced by art and fun. How can we develop new architectural designs in our inner cities and waterfronts where eventscapes, learning labs and temporal use are merged with everyday...

  9. Grid Architecture 2

    Energy Technology Data Exchange (ETDEWEB)

    Taft, Jeffrey D. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2016-01-01

    The report describes work done on Grid Architecture under the auspices of the Department of Energy Office of Electricity Delivery and Energy Reliability in 2015. As described in the first Grid Architecture report, the primary purpose of this work is to provide stakeholder insight about grid issues so as to enable superior decision making on their part. Doing this requires the creation of various work products, including oft-times complex diagrams, analyses, and explanations. This report provides architectural insights into several important grid topics and also describes work done to advance the science of Grid Architecture as well.

  10. Towards a Media Architecture

    DEFF Research Database (Denmark)

    Ebsen, Tobias

    2010-01-01

    This text explores the concept of media architecture as a phenomenon of visual culture that describes the use of screen-technology in new spatial configurations in practices of architecture and art. I shall argue that this phenomenon is not necessarily a revolutionary new approach, but rather a result of conceptual changes in both modes of visual representation and in expressions of architecture. These are changes that may be described as an evolution of ideas and consequent experiments that can be traced back to changes in the history of art and the various styles and ideologies of architecture....

  11. Decentralized Software Architecture

    National Research Council Canada - National Science Library

    Khare, Rohit

    2002-01-01

    .... While the term "decentralization" is familiar from political and economic contexts, it has been applied extensively, if indiscriminately, to describe recent trends in software architecture towards...

  12. Resistor Combinations for Parallel Circuits.

    Science.gov (United States)

    McTernan, James P.

    1978-01-01

    To help simplify both teaching and learning of parallel circuits, a high school electricity/electronics teacher presents and illustrates the use of tables of values for parallel resistive circuits in which total resistances are whole numbers. (MF)
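
    A table of the kind this record describes can be generated programmatically. The sketch below is an assumption about how such a table might be built, not the article's own method or values: it enumerates pairs of resistors from a candidate list and keeps those whose parallel combination, 1/R = 1/R1 + 1/R2, is a whole number. The function name and the input list are hypothetical.

```python
from fractions import Fraction


def whole_number_pairs(values):
    """Return (r1, r2, total) for pairs whose parallel resistance is an integer.

    Exact rational arithmetic avoids floating-point rounding when testing
    whether the total is a whole number.
    """
    rows = []
    for i, r1 in enumerate(values):
        for r2 in values[i:]:
            total = 1 / (Fraction(1, r1) + Fraction(1, r2))
            if total.denominator == 1:
                rows.append((r1, r2, int(total)))
    return rows
```

    For example, calling `whole_number_pairs([3, 4, 6, 12])` keeps the pair 3 and 6 (total 2) and the pair 4 and 12 (total 3), and rejects the pair 3 and 4, whose total is 12/7.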

  13. SOFTWARE FOR DESIGNING PARALLEL APPLICATIONS

    Directory of Open Access Journals (Sweden)

    M. K. Bouza

    2017-01-01

    Full Text Available The object of research is the tools that support the development of parallel programs in C/C++. Methods and software that automate the process of designing parallel applications are proposed.

  14. The architecture of a video image processor for the space station

    Science.gov (United States)

    Yalamanchili, S.; Lee, D.; Fritze, K.; Carpenter, T.; Hoyme, K.; Murray, N.

    1987-01-01

    The architecture of a video image processor for space station applications is described. The architecture was derived from a study of the requirements of algorithms that are necessary to produce the desired functionality of many of these applications. Architectural options were selected based on a simulation of the execution of these algorithms on various architectural organizations. A great deal of emphasis was placed on the ability of the system to evolve and grow over the lifetime of the space station. The result is a hierarchical parallel architecture that is characterized by high level language programmability, modularity, extensibility and can meet the required performance goals.

  15. Parallel External Memory Graph Algorithms

    DEFF Research Database (Denmark)

    Arge, Lars Allan; Goodrich, Michael T.; Sitchinava, Nodari

    2010-01-01

    In this paper, we study parallel I/O efficient graph algorithms in the Parallel External Memory (PEM) model, one of the private-cache chip multiprocessor (CMP) models. We study the fundamental problem of list ranking which leads to efficient solutions to problems on trees, such as computing lowest...... an optimal speedup of Θ(P) in parallel I/O complexity and parallel computation time, compared to the single-processor external memory counterparts....

  16. An Automatic Instruction-Level Parallelization of Machine Code

    Directory of Open Access Journals (Sweden)

    MARINKOVIC, V.

    2018-02-01

    Full Text Available Prevailing multicores and novel manycores have made the parallelization of embedded software, which is still written as sequential code, a great challenge of the modern day. In this paper, automatic code parallelization is considered, focusing on developing a parallelization tool at the binary level as well as on the validation of this approach. A novel instruction-level parallelization algorithm for assembly code is developed; it uses the register names after SSA to find independent blocks of code and then schedules the independent blocks using METIS to achieve good load balance. The sequential consistency is verified and the validation is done by measuring the program execution time on the target architecture. Great speedup, taken as the performance measure in the validation process, and optimal load balancing are achieved for multicore RISC processors with 2 to 16 cores (e.g. MIPS, MicroBlaze, etc.). In particular, for 16 cores, the average speedup is 7.92x, while in some cases it reaches 14x. The approach to automatic parallelization provided by this paper is useful to researchers and developers in the area of parallelization as the basis for further optimizations, as the back-end of a compiler, or as the code parallelization tool for an embedded system.

  17. Performance study of a cluster calculation; parallelization and application under geant4

    International Nuclear Information System (INIS)

    Trabelsi, Abir

    2007-01-01

    This work is the final-year project for a computer science engineering degree, carried out at the National Center of Nuclear Sciences and Technology. The project consists of studying the performance of a set of machines in order to determine the best architecture for assembling them into a cluster, as well as studying parallelism and the parallel implementation of GEANT4 as a simulation tool. The project comprises: 1) programming in C++ and executing the two benchmarks PMV and PMM on each station; 2) interpreting the results in order to identify the best architecture for the cluster; 3) parallelizing the two benchmarks with TOP-C; 4) executing the two TOP-C versions on the cluster; 5) generalizing these results; 6) parallelizing GEANT4 and executing the parallel version. (Author). 14 refs

  18. Parallel inter channel interaction mechanisms

    International Nuclear Information System (INIS)

    Jovic, V.; Afgan, N.; Jovic, L.

    1995-01-01

    Interactions between parallel channels are examined. For experimental investigations of nonstationary flow regimes in three parallel vertical channels, results of the phenomenon analysis and the mechanisms of parallel channel interaction are shown for adiabatic conditions of single-phase fluid flow and two-phase mixture flow. (author)

  19. 21 tips for fundraising (21 conseils pour la collecte de fonds)

    International Development Research Centre (IDRC) Digital Library (Canada)

    Visit. Since they will not bother to reply to letters addressed to them (or to fax messages, or to telephone calls), you must go to see them. It will take several fax messages or telephone calls to obtain an appointment. However, if they know that ...

  20. Parallel processing approach to transform-based image coding

    Science.gov (United States)

    Normile, James O.; Wright, Dan; Chu, Ken; Yeh, Chia L.

    1991-06-01

    This paper describes a flexible parallel processing architecture designed for use in real time video processing. The system consists of floating point DSP processors connected to each other via fast serial links; each processor has access to a globally shared memory. A multiple bus architecture in combination with a dual ported memory allows communication with a host control processor. The system has been applied to prototyping of video compression and decompression algorithms. The decomposition of transform based algorithms for decompression into a form suitable for parallel processing is described. A technique for automatic load balancing among the processors is developed and discussed, and results are presented with image statistics and data rates. Finally, techniques for accelerating the system throughput are analyzed and results from the application of one such modification are described.