WorldWideScience

Sample records for parallel processing

  1. Parallelism and array processing

    International Nuclear Information System (INIS)

    Zacharov, V.

    1983-01-01

    Modern computing, as well as the historical development of computing, has been dominated by sequential monoprocessing. Yet there is the alternative of parallelism, where several processes may be in concurrent execution. This alternative is discussed in a series of lectures, in which the main developments involving parallelism are considered, both from the standpoint of computing systems and that of applications that can exploit such systems. The lectures seek to discuss parallelism in a historical context, and to identify all the main aspects of concurrency in computation right up to the present time. Included will be consideration of the important question as to what use parallelism might be in the field of data processing. (orig.)

  2. Parallel Framework for Cooperative Processes

    Directory of Open Access Journals (Sweden)

    Mitică Craus

    2005-01-01

    This paper describes an object-oriented framework designed for parallelizing a set of related algorithms. The idea behind the system is to have a reusable framework for running several sequential algorithms in a parallel environment. The algorithms the framework can be used with have several things in common: they must run in cycles, and it must be possible to split the work between several "processing units". The parallel framework uses the message-passing communication paradigm and is organized as a master-slave system. Two applications are presented: an Ant Colony Optimization (ACO) parallel algorithm for the Travelling Salesman Problem (TSP) and an Image Processing (IP) parallel algorithm for the Symmetrical Neighborhood Filter (SNF). The implementations of these applications by means of the parallel framework show good performance: approximately linear speedup and low communication cost.
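    The master-slave, message-passing, cycle-based organization described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' framework: the names `run_cycles`, `worker`, `split_chunks`, `double_chunk` and `concat` are hypothetical, and threads with queues stand in for the message-passing layer purely to show the communication pattern, not to obtain real speedup.

```python
import queue
import threading


def worker(inbox, outbox, work_fn):
    """Processing unit: handle task messages until a None sentinel arrives."""
    while True:
        msg = inbox.get()
        if msg is None:
            break
        task_id, payload = msg
        outbox.put((task_id, work_fn(payload)))


def run_cycles(work_fn, split_fn, merge_fn, state, n_workers, n_cycles):
    """Master: in each cycle, split the state, send the parts, merge the replies."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    outbox = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(q, outbox, work_fn))
        for q in inboxes
    ]
    for t in threads:
        t.start()
    for _ in range(n_cycles):
        parts = split_fn(state, n_workers)      # one part per worker
        for i, part in enumerate(parts):
            inboxes[i].put((i, part))           # "send" a task message
        results = [None] * len(parts)
        for _ in parts:
            task_id, value = outbox.get()       # "receive" a result message
            results[task_id] = value
        state = merge_fn(results)
    for q in inboxes:
        q.put(None)                             # tell every worker to stop
    for t in threads:
        t.join()
    return state


def split_chunks(items, n):
    """Split a list into n contiguous chunks (some may be empty)."""
    size = -(-len(items) // n)                  # ceiling division
    return [items[i * size:(i + 1) * size] for i in range(n)]


def double_chunk(chunk):
    """Toy per-cycle work: double every element of the chunk."""
    return [2 * x for x in chunk]


def concat(results):
    """Merge step: concatenate the chunks in task order."""
    return [x for chunk in results for x in chunk]


final_state = run_cycles(double_chunk, split_chunks, concat, [1, 2, 3, 4], 2, 3)
```

    In this toy run the state is doubled once per cycle, so three cycles turn `[1, 2, 3, 4]` into `[8, 16, 24, 32]`. A real deployment would replace the queues with an actual message-passing library.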

  3. Linear parallel processing machines I

    Energy Technology Data Exchange (ETDEWEB)

    Von Kunze, M

    1984-01-01

    As is well-known, non-context-free grammars for generating formal languages happen to be of a certain intrinsic computational power that presents serious difficulties to efficient parsing algorithms as well as for the development of an algebraic theory of context-sensitive languages. In this paper a framework is given for the investigation of the computational power of formal grammars, in order to start a thorough analysis of grammars consisting of derivation rules of the form $aB \rightarrow A_1 \ldots A_n\, b_1 \ldots b_m$. These grammars may be thought of as automata by means of parallel processing, if one considers the variables as operators acting on the terminals while reading them right-to-left. This kind of automata and their 2-dimensional programming language prove to be useful by allowing a concise linear-time algorithm for integer multiplication. Linear parallel processing machines (LP-machines) which are, in their general form, equivalent to Turing machines, include finite automata and pushdown automata (with states encoded) as special cases. Bounded LP-machines yield deterministic accepting automata for nondeterministic context-free languages, and they define an interesting class of context-sensitive languages. A characterization of this class in terms of generating grammars is established by using derivation trees with crossings as a helpful tool. From the algebraic point of view, deterministic LP-machines are effectively represented semigroups with distinguished subsets. Concerning the dualism between generating and accepting devices of formal languages within the algebraic setting, the concept of accepting automata turns out to reduce essentially to embeddability in an effectively represented extension monoid, even in the classical cases.

  4. Parallel processing for artificial intelligence 1

    CERN Document Server

    Kanal, LN; Kumar, V; Suttner, CB

    1994-01-01

    Parallel processing for AI problems is of great current interest because of its potential for alleviating the computational demands of AI procedures. The articles in this book consider parallel processing for problems in several areas of artificial intelligence: image processing, knowledge representation in semantic networks, production rules, mechanization of logic, constraint satisfaction, parsing of natural language, data filtering and data mining. The publication is divided into six sections. The first addresses parallel computing for processing and understanding images. The second discus

  5. Parallel processing for fluid dynamics applications

    International Nuclear Information System (INIS)

    Johnson, G.M.

    1989-01-01

    The impact of parallel processing on computational science and, in particular, on computational fluid dynamics is growing rapidly. In this paper, particular emphasis is given to developments which have occurred within the past two years. Parallel processing is defined and the reasons for its importance in high-performance computing are reviewed. Parallel computer architectures are classified according to the number and power of their processing units, their memory, and the nature of their connection scheme. Architectures which show promise for fluid dynamics applications are emphasized. Fluid dynamics problems are examined for parallelism inherent at the physical level. CFD algorithms and their mappings onto parallel architectures are discussed. Several examples are presented to document the performance of fluid dynamics applications on present-generation parallel processing devices.

  6. Parallel processing of genomics data

    Science.gov (United States)

    Agapito, Giuseppe; Guzzi, Pietro Hiram; Cannataro, Mario

    2016-10-01

    The availability of high-throughput experimental platforms for the analysis of biological samples, such as mass spectrometry, microarrays and Next Generation Sequencing, has made it possible to analyze a whole genome in a single experiment. Such platforms produce an enormous volume of data per experiment, so the analysis of this flow of data poses several challenges in terms of data storage, preprocessing, and analysis. To face these issues, efficient, possibly parallel, bioinformatics software needs to be used to preprocess and analyze data, for instance to highlight genetic variation associated with complex diseases. In this paper we present a parallel algorithm for the preprocessing and statistical analysis of genomics data, able to handle high-dimensional data while achieving good response time. The proposed system is able to find statistically significant biological markers that discriminate classes of patients who respond to drugs in different ways. Experiments performed on real and synthetic genomic datasets show good speed-up and scalability.

  7. Advanced parallel processing with supercomputer architectures

    International Nuclear Information System (INIS)

    Hwang, K.

    1987-01-01

    This paper investigates advanced parallel processing techniques and innovative hardware/software architectures that can be applied to boost the performance of supercomputers. Critical issues on architectural choices, parallel languages, compiling techniques, resource management, concurrency control, programming environment, parallel algorithms, and performance enhancement methods are examined and the best answers are presented. The authors cover advanced processing techniques suitable for supercomputers, high-end mainframes, minisupers, and array processors. The coverage emphasizes vectorization, multitasking, multiprocessing, and distributed computing. In order to achieve these operation modes, parallel languages, smart compilers, synchronization mechanisms, load balancing methods, mapping parallel algorithms, operating system functions, application library, and multidiscipline interactions are investigated to ensure high performance. At the end, they assess the potentials of optical and neural technologies for developing future supercomputers

  8. Parallel processing of structural integrity analysis codes

    International Nuclear Information System (INIS)

    Swami Prasad, P.; Dutta, B.K.; Kushwaha, H.S.

    1996-01-01

    Structural integrity analysis plays an important role in assessing and demonstrating the safety of nuclear reactor components. This analysis is performed using analytical tools such as the Finite Element Method (FEM) with the help of digital computers. The complexity of the problems involved in nuclear engineering demands high speed computation facilities to obtain solutions in a reasonable amount of time. Parallel processing systems such as ANUPAM provide an efficient platform for realising the high speed computation. The development and implementation of software on parallel processing systems is an interesting and challenging task. The data and algorithm structure of the codes plays an important role in exploiting the parallel processing system capabilities. Structural analysis codes based on FEM can be divided into two categories with respect to their implementation on parallel processing systems. Codes in the first category, such as those used for harmonic analysis and mechanistic fuel performance codes, do not require the parallelisation of individual modules of the codes. Codes in the second category, such as conventional FEM codes, require parallelisation of individual modules. In this category, parallelisation of the equation solution module poses major difficulties. Different solution schemes such as the domain decomposition method (DDM), parallel active column solver and substructuring method are currently used on parallel processing systems. Two codes, FAIR and TABS, belonging to these categories have been implemented on ANUPAM. The implementation details of these codes and the performance of different equation solvers are highlighted. (author). 5 refs., 12 figs., 1 tab

  9. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2014-08-12

    Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  10. Applications of Parallel Processing in Mobile Banking

    Directory of Open Access Journals (Sweden)

    2007-01-01

    The future of mobile banking will be represented by applications that support mobile, Internet banking and EFT (Electronic Funds Transfer) transactions in a single user interface. In this way, mobile banking will be able to cover all the types of applications demanded by the market. The parallel processing of credit card bank transactions could be performed with the help of a grid network. Apart from some limitations, grid processing offers huge opportunities to exploit parallelism. For this reason, many applications of waiting queues in grid processing have been developed in recent years. Grid networks represent a distinctive and very modern field of parallel and distributed processing.

  11. Evidence of Parallel Processing During Translation

    DEFF Research Database (Denmark)

    Balling, Laura Winther; Hvelplund, Kristian Tangsgaard; Sjørup, Annette Camilla

    2014-01-01

    conclude that translation is a parallel process and that literal translation is likely to be a universal initial default strategy in translation. This conclusion is strengthened by the fact that all three experiments were relatively naturalistic, due to the combination of remote eye tracking and mixed...

  12. Researching the Parallel Process in Supervision and Psychotherapy

    DEFF Research Database (Denmark)

    Jacobsen, Claus Haugaard

    Reflects upon how to do process research in supervision and in the parallel process. A single case study is presented illustrating how a study on parallel process can be carried out.

  13. Parallel processing for artificial intelligence 2

    CERN Document Server

    Kumar, V; Suttner, CB

    1994-01-01

    With the increasing availability of parallel machines and rising interest in large-scale and real-world applications, research on parallel processing for Artificial Intelligence (AI) is gaining greater importance in the computer science environment. Many applications have been implemented and delivered but the field is still considered to be in its infancy. This book assembles diverse aspects of research in the area, providing an overview of the current state of technology. It also aims to promote further growth across the discipline. Contributions have been grouped according to their

  14. Aspects of parallel processing and control engineering

    OpenAIRE

    McKittrick, Brendan J

    1991-01-01

    The concept of parallel processing is not a new one, but the application of it to control engineering tasks is a relatively recent development, made possible by contemporary hardware and software innovation. It has long been accepted that, if properly orchestrated several processors/CPUs when combined can form a powerful processing entity. What prevented this from being implemented in commercial systems was the adequacy of the microprocessor for most tasks and hence the expense of a multi-pro...

  15. A multitransputer parallel processing system (MTPPS)

    International Nuclear Information System (INIS)

    Jethra, A.K.; Pande, S.S.; Borkar, S.P.; Khare, A.N.; Ghodgaonkar, M.D.; Bairi, B.R.

    1993-01-01

    This report describes the design and implementation of a 16-node Multi Transputer Parallel Processing System (MTPPS), which is a platform for parallel program development. It is a MIMD machine based on the message-passing paradigm. The basic compute engine is an Inmos Transputer IMS T800-20. A transputer with local memory constitutes the processing element (NODE) of this MIMD architecture. Multiple NODES can be connected to each other in an identifiable network topology through the high speed serial links of the transputer. A Network Configuration Unit (NCU) incorporates the necessary hardware to provide software controlled network configuration. The system is modularly expandable, and more NODES can be added to achieve the required processing power. The system is a back end to an IBM PC, which has been integrated into the system to provide the user I/O interface. PC resources are available to the programmer. The interface hardware between the PC and the network of transputers is INMOS compatible. Therefore, all the commercially available development software compatible with INMOS products can run on this system. While giving the details of design and implementation, this report briefly summarises MIMD architectures, transputer architecture and parallel processing software development issues. LINPACK performance evaluation of the system and solutions of neutron physics and plasma physics problems are discussed along with results. (author). 12 refs., 22 figs., 3 tabs., 3 appendixes

  16. Parallel asynchronous systems and image processing algorithms

    Science.gov (United States)

    Coon, D. D.; Perera, A. G. U.

    1989-01-01

    A new hardware approach to implementation of image processing algorithms is described. The approach is based on silicon devices which would permit an independent analog processing channel to be dedicated to every pixel. A laminar architecture consisting of a stack of planar arrays of the device would form a two-dimensional array processor with a 2-D array of inputs located directly behind a focal plane detector array. A 2-D image data stream would propagate in neuronlike asynchronous pulse coded form through the laminar processor. Such systems would integrate image acquisition and image processing. Acquisition and processing would be performed concurrently as in natural vision systems. The research is aimed at implementation of algorithms, such as the intensity dependent summation algorithm and pyramid processing structures, which are motivated by the operation of natural vision systems. Implementation of natural vision algorithms would benefit from the use of neuronlike information coding and the laminar, 2-D parallel, vision system type architecture. Besides providing a neural network framework for implementation of natural vision algorithms, a 2-D parallel approach could eliminate the serial bottleneck of conventional processing systems. Conversion to serial format would occur only after raw intensity data has been substantially processed. An interesting challenge arises from the fact that the mathematical formulation of natural vision algorithms does not specify the means of implementation, so that hardware implementation poses intriguing questions involving vision science.

  17. Oxytocin: parallel processing in the social brain?

    Science.gov (United States)

    Dölen, Gül

    2015-06-01

    Early studies attempting to disentangle the network complexity of the brain exploited the accessibility of sensory receptive fields to reveal circuits made up of synapses connected both in series and in parallel. More recently, extension of this organisational principle beyond the sensory systems has been made possible by the advent of modern molecular, viral and optogenetic approaches. Here, evidence supporting parallel processing of social behaviours mediated by oxytocin is reviewed. Understanding oxytocinergic signalling from this perspective has significant implications for the design of oxytocin-based therapeutic interventions aimed at disorders such as autism, where disrupted social function is a core clinical feature. Moreover, identification of opportunities for novel technology development will require a better appreciation of the complexity of the circuit-level organisation of the social brain. © 2015 The Authors. Journal of Neuroendocrinology published by John Wiley & Sons Ltd on behalf of British Society for Neuroendocrinology.

  18. Fast image processing on parallel hardware

    International Nuclear Information System (INIS)

    Bittner, U.

    1988-01-01

    Current digital imaging modalities in the medical field incorporate parallel hardware which is heavily used in the image formation stage, such as CT/MR image reconstruction or DSA real-time subtraction. In order to make image post-processing as efficient as image acquisition, new software approaches have to be found which take full advantage of the parallel hardware architecture. This paper describes the implementation of a two-dimensional median filter, which can serve as an example for the development of such an algorithm. The algorithm is analyzed by viewing it as a complete parallel sort of the k pixel values in the chosen window, which leads to a generalization to rank order operators and other closely related filters reported in the literature. A section on the theoretical basis of the algorithm gives hints on how to characterize operations suitable for implementation on pipeline processors and how to find the appropriate algorithms. Finally, some results on computation time and the usefulness of median filtering in radiographic imaging are given.
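    The view of the median filter as a sort of the k pixel values in a window, and its generalization to rank order operators, can be sketched in plain Python. This is a sequential illustration under assumptions of our own (clamped borders, a square window, the hypothetical names `rank_filter_2d` and `median_filter_2d`), not the pipeline-processor implementation the paper describes.

```python
def rank_filter_2d(image, rank, radius=1):
    """Replace each pixel by the value of the given rank in its sorted window.

    The window is (2*radius + 1) squared pixels; coordinates outside the
    image are clamped to the nearest border pixel (an assumption made here).
    rank 0 is the minimum, rank k - 1 the maximum, rank k // 2 the median.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                image[min(max(y + dy, 0), height - 1)]
                     [min(max(x + dx, 0), width - 1)]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            ]
            window.sort()               # the "complete sort" of the k values
            out[y][x] = window[rank]
    return out


def median_filter_2d(image, radius=1):
    """Median filter as the special case rank = k // 2."""
    k = (2 * radius + 1) ** 2
    return rank_filter_2d(image, k // 2, radius)


# A single bright outlier in a flat 3x3 image is removed by the median.
noisy = [[1, 1, 1],
         [1, 9, 1],
         [1, 1, 1]]
smoothed = median_filter_2d(noisy)
```

    Each output pixel depends only on its own window, so the per-pixel sorts are independent of one another, which is what makes the operation a natural fit for parallel hardware.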

  19. Partitioning sparse rectangular matrices for parallel processing

    Energy Technology Data Exchange (ETDEWEB)

    Kolda, T.G.

    1998-05-01

    The authors are interested in partitioning sparse rectangular matrices for parallel processing. The partitioning problem has been well-studied in the square symmetric case, but the rectangular problem has received very little attention. They will formalize the rectangular matrix partitioning problem and discuss several methods for solving it. They will extend the spectral partitioning method for symmetric matrices to the rectangular case and compare this method to three new methods -- the alternating partitioning method and two hybrid methods. The hybrid methods will be shown to be best.

  20. A qualitative single case study of parallel processes

    DEFF Research Database (Denmark)

    Jacobsen, Claus Haugaard

    2007-01-01

    Parallel process in psychotherapy and supervision is a phenomenon manifest in relationships and interactions, that originates in one setting and is reflected in another. This article presents an explorative single case study of parallel processes based on qualitative analyses of two successive...... randomly chosen psychotherapy sessions with a schizophrenic patient and the supervision session given in between. The author's analysis is verified by an independent examiner's analysis. Parallel processes are identified and described. Reflections on the dynamics of parallel processes and supervisory...

  1. Parallelism and Scalability in an Image Processing Application

    DEFF Research Database (Denmark)

    Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

    2008-01-01

    The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chip. This means that parallel processing is required in application areas that traditionally have not used parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately...

  2. Parallelism and Scalability in an Image Processing Application

    DEFF Research Database (Denmark)

    Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

    2009-01-01

    The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chips. This means that parallel processing is required in application areas that traditionally have not used parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately...

  3. Parallel processing from applications to systems

    CERN Document Server

    Moldovan, Dan I

    1993-01-01

    This text provides one of the broadest presentations of parallel processing available, including the structure of parallel processors and parallel algorithms. The emphasis is on mapping algorithms to highly parallel computers, with extensive coverage of array and multiprocessor architectures. Early chapters provide insightful coverage on the analysis of parallel algorithms and program transformations, effectively integrating a variety of material previously scattered throughout the literature. Theory and practice are well balanced across diverse topics in this concise presentation. For exceptional cla

  4. The study of image processing of parallel digital signal processor

    International Nuclear Information System (INIS)

    Liu Jie

    2000-01-01

    The author analyzes the basic characteristics of the parallel DSP (digital signal processor) TMS320C80 and proposes a related optimized image algorithm and a parallel processing method based on the parallel DSP. Real-time performance for many image processing tasks can be achieved in this way.

  5. An intelligent allocation algorithm for parallel processing

    Science.gov (United States)

    Carroll, Chester C.; Homaifar, Abdollah; Ananthram, Kishan G.

    1988-01-01

    The problem of allocating nodes of a program graph to processors in a parallel processing architecture is considered. The algorithm is based on critical path analysis, some allocation heuristics, and the execution granularity of nodes in a program graph. These factors, and the structure of the interprocessor communication network, influence the allocation. To achieve realistic estimations of the execution durations of allocations, the algorithm considers the fact that nodes in a program graph have to communicate through varying numbers of tokens. Coarse and fine granularities have been implemented, with interprocessor token-communication duration varying from zero up to values comparable to the execution durations of individual nodes. The effect on allocation of communication network structures is demonstrated by performing allocations for crossbar (non-blocking) and star (blocking) networks. The algorithm assumes the availability of as many processors as it needs for the optimal allocation of any program graph. Hence, the focus of allocation has been on varying token-communication durations rather than varying the number of processors. The algorithm always utilizes as many processors as necessary for the optimal allocation of any program graph, depending upon granularity and characteristics of the interprocessor communication network.
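    Critical path analysis, one ingredient named in this abstract, can be sketched as a longest-path computation over a program graph. The sketch below is not the authors' allocation algorithm: the function name `critical_path` and the single uniform `comm` delay are assumptions, and the delay is charged on every edge, as if every pair of dependent nodes ran on different processors.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(durations, edges, comm=0):
    """Return (length, path) of the longest path through a program graph.

    durations: {node: execution time}
    edges:     list of (src, dst) dependencies
    comm:      communication delay charged on every edge (an assumption)
    """
    preds = {node: [] for node in durations}
    for src, dst in edges:
        preds[dst].append(src)

    finish, via = {}, {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        start, best = 0, None
        for p in preds[node]:
            ready = finish[p] + comm        # when p's result arrives here
            if best is None or ready > start:
                start, best = ready, p
        finish[node] = start + durations[node]
        via[node] = best                    # predecessor that determines start

    end = max(finish, key=finish.get)
    path = [end]
    while via[path[-1]] is not None:        # walk back along the critical path
        path.append(via[path[-1]])
    return finish[end], path[::-1]


# Diamond-shaped program graph: a feeds b and c, which both feed d.
durations = {"a": 2, "b": 3, "c": 1, "d": 4}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
no_comm = critical_path(durations, edges, comm=0)
with_comm = critical_path(durations, edges, comm=1)
```

    With zero communication delay the critical path a, b, d has length 9; with a delay of 1 per edge it grows to 11, which illustrates why the token-communication duration matters for the estimated execution time of an allocation.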

  6. Density functional theory and parallel processing

    International Nuclear Information System (INIS)

    Ward, R.C.; Geist, G.A.; Butler, W.H.

    1987-01-01

    The authors demonstrate a method for obtaining the ground state energies and charge densities of a system of atoms described within density functional theory using simulated annealing on a parallel computer

  7. Bessel functions: parallel display and processing.

    Science.gov (United States)

    Lohmann, A W; Ojeda-Castañeda, J; Serrano-Heredia, A

    1994-01-01

    We present an optical setup that converts planar binary curves into two-dimensional amplitude distributions, which are proportional, along one axis, to the Bessel function of order n, whereas along the other axis the order n increases. This Bessel displayer can be used for parallel Bessel transformation of a signal. Experimental verifications are included.

  8. Neural Parallel Engine: A toolbox for massively parallel neural signal processing.

    Science.gov (United States)

    Tam, Wing-Kin; Yang, Zhi

    2018-05-01

    Large-scale neural recordings provide detailed information on neuronal activities and can help elicit the underlying neural mechanisms of the brain. However, the computational burden is also formidable when we try to process the huge data stream generated by such recordings. In this study, we report the development of Neural Parallel Engine (NPE), a toolbox for massively parallel neural signal processing on graphical processing units (GPUs). It offers a selection of the most commonly used routines in neural signal processing such as spike detection and spike sorting, including advanced algorithms such as exponential-component-power-component (EC-PC) spike detection and binary pursuit spike sorting. We also propose a new method for detecting peaks in parallel through a parallel compact operation. Our toolbox is able to offer a 5× to 110× speedup compared with its CPU counterparts depending on the algorithms. A user-friendly MATLAB interface is provided to allow easy integration of the toolbox into existing workflows. Previous efforts on GPU neural signal processing only focus on a few rudimentary algorithms, are not well-optimized and often do not provide a user-friendly programming interface to fit into existing workflows. There is a strong need for a comprehensive toolbox for massively parallel neural signal processing. A new toolbox for massively parallel neural signal processing has been created. It can offer significant speedup in processing signals from large-scale recordings up to thousands of channels. Copyright © 2018 Elsevier B.V. All rights reserved.
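    The abstract mentions detecting peaks in parallel through a parallel compact operation but gives no details. One common way to structure such a computation is flag, prefix-sum, scatter (stream compaction); the Python sketch below emulates those data-parallel steps sequentially. It is an assumption about the general technique, not the NPE implementation, and the name `detect_peaks_compact` and the peak criterion are hypothetical.

```python
from itertools import accumulate


def detect_peaks_compact(signal, threshold):
    """Return the indices of local maxima above threshold, in order.

    Each step mirrors a data-parallel primitive:
      1. map:     every sample independently flags itself as peak / not peak
      2. scan:    an exclusive prefix sum assigns each peak an output slot
      3. scatter: flagged samples write their index into that slot
    """
    n = len(signal)
    # Step 1 (map): a peak exceeds the threshold, is strictly above its left
    # neighbour and not below its right neighbour (criterion assumed here).
    flags = [
        1 if (0 < i < n - 1
              and signal[i] > threshold
              and signal[i] > signal[i - 1]
              and signal[i] >= signal[i + 1])
        else 0
        for i in range(n)
    ]
    # Step 2 (scan): exclusive prefix sum of the flags.
    slots = [0] + list(accumulate(flags))[:-1]
    # Step 3 (scatter): compact the flagged indices into a dense array.
    peaks = [None] * sum(flags)
    for i in range(n):
        if flags[i]:
            peaks[slots[i]] = i
    return peaks


samples = [0, 1, 5, 1, 0, 7, 7, 2, 0, 3]
peak_indices = detect_peaks_compact(samples, threshold=2)
```

    On a GPU, steps 1 and 3 would run with one thread per sample and step 2 would use a parallel scan; here the loops only make the data flow explicit.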

  9. Parallel-Processing Test Bed For Simulation Software

    Science.gov (United States)

    Blech, Richard; Cole, Gary; Townsend, Scott

    1996-01-01

    Second-generation Hypercluster computing system is multiprocessor test bed for research on parallel algorithms for simulation in fluid dynamics, electromagnetics, chemistry, and other fields with large computational requirements but relatively low input/output requirements. Built from standard, off-shelf hardware readily upgraded as improved technology becomes available. System used for experiments with such parallel-processing concepts as message-passing algorithms, debugging software tools, and computational steering. First-generation Hypercluster system described in "Hypercluster Parallel Processor" (LEW-15283).

  10. Parallel processing of two-dimensional Sn transport calculations

    International Nuclear Information System (INIS)

    Uematsu, M.

    1997-01-01

    A parallel processing method for the two-dimensional Sn transport code DOT3.5 has been developed to achieve a drastic reduction in computation time. In the proposed method, parallelization is achieved with angular domain decomposition and/or space domain decomposition. The calculational speed of parallel processing by angular domain decomposition is largely influenced by frequent communications between processing elements. To assess parallelization efficiency, sample problems with up to 32 x 32 spatial meshes were solved with a Sun workstation using the PVM message-passing library. As a result, parallel calculation using 16 processing elements, for example, was found to be nine times as fast as that with one processing element. As for parallel processing by geometry segmentation, the influence of processing element communications on computation time is small; however, discontinuity at the segment boundary degrades convergence speed. To accelerate the convergence, an alternate sweep of angular flux in conjunction with space domain decomposition and a two-step rescaling method consisting of segmentwise rescaling and ordinary pointwise rescaling have been developed. By applying the developed method, the number of iterations needed to obtain a converged flux solution was reduced by a factor of 2. As a result, parallel calculation using 16 processing elements was found to be 5.98 times as fast as the original DOT3.5 calculation.
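    The two speedup figures quoted in this abstract can be turned into parallel efficiencies with the standard definition E = S / p (speedup divided by the number of processing elements). The short Python sketch below does only that arithmetic; the function name is hypothetical, and note that the two speedups are reported against different baselines (one processing element versus the original DOT3.5 run), so the efficiencies are not strictly comparable.

```python
def parallel_efficiency(speedup, n_procs):
    """Efficiency E = S / p: the fraction of ideal linear speedup obtained."""
    return speedup / n_procs


# Figures taken from the abstract: 16 processing elements in both cases.
angular = parallel_efficiency(9.0, 16)     # angular domain decomposition
spatial = parallel_efficiency(5.98, 16)    # space domain decomposition

print(f"angular decomposition: {angular:.1%} of ideal")
print(f"space decomposition:   {spatial:.1%} of ideal")
```

    The results are roughly 56% and 37% of ideal linear speedup, respectively.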

  11. Integrative Dynamic Reconfiguration in a Parallel Stream Processing Engine

    DEFF Research Database (Denmark)

    Madsen, Kasper Grud Skat; Zhou, Yongluan; Cao, Jianneng

    2017-01-01

    Load balancing, operator instance collocations and horizontal scaling are critical issues in Parallel Stream Processing Engines to achieve low data processing latency, optimized cluster utilization and minimized communication cost respectively. In previous work, these issues are typically tackled...... solution called ALBIC, which supports general jobs. We implement the proposed techniques on top of Apache Storm, an open-source Parallel Stream Processing Engine. The extensive experimental results over both synthetic and real datasets show that our techniques clearly outperform existing approaches.

  12. Parallel and Distributed Data Processing Using Autonomous ...

    African Journals Online (AJOL)

    Looking at the distributed nature of these networks, data is processed by remote login or Remote Procedure Calls (RPC), this causes congestion in the network bandwidth. This paper proposes a framework where software agents are assigned duties to be processing the distributed data concurrently and assembling the ...

  13. Advanced optical signal processing of broadband parallel data signals

    DEFF Research Database (Denmark)

    Oxenløwe, Leif Katsuo; Hu, Hao; Kjøller, Niels-Kristian

    2016-01-01

    Optical signal processing may aid in reducing the number of active components in communication systems with many parallel channels, by e.g. using telescopic time lens arrangements to perform format conversion and allow for WDM regeneration.

  14. Processing data communications events by awakening threads in parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2016-03-15

    Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.
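
    The wait-and-awaken pattern in the abstract above can be sketched in a few lines. This is a minimal illustration only, not the PAMI API: the `Context` class, its `post` and `advance` methods, and the event names are all hypothetical, and a standard condition variable stands in for whatever wake-up mechanism the real interface uses.

```python
import threading
from collections import deque


class Context:
    """Hypothetical stand-in for a messaging context with an event queue."""

    def __init__(self):
        self._events = deque()
        self._cond = threading.Condition()

    def post(self, event):
        # Called by the messaging layer when a new event arrives.
        with self._cond:
            self._events.append(event)
            self._cond.notify()  # awaken a thread waiting in advance()

    def advance(self, handler, timeout=None):
        # If no event is actionable, enter a wait state instead of spinning.
        with self._cond:
            if not self._cond.wait_for(lambda: len(self._events) > 0, timeout):
                return False  # timed out with nothing to process
            event = self._events.popleft()
        handler(event)  # process the event that is now pending
        return True


def demo():
    ctx = Context()
    seen = []
    worker = threading.Thread(target=ctx.advance, args=(seen.append,))
    worker.start()         # waits if no event is pending yet
    ctx.post("recv-done")  # the event wakes the worker
    worker.join(timeout=5)
    return seen
```

    Calling `demo()` starts a thread that waits in `advance` until `post` delivers an event. The design point is that an idle thread consumes no CPU time while its context has nothing to do.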

  15. Parallel processing for pitch splitting decomposition

    Science.gov (United States)

    Barnes, Levi; Li, Yong; Wadkins, David; Biederman, Steve; Miloslavsky, Alex; Cork, Chris

    2009-10-01

    Decomposition of an input pattern in preparation for a double patterning process is an inherently global problem in which the influence of a local decomposition decision can be felt across an entire pattern. In spite of this, a large portion of the work can be massively distributed. Here, we discuss the advantages of geometric distribution for polygon operations with limited range of influence. Further, we have found that even the naturally global "coloring" step can, in large part, be handled in a geometrically local manner. In some practical cases, up to 70% of the work can be distributed geometrically. We also describe the methods for partitioning the problem into local pieces and present scaling data up to 100 CPUs. These techniques reduce DPT decomposition runtime by orders of magnitude.

  16. Parallel processing of neutron transport in fuel assembly calculation

    International Nuclear Information System (INIS)

    Song, Jae Seung

    1992-02-01

    Group constants, which are used for reactor analyses by nodal method, are generated by fuel assembly calculations based on the neutron transport theory, since one or a quarter of the fuel assembly corresponds to a unit mesh in the current nodal calculation. The group constant calculation for a fuel assembly is performed through spectrum calculations, a two-dimensional fuel assembly calculation, and depletion calculations. The purpose of this study is to develop a parallel algorithm to be used in a parallel processor for the fuel assembly calculation and the depletion calculations of the group constant generation. A serial program, which solves the neutron integral transport equation using the transmission probability method and the linear depletion equation, was prepared and verified by a benchmark calculation. Small changes to the serial program were enough to parallelize the depletion calculation, which has inherent parallel characteristics. In the fuel assembly calculation, however, efficient parallelization is not simple and easy because of the many coupling parameters in the calculation and data communications among CPUs. In this study, the group distribution method is introduced for the parallel processing of the fuel assembly calculation to minimize the data communications. The parallel processing was performed on Quadputer with 4 CPUs operating in NURAD Lab. at KAIST. Efficiencies of 54.3 % and 78.0 % were obtained in the fuel assembly calculation and depletion calculation, respectively, which led to an overall speedup of about 2.5. As a result, it is concluded that the computing time consumed for the group constant generation can be easily reduced by parallel processing on the parallel computer with small size CPUs.
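
    The relation between the efficiencies and the speedup quoted above can be checked with a short calculation. The sketch assumes the common definition efficiency = speedup / number of CPUs; the abstract does not state which definition it uses, and the function names are illustrative.

```python
def speedup_from_efficiency(efficiency, n_cpus):
    """Speedup implied by a parallel efficiency (assumed: efficiency = speedup / n_cpus)."""
    return efficiency * n_cpus


def efficiency_from_speedup(speedup, n_cpus):
    """Inverse relation under the same assumed definition."""
    return speedup / n_cpus


N_CPUS = 4  # the Quadputer configuration named in the abstract
assembly_speedup = speedup_from_efficiency(0.543, N_CPUS)   # about 2.17
depletion_speedup = speedup_from_efficiency(0.780, N_CPUS)  # about 3.12
```

    Under that assumption the two phases run about 2.17 and 3.12 times faster than the serial program, so the reported overall speedup of about 2.5 falls between them, as expected for a run that mixes both phases.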

  17. Parallel and distributed processing: applications to power systems

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Felix; Murphy, Liam [California Univ., Berkeley, CA (United States). Dept. of Electrical Engineering and Computer Sciences

    1994-12-31

    Applications of parallel and distributed processing to power systems problems are still in the early stages. Rapid progress in computing and communications promises a revolutionary increase in the capacity of distributed processing systems. In this paper, the state-of-the art in distributed processing technology and applications is reviewed and future trends are discussed. (author) 14 refs.,1 tab.

  18. The Acoustic and Perceptual Effects of Series and Parallel Processing

    Directory of Open Access Journals (Sweden)

    Melinda C. Anderson

    2009-01-01

    Temporal envelope (TE) cues provide a great deal of speech information. This paper explores how spectral subtraction and dynamic-range compression gain modifications affect TE fluctuations for parallel and series configurations. In parallel processing, algorithms compute gains based on the same input signal, and the gains in dB are summed. In series processing, output from the first algorithm forms the input to the second algorithm. Acoustic measurements show that the parallel arrangement produces more gain fluctuations, introducing more changes to the TE than the series configurations. Intelligibility tests for normal-hearing (NH) and hearing-impaired (HI) listeners show (1) parallel processing gives significantly poorer speech understanding than an unprocessed (UNP) signal and the series arrangement and (2) series processing and UNP yield similar results. Speech quality tests show that UNP is preferred to both parallel and series arrangements, although spectral subtraction is the most preferred. No significant differences exist in sound quality between the series and parallel arrangements, or between the NH group and the HI group. These results indicate that gain modifications affect intelligibility and sound quality differently. Listeners appear to have a higher tolerance for gain modifications with regard to intelligibility, while judgments for sound quality appear to be more affected by smaller amounts of gain modification.
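
    The difference between the two arrangements can be made concrete with a toy calculation on a single input level. Both gain rules below are invented for illustration (a fixed 10 dB attenuation of low-level input and a simple 2:1 compressor); they are not the spectral subtraction or compression algorithms used in the paper.

```python
def suppression_gain_db(level_db):
    # Hypothetical noise-reduction rule: attenuate input below 60 dB by 10 dB.
    return -10.0 if level_db < 60.0 else 0.0


def compression_gain_db(level_db, knee_db=45.0, ratio=2.0, linear_gain_db=20.0):
    # Hypothetical compressor: fixed gain up to the knee, then 2:1 compression.
    if level_db <= knee_db:
        return linear_gain_db
    return linear_gain_db - (level_db - knee_db) * (1.0 - 1.0 / ratio)


def parallel_gain_db(level_db):
    # Both algorithms see the same input; their gains in dB are summed.
    return suppression_gain_db(level_db) + compression_gain_db(level_db)


def series_gain_db(level_db):
    # The second algorithm sees the output of the first.
    first = suppression_gain_db(level_db)
    second = compression_gain_db(level_db + first)
    return first + second
```

    For a 55 dB input these toy rules give a total gain of 5 dB in the parallel arrangement and 10 dB in the series arrangement, because in series the compressor reacts to the already attenuated signal.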

  19. Efficient multitasking: parallel versus serial processing of multiple tasks.

    Science.gov (United States)

    Fischer, Rico; Plessow, Franziska

    2015-01-01

    In the context of performance optimizations in multitasking, a central debate has unfolded in multitasking research around whether cognitive processes related to different tasks proceed only sequentially (one at a time), or can operate in parallel (simultaneously). This review features a discussion of theoretical considerations and empirical evidence regarding parallel versus serial task processing in multitasking. In addition, we highlight how methodological differences and theoretical conceptions determine the extent to which parallel processing in multitasking can be detected, to guide their employment in future research. Parallel and serial processing of multiple tasks are not mutually exclusive. Therefore, questions focusing exclusively on either task-processing mode are too simplified. We review empirical evidence and demonstrate that shifting between more parallel and more serial task processing critically depends on the conditions under which multiple tasks are performed. We conclude that efficient multitasking is reflected by the ability of individuals to adjust multitasking performance to environmental demands by flexibly shifting between different processing strategies of multiple task-component scheduling.

  20. A Novel Least Significant Bit First Processing Parallel CRC Circuit

    Directory of Open Access Journals (Sweden)

    Xiujie Qu

    2013-01-01

    In the HDLC serial communication protocol, CRC calculation can process either the most or the least significant bit of the data first. Nowadays most CRC calculation is based on most significant bit (MSB) first processing. An algorithm for least significant bit (LSB) first processing parallel CRC is proposed in this paper. Based on the general expression of the least significant bit first processing serial CRC, using the state equation method of linear systems, we derive a recursive formula by mathematical deduction. The recursive formula is applicable to any number of bits processed in parallel and any series of generator polynomial. According to the formula, we present the parallel circuit of CRC calculation and implement it with VHDL on FPGA. The results verify the accuracy and effectiveness of this method.
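
    A software analogue shows what processing several bits per step means for an LSB-first CRC. The sketch below is not the paper's recursive formula or its VHDL circuit: it compares a one-bit-per-step loop with the familiar table-driven form that consumes eight bits per step. The polynomial (x^16 + x^12 + x^5 + 1, written bit-reversed as 0x8408) and the initial value 0xFFFF are assumptions chosen to resemble the 16-bit CRC commonly used with HDLC.

```python
POLY_REFLECTED = 0x8408  # assumed polynomial, bit-reversed for LSB-first use


def crc16_lsb_serial(data, crc=0xFFFF):
    """Process one bit per step, least significant bit of each byte first."""
    for byte in data:
        for i in range(8):
            bit = (byte >> i) & 1
            feedback = (crc ^ bit) & 1
            crc >>= 1
            if feedback:
                crc ^= POLY_REFLECTED
    return crc


def _build_table():
    # Entry v holds the register after shifting eight bits with low byte v.
    table = []
    for value in range(256):
        crc = value
        for _ in range(8):
            crc = (crc >> 1) ^ POLY_REFLECTED if crc & 1 else crc >> 1
        table.append(crc)
    return table


TABLE = _build_table()


def crc16_lsb_parallel8(data, crc=0xFFFF):
    """Process eight bits per step using the precomputed table."""
    for byte in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ byte) & 0xFF]
    return crc
```

    Both functions return the same register value for any input. The table plays the role that a combinational network derived from the recursive formula would play in hardware.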

  1. Method of parallel processing in SANPO real time system

    International Nuclear Information System (INIS)

    Ostrovnoj, A.I.; Salamatin, I.M.

    1981-01-01

    A method of parallel processing in the SANPO real-time system is described. Algorithms of data accumulation and preliminary processing in this system, expressed as parallel processes using a specialized high-level programming language, are described. A hierarchy of elementary processes is also described. It provides the synchronization of concurrent processes without semaphores. The developed means are applied to systems of experiment automation using SM-3 minicomputers [ru]

  2. Parallel and distributed processing in power system simulation and control

    Energy Technology Data Exchange (ETDEWEB)

    Falcao, Djalma M [Universidade Federal, Rio de Janeiro, RJ (Brazil). Coordenacao dos Programas de Pos-graduacao de Engenharia

    1994-12-31

    Recent advances in computer technology will certainly have a great impact on the methodologies used in power system expansion and operational planning as well as in real-time control. Parallel and distributed processing are among the new technologies that present great potential for application in these areas. Parallel computers use multiple functional or processing units to speed up computation, while distributed processing computer systems are collections of computers joined together by high speed communication networks having many objectives and advantages. The paper presents some ideas for the use of parallel and distributed processing in power system simulation and control. It also comments on some of the current research work in these topics and presents a summary of the work presently being developed at COPPE. (author) 53 refs., 2 figs.

  3. Spatially parallel processing of within-dimension conjunctions.

    Science.gov (United States)

    Linnell, K J; Humphreys, G W

    2001-01-01

    Within-dimension conjunction search for red-green targets amongst red-blue, and blue-green, nontargets is extremely inefficient (Wolfe et al, 1990 Journal of Experimental Psychology: Human Perception and Performance 16 879-892). We tested whether pairs of red-green conjunction targets can nevertheless be processed spatially in parallel. Participants made speeded detection responses whenever a red-green target was present. Across trials where a second identical target was present, the distribution of detection times was compatible with the assumption that targets were processed in parallel (Miller, 1982 Cognitive Psychology 14 247-279). We show that this was not an artifact of response-competition or feature-based processing. We suggest that within-dimension conjunctions can be processed spatially in parallel. Visual search for such items may be inefficient owing to within-dimension grouping between items.

  4. Decomposition based parallel processing technique for efficient collaborative optimization

    International Nuclear Information System (INIS)

    Park, Hyung Wook; Kim, Sung Chan; Kim, Min Soo; Choi, Dong Hoon

    2000-01-01

    In practical design studies, most designers solve multidisciplinary problems with complex design structure. These multidisciplinary problems have hundreds of analyses and thousands of variables. The sequence of processes used to solve these problems affects the speed of the total design cycle. Thus it is very important for the designer to reorder the original design processes to minimize total cost and time. This is accomplished by decomposing a large multidisciplinary problem into several MultiDisciplinary Analysis SubSystems (MDASS) and processing them in parallel. This paper proposes a new strategy for parallel decomposition of multidisciplinary problems to raise design efficiency by using a genetic algorithm, and shows the relationship between decomposition and Multidisciplinary Design Optimization (MDO) methodology.

  5. Parallel transaction processing in functional languages, towards practical functional databases

    NARCIS (Netherlands)

    Wevers, L.; Huisman, Marieke; de Keijzer, Ander

    2013-01-01

    This paper shows how functional languages can be adapted for transaction processing, and discusses the implementation of a parallel runtime system for such functional transaction processing languages. We extend functional languages with current state variables and result state variables to allow the

  6. Test generation for digital circuits using parallel processing

    Science.gov (United States)

    Hartmann, Carlos R.; Ali, Akhtar-Uz-Zaman M.

    1990-12-01

    The problem of test generation for digital logic circuits is an NP-Hard problem. Recently, the availability of low cost, high performance parallel machines has spurred interest in developing fast parallel algorithms for computer-aided design and test. This report describes a method of applying a 15-valued logic system for digital logic circuit test vector generation in a parallel programming environment. A concept called fault site testing allows for test generation, in parallel, that targets more than one fault at a given location. The multi-valued logic system allows results obtained by distinct processors and/or processes to be merged by means of simple set intersections. A machine-independent description is given for the proposed algorithm.

  7. Leveraging Parallel Data Processing Frameworks with Verified Lifting

    Directory of Open Access Journals (Sweden)

    Maaz Bin Safeer Ahmad

    2016-11-01

    Many parallel data frameworks have been proposed in recent years that let sequential programs access parallel processing. To capitalize on the benefits of such frameworks, existing code must often be rewritten to the domain-specific languages that each framework supports. This rewriting, tedious and error-prone, also requires developers to choose the framework that best optimizes performance given a specific workload. This paper describes Casper, a novel compiler that automatically retargets sequential Java code for execution on Hadoop, a parallel data processing framework that implements the MapReduce paradigm. Given a sequential code fragment, Casper uses verified lifting to infer a high-level summary expressed in our program specification language that is then compiled for execution on Hadoop. We demonstrate that Casper automatically translates Java benchmarks into Hadoop. The translated results execute on average 3.3x faster than the sequential implementations and scale better, as well, to larger datasets.
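
    The kind of rewrite described above can be illustrated by hand on a small example. The sketch is not Casper output and does not use Hadoop: it only shows a sequential loop next to an equivalent map and reduce formulation, the general shape a MapReduce framework can distribute. The word-count task and function names are chosen for illustration.

```python
from collections import Counter
from functools import reduce


def word_count_sequential(lines):
    # The original style: one loop that mutates a shared accumulator.
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def word_count_mapreduce(lines):
    # Map: each line independently becomes a partial count.
    partials = map(lambda line: Counter(line.split()), lines)
    # Reduce: partial counts are merged with an associative, commutative operator,
    # so the merges could run in any order and on any machine.
    return dict(reduce(lambda a, b: a + b, partials, Counter()))
```

    Both functions return the same dictionary for the same input. The second form makes the independence of the per-line work explicit, which is the property a parallel framework exploits.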

  8. The Extended Parallel Process Model: Illuminating the Gaps in Research

    Science.gov (United States)

    Popova, Lucy

    2012-01-01

    This article examines constructs, propositions, and assumptions of the extended parallel process model (EPPM). Review of the EPPM literature reveals that its theoretical concepts are thoroughly developed, but the theory lacks consistency in operational definitions of some of its constructs. Out of the 12 propositions of the EPPM, a few have not…

  9. Using Motivational Interviewing Techniques to Address Parallel Process in Supervision

    Science.gov (United States)

    Giordano, Amanda; Clarke, Philip; Borders, L. DiAnne

    2013-01-01

    Supervision offers a distinct opportunity to experience the interconnection of counselor-client and counselor-supervisor interactions. One product of this network of interactions is parallel process, a phenomenon by which counselors unconsciously identify with their clients and subsequently present to their supervisors in a similar fashion…

  10. Heterogeneous Multicore Parallel Programming for Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Francois Bodin

    2009-01-01

    Hybrid parallel multicore architectures based on graphics processing units (GPUs) can provide tremendous computing power. Current NVIDIA and AMD Graphics Product Group hardware display a peak performance of hundreds of gigaflops. However, exploiting GPUs from existing applications is a difficult task that requires non-portable rewriting of the code. In this paper, we present HMPP, a Heterogeneous Multicore Parallel Programming workbench with compilers, developed by CAPS entreprise, that allows the integration of heterogeneous hardware accelerators in an unintrusive manner while preserving the legacy code.

  11. An educational tool for interactive parallel and distributed processing

    DEFF Research Database (Denmark)

    Pagliarini, Luigi; Lund, Henrik Hautop

    2012-01-01

    In this article we try to describe how the modular interactive tiles system (MITS) can be a valuable tool for introducing students to interactive parallel and distributed processing programming. This is done by providing a hands-on educational tool that allows a change in the representation of abstract problems related to designing interactive parallel and distributed systems. Indeed, the MITS seems to bring a series of goals into education, such as parallel programming, distributedness, communication protocols, master dependency, software behavioral models, adaptive interactivity, feedback, connectivity, topology, island modeling, and user and multi-user interaction which can rarely be found in other tools. Finally, we introduce the system of modular interactive tiles as a tool for easy, fast, and flexible hands-on exploration of these issues, and through examples we show how to implement...

  12. Parallel processing approach to transform-based image coding

    Science.gov (United States)

    Normile, James O.; Wright, Dan; Chu, Ken; Yeh, Chia L.

    1991-06-01

    This paper describes a flexible parallel processing architecture designed for use in real time video processing. The system consists of floating point DSP processors connected to each other via fast serial links; each processor has access to a globally shared memory. A multiple bus architecture in combination with a dual ported memory allows communication with a host control processor. The system has been applied to prototyping of video compression and decompression algorithms. The decomposition of transform based algorithms for decompression into a form suitable for parallel processing is described. A technique for automatic load balancing among the processors is developed and discussed; results are presented with image statistics and data rates. Finally, techniques for accelerating the system throughput are analyzed and results from the application of one such modification are described.

  13. Parallel processing based decomposition technique for efficient collaborative optimization

    International Nuclear Information System (INIS)

    Park, Hyung Wook; Kim, Sung Chan; Kim, Min Soo; Choi, Dong Hoon

    2001-01-01

    In practical design studies, most designers solve multidisciplinary problems with large and complex design systems. These multidisciplinary problems have hundreds of analyses and thousands of variables. The sequence of processes used to solve these problems affects the speed of the total design cycle. Thus it is very important for the designer to reorder the original design processes to minimize total computational cost. This is accomplished by decomposing a large multidisciplinary problem into several MultiDisciplinary Analysis SubSystems (MDASS) and processing them in parallel. This paper proposes a new strategy for parallel decomposition of multidisciplinary problems to raise design efficiency by using a genetic algorithm, and shows the relationship between decomposition and Multidisciplinary Design Optimization (MDO) methodology.

  14. Application of parallel processing for automatic inspection of printed circuits

    International Nuclear Information System (INIS)

    Lougheed, R.M.

    1986-01-01

    Automated visual inspection of printed electronic circuits is a challenging application for image processing systems. Detailed inspection requires high speed analysis of gray scale imagery along with high quality optics, lighting, and sensing equipment. A prototype system has been developed and demonstrated at the Environmental Research Institute of Michigan (ERIM) for inspection of multilayer thick-film circuits. The central problem of real-time image processing is solved by a special-purpose parallel processor which includes a new high-speed Cytocomputer. In this chapter the inspection process and the algorithms used are summarized, along with the functional requirements of the machine vision system. Next, the parallel processor is described in detail and then performance on this application is given.

  15. An Educational Tool for Interactive Parallel and Distributed Processing

    DEFF Research Database (Denmark)

    Pagliarini, Luigi; Lund, Henrik Hautop

    2011-01-01

    In this paper we try to describe how the Modular Interactive Tiles System (MITS) can be a valuable tool for introducing students to interactive parallel and distributed processing programming. This is done by providing an educational hands-on tool that allows a change of representation of the abstract problems related to designing interactive parallel and distributed systems. Indeed, MITS seems to bring a series of goals into the education, such as parallel programming, distributedness, communication protocols, master dependency, software behavioral models, adaptive interactivity, feedback, connectivity, topology, island modeling, user and multiuser interaction, which can hardly be found in other tools. Finally, we introduce the system of modular interactive tiles as a tool for easy, fast, and flexible hands-on exploration of these issues, and through examples show how to implement interactive...

  16. Highly scalable parallel processing of extracellular recordings of Multielectrode Arrays.

    Science.gov (United States)

    Gehring, Tiago V; Vasilaki, Eleni; Giugliano, Michele

    2015-01-01

    Technological advances of Multielectrode Arrays (MEAs) used for multisite, parallel electrophysiological recordings, lead to an ever increasing amount of raw data being generated. Arrays with hundreds up to a few thousands of electrodes are slowly seeing widespread use and the expectation is that more sophisticated arrays will become available in the near future. In order to process the large data volumes resulting from MEA recordings there is a pressing need for new software tools able to process many data channels in parallel. Here we present a new tool for processing MEA data recordings that makes use of new programming paradigms and recent technology developments to unleash the power of modern highly parallel hardware, such as multi-core CPUs with vector instruction sets or GPGPUs. Our tool builds on and complements existing MEA data analysis packages. It shows high scalability and can be used to speed up some performance critical pre-processing steps such as data filtering and spike detection, helping to make the analysis of larger data sets tractable.

  17. Surface topography of parallel grinding process for nonaxisymmetric aspheric lens

    International Nuclear Information System (INIS)

    Zhang Ningning; Wang Zhenzhong; Pan Ri; Wang Chunjin; Guo Yinbiao

    2012-01-01

    Workpiece surface profile, texture and roughness can be predicted by modeling the topography of the wheel surface and modeling the kinematics of the grinding process, which compose an important part of precision grinding process theory. Parallel grinding technology is an important method for nonaxisymmetric aspheric lens machining, but there are few reports on relevant simulation. In this paper, a simulation method based on parallel grinding for precision machining of aspheric lenses is proposed. The method combines modeling the random surface of the wheel and modeling the single grain track based on arc wheel contact points. Then, a mathematical algorithm for surface topography is proposed and applied under conditions of different machining parameters. The consistency between the results of simulation and test proves that the algorithm is correct and efficient. (authors)

  18. Digital intermediate frequency QAM modulator using parallel processing

    Science.gov (United States)

    Pao, Hsueh-Yuan [Livermore, CA]; Tran, Binh-Nien [San Ramon, CA]

    2008-05-27

    The digital Intermediate Frequency (IF) modulator applies to various modulation types and offers a simple and low cost method to implement a high-speed digital IF modulator using field programmable gate arrays (FPGAs). The architecture eliminates multipliers and sequential processing by storing the pre-computed modulated cosine and sine carriers in ROM look-up-tables (LUTs). The high-speed input data stream is parallel processed using the corresponding LUTs, which reduces the main processing speed, allowing the use of low cost FPGAs.
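
    The look-up-table idea in the abstract above can be sketched in software. This is an illustration under stated assumptions, not the patented FPGA design: the 16-QAM amplitude levels, the eight samples per symbol, one carrier cycle per symbol, and all names are hypothetical. The point it shows is that once every amplitude has been multiplied into the carrier samples ahead of time, modulation needs only table reads and a subtraction.

```python
import math

SAMPLES_PER_SYMBOL = 8   # assumed: one carrier cycle per symbol
LEVELS = (-3, -1, 1, 3)  # assumed 16-QAM amplitude levels

# "ROM" contents: carrier samples with each amplitude already multiplied in.
COS_LUT = {a: [a * math.cos(2 * math.pi * n / SAMPLES_PER_SYMBOL)
               for n in range(SAMPLES_PER_SYMBOL)] for a in LEVELS}
SIN_LUT = {a: [a * math.sin(2 * math.pi * n / SAMPLES_PER_SYMBOL)
               for n in range(SAMPLES_PER_SYMBOL)] for a in LEVELS}


def modulate(symbols):
    """Map (I, Q) amplitude pairs to IF samples using table reads only."""
    out = []
    for i_amp, q_amp in symbols:
        cos_row, sin_row = COS_LUT[i_amp], SIN_LUT[q_amp]
        # s[n] = I*cos(wn) - Q*sin(wn), with both products read from the tables
        out.extend(c - s for c, s in zip(cos_row, sin_row))
    return out
```

    Because each symbol's samples depend only on that symbol, several symbols can be looked up at the same time, which is what lets the per-lane clock rate drop in a parallel hardware version.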

  19. A dataflow analysis tool for parallel processing of algorithms

    Science.gov (United States)

    Jones, Robert L., III

    1993-01-01

    A graph-theoretic design process and software tool is presented for selecting a multiprocessing scheduling solution for a class of computational problems. The problems of interest are those that can be described using a dataflow graph and are intended to be executed repetitively on a set of identical parallel processors. Typical applications include signal processing and control law problems. Graph analysis techniques are introduced and shown to effectively determine performance bounds, scheduling constraints, and resource requirements. The software tool is shown to facilitate the application of the design process to a given problem.

  20. Image processing with massively parallel computer Quadrics Q1

    International Nuclear Information System (INIS)

    Della Rocca, A.B.; La Porta, L.; Ferriani, S.

    1995-05-01

    To evaluate the image processing capabilities of the massively parallel computer Quadrics Q1, a convolution algorithm has been implemented and is described in this report. First, the mathematical definition of discrete convolution is recalled together with the main Q1 h/w and s/w features. Then the different codification forms of the algorithm are described and the Q1 performance is compared with that obtained by different computers. Finally, the conclusions report on the main results and suggestions.
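
    For reference, discrete two-dimensional convolution can be written directly from its definition. The sketch below is plain sequential Python and has nothing specific to the Quadrics Q1; it assumes the "valid" output region (no padding) and a function name chosen for illustration.

```python
def convolve2d_valid(image, kernel):
    """Discrete 2-D convolution over the region where the kernel fits entirely."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    # True convolution flips the kernel along both axes.
                    acc += image[r + i][c + j] * kernel[kh - 1 - i][kw - 1 - j]
            row.append(acc)
        out.append(row)
    return out
```

    Every output pixel depends only on a small neighbourhood of the input and on no other output pixel, which is why the operation maps naturally onto a machine with many processing elements.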

  1. Parallel Distributed Processing theory in the age of deep networks

    OpenAIRE

    Bowers, Jeffrey

    2017-01-01

    Parallel Distributed Processing (PDP) models in psychology are the precursors of deep networks used in computer science. However, only PDP models are associated with two core psychological claims, namely, that all knowledge is coded in a distributed format, and cognition is mediated by non-symbolic computations. These claims have long been debated within cognitive science, and recent work with deep networks speaks to this debate. Specifically, single-unit recordings show that deep networks le...

  2. Multi states electromechanical switch for energy efficient parallel data processing

    KAUST Repository

    Kloub, Hussam

    2011-04-01

    We present a design, simulation results and fabrication of electromechanical switches enabling parallel data processing and multi functionality. The device is applied in logic gates AND, NOR, XNOR, and Flip-Flops. The device footprint size is 2μm by 0.5μm, and has a pull-in voltage of 5.15V which is verified by FEM simulation. © 2011 IEEE.

  3. Multi states electromechanical switch for energy efficient parallel data processing

    KAUST Repository

    Kloub, Hussam; Smith, Casey; Hussain, Muhammad Mustafa

    2011-01-01

    We present a design, simulation results and fabrication of electromechanical switches enabling parallel data processing and multi functionality. The device is applied in logic gates AND, NOR, XNOR, and Flip-Flops. The device footprint size is 2μm by 0.5μm, and has a pull-in voltage of 5.15V which is verified by FEM simulation. © 2011 IEEE.

  4. Morphological evidence for parallel processing of information in rat macula

    Science.gov (United States)

    Ross, M. D.

    1988-01-01

    Study of montages, tracings and reconstructions prepared from a series of 570 consecutive ultrathin sections shows that rat maculas are morphologically organized for parallel processing of linear acceleratory information. Type II cells of one terminal field distribute information to neighboring terminals as well. The findings are examined in light of physiological data which indicate that macular receptor fields have a preferred directional vector, and are interpreted by analogy to a computer technology known as an information network.

  5. Tolerating correlated failures in Massively Parallel Stream Processing Engines

    DEFF Research Database (Denmark)

    Su, L.; Zhou, Y.

    2016-01-01

    Fault-tolerance techniques for stream processing engines can be categorized into passive and active approaches. A typical passive approach periodically checkpoints a processing task's runtime states and can recover a failed task by restoring its runtime state using its latest checkpoint. On the other hand, an active approach usually employs backup nodes to run replicated tasks. Upon failure, the active replica can take over the processing of the failed task with minimal latency. However, both approaches have their own inadequacies in Massively Parallel Stream Processing Engines (MPSPE...
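
    The passive approach described above can be sketched as follows. This is a toy model, not the authors' system: the `CountingTask` class, the simulated failure, and the assumption that the input can be replayed from the checkpointed position are all illustrative.

```python
import copy


class CountingTask:
    """Hypothetical stateful stream task: counts occurrences per key."""

    def __init__(self):
        self.state = {}
        self.processed = 0  # number of tuples already applied to the state

    def process(self, key):
        self.state[key] = self.state.get(key, 0) + 1
        self.processed += 1

    def checkpoint(self):
        # Snapshot of the runtime state plus the input position it reflects.
        return copy.deepcopy((self.state, self.processed))

    @classmethod
    def restore(cls, snapshot):
        task = cls()
        task.state, task.processed = copy.deepcopy(snapshot)
        return task


def run_with_failure(stream, checkpoint_every, fail_at):
    task = CountingTask()
    snapshot = task.checkpoint()
    for i, key in enumerate(stream):
        if i == fail_at:
            # Simulated crash: restore the latest checkpoint, then replay the
            # input from the checkpointed position (assumes a replayable source).
            task = CountingTask.restore(snapshot)
            for replay_key in stream[task.processed:]:
                task.process(replay_key)
            return task.state
        task.process(key)
        if task.processed % checkpoint_every == 0:
            snapshot = task.checkpoint()
    return task.state
```

    Recovery costs a restore plus a replay of everything since the last checkpoint. An active replica avoids that delay at the price of running the task twice, which is the trade-off the abstract refers to.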

  6. Graphics Processing Unit Enhanced Parallel Document Flocking Clustering

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Potok, Thomas E [ORNL; ST Charles, Jesse Lee [ORNL

    2010-01-01

    Analyzing and clustering documents is a complex problem. One explored method of solving this problem borrows from nature, imitating the flocking behavior of birds. One limitation of this method of document clustering is its complexity O(n^2). As the number of documents grows, it becomes increasingly difficult to generate results in a reasonable amount of time. In the last few years, the graphics processing unit (GPU) has received attention for its ability to solve highly-parallel and semi-parallel problems much faster than the traditional sequential processor. In this paper, we have conducted research to exploit this architecture and apply its strengths to the flocking-based document clustering problem. Using the CUDA platform from NVIDIA, we developed a document flocking implementation to be run on the NVIDIA GEFORCE GPU. Performance gains ranged from thirty-six to nearly sixty times improvement of the GPU over the CPU implementation.

  7. Plagiarism Detection for Indonesian Language using Winnowing with Parallel Processing

    Science.gov (United States)

    Arifin, Y.; Isa, S. M.; Wulandhari, L. A.; Abdurachman, E.

    2018-03-01

    Plagiarism takes many forms: not only copy-and-paste, but also changing passive voice to active voice or paraphrasing without appropriate acknowledgment. It occurs in all languages, including Indonesian. Many previous studies have addressed plagiarism detection in Indonesian with different methods, but there is still room for improvement. This research proposes a solution that improves plagiarism detection so that it can detect not only copy-and-paste but also more advanced forms. The proposed solution uses Winnowing with some additional steps in the pre-processing stage: stemming for the Indonesian language, and fingerprint generation in parallel, which saves processing time and produces the plagiarism result for the suspected document.
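
    The Winnowing step named in this abstract can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the Indonesian stemming stage is omitted, the k-gram length `k`, window size `w`, the CRC32 hash and the Jaccard `similarity` measure are arbitrary choices, and a thread pool stands in for whatever parallel mechanism the paper uses (in CPython a process pool would be needed for real CPU parallelism).

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def kgram_hashes(text, k):
    """Hash every k-character substring of the normalized text."""
    cleaned = "".join(ch.lower() for ch in text if ch.isalnum())
    return [
        zlib.crc32(cleaned[i:i + k].encode("utf-8"))
        for i in range(len(cleaned) - k + 1)
    ]


def winnow(hashes, w):
    """Keep the minimum hash of each window of w hashes (rightmost on ties)."""
    fingerprints = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        # Position of the rightmost occurrence of the minimum in this window.
        offset = w - 1 - window[::-1].index(smallest)
        fingerprints.add((smallest, start + offset))
    return fingerprints


def fingerprint(text, k=5, w=4):
    return winnow(kgram_hashes(text, k), w)


def fingerprint_all(documents, k=5, w=4):
    """Fingerprint several documents concurrently (one task per document)."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda doc: fingerprint(doc, k, w), documents))


def similarity(fp_a, fp_b):
    """Jaccard similarity over fingerprint hash values (positions ignored)."""
    a = {h for h, _ in fp_a}
    b = {h for h, _ in fp_b}
    return len(a & b) / len(a | b) if a | b else 0.0
```

    Because each document is fingerprinted independently, the per-document work is the natural unit to distribute; only the final comparison needs the fingerprints of both the suspected and the source document.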

  8. Parallel factor analysis PARAFAC of process affected water

    Energy Technology Data Exchange (ETDEWEB)

    Ewanchuk, A.M.; Ulrich, A.C.; Sego, D. [Alberta Univ., Edmonton, AB (Canada). Dept. of Civil and Environmental Engineering; Alostaz, M. [Thurber Engineering Ltd., Calgary, AB (Canada)

    2010-07-01

    A parallel factor analysis (PARAFAC) of oil sands process-affected water was presented. Naphthenic acids (NA) are traditionally described as monobasic carboxylic acids. Research has indicated that oil sands NA do not fit classical definitions of NA. Oil sands organic acids have toxic and corrosive properties. When analyzed by fluorescence technology, oil sands process-affected water displays a characteristic peak at 290 nm excitation and approximately 346 nm emission. In this study, a parallel factor analysis (PARAFAC) was used to decompose process-affected water multi-way data into components representing analytes, chemical compounds, and groups of compounds. Water samples from various oil sands operations were analyzed in order to obtain EEMs. The EEMs were then arranged into a large matrix in decreasing process-affected water content for PARAFAC. Data were divided into 5 components. A comparison with commercially prepared NA samples suggested that oil sands NA is fundamentally different. Further research is needed to determine what each of the 5 components represent. tabs., figs.

  9. Parallel Processing of Images in Mobile Devices using BOINC

    Science.gov (United States)

    Curiel, Mariela; Calle, David F.; Santamaría, Alfredo S.; Suarez, David F.; Flórez, Leonardo

    2018-04-01

    Medical image processing helps health professionals make decisions for the diagnosis and treatment of patients. Since some algorithms for processing images require substantial amounts of resources, one could take advantage of distributed or parallel computing. A mobile grid can be an adequate computing infrastructure for this problem. A mobile grid is a grid that includes mobile devices as resource providers. In a previous step of this research, we selected BOINC as the infrastructure to build our mobile grid. However, parallel processing of images in mobile devices poses at least two important challenges: the execution of standard libraries for processing images and obtaining adequate performance when compared to desktop computer grids. By the time we started our research, the use of BOINC in mobile devices also involved two issues: a) the execution of programs in mobile devices required modifying the code to insert calls to the BOINC API, and b) the division of the image among the mobile devices as well as its merging required additional code in some BOINC components. This article presents answers to these four challenges.

  10. Parallel Processing of Images in Mobile Devices using BOINC

    Directory of Open Access Journals (Sweden)

    Curiel Mariela

    2018-04-01

    Full Text Available Medical image processing helps health professionals make decisions for the diagnosis and treatment of patients. Since some algorithms for processing images require substantial amounts of resources, one could take advantage of distributed or parallel computing. A mobile grid can be an adequate computing infrastructure for this problem. A mobile grid is a grid that includes mobile devices as resource providers. In a previous step of this research, we selected BOINC as the infrastructure to build our mobile grid. However, parallel processing of images in mobile devices poses at least two important challenges: the execution of standard libraries for processing images and obtaining adequate performance when compared to desktop computer grids. By the time we started our research, the use of BOINC in mobile devices also involved two issues: a) the execution of programs in mobile devices required modifying the code to insert calls to the BOINC API, and b) the division of the image among the mobile devices as well as its merging required additional code in some BOINC components. This article presents answers to these four challenges.

  11. Smoldyn on graphics processing units: massively parallel Brownian dynamics simulations.

    Science.gov (United States)

    Dematté, Lorenzo

    2012-01-01

    Space is a very important aspect in the simulation of biochemical systems; recently, the need for simulation algorithms able to cope with space has become more and more compelling. Complex and detailed models of biochemical systems need to deal with the movement of single molecules and particles, taking into consideration localized fluctuations, transportation phenomena, and diffusion. A common drawback of spatial models lies in their complexity: models can become very large, and their simulation can be time consuming, especially if we want to capture the system's behavior in a reliable way using stochastic methods in conjunction with a high spatial resolution. In order to deliver on the promise made by systems biology, namely to understand a system as a whole, we need to scale up the size of the models we are able to simulate, moving from sequential to parallel simulation algorithms. In this paper, we analyze Smoldyn, a widely used algorithm for stochastic simulation of chemical reactions with spatial resolution and single molecule detail, and we propose an alternative, innovative implementation that exploits the parallelism of Graphics Processing Units (GPUs). The implementation executes the most computationally demanding steps (computation of diffusion, unimolecular and bimolecular reactions, as well as the most common cases of molecule-surface interaction) on the GPU, computing them in parallel on each molecule of the system. The implementation offers good speed-ups and real-time, high-quality graphics output

  12. Parallel processing at the SSC: The fact and the fiction

    International Nuclear Information System (INIS)

    Bourianoff, G.; Cole, B.

    1991-10-01

    Accurately modelling the behavior of particles circulating in accelerators is a computationally demanding task. The particle tracking code currently in use at the SSC is based upon a "thin element" analysis (TEAPOT). In this model each magnet in the lattice is described by a thin element at which the particle experiences an impulsive kick. Each kick requires approximately 200 floating point operations (FLOP). For the SSC collider lattice consisting of 10^4 elements, performing a tracking study for a set of 100 particles for 10^7 turns would require 2 x 10^15 FLOP. Even on a machine capable of 100 MFLOP/sec (MFLOPS), this would require 2 x 10^7 seconds, and many such runs are necessary. It should be noted that the accuracy with which the kicks are calculated is important: the large number of iterations involved will magnify the effects of small errors. The inability of current computational resources to effectively perform the full calculation motivates the migration of this calculation to the most powerful computers available. A survey of the current research into new technologies for supercomputing reveals that the supercomputers of the future will be parallel in nature. Further, numerous such machines exist today, and are being used to solve other difficult problems. Thus it seems clear that it is not too early to begin developing the capability to develop tracking codes for parallel architectures. This report discusses implementing parallel processing on the SSC
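
    The cost estimate quoted in this abstract can be checked with a few lines of arithmetic. The variable names are illustrative, and the conversion to days is an added convenience that does not appear in the abstract.

```python
# Figures taken from the abstract above.
flop_per_kick = 200          # floating point operations per thin-element kick
elements = 10**4             # thin elements in the collider lattice
particles = 100              # particles tracked in one study
turns = 10**7                # turns tracked per particle
machine_rate = 100 * 10**6   # 100 MFLOPS, in operations per second

total_flop = flop_per_kick * elements * particles * turns
seconds = total_flop / machine_rate
days = seconds / 86_400      # added conversion: seconds per day

print(f"total work: {total_flop:.1e} FLOP")
print(f"run time:   {seconds:.1e} s (about {days:.0f} days)")
```

    A single run of this size therefore occupies such a machine for the better part of a year, which is the motivation the abstract gives for moving to parallel architectures.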

  13. Parallel processing of Monte Carlo code MCNP for particle transport problem

    Energy Technology Data Exchange (ETDEWEB)

    Higuchi, Kenji; Kawasaki, Takuji

    1996-06-01

    It is possible to vectorize or parallelize Monte Carlo codes (MC codes) for photon and neutron transport problems, making use of the independence of the calculation for each particle. The applicability of existing MC codes to parallel processing is discussed. As parallel computers, we used both a vector-parallel processor and a scalar-parallel processor in the performance evaluation. We carried out (i) vector-parallel processing of the MCNP code on the Monte Carlo machine Monte-4 with four vector processors, and (ii) parallel processing on a Paragon XP/S with 256 processors. In this report we describe the methodology and results for parallel processing on two types of parallel or distributed memory computers. In addition, we mention the evaluation of parallel programming environments for the parallel computers used in the present work as part of the work developing STA (Seamless Thinking Aid) Basic Software. (author)
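
    The independence of particle histories mentioned above is what makes Monte Carlo transport easy to split across processors. The toy below is not MCNP and models no real geometry: it assumes a purely absorbing slab, uses one seeded random stream per batch as a simple stand-in for a proper parallel random number scheme, and uses a thread pool only to show the decomposition. All function names are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def run_batch(job):
    """Track n independent particles through a purely absorbing slab."""
    seed, n, thickness, sigma = job
    rng = random.Random(seed)   # private random stream for this batch
    transmitted = 0
    for _ in range(n):
        # Distance to the first collision follows an exponential law.
        free_path = rng.expovariate(sigma)
        if free_path > thickness:
            transmitted += 1
    return transmitted


def transmission(n_particles, n_batches, thickness=1.0, sigma=1.0):
    """Estimate the transmitted fraction by summing per-batch tallies."""
    per_batch = n_particles // n_batches
    jobs = [(seed, per_batch, thickness, sigma) for seed in range(n_batches)]
    with ThreadPoolExecutor(max_workers=n_batches) as pool:
        tallies = list(pool.map(run_batch, jobs))
    return sum(tallies) / (per_batch * n_batches)


if __name__ == "__main__":
    estimate = transmission(40_000, 4)
    print(f"estimate {estimate:.4f}, analytic {math.exp(-1.0):.4f}")
```

    Only the final tally has to be combined, so communication is limited to one number per batch; for this slab the estimate can be compared with the analytic value exp(-sigma * thickness).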

  14. GPU: the biggest key processor for AI and parallel processing

    Science.gov (United States)

    Baji, Toru

    2017-07-01

    Two types of processors exist in the market: the conventional CPU and the graphics processing unit (GPU). A typical CPU is composed of 1 to 8 cores, while a GPU has thousands of cores. The CPU is good for sequential processing, while the GPU is good at accelerating software with heavily parallel execution. The GPU was initially dedicated to 3D graphics. However, from 2006, when GPUs started to adopt general-purpose cores, it was noticed that this architecture can be used as a general-purpose massively parallel processor. NVIDIA developed a software framework, Compute Unified Device Architecture (CUDA), that makes it possible to easily program the GPU for these applications. With CUDA, GPUs started to be used widely in workstations and supercomputers. Recently, two key technologies have been highlighted in the industry: artificial intelligence (AI) and autonomous driving cars. AI requires massively parallel operations to train many-layered neural networks. With the CPU alone, it was impossible to finish the training in a practical time. The latest multi-GPU system with P100 makes it possible to finish the training in a few hours. For autonomous driving cars, TOPS-class performance is required to implement perception, localization, and path planning processing, and again an SoC with an integrated GPU will play a key role there. In this paper, the evolution of the GPU, which is one of the biggest commercial devices requiring state-of-the-art fabrication technology, is introduced. An overview of key GPU-demanding applications like the ones described above is also given.

  15. Z-buffer image assembly processing in high parallel visualization processing

    International Nuclear Information System (INIS)

    Kaneko, Isamu; Muramatsu, Kazuhiro

    2000-03-01

    On parallel computers with many processors, the domain decomposition method is a popular means of parallel processing. Now that simulations are becoming much larger and take a lot of time, visualization processing performed simultaneously with the actual computation is increasingly needed, and especially for real-time visualization, the domain decomposition technique is indispensable. In parallel rendering, the rendered results must be gathered on one processor to compose the integrated picture in the last stage. This integration is usually conducted by a method using Z-buffer values. This process, however, causes the crucial problems of much lower processing speed and local memory shortage when more than several tens of processors are used. In this report, two new solutions are proposed. One is the adoption of a special operator (Reduce operator) in the parallelization process, and the other is buffer compression by deleting the background information. This report includes performance results for these new techniques, measured on the Paragon parallel computer, to investigate their effect. (author)
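
    The two ideas in this abstract, merging partial images by a per-pixel depth test and dropping background pixels before they are exchanged, can be sketched in a few lines. This is an illustration only: the image representation (a flat list of `(depth, color)` pairs), the `BACKGROUND` sentinel and all function names are assumptions, and `functools.reduce` merely stands in for a reduction across processors.

```python
from functools import reduce

BACKGROUND = float("inf")  # depth value meaning "nothing rendered here"


def closer(a, b):
    """Per-pixel depth test: keep the fragment nearer to the viewer."""
    return a if a[0] <= b[0] else b


def merge(img_a, img_b):
    """Combine two partial images of equal size, pixel by pixel."""
    return [closer(pa, pb) for pa, pb in zip(img_a, img_b)]


def composite(partial_images):
    """Fold all partial images into one picture with the merge operator."""
    return reduce(merge, partial_images)


def compress(image):
    """Drop background pixels, keeping (index, depth, color) for the rest."""
    return [(i, d, c) for i, (d, c) in enumerate(image) if d != BACKGROUND]


def decompress(packed, n_pixels):
    """Rebuild a full-size image from its compressed form."""
    image = [(BACKGROUND, None)] * n_pixels
    for i, d, c in packed:
        image[i] = (d, c)
    return image
```

    Because the depth test is associative, partial images can be merged pairwise in any grouping rather than all being gathered on one processor first, and the compressed form shrinks with the share of background pixels in each subdomain's image.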

  16. Parallel processing is good for your scientific codes...But massively parallel processing is so much better

    International Nuclear Information System (INIS)

    Thomas, B.; Domain, Ch.; Souffez, Y.; Eon-Duval, P.

    1998-01-01

    Harnessing the power of many computers to solve difficult scientific problems concurrently is one of the most innovative trends in High Performance Computing. At EDF, we have invested in parallel computing and have achieved significant results. First we improved the processing speed of strategic codes, in order to extend their scope. Then we turned to numerical simulations at the atomic scale. These computations, which we never dreamt of before, provided us with a better understanding of metallurgic phenomena. More precisely, we were able to trace defects in alloys that are used in nuclear power plants. (author)

  17. MASSIVELY PARALLEL LATENT SEMANTIC ANALYSES USING A GRAPHICS PROCESSING UNIT

    Energy Technology Data Exchange (ETDEWEB)

    Cavanagh, J.; Cui, S.

    2009-01-01

    Latent Semantic Analysis (LSA) aims to reduce the dimensions of large term-document datasets using Singular Value Decomposition. However, with the ever-expanding size of datasets, current implementations are not fast enough to quickly and easily compute the results on a standard PC. A graphics processing unit (GPU) can solve some highly parallel problems much faster than a traditional sequential processor or central processing unit (CPU). Thus, a deployable system using a GPU to speed up large-scale LSA processes would be a much more effective choice (in terms of cost/performance ratio) than using a PC cluster. Due to the GPU's application-specific architecture, harnessing the GPU's computational prowess for LSA is a great challenge. We presented a parallel LSA implementation on the GPU, using NVIDIA® Compute Unified Device Architecture and Compute Unified Basic Linear Algebra Subprograms software. The performance of this implementation is compared to traditional LSA implementation on a CPU using an optimized Basic Linear Algebra Subprograms library. After implementation, we discovered that the GPU version of the algorithm was twice as fast for large matrices (1000x1000 and above) that had dimensions not divisible by 16. For large matrices that did have dimensions divisible by 16, the GPU algorithm ran five to six times faster than the CPU version. The large variation is due to architectural benefits of the GPU for matrices divisible by 16. It should be noted that the overall speeds for the CPU version did not vary from relative normal when the matrix dimensions were divisible by 16. Further research is needed in order to produce a fully implementable version of LSA. With that in mind, the research we presented shows that the GPU is a viable option for increasing the speed of LSA, in terms of cost/performance ratio.

  18. Parallel Task Processing on a Multicore Platform in a PC-based Control System for Parallel Kinematics

    Directory of Open Access Journals (Sweden)

    Harald Michalik

    2009-02-01

    Full Text Available Multicore platforms have one physical processor chip with multiple cores interconnected via a chip-level bus. Because they deliver greater computing power through concurrency and offer greater system density, multicore platforms are well qualified to address the performance bottleneck encountered in PC-based control systems for parallel kinematic robots with heavy CPU load. Heavy-load control tasks are generated by new control approaches that include features like singularity prediction, structure control algorithms, vision data integration and similar tasks. In this paper we introduce the parallel task scheduling extension of a communication architecture specially tailored for the development of PC-based control of parallel kinematics. The scheduling is specially designed for processing on a multicore platform. It breaks down the serial task processing of the robot control cycle and extends it with parallel task processing paths in order to enhance the overall control performance.

  19. "Let's Move" campaign: applying the extended parallel process model.

    Science.gov (United States)

    Batchelder, Alicia; Matusitz, Jonathan

    2014-01-01

    This article examines Michelle Obama's health campaign, "Let's Move," through the lens of the extended parallel process model (EPPM). "Let's Move" aims to reduce the childhood obesity epidemic in the United States. Developed by Kim Witte, EPPM rests on the premise that people's attitudes can be changed when fear is exploited as a factor of persuasion. Fear appeals work best (a) when a person feels a concern about the issue or situation, and (b) when he or she believes to have the capability of dealing with that issue or situation. Overall, the analysis found that "Let's Move" is based on past health campaigns that have been successful. An important element of the campaign is the use of fear appeals (as it is postulated by EPPM). For example, part of the campaign's strategies is to explain the severity of the diseases associated with obesity. By looking at the steps of EPPM, readers can also understand the strengths and weaknesses of "Let's Move."

  20. Parallel Distributed Processing Theory in the Age of Deep Networks.

    Science.gov (United States)

    Bowers, Jeffrey S

    2017-12-01

    Parallel distributed processing (PDP) models in psychology are the precursors of deep networks used in computer science. However, only PDP models are associated with two core psychological claims, namely that all knowledge is coded in a distributed format and cognition is mediated by non-symbolic computations. These claims have long been debated in cognitive science, and recent work with deep networks speaks to this debate. Specifically, single-unit recordings show that deep networks learn units that respond selectively to meaningful categories, and researchers are finding that deep networks need to be supplemented with symbolic systems to perform some tasks. Given the close links between PDP and deep networks, it is surprising that research with deep networks is challenging PDP theory. Copyright © 2017. Published by Elsevier Ltd.

  1. Parallel asynchronous hardware implementation of image processing algorithms

    Science.gov (United States)

    Coon, Darryl D.; Perera, A. G. U.

    1990-01-01

    Research is being carried out on hardware for a new approach to focal plane processing. The hardware involves silicon injection mode devices. These devices provide a natural basis for parallel asynchronous focal plane image preprocessing. The simplicity and novel properties of the devices would permit an independent analog processing channel to be dedicated to every pixel. A laminar architecture built from arrays of the devices would form a two-dimensional (2-D) array processor with a 2-D array of inputs located directly behind a focal plane detector array. A 2-D image data stream would propagate in neuron-like asynchronous pulse-coded form through the laminar processor. No multiplexing, digitization, or serial processing would occur in the preprocessing state. High performance is expected, based on pulse coding of input currents down to one picoampere with noise referred to input of about 10 femtoamperes. Linear pulse coding has been observed for input currents ranging up to seven orders of magnitude. Low power requirements suggest utility in space and in conjunction with very large arrays. Very low dark current and multispectral capability are possible because of hardware compatibility with the cryogenic environment of high performance detector arrays. The aforementioned hardware development effort is aimed at systems which would integrate image acquisition and image processing.

  2. Mobile Devices and GPU Parallelism in Ionospheric Data Processing

    Science.gov (United States)

    Mascharka, D.; Pankratius, V.

    2015-12-01

    Scientific data acquisition in the field is often constrained by data transfer backchannels to analysis environments. Geoscientists are therefore facing practical bottlenecks with increasing sensor density and variety. Mobile devices, such as smartphones and tablets, offer promising solutions to key problems in scientific data acquisition, pre-processing, and validation by providing advanced capabilities in the field. This is due to affordable network connectivity options and the increasing mobile computational power. This contribution exemplifies a scenario faced by scientists in the field and presents the "Mahali TEC Processing App" developed in the context of the NSF-funded Mahali project. Aimed at atmospheric science and the study of ionospheric Total Electron Content (TEC), this app is able to gather data from various dual-frequency GPS receivers. It demonstrates parsing of full-day RINEX files on mobile devices and on-the-fly computation of vertical TEC values based on satellite ephemeris models that are obtained from NASA. Our experiments show how parallel computing on the mobile device GPU enables fast processing and visualization of up to 2 million datapoints in real-time using OpenGL. GPS receiver bias is estimated through minimum TEC approximations that can be interactively adjusted by scientists in the graphical user interface. Scientists can also perform approximate computations for "quickviews" to reduce CPU processing time and memory consumption. In the final stage of our mobile processing pipeline, scientists can upload data to the cloud for further processing. Acknowledgements: The Mahali project (http://mahali.mit.edu) is funded by the NSF INSPIRE grant no. AGS-1343967 (PI: V. Pankratius). We would like to acknowledge our collaborators at Boston College, Virginia Tech, Johns Hopkins University, Colorado State University, as well as the support of UNAVCO for loans of dual-frequency GPS receivers for use in this project, and Intel for loans of

  3. Category specific spatial dissociations of parallel processes underlying visual naming.

    Science.gov (United States)

    Conner, Christopher R; Chen, Gang; Pieters, Thomas A; Tandon, Nitin

    2014-10-01

    The constituent elements and dynamics of the networks responsible for word production are a central issue to understanding human language. Of particular interest is their dependency on lexical category, particularly the possible segregation of nouns and verbs into separate processing streams. We applied a novel mixed-effects, multilevel analysis to electrocorticographic data collected from 19 patients (1942 electrodes) to examine the activity of broadly disseminated cortical networks during the retrieval of distinct lexical categories. This approach was designed to overcome the issues of sparse sampling and individual variability inherent to invasive electrophysiology. Both noun and verb generation evoked overlapping, yet distinct nonhierarchical processes favoring ventral and dorsal visual streams, respectively. Notable differences in activity patterns were noted in Broca's area and superior lateral temporo-occipital regions (verb > noun) and in parahippocampal and fusiform cortices (noun > verb). Comparisons with functional magnetic resonance imaging (fMRI) results yielded a strong correlation of blood oxygen level-dependent signal and gamma power and an independent estimate of group size needed for fMRI studies of cognition. Our findings imply parallel, lexical category-specific processes and reconcile discrepancies between lesional and functional imaging studies. © The Author 2013. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  4. A model for dealing with parallel processes in supervision

    Directory of Open Access Journals (Sweden)

    Lilja Cajvert

    2011-03-01

    Supervision in social work is essential for successful outcomes when working with clients. In social work, unconscious difficulties may arise and similar difficulties may occur in supervision as parallel processes. In this article, the development of a practice-based model of supervision to deal with parallel processes in supervision is described. The model has six phases. In the first phase, the focus is on the supervisor’s inner world, his/her own reflections and observations. In the second phase, the supervision situation is “frozen”, and the supervisees are invited to join the supervisor in taking a meta-perspective on the current situation of supervision. The focus in the third phase is on the inner world of all the group members as well as the visualization and identification of reflections and feelings that arose during the supervision process. Phase four focuses on the supervisee who presented a case, and in phase five the focus shifts to the common understanding and theorization of the supervision process as well as the definition and identification of possible parallel processes. In the final phase, the supervisee, with the assistance of the supervisor and other members of the group, develops a solution and determines how to proceed with the client in treatment. This article uses phenomenological concepts to provide a theoretical framework for the supervision model. Phenomenological reduction is an important approach to examine and to externalize and visualize the inner worlds of the supervisor and supervisees. A model for handling parallel processes during supervision. To be successful in providing help to clients, supervision is crucial in social work. While help is being provided, implicit difficulties can arise, and similar difficulties sometimes also surface during supervision. These are called parallel processes. This article describes a model, based on practical experience, for such parallel

  5. Adaptive Dynamic Process Scheduling on Distributed Memory Parallel Computers

    Directory of Open Access Journals (Sweden)

    Wei Shu

    1994-01-01

    Full Text Available One of the challenges in programming distributed memory parallel machines is deciding how to allocate work to processors. This problem is particularly important for computations with unpredictable dynamic behaviors or irregular structures. We present a scheme for dynamic scheduling of medium-grained processes that is useful in this context. The adaptive contracting within neighborhood (ACWN) is a dynamic, distributed, load-dependent, and scalable scheme. It deals with dynamic and unpredictable creation of processes and adapts to different systems. The scheme is described and contrasted with two other schemes that have been proposed in this context, namely the randomized allocation and the gradient model. The performance of the three schemes on an Intel iPSC/2 hypercube is presented and analyzed. The experimental results show that even though the ACWN algorithm incurs somewhat larger overhead than the randomized allocation, it achieves better performance in most cases due to its adaptiveness. Its feature of quickly spreading the work helps it outperform the gradient model in performance and scalability.

  6. Parallel processing using an optical delay-based reservoir computer

    Science.gov (United States)

    Van der Sande, Guy; Nguimdo, Romain Modeste; Verschaffelt, Guy

    2016-04-01

    Delay systems subject to delayed optical feedback have recently shown great potential in solving computationally hard tasks. By implementing a neuro-inspired computational scheme relying on the transient response to optical data injection, high processing speeds have been demonstrated. However, reservoir computing systems based on delay dynamics discussed in the literature are designed by coupling many different stand-alone components, which leads to bulky, non-monolithic systems that lack long-term stability. Here we numerically investigate the possibility of implementing reservoir computing schemes based on semiconductor ring lasers (SRLs). SRLs are semiconductor lasers where the laser cavity consists of a ring-shaped waveguide. SRLs are highly integrable and scalable, making them ideal candidates for key components in photonic integrated circuits. SRLs can generate light in two counterpropagating directions between which bistability has been demonstrated. We demonstrate that two independent machine learning tasks, even with inputs of a different nature and different input data signals, can be simultaneously computed using a single photonic nonlinear node relying on the parallelism offered by photonics. We illustrate the performance on simultaneous chaotic time series prediction and nonlinear channel equalization classification. We take advantage of different directional modes to process individual tasks. Each directional mode processes one individual task to mitigate possible crosstalk between the tasks. Our results indicate that prediction/classification with errors comparable to the state-of-the-art performance can be obtained even with noise despite the two tasks being computed simultaneously. We also find that a good performance is obtained for both tasks for a broad range of the parameters. The results are discussed in detail in [Nguimdo et al., IEEE Trans. Neural Netw. Learn. Syst. 26, pp. 3301-3307, 2015

  7. Parallel coupling of symmetric and asymmetric exclusion processes

    International Nuclear Information System (INIS)

    Tsekouras, K; Kolomeisky, A B

    2008-01-01

    A system consisting of two parallel coupled channels where particles in one of them follow the rules of totally asymmetric exclusion processes (TASEP) and in another one move as in symmetric simple exclusion processes (SSEP) is investigated theoretically. Particles interact with each other via hard-core exclusion potential, and in the asymmetric channel they can only hop in one direction, while on the symmetric lattice particles jump in both directions with equal probabilities. Inter-channel transitions are also allowed at every site of both lattices. Stationary state properties of the system are solved exactly in the limit of strong couplings between the channels. It is shown that strong symmetric couplings between totally asymmetric and symmetric channels lead to an effective partially asymmetric simple exclusion process (PASEP) and properties of both channels become almost identical. However, strong asymmetric couplings between symmetric and asymmetric channels yield an effective TASEP with nonzero particle flux in the asymmetric channel and zero flux on the symmetric lattice. For intermediate strength of couplings between the lattices a vertical-cluster mean-field method is developed. This approximate approach treats exactly particle dynamics during the vertical transitions between the channels and it neglects the correlations along the channels. Our calculations show that in all cases there are three stationary phases defined by particle dynamics at entrances, at exits or in the bulk of the system, while phase boundaries depend on the strength and symmetry of couplings between the channels. Extensive Monte Carlo computer simulations strongly support our theoretical predictions. Theoretical calculations and computer simulations predict that inter-channel couplings have a strong effect on stationary properties. It is also argued that our results might be relevant for understanding multi-particle dynamics of motor proteins

  8. A tomograph VMEbus parallel processing data acquisition system

    International Nuclear Information System (INIS)

    Atkins, M.S.; Wilkinson, N.A.; Rogers, J.G.

    1988-11-01

    This paper describes a VME based data acquisition system suitable for the development of Positron Volume Imaging tomographs which use 3-D data for improved image resolution over slice-oriented tomographs. The data acquisition must be flexible enough to accommodate several 3-D reconstruction algorithms; hence, a software-based system is most suitable. Furthermore, because of the increased dimensions and resolution of volume imaging tomographs, the raw data event rate is greater than that of slice-oriented machines. These dual requirements are met by our data acquisition system. Flexibility is achieved through an array of processors connected over a VMEbus, operating asynchronously and in parallel. High raw data throughput is achieved using a dedicated high speed data transfer device available for the VMEbus. The device can attain a raw data rate of 2.5 million coincidence events per second for raw events which are 64 bits wide. Real-time data acquisition and pre-processing requirements can be met by about forty 20 MHz Motorola 68020/68881 processors.

  9. Investigation of Mediational Processes Using Parallel Process Latent Growth Curve Modeling

    Science.gov (United States)

    Cheong, JeeWon; MacKinnon, David P.; Khoo, Siek Toon

    2010-01-01

    This study investigated a method to evaluate mediational processes using latent growth curve modeling. The mediator and the outcome measured across multiple time points were viewed as 2 separate parallel processes. The mediational process was defined as the independent variable influencing the growth of the mediator, which, in turn, affected the growth of the outcome. To illustrate modeling procedures, empirical data from a longitudinal drug prevention program, Adolescents Training and Learning to Avoid Steroids, were used. The program effects on the growth of the mediator and the growth of the outcome were examined first in a 2-group structural equation model. The mediational process was then modeled and tested in a parallel process latent growth curve model by relating the prevention program condition, the growth rate factor of the mediator, and the growth rate factor of the outcome. PMID:20157639

  10. Parallel and distributed processing in two SGBDS: A case study

    OpenAIRE

    Francisco Javier Moreno; Nataly Castrillón Charari; Camilo Taborda Zuluaga

    2017-01-01

    Context: One of the strategies for managing large volumes of data is distributed and parallel computing. Among the tools that allow applying these characteristics are some Data Base Management Systems (DBMS), such as Oracle, DB2, and SQL Server. Method: In this paper we present a case study where we evaluate the performance of an SQL query in two of these DBMS. The evaluation is done through various forms of data distribution in a computer network with different degrees of parallelism. ...

  11. Parallel processing for nonlinear dynamics simulations of structures including rotating bladed-disk assemblies

    Science.gov (United States)

    Hsieh, Shang-Hsien

    1993-01-01

    The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.

  12. Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R; Ratterman, Joseph D; Smith, Brian E

    2014-11-11

    Endpoint-based parallel data processing with non-blocking collective instructions in a PAMI of a parallel computer is disclosed. The PAMI is composed of data communications endpoints, each including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task. The compute nodes are coupled for data communications through the PAMI. The parallel application establishes a data communications geometry specifying a set of endpoints that are used in collective operations of the PAMI by associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.

  13. Parallel and distributed processing in two SGBDS: A case study

    Directory of Open Access Journals (Sweden)

    Francisco Javier Moreno

    2017-04-01

    Context: One of the strategies for managing large volumes of data is distributed and parallel computing. Among the tools that allow applying these characteristics are some Data Base Management Systems (DBMS), such as Oracle, DB2, and SQL Server. Method: In this paper we present a case study where we evaluate the performance of an SQL query in two of these DBMS. The evaluation is done through various forms of data distribution in a computer network with different degrees of parallelism. Results: The tests of the SQL query evidenced the performance differences between the two DBMS analyzed. However, more thorough testing and a wider variety of queries are needed. Conclusions: The differences in performance between the two DBMSs analyzed show that when evaluating this aspect, it is necessary to consider the particularities of each DBMS and the degree of parallelism of the queries.

  14. A Parallel Algebraic Multigrid Solver on Graphics Processing Units

    KAUST Repository

    Haase, Gundolf

    2010-01-01

    The paper presents a multi-GPU implementation of the preconditioned conjugate gradient algorithm with an algebraic multigrid preconditioner (PCG-AMG) for an elliptic model problem on a 3D unstructured grid. An efficient parallel sparse matrix-vector multiplication scheme underlying the PCG-AMG algorithm is presented for the many-core GPU architecture. A performance comparison of the parallel solver shows that a single Nvidia Tesla C1060 GPU board delivers the performance of a sixteen-node Infiniband cluster and a multi-GPU configuration with eight GPUs is about 100 times faster than a typical server CPU core. © 2010 Springer-Verlag.

  15. Study and simulation of a parallel numerical processing machine

    International Nuclear Information System (INIS)

    Bel Hadj, Slaheddine

    1981-12-01

    This study was carried out with a view to implementing the NEPTUNIX package (software for solving very large algebraic-differential equation systems) on a minicomputer. Previous research aimed at increasing system performance showed the need to reduce the execution time of certain frequently used numerical computation tasks, and demonstrated the feasibility of handling these tasks with efficient parallel algorithms. The present work deals with the study and simulation of a parallel-architecture processor adapted to the fast execution of these algorithms. A minicomputer connected to such a parallel processor has greatly extended computing power. The architecture of a parallel numerical processor, based on VLSI microprocessors and co-processors, is then described. Its design aims at the best cost/performance ratio. The last part deals with the simulation of the processor with the 'CHAMBOR' program. Results show a speed increase by a factor of 30 compared with execution on a MITRA 15 minicomputer. The importance of conflicts, mainly at the level of access to a shared resource, is also evaluated. Although this implementation was designed with a dedicated application in mind, other uses could be envisaged, particularly for the simulation of nuclear reactors: operator guidance systems, behavioural studies under accident conditions, etc. (author) [fr]

  16. Leveraging Non-Uniform Resources for Parallel Query Processing

    DEFF Research Database (Denmark)

    Mayr, Tobias; Bonnet, Philippe; Gehrke, Johannes

    2003-01-01

    Modular clusters are now composed of non-uniform nodes with different CPUs, disks or network cards so that customers can adapt the cluster configuration to the changing technologies and to their changing needs. This challenges dataflow parallelism as the primary load balancing technique of exist...

  17. A Parallel Algebraic Multigrid Solver on Graphics Processing Units

    KAUST Repository

    Haase, Gundolf; Liebmann, Manfred; Douglas, Craig C.; Plank, Gernot

    2010-01-01

    -vector multiplication scheme underlying the PCG-AMG algorithm is presented for the many-core GPU architecture. A performance comparison of the parallel solver shows that a single Nvidia Tesla C1060 GPU board delivers the performance of a sixteen-node Infiniband cluster

  18. A Parallel Processing Algorithm for Remote Sensing Classification

    Science.gov (United States)

    Gualtieri, J. Anthony

    2005-01-01

    A current thread in parallel computation is the use of cluster computers created by networking a few to thousands of commodity general-purpose workstation-level computers using the Linux operating system. For example, on the Medusa cluster at NASA/GSFC, this provides supercomputing performance, 130 Gflops (Linpack Benchmark), at moderate cost, $370K. However, to be useful for scientific computing in the area of Earth science, issues of ease of programming, access to existing scientific libraries, and portability of existing code need to be considered. In this paper, I address these issues in the context of tools for rendering earth science remote sensing data into useful products. In particular, I focus on a problem that can be decomposed into a set of independent tasks, which on a serial computer would be performed sequentially, but with a cluster computer can be performed in parallel, giving an obvious speedup. To make the ideas concrete, I consider the problem of classifying hyperspectral imagery where some ground truth is available to train the classifier. In particular I will use the Support Vector Machine (SVM) approach as applied to hyperspectral imagery. The approach will be to introduce notions about parallel computation and then to restrict the development to the SVM problem. Pseudocode (an outline of the computation) will be described and then details specific to the implementation will be given. Then timing results will be reported to show what speedups are possible using parallel computation. The paper will close with a discussion of the results.
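
    The decomposition into independent tasks described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `classify_block` is a hypothetical stand-in for one independent task (here a toy threshold rule rather than an SVM), and a standard-library thread pool stands in for the cluster. In CPython, threads do not speed up CPU-bound pure-Python work, so a real deployment would more likely use a process pool or MPI.

```python
from concurrent.futures import ThreadPoolExecutor


def classify_block(block):
    """Hypothetical independent task: label each pixel of one image block.

    Toy rule (an assumption for illustration): a pixel is class 1 if the
    mean of its band values exceeds 128, otherwise class 0.
    """
    return [1 if sum(pixel) / len(pixel) > 128 else 0 for pixel in block]


def run_serial(blocks):
    # Baseline: the tasks are performed one after another.
    return [classify_block(block) for block in blocks]


def run_parallel(blocks, workers=4):
    # The tasks share no state, so they can be handed to a pool of workers.
    # Executor.map returns results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(classify_block, blocks))


# Eight toy blocks of four 3-band pixels each.
blocks = [[(i * 30, j * 60, 90) for j in range(4)] for i in range(8)]
serial_labels = run_serial(blocks)
parallel_labels = run_parallel(blocks)
```

    Because the tasks are independent, the parallel result is identical to the serial one; only the elapsed time differs.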

  19. Vector-Parallel processing of the successive overrelaxation method

    International Nuclear Information System (INIS)

    Yokokawa, Mitsuo

    1988-02-01

    The successive overrelaxation (SOR) method is an iterative method for solving linear systems of equations, and in many nuclear codes it has been computed serially with a natural ordering. After the appearance of vector processors, the natural SOR method was replaced by parallel algorithms such as the hyperplane or red-black methods, in which the calculation order is modified. These methods are suitable for vector processors and run faster on them than the natural SOR method. In this report, a new scheme named the 4-colors SOR method is proposed. We find that the 4-colors SOR method can be executed on vector-parallel processors and that it gives the fastest calculation among all SOR methods, according to results of vector-parallel execution on the Alliant FX/8 multiprocessor system. It is also shown that the theoretical optimal acceleration parameters are equal among the five different SOR orderings, and the differences between the convergence rates of these SOR methods are examined. (author)
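
    To illustrate the reordering idea mentioned in the abstract above, here is a minimal sketch of the red-black SOR ordering (not the 4-colors scheme the report proposes). It assumes a five-point discretization of -(u_xx + u_yy) = f on a uniform square grid with fixed boundary values; the function name, grid size, and relaxation parameter are illustrative choices, not taken from the report. Within one colour, every update reads only points of the other colour, which is what makes each half-sweep suitable for vector or parallel execution.

```python
def red_black_sor(u, f, h, omega, sweeps):
    """Red-black SOR for the five-point Laplacian with fixed boundary values.

    u: initial grid (list of lists) including boundary values
    f: right-hand side sampled on the same grid
    h: mesh spacing, omega: relaxation parameter
    """
    n, m = len(u), len(u[0])
    u = [row[:] for row in u]  # work on a copy
    for _ in range(sweeps):
        for colour in (0, 1):  # 0 = "red" points, 1 = "black" points
            for i in range(1, n - 1):
                for j in range(1, m - 1):
                    if (i + j) % 2 != colour:
                        continue
                    # Gauss-Seidel value from the four neighbours, which all
                    # have the opposite colour and are therefore independent
                    # of the other points updated in this half-sweep.
                    gs = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                 + u[i][j - 1] + u[i][j + 1]
                                 + h * h * f[i][j])
                    u[i][j] += omega * (gs - u[i][j])
    return u


# Toy problem: Laplace's equation with boundary data u = x + y, whose
# discrete solution is exactly x + y on the whole grid.
n = 9
h = 1.0 / (n - 1)
exact = [[i * h + j * h for j in range(n)] for i in range(n)]
u0 = [[exact[i][j] if i in (0, n - 1) or j in (0, n - 1) else 0.0
       for j in range(n)] for i in range(n)]
f = [[0.0] * n for _ in range(n)]
solution = red_black_sor(u0, f, h, omega=1.5, sweeps=100)
max_err = max(abs(solution[i][j] - exact[i][j])
              for i in range(n) for j in range(n))
```

    The inner loops are written serially for clarity; on a vector or parallel machine, all points of one colour could be updated simultaneously.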

  20. Parallel processing data network of master and slave transputers controlled by a serial control network

    Science.gov (United States)

    Crosetto, Dario B.

    1996-01-01

    The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor (100) to a plurality of slave processors (200) to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer (104), a digital signal processor (114), a parallel transfer controller (106), and two three-port memory devices. A communication switch (108) within each node (100) connects it to a fast parallel hardware channel (70) through which all high density data arrives or leaves the node.

  1. Parallel finite elements with domain decomposition and its pre-processing

    International Nuclear Information System (INIS)

    Yoshida, A.; Yagawa, G.; Hamada, S.

    1993-01-01

    This paper describes a parallel finite element analysis using a domain decomposition method, and the pre-processing for the parallel calculation. Computer simulations are about to replace experiments in various fields, and the scale of the models to be simulated tends to be extremely large. On the other hand, the computational environment has changed drastically in recent years. In particular, parallel processing on massively parallel computers or computer networks is considered a promising technique. To achieve high efficiency in such a parallel computation environment, large task granularity and a well-balanced workload distribution are key issues. It is also important to reduce the cost of pre-processing in such parallel FEM. From this point of view, the authors developed a domain decomposition FEM with an automatic and dynamic task-allocation mechanism, and an automatic mesh generation/domain subdivision system for it. (author)

  2. Examination of Speed Contribution of Parallelization for Several Fingerprint Pre-Processing Algorithms

    Directory of Open Access Journals (Sweden)

    GORGUNOGLU, S.

    2014-05-01

    In the analysis of minutiae-based fingerprint systems, fingerprints need to be pre-processed. The pre-processing is carried out to enhance the quality of the fingerprint and to obtain more accurate minutiae points. Reducing the pre-processing time is important for identification and verification in real-time systems, especially for databases holding large amounts of fingerprint information. Parallel processing and parallel CPU computing can be considered as the distribution of processes over a multi-core processor, achieved with parallel programming techniques. Reducing the execution time is the main objective of parallel processing. In this study, the pre-processing of a minutiae-based fingerprint system is implemented by parallel processing on multi-core computers using OpenMP and on a graphics processor using CUDA to improve execution time. The execution times and speedup ratios are compared with those of a single-core processor. The results show that parallel processing substantially improves execution time. The improvement ratios obtained for different pre-processing algorithms allowed us to suggest the more suitable approaches for parallelization.

  3. Parallel processing of dose calculation for external photon beam therapy

    International Nuclear Information System (INIS)

    Kunieda, Etsuo; Ando, Yutaka; Tsukamoto, Nobuhiro; Ito, Hisao; Kubo, Atsushi

    1994-01-01

    We implemented external photon beam dose calculation programs on a parallel processor system consisting of Transputers, 32-bit processors especially suitable for multi-processor configurations. Two network configurations, binary-tree and pipeline, were evaluated for rectangular and irregular field dose calculation algorithms. Although computation speed increased in proportion to the number of CPUs, substantial overhead caused by inter-processor communication occurred when a smaller computation load was delivered to each processor. On the other hand, for irregular field calculation, which requires more computation capability for each calculation point, the communication overhead remained small even when more than 50 processors were involved. Real-time responses could be expected for more complex algorithms by increasing the number of processors. (author)

  4. A tomograph VMEbus parallel processing data acquisition system

    International Nuclear Information System (INIS)

    Wilkinson, N.A.; Rogers, J.G.; Atkins, M.S.

    1989-01-01

    This paper describes a VME based data acquisition system suitable for the development of Positron Volume Imaging tomographs which use 3-D data for improved image resolution over slice-oriented tomographs. The data acquisition must be flexible enough to accommodate several 3-D reconstruction algorithms; hence, a software-based system is most suitable. Furthermore, because of the increased dimensions and resolution of volume imaging tomographs, the raw data event rate is greater than that of slice-oriented machines. These dual requirements are met by our data acquisition system. Flexibility is achieved through an array of processors connected over a VMEbus, operating asynchronously and in parallel. High raw data throughput is achieved using a dedicated high speed data transfer device available for the VMEbus. The device can attain a raw data rate of 2.5 million coincidence events per second for raw events which are 64 bits wide.

  5. Parallel workflow tools to facilitate human brain MRI post-processing

    Directory of Open Access Journals (Sweden)

    Zaixu Cui

    2015-05-01

    Multi-modal magnetic resonance imaging (MRI) techniques are widely applied in human brain studies. To obtain specific brain measures of interest from MRI datasets, a number of complex image post-processing steps are typically required. Parallel workflow tools have recently been developed, concatenating individual processing steps and enabling fully automated processing of raw MRI data to obtain the final results. These workflow tools are also designed to make optimal use of available computational resources and to support the parallel processing of different subjects or of independent processing steps for a single subject. Automated, parallel MRI post-processing tools can greatly facilitate relevant brain investigations and are being increasingly applied. In this review, we briefly summarize these parallel workflow tools and discuss relevant issues.

  6. Parallel Block Structured Adaptive Mesh Refinement on Graphics Processing Units

    Energy Technology Data Exchange (ETDEWEB)

    Beckingsale, D. A. [Atomic Weapons Establishment (AWE), Aldermaston (United Kingdom); Gaudin, W. P. [Atomic Weapons Establishment (AWE), Aldermaston (United Kingdom); Hornung, R. D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Gunney, B. T. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Gamblin, T. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Herdman, J. A. [Atomic Weapons Establishment (AWE), Aldermaston (United Kingdom); Jarvis, S. A. [Atomic Weapons Establishment (AWE), Aldermaston (United Kingdom)

    2014-11-17

    Block-structured adaptive mesh refinement is a technique that can be used when solving partial differential equations to reduce the number of zones necessary to achieve the required accuracy in areas of interest. These areas (shock fronts, material interfaces, etc.) are recursively covered with finer mesh patches that are grouped into a hierarchy of refinement levels. Despite the potential for large savings in computational requirements and memory usage without a corresponding reduction in accuracy, AMR adds overhead in managing the mesh hierarchy, adding complex communication and data movement requirements to a simulation. In this paper, we describe the design and implementation of a native GPU-based AMR library, including: the classes used to manage data on a mesh patch, the routines used for transferring data between GPUs on different nodes, and the data-parallel operators developed to coarsen and refine mesh data. We validate the performance and accuracy of our implementation using three test problems and two architectures: an eight-node cluster, and over four thousand nodes of Oak Ridge National Laboratory’s Titan supercomputer. Our GPU-based AMR hydrodynamics code performs up to 4.87× faster than the CPU-based implementation, and has been scaled to over four thousand GPUs using a combination of MPI and CUDA.

  7. ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers.

    Science.gov (United States)

    Xing, Yuting; Wu, Chengkun; Yang, Xi; Wang, Wei; Zhu, En; Yin, Jianping

    2018-04-27

    A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods on unstructured texts. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, utilizing three types of named entity recognition tasks as demonstration. Results show that, in most cases, the processing efficiency can be greatly improved with parallel processing, and the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to other tasks of biomedical text mining besides NER.

  8. Parallel direct solver for finite element modeling of manufacturing processes

    DEFF Research Database (Denmark)

    Nielsen, Chris Valentin; Martins, P.A.F.

    2017-01-01

    The central processing unit (CPU) time is of paramount importance in finite element modeling of manufacturing processes. Because the most significant part of the CPU time is consumed in solving the main system of equations resulting from finite element assemblies, different approaches have been...

  9. Parallel Hyperspectral Image Processing on Distributed Multi-Cluster Systems

    NARCIS (Netherlands)

    Liu, F.; Seinstra, F.J.; Plaza, A.J.

    2011-01-01

    Computationally efficient processing of hyperspectral image cubes can be greatly beneficial in many application domains, including environmental modeling, risk/hazard prevention and response, and defense/security. As individual cluster computers often cannot satisfy the computational demands of

  10. Next Generation Parallelization Systems for Processing and Control of PDS Image Node Assets

    Science.gov (United States)

    Verma, R.

    2017-06-01

    We present next-generation parallelization tools to help Planetary Data System (PDS) Imaging Node (IMG) better monitor, process, and control changes to nearly 650 million file assets and over a dozen machines on which they are referenced or stored.

  11. Preliminary Study on the Enhancement of Reconstruction Speed for Emission Computed Tomography Using Parallel Processing

    International Nuclear Information System (INIS)

    Park, Min Jae; Lee, Jae Sung; Kim, Soo Mee; Kang, Ji Yeon; Lee, Dong Soo; Park, Kwang Suk

    2009-01-01

    Conventional image reconstruction uses simplified physical models of projection. However, reconstruction with realistic physics, for example 3D reconstruction, takes too long to process all the data in the clinic and cannot run on a common reconstruction machine because complex physical models require a large amount of memory. We propose a realistic distributed-memory model for fast reconstruction using parallel processing on personal computers to enable large-scale techniques. Preliminary feasibility tests on virtual machines and various performance tests on a commercial supercomputer, Tachyon, were performed. The expectation maximization algorithm was tested with common 2D projection and with realistic 3D lines of response. Because processing became slower (by up to 6 times) after a certain iteration, compiler optimization was performed to maximize the efficiency of parallelization. Parallel processing of a program on multiple computers was available on Linux with MPICH and NFS. We verified that the differences between the parallel-processed image and the single-processed image at the same iteration were below the significant digits of a floating-point number (about 6 bit). Two processors showed good efficiency (1.96 times) of parallel computing. The delay phenomenon was solved by vectorization using SSE. Through this study, a realistic parallel computing system for the clinic was established that has enough memory to reconstruct with realistic physical models that could not be simplified.

  12. Dynamic CT perfusion image data compression for efficient parallel processing.

    Science.gov (United States)

    Barros, Renan Sales; Olabarriaga, Silvia Delgado; Borst, Jordi; van Walderveen, Marianne A A; Posthuma, Jorrit S; Streekstra, Geert J; van Herk, Marcel; Majoie, Charles B L M; Marquering, Henk A

    2016-03-01

    The increasing size of medical imaging data, in particular time series such as CT perfusion (CTP), requires new and fast approaches to deliver timely results for acute care. Cloud architectures based on graphics processing units (GPUs) can provide the processing capacity required for delivering fast results. However, the size of CTP datasets makes transfers to cloud infrastructures time-consuming and therefore not suitable in acute situations. To reduce this transfer time, this work proposes a fast and lossless compression algorithm for CTP data. The algorithm exploits redundancies in the temporal dimension and keeps random read-only access to the image elements directly from the compressed data on the GPU. To the best of our knowledge, this is the first work to present a GPU-ready method for medical image compression with random access to the image elements from the compressed data.
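
    As a rough illustration of exploiting temporal redundancy for lossless compression, the sketch below stores the first time frame as-is and every later frame as its voxel-wise difference from the previous frame, then passes the residuals to a general-purpose compressor. This is only an assumed, simplified scheme: it is not the algorithm of the paper, it uses `zlib` rather than a GPU-ready coder, and unlike the method described above it does not support random access to elements of the compressed data. The function names and the byte layout are illustrative.

```python
import struct
import zlib


def compress_series(frames):
    """Losslessly compress a time series of frames.

    frames: list of equal-length lists of integers in [0, 65535],
    one list of voxel values per time point.
    """
    n_frames, n_voxels = len(frames), len(frames[0])
    # First frame verbatim, later frames as differences to the previous one.
    residuals = list(frames[0])
    for prev, cur in zip(frames, frames[1:]):
        residuals.extend(c - p for p, c in zip(prev, cur))
    payload = struct.pack(f"<{len(residuals)}i", *residuals)
    header = struct.pack("<II", n_frames, n_voxels)
    return header + zlib.compress(payload)


def decompress_series(blob):
    """Invert compress_series by accumulating the temporal differences."""
    n_frames, n_voxels = struct.unpack("<II", blob[:8])
    payload = zlib.decompress(blob[8:])
    residuals = struct.unpack(f"<{n_frames * n_voxels}i", payload)
    frames = [list(residuals[:n_voxels])]
    for t in range(1, n_frames):
        delta = residuals[t * n_voxels:(t + 1) * n_voxels]
        frames.append([p + d for p, d in zip(frames[-1], delta)])
    return frames


# Toy series: 10 time points of 64 voxels that change slowly over time.
series = [[100 + (v * 7) % 50 + t for v in range(64)] for t in range(10)]
blob = compress_series(series)
restored = decompress_series(blob)
```

    Slowly varying voxel values give small, repetitive residuals, which is what makes the temporal differences compress well.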

  13. Category Specific Spatial Dissociations of Parallel Processes Underlying Visual Naming

    OpenAIRE

    Conner, Christopher R.; Chen, Gang; Pieters, Thomas A.; Tandon, Nitin

    2013-01-01

    The constituent elements and dynamics of the networks responsible for word production are a central issue to understanding human language. Of particular interest is their dependency on lexical category, particularly the possible segregation of nouns and verbs into separate processing streams. We applied a novel mixed-effects, multilevel analysis to electrocorticographic data collected from 19 patients (1942 electrodes) to examine the activity of broadly disseminated cortical networks during t...

  14. Connectionism, parallel constraint satisfaction processes, and gestalt principles: (re) introducing cognitive dynamics to social psychology.

    Science.gov (United States)

    Read, S J; Vanman, E J; Miller, L C

    1997-01-01

    We argue that recent work in connectionist modeling, in particular the parallel constraint satisfaction processes that are central to many of these models, has great importance for understanding issues of both historical and current concern for social psychologists. We first provide a brief description of connectionist modeling, with particular emphasis on parallel constraint satisfaction processes. Second, we examine the tremendous similarities between parallel constraint satisfaction processes and the Gestalt principles that were the foundation for much of modern social psychology. We propose that parallel constraint satisfaction processes provide a computational implementation of the principles of Gestalt psychology that were central to the work of such seminal social psychologists as Asch, Festinger, Heider, and Lewin. Third, we then describe how parallel constraint satisfaction processes have been applied to three areas that were key to the beginnings of modern social psychology and remain central today: impression formation and causal reasoning, cognitive consistency (balance and cognitive dissonance), and goal-directed behavior. We conclude by discussing implications of parallel constraint satisfaction principles for a number of broader issues in social psychology, such as the dynamics of social thought and the integration of social information within the narrow time frame of social interaction.

  15. Ordering schemes for parallel processing of certain mesh problems

    International Nuclear Information System (INIS)

    O'Leary, D.

    1984-01-01

    In this work, some ordering schemes for mesh points are presented which enable algorithms such as the Gauss-Seidel or SOR iteration to be performed efficiently for the nine-point operator finite difference method on computers consisting of a two-dimensional grid of processors. Convergence results are presented for the discretization of $u_{xx} + u_{yy}$ on a uniform mesh over a square, showing that the spectral radius of the iteration for these orderings is no worse than that for the standard row by row ordering of mesh points. Further applications of these mesh point orderings to network problems, more general finite difference operators, and picture processing problems are noted.

  16. Process-Oriented Parallel Programming with an Application to Data-Intensive Computing

    OpenAIRE

    Givelberg, Edward

    2014-01-01

    We introduce process-oriented programming as a natural extension of object-oriented programming for parallel computing. It is based on the observation that every class of an object-oriented language can be instantiated as a process, accessible via a remote pointer. The introduction of process pointers requires no syntax extension, identifies processes with programming objects, and enables processes to exchange information simply by executing remote methods. Process-oriented programming is a h...

  17. Initial Assessment of Parallelization of Monte Carlo Calculation using Graphics Processing Units

    International Nuclear Information System (INIS)

    Choi, Sung Hoon; Joo, Han Gyu

    2009-01-01

    Monte Carlo (MC) simulation is an effective tool for calculating neutron transport in complex geometry. However, because Monte Carlo simulates the behavior of each neutron one by one, it takes a very long computing time if enough neutrons are used for a high-precision calculation. Accordingly, methods that reduce the computing time are required. Parallel calculation is well suited to a Monte Carlo code, since the code simulates the behavior of each neutron independently. The parallelization of Monte Carlo codes, however, has been done using multiple CPUs. Driven by the global demand for high-quality 3D graphics, the Graphics Processing Unit (GPU) has developed into a highly parallel, multi-core processor. This parallel processing capability of GPUs becomes available to engineering computing once a suitable interface is provided. Recently, NVIDIA introduced CUDA™, a general-purpose parallel computing architecture. CUDA is a software environment that allows developers to manage the GPU using C/C++ or other languages. In this work, a GPU-based Monte Carlo code is developed and an initial assessment of its parallel performance is carried out.

  18. The role of parallelism in the real-time processing of anaphora.

    Science.gov (United States)

    Poirier, Josée; Walenski, Matthew; Shapiro, Lewis P

    2012-06-01

    Parallelism effects refer to the facilitated processing of a target structure when it follows a similar, parallel structure. In coordination, a parallelism-related conjunction triggers the expectation that a second conjunct with the same structure as the first conjunct should occur. It has been proposed that parallelism effects reflect the use of the first structure as a template that guides the processing of the second. In this study, we examined the role of parallelism in real-time anaphora resolution by charting activation patterns in coordinated constructions containing anaphora, Verb-Phrase Ellipsis (VPE) and Noun-Phrase Traces (NP-traces). Specifically, we hypothesised that an expectation of parallelism would incite the parser to assume a structure similar to the first conjunct in the second, anaphora-containing conjunct. The speculation of a similar structure would result in early postulation of covert anaphora. Experiment 1 confirms that following a parallelism-related conjunction, first-conjunct material is activated in the second conjunct. Experiment 2 reveals that an NP-trace in the second conjunct is posited immediately where licensed, which is earlier than previously reported in the literature. In light of our findings, we propose an intricate relation between structural expectations and anaphor resolution.

  19. Adapting high-level language programs for parallel processing using data flow

    Science.gov (United States)

    Standley, Hilda M.

    1988-01-01

    EASY-FLOW, a very high-level data flow language, is introduced for the purpose of adapting programs written in a conventional high-level language to a parallel environment. The level of parallelism provided is of the large-grained variety in which parallel activities take place between subprograms or processes. A program written in EASY-FLOW is a set of subprogram calls as units, structured by iteration, branching, and distribution constructs. A data flow graph may be deduced from an EASY-FLOW program.

  20. Parallel processing implementation for the coupled transport of photons and electrons using OpenMP

    Science.gov (United States)

    Doerner, Edgardo

    2016-05-01

    In this work the use of OpenMP to implement the parallel processing of the Monte Carlo (MC) simulation of the coupled transport of photons and electrons is presented. This implementation was carried out using a modified EGSnrc platform which enables the use of the Microsoft Visual Studio 2013 (VS2013) environment, together with the development tools available in the Intel Parallel Studio XE 2015 (XE2015). The performance study of this new implementation was carried out on a desktop PC with a multi-core CPU, taking as a reference the performance of the original platform. The results were satisfactory in terms of both scalability and parallelization efficiency.

  1. When fast logic meets slow belief: Evidence for a parallel-processing model of belief bias

    OpenAIRE

    Trippas, Dries; Thompson, Valerie A.; Handley, Simon J.

    2016-01-01

    Two experiments pitted the default-interventionist account of belief bias against a parallel-processing model. According to the former, belief bias occurs because a fast, belief-based evaluation of the conclusion pre-empts a working-memory demanding logical analysis. In contrast, according to the latter both belief-based and logic-based responding occur in parallel. Participants were given deductive reasoning problems of variable complexity and instructed to decide whether the conclusion was ...

  2. The parallel processing of EGS4 code on distributed memory scalar parallel computer: Intel Paragon XP/S15-256

    Energy Technology Data Exchange (ETDEWEB)

    Takemiya, Hiroshi; Ohta, Hirofumi; Honma, Ichirou

    1996-03-01

    The parallelization of the Electro-Magnetic Cascade Monte Carlo Simulation Code, EGS4, on a distributed-memory scalar parallel computer, the Intel Paragon XP/S15-256, is described. In EGS4 the calculation time varies considerably from one incident particle to another because of the dynamic generation of secondary particles and the different behavior of each particle. The granularity of parallel processing, the parallel programming model, and the algorithm for parallel random number generation are discussed, and two methods, one allocating particles dynamically and the other statically, are used to achieve high-speed parallel processing of this code. For three of the four problems chosen for performance evaluation, speedup factors of nearly 100 were attained with 128 processors. It was found that when both the calculation time per incident particle and its dispersion are large, the dynamic particle allocation method, which can even out the load across processors, is preferable. When they are small, the static particle allocation method, which reduces the communication overhead, is preferable. Moreover, it is pointed out that double-precision variables must be used in the EGS4 code to obtain accurate results. Finally, the workflow of program parallelization is analyzed, and tools for program parallelization are discussed on the basis of the experience gained in parallelizing EGS4. (author).
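
    The trade-off between the two allocation methods can be sketched with a small scheduling model in Python. This is an illustration under stated assumptions, not the EGS4 implementation: the function names, the list of per-particle costs, and the per-fetch `overhead` parameter are hypothetical.

```python
import heapq


def static_makespan(costs, workers):
    """Static allocation: split the particle list into contiguous blocks up front."""
    block = -(-len(costs) // workers)  # ceiling division
    loads = [sum(costs[i:i + block]) for i in range(0, len(costs), block)]
    return max(loads)


def dynamic_makespan(costs, workers, overhead=0.0):
    """Dynamic allocation: whichever worker is idle first fetches the next particle.

    `overhead` is a hypothetical communication cost paid on every fetch.
    """
    finish = [0.0] * workers  # time at which each worker becomes idle
    heapq.heapify(finish)
    for cost in costs:
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + cost + overhead)
    return max(finish)
```

    With widely dispersed costs such as `[8, 8, 8, 8, 1, 1, 1, 1]` on two workers, the static split finishes at time 32 while the dynamic scheme finishes at 18. With uniform costs `[1] * 8` and a fetch overhead of 0.5, the static split (4) beats the dynamic scheme (6.0), which matches the qualitative finding reported in the abstract.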

  3. Reliable and Efficient Parallel Processing Algorithms and Architectures for Modern Signal Processing. Ph.D. Thesis

    Science.gov (United States)

    Liu, Kuojuey Ray

    1990-01-01

    Least-squares (LS) estimations and spectral decomposition algorithms constitute the heart of modern signal processing and communication problems. Implementations of recursive LS and spectral decomposition algorithms onto parallel processing architectures such as systolic arrays with efficient fault-tolerant schemes are the major concerns of this dissertation. There are four major results in this dissertation. First, we propose the systolic block Householder transformation with application to the recursive least-squares minimization. It is successfully implemented on a systolic array with a two-level pipelined implementation at the vector level as well as at the word level. Second, a real-time algorithm-based concurrent error detection scheme based on the residual method is proposed for the QRD RLS systolic array. The fault diagnosis, order degraded reconfiguration, and performance analysis are also considered. Third, the dynamic range, stability, error detection capability under finite-precision implementation, order degraded performance, and residual estimation under faulty situations for the QRD RLS systolic array are studied in detail. Finally, we propose the use of multi-phase systolic algorithms for spectral decomposition based on the QR algorithm. Two systolic architectures, one based on a triangular array and another based on a rectangular array, are presented for the multi-phase operations with fault-tolerant considerations. Eigenvectors and singular vectors can be easily obtained by using the multi-phase operations. Performance issues are also considered.

  4. Visual analysis of inter-process communication for large-scale parallel computing.

    Science.gov (United States)

    Muelder, Chris; Gygi, Francois; Ma, Kwan-Liu

    2009-01-01

    In serial computation, program profiling is often helpful for optimization of key sections of code. When moving to parallel computation, not only does the code execution need to be considered but also communication between the different processes which can induce delays that are detrimental to performance. As the number of processes increases, so does the impact of the communication delays on performance. For large-scale parallel applications, it is critical to understand how the communication impacts performance in order to make the code more efficient. There are several tools available for visualizing program execution and communications on parallel systems. These tools generally provide either views which statistically summarize the entire program execution or process-centric views. However, process-centric visualizations do not scale well as the number of processes gets very large. In particular, the most common representation of parallel processes is a Gantt chart with a row for each process. As the number of processes increases, these charts can become difficult to work with and can even exceed screen resolution. We propose a new visualization approach that affords more scalability and then demonstrate it on systems running with up to 16,384 processes.

  5. Implementation of parallel processing in the basf2 framework for Belle II

    International Nuclear Information System (INIS)

    Itoh, Ryosuke; Lee, Soohyung; Katayama, N; Mineo, S; Moll, A; Kuhr, T; Heck, M

    2012-01-01

    Recent PC servers are equipped with multi-core CPUs, and it is desirable to utilize their full processing power for data analysis in large-scale HEP experiments. A software framework, basf2, is being developed for use in the Belle II experiment, a new-generation B-factory experiment at KEK, and parallel event processing that utilizes the multi-core CPUs is part of its design for use in massive data production. The details of the implementation of event-parallel processing in the basf2 framework are discussed, together with a report of a preliminary performance study under realistic use on a 32-core PC server.

  6. Fear Control and Danger Control: A Test of the Extended Parallel Process Model (EPPM).

    Science.gov (United States)

    Witte, Kim

    1994-01-01

    Explores cognitive and emotional mechanisms underlying success and failure of fear appeals in context of AIDS prevention. Offers general support for Extended Parallel Process Model. Suggests that cognitions lead to fear appeal success (attitude, intention, or behavior changes) via danger control processes, whereas the emotion fear leads to fear…

  7. A Hybrid FPGA/Coarse Parallel Processing Architecture for Multi-modal Visual Feature Descriptors

    DEFF Research Database (Denmark)

    Jensen, Lars Baunegaard With; Kjær-Nielsen, Anders; Alonso, Javier Díaz

    2008-01-01

    This paper describes the hybrid architecture developed for speeding up the processing of so-called multi-modal visual primitives which are sparse image descriptors extracted along contours. In the system, the first stages of visual processing are implemented on FPGAs due to their highly parallel...

  8. Strong Bisimilarity and Regularity of Basic Parallel Processes is PSPACE-Hard

    DEFF Research Database (Denmark)

    Srba, Jirí

    2002-01-01

    We show that the problem of checking whether two processes definable in the syntax of Basic Parallel Processes (BPP) are strongly bisimilar is PSPACE-hard. We also demonstrate that there is a polynomial time reduction from the strong bisimilarity checking problem of regular BPP to the strong...

  9. Parallels between a Collaborative Research Process and the Middle Level Philosophy

    Science.gov (United States)

    Dever, Robin; Ross, Diane; Miller, Jennifer; White, Paula; Jones, Karen

    2014-01-01

    The characteristics of the middle level philosophy as described in This We Believe closely parallel the collaborative research process. The journey of one research team is described in relationship to these characteristics. The collaborative process includes strengths such as professional relationships, professional development, courageous…

  10. Solution-processed parallel tandem polymer solar cells using silver nanowires as intermediate electrode.

    Science.gov (United States)

    Guo, Fei; Kubis, Peter; Li, Ning; Przybilla, Thomas; Matt, Gebhard; Stubhan, Tobias; Ameri, Tayebeh; Butz, Benjamin; Spiecker, Erdmann; Forberich, Karen; Brabec, Christoph J

    2014-12-23

    Tandem architecture is the most relevant concept to overcome the efficiency limit of single-junction photovoltaic solar cells. Series-connected tandem polymer solar cells (PSCs) have advanced rapidly during the past decade. In contrast, the development of parallel-connected tandem cells is lagging far behind due to the big challenge in establishing an efficient interlayer with high transparency and high in-plane conductivity. Here, we report all-solution fabrication of parallel tandem PSCs using silver nanowires as intermediate charge collecting electrode. Through a rational interface design, a robust interlayer is established, enabling the efficient extraction and transport of electrons from subcells. The resulting parallel tandem cells exhibit high fill factors of ∼60% and enhanced current densities which are identical to the sum of the current densities of the subcells. These results suggest that solution-processed parallel tandem configuration provides an alternative avenue toward high performance photovoltaic devices.

  11. Toward a model framework of generalized parallel componential processing of multi-symbol numbers.

    Science.gov (United States)

    Huber, Stefan; Cornelsen, Sonja; Moeller, Korbinian; Nuerk, Hans-Christoph

    2015-05-01

    In this article, we propose and evaluate a new model framework of parallel componential multi-symbol number processing, generalizing the idea of parallel componential processing of multi-digit numbers to the case of negative numbers by considering the polarity signs similar to single digits. In a first step, we evaluated this account by defining and investigating a sign-decade compatibility effect for the comparison of positive and negative numbers, which extends the unit-decade compatibility effect in 2-digit number processing. Then, we evaluated whether the model is capable of accounting for previous findings in negative number processing. In a magnitude comparison task, in which participants had to single out the larger of 2 integers, we observed a reliable sign-decade compatibility effect with prolonged reaction times for incompatible (e.g., -97 vs. +53; in which the number with the larger decade digit has the smaller, i.e., negative polarity sign) as compared with sign-decade compatible number pairs (e.g., -53 vs. +97). Moreover, an analysis of participants' eye fixation behavior corroborated our model of parallel componential processing of multi-symbol numbers. These results are discussed in light of concurrent theoretical notions about negative number processing. On the basis of the present results, we propose a generalized integrated model framework of parallel componential multi-symbol processing. (c) 2015 APA, all rights reserved.

  12. Managing internode data communications for an uninitialized process in a parallel computer

    Science.gov (United States)

    Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Parker, Jeffrey J; Ratterman, Joseph D; Smith, Brian E

    2014-05-20

    A parallel computer includes nodes, each having main memory and a messaging unit (MU). Each MU includes computer memory, which in turn includes MU message buffers. Each MU message buffer is associated with an uninitialized process on the compute node. In the parallel computer, managing internode data communications for an uninitialized process includes: receiving, by an MU of a compute node, one or more data communications messages in an MU message buffer associated with an uninitialized process on the compute node; determining, by an application agent, that the MU message buffer associated with the uninitialized process is full prior to initialization of the uninitialized process; establishing, by the application agent, a temporary message buffer for the uninitialized process in main computer memory; and moving, by the application agent, data communications messages from the MU message buffer associated with the uninitialized process to the temporary message buffer in main computer memory.

  13. Parallel processing algorithms for hydrocodes on a computer with MIMD architecture (DENELCOR's HEP)

    International Nuclear Information System (INIS)

    Hicks, D.L.

    1983-11-01

    In real time simulation/prediction of complex systems such as water-cooled nuclear reactors, if reactor operators had fast simulator/predictors to check the consequences of their operations before implementing them, events such as the incident at Three Mile Island might be avoided. However, existing simulator/predictors such as RELAP run slower than real time on serial computers. It appears that the only way to overcome the barrier to higher computing rates is to use computers with architectures that allow concurrent computations or parallel processing. The computer architecture with the greatest degree of parallelism is labeled Multiple Instruction Stream, Multiple Data Stream (MIMD). An example of a machine of this type is the HEP computer by DENELCOR. It appears that hydrocodes are very well suited for parallelization on the HEP. It is a straightforward exercise to parallelize explicit, one-dimensional Lagrangean hydrocodes in a zone-by-zone parallelization. Similarly, implicit schemes can be parallelized in a zone-by-zone fashion via an a priori, symbolic inversion of the tridiagonal matrix that arises in an implicit scheme. These techniques are extended to Eulerian hydrocodes by using Harlow's rezone technique. The extension from single-phase Eulerian to two-phase Eulerian is straightforward. This step-by-step extension leads to hydrocodes with zone-by-zone parallelization that are capable of two-phase flow simulation. Extensions to two and three spatial dimensions can be achieved by operator splitting. It appears that a zone-by-zone parallelization is the best way to utilize the capabilities of an MIMD machine. 40 references

  14. Processing communications events in parallel active messaging interface by awakening thread from wait state

    Science.gov (United States)

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-10-22

    Processing data communications events in a parallel active messaging interface ('PAMI') of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.

  15. Cocaine Use and Delinquent Behavior among High-Risk Youths: A Growth Model of Parallel Processes

    Science.gov (United States)

    Dembo, Richard; Sullivan, Christopher

    2009-01-01

    We report the results of a parallel-process, latent growth model analysis examining the relationships between cocaine use and delinquent behavior among youths. The study examined a sample of 278 justice-involved juveniles completing at least one of three follow-up interviews as part of a National Institute on Drug Abuse-funded study. The results…

  16. Recent development for the ITS code system: Parallel processing and visualization

    International Nuclear Information System (INIS)

    Fan, W.C.; Turner, C.D.; Halbleib, J.A. Sr.; Kensek, R.P.

    1996-01-01

    A brief overview is given for two software developments related to the ITS code system. These developments provide parallel processing and visualization capabilities and thus allow users to perform ITS calculations more efficiently. Timing results and a graphical example are presented to demonstrate these capabilities

  17. Psychodrama: A Creative Approach for Addressing Parallel Process in Group Supervision

    Science.gov (United States)

    Hinkle, Michelle Gimenez

    2008-01-01

    This article provides a model for using psychodrama to address issues of parallel process during group supervision. Information on how to utilize the specific concepts and techniques of psychodrama in relation to group supervision is discussed. A case vignette of the model is provided.

  18. Parallel Distributed Processing at 25: Further Explorations in the Microstructure of Cognition

    Science.gov (United States)

    Rogers, Timothy T.; McClelland, James L.

    2014-01-01

    This paper introduces a special issue of "Cognitive Science" initiated on the 25th anniversary of the publication of "Parallel Distributed Processing" (PDP), a two-volume work that introduced the use of neural network models as vehicles for understanding cognition. The collection surveys the core commitments of the PDP…

  19. An Inconvenient Truth: An Application of the Extended Parallel Process Model

    Science.gov (United States)

    Goodall, Catherine E.; Roberto, Anthony J.

    2008-01-01

    "An Inconvenient Truth" is an Academy Award-winning documentary about global warming presented by Al Gore. This documentary is appropriate for a lesson on fear appeals and the extended parallel process model (EPPM). The EPPM is concerned with the effects of perceived threat and efficacy on behavior change. Perceived threat is composed of an…

  20. Real-time SHVC software decoding with multi-threaded parallel processing

    Science.gov (United States)

    Gudumasu, Srinivas; He, Yuwen; Ye, Yan; He, Yong; Ryu, Eun-Seok; Dong, Jie; Xiu, Xiaoyu

    2014-09-01

    This paper proposes a parallel decoding framework for scalable HEVC (SHVC). Various optimization technologies are implemented on the basis of SHVC reference software SHM-2.0 to achieve real-time decoding speed for the two layer spatial scalability configuration. SHVC decoder complexity is analyzed with profiling information. The decoding process at each layer and the up-sampling process are designed in parallel and scheduled by a high level application task manager. Within each layer, multi-threaded decoding is applied to accelerate the layer decoding speed. Entropy decoding, reconstruction, and in-loop processing are pipeline designed with multiple threads based on groups of coding tree units (CTU). A group of CTUs is treated as a processing unit in each pipeline stage to achieve a better trade-off between parallelism and synchronization. Motion compensation, inverse quantization, and inverse transform modules are further optimized with SSE4 SIMD instructions. Simulations on a desktop with an Intel i7 processor 2600 running at 3.4 GHz show that the parallel SHVC software decoder is able to decode 1080p spatial 2x at up to 60 fps (frames per second) and 1080p spatial 1.5x at up to 50 fps for those bitstreams generated with SHVC common test conditions in the JCT-VC standardization group. The decoding performance at various bitrates with different optimization technologies and different numbers of threads are compared in terms of decoding speed and resource usage, including processor and memory.

  1. Parallel processing and non-uniform grids in global air quality modeling

    NARCIS (Netherlands)

    Berkvens, P.J.F.; Bochev, Mikhail A.

    2002-01-01

    A large-scale global air quality model, running efficiently on a single vector processor, is enhanced to make more realistic and more long-term simulations feasible. Two strategies are combined: non-uniform grids and parallel processing. The communication through the hierarchy of non-uniform grids

  2. One Factor or Two Parallel Processes? Comorbidity and Development of Adolescent Anxiety and Depressive Disorder Symptoms

    Science.gov (United States)

    Hale, William W., III; Raaijmakers, Quinten A. W.; Muris, Peter; van Hoof, Anne; Meeus, Wim H. J.

    2009-01-01

    Background: This study investigates whether anxiety and depressive disorder symptoms of adolescents from the general community are best described by a model that assumes they are indicative of one general factor or by a model that assumes they are two distinct disorders with parallel growth processes. Additional analyses were conducted to explore…

  3. High Performance Parallel Processing Project: Industrial computing initiative. Progress reports for fiscal year 1995

    Energy Technology Data Exchange (ETDEWEB)

    Koniges, A.

    1996-02-09

    This project is a package of 11 individual CRADAs plus hardware. This innovative project established a three-year multi-party collaboration that is significantly accelerating the availability of commercial massively parallel processing computing software technology to U.S. government, academic, and industrial end-users. This report contains individual presentations from nine principal investigators along with overall program information.

  4. Exact stationary state for an asymmetric exclusion process with fully parallel dynamics

    NARCIS (Netherlands)

    Gier, J.C.; Nienhuis, B.

    The exact stationary state of an asymmetric exclusion process with fully parallel dynamics is obtained using the matrix product ansatz. We give a simple derivation for the deterministic case by a physical interpretation of the dimension of the matrices. We prove the stationarity via a cancellation

  5. Sustainability Attitudes and Behavioral Motivations of College Students: Testing the Extended Parallel Process Model

    Science.gov (United States)

    Perrault, Evan K.; Clark, Scott K.

    2018-01-01

    Purpose: A planet that can no longer sustain life is a frightening thought--and one that is often present in mass media messages. Therefore, this study aims to test the components of a classic fear appeal theory, the extended parallel process model (EPPM) and to determine how well its constructs predict sustainability behavioral intentions. This…

  6. Design of parallel intersector weld/cut robot for machining processes in ITER vacuum vessel

    International Nuclear Information System (INIS)

    Wu Huapeng; Handroos, Heikki; Kovanen, Janne; Rouvinen, Asko; Hannukainen, Petri; Saira, Tanja; Jones, Lawrence

    2003-01-01

    This paper presents a new parallel robot, Penta-WH, which has five degrees of freedom driven by hydraulic cylinders. The manipulator has a large, singularity-free workspace and high stiffness, and it acts as a transport device for welding, machining and inspection end-effectors inside the ITER vacuum vessel. The presented kinematic structure of a parallel robot is particularly suitable for the ITER environment. Analyses of the machining process for ITER, such as the machining methods and forces, are given, and kinematic analyses, such as workspace and force capacity, are discussed.

  7. Parallel Algorithm of Geometrical Hashing Based on NumPy Package and Processes Pool

    Directory of Open Access Journals (Sweden)

    Klyachin Vladimir Aleksandrovich

    2015-10-01

    Full Text Available The article considers the problem of multi-dimensional geometric hashing. The paper describes a mathematical model of geometric hashing and considers an example of its use in point localization problems. A method of constructing the corresponding hash matrix with a parallel algorithm is considered, and an algorithm for parallel geometric hashing based on the "process pool" design pattern is proposed. The algorithm is implemented in the Python programming language with the NumPy package for manipulating multidimensional data. To implement the process pool, the authors propose using the class ProcessPoolExecutor from the module concurrent.futures, which has been included in the Python distribution since version 3.2. All the solutions are presented in the paper by corresponding UML class diagrams. The designed GeomNash package includes the classes Data, Result, GeomHash, and Job. The results of the developed program are presented in corresponding graphs. The article also presents a theoretical justification for using a process pool to implement parallel algorithms. A condition for the appropriateness of a process pool is obtained: t2 > (p/(p-1))*t1, where t1 is the time to transmit a unit of data between processes and t2 is the time for one processor to process a unit of data.
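
    A minimal sketch of the process-pool pattern named in the abstract is given below. It is not the paper's package: it uses plain Python lists instead of NumPy arrays, a simple uniform-grid hash, and hypothetical names (`hash_chunk`, `build_hash`, `pool_is_worthwhile`). It also assumes that the condition reads t2 > (p/(p-1))*t1 and that p is the number of processes, neither of which the abstract states explicitly.

```python
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor


def hash_chunk(job):
    """Hash one chunk of points: map each point to the grid cell that contains it."""
    start, points, cell = job
    table = defaultdict(list)
    for offset, point in enumerate(points):
        key = tuple(int(coord // cell) for coord in point)
        table[key].append(start + offset)  # store the global point index
    return dict(table)


def build_hash(points, cell, workers=2, executor_cls=ProcessPoolExecutor):
    """Build the cell -> point-indices table by hashing chunks in a pool and merging."""
    size = max(1, -(-len(points) // workers))  # ceiling division
    jobs = [(i, points[i:i + size], cell) for i in range(0, len(points), size)]
    merged = defaultdict(list)
    with executor_cls(max_workers=workers) as pool:
        # map() yields results in submission order, so indices stay sorted.
        for partial in pool.map(hash_chunk, jobs):
            for key, indices in partial.items():
                merged[key].extend(indices)
    return dict(merged)


def pool_is_worthwhile(t1, t2, p):
    """Assumed reading of the condition: t2 > p / (p - 1) * t1 for p > 1 processes."""
    return p > 1 and t2 > p / (p - 1) * t1
```

    On platforms that start worker processes by spawning, `build_hash` should be called from under an `if __name__ == "__main__":` guard. The `executor_cls` parameter is a convenience of this sketch, so that a thread pool can be substituted when experimenting interactively.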

  8. Big Data GPU-Driven Parallel Processing Spatial and Spatio-Temporal Clustering Algorithms

    Science.gov (United States)

    Konstantaras, Antonios; Skounakis, Emmanouil; Kilty, James-Alexander; Frantzeskakis, Theofanis; Maravelakis, Emmanuel

    2016-04-01

    Advances in graphics processing units' technology towards encompassing parallel architectures [1], comprised of thousands of cores and multiples of parallel threads, provide the foundation in terms of hardware for the rapid processing of various parallel applications regarding seismic big data analysis. Seismic data are normally stored as collections of vectors in massive matrices, growing rapidly in size as wider areas are covered, denser recording networks are being established and decades of data are being compiled together [2]. Yet, many processes regarding seismic data analysis are performed on each seismic event independently or as distinct tiles [3] of specific grouped seismic events within a much larger data set. Such processes, independent of one another, can be performed in parallel, narrowing down processing times drastically [1,3]. This research work presents the development and implementation of three parallel processing algorithms using Cuda C [4] for the investigation of potentially distinct seismic regions [5,6] present in the vicinity of the southern Hellenic seismic arc. The algorithms, programmed and executed in parallel comparatively, are: fuzzy k-means clustering with expert knowledge [7] in assigning overall clusters' number; density-based clustering [8]; and a self-developed spatio-temporal clustering algorithm encompassing expert [9] and empirical knowledge [10] for the specific area under investigation. Indexing terms: GPU parallel programming, Cuda C, heterogeneous processing, distinct seismic regions, parallel clustering algorithms, spatio-temporal clustering References [1] Kirk, D. and Hwu, W.: 'Programming massively parallel processors - A hands-on approach', 2nd Edition, Morgan Kaufmann Publishers, 2013 [2] Konstantaras, A., Valianatos, F., Varley, M.R. and Makris, J.P.: 'Soft-Computing Modelling of Seismicity in the Southern Hellenic Arc', Geoscience and Remote Sensing Letters, vol. 5 (3), pp. 323-327, 2008 [3] Papadakis, S. and

  9. Distributed system for parallel data processing of ECT signals for electromagnetic flaw detection in materials

    International Nuclear Information System (INIS)

    Guliashki, Vassil; Marinova, Galia

    2002-01-01

    The paper proposes a distributed system for parallel data processing of ECT signals for flaw detection in materials. The measured data are stored in files on a host computer, where a JAVA server is located. The host computer is connected through the Internet to a set of geographically distributed client computers. The JAVA server distributes the data from the host computer to the client computers according to their requests. The software necessary for the data processing is installed on each client computer in advance. Organizing the data processing on many computers working simultaneously in parallel greatly reduces processing time, especially when a huge amount of data must be processed in a very short time. (Author)

  10. Supertracker: A Programmable Parallel Pipeline Arithmetic Processor For Auto-Cueing Target Processing

    Science.gov (United States)

    Mack, Harold; Reddi, S. S.

    1980-04-01

    Supertracker represents a programmable parallel pipeline computer architecture that has been designed to meet the real time image processing requirements of auto-cueing target data processing. The prototype bread-board currently under development will be designed to perform input video preprocessing and processing for 525-line and 875-line TV format FLIR video, automatic display gain and contrast control, and automatic target cueing, classification, and tracking. The video preprocessor is capable of performing operations on full frames of video data in real time, e.g., frame integration, storage, 3 x 3 convolution, and neighborhood processing. The processor architecture is being implemented using bit-slice microprogrammable arithmetic processors, operating in parallel. Each processor is capable of up to 20 million operations per second. Multiple frame memories are used for additional flexibility.

  11. Fraud Detection in Credit Card Transactions; Using Parallel Processing of Anomalies in Big Data

    Directory of Open Access Journals (Sweden)

    Mohammad Reza Taghva

    2016-10-01

    In parallel to the increasing use of electronic cards, especially in the banking industry, the volume of transactions using these cards has grown rapidly. Moreover, the financial nature of these cards has made them an attractive target for fraud. The present study applied the Kohonen neural network model, with a MapReduce approach and parallel processing, to detect abnormalities in bank card transactions. For this purpose, it was first proposed to classify all transactions as fraudulent or legitimate, which showed better performance compared with other methods. In the next step, we transformed the Kohonen model into the form of a parallel task, which demonstrated appropriate performance in terms of time; it is therefore expected to be well suited to transactions under Big Data assumptions.

  12. Performance of MPI parallel processing implemented by MCNP5/ MCNPX for criticality benchmark problems

    International Nuclear Information System (INIS)

    Mark Dennis Usang; Mohd Hairie Rabir; Mohd Amin Sharifuldin Salleh; Mohamad Puad Abu

    2012-01-01

    MPI parallelism is implemented on a SUN Workstation for running MCNPX and on the High Performance Computing Facility (HPC) for running MCNP5. 23 input files obtained from the MCNP Criticality Validation Suite are utilized to evaluate the amount of speedup achievable by using the parallel capabilities of MPI. More importantly, we will study the economics of using more processors and the types of problem where the performance gains are obvious. This is important to enable better practices of resource sharing, especially for the HPC facility's processing time. Future endeavours in this direction might even reveal clues for best MCNP5/MCNPX coding practices for optimum performance of MPI parallelism. (author)

  13. Load balancing in highly parallel processing of Monte Carlo code for particle transport

    International Nuclear Information System (INIS)

    Higuchi, Kenji; Takemiya, Hiroshi; Kawasaki, Takuji

    1998-01-01

    In parallel processing of Monte Carlo (MC) codes for neutron, photon and electron transport problems, particle histories are assigned to processors, making use of the independence of the calculation for each particle. Although the main part of an MC code can easily be parallelized by this method, it is necessary, and practically difficult, to optimize the code with respect to load balancing in order to attain a high speedup ratio in highly parallel processing. In fact, the speedup ratio with 128 processors remains at nearly one hundred when using the test bed for the performance evaluation. Through the parallel processing of the MCNP code, which is widely used in the nuclear field, it is shown that it is difficult to attain high performance by static load balancing, especially in neutron transport problems, and that a load balancing method which dynamically changes the number of assigned particles, minimizing the sum of the computational and communication costs, overcomes the difficulty, resulting in a reduction of nearly fifteen percent in execution time. (author)
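
    The gap between static and dynamic assignment of particle histories can be illustrated with a small scheduling sketch. This is not the method of the paper above, which also accounts for communication cost; the function names and the per-history costs are hypothetical, and the "dynamic" variant is a plain work queue in which each history goes to whichever worker becomes free first.

```python
import heapq


def static_makespan(costs, n_workers):
    """Split the histories into equal-sized contiguous blocks, one per worker,
    and return the time at which the slowest worker finishes."""
    block = -(-len(costs) // n_workers)  # ceiling division
    return max(sum(costs[i:i + block]) for i in range(0, len(costs), block))


def dynamic_makespan(costs, n_workers):
    """Hand each history, in order, to the worker that becomes free first
    (a simple work queue) and return the finishing time of the last worker."""
    loads = [0] * n_workers
    heapq.heapify(loads)
    for cost in costs:
        # Pop the least-loaded worker, give it the next history, push it back.
        heapq.heappush(loads, heapq.heappop(loads) + cost)
    return max(loads)


# Hypothetical workload: a few expensive histories followed by many cheap ones.
costs = [9, 9, 9, 9] + [1] * 12
serial_time = sum(costs)

for name, makespan in (("static", static_makespan(costs, 4)),
                       ("dynamic", dynamic_makespan(costs, 4))):
    print(f"{name}: makespan={makespan}, speedup={serial_time / makespan:.2f}")
```

    With this made-up workload the static split leaves three workers idle for most of the run, while the work queue keeps all four busy. For scale, a speedup of about 100 on 128 processors, as reported in the abstract, corresponds to a parallel efficiency of roughly 78%.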

  14. Strong Bisimilarity and Regularity of Basic Parallel Processes is PSPACE-Hard

    DEFF Research Database (Denmark)

    Srba, Jirí

    2002-01-01

    We show that the problem of checking whether two processes definable in the syntax of Basic Parallel Processes (BPP) are strongly bisimilar is PSPACE-hard. We also demonstrate that there is a polynomial time reduction from the strong bisimilarity checking problem of regular BPP to the strong...... regularity (finiteness) checking of BPP. This implies that strong regularity of BPP is also PSPACE-hard....

  15. Massively parallel data processing for quantitative total flow imaging with optical coherence microscopy and tomography

    Science.gov (United States)

    Sylwestrzak, Marcin; Szlag, Daniel; Marchand, Paul J.; Kumar, Ashwin S.; Lasser, Theo

    2017-08-01

    We present an application of massively parallel processing of quantitative flow measurement data acquired using spectral optical coherence microscopy (SOCM). The need for massive signal processing of these particular datasets has been a major hurdle for many applications based on SOCM. In view of this difficulty, we implemented and adapted quantitative total flow estimation algorithms on graphics processing units (GPU) and achieved a 150-fold reduction in processing time when compared to a former CPU implementation. As SOCM constitutes the microscopy counterpart to spectral optical coherence tomography (SOCT), the developed processing procedure can be applied to both imaging modalities. We present the developed DLL library integrated in MATLAB (with an example) and have included the source code for adaptations and future improvements. Catalogue identifier: AFBT_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AFBT_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU GPLv3 No. of lines in distributed program, including test data, etc.: 913552 No. of bytes in distributed program, including test data, etc.: 270876249 Distribution format: tar.gz Programming language: CUDA/C, MATLAB. Computer: Intel x64 CPU, GPU supporting CUDA technology. Operating system: 64-bit Windows 7 Professional. Has the code been vectorized or parallelized?: Yes, CPU code has been vectorized in MATLAB, CUDA code has been parallelized. RAM: Dependent on user's parameters, typically between several gigabytes and several tens of gigabytes Classification: 6.5, 18. Nature of problem: Speed up of data processing in optical coherence microscopy Solution method: Utilization of GPU for massively parallel data processing Additional comments: Compiled DLL library with source code and documentation, example of utilization (MATLAB script with raw data) Running time: 1,8 s for one B-scan (150 × faster in comparison to the CPU

  16. Parallel processing architecture for H.264 deblocking filter on multi-core platforms

    Science.gov (United States)

    Prasad, Durga P.; Sonachalam, Sekar; Kunchamwar, Mangesh K.; Gunupudi, Nageswara Rao

    2012-03-01

    filter for multi core platforms such as HyperX technology. Parallel techniques such as parallel processing of independent macroblocks, sub blocks, and pixel rows are examined in this work. The deblocking architecture consists of a basic cell called the deblocking filter unit (DFU) and a dependent data buffer manager (DFM). The DFU can be used in several instances, catering to different performance needs; the DFM serves the data required for the different numbers of DFUs, and also manages all the neighboring data required for future data processing by the DFUs. This approach achieves the scalability, flexibility, and performance excellence required in deblocking filters.

  17. A new decomposition method for parallel processing multi-level optimization

    International Nuclear Information System (INIS)

    Park, Hyung Wook; Kim, Min Soo; Choi, Dong Hoon

    2002-01-01

    In practical designs, most multidisciplinary problems involve a large and complicated design system. Since multidisciplinary problems have hundreds of analyses and thousands of variables, the grouping of analyses and the order of the analyses in each group affect the speed of the total design cycle. Therefore, it is very important to reorder and regroup the original design processes in order to minimize the total computational cost by decomposing large multidisciplinary problems into several MultiDisciplinary Analysis SubSystems (MDASS) and by processing them in parallel. In this study, a new decomposition method is proposed for parallel processing of multidisciplinary design optimization methods, such as Collaborative Optimization (CO) and the Individual Discipline Feasible (IDF) method. Numerical results for two example problems are presented to show the feasibility of the proposed method.

  18. Resistance to awareness of the supervisor's transferences with special reference to the parallel process.

    Science.gov (United States)

    Stimmel, B

    1995-06-01

    Supervision is an essential part of psychoanalytic education. Although not taken for granted, it is not studied with the same critical eye as is the analytic process. This paper examines the supervision specifically with a focus on the supervisor's transference towards the supervisee. The point is made, in the context of clinical examples, that one of the ways these transference reactions may be rationalised is within the setting of the parallel process so often encountered in supervision. Parallel process, a very familiar term, is used frequently and easily when discussing supervision. It may be used also as a resistance to awareness of transference phenomena within the supervisor in relation to the supervisee, particularly because of its clinical presentation. It is an enactment between supervisor and supervisee, thus ripe with possibilities for disguise, displacement and gratification. While transference reactions of the supervisee are often discussed, those of the supervisor are notably missing in our literature.

  19. The Temporal Dynamics of Visual Search: Evidence for Parallel Processing in Feature and Conjunction Searches

    Science.gov (United States)

    McElree, Brian; Carrasco, Marisa

    2012-01-01

    Feature and conjunction searches have been argued to delineate parallel and serial operations in visual processing. The authors evaluated this claim by examining the temporal dynamics of the detection of features and conjunctions. The 1st experiment used a reaction time (RT) task to replicate standard mean RT patterns and to examine the shapes of the RT distributions. The 2nd experiment used the response-signal speed–accuracy trade-off (SAT) procedure to measure discrimination (asymptotic detection accuracy) and detection speed (processing dynamics). Set size affected discrimination in both feature and conjunction searches but affected detection speed only in the latter. Fits of models to the SAT data that included a serial component overpredicted the magnitude of the observed dynamics differences. The authors concluded that both features and conjunctions are detected in parallel. Implications for the role of attention in visual processing are discussed. PMID:10641310

  20. Application of parallel computing to seismic damage process simulation of an arch dam

    International Nuclear Information System (INIS)

    Zhong Hong; Lin Gao; Li Jianbo

    2010-01-01

    The simulation of the damage process of a high arch dam subjected to strong earthquake shocks is significant to the evaluation of its performance and seismic safety, considering the catastrophic effect of dam failure. However, such numerical simulation demands substantial computational capacity. Conventional serial computing falls short of that, and parallel computing is a fairly promising solution to this problem. The parallel finite element code PDPAD was developed for the damage prediction of arch dams, utilizing a damage model that considers the heterogeneity of concrete. Developed in Fortran, the code uses a master/slave mode for programming, the domain decomposition method for allocation of tasks, MPI (Message Passing Interface) for communication and solvers from the AZTEC library for the solution of large-scale equations. A speedup test showed that the performance of PDPAD was quite satisfactory. The code was employed to study the damage process of an arch dam under construction on a 4-node PC cluster, with more than one million degrees of freedom considered. The obtained damage mode was quite similar to that of a shaking table test, indicating that the proposed procedure and the parallel code PDPAD have good potential in simulating the seismic damage mode of arch dams. With the rapidly growing need for massive computation emerging from engineering problems, parallel computing will find more and more applications in pertinent areas.

  1. Regional-scale calculation of the LS factor using parallel processing

    Science.gov (United States)

    Liu, Kai; Tang, Guoan; Jiang, Ling; Zhu, A.-Xing; Yang, Jianyi; Song, Xiaodong

    2015-05-01

    With the increase of data resolution and the increasing application of USLE over large areas, the existing serial implementation of algorithms for computing the LS factor is becoming a bottleneck. In this paper, a parallel processing model based on the message passing interface (MPI) is presented for the calculation of the LS factor, so that massive datasets at a regional scale can be processed efficiently. The parallel model contains algorithms for calculating flow direction, flow accumulation, drainage network, slope, slope length and the LS factor. According to the existence of data dependence, the algorithms are divided into local algorithms and global algorithms. Parallel strategies are designed according to the characteristics of the algorithms, including a decomposition method for maintaining the integrity of the results, an optimized workflow for reducing the time taken to export unnecessary intermediate data, and a buffer-communication-computation strategy for improving the communication efficiency. Experiments on a multi-node system show that the proposed parallel model allows efficient calculation of the LS factor at a regional scale with a massive dataset.
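
    Why a "local" algorithm such as slope decomposes cleanly can be shown with a short sketch. It is an illustration only, not the authors' MPI code: the function names, the central-difference slope formula and the edge handling (replicated border rows) are all assumptions. Each strip of rows is extended by one buffer (halo) row on each side, so it can be computed without reference to any other strip; in an MPI setting each strip would go to a separate rank, whereas here the strips are simply processed in a loop.

```python
import math


def slope_tile(tile, cell=1.0):
    """Slope magnitude for the interior rows of a tile whose first and last
    rows are halo rows. Border columns are left at 0.0 for simplicity."""
    out = []
    for r in range(1, len(tile) - 1):
        row = [0.0] * len(tile[0])
        for c in range(1, len(tile[0]) - 1):
            dzdx = (tile[r][c + 1] - tile[r][c - 1]) / (2 * cell)
            dzdy = (tile[r + 1][c] - tile[r - 1][c]) / (2 * cell)
            row[c] = math.hypot(dzdx, dzdy)
        out.append(row)
    return out


def slope_by_strips(dem, n_strips, cell=1.0):
    """Compute slope strip by strip; each strip carries one halo row above
    and below, so strips are independent of one another."""
    rows = len(dem)
    padded = [dem[0]] + dem + [dem[-1]]  # replicate edge rows as outer halos
    bounds = [rows * i // n_strips for i in range(n_strips + 1)]
    result = []
    for a, b in zip(bounds, bounds[1:]):
        # DEM rows a..b-1 are padded[a+1..b]; padded[a] and padded[b+1] are halos.
        result.extend(slope_tile(padded[a:b + 2], cell))
    return result


# Hypothetical elevation grid: an inclined plane z = 3*col + 4*row.
dem = [[3 * c + 4 * r for c in range(6)] for r in range(7)]
whole = slope_by_strips(dem, 1)
split = slope_by_strips(dem, 3)
print(whole == split, whole[2][2])
```

    The strip-wise result matches the single-strip result because every cell only reads its immediate neighbours. A "global" algorithm such as flow accumulation has no such fixed-width dependence, which is presumably why the abstract treats the two classes separately.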

  2. A New Tool for Intelligent Parallel Processing of Radar/SAR Remotely Sensed Imagery

    Directory of Open Access Journals (Sweden)

    A. Castillo Atoche

    2013-01-01

    A novel parallel tool for large-scale image enhancement/reconstruction and postprocessing of radar/SAR sensor systems is addressed. The proposed parallel tool performs the following intelligent processing steps: image formation, applying different system-level effects of image degradation for a particular remote sensing (RS) system and simulating random noise effects; enhancement/reconstruction by employing nonparametric robust high-resolution techniques; and image postprocessing using the fuzzy anisotropic diffusion technique, which incorporates a better edge-preserving noise removal effect and a faster diffusion process. This innovative tool allows the processing of high-resolution images provided by different radar/SAR sensor systems, as required by RS end users for environmental monitoring, risk prevention, and resource management. To verify the performance of the proposed parallel framework, the processing steps are developed and specifically tested on graphics processing units (GPU), achieving considerable speedups compared to the serial version of the same techniques implemented in the C language.

  3. Design of a family of integrated parallel co-processors for images processing

    International Nuclear Information System (INIS)

    Court, Thierry

    1991-01-01

    The design of parallel image processing systems that combine sophisticated microprocessors and specialised operators in the same architecture is a difficult task, because of the various problems to be taken into account. The current study identifies a way of realizing such dedicated operators and interfacing them to a microprocessor-type central unit. The two guidelines of this work are the search for polyvalent, specialized and reconfigurable operators, and their connection to a system bus rather than to specialized video buses. This research work proposes an architecture of circuits dedicated to image processing and two proposals for realizing them. One of them was realized in this study by using silicon compiler tools. This work belongs to a larger project, whose aim is the development of a high-performance, modular industrial image processing system based on the parallelization, in MIMD structures, of an elementary, autonomous image processing unit integrating a microprocessor equipped with a parallel coprocessor suited to image processing. (author) [fr

  4. A learnable parallel processing architecture towards unity of memory and computing.

    Science.gov (United States)

    Li, H; Gao, B; Chen, Z; Zhao, Y; Huang, P; Ye, H; Liu, L; Liu, X; Kang, J

    2015-08-14

    Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named "iMemComp", where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped "iMemComp" with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on "iMemComp" can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area.

  5. A learnable parallel processing architecture towards unity of memory and computing

    Science.gov (United States)

    Li, H.; Gao, B.; Chen, Z.; Zhao, Y.; Huang, P.; Ye, H.; Liu, L.; Liu, X.; Kang, J.

    2015-08-01

    Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named “iMemComp”, where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped “iMemComp” with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on “iMemComp” can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area.

  6. Parameters that affect parallel processing for computational electromagnetic simulation codes on high performance computing clusters

    Science.gov (United States)

    Moon, Hongsik

    What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited from the increased computing power the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared by their performance on benchmark software, and the metric was FLoating-point Operations Per Second (FLOPS), which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore systems? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to the type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPS and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. This code was developed to address the needs of the

  7. Parallel, multi-stage processing of colors, faces and shapes in macaque inferior temporal cortex

    Science.gov (United States)

    Lafer-Sousa, Rosa; Conway, Bevil R.

    2014-01-01

    Visual-object processing culminates in inferior temporal (IT) cortex. To assess the organization of IT, we measured fMRI responses in alert monkey to achromatic images (faces, fruit, bodies, places) and colored gratings. IT contained multiple color-biased regions, which were typically ventral to face patches and, remarkably, yoked to them, spaced regularly at four locations predicted by known anatomy. Color and face selectivity increased for more anterior regions, indicative of a broad hierarchical arrangement. Responses to non-face shapes were found across IT, but were stronger outside color-biased regions and face patches, consistent with multiple parallel streams. IT also contained multiple coarse eccentricity maps: face patches overlapped central representations; color-biased regions spanned mid-peripheral representations; and place-biased regions overlapped peripheral representations. These results suggest that IT comprises parallel, multi-stage processing networks subject to one organizing principle. PMID:24141314

  8. The Masterson Approach with play therapy: a parallel process between mother and child.

    Science.gov (United States)

    Mulherin, M A

    2001-01-01

    This paper discusses a case in which the Masterson Approach was used with play therapy to treat a child with a developing personality disorder. It describes the parallel progression of the child and mother in adjunct therapy throughout a six-year period. The unique value of the Masterson Approach is that it provides the therapist with a framework and tool to diagnose and treat a child during the dynamic process of play. The case describes the mother-child dyad throughout therapy. It traces their parallel processes that involve separation, individuation, rapprochement, and the recovery of real self-capacities. Each stage of treatment is described, including verbal interventions. The child's internal affective state and intrapsychic structure during the various stages of treatment are illustrated by representative pictures.

  9. Data structures and languages in support of parallel image processing for astronomy

    International Nuclear Information System (INIS)

    Tanimoto, S.L.

    1985-01-01

    This paper discusses data structures, and aspects of programming languages and systems, that are relevant to image processing of astronomy data. Emphasis is on image processing computations, because this kind of data processing is obviously ripe for parallelism and is important in astronomy. However, some discussion of general possibilities is also presented. The role of algorithms is examined since they are not dependent on a particular language. As an implementation of an algorithm, a program is equally tied to data structure, operations, architecture and language, and therefore the issue of programming resides in the center of the tetrahedron.

  10. Eighth SIAM conference on parallel processing for scientific computing: Final program and abstracts

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1997-12-31

    This SIAM conference is the premier forum for developments in parallel numerical algorithms, a field that has seen very lively and fruitful developments over the past decade, and whose health is still robust. Themes for this conference were: combinatorial optimization; data-parallel languages; large-scale parallel applications; message-passing; molecular modeling; parallel I/O; parallel libraries; parallel software tools; parallel compilers; particle simulations; problem-solving environments; and sparse matrix computations.

  11. The concept of parallel input/output processing for an electron linac

    International Nuclear Information System (INIS)

    Emoto, Takashi

    1993-01-01

    The instrumentation of and the control system for the PNC 10 MeV CW electron linac are described. A new concept of parallel input/output processing for the linac has been introduced. It is based on a substantial number of input/output processors (IOP) used for beam control and diagnostics. The flexibility and simplicity of the hardware and software are significant advantages of this scheme. (author)

  12. A Review of Parallel Processing Approaches to Robot Kinematics and Jacobian

    OpenAIRE

    Henrich, Dominik; Karl, Joachim; Wörn, Heinz

    1997-01-01

    Due to continuously increasing demands in the area of advanced robot control, it became necessary to speed up the computation. One way to reduce the computation time is to distribute the computation onto several processing units. In this survey we present different approaches to parallel computation of robot kinematics and Jacobian. Thereby, we discuss both the forward and the reverse problem. We introduce a classification scheme and class...

  13. Tuning of tool dynamics for increased stability of parallel (simultaneous) turning processes

    Science.gov (United States)

    Ozturk, E.; Comak, A.; Budak, E.

    2016-01-01

    Parallel (simultaneous) turning operations make use of more than one cutting tool acting on a common workpiece, offering potential for higher productivity. However, dynamic interaction between the tools and workpiece and the resulting chatter vibrations may create quality problems on machined surfaces. In order to determine chatter-free cutting process parameters, stability models can be employed. In this paper, the stability of parallel turning processes is formulated in the frequency and time domains for two different parallel turning cases. Predictions of the frequency and time domain methods demonstrated reasonable agreement with each other. In addition, the predicted stability limits are verified experimentally. Simulation and experimental results show multi-regional stability diagrams which can be used to select the most favorable set of process parameters for higher stable material removal rates. In addition to parameter selection, the developed models can be used to determine the best natural frequency ratio of the tools, resulting in the highest stable depths of cut. It is concluded that the most stable operations are obtained when the natural frequencies of the tools are slightly off each other, and the worst stability occurs when the natural frequencies of the tools are exactly the same.

  14. Understanding decimal proportions: discrete representations, parallel access, and privileged processing of zero.

    Science.gov (United States)

    Varma, Sashank; Karl, Stacy R

    2013-05-01

    Much of the research on mathematical cognition has focused on the numbers 1, 2, 3, 4, 5, 6, 7, 8, and 9, with considerably less attention paid to more abstract number classes. The current research investigated how people understand decimal proportions--rational numbers between 0 and 1 expressed in the place-value symbol system. The results demonstrate that proportions are represented as discrete structures and processed in parallel. There was a semantic interference effect: When understanding a proportion expression (e.g., "0.29"), both the correct proportion referent (e.g., 0.29) and the incorrect natural number referent (e.g., 29) corresponding to the visually similar natural number expression (e.g., "29") are accessed in parallel, and when these referents lead to conflicting judgments, performance slows. There was also a syntactic interference effect, generalizing the unit-decade compatibility effect for natural numbers: When comparing two proportions, their tenths and hundredths components are processed in parallel, and when the different components lead to conflicting judgments, performance slows. The results also reveal that zero decimals--proportions ending in zero--serve multiple cognitive functions, including eliminating semantic interference and speeding processing. The current research also extends the distance, semantic congruence, and SNARC effects from natural numbers to decimal proportions. These findings inform how people understand the place-value symbol system, and the mental implementation of mathematical symbol systems more generally. Copyright © 2013 Elsevier Inc. All rights reserved.

  15. Fast phase processing in off-axis holography by CUDA including parallel phase unwrapping.

    Science.gov (United States)

    Backoach, Ohad; Kariv, Saar; Girshovitz, Pinhas; Shaked, Natan T

    2016-02-22

    We present a parallel processing implementation for rapid extraction of quantitative phase maps from off-axis holograms on the Graphics Processing Unit (GPU) of the computer using compute unified device architecture (CUDA) programming. To obtain an efficient implementation, we parallelized both the wrapped phase map extraction algorithm and the two-dimensional phase unwrapping algorithm. In contrast to previous implementations, we utilized an unweighted least squares phase unwrapping algorithm that better suits parallelism. We compared the run times of the proposed algorithm on the CPU and the GPU of the computer for various sizes of off-axis holograms. Using the GPU implementation, we extracted the unwrapped phase maps from the recorded off-axis holograms at 35 frames per second (fps) for 4 megapixel holograms, and at 129 fps for 1 megapixel holograms, which represents the fastest processing frame rates obtained so far, to the best of our knowledge. We then used common-path off-axis interferometric imaging to quantitatively capture the phase maps of a micro-organism with rapid flagellum movements.

  16. Parallel processing method for high-speed real time digital pulse processing for gamma-ray spectroscopy

    International Nuclear Information System (INIS)

    Fernandes, A.M.; Pereira, R.C.; Sousa, J.; Neto, A.; Carvalho, P.; Batista, A.J.N.; Carvalho, B.B.; Varandas, C.A.F.; Tardocchi, M.; Gorini, G.

    2010-01-01

    A new data acquisition (DAQ) system was developed to fulfil the requirements of the JET-EP2 (Joint European Torus enhancement project 2) gamma-ray spectrometer (GRS), providing high-resolution spectroscopy at very high count rates (up to a few MHz). The system is based on the Advanced Telecommunications Computing Architecture™ (ATCA™) and includes a transient record (TR) module with 8 channels of 14-bit resolution at a 400 MSamples/s (MSPS) sampling rate, 4 GB of local memory, and 2 field-programmable gate arrays (FPGAs) able to perform real-time algorithms for data reduction and digital pulse processing. Although at 400 MSPS only fast programmable devices such as FPGAs can be used for both data processing and data transfer, FPGA resources also present speed limitations for some specific tasks, leading to unavoidable data loss when demanding algorithms are applied. To overcome this problem, and foreseeing an increase in algorithm complexity, a new digital parallel filter was developed, aiming to perform real-time pulse processing in the FPGAs of the TR module at the stated sampling rate. The filter is based on the conventional digital time-invariant trapezoidal shaper operating on parallelized data while performing pulse height analysis (PHA) and pile-up rejection (PUR). The incoming sampled data are successively parallelized and fed into the processing algorithm block at one fourth of the sampling rate. The subsequent data processing and data transfer are also performed at one fourth of the sampling rate. The algorithm based on the data parallelization technique was implemented and tested at the JET facilities, where a spectrum was obtained. In view of the observed results, the PHA algorithm will be improved by implementing pulse pile-up discrimination.
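
    The data parallelization step described in this abstract can be illustrated with a small sketch. It only shows how a serial sample stream might be grouped into 4-sample words so that a processing block is clocked at one fourth of the sampling rate; it is not the FPGA design. The function names and the choice to drop incomplete trailing words are assumptions made for illustration.

```python
SAMPLE_RATE_MSPS = 400  # ADC sampling rate quoted in the abstract
LANES = 4               # "one fourth of the sampling rate" implies 4 samples per word


def parallelize(samples, lanes=LANES):
    """Group a serial sample stream into words of `lanes` consecutive samples.

    Trailing samples that do not fill a complete word are dropped
    (an assumption made for this sketch).
    """
    usable = len(samples) - len(samples) % lanes
    return [tuple(samples[i:i + lanes]) for i in range(0, usable, lanes)]


def word_rate_msps(sample_rate=SAMPLE_RATE_MSPS, lanes=LANES):
    """Rate at which the processing block has to accept parallel words."""
    return sample_rate / lanes


stream = list(range(10))      # stand-in for ADC samples
words = parallelize(stream)   # each word would be handled in one slower clock cycle
```

    With these numbers the processing block would see one word per cycle at 100 MHz instead of one sample per cycle at 400 MHz.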

  17. Effects of visual information regarding allocentric processing in haptic parallelity matching.

    Science.gov (United States)

    Van Mier, Hanneke I

    2013-10-01

    Research has revealed that haptic perception of parallelity deviates from physical reality. Large and systematic deviations have been found in haptic parallelity matching most likely due to the influence of the hand-centered egocentric reference frame. Providing information that increases the influence of allocentric processing has been shown to improve performance on haptic matching. In this study allocentric processing was stimulated by providing informative vision in haptic matching tasks that were performed using hand- and arm-centered reference frames. Twenty blindfolded participants (ten men, ten women) explored the orientation of a reference bar with the non-dominant hand and subsequently matched (task HP) or mirrored (task HM) its orientation on a test bar with the dominant hand. Visual information was provided by means of informative vision with participants having full view of the test bar, while the reference bar was blocked from their view (task VHP). To decrease the egocentric bias of the hands, participants also performed a visual haptic parallelity drawing task (task VHPD) using an arm-centered reference frame, by drawing the orientation of the reference bar. In all tasks, the distance between and orientation of the bars were manipulated. A significant effect of task was found; performance improved from task HP, to VHP to VHPD, and HM. Significant effects of distance were found in the first three tasks, whereas orientation and gender effects were only significant in tasks HP and VHP. The results showed that stimulating allocentric processing by means of informative vision and reducing the egocentric bias by using an arm-centered reference frame led to most accurate performance on parallelity matching. © 2013 Elsevier B.V. All rights reserved.

  18. Passive and partially active fault tolerance for massively parallel stream processing engines

    DEFF Research Database (Denmark)

    Su, Li; Zhou, Yongluan

    2018-01-01

    . On the other hand, an active approach usually employs backup nodes to run replicated tasks. Upon failure, the active replica can take over the processing of the failed task with minimal latency. However, both approaches have their own inadequacies in Massively Parallel Stream Processing Engines (MPSPE...... also propose effective and efficient algorithms to optimize a partially active replication plan to maximize the quality of tentative outputs. We implemented PPA on top of Storm, an open-source MPSPE and conducted extensive experiments using both real and synthetic datasets to verify the effectiveness...

  19. Lamb wave propagation modelling and simulation using parallel processing architecture and graphical cards

    International Nuclear Information System (INIS)

    Paćko, P; Bielak, T; Staszewski, W J; Uhl, T; Spencer, A B; Worden, K

    2012-01-01

    This paper demonstrates new parallel computation technology and an implementation for Lamb wave propagation modelling in complex structures. A graphical processing unit (GPU) and compute unified device architecture (CUDA), available in low-cost graphical cards in standard PCs, are used for Lamb wave propagation numerical simulations. The local interaction simulation approach (LISA) wave propagation algorithm has been implemented as an example. Other algorithms suitable for parallel discretization can also be used in practice. The method is illustrated using examples related to damage detection. The results demonstrate good accuracy and effective computational performance of very large models. The wave propagation modelling presented in the paper can be used in many practical applications of science and engineering. (paper)

  20. Parallel Processing and Applied Mathematics. 10th International Conference, PPAM 2013. Revised Selected Papers

    DEFF Research Database (Denmark)

    The following topics are dealt with: parallel scientific computing; numerical algorithms; parallel nonnumerical algorithms; cloud computing; evolutionary computing; metaheuristics; applied mathematics; GPU computing; multicore systems; hybrid architectures; hierarchical parallelism; HPC systems......; power monitoring; energy monitoring; and distributed computing....

  1. High-Performance Parallel and Stream Processing of X-ray Microdiffraction Data on Multicores

    International Nuclear Information System (INIS)

    Bauer, Michael A; McIntyre, Stewart; Xie Yuzhen; Biem, Alain; Tamura, Nobumichi

    2012-01-01

    We present the design and implementation of a high-performance system for processing synchrotron X-ray microdiffraction (XRD) data in IBM InfoSphere Streams on multicore processors. We report on the parallel and stream processing techniques that we use to harvest the power of clusters of multicores to analyze hundreds of gigabytes of synchrotron XRD data in order to reveal the microtexture of polycrystalline materials. The timing to process one XRD image using one pipeline is about ten times faster than the best C program at present. With the support of InfoSphere Streams platform, our software is able to be scaled up to operate on clusters of multi-cores for processing multiple images concurrently. This system provides a high-performance processing kernel to achieve near real-time data analysis of image data from synchrotron experiments.

  2. The parallel processing impact in the optimization of the reactors neutronic by genetic algorithms

    International Nuclear Information System (INIS)

    Pereira, Claudio M.N.A.; Universidade Federal, Rio de Janeiro, RJ; Lapa, Celso M.F.; Mol, Antonio C.A.

    2002-01-01

    Nowadays, many optimization problems found in nuclear engineering have been solved through genetic algorithms (GA). The robustness of such methods is strongly related to the nature of the search process, which is based on populations of solution candidates, and this fact implies a high computational cost in the optimization process. The use of GA becomes more critical when the evaluation of a solution candidate is highly time consuming. Problems of this nature are common in nuclear engineering; an example is reactor design optimization, where neutronic codes, which consume high CPU time, must be run. Aiming to investigate the impact of the use of parallel computation in the solution, through GA, of a reactor design optimization problem, a parallel genetic algorithm (PGA) using the Island Model was developed. Exhaustive experiments, on the order of 1500 processing hours on 550 MHz personal computers, have been done in order to compare the conventional GA with the PGA. Such experiments have demonstrated the superiority of the PGA not only in terms of execution time but also in the optimization results. (author)
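
    The Island Model mentioned in this abstract can be sketched as follows. This is a generic toy version, not the authors' PGA: the objective function, the ring migration scheme, the operators and all parameter values are assumptions chosen only to keep the example short. Each island evolves its own population independently (the part that would be distributed across processors), and every few generations the best individual of one island replaces the worst individual of its neighbour.

```python
import random


def fitness(x):
    """Toy objective to minimise (stand-in for an expensive neutronic code)."""
    return sum(v * v for v in x)


def evolve(pop, rng, mutation=0.1):
    """One generation: elitism, binary tournaments, uniform crossover, Gaussian mutation."""
    children = [min(pop, key=fitness)]  # keep the island's best unchanged
    while len(children) < len(pop):
        a = min(rng.sample(pop, 2), key=fitness)
        b = min(rng.sample(pop, 2), key=fitness)
        children.append([rng.choice(pair) + rng.gauss(0.0, mutation)
                         for pair in zip(a, b)])
    return children


def island_ga(n_islands=4, pop_size=20, dims=3, generations=40,
              migrate_every=5, seed=0):
    """Return the best fitness found so far after each generation."""
    rng = random.Random(seed)
    islands = [[[rng.uniform(-5, 5) for _ in range(dims)]
                for _ in range(pop_size)] for _ in range(n_islands)]
    history = []
    for gen in range(1, generations + 1):
        # Islands do not interact here, so each call could run on its own processor.
        islands = [evolve(pop, rng) for pop in islands]
        if gen % migrate_every == 0:
            # Ring migration: the best of island i-1 replaces the worst of island i.
            bests = [min(pop, key=fitness) for pop in islands]
            for i, pop in enumerate(islands):
                worst = max(range(len(pop)), key=lambda j: fitness(pop[j]))
                pop[worst] = list(bests[i - 1])
        history.append(min(fitness(ind) for pop in islands for ind in pop))
    return history


history = island_ga()
```

    Because each island keeps its best individual and migration only copies individuals, the best fitness recorded in `history` never gets worse from one generation to the next.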

  3. Massive Parallelism of Monte-Carlo Simulation on Low-End Hardware using Graphic Processing Units

    Energy Technology Data Exchange (ETDEWEB)

    Mburu, Joe Mwangi; Hah, Chang Joo Hah [KEPCO International Nuclear Graduate School, Ulsan (Korea, Republic of)

    2014-05-15

    Within the past decade, research has been done on utilizing GPU massive parallelization in core simulation with impressive results; unfortunately, there has been little commercial application in the nuclear field, especially in reactor core simulation. The purpose of this paper is to introduce the topic and illustrate the potential of exploiting the massively parallel nature of GPU computing on a simple Monte Carlo simulation with very minimal hardware specifications. For a comparative analysis, a simple two-dimensional Monte Carlo simulation is implemented for both the CPU and the GPU in order to evaluate the performance gain on each computing device. The heterogeneous platform used in this analysis is a slow notebook with only a 1 GHz processor. The end results are quite surprising: the speedups obtained are almost a factor of 10. In this work, we have utilized heterogeneous computing in a GPU-based approach for calculations of potentially high arithmetic intensity. By running a complex Monte Carlo simulation on the GPU platform, we have sped up the computational process by almost a factor of 10 for one million neutrons. This shows how easy, cheap and efficient it is to use the GPU to accelerate scientific computing, and the results should encourage further exploration of this avenue, especially in nuclear reactor physics simulation, where deterministic and stochastic calculations lend themselves well to parallelization.
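
    The reason a Monte Carlo simulation of this kind parallelizes well is that particle histories are independent of each other. The sketch below illustrates that property on the CPU only; it is not the simulation from the paper. The toy physics (unit-mean exponential flight lengths, isotropic scattering, a fixed absorption probability, a square geometry) and all names and parameter values are assumptions.

```python
import math
import random


def simulate_batch(seed, n_histories, half_width=5.0, absorb_prob=0.3):
    """Follow n_histories neutrons in a 2-D square; return how many leak out."""
    rng = random.Random(seed)  # private generator: batches share no state
    leaked = 0
    for _ in range(n_histories):
        x = y = 0.0
        while True:
            step = rng.expovariate(1.0)               # distance to next collision
            angle = rng.uniform(0.0, 2.0 * math.pi)   # isotropic direction
            x += step * math.cos(angle)
            y += step * math.sin(angle)
            if abs(x) > half_width or abs(y) > half_width:
                leaked += 1                           # left the square
                break
            if rng.random() < absorb_prob:
                break                                 # absorbed at the collision
    return leaked


def leakage_fraction(n_batches=8, histories_per_batch=2000):
    """Estimate the fraction of neutrons that escape the square."""
    # Batches are independent, so this loop could be handed to a process pool
    # or mapped to one GPU thread per history.
    counts = [simulate_batch(seed, histories_per_batch) for seed in range(n_batches)]
    return sum(counts) / (n_batches * histories_per_batch)


estimate = leakage_fraction()
```

    Seeding each batch separately keeps the result reproducible regardless of the order in which the batches are executed.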

  4. Massive Parallelism of Monte-Carlo Simulation on Low-End Hardware using Graphic Processing Units

    International Nuclear Information System (INIS)

    Mburu, Joe Mwangi; Hah, Chang Joo Hah

    2014-01-01

    Within the past decade, research has been done on utilizing GPU massive parallelization in core simulation with impressive results; unfortunately, there has been little commercial application in the nuclear field, especially in reactor core simulation. The purpose of this paper is to introduce the topic and illustrate the potential of exploiting the massively parallel nature of GPU computing on a simple Monte Carlo simulation with very minimal hardware specifications. For a comparative analysis, a simple two-dimensional Monte Carlo simulation is implemented for both the CPU and the GPU in order to evaluate the performance gain on each computing device. The heterogeneous platform used in this analysis is a slow notebook with only a 1 GHz processor. The end results are quite surprising: the speedups obtained are almost a factor of 10. In this work, we have utilized heterogeneous computing in a GPU-based approach for calculations of potentially high arithmetic intensity. By running a complex Monte Carlo simulation on the GPU platform, we have sped up the computational process by almost a factor of 10 for one million neutrons. This shows how easy, cheap and efficient it is to use the GPU to accelerate scientific computing, and the results should encourage further exploration of this avenue, especially in nuclear reactor physics simulation, where deterministic and stochastic calculations lend themselves well to parallelization.

  5. Optimization Solutions for Improving the Performance of the Parallel Reduction Algorithm Using Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Ion LUNGU

    2012-01-01

    In this paper, we research, analyze and develop optimization solutions for the parallel reduction function using graphics processing units (GPUs) that implement the Compute Unified Device Architecture (CUDA), a modern and novel approach for improving the software performance of data processing applications and algorithms. Many of these applications and algorithms make use of the reduction function in their computational steps. After designing the function and its algorithmic steps in CUDA, we progressively developed and implemented optimization solutions for the reduction function. In order to confirm, test and evaluate the solutions' efficiency, we developed a custom-tailored benchmark suite. We analyzed the experimental results with regard to: the execution time and bandwidth of graphics processing units covering the main CUDA architectures (Tesla GT200, Fermi GF100, Kepler GK104) compared with a central processing unit; the influence of the data type; and the influence of the binary operator.
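
    For readers unfamiliar with the reduction function, the sketch below shows the basic tree-shaped pattern that parallel reductions build on, written sequentially in Python. It is a generic illustration, not the optimized CUDA code evaluated in the paper; the function name is hypothetical. In each sweep, elements a fixed stride apart are combined, and the stride doubles until one value remains, so n inputs need about log2(n) sweeps.

```python
import operator


def tree_reduce(values, op=operator.add):
    """Reduce `values` with an associative binary `op` using pairwise sweeps."""
    data = list(values)
    if not data:
        raise ValueError("cannot reduce an empty sequence")
    n = len(data)
    stride = 1
    while stride < n:
        # All combinations within one sweep are independent of each other;
        # on a GPU each would be carried out by a separate thread.
        for i in range(0, n - stride, 2 * stride):
            data[i] = op(data[i], data[i + stride])
        stride *= 2
    return data[0]


total = tree_reduce(range(1, 11))        # sum of 1..10
largest = tree_reduce([3, 9, 2, 7], max)  # a different binary operator
```

    The operator only needs to be associative; the left operand always has the lower index, so the order of the inputs is preserved for non-commutative operators such as string concatenation.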

  6. A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale

    Energy Technology Data Exchange (ETDEWEB)

    Moreland, Kenneth [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Geveci, Berk [Kitware, Inc., Clifton Park, NY (United States)

    2014-11-01

    The evolution of the computing world from teraflop to petaflop has been relatively effortless, with several of the existing programming models scaling effectively to the petascale. The migration to exascale, however, poses considerable challenges. All industry trends indicate that the exascale machine will be built using processors containing hundreds to thousands of cores per chip. It can be inferred that efficient concurrency on exascale machines requires a massive number of concurrent threads, each performing many operations on a localized piece of data. Currently, visualization libraries and applications are based on what is known as the visualization pipeline. In the pipeline model, algorithms are encapsulated as filters with inputs and outputs. These filters are connected by setting the output of one component to the input of another. Parallelism in the visualization pipeline is achieved by replicating the pipeline for each processing thread. This works well for today’s distributed memory parallel computers but cannot be sustained when operating on processors with thousands of cores. Our project investigates a new visualization framework designed to exhibit the pervasive parallelism necessary for extreme scale machines. Our framework achieves this by defining algorithms in terms of worklets, which are localized stateless operations. Worklets are atomic operations that execute when invoked, unlike filters, which execute when a pipeline request occurs. The worklet design allows execution on a massive number of lightweight threads with minimal overhead. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for efficient computation on an exascale machine.
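
    The worklet idea, a stateless operation applied independently to each piece of data, can be sketched in a few lines. This is only a conceptual illustration using Python's standard thread pool; `magnitude_worklet` and `dispatch` are hypothetical names and do not correspond to the API of the framework described in the abstract.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def magnitude_worklet(vector):
    """Stateless per-element operation: the result depends only on its input."""
    x, y, z = vector
    return math.sqrt(x * x + y * y + z * z)


def dispatch(worklet, field, max_workers=4):
    """Invoke `worklet` once per element of `field`, preserving element order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worklet, field))


vectors = [(3.0, 4.0, 0.0), (0.0, 0.0, 2.0), (1.0, 2.0, 2.0)]
magnitudes = dispatch(magnitude_worklet, vectors)
```

    Because the worklet keeps no state between invocations, the scheduler is free to run the invocations in any order and on any number of threads.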

  7. When fast logic meets slow belief: Evidence for a parallel-processing model of belief bias.

    Science.gov (United States)

    Trippas, Dries; Thompson, Valerie A; Handley, Simon J

    2017-05-01

    Two experiments pitted the default-interventionist account of belief bias against a parallel-processing model. According to the former, belief bias occurs because a fast, belief-based evaluation of the conclusion pre-empts a working-memory demanding logical analysis. In contrast, according to the latter both belief-based and logic-based responding occur in parallel. Participants were given deductive reasoning problems of variable complexity and instructed to decide whether the conclusion was valid on half the trials or to decide whether the conclusion was believable on the other half. When belief and logic conflict, the default-interventionist view predicts that it should take less time to respond on the basis of belief than logic, and that the believability of a conclusion should interfere with judgments of validity, but not the reverse. The parallel-processing view predicts that beliefs should interfere with logic judgments only if the processing required to evaluate the logical structure exceeds that required to evaluate the knowledge necessary to make a belief-based judgment, and vice versa otherwise. Consistent with this latter view, for the simplest reasoning problems (modus ponens), judgments of belief resulted in lower accuracy than judgments of validity, and believability interfered more with judgments of validity than the converse. For problems of moderate complexity (modus tollens and single-model syllogisms), the interference was symmetrical, in that validity interfered with belief judgments to the same degree that believability interfered with validity judgments. For the most complex (three-term multiple-model syllogisms), conclusion believability interfered more with judgments of validity than vice versa, in spite of the significant interference from conclusion validity on judgments of belief.

  8. The vector and parallel processing of MORSE code on Monte Carlo Machine

    International Nuclear Information System (INIS)

    Hasegawa, Yukihiro; Higuchi, Kenji.

    1995-11-01

    MORSE, a multi-group Monte Carlo code for particle transport, is modified for high performance computing on the Monte Carlo machine Monte-4. The method and the results are described. Monte-4 was specially developed to realize high performance computing of Monte Carlo codes for particle transport, for which it has been difficult to obtain high performance in vector processing on conventional vector processors. Monte-4 has four vector processor units with special hardware called Monte Carlo pipelines. The vectorization and parallelization of the MORSE code and the performance evaluation on Monte-4 are described. (author)

  9. LMFAO! Humor as a Response to Fear: Decomposing Fear Control within the Extended Parallel Process Model

    Science.gov (United States)

    Abril, Eulàlia P.; Szczypka, Glen; Emery, Sherry L.

    2017-01-01

    This study seeks to analyze fear control responses to the 2012 Tips from Former Smokers campaign using the Extended Parallel Process Model (EPPM). The goal is to examine the occurrence of ancillary fear control responses, like humor. In order to explore individuals’ responses in an organic setting, we use Twitter data—tweets—collected via the Firehose. Content analysis of relevant fear control tweets (N = 14,281) validated the existence of boomerang responses within the EPPM: denial, defensive avoidance, and reactance. More importantly, results showed that humor tweets were not only a significant occurrence but constituted the majority of fear control responses. PMID:29527092

  10. Leveraging human oversight and intervention in large-scale parallel processing of open-source data

    Science.gov (United States)

    Casini, Enrico; Suri, Niranjan; Bradshaw, Jeffrey M.

    2015-05-01

    The popularity of cloud computing, along with the increased availability of cheap storage, has led to the need to process and transform large volumes of open-source data, all in parallel. One way to handle such extensive volumes of information properly is to take advantage of distributed computing frameworks like Map-Reduce. Unfortunately, an entirely automated approach that excludes human intervention is often unpredictable and error prone. Highly accurate data processing and decision-making can be achieved by supporting an automatic process through human collaboration, in a variety of environments such as warfare, cyber security and threat monitoring. Although this mutual participation seems easily exploitable, human-machine collaboration in the field of data analysis presents several challenges. First, due to the asynchronous nature of human intervention, it is necessary to verify that once a correction is made, all the necessary reprocessing is carried out along the chain. Second, it is often necessary to minimize the amount of reprocessing in order to optimize the usage of limited resources. To address these strict requirements, this paper introduces improvements to an innovative approach for human-machine collaboration in the processing of large amounts of open-source data in parallel.

  11. Processing optimization with parallel computing for the J-PET scanner

    Directory of Open Access Journals (Sweden)

    Krzemień Wojciech

    2015-12-01

    The Jagiellonian Positron Emission Tomograph (J-PET) collaboration is developing a prototype time-of-flight (TOF) positron emission tomograph (PET) detector based on long polymer scintillators. This novel approach exploits the excellent time properties of the plastic scintillators, which permit very precise time measurements. The very fast field programmable gate array (FPGA)-based front-end electronics and the data acquisition system, as well as low- and high-level reconstruction algorithms, were specially developed to be used with the J-PET scanner. The TOF-PET data processing and reconstruction are time- and resource-demanding operations, especially in the case of a large acceptance detector that works in triggerless data acquisition mode. In this article, we discuss the parallel computing methods applied to optimize the data processing for the J-PET detector. We begin with general concepts of parallel computing and then we discuss several applications of those techniques in the J-PET data processing.

  12. Is orthographic information from multiple parafoveal words processed in parallel: An eye-tracking study.

    Science.gov (United States)

    Cutter, Michael G; Drieghe, Denis; Liversedge, Simon P

    2017-08-01

    In the current study we investigated whether orthographic information available from 1 upcoming parafoveal word influences the processing of another parafoveal word. Across 2 experiments we used the boundary paradigm (Rayner, 1975) to present participants with an identity preview of the 2 words after the boundary (e.g., hot pan), a preview in which 2 letters were transposed between these words (e.g., hop tan), or a preview in which the same 2 letters were substituted (e.g., hob fan). We hypothesized that if these 2 words were processed in parallel in the parafovea then we may observe significant preview benefits for the condition in which the letters were transposed between words relative to the condition in which the letters were substituted. However, no such effect was observed, with participants fixating the words for the same amount of time in both conditions. This was the case both when the transposition was made between the final and first letter of the 2 words (e.g., hop tan as a preview of hot pan; Experiment 1) and when the transposition maintained within word letter position (e.g., pit hop as a preview of hit pop; Experiment 2). The implications of these findings are considered in relation to serial and parallel lexical processing during reading. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  13. Parallel photonic information processing at gigabyte per second data rates using transient states

    Science.gov (United States)

    Brunner, Daniel; Soriano, Miguel C.; Mirasso, Claudio R.; Fischer, Ingo

    2013-01-01

    The increasing demands on information processing require novel computational concepts and true parallelism. Nevertheless, hardware realizations of unconventional computing approaches never exceeded a marginal existence. While the application of optics in super-computing receives reawakened interest, new concepts, partly neuro-inspired, are being considered and developed. Here we experimentally demonstrate the potential of a simple photonic architecture to process information at unprecedented data rates, implementing a learning-based approach. A semiconductor laser subject to delayed self-feedback and optical data injection is employed to solve computationally hard tasks. We demonstrate simultaneous spoken digit and speaker recognition and chaotic time-series prediction at data rates beyond 1Gbyte/s. We identify all digits with very low classification errors and perform chaotic time-series prediction with 10% error. Our approach bridges the areas of photonic information processing, cognitive and information science.

  14. Application of the parallel processing computer to a nuclear disaster prevention support system

    Energy Technology Data Exchange (ETDEWEB)

    Shigehiro, Nukatsuka; Osami, Watanabe [Mitsubishi Heavy Industries, LTD (Japan)

    2003-07-01

    At the time of a nuclear emergency, it is important to identify the type and the cause of the accident. In addition, it is important to provide adequate information to the emergency response organization to support decision making by predicting and evaluating the development of the event and the influence of the release of radioactivity on the environment. Recently, a new type of nuclear disaster prevention support system called MEASURES (Multiple Radiological Emergency Assistance System for Urgent Response) was developed, which provides not only the current state of the nuclear power plant and the influence of the radioactivity on the environment, but also a prediction of how the accident will develop. In order to provide accurate results from these analyses quickly, MEASURES utilizes various techniques, such as a multiple nesting method, which narrows down the calculation area gradually, and a parallel processing computer for three-dimensional analyses, such as air current distribution analysis. In this paper, the outline and features of MEASURES are presented, with a particular focus on the use of the parallel processing computer for the three-dimensional air current distribution analysis. (authors)

  15. A parallel process growth model of avoidant personality disorder symptoms and personality traits.

    Science.gov (United States)

    Wright, Aidan G C; Pincus, Aaron L; Lenzenweger, Mark F

    2013-07-01

    Avoidant personality disorder (AVPD), like other personality disorders, has historically been construed as a highly stable disorder. However, results from a number of longitudinal studies have found that the symptoms of AVPD demonstrate marked change over time. Little is known about which other psychological systems are related to this change. Although cross-sectional research suggests a strong relationship between AVPD and personality traits, no work has examined the relationship of their change trajectories. The current study sought to establish the longitudinal relationship between AVPD and basic personality traits using parallel process growth curve modeling. Parallel process growth curve modeling was applied to the trajectories of AVPD and basic personality traits from the Longitudinal Study of Personality Disorders (Lenzenweger, M. F., 2006, The longitudinal study of personality disorders: History, design considerations, and initial findings. Journal of Personality Disorders, 20, 645-670. doi:10.1521/pedi.2006.20.6.645), a naturalistic, prospective, multiwave, longitudinal study of personality disorder, temperament, and normal personality. The focus of these analyses is on the relationship between the rates of change in both AVPD symptoms and basic personality traits. AVPD symptom trajectories demonstrated significant negative relationships with the trajectories of interpersonal dominance and affiliation, and a significant positive relationship to rates of change in neuroticism. These results provide some of the first compelling evidence that trajectories of change in PD symptoms and personality traits are linked. These results have important implications for the ways in which temporal stability is conceptualized in AVPD specifically, and PD in general.

  16. Online measurement for geometrical parameters of wheel set based on structure light and CUDA parallel processing

    Science.gov (United States)

    Wu, Kaihua; Shao, Zhencheng; Chen, Nian; Wang, Wenjie

    2018-01-01

    The degree of wear of the wheel set tread is one of the main factors that influence the safety and stability of a running train. The geometrical parameters mainly include flange thickness and flange height. Line-structured laser light was projected onto the wheel tread surface, and the geometrical parameters can be deduced from the profile image. An online image acquisition system was designed based on asynchronous reset of the CCD and a CUDA parallel processing unit. Image acquisition was carried out in hardware interrupt mode. A high-efficiency parallel segmentation algorithm based on CUDA was proposed. The algorithm first divides the image into smaller squares, and then extracts the squares of the target by fusing the k-means and STING clustering image segmentation algorithms. The segmentation time is less than 0.97 ms. A considerable acceleration ratio compared with serial calculation on the CPU was obtained, which greatly improved the real-time image processing capacity. When the wheel set is running at a limited speed, the system, placed along the railway line, can measure the geometrical parameters automatically. The maximum measuring speed is 120 km/h.

  17. A Parallel Process Growth Model of Avoidant Personality Disorder Symptoms and Personality Traits

    Science.gov (United States)

    Wright, Aidan G. C.; Pincus, Aaron L.; Lenzenweger, Mark F.

    2012-01-01

    Background: Avoidant personality disorder (AVPD), like other personality disorders, has historically been construed as a highly stable disorder. However, results from a number of longitudinal studies have found that the symptoms of AVPD demonstrate marked change over time. Little is known about which other psychological systems are related to this change. Although cross-sectional research suggests a strong relationship between AVPD and personality traits, no work has examined the relationship of their change trajectories. The current study sought to establish the longitudinal relationship between AVPD and basic personality traits using parallel process growth curve modeling. Methods: Parallel process growth curve modeling was applied to the trajectories of AVPD and basic personality traits from the Longitudinal Study of Personality Disorders (Lenzenweger, 2006), a naturalistic, prospective, multiwave, longitudinal study of personality disorder, temperament, and normal personality. The focus of these analyses is on the relationship between the rates of change in both AVPD symptoms and basic personality traits. Results: AVPD symptom trajectories demonstrated significant negative relationships with the trajectories of interpersonal dominance and affiliation, and a significant positive relationship to rates of change in neuroticism. Conclusions: These results provide some of the first compelling evidence that trajectories of change in PD symptoms and personality traits are linked. These results have important implications for the ways in which temporal stability is conceptualized in AVPD specifically, and PD in general. PMID:22506627

  18. Application of the parallel processing computer to a nuclear disaster prevention support system

    International Nuclear Information System (INIS)

    Shigehiro, Nukatsuka; Osami, Watanabe

    2003-01-01

    In a nuclear emergency, it is important to identify the type and cause of the accident. It is equally important to give the emergency response organization adequate information to support decision making, by predicting and evaluating the development of the event and the influence of the release of radioactivity on the environment. Recently, a new type of nuclear disaster prevention support system called MEASURES (Multiple Radiological Emergency Assistance System for Urgent Response) was developed, which provides not only the current state of the nuclear power plant and the influence of the radioactivity on the environment, but also a prediction of how the accident will develop. To deliver accurate results of these analyses quickly, MEASURES uses several techniques, such as a multiple nesting method that gradually narrows down the calculation area, and a parallel processing computer for three-dimensional analyses such as air current distribution analysis. This paper outlines MEASURES and its features, focusing on the use of the parallel processing computer for the three-dimensional air current distribution analysis. (authors)

  19. Single product lot-sizing on unrelated parallel machines with non-decreasing processing times

    Science.gov (United States)

    Eremeev, A.; Kovalyov, M.; Kuznetsov, P.

    2018-01-01

    We consider a problem in which at least a given quantity of a single product has to be partitioned into lots, and lots have to be assigned to unrelated parallel machines for processing. In one version of the problem, the maximum machine completion time is to be minimized; in another version, the sum of machine completion times is to be minimized. Machine-dependent lower and upper bounds on the lot size are given. The product is assumed to be either continuously divisible or discrete. The processing time of each machine is defined by an increasing function of the lot volume, given as an oracle. Setup times and costs are assumed to be negligibly small and are therefore not considered. We derive optimal polynomial time algorithms for several special cases of the problem. An NP-hard case is shown to admit a fully polynomial time approximation scheme. An application of the problem in energy efficient processor scheduling is considered.

  20. Design and simulation of parallel and distributed architectures for images processing

    International Nuclear Information System (INIS)

    Pirson, Alain

    1990-01-01

    The exploitation of visual information requires special computers. The diversity of operations and the computing power involved bring about structures founded on the concepts of concurrency and distributed processing. This work identifies a vision computer with an association of dedicated intelligent entities, exchanging messages according to the model of parallelism introduced by the language Occam. It puts forward an architecture of the 'enriched processor network' type. It consists of a classical multiprocessor structure where each node is provided with specific devices. These devices perform processing tasks as well as inter-node dialogues. Such an architecture benefits from the homogeneity of multiprocessor networks and the power of dedicated resources. Its implementation corresponds to that of a distributed structure, tasks being allocated to each computing element. This approach culminates in an original architecture called ATILA. This modular structure is based on a transputer network supplied with vision dedicated co-processors and powerful communication devices. (author)

  1. Hardware system of parallel processing for fast CT image reconstruction based on circular shifting float memory architecture

    International Nuclear Information System (INIS)

    Wang Shi; Kang Kejun; Wang Jingjin

    1995-01-01

    Computerized Tomography (CT) is expected to become an inevitable diagnostic technique in the future. However, the long time required to reconstruct an image has been one of the major drawbacks associated with this technique. Parallel processing is one of the best ways to solve this problem. This paper gives the architecture and hardware design of PIRS-4 (4-processor Parallel Image Reconstruction System), a parallel processing system for fast 3D-CT image reconstruction based on a circular shifting float memory architecture. It covers the structure and components of the system, the design of the crossbar switch, and details of the control model. The test results are described.

  2. A program system for ab initio MO calculations on vector and parallel processing machines. Pt. 3

    International Nuclear Information System (INIS)

    Wiest, R.; Demuynck, J.; Benard, M.; Rohmer, M.M.; Ernenwein, R.

    1991-01-01

    This series of three papers presents a program system for ab initio molecular orbital calculations on vector and parallel computers. Part III is devoted to the four-index transformation on a molecular orbital basis of size NMO of the file of two-electron integrals (pq||rs) generated by a contracted Gaussian set of size NATO (number of atomic orbitals). A fast Yoshimine algorithm first sorts the (pq||rs) integrals with respect to index pq only. This file of half-sorted integrals labelled by their rs-index can be processed without further modification to generate either the transformed integrals or the supermatrix elements. The large memory available on the CRAY-2 has made it possible to implement the transformation algorithm proposed by Bender in 1972, which requires a core-storage allocation varying as (NATO)^3. Two versions of Bender's algorithm are included in the present program. The first version is an in-core version, where the complete file of accumulated contributions to transformed integrals is stored and updated in central memory. This version has been parallelized by distributing over a limited number of logical tasks the NATO steps corresponding to the scanning of the outermost loop. The second version is an out-of-core version, in which twin files are alternately used as input and output for the accumulated contributions to transformed integrals. This version is not parallel. The choice of one or the other version and (for version 1) the determination of the number of tasks depend upon the balance between the available and the requested amounts of storage. The storage management and the choice of the proper version are carried out automatically using dynamic storage allocation. Both versions are vectorized and take advantage of the molecular symmetry. (orig.)

  3. Teaching ethics to engineers: ethical decision making parallels the engineering design process.

    Science.gov (United States)

    Bero, Bridget; Kuhlman, Alana

    2011-09-01

    In order to fulfill ABET requirements, Northern Arizona University's Civil and Environmental engineering programs incorporate professional ethics in several of their engineering courses. This paper discusses an ethics module in a 3rd year engineering design course that focuses on the design process and technical writing. Engineering students early in their student careers generally possess good black/white critical thinking skills on technical issues. Engineering design is the first time students are exposed to "grey" technical problems, which have multiple possible solutions. To identify and solve these problems, the engineering design process is used. Ethical problems are also "grey" problems and present similar challenges to students. Students need a practical tool for solving these ethical problems. The step-wise engineering design process was used as a model to demonstrate a similar process for ethical situations. The ethical decision making process of Martin and Schinzinger was adapted for parallelism to the design process and presented to students as a step-wise technique for identification of the pertinent ethical issues, relevant moral theories, possible outcomes and a final decision. Students had the greatest difficulty identifying the broader, global issues presented in an ethical situation, but by the end of the module were better able not only to identify the broader issues, but also to more comprehensively assess specific issues and generate solutions and a desired response to the issue.

  4. Decreasing Data Analytics Time: Hybrid Architecture MapReduce-Massive Parallel Processing for a Smart Grid

    Directory of Open Access Journals (Sweden)

    Abdeslam Mehenni

    2017-03-01

    As our populations grow in a world of limited resources, enterprises seek ways to lighten our load on the planet. The idea of modifying consumer behavior appears as a foundation for smart grids. Enterprises demonstrate the value available from deep analysis of electricity consumption histories, consumers' messages, outage alerts, etc. Enterprises mine massive structured and unstructured data. In a nutshell, smart grids result in a flood of data that needs to be analyzed to better adjust to demand and to give customers more ability to delve into their power consumption. Simply put, smart grids will increasingly have a flexible data warehouse attached to them. The key driver for the adoption of data management strategies is clearly the need to handle and analyze the large amounts of information utilities are now faced with. New approaches to data integration are gaining momentum; Hadoop is in fact now being used by utilities to help manage the huge growth in data whilst maintaining coherence of the data warehouse. In this paper we define a new Meter Data Management System (MDMS) architecture repository that differs from three leading MDMSs, in which we use the MapReduce programming model for ETL and a parallel DBMS for query statements (Massive Parallel Processing, MPP).
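
    The MapReduce-for-ETL idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' architecture: the meter identifiers, the readings and the function names below are hypothetical, and a thread pool merely stands in for a Hadoop cluster. The sketch assumes the ETL step is a per-meter aggregation of consumption readings.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical raw meter readings: (meter_id, kWh). Not data from the paper.
READINGS = [("m1", 1.5), ("m2", 0.5), ("m1", 2.0), ("m3", 4.0), ("m2", 1.0)]


def map_phase(chunk):
    """Map: turn each raw record in one chunk into a (key, value) pair."""
    return [(meter_id, kwh) for meter_id, kwh in chunk]


def shuffle(mapped_chunks):
    """Shuffle: group all emitted values by key across chunks."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(values):
    """Reduce: collapse the values of one key into a single total."""
    return sum(values)


def run_job(readings, n_chunks=2):
    # Split the input so that each chunk can be mapped independently.
    chunks = [readings[i::n_chunks] for i in range(n_chunks)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        mapped = list(pool.map(map_phase, chunks))
    grouped = shuffle(mapped)
    return {key: reduce_phase(values) for key, values in grouped.items()}


totals = run_job(READINGS)
print(totals)
```

    In a warehouse setting of the kind the abstract describes, the per-meter totals would then be loaded into the parallel DBMS for querying; that loading step is outside this sketch.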

  5. Multi-mode sensor processing on a dynamically reconfigurable massively parallel processor array

    Science.gov (United States)

    Chen, Paul; Butts, Mike; Budlong, Brad; Wasson, Paul

    2008-04-01

    This paper introduces a novel computing architecture that can be reconfigured in real time to adapt on demand to multi-mode sensor platforms' dynamic computational and functional requirements. This 1 teraOPS reconfigurable Massively Parallel Processor Array (MPPA) has 336 32-bit processors. The programmable 32-bit communication fabric provides streamlined inter-processor connections with deterministically high performance. Software programmability, scalability, ease of use, and fast reconfiguration time (ranging from microseconds to milliseconds) are the most significant advantages over FPGAs and DSPs. This paper introduces the MPPA architecture, its programming model, and methods of reconfigurability. An MPPA platform for reconfigurable computing is based on a structural object programming model. Objects are software programs running concurrently on hundreds of 32-bit RISC processors and memories. They exchange data and control through a network of self-synchronizing channels. A common application design pattern on this platform, called a work farm, is a parallel set of worker objects, with one input and one output stream. Statically configured work farms with homogeneous and heterogeneous sets of workers have been used in video compression and decompression, network processing, and graphics applications.
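
    The work farm pattern described in this abstract (a parallel set of workers fed by one input stream and drained through one output stream) can be sketched in ordinary software terms. This is an illustration only, assuming Python threads and queues in place of MPPA objects and self-synchronizing channels; the function name work_farm and its parameters are hypothetical.

```python
import queue
import threading


def work_farm(items, worker_fn, n_workers=4):
    """Run worker_fn over items with n_workers workers sharing one input
    queue and one output queue. Results are returned in input order."""
    inbox = queue.Queue()   # the single input stream
    outbox = queue.Queue()  # the single output stream

    def worker():
        while True:
            job = inbox.get()
            if job is None:  # sentinel: no more work for this worker
                break
            index, item = job
            outbox.put((index, worker_fn(item)))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for job in enumerate(items):
        inbox.put(job)
    for _ in threads:
        inbox.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()

    # Workers finish in arbitrary order, so restore the input order.
    results = sorted(outbox.get() for _ in range(len(items)))
    return [value for _, value in results]


squares = work_farm([1, 2, 3, 4, 5], lambda x: x * x)
print(squares)  # [1, 4, 9, 16, 25]
```

    A homogeneous farm passes the same worker_fn to every worker, as here; a heterogeneous farm would give workers different functions, which this sketch does not attempt.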

  6. 3D Body Scanning Measurement System Associated with RF Imaging, Zero-padding and Parallel Processing

    Directory of Open Access Journals (Sweden)

    Kim Hyung Tae

    2016-04-01

    This work presents a novel signal processing method for high-speed 3D body measurements using millimeter waves with a graphics processing unit (GPU) and zero-padding fast Fourier transform (ZPFFT). The proposed measurement system consists of a radio-frequency (RF) antenna array for a penetrable measurement, a high-speed analog-to-digital converter (ADC) for significant data acquisition, and a GPU for fast signal processing. The RF waves of the transmitter and the receiver are converted to real and imaginary signals that are sampled by a high-speed ADC and synchronized with the kinematic positions of the scanner. Because the distance between the surface and the antenna is related to the peak frequency of the conjugate signals, a fast Fourier transform (FFT) is applied to the signal processing after the sampling. The sampling time is finite owing to a short scanning time, and the physical resolution needs to be increased; zero-padding is therefore applied to interpolate the spectra of the sampled signals to consider a 1/m floating point frequency. The GPU and a parallel algorithm are applied to accelerate the ZPFFT because of the large number of additional mathematical operations it requires. 3D body images are finally obtained from spectrograms that arrange the ZPFFT results in a 3D space.
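
    The effect of zero-padding on peak-frequency estimation can be shown with a small sketch. It assumes a single complex tone and uses a direct O(n^2) discrete Fourier transform from the standard library instead of an FFT or a GPU; the sample rate, tone frequency and function names are hypothetical and are not taken from the paper.

```python
import cmath
import math


def dft(samples):
    """Direct O(n^2) discrete Fourier transform (kept simple for clarity)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def peak_frequency(samples, sample_rate, pad_factor=1):
    """Estimate the dominant frequency after zero-padding the record to
    pad_factor times its length, which interpolates the spectrum."""
    padded = list(samples) + [0j] * (len(samples) * (pad_factor - 1))
    magnitudes = [abs(value) for value in dft(padded)]
    k = max(range(len(magnitudes)), key=lambda i: magnitudes[i])
    # Bin spacing shrinks from sample_rate/n to sample_rate/(n * pad_factor).
    return k * sample_rate / len(padded)


# Hypothetical test signal: a complex tone at 10.4 Hz, 64 samples at 64 Hz.
SAMPLE_RATE = 64.0
TONE_HZ = 10.4
signal = [cmath.exp(2j * math.pi * TONE_HZ * t / SAMPLE_RATE) for t in range(64)]

coarse = peak_frequency(signal, SAMPLE_RATE, pad_factor=1)  # 1 Hz bin spacing
fine = peak_frequency(signal, SAMPLE_RATE, pad_factor=8)    # 0.125 Hz bin spacing
print(coarse, fine)
```

    Zero-padding does not add information to the record; it only samples the same underlying spectrum on a finer grid, so the located peak can fall closer to the true frequency. The padded transform is several times larger than the original, which is the extra arithmetic the abstract offloads to the GPU.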

  7. Massively Parallel Signal Processing using the Graphics Processing Unit for Real-Time Brain-Computer Interface Feature Extraction.

    Science.gov (United States)

    Wilson, J Adam; Williams, Justin C

    2009-01-01

    The clock speeds of modern computer processors have nearly plateaued in the past 5 years. Consequently, neural prosthetic systems that rely on processing large quantities of data in a short period of time face a bottleneck, in that it may not be possible to process all of the data recorded from an electrode array with high channel counts and bandwidth, such as electrocorticographic grids or other implantable systems. Therefore, in this study a method of using the processing capabilities of a graphics card [graphics processing unit (GPU)] was developed for real-time neural signal processing of a brain-computer interface (BCI). The NVIDIA CUDA system was used to offload processing to the GPU, which is capable of running many operations in parallel, potentially greatly increasing the speed of existing algorithms. The BCI system records many channels of data, which are processed and translated into a control signal, such as the movement of a computer cursor. This signal processing chain involves computing a matrix-matrix multiplication (i.e., a spatial filter), followed by calculating the power spectral density on every channel using an auto-regressive method, and finally classifying appropriate features for control. In this study, the first two computationally intensive steps were implemented on the GPU, and the speed was compared to both the current implementation and a central processing unit-based implementation that uses multi-threading. Significant performance gains were obtained with GPU processing: the current implementation processed 1000 channels of 250 ms in 933 ms, while the new GPU method took only 27 ms, an improvement of nearly 35 times.
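
    The first step of the signal chain described above, a spatial filter expressed as a matrix-matrix multiplication, can be sketched in plain Python, together with the speedup arithmetic quoted in the abstract. The filter weights and channel data are hypothetical; the study's implementation used NVIDIA CUDA, which this sketch does not reproduce.

```python
def spatial_filter(weights, signals):
    """Multiply an (n_out x n_channels) weight matrix by an
    (n_channels x n_samples) signal matrix.

    Every output element is an independent dot product, which is why this
    step maps well onto many parallel GPU threads."""
    n_samples = len(signals[0])
    return [
        [sum(w * channel[t] for w, channel in zip(row, signals))
         for t in range(n_samples)]
        for row in weights
    ]


# Hypothetical example: two bipolar derivations from three channels.
weights = [[1, -1, 0],
           [0, 1, -1]]
signals = [[1, 2, 3],   # channel 1 samples
           [0, 1, 1],   # channel 2 samples
           [2, 2, 2]]   # channel 3 samples
filtered = spatial_filter(weights, signals)
print(filtered)  # [[1, 1, 2], [-2, -1, -1]]

# Timings reported in the abstract: 933 ms before, 27 ms on the GPU.
speedup = 933 / 27
print(round(speedup, 1))  # 34.6, i.e. "nearly 35 times"
```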

  8. A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale

    Energy Technology Data Exchange (ETDEWEB)

    Ma, Kwan-Liu [Univ. of California, Davis, CA (United States)

    2017-02-01

    Most of today’s visualization libraries and applications are based off of what is known today as the visualization pipeline. In the visualization pipeline model, algorithms are encapsulated as “filtering” components with inputs and outputs. These components can be combined by connecting the outputs of one filter to the inputs of another filter. The visualization pipeline model is popular because it provides a convenient abstraction that allows users to combine algorithms in powerful ways. Unfortunately, the visualization pipeline cannot run effectively on exascale computers. Experts agree that the exascale machine will comprise processors that contain many cores. Furthermore, physical limitations will prevent data movement in and out of the chip (that is, between main memory and the processing cores) from keeping pace with improvements in overall compute performance. To use these processors to their fullest capability, it is essential to carefully consider memory access. This is where the visualization pipeline fails. Each filtering component in the visualization library is expected to take a data set in its entirety, perform some computation across all of the elements, and output the complete results. The process of iterating over all elements must be repeated in each filter, which is one of the worst possible ways to traverse memory when trying to maximize the number of executions per memory access. This project investigates a new type of visualization framework that exhibits a pervasive parallelism necessary to run on exascale machines. Our framework achieves this by defining algorithms in terms of functors, which are localized, stateless operations. Functors can be composited in much the same way as filters in the visualization pipeline. But, functors’ design allows them to be concurrently running on massive amounts of lightweight threads. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for

  9. Massively parallel signal processing using the graphics processing unit for real-time brain-computer interface feature extraction

    Directory of Open Access Journals (Sweden)

    J. Adam Wilson

    2009-07-01

    The clock speeds of modern computer processors have nearly plateaued in the past five years. Consequently, neural prosthetic systems that rely on processing large quantities of data in a short period of time face a bottleneck, in that it may not be possible to process all of the data recorded from an electrode array with high channel counts and bandwidth, such as electrocorticographic grids or other implantable systems. Therefore, in this study a method of using the processing capabilities of a graphics card (GPU) was developed for real-time neural signal processing of a brain-computer interface (BCI). The NVIDIA CUDA system was used to offload processing to the GPU, which is capable of running many operations in parallel, potentially greatly increasing the speed of existing algorithms. The BCI system records many channels of data, which are processed and translated into a control signal, such as the movement of a computer cursor. This signal processing chain involves computing a matrix-matrix multiplication (i.e., a spatial filter), followed by calculating the power spectral density on every channel using an auto-regressive method, and finally classifying appropriate features for control. In this study, the first two computationally-intensive steps were implemented on the GPU, and the speed was compared to both the current implementation and a CPU-based implementation that uses multi-threading. Significant performance gains were obtained with GPU processing: the current implementation processed 1000 channels in 933 ms, while the new GPU method took only 27 ms, an improvement of nearly 35 times.

  10. Parallel assembling and equation solving via graph algorithms with an application to the FE simulation of metal extrusion processes

    CERN Document Server

    Unterkircher, A

    2005-01-01

    We propose methods for parallel assembling and iterative equation solving based on graph algorithms. The assembling technique is independent of dimension, element type and model shape. As a parallel solving technique we construct a multiplicative symmetric Schwarz preconditioner for the conjugate gradient method. Both methods have been incorporated into a non-linear FE code to simulate 3D metal extrusion processes. We illustrate the efficiency of these methods on shared memory computers by realistic examples.

  11. Comparison of microbial community shifts in two parallel multi-step drinking water treatment processes.

    Science.gov (United States)

    Xu, Jiajiong; Tang, Wei; Ma, Jun; Wang, Hong

    2017-07-01

    Drinking water treatment processes remove undesirable chemicals and microorganisms from source water, which is vital to public health protection. The purpose of this study was to investigate the effects of treatment processes and configuration on the microbiome by comparing microbial community shifts in two series of different treatment processes operated in parallel within a full-scale drinking water treatment plant (DWTP) in Southeast China. Illumina sequencing of 16S rRNA genes of water samples demonstrated little effect of coagulation/sedimentation and pre-oxidation steps on bacterial communities, in contrast to dramatic and concurrent microbial community shifts during ozonation, granular activated carbon treatment, sand filtration, and disinfection for both series. A large number of unique operational taxonomic units (OTUs) at these four treatment steps further illustrated their strong shaping power towards the drinking water microbial communities. Interestingly, multidimensional scaling analysis revealed tight clustering of biofilm samples collected from different treatment steps, with Nitrospira, the nitrite-oxidizing bacteria, noted at higher relative abundances in biofilm compared to water samples. Overall, this study provides a snapshot of step-to-step microbial evolvement in multi-step drinking water treatment systems, and the results provide insight to control and manipulation of the drinking water microbiome via optimization of DWTP design and operation.

  12. Individual differences in speech-in-noise perception parallel neural speech processing and attention in preschoolers

    Science.gov (United States)

    Thompson, Elaine C.; Carr, Kali Woodruff; White-Schwoch, Travis; Otto-Meyer, Sebastian; Kraus, Nina

    2016-01-01

    From bustling classrooms to unruly lunchrooms, school settings are noisy. To learn effectively in the unwelcome company of numerous distractions, children must clearly perceive speech in noise. In older children and adults, speech-in-noise perception is supported by sensory and cognitive processes, but the correlates underlying this critical listening skill in young children (3–5 year olds) remain undetermined. Employing a longitudinal design (two evaluations separated by ~12 months), we followed a cohort of 59 preschoolers, ages 3.0–4.9, assessing word-in-noise perception, cognitive abilities (intelligence, short-term memory, attention), and neural responses to speech. Results reveal changes in word-in-noise perception parallel changes in processing of the fundamental frequency (F0), an acoustic cue known for playing a role central to speaker identification and auditory scene analysis. Four unique developmental trajectories (speech-in-noise perception groups) confirm this relationship, in that improvements and declines in word-in-noise perception couple with enhancements and diminishments of F0 encoding, respectively. Improvements in word-in-noise perception also pair with gains in attention. Word-in-noise perception does not relate to strength of neural harmonic representation or short-term memory. These findings reinforce previously-reported roles of F0 and attention in hearing speech in noise in older children and adults, and extend this relationship to preschool children. PMID:27864051

  13. Early and parallel processing of pragmatic and semantic information in speech acts: neurophysiological evidence

    Directory of Open Access Journals (Sweden)

    Natalia Egorova

    2013-03-01

    Although language is a tool for communication, most research in the neuroscience of language has focused on studying words and sentences, while little is known about the brain mechanisms of speech acts, or communicative functions, for which words and sentences are used as tools. Here the neural processing of two types of speech acts, Naming and Requesting, was addressed using the time-resolved event-related potential (ERP) technique. The brain responses for Naming and Request diverged as early as ~120 ms after the onset of the critical words, at the same time as, or even before, the earliest brain manifestations of semantic word properties could be detected. Request-evoked potentials were generally larger in amplitude than those for Naming. The use of identical words in closely matched settings for both speech acts rules out explanation of the difference in terms of phonological, lexical, semantic properties or word expectancy. The cortical sources underlying the ERP enhancement for Requests were found in the fronto-central cortex, consistent with the activation of action knowledge, as well as in right temporo-parietal junction, possibly reflecting additional implications of speech acts for social interaction and theory of mind. These results provide the first evidence for surprisingly early access to pragmatic and social interactive knowledge, which possibly occurs in parallel with other types of linguistic processing, and thus supports the near-simultaneous access to different subtypes of psycholinguistic information.

  14. Parallel processing in the brain’s visual form system: An fMRI study

    Directory of Open Access Journals (Sweden)

    Yoshihito Shigihara

    2014-07-01

    We here extend and complement our earlier time-based, magneto-encephalographic (MEG) study of the processing of forms by the visual brain (Shigihara and Zeki, 2013) with a functional magnetic resonance imaging (fMRI) study, in order to better localize the activity produced in early visual areas when subjects view simple geometric stimuli of increasing perceptual complexity (lines, angles, rhomboids) constituted from the same elements (lines). Our results show that all three categories of form activate all three visual areas with which we were principally concerned (V1, V2, V3), with angles producing the strongest and rhomboids the weakest activity in all three. The difference between the activity produced by angles and rhomboids was significant, that between lines and rhomboids was trend significant, while that between lines and angles was not. Taken together with our earlier MEG results, the present ones suggest that a parallel strategy is used in processing forms, in addition to the well-documented hierarchical strategy.

  15. Parallel and interactive learning processes within the basal ganglia: relevance for the understanding of addiction.

    Science.gov (United States)

    Belin, David; Jonkman, Sietse; Dickinson, Anthony; Robbins, Trevor W; Everitt, Barry J

    2009-04-12

    In this review we discuss the evidence that drug addiction, defined as a maladaptive compulsive habit, results from the progressive subversion by addictive drugs of striatum-dependent operant and Pavlovian learning mechanisms that are usually involved in the control over behaviour by stimuli associated with natural reinforcement. Although mainly organized through segregated parallel cortico-striato-pallido-thalamo-cortical loops involved in motor or emotional functions, the basal ganglia, and especially the striatum, are key mediators of the modulation of behavioural responses, under the control of both action-outcome and stimulus-response mechanisms, by incentive motivational processes and Pavlovian associations. Here we suggest that protracted exposure to addictive drugs recruits serial and dopamine-dependent, striato-nigro-striatal ascending spirals from the nucleus accumbens to more dorsal regions of the striatum that underlie a shift from action-outcome to stimulus-response mechanisms in the control over drug seeking. When this progressive ventral to dorsal striatum shift is combined with drug-associated Pavlovian influences from limbic structures such as the amygdala and the orbitofrontal cortex, drug seeking behaviour becomes established as an incentive habit. This instantiation of implicit sub-cortical processing of drug-associated stimuli and instrumental responding might be a key mechanism underlying the development of compulsive drug seeking and the high vulnerability to relapse which are hallmarks of drug addiction.

  16. Overtaking CPU DBMSes with a GPU in whole-query analytic processing with parallelism-friendly execution plan optimization

    NARCIS (Netherlands)

    A. Agbaria (Adnan); D. Minor (David); N. Peterfreund (Natan); E. Rozenberg (Eyal); O. Rosenberg (Ofer); Huawei Research

    2016-01-01

    Existing work on accelerating analytic DB query processing with (discrete) GPUs fails to fully realize their potential for speedup through parallelism: Published results do not achieve significant speedup over more performant CPU-only DBMSes when processing complete queries. This

  17. The Development of Reading and Spelling in Arabic Orthography: Two Parallel Processes?

    Science.gov (United States)

    Taha, Haitham

    2016-01-01

    The parallels between reading and spelling skills in Arabic were tested. One-hundred forty-three native Arab students, with typical reading development, from second, fourth, and sixth grades were tested with reading, spelling and orthographic decision tasks. The results indicated a full parallel between the reading and spelling performances within…

  18. Information-Limited Parallel Processing in Difficult Heterogeneous Covert Visual Search

    Science.gov (United States)

    Dosher, Barbara Anne; Han, Songmei; Lu, Zhong-Lin

    2010-01-01

    Difficult visual search is often attributed to time-limited serial attention operations, although neural computations in the early visual system are parallel. Using probabilistic search models (Dosher, Han, & Lu, 2004) and a full time-course analysis of the dynamics of covert visual search, we distinguish unlimited capacity parallel versus serial…

  19. Metastable states in the hierarchical Dyson model drive parallel processing in the hierarchical Hopfield network

    International Nuclear Information System (INIS)

    Agliari, Elena; Barra, Adriano; Guerra, Francesco; Galluzzi, Andrea; Tantari, Daniele; Tavani, Flavia

    2015-01-01

    In this paper, we introduce and investigate the statistical mechanics of hierarchical neural networks. First, we approach these systems à la Mattis, by thinking of the Dyson model as a single-pattern hierarchical neural network. We also discuss the stability of different retrievable states as predicted by the related self-consistencies obtained both from a mean-field bound and from a bound that bypasses the mean-field limitation. The latter is worked out by properly reabsorbing the magnetization fluctuations related to higher levels of the hierarchy into effective fields for the lower levels. Remarkably, mixing Amit's ansatz technique for selecting candidate-retrievable states with the interpolation procedure for solving for the free energy of these states, we prove that, due to gauge symmetry, the Dyson model accomplishes both serial and parallel processing. We extend this scenario to multiple stored patterns by implementing the Hebb prescription for learning within the couplings. This results in Hopfield-like networks constrained on a hierarchical topology, for which, by restricting to the low-storage regime where the number of patterns grows at most logarithmically with the number of neurons, we prove the existence of the thermodynamic limit for the free energy, and we give an explicit expression of its mean-field bound and of its related improved bound. The resulting self-consistencies for the Mattis magnetizations, which act as order parameters, are studied, and the stability of solutions is analyzed to get a picture of the overall retrieval capabilities of the system according to both mean-field and non-mean-field scenarios. Our main finding is that embedding the Hebbian rule on a hierarchical topology allows the network to accomplish both serial and parallel processing. By tuning the level of fast noise affecting it or triggering the decay of the interactions with the distance among neurons, the system may switch from sequential retrieval to

  20. Automatic analysis (aa): efficient neuroimaging workflows and parallel processing using Matlab and XML

    Directory of Open Access Journals (Sweden)

    Rhodri Cusack

    2015-01-01

    Full Text Available Recent years have seen neuroimaging data becoming richer, with larger cohorts of participants, a greater variety of acquisition techniques, and increasingly complex analyses. These advances have made data analysis pipelines complex to set up and run (increasing the risk of human error) and time consuming to execute (restricting what analyses are attempted). Here we present an open-source framework, automatic analysis (aa), to address these concerns. Human efficiency is increased by making code modular and reusable, and managing its execution with a processing engine that tracks what has been completed and what needs to be (re)done. Analysis is accelerated by optional parallel processing of independent tasks on cluster or cloud computing resources. A pipeline comprises a series of modules that each perform a specific task. The processing engine keeps track of the data, calculating a map of upstream and downstream dependencies for each module. Existing modules are available for many analysis tasks, such as SPM-based fMRI preprocessing, individual and group level statistics, voxel-based morphometry, tractography, and multi-voxel pattern analyses (MVPA). However, aa also allows for full customization, and encourages efficient management of code: new modules may be written with only a small code overhead. aa has been used by more than 50 researchers in hundreds of neuroimaging studies comprising thousands of subjects. It has been found to be robust, fast and efficient, for simple single-subject studies up to multimodal pipelines on hundreds of subjects. It is attractive to both novice and experienced users. aa can reduce the amount of time neuroimaging laboratories spend performing analyses and reduce errors, expanding the range of scientific questions it is practical to address.

  1. Automatic analysis (aa): efficient neuroimaging workflows and parallel processing using Matlab and XML.

    Science.gov (United States)

    Cusack, Rhodri; Vicente-Grabovetsky, Alejandro; Mitchell, Daniel J; Wild, Conor J; Auer, Tibor; Linke, Annika C; Peelle, Jonathan E

    2014-01-01

    Recent years have seen neuroimaging data sets becoming richer, with larger cohorts of participants, a greater variety of acquisition techniques, and increasingly complex analyses. These advances have made data analysis pipelines complicated to set up and run (increasing the risk of human error) and time consuming to execute (restricting what analyses are attempted). Here we present an open-source framework, automatic analysis (aa), to address these concerns. Human efficiency is increased by making code modular and reusable, and managing its execution with a processing engine that tracks what has been completed and what needs to be (re)done. Analysis is accelerated by optional parallel processing of independent tasks on cluster or cloud computing resources. A pipeline comprises a series of modules that each perform a specific task. The processing engine keeps track of the data, calculating a map of upstream and downstream dependencies for each module. Existing modules are available for many analysis tasks, such as SPM-based fMRI preprocessing, individual and group level statistics, voxel-based morphometry, tractography, and multi-voxel pattern analyses (MVPA). However, aa also allows for full customization, and encourages efficient management of code: new modules may be written with only a small code overhead. aa has been used by more than 50 researchers in hundreds of neuroimaging studies comprising thousands of subjects. It has been found to be robust, fast, and efficient, for simple single-subject studies up to multimodal pipelines on hundreds of subjects. It is attractive to both novice and experienced users. aa can reduce the amount of time neuroimaging laboratories spend performing analyses and reduce errors, expanding the range of scientific questions it is practical to address.
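
    The processing-engine idea in this abstract (modules with upstream dependencies, and a record of what is already complete) can be illustrated with a minimal Python sketch. This is not aa's actual code, which is written in Matlab; the function `run_pipeline` and the module names in the usage note are hypothetical. The sketch only orders modules by their dependencies and skips those already marked done; unlike a real engine, it does not invalidate downstream modules when an upstream one is redone.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(modules, dependencies, done=None):
    """Run each pending module once, in dependency order.

    modules:      dict mapping module name -> zero-argument callable
    dependencies: dict mapping module name -> set of upstream module names
    done:         set of names already completed (updated in place)
    Returns the list of module names executed during this call.
    """
    done = set() if done is None else done
    executed = []
    # Map every module to its upstream modules (empty set if none given).
    graph = {name: dependencies.get(name, set()) for name in modules}
    # static_order() yields upstream modules before the modules that need them.
    for name in TopologicalSorter(graph).static_order():
        if name in done:
            continue  # completed in an earlier run, so skip it
        modules[name]()
        done.add(name)
        executed.append(name)
    return executed
```

    For a linear chain of hypothetical modules such as realign, normalise, smooth and stats, a first call runs all four in order, a second call with the same `done` set runs nothing, and removing "smooth" and "stats" from `done` causes only those two to be re-run.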

  2. Parallel, but Dissociable, Processing in Discrete Corticostriatal Inputs Encodes Skill Learning.

    Science.gov (United States)

    Kupferschmidt, David A; Juczewski, Konrad; Cui, Guohong; Johnson, Kari A; Lovinger, David M

    2017-10-11

    Changes in cortical and striatal function underlie the transition from novel actions to refined motor skills. How discrete, anatomically defined corticostriatal projections function in vivo to encode skill learning remains unclear. Using novel fiber photometry approaches to assess real-time activity of associative inputs from medial prefrontal cortex to dorsomedial striatum and sensorimotor inputs from motor cortex to dorsolateral striatum, we show that associative and sensorimotor inputs co-engage early in action learning and disengage in a dissociable manner as actions are refined. Disengagement of associative, but not sensorimotor, inputs predicts individual differences in subsequent skill learning. Divergent somatic and presynaptic engagement in both projections during early action learning suggests potential learning-related in vivo modulation of presynaptic corticostriatal function. These findings reveal parallel processing within associative and sensorimotor circuits that challenges and refines existing views of corticostriatal function and expose neuronal projection- and compartment-specific activity dynamics that encode and predict action learning. Published by Elsevier Inc.

  3. Flexible parallel implicit modelling of coupled thermal-hydraulic-mechanical processes in fractured rocks

    Science.gov (United States)

    Cacace, Mauro; Jacquey, Antoine B.

    2017-09-01

    Theory and numerical implementation describing groundwater flow and the transport of heat and solute mass in fully saturated fractured rocks with elasto-plastic mechanical feedbacks are developed. In our formulation, fractures are considered as being of lower dimension than the hosting deformable porous rock and we consider their hydraulic and mechanical apertures as scaling parameters to ensure continuous exchange of fluid mass and energy within the fracture-solid matrix system. The coupled system of equations is implemented in a new simulator code that makes use of a Galerkin finite-element technique. The code builds on a flexible, object-oriented numerical framework (MOOSE, Multiphysics Object Oriented Simulation Environment) which provides an extensive scalable parallel and implicit coupling to solve for the multiphysics problem. The governing equations of groundwater flow, heat and mass transport, and rock deformation are solved in a weak sense (either by classical Newton-Raphson or by Jacobian-free inexact Newton-Krylov schemes) on an underlying unstructured mesh. Nonlinear feedbacks among the active processes are enforced by considering evolving fluid and rock properties depending on the thermo-hydro-mechanical state of the system and the local structure, i.e. degree of connectivity, of the fracture system. A suite of applications is presented to illustrate the flexibility and capability of the new simulator to address problems of increasing complexity and occurring at different spatial (from centimetres to tens of kilometres) and temporal scales (from minutes to hundreds of years).
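
    As background to the Newton-Krylov schemes mentioned in this abstract: a Krylov linear solver needs only products of the Jacobian with a vector, and such a product can be approximated by a finite difference of the residual, so the Jacobian matrix itself is never assembled. The Python sketch below illustrates only that one step. It is not the simulator's or MOOSE's implementation, and the two-unknown `residual` function is a made-up example.

```python
def residual(u):
    """Hypothetical two-unknown nonlinear residual F(u), for illustration only."""
    x, y = u
    return [x * x + y - 3.0, x + y * y - 5.0]


def jacobian_vector_product(F, u, v, eps=1e-7):
    """Approximate J(u) @ v by a forward difference, without forming J.

    Uses J(u) v ~= (F(u + eps * v) - F(u)) / eps.
    """
    shifted = [ui + eps * vi for ui, vi in zip(u, v)]
    f0 = F(u)
    f1 = F(shifted)
    return [(b - a) / eps for a, b in zip(f0, f1)]
```

    For this residual the exact Jacobian is [[2x, 1], [1, 2y]], so at u = (1, 2) the product with v = (1, 0) should be close to (2, 1), which gives a quick check of the approximation.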

  4. Parallel processing and learning in simple systems. Final report, 10 January 1986-14 January 1989

    Energy Technology Data Exchange (ETDEWEB)

    Mpitsos, G.J.

    1989-03-15

    Work over the three-year tenure of this grant has dealt with interrelated studies of (1) neuropharmacology, (2) behavior, and (3) distributed/parallel processing in the generation of variable motor patterns in the buccal-oral system of the sea slug Pleurobranchaea californica. (4) Computer simulations of simple neural networks have been undertaken to examine neurointegrative principles that could not be examined in biological preparations. The simulation work has set the basis for further simulations dealing with networks having characteristics relating to real neurons. All of the work has had the goal of developing interdisciplinary tools for understanding the scale-independent problem of how individuals, each possessing only local knowledge of group activity, act within a group to produce different and variable adaptive outputs, and, in turn, of how the group influences the activity of the individual. The pharmacologic studies have had the goal of developing biochemical tools with which to identify groups of neurons that perform specific tasks during the production of a given behavior but are multifunctional by being critically involved in generating several different behaviors.

  5. Flexible parallel implicit modelling of coupled thermal–hydraulic–mechanical processes in fractured rocks

    Directory of Open Access Journals (Sweden)

    M. Cacace

    2017-09-01

    Full Text Available Theory and numerical implementation describing groundwater flow and the transport of heat and solute mass in fully saturated fractured rocks with elasto-plastic mechanical feedbacks are developed. In our formulation, fractures are considered as being of lower dimension than the hosting deformable porous rock and we consider their hydraulic and mechanical apertures as scaling parameters to ensure continuous exchange of fluid mass and energy within the fracture–solid matrix system. The coupled system of equations is implemented in a new simulator code that makes use of a Galerkin finite-element technique. The code builds on a flexible, object-oriented numerical framework (MOOSE, Multiphysics Object Oriented Simulation Environment) which provides an extensive scalable parallel and implicit coupling to solve for the multiphysics problem. The governing equations of groundwater flow, heat and mass transport, and rock deformation are solved in a weak sense (either by classical Newton–Raphson or by Jacobian-free inexact Newton–Krylov schemes) on an underlying unstructured mesh. Nonlinear feedbacks among the active processes are enforced by considering evolving fluid and rock properties depending on the thermo-hydro-mechanical state of the system and the local structure, i.e. degree of connectivity, of the fracture system. A suite of applications is presented to illustrate the flexibility and capability of the new simulator to address problems of increasing complexity and occurring at different spatial (from centimetres to tens of kilometres) and temporal scales (from minutes to hundreds of years).

  6. Comparison of Efficacy and Threat Perception Processes in Predicting Smoking among University Students Based on Extended Parallel Process Model

    Directory of Open Access Journals (Sweden)

    S. Bashirian

    2014-04-01

    Full Text Available Introduction & Objective: The survey of smoking as the most toxic, common and cheapest addiction, and its psychological and demographic variables, especially among the youth who are efficient and constructive individuals of the society, is of great importance. This study was performed to compare efficacy and threat perception in predicting cigarette smoking among university students based on the Extended Parallel Process Model (EPPM). Material & Methods: This cross-sectional descriptive study was carried out on 700 college students of Hamadan recruited with a stratified sampling method. The participants completed a self-administered questionnaire including demographic characteristics, smoking status and EPPM. Data analysis was done with the SPSS software (version 16), using t-test, one-way ANOVA, Pearson correlation and logistic regression methods. Results: The average scores of threat and efficacy perception were 39.7 and 38.6, respectively. The prevalence of cigarette smoking among participants was 27.1 percent. Also, there were significant differences between the average score of efficacy perception and age, gender, history of drug abuse and dwelling of students (P<0.05). Efficacy and threat perception both predicted student cigarette smoking. Conclusions: Cognitive mediating process of threat perception was a more powerful predictor of cigarette smoking as an unsafe behavior. Therefore, increasing self-efficacy and response efficacy of university students aimed at facilitating the acceptance of safe behavior could be noteworthy as a principle in education. (Sci J Hamadan Univ Med Sci 2014; 21 (1): 58-65)

  7. Implementation science: a role for parallel dual processing models of reasoning?

    Directory of Open Access Journals (Sweden)

    Phillips Paddy A

    2006-05-01

    Full Text Available Abstract Background A better theoretical base for understanding professional behaviour change is needed to support evidence-based changes in medical practice. Traditionally strategies to encourage changes in clinical practices have been guided empirically, without explicit consideration of underlying theoretical rationales for such strategies. This paper considers a theoretical framework for reasoning from within psychology for identifying individual differences in cognitive processing between doctors that could moderate the decision to incorporate new evidence into their clinical decision-making. Discussion Parallel dual processing models of reasoning posit two cognitive modes of information processing that are in constant operation as humans reason. One mode has been described as experiential, fast and heuristic; the other as rational, conscious and rule based. Within such models, the uptake of new research evidence can be represented by the latter mode; it is reflective, explicit and intentional. On the other hand, well practiced clinical judgments can be positioned in the experiential mode, being automatic, reflexive and swift. Research suggests that individual differences between people in both cognitive capacity (e.g., intelligence) and cognitive processing (e.g., thinking styles) influence how both reasoning modes interact. This being so, it is proposed that these same differences between doctors may moderate the uptake of new research evidence. Such dispositional characteristics have largely been ignored in research investigating effective strategies in implementing research evidence. Whilst medical decision-making occurs in a complex social environment with multiple influences and decision makers, it remains true that an individual doctor's judgment still retains a key position in terms of diagnostic and treatment decisions for individual patients. This paper argues therefore, that individual differences between doctors in terms of

  8. Implementation science: a role for parallel dual processing models of reasoning?

    Science.gov (United States)

    Sladek, Ruth M; Phillips, Paddy A; Bond, Malcolm J

    2006-05-25

    A better theoretical base for understanding professional behaviour change is needed to support evidence-based changes in medical practice. Traditionally strategies to encourage changes in clinical practices have been guided empirically, without explicit consideration of underlying theoretical rationales for such strategies. This paper considers a theoretical framework for reasoning from within psychology for identifying individual differences in cognitive processing between doctors that could moderate the decision to incorporate new evidence into their clinical decision-making. Parallel dual processing models of reasoning posit two cognitive modes of information processing that are in constant operation as humans reason. One mode has been described as experiential, fast and heuristic; the other as rational, conscious and rule based. Within such models, the uptake of new research evidence can be represented by the latter mode; it is reflective, explicit and intentional. On the other hand, well practiced clinical judgments can be positioned in the experiential mode, being automatic, reflexive and swift. Research suggests that individual differences between people in both cognitive capacity (e.g., intelligence) and cognitive processing (e.g., thinking styles) influence how both reasoning modes interact. This being so, it is proposed that these same differences between doctors may moderate the uptake of new research evidence. Such dispositional characteristics have largely been ignored in research investigating effective strategies in implementing research evidence. Whilst medical decision-making occurs in a complex social environment with multiple influences and decision makers, it remains true that an individual doctor's judgment still retains a key position in terms of diagnostic and treatment decisions for individual patients. This paper argues therefore, that individual differences between doctors in terms of reasoning are important considerations in any

  9. The parallel processing system for fast 3D-CT image reconstruction by circular shifting float memory architecture

    International Nuclear Information System (INIS)

    Wang Shi; Kang Kejun; Wang Jingjin

    1996-01-01

    Computerized Tomography (CT) is expected to become an inevitable diagnostic technique in the future. However, the long time required to reconstruct an image has been one of the major drawbacks associated with this technique. Parallel processing is one of the best ways to solve this problem. This paper gives the architecture, hardware and software design of PIRS-4 (4-processor Parallel Image Reconstruction System), which is a parallel processing system for fast 3D-CT image reconstruction by circular shifting float memory architecture. It includes the structure and components of the system, the design of the crossbar switch and details of the control model, the description of RPBP image reconstruction, the choice of OS (operating system) and language, the principle of imitating EMS, direct memory R/W of float and programming in protected mode. Finally, the test results are given.

  10. Reconstruction for Time-Domain In Vivo EPR 3D Multigradient Oximetric Imaging—A Parallel Processing Perspective

    Directory of Open Access Journals (Sweden)

    Christopher D. Dharmaraj

    2009-01-01

    Full Text Available Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. It is also possible with fast imaging to track the changes in tissue oxygenation in response to the oxygen content in the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment involving digital band pass filtering and background noise subtraction, followed by 3D Fourier reconstruction. This process is rather slow in a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared memory processors, to reduce the computational burden of the filtration task significantly. The results show that the parallel code for filtration has achieved a speed up factor of 46.66 as against the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved during the reconstruction process and oximetry computation, for a data set with 23×23×23 gradient steps. The execution time has been computed for both the serial and parallel implementations using different dimensions of the data and presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through local internet (NIHnet). The experimental results demonstrate that the parallel computing provides a source of high computational power to obtain biophysical parameters from 3D EPR oximetric imaging, almost in real-time.

  11. Reconstruction for time-domain in vivo EPR 3D multigradient oximetric imaging--a parallel processing perspective.

    Science.gov (United States)

    Dharmaraj, Christopher D; Thadikonda, Kishan; Fletcher, Anthony R; Doan, Phuc N; Devasahayam, Nallathamby; Matsumoto, Shingo; Johnson, Calvin A; Cook, John A; Mitchell, James B; Subramanian, Sankaran; Krishna, Murali C

    2009-01-01

    Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. It is also possible with fast imaging to track the changes in tissue oxygenation in response to the oxygen content in the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment involving digital band pass filtering and background noise subtraction, followed by 3D Fourier reconstruction. This process is rather slow in a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared memory processors, to reduce the computational burden of the filtration task significantly. The results show that the parallel code for filtration has achieved a speed up factor of 46.66 as against the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved during the reconstruction process and oximetry computation, for a data set with 23 x 23 x 23 gradient steps. The execution time has been computed for both the serial and parallel implementations using different dimensions of the data and presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through local internet (NIHnet). The experimental results demonstrate that the parallel computing provides a source of high computational power to obtain biophysical parameters from 3D EPR oximetric imaging, almost in real-time.
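
    The speedup figures quoted in this abstract can be put in context with the usual definitions: speedup is serial time divided by parallel time, and parallel efficiency is speedup divided by the number of workers. The Python sketch below applies them to the reported filtration figure, assuming that four dual-core processors provide 8 cores in total; the function names are illustrative and do not come from the paper.

```python
def speedup(t_serial, t_parallel):
    """Speedup of a parallel run relative to a serial baseline."""
    return t_serial / t_parallel


def efficiency(speedup_factor, n_workers):
    """Parallel efficiency: speedup per worker (1.0 means ideal linear scaling)."""
    return speedup_factor / n_workers


# Reported filtration speedup of 46.66, on an assumed total of 4 x 2 = 8 cores.
filtration_efficiency = efficiency(46.66, 8)
```

    An efficiency well above 1.0 here suggests that the 46.66 figure reflects more than parallelism alone, which is consistent with the abstract's statement that the baseline is serial MATLAB code while the parallel version is compiled C++.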

  12. Study on Parallel Processing for Efficient Flexible Multibody Analysis based on Subsystem Synthesis Method

    Energy Technology Data Exchange (ETDEWEB)

    Han, Jong-Boo; Song, Hajun; Kim, Sung-Soo [Chungnam Nat’l Univ., Daejeon (Korea, Republic of)]

    2017-06-15

    Flexible multibody simulations are widely used in the industry to design mechanical systems. In flexible multibody dynamics, deformation coordinates are described either relatively in the body reference frame that is floating in the space or in the inertial reference frame. Moreover, these deformation coordinates are generated based on the discretization of the body according to the finite element approach. Therefore, the formulation of the flexible multibody system always deals with a huge number of degrees of freedom and the numerical solution methods require a substantial amount of computational time. Parallel computational methods are a solution for efficient computation. However, most of the parallel computational methods are focused on the efficient solution of large-sized linear equations. For multibody analysis, we need to develop an efficient formulation that could be suitable for parallel computation. In this paper, we developed a subsystem synthesis method for a flexible multibody system and proposed efficient parallel computational schemes based on the OpenMP API in order to achieve efficient computation. Simulations of a rotating blade system, which consists of three identical blades, were carried out with two different parallel computational schemes. Actual CPU times were measured to investigate the efficiency of the proposed parallel schemes.

  13. The Processing of Somatosensory Information Shifts from an Early Parallel into a Serial Processing Mode: A Combined fMRI/MEG Study.

    Science.gov (United States)

    Klingner, Carsten M; Brodoehl, Stefan; Huonker, Ralph; Witte, Otto W

    2016-01-01

    The question regarding whether somatosensory inputs are processed in parallel or in series has not been clearly answered. Several studies that have applied dynamic causal modeling (DCM) to fMRI data have arrived at seemingly divergent conclusions. However, these divergent results could be explained by the hypothesis that the processing route of somatosensory information changes with time. Specifically, we suggest that somatosensory stimuli are processed in parallel only during the early stage, whereas the processing is later dominated by serial processing. This hypothesis was revisited in the present study based on fMRI analyses of tactile stimuli and the application of DCM to magnetoencephalographic (MEG) data collected during sustained (260 ms) tactile stimulation. Bayesian model comparisons were used to infer the processing stream. We demonstrated that the favored processing stream changes over time. We found that the neural activity elicited in the first 100 ms following somatosensory stimuli is best explained by models that support a parallel processing route, whereas a serial processing route is subsequently favored. These results suggest that the secondary somatosensory area (SII) receives information regarding a new stimulus in parallel with the primary somatosensory area (SI), whereas later processing in the SII is dominated by the preprocessed input from the SI.

  14. The Processing of Somatosensory Information shifts from an early parallel into a serial processing mode: a combined fMRI/MEG study.

    Directory of Open Access Journals (Sweden)

    Carsten Michael Klingner

    2016-12-01

    Full Text Available The question regarding whether somatosensory inputs are processed in parallel or in series has not been clearly answered. Several studies that have applied dynamic causal modeling (DCM) to fMRI data have arrived at seemingly divergent conclusions. However, these divergent results could be explained by the hypothesis that the processing route of somatosensory information changes with time. Specifically, we suggest that somatosensory stimuli are processed in parallel only during the early stage, whereas the processing is later dominated by serial processing. This hypothesis was revisited in the present study based on fMRI analyses of tactile stimuli and the application of DCM to magnetoencephalographic (MEG) data collected during sustained (260 ms) tactile stimulation. Bayesian model comparisons were used to infer the processing stream. We demonstrated that the favored processing stream changes over time. We found that the neural activity elicited in the first 100 ms following somatosensory stimuli is best explained by models that support a parallel processing route, whereas a serial processing route is subsequently favored. These results suggest that the secondary somatosensory area (SII) receives information regarding a new stimulus in parallel with the primary somatosensory area (SI), whereas later processing in the SII is dominated by the preprocessed input from the SI.

  15. Parallel-hierarchical processing and classification of laser beam profile images based on the GPU-oriented architecture

    Science.gov (United States)

    Yarovyi, Andrii A.; Timchenko, Leonid I.; Kozhemiako, Volodymyr P.; Kokriatskaia, Nataliya I.; Hamdi, Rami R.; Savchuk, Tamara O.; Kulyk, Oleksandr O.; Surtel, Wojciech; Amirgaliyev, Yedilkhan; Kashaganova, Gulzhan

    2017-08-01

    The paper deals with the problem of insufficient productivity of existing computer means for large image processing, which do not meet modern requirements posed by resource-intensive computing tasks of laser beam profiling. The research concentrated on one of the profiling problems, namely, real-time processing of spot images of the laser beam profile. Development of a theory of parallel-hierarchic transformation made it possible to produce models for high-performance parallel-hierarchical processes, as well as algorithms and software for their implementation based on the GPU-oriented architecture using GPGPU technologies. The analyzed performance of the suggested computerized tools for processing and classification of laser beam profile images makes it possible to perform real-time processing of dynamic images of various sizes.

  16. PARALLEL PROCESSING OF BIG POINT CLOUDS USING Z-ORDER-BASED PARTITIONING

    Directory of Open Access Journals (Sweden)

    C. Alis

    2016-06-01

    Full Text Available As laser scanning technology improves and costs are coming down, the amount of point cloud data being generated can be prohibitively difficult and expensive to process on a single machine. This data explosion is not only limited to point cloud data. Voluminous amounts of high-dimensionality and quickly accumulating data, collectively known as Big Data, such as those generated by social media, Internet of Things devices and commercial transactions, are becoming more prevalent as well. New computing paradigms and frameworks are being developed to efficiently handle the processing of Big Data, many of which utilize a compute cluster composed of several commodity grade machines to process chunks of data in parallel. A central concept in many of these frameworks is data locality. By its nature, Big Data is large enough that the entire dataset would not fit on the memory and hard drives of a single node hence replicating the entire dataset to each worker node is impractical. The data must then be partitioned across worker nodes in a manner that minimises data transfer across the network. This is a challenge for point cloud data because there exist different ways to partition data and they may require data transfer. We propose a partitioning based on Z-order which is a form of locality-sensitive hashing. The Z-order or Morton code is computed by dividing each dimension to form a grid then interleaving the binary representation of each dimension. For example, the Z-order code for the grid square with coordinates (x = 1 = 01₂, y = 3 = 11₂) is 1011₂ = 11. The number of points in each partition is controlled by the number of bits per dimension: the more bits, the fewer the points. The number of bits per dimension also controls the level of detail with more bits yielding finer partitioning. We present this partitioning method by implementing it on Apache Spark and investigating how different parameters affect the accuracy and running time of the k nearest

  17. Parallel Processing of Big Point Clouds Using Z-Order Partitioning

    Science.gov (United States)

    Alis, C.; Boehm, J.; Liu, K.

    2016-06-01

    As laser scanning technology improves and costs are coming down, the amount of point cloud data being generated can be prohibitively difficult and expensive to process on a single machine. This data explosion is not only limited to point cloud data. Voluminous amounts of high-dimensionality and quickly accumulating data, collectively known as Big Data, such as those generated by social media, Internet of Things devices and commercial transactions, are becoming more prevalent as well. New computing paradigms and frameworks are being developed to efficiently handle the processing of Big Data, many of which utilize a compute cluster composed of several commodity grade machines to process chunks of data in parallel. A central concept in many of these frameworks is data locality. By its nature, Big Data is large enough that the entire dataset would not fit on the memory and hard drives of a single node hence replicating the entire dataset to each worker node is impractical. The data must then be partitioned across worker nodes in a manner that minimises data transfer across the network. This is a challenge for point cloud data because there exist different ways to partition data and they may require data transfer. We propose a partitioning based on Z-order which is a form of locality-sensitive hashing. The Z-order or Morton code is computed by dividing each dimension to form a grid then interleaving the binary representation of each dimension. For example, the Z-order code for the grid square with coordinates (x = 1 = 01₂, y = 3 = 11₂) is 1011₂ = 11. The number of points in each partition is controlled by the number of bits per dimension: the more bits, the fewer the points. The number of bits per dimension also controls the level of detail with more bits yielding finer partitioning. We present this partitioning method by implementing it on Apache Spark and investigating how different parameters affect the accuracy and running time of the k nearest neighbour algorithm
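
    The bit interleaving described in this abstract can be sketched in a few lines of Python. This is an illustration only, not the authors' Spark implementation: the function names `morton_code`, `grid_cell` and `partition_key`, the unit-square bounds, and the choice to place the y bit above the x bit in each pair (which reproduces the abstract's worked example) are all assumptions.

```python
def morton_code(x, y, bits):
    """Interleave the low `bits` bits of grid indices x and y.

    Assumption: in each bit pair, the y bit takes the higher position,
    which matches the worked example (x = 1, y = 3 gives 11).
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)        # x bit goes to even position
        code |= ((y >> i) & 1) << (2 * i + 1)    # y bit goes to odd position
    return code


def grid_cell(value, lo, hi, bits):
    """Map a coordinate in [lo, hi] to a grid index in [0, 2**bits - 1]."""
    cells = 1 << bits
    index = int((value - lo) / (hi - lo) * cells)
    return min(index, cells - 1)  # clamp value == hi into the last cell


def partition_key(point, bits, lo=0.0, hi=1.0):
    """Hypothetical partition key: Morton code of the grid square holding the point."""
    x, y = point
    return morton_code(grid_cell(x, lo, hi, bits), grid_cell(y, lo, hi, bits), bits)


if __name__ == "__main__":
    print(morton_code(1, 3, bits=2))           # 11, the example from the abstract
    print(partition_key((0.3, 0.9), bits=2))   # the same grid square (1, 3)
```

    With more bits per dimension the grid is finer, so each key covers a smaller square and fewer points fall into each partition, as the abstract states.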

  18. Prediction of Adequate Prenatal Care Utilization Based on the Extended Parallel Process Model.

    Science.gov (United States)

    Hajian, Sepideh; Imani, Fatemeh; Riazi, Hedyeh; Salmani, Fatemeh

    2017-10-01

    Pregnancy complications are one of the major public health concerns. One of the main causes of preventable complications is the absence of or inadequate provision of prenatal care. The present study was conducted to investigate whether Extended Parallel Process Model's constructs can predict the utilization of prenatal care services. The present longitudinal prospective study was conducted on 192 pregnant women selected through the multi-stage sampling of health facilities in Qeshm, Hormozgan province, from April to June 2015. Participants were followed up from the first half of pregnancy until their childbirth to assess adequate or inadequate/non-utilization of prenatal care services. Data were collected using the structured Risk Behavior Diagnosis Scale. The analysis of the data was carried out in SPSS-22 using one-way ANOVA, linear regression and logistic regression analysis. The level of significance was set at 0.05. In total, 178 pregnant women with a mean age of 25.31±5.42 completed the study. Perceived self-efficacy (OR=25.23; Pprenatal care. Husband's occupation in the labor market (OR=0.43; P=0.02), unwanted pregnancy (OR=0.352; Pcare for the minors or elderly at home (OR=0.35; P=0.045) were associated with lower odds of receiving prenatal care. The model showed that when perceived efficacy of the prenatal care services overcame the perceived threat, the likelihood of prenatal care usage increased. This study identified some modifiable factors associated with prenatal care usage by women, providing key targets for appropriate clinical interventions.

  19. Parallel processing streams for motor output and sensory prediction during action preparation.

    Science.gov (United States)

    Stenner, Max-Philipp; Bauer, Markus; Heinze, Hans-Jochen; Haggard, Patrick; Dolan, Raymond J

    2015-03-15

    Sensory consequences of one's own actions are perceived as less intense than identical, externally generated stimuli. This is generally taken as evidence for sensory prediction of action consequences. Accordingly, recent theoretical models explain this attenuation by an anticipatory modulation of sensory processing prior to stimulus onset (Roussel et al. 2013) or even action execution (Brown et al. 2013). Experimentally, prestimulus changes that occur in anticipation of self-generated sensations are difficult to disentangle from more general effects of stimulus expectation, attention and task load (performing an action). Here, we show that an established manipulation of subjective agency over a stimulus leads to a predictive modulation in sensory cortex that is independent of these factors. We recorded magnetoencephalography while subjects performed a simple action with either hand and judged the loudness of a tone caused by the action. Effector selection was manipulated by subliminal motor priming. Compatible priming is known to enhance a subjective experience of agency over a consequent stimulus (Chambon and Haggard 2012). In line with this effect on subjective agency, we found stronger sensory attenuation when the action that caused the tone was compatibly primed. This perceptual effect was reflected in a transient phase-locked signal in auditory cortex before stimulus onset and motor execution. Interestingly, this sensory signal emerged at a time when the hemispheric lateralization of motor signals in M1 indicated ongoing effector selection. Our findings confirm theoretical predictions of a sensory modulation prior to self-generated sensations and support the idea that a sensory prediction is generated in parallel to motor output (Walsh and Haggard 2010), before an efference copy becomes available. Copyright © 2015 the American Physiological Society.

  20. Parallel processing of information about location in the amygdala, entorhinal cortex and hippocampus.

    Science.gov (United States)

    Gaskin, Stephane; White, Norman M

    2013-11-01

    The conditioned cue preference paradigm was used to study how rats use extra-maze cues to discriminate between 2 adjacent arms on an 8-arm radial maze, a situation in which most of the same cues can be seen from both arms but only one arm contains food. Since the food-restricted rats eat while passively confined on the food-paired arm no responses are reinforced, so the discrimination is due to Pavlovian stimulus-reward (or outcome) learning. Consistent with other evidence that rats must move around in an environment to acquire a spatial map, we found that learning the adjacent arms CCP (ACCP) required a minimum amount of active exploration of the maze with no reinforcers present prior to passive pairing of the extra-maze cues with the food reinforcer, an instance of latent learning. Temporary inactivation of the hippocampus during the pre-exposure sessions had no effect on ACCP learning, confirming other evidence that the hippocampus is not involved in latent learning. A series of experiments identified a circuit involving fimbria-fornix and dorsal entorhinal cortex as the neural basis of latent learning in this situation. In contrast, temporary inactivation of the entorhinal cortex or hippocampus during passive training or during testing blocked ACCP learning and expression, respectively, suggesting that these two structures co-operate in using spatial information to learn the location of food on the maze during passive pairing and to express this combined information during testing. In parallel with these processes we found that the amygdala processes information leading to an equal tendency to enter both adjacent arms (even though only one was paired with food) suggesting that the stimulus information available to this structure is not sufficiently precise to discriminate between the ambiguous cues visible from the adjacent arms. Expression of the ACCP in normal rats depends on hippocampus-based learning to avoid the unpaired arm which competes with the

  1. Simulating electron wave dynamics in graphene superlattices exploiting parallel processing advantages

    Science.gov (United States)

    Rodrigues, Manuel J.; Fernandes, David E.; Silveirinha, Mário G.; Falcão, Gabriel

    2018-01-01

    This work introduces a parallel computing framework to characterize the propagation of electron waves in graphene-based nanostructures. The electron wave dynamics is modeled using both "microscopic" and effective medium formalisms and the numerical solution of the two-dimensional massless Dirac equation is determined using a Finite-Difference Time-Domain scheme. The propagation of electron waves in graphene superlattices with localized scattering centers is studied, and the role of the symmetry of the microscopic potential in the electron velocity is discussed. The computational methodologies target the parallel capabilities of heterogeneous multi-core CPU and multi-GPU environments and are built with the OpenCL parallel programming framework which provides a portable, vendor agnostic and high throughput-performance solution. The proposed heterogeneous multi-GPU implementation achieves speedup ratios up to 75x when compared to multi-thread and multi-core CPU execution, reducing simulation times from several hours to a couple of minutes.

  2. Belief–logic conflict resolution in syllogistic reasoning: Inspection-time evidence for a parallel process model

    OpenAIRE

    Stupple, Edward J.N; Ball, Linden

    2008-01-01

    An experiment is reported examining dual-process models of belief bias in syllogistic reasoning using a problem complexity manipulation and an inspection-time method to monitor processing latencies for premises and conclusions. Endorsement rates indicated increased belief bias on complex problems, a finding that runs counter to the “belief-first” selective scrutiny model, but which is consistent with other theories, including “reasoning-first” and “parallel-process” models. Inspection-time da...

  3. Analysis of parameters for technological equipment of parallel kinematics based on rods of variable length for processing accuracy assurance

    Science.gov (United States)

    Koltsov, A. G.; Shamutdinov, A. H.; Blokhin, D. A.; Krivonos, E. V.

    2018-01-01

    A new classification of parallel kinematics mechanisms on symmetry coefficient, being proportional to mechanism stiffness and accuracy of the processing product using the technological equipment under study, is proposed. A new version of the Stewart platform with a high symmetry coefficient is presented for analysis. The workspace of the mechanism under study is described, this space being a complex solid figure. The workspace end points are reached by the center of the mobile platform which moves in parallel related to the base plate. Parameters affecting the processing accuracy, namely the static and dynamic stiffness, natural vibration frequencies are determined. The capability assessment of the mechanism operation under various loads, taking into account resonance phenomena at different points of the workspace, was conducted. The study proved that stiffness and therefore, processing accuracy with the use of the above mentioned mechanisms are comparable with the stiffness and accuracy of medium-sized series-produced machines.

  4. Introduction to parallel programming

    CERN Document Server

    Brawer, Steven

    1989-01-01

    Introduction to Parallel Programming focuses on the techniques, processes, methodologies, and approaches involved in parallel programming. The book first offers information on Fortran, hardware and operating system models, and processes, shared memory, and simple parallel programs. Discussions focus on processes and processors, joining processes, shared memory, time-sharing with multiple processors, hardware, loops, passing arguments in function/subroutine calls, program structure, and arithmetic expressions. The text then elaborates on basic parallel programming techniques, barriers and race

  5. The application of image processing in the measurement for three-light-axis parallelity of laser ranger

    Science.gov (United States)

    Wang, Yang; Wang, Qianqian

    2008-12-01

    When a laser ranger is transported or used in field operations, the transmitting, receiving and aiming axes may no longer be parallel. Nonparallelism of the three light axes degrades the range-measuring ability or prevents the laser ranger from operating accurately. Testing and adjusting the three-light-axis parallelity during production and maintenance is therefore important for reliable use of the laser ranger. The paper proposes a new measurement method using digital image processing, based on a comparison of some common measurement methods for three-light-axis parallelity. It uses a large-aperture off-axis paraboloid reflector to obtain images of the laser spot and the white-light cross line, and then processes the images on the LabVIEW platform. The center of the white-light cross line is found by the matching algorithm in a LabVIEW DLL, and the center of the laser spot is found by gradation transformation, binarization and area filtering in turn. The software system can set the CCD, detect the off-axis paraboloid reflector, measure the parallelity of the transmitting and aiming axes, and control the attenuation device. The hardware system uses the SAA7111A, a programmable video decoding chip, to perform A/D conversion. A FIFO (first-in first-out) buffer is used, and a USB bus transmits data to the PC. The three-light-axis parallelity is obtained from the position bias between the axes. A device based on this method is already in use, and its application shows that the method offers high precision, speed and automation.

  6. Distributed and cloud computing from parallel processing to the Internet of Things

    CERN Document Server

    Hwang, Kai; Fox, Geoffrey C

    2012-01-01

    Distributed and Cloud Computing, named a 2012 Outstanding Academic Title by the American Library Association's Choice publication, explains how to create high-performance, scalable, reliable systems, exposing the design principles, architecture, and innovative applications of parallel, distributed, and cloud computing systems. Starting with an overview of modern distributed models, the book provides comprehensive coverage of distributed and cloud computing, including: Facilitating management, debugging, migration, and disaster recovery through virtualization Clustered systems for resear

  7. A program system for ab initio MO calculations on vector and parallel processing machines. Pt. 1

    International Nuclear Information System (INIS)

    Ernenwein, R.; Rohmer, M.M.; Benard, M.

    1990-01-01

    We present a program system for ab initio molecular orbital calculations on vector and parallel computers. The present article is devoted to the computation of one- and two-electron integrals over contracted Gaussian basis sets involving s-, p-, d- and f-type functions. The McMurchie and Davidson (MMD) algorithm has been implemented and parallelized by distributing over a limited number of logical tasks the calculation of the 55 relevant classes of integrals. All sections of the MMD algorithm have been efficiently vectorized, leading to a scalar/vector ratio of 5.8. Different algorithms are proposed and compared for an optimal vectorization of the contraction of the 'intermediate integrals' generated by the MMD formalism. Advantage is taken of the dynamic storage allocation for tuning the length of the vector loops (i.e. the size of the vectorization buffer) as a function of (i) the total memory available for the job, (ii) the number of logical tasks defined by the user (≤13), and (iii) the storage requested by each specific class of integrals. Test calculations carried out on a CRAY-2 computer show that the average number of finite integrals computed over a (s, p, d, f) CGTO basis set is about 1180000 per second and per processor. The combination of vectorization and parallelism on this 4-processor machine reduces the CPU time by a factor larger than 20 with respect to the scalar and sequential performance. (orig.)

  8. Parallel processing for a 1-D time-dependent solution to impurity rate equations for fusion plasma simulations

    International Nuclear Information System (INIS)

    Veerasingam, R.

    1990-01-01

    In fusion plasmas impurities such as carbon, oxygen or nickel can contaminate the plasma and cause degradation of the performance of a fusion device through radiation. However, impurities can also be used as diagnostics to obtain information about a plasma through spectroscopic experiments which can then be used in plasma modeling and simulations. In the past, serial algorithms have been described for either the time dependent or steady state problem. In this paper, we describe a parallel procedure adopted to solve the time-dependent problem. It can be shown that for the steady state problem a parallel procedure would not be a useful application of parallelization because a few seconds of the Central Processing Unit time on a CRAY-XMP or IBM 3090/600S would suffice to obtain the solution, while this is not the case for the time-dependent problem. In order to study the effects of low Z and high Z impurities on the final state of a plasma, time-dependent solutions are necessary. For purposes of diagnostics and comparisons with experiments, a fast turn around time of the simulations would be advantageous. We have implemented a parallel algorithm on an IBM 3090/600S and tested its performance for a typical set of fusion plasma parameters. 4 refs., 1 tab

  9. Real-time data acquisition and parallel data processing solution for TJ-II Bolometer arrays diagnostic

    Energy Technology Data Exchange (ETDEWEB)

    Barrera, E. [Departamento de Sistemas Electronicos y de Control, Universidad Politecnica de Madrid, Crta. Valencia Km. 7, 28031 Madrid (Spain)]. E-mail: eduardo.barrera@upm.es; Ruiz, M. [Grupo de Investigacion en Instrumentacion y Acustica Aplicada, Universidad Politecnica de Madrid, Crta. Valencia Km. 7, 28031 Madrid (Spain); Lopez, S. [Departamento de Sistemas Electronicos y de Control, Universidad Politecnica de Madrid, Crta. Valencia Km. 7, 28031 Madrid (Spain); Machon, D. [Departamento de Sistemas Electronicos y de Control, Universidad Politecnica de Madrid, Crta. Valencia Km. 7, 28031 Madrid (Spain); Vega, J. [Asociacion EURATOM/CIEMAT para Fusion, 28040 Madrid (Spain); Ochando, M. [Asociacion EURATOM/CIEMAT para Fusion, 28040 Madrid (Spain)

    2006-07-15

    Maps of local plasma emissivity of TJ-II plasmas are determined using three-array cameras of silicon photodiodes (AXUV type from IRD). They are assigned to the top and side ports of the same sector of the vacuum vessel. Each array consists of 20 unfiltered detectors. The signals from each of these detectors are the inputs to an iterative algorithm of tomographic reconstruction. Currently, these signals are acquired by a PXI standard system at approximately 50 kS/s, with 12 bits of resolution and are stored for off-line processing. A 0.5 s discharge generates 3 Mbytes of raw data. The algorithm's load exceeds the CPU capacity of the PXI system's controller in a continuous mode, making it infeasible to process the samples in parallel with their acquisition in a PXI standard system. A new architecture model has been developed, making it possible to add one or several processing cards to a standard PXI system. With this model, it is possible to define how to distribute, in real-time, the data from all acquired signals in the system among the processing cards and the PXI controller. This way, by distributing the data processing among the system controller and two processing cards, the data processing can be done in parallel with the acquisition. Hence, this system configuration would be able to measure even in long pulse devices.
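
    The 3 Mbyte figure quoted in this abstract can be checked with a short calculation. The sketch below assumes that the 50 kS/s rate applies to each detector channel and that each 12-bit sample is stored in a 2-byte word; neither detail is stated in the abstract, and the function name `raw_data_bytes` is made up for the illustration.

```python
def raw_data_bytes(n_arrays, detectors_per_array, sample_rate_hz,
                   duration_s, bytes_per_sample):
    """Raw data volume for one discharge, in bytes."""
    channels = n_arrays * detectors_per_array
    samples = channels * sample_rate_hz * duration_s
    return int(samples * bytes_per_sample)


if __name__ == "__main__":
    # 3 arrays x 20 detectors, 50 kS/s per channel (assumed), 0.5 s discharge,
    # 12-bit samples padded to 2 bytes (assumed).
    total = raw_data_bytes(3, 20, 50_000, 0.5, 2)
    print(total)  # 3000000 bytes, i.e. the 3 Mbytes quoted in the abstract
```

    Under the same assumptions the sustained input rate is 6 Mbytes/s, which is the stream the processing cards and the controller would have to share in real time.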

  10. The position dependent influence that sensitivity correction processing gives the signal-to-noise ratio measurement in parallel imaging

    International Nuclear Information System (INIS)

    Murakami, Koichi; Yoshida, Koji; Yanagimoto, Shinichi

    2012-01-01

    We studied the position-dependent influence of sensitivity correction processing on the signal-to-noise ratio (SNR) measurement in parallel imaging (PI). Sensitivity correction processing that referred to the sensitivity distribution of the body coil improved regional uniformity more than the sensitivity uniformity correction filter with a fixed correction factor. In addition, the position-dependent influence on the SNR measurement in PI differed depending on the sensitivity correction processing used. Therefore, by dividing the SNR of the sensitivity-corrected image by the SNR of the original image at each pixel to obtain an SNR ratio, we can show the position-dependent influence of sensitivity correction processing on the SNR measurement in PI. This ratio serves as an index of the precision of the sensitivity correction processing. (author)

  11. Parallel Implementation of the Discrete Green's Function Formulation of the FDTD Method on a Multicore Central Processing Unit

    Directory of Open Access Journals (Sweden)

    T. Stefański

    2014-12-01

    Full Text Available Parallel implementation of the discrete Green's function formulation of the finite-difference time-domain (DGF-FDTD) method was developed on a multicore central processing unit. DGF-FDTD avoids computations of the electromagnetic field in free-space cells and does not require domain termination by absorbing boundary conditions. Computed DGF-FDTD solutions are compatible with the FDTD grid enabling the perfect hybridization of FDTD with the use of time-domain integral equation methods. The developed implementation can be applied to simulations of antenna characteristics. For the sake of example, arrays of Yagi-Uda antennas were simulated with the use of parallel DGF-FDTD. The efficiency of parallel computations was investigated as a function of the number of current elements in the FDTD grid. Although the developed method does not apply the fast Fourier transform for convolution computations, advantages stemming from the application of DGF-FDTD instead of FDTD can be demonstrated for one-dimensional wire antennas when simulation results are post-processed by the near-to-far-field transformation.

  12. High-throughput fabrication of micrometer-sized compound parabolic mirror arrays by using parallel laser direct-write processing

    International Nuclear Information System (INIS)

    Yan, Wensheng; Gu, Min; Cumming, Benjamin P

    2015-01-01

    Micrometer-sized parabolic mirror arrays have significant applications in both light emitting diodes and solar cells. However, low fabrication throughput has been identified as a major obstacle to large-scale application of the mirror arrays, due to the serial nature of the conventional method. Here, the mirror arrays are fabricated by using parallel laser direct-write processing, which addresses this barrier. In addition, it is demonstrated that the parallel writing is able to fabricate complex arrays besides simple arrays and thus offers wider applications. Optical measurements show that each single mirror confines the full-width at half-maximum value to as small as 17.8 μm at the height of 150 μm whilst providing a transmittance of up to 68.3% at a wavelength of 633 nm in good agreement with the calculation values. (paper)

  13. A parallel approximate string matching under Levenshtein distance on graphics processing units using warp-shuffle operations.

    Directory of Open Access Journals (Sweden)

    ThienLuan Ho

    Full Text Available Approximate string matching with k-differences has a number of practical applications, ranging from pattern recognition to computational biology. This paper proposes an efficient memory-access algorithm for parallel approximate string matching with k-differences on Graphics Processing Units (GPUs). In the proposed algorithm, all threads in the same GPU warp share data using warp-shuffle operations instead of accessing the shared memory. Moreover, we implement the proposed algorithm by exploiting the memory structure of GPUs to optimize its performance. Experimental results for real DNA packages showed that the proposed algorithm and its implementation achieved speedups of up to 122.64 and 1.53 times over the sequential algorithm on a CPU and a previous parallel approximate string matching algorithm on GPUs, respectively.
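
    For readers unfamiliar with the k-differences problem named in this abstract, the following Python sketch shows a plain sequential dynamic-programming baseline under Levenshtein distance. It is not the paper's GPU algorithm and uses no warp-shuffle operations; it is the textbook column-by-column recurrence in which a match may start at any text position, and the function name `k_difference_matches` is an assumption made for the illustration.

```python
def k_difference_matches(pattern, text, k):
    """Return 1-based end positions j in `text` such that some substring
    ending at j is within Levenshtein distance k of `pattern`."""
    m = len(pattern)
    prev = list(range(m + 1))      # column for the empty text prefix: D[i][0] = i
    matches = []
    for j, ch in enumerate(text, start=1):
        curr = [0] * (m + 1)       # D[0][j] = 0, so a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            curr[i] = min(prev[i - 1] + cost,   # match or substitution
                          prev[i] + 1,          # extra character in the text
                          curr[i - 1] + 1)      # pattern character skipped
        if curr[m] <= k:
            matches.append(j)
        prev = curr
    return matches


if __name__ == "__main__":
    print(k_difference_matches("ACGT", "TTACGTTT", 0))  # [6]: exact occurrence
    print(k_difference_matches("ACGT", "AGGT", 1))      # [4]: one substitution
```

    Each column depends only on the previous column, which is the data dependency a parallel implementation has to work around when it spreads the computation across threads.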

  14. Investigation of the applicability of a functional programming model to fault-tolerant parallel processing for knowledge-based systems

    Science.gov (United States)

    Harper, Richard

    1989-01-01

    In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.

  15. Building an Elastic Parallel OGC Web Processing Service on a Cloud-Based Cluster: A Case Study of Remote Sensing Data Processing Service

    Directory of Open Access Journals (Sweden)

    Xicheng Tan

    2015-10-01

    Full Text Available Since the Open Geospatial Consortium (OGC) proposed the geospatial Web Processing Service (WPS), standard OGC Web Service (OWS)-based geospatial processing has become the major type of distributed geospatial application. However, improving the performance and sustainability of the distributed geospatial applications has become the dominant challenge for OWSs. This paper presents the construction of an elastic parallel OGC WPS service on a cloud-based cluster and the designs of a high-performance, cloud-based WPS service architecture, the scalability scheme of the cloud, and the algorithm of the elastic parallel geoprocessing. Experiments of the remote sensing data processing service demonstrate that our proposed method can provide a higher-performance WPS service that uses less computing resources. Our proposed method can also help institutions reduce hardware costs, raise the rate of hardware usage, and conserve energy, which is important in building green and sustainable geospatial services or applications.

  16. Synthesis of a parallel data stream processor from data flow process networks

    NARCIS (Netherlands)

    Zissulescu-Ianculescu, Claudiu

    2008-01-01

    In this talk, we address the problem of synthesizing Process Network specifications to FPGA execution platforms. The process networks we consider are special cases of Kahn Process Networks. We call them COMPAAN Data Flow Process Networks (CDFPN) because they are provided by a translator called the

  17. A New Track Reconstruction Algorithm suitable for Parallel Processing based on Hit Triplets and Broken Lines

    Directory of Open Access Journals (Sweden)

    Schöning André

    2016-01-01

    Full Text Available Track reconstruction in high track multiplicity environments at current and future high rate particle physics experiments is a big challenge and very time consuming. The search for track seeds and the fitting of track candidates are usually the most time consuming steps in the track reconstruction. Here, a new and fast track reconstruction method based on hit triplets is proposed which exploits a three-dimensional fit model including multiple scattering and hit uncertainties from the very start, including the search for track seeds. The hit triplet based reconstruction method assumes a homogeneous magnetic field, which allows an analytical solution for the triplet fit result. This method is highly parallelizable, needs fewer operations than other standard track reconstruction methods and is therefore ideal for implementation on parallel computing architectures. The proposed track reconstruction algorithm has been studied in the context of the Mu3e-experiment and a typical LHC experiment.

  18. Global restructuring of the CPM-2 transport algorithm for vector and parallel processing

    International Nuclear Information System (INIS)

    Vujic, J.L.; Martin, W.R.

    1989-01-01

    The CPM-2 code is an assembly transport code based on the collision probability (CP) method. It can in principle be applied to global reactor problems, but its excessive computational demands prevent this application. Therefore, a new transport algorithm for CPM-2 has been developed for vector-parallel architectures, which has resulted in an overall factor of 20 speedup (wall clock) on the IBM 3090-600E. This paper presents the detailed results of this effort as well as a brief description of ongoing effort to remove some of the modeling limitations in CPM-2 that inhibit its use for global applications, such as the use of the pure CP treatment and the assumption of isotropic scattering

  19. Parallel Processing and Bio-inspired Computing for Biomedical Image Registration

    Directory of Open Access Journals (Sweden)

    Silviu Ioan Bejinariu

    2014-07-01

    Full Text Available Image Registration (IR) is an optimization problem computing optimal parameters of a geometric transform used to overlay one or more source images to a given model by maximizing a similarity measure. In this paper the use of bio-inspired optimization algorithms in image registration is analyzed. Results obtained by means of three different algorithms are compared: Bacterial Foraging Optimization Algorithm (BFOA), Genetic Algorithm (GA) and Clonal Selection Algorithm (CSA). Depending on the image type, the registration may be area based, which is slower but more precise, or feature based, which is faster. In this paper a feature based approach based on the Scale Invariant Feature Transform (SIFT) is proposed. Finally, results obtained using sequential and parallel implementations on multi-core systems for area based and feature based image registration are compared.

  20. Vectorization of KENO IV code and an estimate of vector-parallel processing

    International Nuclear Information System (INIS)

    Asai, Kiyoshi; Higuchi, Kenji; Katakura, Jun-ichi; Kurita, Yutaka.

    1986-10-01

    The multi-group criticality safety code KENO IV has been vectorized and tested on FACOM VP-100 vector processor. At first the vectorized KENO IV on a scalar processor became slower than the original one by a factor of 1.4 because of the overhead introduced by the vectorization. Making modifications of algorithms and techniques for vectorization, the vectorized version has become faster than the original one by a factor of 1.4 and 3.0 on the vector processor for sample problems of complex and simple geometries, respectively. For further speedup of the code, some improvements on compiler and hardware, especially on addition of Monte Carlo pipelines to the vector processor, are discussed. Finally a pipelined parallel processor system is proposed and its performance is estimated. (author)

  1. Parallel Algorithm for GPU Processing; for use in High Speed Machine Vision Sensing of Cotton Lint Trash

    Directory of Open Access Journals (Sweden)

    Mathew G. Pelletier

    2008-02-01

    Full Text Available One of the main hurdles standing in the way of optimal cleaning of cotton lint is the lack of sensing systems that can react fast enough to provide the control system with real-time information as to the level of trash contamination of the cotton lint. This research examines the use of programmable graphic processing units (GPU) as an alternative to the PC’s traditional use of the central processing unit (CPU). The use of the GPU, as an alternative computation platform, allowed for the machine vision system to gain a significant improvement in processing time. By improving the processing time, this research seeks to address the lack of availability of rapid trash sensing systems and thus alleviate a situation in which the current systems view the cotton lint either well before, or after, the cotton is cleaned. This extended lag/lead time that is currently imposed on the cotton trash cleaning control systems, is what is responsible for system operators utilizing a very large dead-band safety buffer in order to ensure that the cotton lint is not under-cleaned. Unfortunately, the utilization of a large dead-band buffer results in the majority of the cotton lint being over-cleaned which in turn causes lint fiber-damage as well as significant losses of the valuable lint due to the excessive use of cleaning machinery. This research estimates that upwards of a 30% reduction in lint loss could be gained through the use of a tightly coupled trash sensor to the cleaning machinery control systems. This research seeks to improve processing times through the development of a new algorithm for cotton trash sensing that allows for implementation on a highly parallel architecture. Additionally, by moving the new parallel algorithm onto an alternative computing platform, the graphic processing unit “GPU”, for processing of the cotton trash images, a speed up of over 6.5 times, over optimized code running on the PC’s central processing

  2. Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP)

    CERN Document Server

    Calafiura, Paolo; The ATLAS collaboration; Seuster, Rolf; Tsulaia, Vakhtang; van Gemmeren, Peter

    2015-01-01

    AthenaMP is a multi-process version of the ATLAS reconstruction and data analysis framework Athena. By leveraging Linux fork and copy-on-write, it allows the sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service), which allows AthenaMP to run inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of AthenaMP in the...

  3. Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP)

    CERN Document Server

    Calafiura, Paolo; Seuster, Rolf; Tsulaia, Vakhtang; van Gemmeren, Peter

    2015-01-01

    AthenaMP is a multi-process version of the ATLAS reconstruction, simulation and data analysis framework Athena. By leveraging Linux fork and copy-on-write, it allows for sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service), which allows AthenaMP to run inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of Ath...
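
    The fork-and-share idea described above can be illustrated with a minimal Python sketch. It is not AthenaMP code: the names `CALIBRATION`, `worker` and `process_events` are hypothetical, the "reconstruction" is a placeholder multiplication, and the `fork` start method assumes a POSIX system such as Linux. Data loaded before forking is inherited copy-on-write by the children, and the workers pull event numbers from one shared queue, loosely in the spirit of the Shared Event Queue strategy named in the abstract.

```python
import multiprocessing as mp

# Hypothetical stand-in for large read-only data loaded once, before forking.
# Forked children inherit these pages copy-on-write instead of reloading them.
CALIBRATION = list(range(1000))


def worker(event_queue, result_queue):
    """Pull event ids from the shared queue until a None sentinel arrives."""
    while True:
        event_id = event_queue.get()
        if event_id is None:
            break
        # Placeholder "reconstruction": read the inherited data, no reload.
        result_queue.put((event_id, CALIBRATION[event_id] * 2))


def process_events(event_ids, n_workers=4):
    """Fork n_workers processes and let them share one queue of events."""
    event_ids = list(event_ids)
    ctx = mp.get_context("fork")  # assumption: POSIX system with fork()
    event_queue = ctx.Queue()
    result_queue = ctx.Queue()
    workers = [
        ctx.Process(target=worker, args=(event_queue, result_queue))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    for event_id in event_ids:
        event_queue.put(event_id)
    for _ in workers:
        event_queue.put(None)  # one sentinel per worker
    # Drain results before joining so no child blocks on a full pipe.
    results = dict(result_queue.get() for _ in event_ids)
    for w in workers:
        w.join()
    return results
```

    In CPython, reference counting writes to object headers, so pages are shared less cleanly than in a C++ framework; the sketch shows the scheduling pattern rather than the memory savings reported in the abstract.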

  4. Research on Gear Shifting Process without Disengaging Clutch for a Parallel Hybrid Electric Vehicle Equipped with AMT

    Directory of Open Access Journals (Sweden)

    Hui-Long Yu

    2014-01-01

    Full Text Available Dynamic models of a single-shaft parallel hybrid electric vehicle (HEV) equipped with automated mechanical transmission (AMT) were described for the different working stages of a gear shifting process without disengaging the clutch. Parameters affecting the gear shifting time, component life, and gear shifting jerk in the different transient states of a gear shifting process were analyzed in depth. Mathematical models that account for the detailed working process of the synchronizer were derived; they can explain gear shifting failure, prolonged gear shifting, and frequent synchronizer failure in HEVs. A dynamic coordinated control strategy for the engine, motor, and actuators in the different transient states, considering the detailed working stages of the synchronizer during gear shifting in an HEV, is proposed; according to the state-of-the-art references, it is the first of its kind. Bench test and real road test results show that the proposed control strategy significantly improves gear shifting quality on all of its evaluation indexes.

  5. A review of advanced small-scale parallel bioreactor technology for accelerated process development: current state and future need.

    Science.gov (United States)

    Bareither, Rachel; Pollard, David

    2011-01-01

    The pharmaceutical and biotech industries face continued pressure to reduce development costs and accelerate process development. This challenge occurs alongside the need for increased upstream experimentation to support quality by design initiatives and the pursuit of predictive models from systems biology. A small scale system enabling multiple reactions in parallel (n ≥ 20), with automated sampling and integrated to purification, would provide significant improvement (four to fivefold) to development timelines. State of the art attempts to pursue high throughput process development include shake flasks, microfluidic reactors, microtiter plates and small-scale stirred reactors. The limitations of these systems are compared to desired criteria to mimic large scale commercial processes. The comparison shows that significant technological improvement is still required to provide automated solutions that can speed upstream process development. Copyright © 2010 American Institute of Chemical Engineers (AIChE).

  6. The QUANTUM I project: Parallel processing in a local area network work dedicated to ab initio calculation of potential hypersurfaces

    International Nuclear Information System (INIS)

    Lavenir, E.; Pic, J.M.; Alibran, P.; Leclercq, J.M.

    1987-01-01

    The QUANTUM I project is a three-stage device. The stages are respectively dedicated to particular steps of the ab initio determination of a point on the hypersurface. The first stage deals with the computation of the integrals between the basis functions, the second with the S.C.F. (or M.C.S.C.F.) process and the third with the C.I. treatment. Each step is implemented in parallel (M.I.M.D.) mode, and the whole device operates as a pipeline: the three stages work simultaneously on different points.
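
    The pipeline mode described above can be sketched as a simple schedule. This is only an illustration under the assumption that every stage takes one time step per point, which the abstract does not state; the function name `pipeline_schedule` and the stage labels are hypothetical.

```python
def pipeline_schedule(n_points, stages=("integrals", "SCF", "CI")):
    """Return, per time step, the point index each stage works on (or None).

    Assumes every stage needs exactly one time step per point.
    """
    n_steps = n_points + len(stages) - 1 if n_points > 0 else 0
    schedule = []
    for step in range(n_steps):
        row = {}
        for depth, stage in enumerate(stages):
            point = step - depth  # stage `depth` lags the first stage by `depth` steps
            row[stage] = point if 0 <= point < n_points else None
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    for step, row in enumerate(pipeline_schedule(4)):
        print(step, row)
```

    Once the pipeline is full, all three stages are busy on consecutive points, so `n_points` points finish in `n_points + 2` steps instead of `3 * n_points`.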

  7. Development and control towards a parallel water hydraulic weld/cut robot for machining processes in ITER vacuum vessel

    International Nuclear Information System (INIS)

    Wu Huapeng; Handroos, Heikki; Pessi, Pekka; Kilkki, Juha; Jones, Lawrence

    2005-01-01

    This paper presents a special robot, able to carry out welding and machining processes from inside the ITER vacuum vessel (VV), consisting of a five degree-of-freedom parallel mechanism, mounted on a carriage driven by two electric motors on a rack. The kinematic design of the robot has been optimised for ITER access and a hydraulically actuated pre-prototype built. A hybrid controller is designed for the robot, including position, speed and pressure feedback loops to achieve high accuracy and high dynamic performances. Finally, the experimental tests are given and discussed

  8. The Application of Paired Parallel Filters for Ultra-Wideband Signal Processing

    Directory of Open Access Journals (Sweden)

    S. L. Chernyshev

    2015-01-01

    Full Text Available The paper considers a unit in which the parallel filters on regular lines are attached in pairs. This connection reduces the side-line impedance at the point of connection. At the same time these lines become narrow, which reduces the possibility of exciting higher modes in the joint. The scattering matrix of a connection of four identical lines is considered first; the scattering matrix of a connection in which two side lines are connected to filters is then found. Particular cases of the reflection coefficients of different filters are considered. It is shown that only for identical filters does a linear relationship remain between the input reflection coefficients of the filters and the transmission coefficient of the unit, which facilitates solving the synthesis problem. Restrictions on the transfer coefficient are found. In the transition to the time domain, the impulse response of the connection under consideration and the expression for the synthesis are defined. The paper considers an example of implementing matched filtering with this connection. In this case, the output signal is a half-sum of the input signal and its autocorrelation function.

  9. Cost-effective parallel optical interconnection module based on fully passive-alignment process

    Science.gov (United States)

    Son, Dong Hoon; Heo, Young Soon; Park, Hyoung-Jun; Kang, Hyun Seo; Kim, Sung Chang

    2017-11-01

    In optical interconnection technology, high-speed and large data transitions with low error rate and cost reduction are key issues for the upcoming 8K media era. The researchers present notable types of optical manufacturing structures of a four-channel parallel optical module by fully passive alignment, which are able to reduce manufacturing time and cost. Each of the components, such as vertical-cavity surface-emitting laser/positive-intrinsic-negative photodiode array, microlens array, fiber array, and receiver (RX)/transmitter (TX) integrated circuit, is integrated successfully using flip-chip bonding, die bonding, and passive alignment with a microscope. Clear eye diagrams are obtained by 25.78-Gb/s (for TX) and 25.7-Gb/s (for RX) nonreturn-to-zero signals of pseudorandom binary sequence with a pattern length of 2^31 - 1. The measured responsivity and minimum sensitivity of the RX are about 0.5 A/W and ≤-6.5 dBm at a bit error rate (BER) of 10^-12, respectively. The optical power margin at a BER of 10^-12 is 7.5 dB, and cross talk by the adjacent channel is ≤1 dB.
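
    A pseudorandom binary sequence of the kind used as the test pattern above is typically produced by a linear-feedback shift register (LFSR). The sketch below is a generic illustration, not the authors' test setup: the function name `prbs_bits` is hypothetical, and the abstract does not say which generator polynomial was used. The taps (31, 28) correspond to the commonly used PRBS31 polynomial x^31 + x^28 + 1; the short PRBS7 case (7, 6) is included because its full period of 2^7 - 1 = 127 bits is easy to inspect.

```python
def prbs_bits(order, tap, n_bits, seed=1):
    """Generate n_bits of a PRBS from a Fibonacci LFSR.

    The feedback bit is the XOR of register stages `order` and `tap`
    (1-indexed). Assumed tap choices: (7, 6) for PRBS7, (31, 28) for PRBS31.
    """
    mask = (1 << order) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("seed must be non-zero, or the register stays stuck at 0")
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> (order - 1)) ^ (state >> (tap - 1))) & 1
        state = ((state << 1) | new_bit) & mask
        bits.append(new_bit)
    return bits


if __name__ == "__main__":
    period = 2**7 - 1
    seq = prbs_bits(7, 6, 2 * period)
    print("PRBS7 repeats after", period, "bits:", seq[:period] == seq[period:])
    print("first PRBS31 bits:", prbs_bits(31, 28, 16))
```

    A maximal-length sequence of order n repeats after 2^n - 1 bits, which is where the pattern length 2^31 - 1 quoted in the abstract comes from.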

  10. Practical parallel computing

    CERN Document Server

    Morse, H Stephen

    1994-01-01

    Practical Parallel Computing provides information pertinent to the fundamental aspects of high-performance parallel processing. This book discusses the development of parallel applications on a variety of equipment. Organized into three parts encompassing 12 chapters, this book begins with an overview of the technology trends that converge to favor massively parallel hardware over traditional mainframes and vector machines. This text then gives a tutorial introduction to parallel hardware architectures. Other chapters provide worked-out examples of programs using several parallel languages. Thi

  11. Massively Parallel Geostatistical Inversion of Coupled Processes in Heterogeneous Porous Media

    Science.gov (United States)

    Ngo, A.; Schwede, R. L.; Li, W.; Bastian, P.; Ippisch, O.; Cirpka, O. A.

    2012-04-01

    another level of parallelization has been added.

  12. New domain for image analysis: VLSI circuits testing, with Romuald, specialized in parallel image processing

    Energy Technology Data Exchange (ETDEWEB)

    Rubat Du Merac, C; Jutier, P; Laurent, J; Courtois, B

    1983-07-01

    This paper describes some aspects of specifying, designing and evaluating a specialized machine, Romuald, for the capture, coding, and processing of video and scanning electron microscope (SEM) pictures. First the authors present the functional organization of the process unit of Romuald and its hardware, giving details of its behaviour. Then they study the capture and display unit which, thanks to its flexibility, enables the coding of SEM images. Finally, they describe an application which is now being developed in their laboratory: testing VLSI circuits with new methods that combine SEM voltage contrast and image processing. 15 references.

  13. Parallel computation

    International Nuclear Information System (INIS)

    Jejcic, A.; Maillard, J.; Maurel, G.; Silva, J.; Wolff-Bacha, F.

    1997-01-01

    The work in the field of parallel processing has developed as research activities using several numerical Monte Carlo simulations related to basic or applied current problems of nuclear and particle physics. For the applications utilizing the GEANT code, development or improvement work was done on the parts simulating low-energy physical phenomena such as radiation, transport and interaction. The problem of actinide burning by means of accelerators was approached using a simulation with the GEANT code. A program for neutron tracking in the range of low energies up to the thermal region has been developed. It is coupled to the GEANT code and permits, in a single pass, the simulation of a hybrid reactor core receiving a proton burst. Other work in this field refers to simulations for nuclear medicine applications such as the development of biological probes, the evaluation and characterization of gamma cameras (collimators, crystal thickness), and the method for dosimetric calculations. These calculations are particularly suited to a geometrical parallelization approach especially adapted to parallel machines of the TN310 type. Other work mentioned in the same field refers to simulation of electron channelling in crystals and simulation of the beam-beam interaction effect in colliders. The GEANT code was also used to simulate the operation of germanium detectors designed for natural and artificial radioactivity monitoring of the environment.

  14. Parallelizing flow-accumulation calculations on graphics processing units—From iterative DEM preprocessing algorithm to recursive multiple-flow-direction algorithm

    Science.gov (United States)

    Qin, Cheng-Zhi; Zhan, Lijun

    2012-06-01

    As one of the important tasks in digital terrain analysis, the calculation of flow accumulations from gridded digital elevation models (DEMs) usually involves two steps in a real application: (1) using an iterative DEM preprocessing algorithm to remove the depressions and flat areas commonly contained in real DEMs, and (2) using a recursive flow-direction algorithm to calculate the flow accumulation for every cell in the DEM. Because both algorithms are computationally intensive, quick calculation of the flow accumulations from a DEM (especially for a large area) presents a practical challenge to personal computer (PC) users. In recent years, rapid increases in hardware capacity of the graphics processing units (GPUs) provided in modern PCs have made it possible to meet this challenge in a PC environment. Parallel computing on GPUs using a compute-unified-device-architecture (CUDA) programming model has been explored to speed up the execution of the single-flow-direction algorithm (SFD). However, the parallel implementation on a GPU of the multiple-flow-direction (MFD) algorithm, which generally performs better than the SFD algorithm, has not been reported. Moreover, GPU-based parallelization of the DEM preprocessing step in the flow-accumulation calculations has not been addressed. This paper proposes a parallel approach to calculate flow accumulations (including both iterative DEM preprocessing and a recursive MFD algorithm) on a CUDA-compatible GPU. For the parallelization of an MFD algorithm (MFD-md), two different parallelization strategies using a GPU are explored. The first parallelization strategy, which has been used in the existing parallel SFD algorithm on GPU, has the problem of computing redundancy. Therefore, we designed a parallelization strategy based on graph theory. The application results show that the proposed parallel approach to calculate flow accumulations on a GPU performs much faster than either sequential algorithms or other parallel GPU
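
    For readers unfamiliar with multiple-flow-direction accumulation, the following is a minimal sequential Python sketch of the general idea. It is not the paper's MFD-md algorithm or its GPU strategy: the function name `mfd_flow_accumulation`, the slope-weighting `exponent`, and the use of a highest-to-lowest sort instead of recursion are all assumptions made for illustration, and the DEM is assumed to be already free of depressions and flat areas.

```python
import math


def mfd_flow_accumulation(dem, exponent=1.0):
    """Flow accumulation on a small DEM given as a list of lists.

    Each cell starts with one unit of flow and passes everything it has
    collected to all lower 8-neighbours, weighted by slope ** exponent.
    Visiting cells from highest to lowest ensures a cell is complete
    before it distributes its flow.
    """
    rows, cols = len(dem), len(dem[0])
    acc = [[1.0] * cols for _ in range(rows)]
    cells = sorted(
        ((dem[r][c], r, c) for r in range(rows) for c in range(cols)),
        reverse=True,
    )
    for z, r, c in cells:
        weights = []
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and dem[nr][nc] < z:
                    slope = (z - dem[nr][nc]) / math.hypot(dr, dc)
                    weights.append((slope ** exponent, nr, nc))
        total = sum(w for w, _, _ in weights)
        for w, nr, nc in weights:
            acc[nr][nc] += acc[r][c] * w / total
    return acc


if __name__ == "__main__":
    for row in mfd_flow_accumulation([[9, 8, 7], [6, 5, 4], [3, 2, 1]]):
        print([round(v, 3) for v in row])
```

    The dependency of each cell on its higher neighbours is what makes the parallel version non-trivial, which is consistent with the abstract's mention of a graph-based strategy.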

  15. The evolution of concepts of vestibular peripheral information processing: toward the dynamic, adaptive, parallel processing macular model

    Science.gov (United States)

    Ross, Muriel D.

    2003-01-01

    In a letter to Robert Hooke, written on 5 February, 1675, Isaac Newton wrote "If I have seen further than certain other men it is by standing upon the shoulders of giants." In his context, Newton was referring to the work of Galileo and Kepler, who preceded him. However, every field has its own giants, those men and women who went before us and, often with few tools at their disposal, uncovered the facts that enabled later researchers to advance knowledge in a particular area. This review traces the history of the evolution of views from early giants in the field of vestibular research to modern concepts of vestibular organ organization and function. Emphasis will be placed on the mammalian maculae as peripheral processors of linear accelerations acting on the head. This review shows that early, correct findings were sometimes unfortunately disregarded, impeding later investigations into the structure and function of the vestibular organs. The central themes are that the macular organs are highly complex, dynamic, adaptive, distributed parallel processors of information, and that historical references can help us to understand our own place in advancing knowledge about their complicated structure and functions.

  16. Vortex particle method in parallel computations on graphical processing units used in study of the evolution of vortex structures

    International Nuclear Information System (INIS)

    Kudela, Henryk; Kosior, Andrzej

    2014-01-01

    Understanding the dynamics and the mutual interaction among various types of vortical motions is a key ingredient in clarifying and controlling fluid motion. In the paper several different cases related to vortex tube interactions are presented. Due to problems with very long computation times on the single processor, the vortex-in-cell (VIC) method is implemented on the multicore architecture of a graphics processing unit (GPU). Numerical results of leapfrogging of two vortex rings for inviscid and viscous fluid are presented as test cases for the new multi-GPU implementation of the VIC method. Influence of the Reynolds number on the reconnection process is shown for two examples: antiparallel vortex tubes and orthogonally offset vortex tubes. Our aim is to show the great potential of the VIC method for solutions of three-dimensional flow problems and that the VIC method is very well suited for parallel computation. (paper)

  17. Minimizing makespan in a two-stage flow shop with parallel batch-processing machines and re-entrant jobs

    Science.gov (United States)

    Huang, J. D.; Liu, J. J.; Chen, Q. X.; Mao, N.

    2017-06-01

    Against a background of heat-treatment operations in mould manufacturing, a two-stage flow-shop scheduling problem is described for minimizing makespan with parallel batch-processing machines and re-entrant jobs. The weights and release dates of jobs are non-identical, but job processing times are equal. A mixed-integer linear programming model is developed and tested with small-scale scenarios. Given that the problem is NP-hard, three heuristic construction methods with polynomial complexity are proposed. The worst case of the new constructive heuristic is analysed in detail. A method for computing lower bounds is proposed to test heuristic performance. Heuristic efficiency is tested with sets of scenarios. Compared with the two improved heuristics, the performance of the new constructive heuristic is superior.

  18. Novel encoding and updating of positional, or directional, spatial cues are processed by distinct hippocampal subfields: Evidence for parallel information processing and the "what" stream.

    Science.gov (United States)

    Hoang, Thu-Huong; Aliane, Verena; Manahan-Vaughan, Denise

    2018-05-01

    The specific roles of hippocampal subfields in spatial information processing and encoding are, as yet, unclear. The parallel map theory postulates that whereas the CA1 processes discrete environmental features (positional cues used to generate a "sketch map"), the dentate gyrus (DG) processes large navigation-relevant landmarks (directional cues used to generate a "bearing map"). Additionally, the two-streams hypothesis suggests that hippocampal subfields engage in differentiated processing of information from the "where" and the "what" streams. We investigated these hypotheses by analyzing the effect of exploration of discrete "positional" features and large "directional" spatial landmarks on hippocampal neuronal activity in rats. As an indicator of neuronal activity we measured the mRNA induction of the immediate early genes (IEGs), Arc and Homer1a. We observed an increase of this IEG mRNA in CA1 neurons of the distal neuronal compartment and in proximal CA3, after novel spatial exploration of discrete positional cues, whereas novel exploration of directional cues led to increases in IEG mRNA in the lower blade of the DG and in proximal CA3. Strikingly, the CA1 did not respond to directional cues and the DG did not respond to positional cues. Our data provide evidence for both the parallel map theory and the two-streams hypothesis and suggest a precise compartmentalization of the encoding and processing of "what" and "where" information occurs within the hippocampal subfields. © 2018 The Authors. Hippocampus Published by Wiley Periodicals, Inc.

  19. Embedded parallel processing based ground control systems for small satellite telemetry

    Science.gov (United States)

    Forman, Michael L.; Hazra, Tushar K.; Troendly, Gregory M.; Nickum, William G.

    1994-01-01

    The use of networked terminals which utilize embedded processing techniques results in totally integrated, flexible, high speed, reliable, and scalable systems suitable for telemetry and data processing applications such as mission operations centers (MOC). Synergies of these terminals, coupled with the capability of each terminal to receive incoming data, allow the viewing of any defined display by any terminal from the start of data acquisition. There is no single point of failure (other than with network input) such as exists with configurations where all input data goes through a single front end processor and then to a serial string of workstations. Missions dedicated to NASA's ozone measurements program utilize the methodologies which are discussed, and result in a multimission configuration of low cost, scalable hardware and software which can be run by one flight operations team with low risk.

  20. «Concurrency» in M-L-Parallel Semi-Markov Process

    Directory of Open Access Journals (Sweden)

    Larkin Eugene

    2017-01-01

    Full Text Available This article investigates the functioning of a swarm of robots, each of which receives instructions from the external human operator and autonomously executes them. An abstract model of functioning of a robot, a group of robots and multiple groups of robots was obtained using the notion of semi-Markov process. The concepts of aggregated initial and aggregated absorbing states were introduced. Correspondences for calculation of time parameters of concurrency were obtained.

  1. Parallel effects of processing fluency and positive affect on familiarity-based recognition decisions for faces

    Directory of Open Access Journals (Sweden)

    Devin Duke

    2014-04-01

    Full Text Available According to attribution models of familiarity assessment, people can use a heuristic in recognition-memory decisions, in which they attribute the subjective ease of processing of a memory probe to a prior encounter with the stimulus in question. Research in social cognition suggests that experienced positive affect may be the proximal cue that signals fluency in various experimental contexts. In the present study, we compared the effects of positive affect and fluency on recognition-memory judgments for faces with neutral emotional expression. We predicted that if positive affect is indeed the critical cue that signals processing fluency at retrieval, then its manipulation should produce effects that closely mirror those produced by manipulations of processing fluency. In two experiments, we employed a masked-priming procedure in combination with a Remember-Know paradigm that aimed to separate familiarity- from recollection-based memory decisions. In addition, participants performed a prime-discrimination task that allowed us to take inter-individual differences in prime awareness into account. We found highly similar effects of our priming manipulations of processing fluency and of positive affect. In both cases, the critical effect was specific to familiarity-based recognition responses. Moreover, in both experiments it was reflected in a shift towards a more liberal response bias, rather than in changed discrimination. Finally, in both experiments, the effect was found to be related to prime awareness; it was present only in participants who reported a lack of such awareness on the prime-discrimination task. These findings add to a growing body of evidence that points not only to a role of fluency, but also of positive affect in familiarity assessment. As such they are consistent with the idea that fluency itself may be hedonically marked.

  2. Parallel effects of processing fluency and positive affect on familiarity-based recognition decisions for faces.

    Science.gov (United States)

    Duke, Devin; Fiacconi, Chris M; Köhler, Stefan

    2014-01-01

    According to attribution models of familiarity assessment, people can use a heuristic in recognition-memory decisions, in which they attribute the subjective ease of processing of a memory probe to a prior encounter with the stimulus in question. Research in social cognition suggests that experienced positive affect may be the proximal cue that signals fluency in various experimental contexts. In the present study, we compared the effects of positive affect and fluency on recognition-memory judgments for faces with neutral emotional expression. We predicted that if positive affect is indeed the critical cue that signals processing fluency at retrieval, then its manipulation should produce effects that closely mirror those produced by manipulations of processing fluency. In two experiments, we employed a masked-priming procedure in combination with a Remember-Know (RK) paradigm that aimed to separate familiarity- from recollection-based memory decisions. In addition, participants performed a prime-discrimination task that allowed us to take inter-individual differences in prime awareness into account. We found highly similar effects of our priming manipulations of processing fluency and of positive affect. In both cases, the critical effect was specific to familiarity-based recognition responses. Moreover, in both experiments it was reflected in a shift toward a more liberal response bias, rather than in changed discrimination. Finally, in both experiments, the effect was found to be related to prime awareness; it was present only in participants who reported a lack of such awareness on the prime-discrimination task. These findings add to a growing body of evidence that points not only to a role of fluency, but also of positive affect in familiarity assessment. As such they are consistent with the idea that fluency itself may be hedonically marked.

  3. Multi-target parallel processing approach for gene-to-structure determination of the influenza polymerase PB2 subunit.

    Science.gov (United States)

    Armour, Brianna L; Barnes, Steve R; Moen, Spencer O; Smith, Eric; Raymond, Amy C; Fairman, James W; Stewart, Lance J; Staker, Bart L; Begley, Darren W; Edwards, Thomas E; Lorimer, Donald D

    2013-06-28

    Pandemic outbreaks of highly virulent influenza strains can cause widespread morbidity and mortality in human populations worldwide. In the United States alone, an average of 41,400 deaths and 1.86 million hospitalizations are caused by influenza virus infection each year (1). Point mutations in the polymerase basic protein 2 subunit (PB2) have been linked to the adaptation of the viral infection in humans (2). Findings from such studies have revealed the biological significance of PB2 as a virulence factor, thus highlighting its potential as an antiviral drug target. The structural genomics program put forth by the National Institute of Allergy and Infectious Disease (NIAID) provides funding to Emerald Bio and three other Pacific Northwest institutions that together make up the Seattle Structural Genomics Center for Infectious Disease (SSGCID). The SSGCID is dedicated to providing the scientific community with three-dimensional protein structures of NIAID category A-C pathogens. Making such structural information available to the scientific community serves to accelerate structure-based drug design. Structure-based drug design plays an important role in drug development. Pursuing multiple targets in parallel greatly increases the chance of success for new lead discovery by targeting a pathway or an entire protein family. Emerald Bio has developed a high-throughput, multi-target parallel processing pipeline (MTPP) for gene-to-structure determination to support the consortium. Here we describe the protocols used to determine the structure of the PB2 subunit from four different influenza A strains.

  4. Modular and efficient ozone systems based on massively parallel chemical processing in microchannel plasma arrays: performance and commercialization

    Science.gov (United States)

    Kim, M.-H.; Cho, J. H.; Park, S.-J.; Eden, J. G.

    2017-08-01

    Plasmachemical systems based on the production of a specific molecule (O3) in literally thousands of microchannel plasmas simultaneously have been demonstrated, developed and engineered over the past seven years, and commercialized. At the heart of this new plasma technology is the plasma chip, a flat aluminum strip fabricated by photolithographic and wet chemical processes and comprising 24-48 channels, micromachined into nanoporous aluminum oxide, with embedded electrodes. By integrating 4-6 chips into a module, the mass output of an ozone microplasma system is scaled linearly with the number of modules operating in parallel. A 115 g/hr (2.7 kg/day) ozone system, for example, is realized by the combined output of 18 modules comprising 72 chips and 1,800 microchannels. The implications of this plasma processing architecture for scaling ozone production capability, and reducing capital and service costs when introducing redundancy into the system, are profound. In contrast to conventional ozone generator technology, microplasma systems operate reliably (albeit with reduced output) in ambient air and humidity levels up to 90%, a characteristic attributable to the water adsorption/desorption properties and electrical breakdown strength of nanoporous alumina. Extensive testing has documented chip and system lifetimes (MTBF) beyond 5,000 hours, and efficiencies >130 g/kWh when oxygen is the feedstock gas. Furthermore, the weight and volume of microplasma systems are a factor of 3-10 lower than those for conventional ozone systems of comparable output. Massively-parallel plasmachemical processing offers functionality, performance, and commercial value beyond that afforded by conventional technology, and is currently in operation in more than 30 countries worldwide.
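
    The scaling figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers given in the abstract; the function name `ozone_system_summary` is hypothetical, and the even split of output across modules is an assumption based on the stated linear scaling.

```python
def ozone_system_summary(modules=18, chips=72, channels=1800, output_g_per_hr=115.0):
    """Derive per-module and per-chip figures from the quoted system totals."""
    return {
        "chips_per_module": chips / modules,
        "channels_per_chip": channels / chips,
        # Assumes output is split evenly, as linear scaling with modules implies.
        "g_per_hr_per_module": output_g_per_hr / modules,
        "kg_per_day": output_g_per_hr * 24 / 1000,
    }


if __name__ == "__main__":
    for key, value in ozone_system_summary().items():
        print(f"{key}: {value:.2f}")
```

    The result is 4 chips per module and 25 channels per chip, both inside the 4-6 and 24-48 ranges given in the abstract, and about 6.4 g/hr per module. The daily figure works out to 2.76 kg, which the abstract quotes as 2.7 kg/day.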

  5. Real-time hypothesis driven feature extraction on parallel processing architectures

    DEFF Research Database (Denmark)

    Granmo, O.-C.; Jensen, Finn Verner

    2002-01-01

    the problem of higher-order feature-content/feature-feature correlation, causally complexly interacting features are identified through Bayesian network d-separation analysis and combined into joint features. When used on a moderately complex object-tracking case, the technique is able to select... ...extraction, which selectively extract relevant features one-by-one, have in some cases achieved real-time performance on single processing element architectures. In this paper we propose a novel technique which combines the above two approaches. Features are selectively extracted in parallelizable sets...

  6. Operation and performance of a longitudinal damping system using parallel digital signal processing

    International Nuclear Information System (INIS)

    Fox, J.D.; Hindi, H.; Linscott, I.

    1994-06-01

    A programmable longitudinal feedback system based on four AT&T 1610 digital signal processors has been developed as a component of the PEP-II R&D program. This Longitudinal Quick Prototype is a proof of concept for the PEP-II system and implements full speed bunch-by-bunch signal processing for storage rings with bunch spacings of 4 ns. The design implements, via software, a general purpose feedback controller which allows the system to be operated at several accelerator facilities. The system configuration used for tests at the LBL Advanced Light Source is described. Open and closed loop results showing the detection and calculation of feedback signals from bunch motion are presented, and the system is shown to damp coupled-bunch instabilities in the ALS. Use of the system for accelerator diagnostics is illustrated via measurement of injection transients and analysis of open loop bunch motion.

  7. Using the extended parallel process model to prevent noise-induced hearing loss among coal miners in Appalachia

    Energy Technology Data Exchange (ETDEWEB)

    Murray-Johnson, L.; Witte, K.; Patel, D.; Orrego, V.; Zuckerman, C.; Maxfield, A.M.; Thimons, E.D. [Ohio State University, Columbus, OH (US)]

    2004-12-15

    Occupational noise-induced hearing loss is the second most self-reported occupational illness or injury in the United States. Among coal miners, more than 90% of the population reports a hearing deficit by age 55. In this formative evaluation, focus groups were conducted with coal miners in Appalachia to ascertain whether miners perceive hearing loss as a major health risk and if so, what would motivate the consistent wearing of hearing protection devices (HPDs). The theoretical framework of the Extended Parallel Process Model was used to identify the miners' knowledge, attitudes, beliefs, and current behaviors regarding hearing protection. Focus group participants had strong perceived severity and varying levels of perceived susceptibility to hearing loss. Various barriers significantly reduced the self-efficacy and the response efficacy of using hearing protection.

  8. ISP: an optimal out-of-core image-set processing streaming architecture for parallel heterogeneous systems.

    Science.gov (United States)

    Ha, Linh Khanh; Krüger, Jens; Dihl Comba, João Luiz; Silva, Cláudio T; Joshi, Sarang

    2012-06-01

    Image population analysis is the class of statistical methods that plays a central role in understanding the development, evolution, and disease of a population. However, these techniques often require excessive computational power and memory that are compounded with a large number of volumetric inputs. Restricted access to supercomputing power limits its influence in general research and practical applications. In this paper we introduce ISP, an Image-Set Processing streaming framework that harnesses the processing power of commodity heterogeneous CPU/GPU systems and attempts to solve this computational problem. In ISP, we introduce specially designed streaming algorithms and data structures that provide an optimal solution for out-of-core multiimage processing problems both in terms of memory usage and computational efficiency. ISP makes use of the asynchronous execution mechanism supported by parallel heterogeneous systems to efficiently hide the inherent latency of the processing pipeline of out-of-core approaches. Consequently, with computationally intensive problems, the ISP out-of-core solution can achieve the same performance as the in-core solution. We demonstrate the efficiency of the ISP framework on synthetic and real datasets.

  9. Forgiveness, Stress, and Health: a 5-Week Dynamic Parallel Process Study.

    Science.gov (United States)

    Toussaint, Loren L; Shields, Grant S; Slavich, George M

    2016-10-01

    Psychological stress is a well-known risk factor for poor health, and recent research has suggested that the emotion-focused coping process of forgiveness may help mitigate these effects. To date, however, no studies have examined how levels of forgiveness, stress, and health fluctuate and interrelate over time. We addressed this issue by examining how forgiveness, stress, and mental and physical health symptoms change and relate to one another over 5 weeks. We hypothesized that increases in state levels of forgiveness would be associated with decreases in perceptions of stress, which would in turn be related to decreases in mental and physical health symptoms. A reverse effects model was also tested. We recruited a large, community-based sample of 332 young, middle-aged, and older adults (16-79 years old; mean age = 27.9). Each week for 5 weeks, participants reported on their levels of state forgiveness, perceived stress, and mental and physical health symptoms. Levels of forgiveness, stress, and mental and physical health symptoms each showed significant change and individual variability in change over time. As hypothesized, increases in forgiveness were associated with decreases in stress, which were in turn related to decreases in mental (but not physical) health symptoms (i.e., forgiveness → stress → health). The reverse effects model (i.e., health → stress → forgiveness) provided a relatively poorer fit. This study is the first to provide prospective, longitudinal evidence showing that greater forgiveness is associated with less stress and, in turn, better mental health. Strategies for cultivating forgiveness may thus have beneficial effects on stress and health.

  10. Parallel and convergent processing in grid cell, head-direction cell, boundary cell, and place cell networks.

    Science.gov (United States)

    Brandon, Mark P; Koenig, Julie; Leutgeb, Stefan

    2014-03-01

    The brain is able to construct internal representations that correspond to external spatial coordinates. Such brain maps of the external spatial topography may support a number of cognitive functions, including navigation and memory. The neuronal building blocks of brain maps are place cells, which are found throughout the hippocampus of rodents and, in a lower proportion, primates. Place cells typically fire in one or few restricted areas of space, and each area where a cell fires can range, along the dorsoventral axis of the hippocampus, from 30 cm to at least several meters. The sensory processing streams that give rise to hippocampal place cells are not fully understood, but substantial progress has been made in characterizing the entorhinal cortex, which is the gateway between neocortical areas and the hippocampus. Entorhinal neurons have diverse spatial firing characteristics, and the different entorhinal cell types converge in the hippocampus to give rise to a single, spatially modulated cell type: the place cell. We therefore suggest that parallel information processing in different classes of cells (as is typically observed at lower levels of sensory processing) continues up into higher level association cortices, including those that provide the inputs to hippocampus. WIREs Cogn Sci 2014, 5:207-219. doi: 10.1002/wcs.1272 Conflict of interest: The authors have declared no conflicts of interest for this article. © 2013 John Wiley & Sons, Ltd.

  11. An extension of the extended parallel process model (EPPM) in television health news: the influence of health consciousness on individual message processing and acceptance.

    Science.gov (United States)

    Hong, Hyehyun

    2011-06-01

    The purpose of this study is to examine the role of health consciousness in processing TV news that contains potential health threats and preventive recommendations. Based on the extended parallel process model (Witte, 1992), relationships among health consciousness, perceived severity, perceived susceptibility, perceived response efficacy, perceived self-efficacy, and message acceptance/rejection were hypothesized. Responses collected from 175 participants after viewing four TV health news stories were analyzed using the bootstrapping analysis (Preacher & Hayes, 2008). Results confirmed three mediators (i.e., perceived severity, response efficacy, self-efficacy) in the influence of health consciousness on message acceptance. A negative association found between health consciousness and perceived susceptibility is discussed in relation to characteristics of health conscious individuals and optimistic bias of health risks.

  12. Practical parallel processing

    International Nuclear Information System (INIS)

    Arendt, M.L.

    1986-01-01

    ELXSI, a San Jose based computer company, was founded in January of 1979 for the purpose of developing and marketing a tightly-coupled multiple processor system. After five years ELXSI succeeded in making the first commercial installations at Digicon Geophysical, NASA-Dryden, and Sandia National Laboratories. Since that time over fifty-one systems and ninety-three processors have been installed. The commercial success of the ELXSI system 6400(TM) is due to several significant breakthroughs in computer technology including a system bus operating at 320 million bytes per second, a new Message-Based Operating System, EMBOS (TM), and a new system organization which allows for easy expansion in any dimension without changes to the operating system, the user environment, or the application programs. (Auth.)

  13. Serial and parallel processing in reading: investigating the effects of parafoveal orthographic information on nonisolated word recognition.

    Science.gov (United States)

    Dare, Natasha; Shillcock, Richard

    2013-01-01

    We present a novel lexical decision task and three boundary paradigm eye-tracking experiments that clarify the picture of parallel processing in word recognition in context. First, we show that lexical decision is facilitated by associated letter information to the left and right of the word, with no apparent hemispheric specificity. Second, we show that parafoveal preview of a repeat of word n at word n + 1 facilitates reading of word n relative to a control condition with an unrelated word at word n + 1. Third, using a version of the boundary paradigm that allowed for a regressive eye movement, we show no parafoveal "postview" effect on reading word n of repeating word n at word n - 1. Fourth, we repeat the second experiment but compare the effects of parafoveal previews consisting of a repeated word n with a transposed central bigram (e.g., caot for coat) and a substituted central bigram (e.g., ceit for coat), showing the latter to have a deleterious effect on processing word n, thereby demonstrating that the parafoveal preview effect is at least orthographic and not purely visual.

  14. Combining self-affirmation with the extended parallel process model: the consequences for motivation to eat more fruit and vegetables.

    Science.gov (United States)

    Napper, Lucy E; Harris, Peter R; Klein, William M P

    2014-01-01

    There is potential for fruitful integration of research using the Extended Parallel Process Model (EPPM) with research using Self-affirmation Theory. However, to date no studies have attempted to do this. This article reports an experiment that tests whether (a) the effects of a self-affirmation manipulation add to those of EPPM variables in predicting intentions to improve a health behavior and (b) self-affirmation moderates the relationship between EPPM variables and intentions. Participants (N = 80) were randomized to either a self-affirmation or control condition prior to receiving personally relevant health information about the risks of not eating at least five portions of fruit and vegetables per day. A hierarchical regression model revealed that efficacy, threat × efficacy, self-affirmation, and self-affirmation × efficacy all uniquely contributed to the prediction of intentions to eat at least five portions per day. Self-affirmed participants and those with higher efficacy reported greater motivation to change. Threat predicted intentions at low levels of efficacy, but not at high levels. Efficacy had a stronger relationship with intentions in the nonaffirmed condition than in the self-affirmed condition. The findings indicate that self-affirmation processes can moderate the impact of variables in the EPPM and also add to the variance explained. We argue that there is potential for integration of the two traditions of research, to the benefit of both.

  15. Fully Integrated Linear Single Photon Avalanche Diode (SPAD) Array with Parallel Readout Circuit in a Standard 180 nm CMOS Process

    Science.gov (United States)

    Isaak, S.; Bull, S.; Pitter, M. C.; Harrison, Ian.

    2011-05-01

    This paper reports on the development of a SPAD device, fabricated in a UMC 0.18 μm CMOS process, and its subsequent use in an actively quenched single photon counting imaging system. A low-doped p- guard ring (t-well layer) encircles the active area to prevent premature reverse breakdown. The array is a 16×1 parallel output SPAD array, which comprises an actively quenched SPAD circuit in each pixel, with the current value being set by an external resistor RRef = 300 kΩ. In the SPAD I-V response, ID was found to increase slowly until VBD was reached at an excess bias voltage Ve = 11.03 V, and then to increase rapidly due to avalanche multiplication. Digital circuitry to control the SPAD array and perform the necessary data processing was designed in VHDL and implemented on an FPGA chip. At room temperature, the dark count was found to be approximately 13 kHz for most of the 16 SPAD pixels and the dead time was estimated to be 40 ns.
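
    The two timing figures at the end of this abstract can be put in perspective with a short calculation. The sketch below is not from the paper: it assumes a simple non-paralyzable detector model, in which a pixel registers at most one count per dead-time interval, and the variable names are illustrative.

```python
dead_time_s = 40e-9      # estimated dead time quoted in the abstract
dark_count_hz = 13e3     # approximate dark count rate quoted in the abstract

# Assumption: non-paralyzable model, at most one count per dead-time interval.
max_count_rate_hz = 1.0 / dead_time_s

# Fraction of the time a pixel is blind because of dark counts alone.
dark_dead_fraction = dark_count_hz * dead_time_s

print(f"{max_count_rate_hz / 1e6:.1f} MHz")
print(f"{dark_dead_fraction:.2e}")
```

    Under that assumption the saturation rate is about 25 MHz per pixel, and dark counts alone keep a pixel blind for roughly 0.05% of the time.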

  16. Parallel rendering

    Science.gov (United States)

    Crockett, Thomas W.

    1995-01-01

    This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.

  17. Acceleration and sensitivity analysis of lattice kinetic Monte Carlo simulations using parallel processing and rate constant rescaling.

    Science.gov (United States)

    Núñez, M; Robie, T; Vlachos, D G

    2017-10-28

    Kinetic Monte Carlo (KMC) simulation provides insights into catalytic reactions unobtainable with either experiments or mean-field microkinetic models. Sensitivity analysis of KMC models assesses the robustness of the predictions to parametric perturbations and identifies rate determining steps in a chemical reaction network. Stiffness in the chemical reaction network, a ubiquitous feature, demands lengthy run times for KMC models and renders efficient sensitivity analysis based on the likelihood ratio method unusable. We address the challenge of efficiently conducting KMC simulations and performing accurate sensitivity analysis in systems with unknown time scales by employing two acceleration techniques: rate constant rescaling and parallel processing. We develop statistical criteria that ensure sufficient sampling of non-equilibrium steady state conditions. Our approach provides the twofold benefit of accelerating the simulation itself and enabling likelihood ratio sensitivity analysis, which provides further speedup relative to finite difference sensitivity analysis. As a result, the likelihood ratio method can be applied to real chemistry. We apply our methodology to the water-gas shift reaction on Pt(111).

  18. Fine-grain Parallel Processing On A Commodity Platform: A Solution For The Atlas Second-level Trigger

    CERN Document Server

    Boosten, M

    2003-01-01

    From 2005 on, CERN expects to have a new accelerator available for experiments: the Large Hadron Collider (LHC), with a circumference of 27 kilometres. The ATLAS detector produces 40 TeraBytes/s of data. Only a fraction of all data is interesting. A computer system, called the trigger, selects the interesting data through real-time data analysis. The trigger consists of three subsequent filtering levels: LVL1, LVL2, and LVL3. LVL1 will be implemented using special-purpose hardware. LVL2 and LVL3 will be implemented using a Network Of Workstations (NOW). A major problem is to make efficient use of the computing power available in each workstation. The major contribution of this designer's project is an infrastructure named MESH. MESH enables CERN to cost-effectively implement the LVL2 trigger. Furthermore, due to the use of commodity technology, MESH enables the LVL2 trigger to be cost-effectively upgraded and supported during its 20-year lifecycle. MESH facilitates efficient parallel processing on PCs interc...

  19. Assessment of Substance Abuse Behaviors in Adolescents’: Integration of Self-Control into Extended Parallel Process Model

    Directory of Open Access Journals (Sweden)

    K Witte

    2005-04-01

    Introduction: An effective preventive health education program on drug abuse can be delivered by applying behavior change theories in a complementary fashion. Methods: The aim of this study was to assess the effectiveness of integrating self-control into the Extended Parallel Process Model (EPPM) for substance abuse behaviors. A sample of 189 governmental high school students participated in this survey. Information was collected individually through a researcher-designed questionnaire and a rapid urinary immunochromatography test for opium and marijuana. Results: The results show that 6.9% of students used drugs (especially opium and marijuana) and that peer pressure was a determinant factor in drug use. Moreover, the EPPM theoretical variables of perceived severity and perceived self-efficacy, together with self-control, were predictive of behavioral intention against substance abuse. Self-control had a significant effect on protective motivation and perceived efficacy. Low self-control was a predictive factor of drug abuse, and students with low self-control had experience of drug abuse. Conclusion: The results of this study suggest that integrating self-control into the EPPM can be effective in describing and designing primary preventive programs against drug abuse, and in assessing abuse and deviant behaviors among adolescents, especially risk seekers.

  20. 8051 microcontroller to FPGA and ADC interface design for high speed parallel processing systems – Application in ultrasound scanners

    Directory of Open Access Journals (Sweden)

    J. Jean Rossario Raj

    2016-09-01

    Microcontrollers perform the hardware control in many instruments. Instruments requiring huge data throughput and parallel computing use FPGAs for data processing. The microcontroller in turn configures the application hardware devices such as FPGAs, ADCs and Ethernet chips. The interfacing of these devices uses an address/data bus interface, a serial interface or a serial peripheral interface. The choice of interface depends upon the input/output pins available on the different devices, programming ease and the proprietary interfaces supported by devices such as ADCs. The novelty of this paper is its description of the programming logic used for various interface scenarios from a microcontroller to different programmable devices. The study describes the methods and logic flowcharts for the different interfaces. The interface logic was implemented in prototype hardware for an ultrasound scanner. The internal devices were controlled from a graphical user interface on a laptop and the scan results were recorded. The results indicate that an optimum hardware design can be achieved by using a common serial interface towards all the devices.

  1. Parallel computations

    CERN Document Server

    1982-01-01

    Parallel Computations focuses on parallel computation, with emphasis on algorithms used in a variety of numerical and physical applications and for many different types of parallel computers. Topics covered range from vectorization of fast Fourier transforms (FFTs) and of the incomplete Cholesky conjugate gradient (ICCG) algorithm on the Cray-1 to calculation of table lookups and piecewise functions. Single tridiagonal linear systems and vectorized computation of reactive flow are also discussed. Comprised of 13 chapters, this volume begins by classifying parallel computers and describing techn

  2. A parallel buffer tree

    DEFF Research Database (Denmark)

    Sitchinava, Nodar; Zeh, Norbert

    2012-01-01

    We present the parallel buffer tree, a parallel external memory (PEM) data structure for batched search problems. This data structure is a non-trivial extension of Arge's sequential buffer tree to a private-cache multiprocessor environment and reduces the number of I/O operations by the number of...... in the optimal O(psort(N) + K/PB) parallel I/O complexity, where K is the size of the output reported in the process and psort(N) is the parallel I/O complexity of sorting N elements using P processors....

  3. A novel conceptual design of parallel nitrogen expansion liquefaction process for small-scale LNG (liquefied natural gas) plant in skid-mount packages

    International Nuclear Information System (INIS)

    He, Tianbiao; Ju, Yonglin

    2014-01-01

    The utilization of unconventional natural gas is still a great challenge for China due to its distribution locations and small reserves. Thus, liquefying unconventional natural gas in small-scale LNG plants in skid-mount packages is a good choice with great economic benefits. A novel conceptual design of a parallel nitrogen expansion liquefaction process for small-scale plants in skid-mount packages is proposed. First, a process configuration is designed. Then, a thermodynamic analysis of the process is conducted. Next, an optimization model based on a genetic algorithm is developed to optimize the process. Finally, the flexibility of the process is tested with two different feed gases. In conclusion, the proposed parallel nitrogen expansion liquefaction process can be used in small-scale LNG plants in skid-mount packages with high exergy efficiency and great economic benefits. - Highlights: • A novel design of parallel nitrogen expansion liquefaction process is proposed. • Genetic algorithm is applied to optimize the novel process. • The unit energy consumption of optimized process is 0.5163 kWh/Nm³. • The exergy efficiency of the optimized case is 0.3683. • The novel process has a good flexibility for different feed gas conditions.

  4. A comparison of parallel dust and fibre measurements of airborne chrysotile asbestos in a large mine and processing factories in the Russian Federation

    NARCIS (Netherlands)

    Feletto, Eleonora; Schonfeld, Sara J; Kovalevskiy, Evgeny V; Bukhtiyarov, Igor V; Kashanskiy, Sergey V; Moissonnier, Monika; Straif, Kurt; Kromhout, Hans

    2017-01-01

    INTRODUCTION: Historic dust concentrations are available in a large-scale cohort study of workers in a chrysotile mine and processing factories in Asbest, Russian Federation. Parallel dust (gravimetric) and fibre (phase-contrast optical microscopy) concentrations collected in 1995, 2007 and 2013/14

  5. Co-development of Problem Gambling and Depression Symptoms in Emerging Adults: A Parallel-Process Latent Class Growth Model.

    Science.gov (United States)

    Edgerton, Jason D; Keough, Matthew T; Roberts, Lance W

    2018-02-21

    This study examines whether there are multiple joint trajectories of depression and problem gambling co-development in a sample of emerging adults. Data were from the Manitoba Longitudinal Study of Young Adults (n = 679), which was collected in 4 waves across 5 years (age 18-20 at baseline). Parallel process latent class growth modeling was used to identify 5 joint trajectory classes: low decreasing gambling, low increasing depression (81%); low stable gambling, moderate decreasing depression (9%); low stable gambling, high decreasing depression (5%); low stable gambling, moderate stable depression (3%); moderate stable problem gambling, no depression (2%). There was no evidence of reciprocal growth in problem gambling and depression in any of the joint classes. Multinomial logistic regression analyses of baseline risk and protective factors found that only neuroticism, escape-avoidance coping, and perceived level of family social support were significant predictors of joint trajectory class membership. Consistent with the pathways model framework, we observed that individuals in the problem gambling only class were more likely to be using gambling as a stable way to cope with negative emotions. Similarly, high levels of neuroticism and low levels of family support were associated with increased odds of being in a class with moderate to high levels of depressive symptoms (but low gambling problems). The results suggest that interventions for problem gambling and/or depression need to focus on promoting more adaptive coping skills among more "at-risk" young adults, and such interventions should be tailored in relation to specific subtypes of comorbid mental illness.

  6. Does the extended parallel process model fear appeal theory explain fears and barriers to prenatal physical activity?

    Science.gov (United States)

    Redmond, Michelle L; Dong, Fanglong; Frazier, Linda M

    2015-01-01

    Few studies have looked at the impact of fear on exercise behavior during pregnancy using a fear appeal theory. It is beneficial to understand how women receive the message of safe exercise during pregnancy and whether established guidelines have any influence on their decision to exercise. Using the extended parallel process model (EPPM), we explored women's fears about prenatal physical activity. We conducted a prospective, cross-sectional study on the fears and barriers to prenatal exercise among a racially/ethnically diverse population of pregnant women. Participants were recruited from local prenatal clinics. Ninety females with a singleton pregnancy between 16 and 30 weeks gestation were enrolled in the study. The primary outcome measure was classification of risk behavior based on the EPPM theory. Women who scored high on self-efficacy for exercising safely were more likely to exercise during pregnancy (adjusted odds ratio, 5.95; 95% CI, 1.39-25.39; P=.016) for at least 90 minutes per week. Participants who exercised at least 90 minutes per week during pregnancy scored higher on their perceived ability to control danger to the baby, as well as less susceptibility of harm and threat to baby of moderate exercise from prenatal exercise. More education and counseling on specific guidelines for safely exercising during pregnancy are needed. The EPPM framework has the potential to help improve health communications about exercise safety and guidelines between patients and health care professionals during pregnancy. Copyright © 2015 Jacobs Institute of Women's Health. Published by Elsevier Inc. All rights reserved.

  7. SOFTWARE FOR DESIGNING PARALLEL APPLICATIONS

    Directory of Open Access Journals (Sweden)

    M. K. Bouza

    2017-01-01

    The object of research is the set of tools that support the development of parallel programs in C/C++. Methods and software that automate the process of designing parallel applications are proposed.

  8. PFLOTRAN User Manual: A Massively Parallel Reactive Flow and Transport Model for Describing Surface and Subsurface Processes

    Energy Technology Data Exchange (ETDEWEB)

    Lichtner, Peter C. [OFM Research, Redmond, WA (United States); Hammond, Glenn E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Lu, Chuan [Idaho National Lab. (INL), Idaho Falls, ID (United States); Karra, Satish [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Bisht, Gautam [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Andre, Benjamin [National Center for Atmospheric Research, Boulder, CO (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Mills, Richard [Intel Corporation, Portland, OR (United States); Univ. of Tennessee, Knoxville, TN (United States); Kumar, Jitendra [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

    2015-01-20

    PFLOTRAN solves a system of generally nonlinear partial differential equations describing multi-phase, multicomponent and multiscale reactive flow and transport in porous materials. The code is designed to run on massively parallel computing architectures as well as workstations and laptops (e.g. Hammond et al., 2011). Parallelization is achieved through domain decomposition using the PETSc (Portable Extensible Toolkit for Scientific Computation) libraries for the parallelization framework (Balay et al., 1997). PFLOTRAN has been developed from the ground up for parallel scalability and has been run on up to 2^18 processor cores with problem sizes up to 2 billion degrees of freedom. Written in object oriented Fortran 90, the code requires the latest compilers compatible with Fortran 2003. At the time of this writing this requires gcc 4.7.x, Intel 12.1.x and PGC compilers. As a requirement of running problems with a large number of degrees of freedom, PFLOTRAN allows reading input data that is too large to fit into memory allotted to a single processor core. The current limitation to the problem size PFLOTRAN can handle is the limitation of the HDF5 file format used for parallel IO to 32 bit integers. Noting that 2^32 = 4,294,967,296, this gives an estimate of the maximum problem size that can be currently run with PFLOTRAN. Hopefully this limitation will be remedied in the near future.
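
    The problem-size limit mentioned at the end of this abstract follows directly from integer widths. As a minimal sketch (the variable names are illustrative, and whether the indices in question are signed or unsigned is an assumption that the abstract does not settle), the relevant bounds can be compared with the largest problem size it reports:

```python
# Number of distinct values a 32-bit integer can take.
unsigned_32_values = 2**32

# Largest value if the 32-bit index is signed (an assumption, see above).
signed_32_max = 2**31 - 1

# Largest problem size quoted in the abstract, in degrees of freedom.
largest_reported_dof = 2_000_000_000

print(f"{unsigned_32_values:,}")
print(f"{signed_32_max:,}")
print(largest_reported_dof < signed_32_max)
```

    On either reading, the reported 2 billion degrees of freedom sits close to the ceiling, which is consistent with the abstract describing the 32-bit limit as the current constraint.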

  9. Parallel algorithms

    CERN Document Server

    Casanova, Henri; Robert, Yves

    2008-01-01

    "…The authors of the present book, who have extensive credentials in both research and instruction in the area of parallelism, present a sound, principled treatment of parallel algorithms. … This book is very well written and extremely well designed from an instructional point of view. … The authors have created an instructive and fascinating text. The book will serve researchers as well as instructors who need a solid, readable text for a course on parallelism in computing. Indeed, for anyone who wants an understandable text from which to acquire a current, rigorous, and broad vi

  10. Development of imaging and reconstructions algorithms on parallel processing architectures for applications in non-destructive testing

    International Nuclear Information System (INIS)

    Pedron, Antoine

    2013-01-01

    This thesis work is placed between the scientific domain of ultrasound non-destructive testing and algorithm-architecture adequation. Ultrasound non-destructive testing includes a group of analysis techniques used in science and industry to evaluate the properties of a material, component, or system without causing damage. In order to characterise possible defects, determining their position, size and shape, imaging and reconstruction tools have been developed at CEA-LIST, within the CIVA software platform. Evolution of acquisition sensors implies a continuous growth of datasets and consequently more and more computing power is needed to maintain interactive reconstructions. General purpose processors (GPP) evolving towards parallelism and emerging architectures such as GPU allow large accelerations that can be applied to these algorithms. The main goal of the thesis is to evaluate the acceleration that can be obtained for two reconstruction algorithms on these architectures. These two algorithms differ in their parallelization scheme. The first one can be properly parallelized on GPP whereas on GPU, an intensive use of atomic instructions is required. Within the second algorithm, parallelism is easier to express, but loop ordering on GPP, as well as thread scheduling and a good use of shared memory on GPU are necessary in order to obtain efficient results. Different API or libraries, such as OpenMP, CUDA and OpenCL are evaluated through chosen benchmarks. An integration of both algorithms in the CIVA software platform is proposed and different issues related to code maintenance and durability are discussed. (author) [fr

  11. Parallel computing works

    Energy Technology Data Exchange (ETDEWEB)

    1991-10-23

    An account of the Caltech Concurrent Computation Program (C³P), a five year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations? As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C³P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C³P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

  12. Massively parallel mathematical sieves

    Energy Technology Data Exchange (ETDEWEB)

    Montry, G.R.

    1989-01-01

    The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
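    The idea of splitting a sieve across processing elements can be illustrated with a minimal sketch. It is not the method of the record above: it assumes a simple contiguous block decomposition rather than the scattered decomposition the abstract mentions, and it uses a Python thread pool as a stand-in for hypercube nodes, so it shows the structure of the decomposition rather than any real speedup. The names base_primes, sieve_block and parallel_sieve are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from math import isqrt


def base_primes(limit):
    """Serial Sieve of Eratosthenes: all primes <= limit."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, isqrt(limit) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]


def sieve_block(lo, hi, primes):
    """Sieve the half-open block [lo, hi) using the shared base primes."""
    flags = [True] * (hi - lo)
    for p in primes:
        if p * p >= hi:
            break  # no composite below hi has a smallest factor this large
        # First multiple of p inside the block, but never p itself.
        start = max(p * p, ((lo + p - 1) // p) * p)
        for multiple in range(start, hi, p):
            flags[multiple - lo] = False
    return [lo + i for i, flag in enumerate(flags) if flag and lo + i >= 2]


def parallel_sieve(n, workers=4):
    """Primes <= n, with each worker sieving one contiguous block."""
    if n < 2:
        return []
    # Every worker needs only the primes up to sqrt(n); these are computed once.
    primes = base_primes(isqrt(n))
    block = max(1, (n + 1 + workers - 1) // workers)
    bounds = [(lo, min(lo + block, n + 1)) for lo in range(0, n + 1, block)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda b: sieve_block(b[0], b[1], primes), bounds)
    # Blocks are returned in order, so the concatenation is already sorted.
    return [p for part in parts for p in part]
```

    Because the blocks share only the read-only list of base primes, no communication is needed between workers until the final concatenation, which is what makes the sieve attractive for ensemble architectures.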

  13. A high performance image processing platform based on CPU-GPU heterogeneous cluster with parallel image reconstructions for micro-CT

    International Nuclear Information System (INIS)

    Ding Yu; Qi Yujin; Zhang Xuezhu; Zhao Cuilan

    2011-01-01

    In this paper, we report the development of a high-performance image processing platform based on a CPU-GPU heterogeneous cluster. Currently, it consists of a Dell Precision T7500 and HP XW8600 workstations with a parallel programming and runtime environment that uses the message-passing interface (MPI) and CUDA (Compute Unified Device Architecture). We succeeded in developing parallel image processing techniques for 3D image reconstruction in X-ray micro-CT imaging. The results show that a GPU is about 194 times faster than a single CPU, and that the CPU-GPU cluster is about 46 times faster than the CPU cluster. These results meet the requirements of rapid 3D image reconstruction and real-time image display. In conclusion, the use of a CPU-GPU heterogeneous cluster is an effective way to build a high-performance image processing platform. (authors)

  14. SPINning parallel systems software

    International Nuclear Information System (INIS)

    Matlin, O.S.; Lusk, E.; McCune, W.

    2002-01-01

    We describe our experiences in using Spin to verify parts of the Multi Purpose Daemon (MPD) parallel process management system. MPD is a distributed collection of processes connected by Unix network sockets. MPD is dynamic: processes and connections among them are created and destroyed as MPD is initialized, runs user processes, recovers from faults, and terminates. This dynamic nature is easily expressible in the Spin/Promela framework but poses performance and scalability challenges. We present here the results of expressing some of the parallel algorithms of MPD and executing both simulation and verification runs with Spin.

  15. Organization of the channel-switching process in parallel computer systems based on a matrix optical switch

    Science.gov (United States)

    Golomidov, Y. V.; Li, S. K.; Popov, S. A.; Smolov, V. B.

    1986-01-01

    After a classification and analysis of electronic and optoelectronic switching devices, the design principles and structure of a matrix optical switch are described. The switching and pair-exclusion operations in this type of switch are examined, and a method for the optical switching of communication channels is elaborated. Finally, attention is given to the structural organization of a parallel computer system with a matrix optical switch.

  16. Simulation of synchrotron white-beam topographs. An algorithm for parallel processing: application to the study of piezoelectric devices

    International Nuclear Information System (INIS)

    Epelboin, Y.

    1996-01-01

    This paper presents a new algorithm for the integration of Takagi-Taupin equations taking into account the fact that X-ray diffraction is a parallel phenomenon. The diffraction equations show that the propagation of the waves is independent in each incidence plane. It is thus possible to compute in parallel the propagation of the waves in different planes. Two algorithms are presented: the first one for multiprocessor machines where the processors share a common memory, the second one for massively parallel computers. The program is written to achieve a high vectorization ratio and to make it as efficient as possible with modern superscalar and array processors. The simulation of the image of a defect has been divided into two independent parts. In the first one, one computes the derivatives of the deformation inside the crystal; in the second one, these results are used to simulate the image. This allows one rapidly to change the model for a defect, something that was not feasible in all previously written simulation programs since the computation of the deformation was part of the simulation. The study of stroboscopic images of the propagation of acoustic waves in piezoelectric devices is given as an example of the possibilities of this new program. (orig.)

  17. A parallel process model of the development of positive smoking expectancies and smoking behavior during early adolescence in Caucasian and African American girls

    OpenAIRE

    Chung, Tammy; White, Helene R.; Hipwell, Alison E.; Stepp, Stephanie D.; Loeber, Rolf

    2010-01-01

    This study examined the development of positive smoking expectancies and smoking behavior in an urban cohort of girls followed annually over ages 11-14. Longitudinal data from the oldest cohort of the Pittsburgh Girls Study (N=566, 56% African American, 44% Caucasian) were used to estimate a parallel process growth model of positive smoking expectancies and smoking behavior. Average level of positive smoking expectancies was relatively stable over ages 11-14, although there was significant va...

  18. Development of mathematical model and optimal control system of internal temperatures of hot-blast stove process in staggered parallel operation; Netsufuro sushiki model to parallel sofu ni okeru ronai ondo saiteki seigyo system no kaihatsu

    Energy Technology Data Exchange (ETDEWEB)

    Matoba, Y. [Sumitomo Metal Industries, Ltd., Osaka (Japan); Otsuka, K.

    1998-07-01

    A mathematical model and an optimal control system of hot-blast stove process are described. A precise mathematical simulation model of the hot-blast stove was developed and the accuracy of the model has been confirmed. An optimal control system of the thermal conditions of the hot-blast stoves in staggered parallel operation was also developed. By the use of the multivariable optimal regulator and the feedforward compensations for the change of the aimed blast temperature and blast volume, the system is able to control the hot blast temperature and the brick temperature efficiently. The system has been applied to Kashima works. The variations of the blast temperature and the silica brick temperature have been decreased. The ultimate low heat level operations have been realized and the thermal efficiency furthermore has been raised by about 1%. 8 refs., 14 figs., 1 tab.

  19. Attachment of lead wires to thin film thermocouples mounted on high temperature materials using the parallel gap welding process

    Science.gov (United States)

    Holanda, Raymond; Kim, Walter S.; Pencil, Eric; Groth, Mary; Danzey, Gerald A.

    1990-01-01

    Parallel gap resistance welding was used to attach lead wires to sputtered thin film sensors. Ranges of optimum welding parameters to produce an acceptable weld were determined. The thin film sensors were Pt13Rh/Pt thermocouples; they were mounted on substrates of MCrAlY-coated superalloys, aluminum oxide, silicon carbide and silicon nitride. The entire sensor system is designed to be used on aircraft engine parts. These sensor systems, including the thin-film-to-lead-wire connectors, were tested to 1000 °C.

  20. Algorithms for parallel computers

    International Nuclear Information System (INIS)

    Churchhouse, R.F.

    1985-01-01

    Until relatively recently almost all the algorithms for use on computers had been designed on the (usually unstated) assumption that they were to be run on single processor, serial machines. With the introduction of vector processors, array processors and interconnected systems of mainframes, minis and micros, however, various forms of parallelism have become available. The advantage of parallelism is that it offers increased overall processing speed, but it also raises some fundamental questions, including: (i) Which, if any, of the existing 'serial' algorithms can be adapted for use in the parallel mode? (ii) How close to optimal can such adapted algorithms be and, where relevant, what are the convergence criteria? (iii) How can we design new algorithms specifically for parallel systems? (iv) For multi-processor systems, how can we handle the software aspects of the interprocessor communications? Aspects of these questions, illustrated by examples, are considered in these lectures. (orig.)

  1. Toward a parallel and cascading model of the writing system: A review of research on writing processes coordination

    OpenAIRE

    Thierry Olive

    2014-01-01

    Efficient coordination of the different writing processes is central to producing good-quality texts, and is a fundamental component of writing skill. In this article, I propose a general theoretical framework for considering how writing processes are coordinated, in which writing processes are concurrently activated with more or less overlap between processes depending on their working memory demands, and with the flow of information cascading from central to peripheral levels of processing....

  2. Parallel R

    CERN Document Server

    McCallum, Ethan

    2011-01-01

    It's tough to argue with R as a high-quality, cross-platform, open source statistical software product, unless you're in the business of crunching Big Data. This concise book introduces you to several strategies for using R to analyze large datasets. You'll learn the basics of Snow, Multicore, Parallel, and some Hadoop-related tools, including how to find them, how to use them, when they work well, and when they don't. With these packages, you can overcome R's single-threaded nature by spreading work across multiple CPUs, or offloading work to multiple machines to address R's memory barrier.

  3. FY1995 study of low power LSI design automation software with parallel processing; 1995 nendo heiretsu shori wo katsuyoshita shodenryoku LSI muke sekkei jidoka software no kenkyu kaihatsu

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1997-03-01

    The need for low power LSIs has increased rapidly in recent years. For low power LSI development, not only new circuit technologies but also new design automation tools supporting those technologies are indispensable. The purpose of this project is to develop new design automation software that can design digital LSIs with much lower power than conventional CMOS LSIs. New design automation software for very low power LSIs has been developed, targeting the pass-transistor logic SPL, a dedicated low power circuit technology. The software includes a logic synthesis function for pass-transistor-based macrocells and a macrocell placement function. Several new algorithms have been developed for the software, e.g. BDD construction. Some of them are designed and implemented for parallel processing in order to reduce the processing time. The logic synthesis function was tested on a set of benchmarks and finally applied to a low power CPU design. The designed 8-bit CPU was fully compatible with the Zilog Z-80. The power dissipation of the CPU was compared with that of a commercial CMOS Z-80: the new CPU reduced power by up to 82% relative to the CMOS part. In addition, the parallel processing speedup was measured on the macrocell placement function, and a 34-fold speedup was achieved. (NEDO)

  4. About Parallel Programming: Paradigms, Parallel Execution and Collaborative Systems

    Directory of Open Access Journals (Sweden)

    Loredana MOCEAN

    2009-01-01

    Full Text Available In recent years, efforts have been made to delineate a stable and unified framework in which the problems of logical parallel processing can find solutions, at least at the level of imperative languages. The results obtained so far do not match the efforts made. This paper aims to be a small contribution to these efforts. We propose an overview of parallel programming, parallel execution and collaborative systems.

  5. Parallel Lines

    Directory of Open Access Journals (Sweden)

    James G. Worner

    2017-05-01

    Full Text Available James Worner is an Australian-based writer and scholar currently pursuing a PhD at the University of Technology Sydney. His research seeks to expose masculinities lost in the shadow of Australia’s Anzac hegemony while exploring new opportunities for contemporary historiography. He is the recipient of the Doctoral Scholarship in Historical Consciousness at the university’s Australian Centre of Public History and will be hosted by the University of Bologna during 2017 on a doctoral research writing scholarship.   ‘Parallel Lines’ is one of a collection of stories, The Shapes of Us, exploring liminal spaces of modern life: class, gender, sexuality, race, religion and education. It looks at lives, like lines, that do not meet but which travel in proximity, simultaneously attracted and repelled. James’ short stories have been published in various journals and anthologies.

  6. Expressing Parallelism with ROOT

    Energy Technology Data Exchange (ETDEWEB)

    Piparo, D. [CERN; Tejedor, E. [CERN; Guiraud, E. [CERN; Ganis, G. [CERN; Mato, P. [CERN; Moneta, L. [CERN; Valls Pla, X. [CERN; Canal, P. [Fermilab

    2017-11-22

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.

  7. Expressing Parallelism with ROOT

    Science.gov (United States)

    Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.

    2017-10-01

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.

  8. Development of a parallel processing couple for calculations of control rod worth in terms of burn-up in a WWER-1000 reactor

    Energy Technology Data Exchange (ETDEWEB)

    Noori-Kalkhoran, Omid; Ahangari, R. [Nuclear Science and Technology Research Institute (NSTRI), Tehran (Iran, Islamic Republic of). Reactor Research school; Shirani, A.S. [Shahid Beheshti Univ., Tehran (Iran, Islamic Republic of). Faculty of Engineering

    2017-03-15

    In this study a code-based method has been developed for calculation of integral and differential control rod worth in terms of burn-up for a WWER-1000 reactor. Parallel processing of WIMSD-5B, PARCS V2.7 and COBRA-EN has been used for this purpose. WIMSD-5B has been used for cell calculation and handling burn-up of the core at different days. PARCS V2.7 has been used for neutronic calculation of the core and critical boron concentration search. Thermal-hydraulic calculation has been performed by COBRA-EN. A parallel processing algorithm has been developed in MATLAB to couple these codes and transfer suitable data between them in each step. Steady-state Power Peaking Factors (PPFs) of the core and control rod worth have been calculated from Beginning Of Cycle (BOC) to 289.7 Effective Full Power Days (EFPDs) in several steps. Results have been compared with Bushehr Nuclear Power Plant (BNPP) Final Safety Analysis Report (FSAR) results. The results show close agreement and confirm the ability of the developed coupling to calculate control rod worth in terms of burn-up.

  9. On the interplay between working memory consolidation and attentional selection in controlling conscious access: Parallel processing at a cost - a comment on 'The interplay of attention and consciousness in visual search, attentional blink and working memory consolidation'

    NARCIS (Netherlands)

    Wyble, Brad; Bowman, Howard; Nieuwenstein, Mark

    On the interplay between working memory consolidation and attentional selection in controlling conscious access: parallel processing at a cost - a comment on 'The interplay of attention and consciousness in visual search, attentional blink and working memory consolidation'

  10. Ultrascalable petaflop parallel supercomputer

    Science.gov (United States)

    Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Chiu, George [Cross River, NY; Cipolla, Thomas M [Katonah, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Hall, Shawn [Pleasantville, NY; Haring, Rudolf A [Cortlandt Manor, NY; Heidelberger, Philip [Cortlandt Manor, NY; Kopcsay, Gerard V [Yorktown Heights, NY; Ohmacht, Martin [Yorktown Heights, NY; Salapura, Valentina [Chappaqua, NY; Sugavanam, Krishnan [Mahopac, NY; Takken, Todd [Brewster, NY

    2010-07-20

    A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.

  11. Massively parallel multicanonical simulations

    Science.gov (United States)

    Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard

    2018-03-01

    Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with of the order of 10⁴ parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as starting point and reference for practitioners in the field.

  12. Demonstration of Parallel Algal Processing: Production of Renewable Diesel Blendstock and a High-Value Chemical Intermediate

    Energy Technology Data Exchange (ETDEWEB)

    Knoshaug, Eric P [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Mohagheghi, Ali [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Nagle, Nicholas J [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Stickel, Jonathan J [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Dong, Tao [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Karp, Eric M [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Kruger, Jacob S [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Brandner, David [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Manker, Lorenz [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Rorrer, Nicholas [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Hyman, Deborah A [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Christensen, Earl D [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Pienkos, Philip T [National Renewable Energy Laboratory (NREL), Golden, CO (United States)

    2017-12-19

    Co-production of high-value chemicals such as succinic acid from algal sugars is a promising route to enabling conversion of algal lipids to a renewable diesel blendstock. Biomass from the green alga Scenedesmus acutus was acid pretreated and the resulting slurry separated into its solid and liquor components using charged polyamide-induced flocculation and vacuum filtration. Over the course of a subsequent 756-hour continuous fermentation of the algal liquor with Actinobacillus succinogenes 130Z, we achieved maximum productivity, process conversion yield, and titer of 1.1 g L⁻¹ h⁻¹, 0.7 g g⁻¹ total sugars, and 30.5 g L⁻¹, respectively. Succinic acid was recovered from fermentation media with a yield of 60% at 98.4% purity, while lipids were recovered from the flocculated cake at 83% yield with subsequent conversion through deoxygenation and hydroisomerization to a renewable diesel blendstock. This work is a first-of-its-kind demonstration of a novel integrated conversion process for algal biomass to produce fuel and chemical products of sufficient quality to be blend-ready feedstocks for further processing.

  13. Parallel k-means++

    Energy Technology Data Exchange (ETDEWEB)

    2017-04-04

    A parallelization of the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms allow people to cluster multidimensional data, by attempting to minimize the mean distance of data points within a cluster. K-means++ improved upon traditional k-means by using a more intelligent approach to selecting the initial seeds for the clustering process. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique. We have developed original C++ code for parallelizing the algorithm on three unique hardware architectures: GPU using NVidia's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. By parallelizing the process for these platforms, we are able to perform k-means++ clustering much more quickly than it could be done before.
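    The seed selection step that this record parallelizes can be sketched serially as follows. This is a minimal illustration of the standard k-means++ seeding rule (each new seed is drawn with probability proportional to its squared distance from the nearest seed already chosen), not the C++ code described above; the function names and the optional rng parameter are assumptions made for the example. The comment marks the per-point distance update, which is the step that lends itself to data-parallel execution on a GPU or multicore CPU.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points, k, rng=None):
    """Pick k initial seeds from points using the k-means++ rule."""
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    # The first seed is chosen uniformly at random.
    seeds = [rng.choice(points)]
    # d2[i] = squared distance from points[i] to its nearest chosen seed.
    d2 = [squared_distance(p, seeds[0]) for p in points]
    while len(seeds) < k:
        total = sum(d2)
        if total == 0:
            # Every point coincides with a seed; fall back to a uniform draw.
            seeds.append(rng.choice(points))
        else:
            # Draw index i with probability d2[i] / total.
            i = rng.choices(range(len(points)), weights=d2, k=1)[0]
            seeds.append(points[i])
        # Data-parallel step: every entry is updated independently of the others.
        d2 = [min(d, squared_distance(p, seeds[-1])) for d, p in zip(d2, points)]
    return seeds
```

    The weighted draw itself needs a sum over all points, so a parallel version would typically combine the independent distance updates with a parallel reduction or prefix sum.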

  14. Accelerating Monte Carlo simulations of photon transport in a voxelized geometry using a massively parallel graphics processing unit

    International Nuclear Information System (INIS)

    Badal, Andreu; Badano, Aldo

    2009-01-01

    Purpose: It is a known fact that Monte Carlo simulations of radiation transport are computationally intensive and may require long computing times. The authors introduce a new paradigm for the acceleration of Monte Carlo simulations: The use of a graphics processing unit (GPU) as the main computing device instead of a central processing unit (CPU). Methods: A GPU-based Monte Carlo code that simulates photon transport in a voxelized geometry with the accurate physics models from PENELOPE has been developed using the CUDA programming model (NVIDIA Corporation, Santa Clara, CA). Results: An outline of the new code and a sample x-ray imaging simulation with an anthropomorphic phantom are presented. A remarkable 27-fold speed up factor was obtained using a GPU compared to a single core CPU. Conclusions: The reported results show that GPUs are currently a good alternative to CPUs for the simulation of radiation transport. Since the performance of GPUs is currently increasing at a faster pace than that of CPUs, the advantages of GPU-based software are likely to be more pronounced in the future.
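    Why such simulations map well onto massively parallel hardware can be seen in a toy sketch: each photon history is sampled independently of all others. The example below is an assumption-laden illustration only. It uses a homogeneous slab with a single attenuation coefficient and no scattering, not the voxelized geometry or the PENELOPE physics models of the record above, and the function names are hypothetical.

```python
import math
import random


def transmitted_fraction(mu, thickness, histories, seed=0):
    """Estimate the unscattered transmission through a homogeneous slab.

    mu is the linear attenuation coefficient (1/cm), thickness is in cm.
    """
    rng = random.Random(seed)
    passed = 0
    for _ in range(histories):
        # Sample the free path s from p(s) = mu * exp(-mu * s) by inversion.
        s = -math.log(1.0 - rng.random()) / mu
        if s > thickness:
            passed += 1
    return passed / histories


def batched_estimate(mu, thickness, histories_per_batch, batches):
    """Average independent batches, each with its own random stream.

    Batches do not interact, so each one could run on a separate thread or core.
    """
    results = [
        transmitted_fraction(mu, thickness, histories_per_batch, seed=b)
        for b in range(batches)
    ]
    return sum(results) / batches
```

    For this simplified model the estimate should approach the analytic value exp(-mu * thickness) as the number of histories grows, which gives a quick sanity check on the sampling.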

  15. Accelerating Monte Carlo simulations of photon transport in a voxelized geometry using a massively parallel graphics processing unit

    Energy Technology Data Exchange (ETDEWEB)

    Badal, Andreu; Badano, Aldo [Division of Imaging and Applied Mathematics, OSEL, CDRH, U.S. Food and Drug Administration, Silver Spring, Maryland 20993-0002 (United States)

    2009-11-15

    Purpose: It is a known fact that Monte Carlo simulations of radiation transport are computationally intensive and may require long computing times. The authors introduce a new paradigm for the acceleration of Monte Carlo simulations: The use of a graphics processing unit (GPU) as the main computing device instead of a central processing unit (CPU). Methods: A GPU-based Monte Carlo code that simulates photon transport in a voxelized geometry with the accurate physics models from PENELOPE has been developed using the CUDA programming model (NVIDIA Corporation, Santa Clara, CA). Results: An outline of the new code and a sample x-ray imaging simulation with an anthropomorphic phantom are presented. A remarkable 27-fold speed up factor was obtained using a GPU compared to a single core CPU. Conclusions: The reported results show that GPUs are currently a good alternative to CPUs for the simulation of radiation transport. Since the performance of GPUs is currently increasing at a faster pace than that of CPUs, the advantages of GPU-based software are likely to be more pronounced in the future.

  16. Accelerating Monte Carlo simulations of photon transport in a voxelized geometry using a massively parallel graphics processing unit.

    Science.gov (United States)

    Badal, Andreu; Badano, Aldo

    2009-11-01

    It is a known fact that Monte Carlo simulations of radiation transport are computationally intensive and may require long computing times. The authors introduce a new paradigm for the acceleration of Monte Carlo simulations: The use of a graphics processing unit (GPU) as the main computing device instead of a central processing unit (CPU). A GPU-based Monte Carlo code that simulates photon transport in a voxelized geometry with the accurate physics models from PENELOPE has been developed using the CUDA™ programming model (NVIDIA Corporation, Santa Clara, CA). An outline of the new code and a sample x-ray imaging simulation with an anthropomorphic phantom are presented. A remarkable 27-fold speed up factor was obtained using a GPU compared to a single core CPU. The reported results show that GPUs are currently a good alternative to CPUs for the simulation of radiation transport. Since the performance of GPUs is currently increasing at a faster pace than that of CPUs, the advantages of GPU-based software are likely to be more pronounced in the future.

  17. The effects of fear appeal message repetition on perceived threat, perceived efficacy, and behavioral intention in the extended parallel process model.

    Science.gov (United States)

    Shi, Jingyuan Jolie; Smith, Sandi W

    2016-01-01

    This study examined the effect of moderately repeated exposure (three times) to a fear appeal message on the Extended Parallel Processing Model (EPPM) variables of threat, efficacy, and behavioral intentions for the recommended behaviors in the message, as well as the proportions of systematic and message-related thoughts generated after each message exposure. The results showed that after repeated exposure to a fear appeal message about preventing melanoma, perceived threat in terms of susceptibility and perceived efficacy in terms of response efficacy significantly increased. The behavioral intentions of all recommended behaviors did not change after repeated exposure to the message. However, after the second exposure the proportions of both systematic and all message-related thoughts (relative to total thoughts) significantly decreased while the proportion of heuristic thoughts significantly increased, and this pattern held after the third exposure. The findings demonstrated that the predictions in the EPPM are likely to be operative after three exposures to a persuasive message.

  18. Using the Extended Parallel Process Model to create and evaluate the effectiveness of brochures to reduce the risk for noise-induced hearing loss in college students

    Directory of Open Access Journals (Sweden)

    Michael R Kotowski

    2011-01-01

    Full Text Available Brochures containing messages developed according to the Extended Parallel Process Model were deployed to increase college students' intentions to use hearing protection. These brochures were presented to one-half of a college student sample, after which a questionnaire was administered to assess perceptions of threat, efficacy, and behavioral intentions. The other half of the sample completed the questionnaire and then received brochures. Results indicated that people receiving the brochure before the questionnaire reported greater perceptions of hearing loss threat and efficacy to use ear plugs when in loud environments; however, intentions to use ear plugs were unchanged. Distribution of the brochure also resulted in greater perceptions of hearing loss threat and efficacy to use over-the-ear headphones when using devices such as MP3 players. In this case, however, intentions to use over-the-ear headphones increased. Results are discussed in terms of future research and practical applications.

  19. Differential cognitive processing of Kanji and Kana words: do orthographic and semantic codes function in parallel in word matching task.

    Science.gov (United States)

    Kawakami, A; Hatta, T; Kogure, T

    2001-12-01

    Relative engagements of the orthographic and semantic codes in Kanji and Hiragana word recognition were investigated. In Exp. 1, subjects judged whether the pairs of Kanji words (prime and target) presented sequentially were physically identical to each other in the word condition. In the sentence condition, subjects decided whether the target word was valid for the prime sentence presented in advance. The results showed that the response times to target words orthographically similar to the prime were significantly slower than to semantically related target words in the word condition and that this was also the case in the sentence condition. In Exp. 2, subjects judged whether the target word written in Hiragana was physically identical to the prime word in the word condition. In the sentence condition, subjects decided if the target word was valid for the previously presented prime sentence. Analysis indicated that response times to orthographically similar words were slower than to semantically related words in the word condition but not in the sentence condition, wherein the response times to the semantically and orthographically similar words were largely the same. Based on these results, differential contributions of orthographic and semantic codes in cognitive processing of Japanese Kanji and Hiragana words were discussed.

  20. Real-time parallel processing of grammatical structure in the fronto-striatal system: a recurrent network simulation study using reservoir computing.

    Science.gov (United States)

    Hinaut, Xavier; Dominey, Peter Ford

    2013-01-01

    Sentence processing takes place in real-time. Previous words in the sentence can influence the processing of the current word in the timescale of hundreds of milliseconds. Recent neurophysiological studies in humans suggest that the fronto-striatal system (frontal cortex, and striatum--the major input locus of the basal ganglia) plays a crucial role in this process. The current research provides a possible explanation of how certain aspects of this real-time processing can occur, based on the dynamics of recurrent cortical networks, and plasticity in the cortico-striatal system. We simulate prefrontal area BA47 as a recurrent network that receives on-line input about word categories during sentence processing, with plastic connections between cortex and striatum. We exploit the homology between the cortico-striatal system and reservoir computing, where recurrent frontal cortical networks are the reservoir, and plastic cortico-striatal synapses are the readout. The system is trained on sentence-meaning pairs, where meaning is coded as activation in the striatum corresponding to the roles that different nouns and verbs play in the sentences. The model learns an extended set of grammatical constructions, and demonstrates the ability to generalize to novel constructions. It demonstrates how early in the sentence, a parallel set of predictions are made concerning the meaning, which are then confirmed or updated as the processing of the input sentence proceeds. It demonstrates how on-line responses to words are influenced by previous words in the sentence, and by previous sentences in the discourse, providing new insight into the neurophysiology of the P600 ERP scalp response to grammatical complexity. This demonstrates that a recurrent neural network can decode grammatical structure from sentences in real-time in order to generate a predictive representation of the meaning of the sentences. This can provide insight into the underlying mechanisms of human cortico

  1. Parallel hierarchical radiosity rendering

    Energy Technology Data Exchange (ETDEWEB)

    Carter, Michael [Iowa State Univ., Ames, IA (United States)]

    1993-07-01

    In this dissertation, the step-by-step development of a scalable parallel hierarchical radiosity renderer is documented. First, a new look is taken at the traditional radiosity equation, and a new form is presented in which the matrix of linear system coefficients is transformed into a symmetric matrix, thereby simplifying the problem and enabling a new solution technique to be applied. Next, the state-of-the-art hierarchical radiosity methods are examined for their suitability to parallel implementation, and scalability. Significant enhancements are also discovered which both improve their theoretical foundations and improve the images they generate. The resultant hierarchical radiosity algorithm is then examined for sources of parallelism, and for an architectural mapping. Several architectural mappings are discussed. A few key algorithmic changes are suggested during the process of making the algorithm parallel. Next, the performance, efficiency, and scalability of the algorithm are analyzed. The dissertation closes with a discussion of several ideas which have the potential to further enhance the hierarchical radiosity method, or provide an entirely new forum for the application of hierarchical methods.

  2. Algorithmically specialized parallel computers

    CERN Document Server

    Snyder, Lawrence; Gannon, Dennis B

    1985-01-01

    Algorithmically Specialized Parallel Computers focuses on the concept and characteristics of an algorithmically specialized computer. This book discusses algorithmically specialized computers, algorithmic specialization using VLSI, and innovative architectures. The architectures and algorithms for digital signal, speech, and image processing and specialized architectures for numerical computations are also elaborated. Other topics include the model for analyzing generalized inter-processor, pipelined architecture for search tree maintenance, and specialized computer organization for raster

  3. Parallel processing in nuclear applications

    International Nuclear Information System (INIS)

    Muniz, Francisco Junqueira

    1995-01-01

    This paper summarizes some investigations on effective and scalable dynamic load-balancing mechanisms suitable for distributed-memory (loosely-coupled) MIMD systems. The selected implementation environment is composed of T800 transputers programmed in the occam and C languages and an automatic routing communication software package (the virtual channel router). Tasks were generated, at execution time, using a multiple-spawning mechanism based on a set of remote procedure call primitives. The objective is to maximize resource utilization. In particular, the investigation described here facilitates portability of the user application, since it concentrates on system-level load-balancing mechanisms. The load-balancing mechanisms studied are also suitable for systems that can vary in size, concentrating on methods with potential for scalability. Two possible application examples, chosen from the nuclear area, where distributed-memory MIMD machines can be utilized, are mentioned. (author). 24 refs., 1 fig

  4. Removal of antibiotics in a parallel-plate thin-film-photocatalytic reactor: Process modeling and evolution of transformation by-products and toxicity.

    Science.gov (United States)

    Özkal, Can Burak; Frontistis, Zacharias; Antonopoulou, Maria; Konstantinou, Ioannis; Mantzavinos, Dionissios; Meriç, Süreyya

    2017-10-01

    Photocatalytic degradation of the antibiotic sulfamethoxazole (SMX) has been studied under recycling batch and homogeneous flow conditions in a thin-film coated immobilized system, namely a parallel-plate (PPL) reactor. The experiments were designed and statistically evaluated with a factorial design (FD) approach, with the intent of providing a mathematical model that takes into account the parameters influencing process performance. Initial antibiotic concentration, UV energy level, irradiated surface area, water matrix (ultrapure and secondary treated wastewater) and time were defined as model parameters. A full 2^5 experimental design consisted of 32 random experiments. PPL reactor test experiments were carried out in order to set boundary levels for hydraulic, volumetric and defined process parameters. TTIP-based thin films with polyethylene glycol+TiO2 additives were fabricated according to a pre-described methodology. Antibiotic degradation was monitored by High Performance Liquid Chromatography analysis, while the degradation products were specified by LC-TOF-MS analysis. Acute toxicity of untreated and treated SMX solutions was tested by the standard Daphnia magna method. Based on the obtained mathematical model, the response of the immobilized PC system is described with a polynomial equation. The statistically significant positive effects are initial SMX concentration, process time and the combined effect of both, while the combined effect of water matrix and irradiated surface area displays an adverse effect on the rate of antibiotic degradation by photocatalytic oxidation. Process efficiency and the validity of the acquired mathematical model were also verified for the antibiotics levofloxacin and cefaclor. Immobilized PC degradation in the PPL reactor configuration was found capable of providing reduced effluent toxicity by simultaneous degradation of the SMX parent compound and TBPs. Copyright © 2017. Published by Elsevier B.V.

  5. Optimization of the parameter calculation the process of production historic by using Parallel Virtual Machine-PVM; Otimizacao do calculo de parametros no processo de ajuste de historicos de producao usando PVM

    Energy Technology Data Exchange (ETDEWEB)

    Vargas Cuervo, Carlos Hernan

    1997-03-01

    The main objective of this work is to develop a methodology to optimize the simultaneous computation of two parameters in the process of production history matching. This work describes a procedure to minimize an objective function established to find the values of the parameters which are modified in the process. The parameters are chosen after a sensitivity analysis. Two optimization methods are tested: a Region Search Method (MBR) and the Polytope Method. Both are based on direct search methods, which do not require the function derivative. The software PVM (Parallel Virtual Machine) is used to parallelize the simulation runs, allowing the acceleration of the process and the search for multiple solutions. The methodology is validated on two reservoir models: one homogeneous and the other heterogeneous. The advantages of each method and of the parallelization are also presented. (author)
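
    The idea of running several simulations at once and keeping the best parameter pair can be sketched as follows. This is only an illustration under stated assumptions: Python's standard thread pool stands in for PVM, the linear "simulator", the observed history and the candidate list are all hypothetical, and neither the Region Search nor the Polytope method is implemented here.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical setup: a toy linear "simulator" and an observed history
# generated from known parameters, so the best candidate is known in advance.
TIMES = [0.0, 1.0, 2.0, 3.0]
TRUE_PARAMS = (2.0, 0.5)


def simulate(params):
    """Toy stand-in for a reservoir simulation run with two parameters."""
    a, b = params
    return [a * t + b for t in TIMES]


OBSERVED = simulate(TRUE_PARAMS)


def objective(params):
    """Sum of squared mismatches between simulated and observed history."""
    return sum((s - o) ** 2 for s, o in zip(simulate(params), OBSERVED))


def best_of(candidates, workers=4):
    """Evaluate all candidate parameter pairs concurrently; return the best."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(objective, candidates))
    best = min(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]


candidates = [(1.0, 0.0), (2.0, 0.5), (3.0, 1.0)]
best_params, best_score = best_of(candidates)
```

    Derivative-free methods fit this pattern because each trial point needs only an objective value, so trial points can be evaluated independently. With a real simulator, the workers would be separate processes or machines rather than threads.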

  6. A further extension of the Extended Parallel Process Model (E-EPPM): implications of cognitive appraisal theory of emotion and dispositional coping style.

    Science.gov (United States)

    So, Jiyeon

    2013-01-01

    For two decades, the extended parallel process model (EPPM; Witte, 1992) has been one of the most widely used theoretical frameworks in health risk communication. The model has gained much popularity because it recognizes that, ironically, preceding fear appeal models do not incorporate the concept of fear as a legitimate and central part of them. As a remedy to this situation, the EPPM aims at "putting the fear back into fear appeals" (Witte, 1992, p. 330). Despite this attempt, however, this article argues that the EPPM still does not fully capture the essence of fear as an emotion. Specifically, drawing upon Lazarus's (1991) cognitive appraisal theory of emotion and the concept of dispositional coping style (Miller, 1995), this article seeks to further extend the EPPM. The revised EPPM incorporates a more comprehensive perspective on risk perceptions as a construct involving both cognitive and affective aspects (i.e., fear and anxiety) and integrates the concept of monitoring and blunting coping style as a moderator of further information seeking regarding a given risk topic.

  7. The Moderated Mediating Effect of Self-Efficacy on Exercise Among Older Adults in an Online Bone Health Intervention Study: A Parallel Process Latent Growth Curve Model.

    Science.gov (United States)

    Zhu, Shijun; Nahm, Eun-Shim; Resnick, Barbara; Friedmann, Erika; Brown, Clayton; Park, Jumin; Cheon, Jooyoung; Park, DoHwan

    2017-07-01

    This secondary data analysis of a longitudinal study assessed whether self-efficacy for exercise (SEE) mediated online intervention effects on exercise among older adults and whether age (50-64 vs. ≥65 years) moderated the mediation. Data were from an online bone health intervention study. Eight hundred sixty-six older adults (≥50 years) were randomized to three arms: Bone Power (n = 301), Bone Power Plus (n = 302), or Control (n = 263). Parallel process latent growth curve modeling (LGCM) was used to jointly model growth in SEE and in exercise and to assess the mediating effect of SEE on the effect of intervention on exercise. SEE was a significant mediator in 50- to 64-year-old adults (0.061, 95% BCI: 0.011, 0.163) but not in the ≥65 age group (-0.004, 95% BCI: -0.047, 0.025). Promotion of SEE is critical to improve exercise among 50- to 64-year-olds.

  8. A fast and efficient adaptive parallel ray tracing based model for thermally coupled surface radiation in casting and heat treatment processes

    International Nuclear Information System (INIS)

    Fainberg, J; Schaefer, W

    2015-01-01

    A new algorithm for heat exchange between thermally coupled diffusely radiating interfaces is presented, which can be applied to closed and half-open transparent radiating cavities. Interfaces between opaque and transparent materials are automatically detected and subdivided into elementary radiation surfaces named tiles. Contrary to the classical view factor method, the fixed unit sphere area subdivision oriented along the normal tile direction is projected onto the surrounding radiation mesh and not vice versa. Then, the total incident radiating flux of the receiver is approximated as a direct sum of radiation intensities of representative “senders” with the same weight factor. A hierarchical scheme for the space angle subdivision is selected in order to minimize the total memory and the computational demands during thermal calculations. Direct visibility is tested by means of a voxel-based ray tracing method accelerated by means of the anisotropic Chebyshev distance method, which reuses the computational grid as a Chebyshev one. The ray tracing algorithm is fully parallelized using MPI and takes advantage of the balanced distribution of all available tiles among all CPUs. This approach allows tracing of each particular ray without any communication. The algorithm has been implemented in a commercial casting process simulation software. The accuracy and computational performance of the new radiation model for heat treatment, investment and ingot casting applications are illustrated using industrial examples. (paper)
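
    The role of a Chebyshev distance in voxel ray tracing can be illustrated with a much simplified sketch. The code below builds an isotropic, two-dimensional Chebyshev distance map to the nearest opaque cell. The anisotropic variant and the three-dimensional grid of the paper are not reproduced, and the function name and the test grid are hypothetical.

```python
from collections import deque


def chebyshev_distance_map(opaque):
    """Return, for every cell, the Chebyshev distance to the nearest opaque cell.

    opaque is a rectangular list of lists of booleans. A multi-source
    breadth-first search over the 8 neighbours of each cell yields the
    Chebyshev (L-infinity) distance, because one step may change both
    coordinates by at most one.
    """
    rows, cols = len(opaque), len(opaque[0])
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            if opaque[r][c]:
                dist[r][c] = 0
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                    dist[nr][nc] = dist[r][c] + 1
                    queue.append((nr, nc))
    return dist


# Hypothetical 4x4 grid with a single opaque cell in the top-left corner.
grid = [[False] * 4 for _ in range(4)]
grid[0][0] = True
distances = chebyshev_distance_map(grid)
```

    A ray standing in a cell with value k has no opaque cell closer than k cells in the Chebyshev sense, so a tracer can skip ahead through empty space instead of testing every cell along the ray.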

  9. Encouraging early preventive dental visits for preschool-aged children enrolled in Medicaid: using the extended parallel process model to conduct formative research.

    Science.gov (United States)

    Askelson, Natoshia M; Chi, Donald L; Momany, Elizabeth; Kuthy, Raymond; Ortiz, Cristina; Hanson, Jessica D; Damiano, Peter

    2014-01-01

    Preventive dental visits for preschool-aged children can result in better oral health outcomes, especially for children from lower income families. Many children, however, still do not see a dentist for preventive visits. This qualitative study examined the potential for the Extended Parallel Process Model (EPPM) to be used to uncover potential antecedents to parents' decisions about seeking preventive dental care. Seventeen focus groups including 41 parents were conducted. The focus group protocol centered on constructs (perceived severity, perceived susceptibility, perceived self-efficacy, and perceived response efficacy) of the EPPM. Transcripts were analyzed by three coders who employed closed coding strategies. Parents' perceptions of severity of dental issues were high, particularly regarding negative health and appearance outcomes. Parents perceived susceptibility of their children to dental problems as low, primarily because most children in this study received preventive care, which parents viewed as highly efficacious. Parents' self-efficacy to obtain preventive care for their children was high. However, they were concerned about barriers including lack of dentists, especially dentists who are good with young children. Findings were consistent with EPPM, which suggests this model is a potential tool for understanding parents' decisions about seeking preventive dental care for their young children. Future research should utilize quantitative methods to test this model. © 2012 American Association of Public Health Dentistry.

  10. Parallel Programming with Intel Parallel Studio XE

    CERN Document Server

    Blair-Chappell, Stephen

    2012-01-01

    Optimize code for multi-core processors with Intel's Parallel Studio. Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write programs for multicore and leverage the power of multicore in your programs. Sharing hands-on case studies and real-world examples, the

  11. Evaluating parallel optimization on transputers

    Directory of Open Access Journals (Sweden)

    A.G. Chalmers

    2003-12-01

    Full Text Available The faster processing power of modern computers and the development of efficient algorithms have made it possible for operations researchers to tackle a much wider range of problems than ever before. Further improvements in processing speed can be achieved utilising relatively inexpensive transputers to process components of an algorithm in parallel. The Davidon-Fletcher-Powell method is one of the most successful and widely used optimisation algorithms for unconstrained problems. This paper examines the algorithm and identifies the components that can be processed in parallel. The results of some experiments with these components are presented, which indicate under what conditions parallel processing with an inexpensive configuration is likely to be faster than the traditional sequential implementations. The performance of the whole algorithm with its parallel components is then compared with the original sequential algorithm. The implementation serves to illustrate the practicalities of speeding up typical OR algorithms in terms of difficulty, effort and cost. The results give an indication of the savings in time a given parallel implementation can be expected to yield.
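
    For orientation, a minimal serial sketch of the Davidon-Fletcher-Powell iteration is given below. It assumes a quadratic objective 0.5 x^T A x - b^T x with a symmetric positive definite matrix A, so that the line search can be done exactly. The helper names and the test problem are hypothetical, and the decomposition into parallel components examined in the paper is not reproduced.

```python
def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dfp_quadratic(A, b, x, iterations=10):
    """Minimise 0.5*x^T A x - b^T x with DFP updates and exact line search."""
    n = len(x)
    # H approximates the inverse Hessian; start from the identity.
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g = [ax - bi for ax, bi in zip(matvec(A, x), b)]  # gradient A x - b
    for _ in range(iterations):
        if dot(g, g) < 1e-24:  # gradient is numerically zero: converged
            break
        d = [-h for h in matvec(H, g)]                 # search direction
        alpha = -dot(g, d) / dot(d, matvec(A, d))      # exact step length
        s = [alpha * di for di in d]
        x = [xi + si for xi, si in zip(x, s)]
        g_new = [ax - bi for ax, bi in zip(matvec(A, x), b)]
        y = [gn - go for gn, go in zip(g_new, g)]
        Hy = matvec(H, y)
        sy, yHy = dot(s, y), dot(y, Hy)
        # DFP update: H + s s^T / (s^T y) - (H y)(H y)^T / (y^T H y)
        H = [[H[i][j] + s[i] * s[j] / sy - Hy[i] * Hy[j] / yHy
              for j in range(n)] for i in range(n)]
        g = g_new
    return x


# Hypothetical 2x2 test problem; the exact minimiser is (1/11, 7/11).
solution = dfp_quadratic([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], [0.0, 0.0])
```

    The matrix-vector products and the element-wise update of H consist of independent row and entry computations, which is one plausible place to look for parallel work in an iteration of this kind.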

  12. Overview of the Force Scientific Parallel Language

    Directory of Open Access Journals (Sweden)

    Gita Alaghband

    1994-01-01

    Full Text Available The Force parallel programming language designed for large-scale shared-memory multiprocessors is presented. The language provides a number of parallel constructs as extensions to the ordinary Fortran language and is implemented as a two-level macro preprocessor to support portability across shared memory multiprocessors. The global parallelism model on which the Force is based provides a powerful parallel language. The parallel constructs, generic synchronization, and freedom from process management supported by the Force have resulted in structured parallel programs that are ported to the many multiprocessors on which the Force is implemented. Two new parallel constructs for looping and functional decomposition are discussed. Several programming examples to illustrate some parallel programming approaches using the Force are also presented.

  13. The STAPL Parallel Graph Library

    KAUST Repository

    Harshvardhan,

    2013-01-01

    This paper describes the STAPL Parallel Graph Library, a high-level framework that abstracts the user from data-distribution and parallelism details and allows them to concentrate on parallel graph algorithm development. It includes a customizable distributed graph container and a collection of commonly used parallel graph algorithms. The library introduces pGraph pViews that separate algorithm design from the container implementation. It supports three graph processing algorithmic paradigms, level-synchronous, asynchronous and coarse-grained, and provides common graph algorithms based on them. Experimental results demonstrate improved scalability in performance and data size over existing graph libraries on more than 16,000 cores and on internet-scale graphs containing over 16 billion vertices and 250 billion edges. © Springer-Verlag Berlin Heidelberg 2013.

  14. Longitudinal associations between sleep and anxiety during pregnancy, and the moderating effect of resilience, using parallel process latent growth curve models.

    Science.gov (United States)

    van der Zwan, Judith Esi; de Vente, Wieke; Tolvanen, Mimmi; Karlsson, Hasse; Buil, J Marieke; Koot, Hans M; Paavonen, E Juulia; Polo-Kantola, Päivi; Huizink, Anja C; Karlsson, Linnea

    2017-12-01

    For many women, pregnancy-related sleep disturbances and pregnancy-related anxiety change as pregnancy progresses and both are associated with lower maternal quality of life and less favorable birth outcomes. Thus, the interplay between these two problems across pregnancy is of interest. In addition, psychological resilience may explain individual differences in this association, as it may promote coping with both sleep disturbances and anxiety, and thereby reduce their mutual effects. Therefore, the aim of the current study was to examine whether sleep quality and sleep duration, and changes in sleep are associated with the level of and changes in anxiety during pregnancy. Furthermore, the study tested the moderating effect of resilience on these associations. At gestational weeks 14, 24, and 34, 532 pregnant women from the FinnBrain Birth Cohort Study in Finland filled out questionnaires on general sleep quality, sleep duration and pregnancy-related anxiety; resilience was assessed in week 14. Parallel process latent growth curve models showed that shorter initial sleep duration predicted a higher initial level of anxiety, and a higher initial anxiety level predicted a faster shortening of sleep duration. Changes in sleep duration and changes in anxiety over the course of pregnancy were not related. The predicted moderating effect of resilience was not found. The results suggested that pregnant women reporting anxiety problems should also be screened for sleeping problems, and vice versa, because women who experienced one of these pregnancy-related problems were also at risk of experiencing or developing the other problem. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. Self-critical perfectionism, dependency, and symptomatic distress in patients with personality disorder during hospitalization-based psychodynamic treatment: A parallel process growth modeling approach.

    Science.gov (United States)

    Lowyck, Benedicte; Luyten, Patrick; Vermote, Rudi; Verhaest, Yannic; Vansteelandt, Kristof

    2017-07-01

    There is growing evidence for the efficacy and effectiveness of psychotherapy in patients with personality disorder (PD), but very little is known about the factors underlying these effects. Two-polarities models of personality development provide an empirically supported approach to studying therapeutic change. Briefly, these models argue that personality pathology is characterized by an imbalance between development of the capacity for self-definition and for relatedness, with an exaggerated emphasis on issues regarding self-definition and relatedness being expressed in high levels of self-critical perfectionism (SCP) and dependency, respectively. This study used data from a study of 111 patients with PD who received long-term hospitalization-based psychodynamic treatment to investigate whether (a) treatment was related to changes in SCP, dependency, and symptomatic distress; (b) these changes could be explained by pretreatment levels of SCP, dependency, and/or symptomatic distress; and (c) changes in these personality dimensions over time were associated with symptomatic improvement. SCP, dependency, and symptomatic distress were assessed at admission (baseline), at 12 and 24 weeks into treatment, and at discharge. Parallel process multilevel growth modeling showed that (a) treatment was associated with a significant decrease in levels of SCP, dependency, and symptomatic distress, whereas (b) pretreatment levels of each of these three factors did not predict the decreases observed, and (c) changes in SCP, but not dependency, were associated with the rate of decrease in symptomatic distress over time. Implications of these findings for our understanding of therapeutic change in the treatment of PD are discussed. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  16. Study on convective mixing for thermal striping phenomena. Thermal-hydraulic analyses on mixing process in parallel triple-jet and comparisons between numerical methods

    International Nuclear Information System (INIS)

    Kimura, Nobuyuki; Nishimura, Motohiko; Kamide, Hideki

    2000-03-01

    A quantitative evaluation of thermal striping, in which temperature fluctuation due to convective mixing among jets imposes thermal fatigue on structural components, is of importance for reactor safety. In the present study, a water experiment was performed on a parallel triple-jet: a cold jet at the center and hot jets on both sides. Three kinds of numerical analyses based on the finite difference method were carried out to compare their agreement with the experiment, each with a different treatment of turbulence: a k-ε two-equation turbulence model (k-ε Model), a low Reynolds number stress and heat flux equation model (LRSFM) and a direct numerical simulation (DNS). In the experiment, the jets were mainly mixed due to the coherent oscillation. The numerical result using the k-ε Model could not reproduce the coherent oscillating motion of jets due to rolling-up fluid. The oscillations of the jets predicted by LRSFM and DNS were in good agreement with the experiment. The comparison between the coherent and random components in the experimental temperature fluctuation, obtained by phase-averaging, shows that the k-ε Model and LRSFM overestimated the random component and the coherent component, respectively. The ratios of coherent to random components in total temperature fluctuation obtained from DNS were in good agreement with the experiment. The numerical analysis using DNS can reproduce the coherent oscillation of the jets and the coherent / random components in temperature fluctuation. The analysis using LRSFM could simulate the mixing process of the jets with the low frequency. (author)

  17. A Parallel Butterfly Algorithm

    KAUST Repository

    Poulson, Jack; Demanet, Laurent; Maxwell, Nicholas; Ying, Lexing

    2014-01-01

    The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform (Equation Presented.) at large numbers of target points when the kernel, K(x, y), is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In d dimensions with O(N^d) quasi-uniformly distributed source and target points, when each appropriate submatrix of K is approximately rank-r, the running time of the algorithm is at most O(r^2 N^d log N). A parallelization of the butterfly algorithm is introduced which, assuming a message latency of α and per-process inverse bandwidth of β, executes in at most (Equation Presented.) time using p processes. This parallel algorithm was then instantiated in the form of the open-source DistButterfly library for the special case where K(x, y) = exp(iΦ(x, y)), where Φ(x, y) is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms and an analogue of a three-dimensional generalized Radon transform were observed to strong-scale from 1-node/16-cores up to 1024-nodes/16,384-cores with greater than 90% and 82% efficiency, respectively. © 2014 Society for Industrial and Applied Mathematics.

  18. A Parallel Butterfly Algorithm

    KAUST Repository

    Poulson, Jack

    2014-02-04

    The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform (Equation Presented.) at large numbers of target points when the kernel, K(x, y), is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In d dimensions with O(N^d) quasi-uniformly distributed source and target points, when each appropriate submatrix of K is approximately rank-r, the running time of the algorithm is at most O(r^2 N^d log N). A parallelization of the butterfly algorithm is introduced which, assuming a message latency of α and per-process inverse bandwidth of β, executes in at most (Equation Presented.) time using p processes. This parallel algorithm was then instantiated in the form of the open-source DistButterfly library for the special case where K(x, y) = exp(iΦ(x, y)), where Φ(x, y) is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms and an analogue of a three-dimensional generalized Radon transform were observed to strong-scale from 1-node/16-cores up to 1024-nodes/16,384-cores with greater than 90% and 82% efficiency, respectively. © 2014 Society for Industrial and Applied Mathematics.

  19. Fast parallel event reconstruction

    CERN Multimedia

    CERN. Geneva

    2010-01-01

    On-line processing of large data volumes produced in modern HEP experiments requires using the maximum capabilities of modern and future many-core CPU and GPU architectures. One such powerful feature is a SIMD instruction set, which allows packing several data items in one register and operating on all of them, thus achieving more operations per clock cycle. Motivated by the idea of using the SIMD unit of modern processors, the KF based track fit has been adapted for parallelism, including memory optimization, numerical analysis, vectorization with inline operator overloading, and optimization using SDKs. The speed of the algorithm has been increased 120,000-fold, reaching 0.1 ms/track when running in parallel on 16 SPEs of a Cell Blade computer. Running on a Nehalem CPU with 8 cores it shows the processing speed of 52 ns/track using the Intel Threading Building Blocks. The same KF algorithm running on an Nvidia GTX 280 in the CUDA framework provi...

  20. Parallel sorting algorithms

    CERN Document Server

    Akl, Selim G

    1985-01-01

    Parallel Sorting Algorithms explains how to use parallel algorithms to sort a sequence of items on a variety of parallel computers. The book reviews the sorting problem, the parallel models of computation, parallel algorithms, and the lower bounds on the parallel sorting problems. The text also presents twenty different algorithms for architectures such as linear arrays, mesh-connected computers, and cube-connected computers. Another example where the algorithms can be applied is the shared-memory SIMD (single instruction stream multiple data stream) computers in which the whole sequence to be sorted can fit in the

  1. Parallel algorithms for continuum dynamics

    International Nuclear Information System (INIS)

    Hicks, D.L.; Liebrock, L.M.

    1987-01-01

    Simply porting existing parallel programs to a new parallel processor may not achieve the full speedup possible; to achieve the maximum efficiency may require redesigning the parallel algorithms for the specific architecture. The authors discuss here parallel algorithms that were developed first for the HEP processor and then ported to the CRAY X-MP/4, the ELXSI/10, and the Intel iPSC/32. Focus is mainly on the most recent parallel processing results produced, i.e., those on the Intel Hypercube. The applications are simulations of continuum dynamics in which the momentum and stress gradients are important. Examples of these are inertial confinement fusion experiments, severe breaks in the coolant system of a reactor, weapons physics, shock-wave physics. Speedup efficiencies on the Intel iPSC Hypercube are very sensitive to the ratio of communication to computation. Great care must be taken in designing algorithms for this machine to avoid global communication. This is much more critical on the iPSC than it was on the three previous parallel processors

  2. Parallel computing works!

    CERN Document Server

    Fox, Geoffrey C; Messina, Guiseppe C

    2014-01-01

    A clear illustration of how parallel computers can be successfully applied to large-scale scientific computations. This book demonstrates how a variety of applications in physics, biology, mathematics and other sciences were implemented on real parallel computers to produce new scientific results. It investigates issues of fine-grained parallelism relevant for future supercomputers with particular emphasis on hypercube architecture. The authors describe how they used an experimental approach to configure different massively parallel machines, design and implement basic system software, and develop

  3. Parallel Atomistic Simulations

    Energy Technology Data Exchange (ETDEWEB)

    HEFFELFINGER,GRANT S.

    2000-01-18

    Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed, the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories, those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains are discussed.
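
    As a rough illustration of the replicated data decomposition named above, the following minimal sketch gives every worker the full position list but lets it compute forces only for its own slice of atoms. The function names and the one-dimensional inverse-square pair force are hypothetical placeholders chosen for brevity, not taken from the reviewed report, and the workers are simulated by an ordinary loop.

```python
# Toy replicated-data decomposition: every worker sees all positions but
# computes forces only for its own contiguous slice of atoms.
# The 1D inverse-square pair force is a placeholder, not a real potential.

def pair_force(xi, xj):
    """Force on atom i due to atom j for a toy repulsive 1/r^2 interaction."""
    r = xi - xj
    return r / abs(r) ** 3


def forces_for_slice(positions, start, stop):
    """Forces on atoms start..stop-1, using the full (replicated) position list."""
    return [
        sum(pair_force(positions[i], xj)
            for j, xj in enumerate(positions) if j != i)
        for i in range(start, stop)
    ]


def replicated_data_forces(positions, n_workers):
    """Split the atoms into n_workers slices; each slice is independent work."""
    n = len(positions)
    bounds = [n * w // n_workers for w in range(n_workers + 1)]
    forces = []
    for w in range(n_workers):  # each iteration could run on its own process
        forces.extend(forces_for_slice(positions, bounds[w], bounds[w + 1]))
    return forces  # stands in for the global gather of per-worker results
```

    In a real code the final concatenation would be a collective communication step, which is the main cost of this decomposition; spatial and force decompositions reduce it by limiting which positions each worker needs.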

  4. Computer-Aided Parallelizer and Optimizer

    Science.gov (United States)

    Jin, Haoqiang

    2011-01-01

    The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.

  5. Cellular automata a parallel model

    CERN Document Server

    Mazoyer, J

    1999-01-01

    Cellular automata can be viewed both as computational models and modelling systems of real processes. This volume emphasises the first aspect. In articles written by leading researchers, sophisticated massive parallel algorithms (firing squad, life, Fischer's primes recognition) are treated. Their computational power and the specific complexity classes they determine are surveyed, while some recent results in relation to chaos from a new dynamic systems point of view are also presented. Audience: This book will be of interest to specialists of theoretical computer science and the parallelism challenge.

  6. Synchronization Techniques in Parallel Discrete Event Simulation

    OpenAIRE

    Lindén, Jonatan

    2018-01-01

    Discrete event simulation is an important tool for evaluating system models in many fields of science and engineering. To improve the performance of large-scale discrete event simulations, several techniques to parallelize discrete event simulation have been developed. In parallel discrete event simulation, the work of a single discrete event simulation is distributed over multiple processing elements. A key challenge in parallel discrete event simulation is to ensure that causally dependent ...

  7. Parallelization in Modern C++

    CERN Multimedia

    CERN. Geneva

    2016-01-01

    The traditionally used and well established parallel programming models OpenMP and MPI are both targeting lower level parallelism and are meant to be as language agnostic as possible. For a long time, those models were the only widely available portable options for developing parallel C++ applications beyond using plain threads. This has strongly limited the optimization capabilities of compilers, has inhibited extensibility and genericity, and has restricted the use of those models together with other, modern higher level abstractions introduced by the C++11 and C++14 standards. The recent revival of interest in the industry and wider community for the C++ language has also spurred a remarkable amount of standardization proposals and technical specifications being developed. Those efforts however have so far failed to build a vision on how to seamlessly integrate various types of parallelism, such as iterative parallel execution, task-based parallelism, asynchronous many-task execution flows, continuation s...

  8. Parallelism in matrix computations

    CERN Document Server

    Gallopoulos, Efstratios; Sameh, Ahmed H

    2016-01-01

    This book is primarily intended as a research monograph that could also be used in graduate courses for the design of parallel algorithms in matrix computations. It assumes general but not extensive knowledge of numerical linear algebra, parallel architectures, and parallel programming paradigms. The book consists of four parts: (I) Basics; (II) Dense and Special Matrix Computations; (III) Sparse Matrix Computations; and (IV) Matrix functions and characteristics. Part I deals with parallel programming paradigms and fundamental kernels, including reordering schemes for sparse matrices. Part II is devoted to dense matrix computations such as parallel algorithms for solving linear systems, linear least squares, the symmetric algebraic eigenvalue problem, and the singular-value decomposition. It also deals with the development of parallel algorithms for special linear systems such as banded, Vandermonde, Toeplitz, and block Toeplitz systems. Part III addresses sparse matrix computations: (a) the development of pa...

  9. Programming massively parallel processors a hands-on approach

    CERN Document Server

    Kirk, David B

    2010-01-01

    Programming Massively Parallel Processors discusses basic concepts about parallel programming and GPU architecture. "Massively parallel" refers to the use of a large number of processors to perform a set of computations in a coordinated parallel way. The book details various techniques for constructing parallel programs. It also discusses the development process, performance level, floating-point format, parallel patterns, and dynamic parallelism. The book serves as a teaching guide where parallel programming is the main topic of the course. It builds on the basics of C programming for CUDA, a parallel programming environment that is supported on NVIDIA GPUs. Composed of 12 chapters, the book begins with basic information about the GPU as a parallel computer source. It also explains the main concepts of CUDA, data parallelism, and the importance of memory access efficiency using CUDA. The target audience of the book is graduate and undergraduate students from all science and engineering disciplines who ...

  10. Parallel imaging microfluidic cytometer.

    Science.gov (United States)

    Ehrlich, Daniel J; McKenna, Brian K; Evans, James G; Belkina, Anna C; Denis, Gerald V; Sherr, David H; Cheung, Man Ching

    2011-01-01

    By adding an additional degree of freedom from multichannel flow, the parallel microfluidic cytometer (PMC) combines some of the best features of fluorescence-activated flow cytometry (FCM) and microscope-based high-content screening (HCS). The PMC (i) lends itself to fast processing of large numbers of samples, (ii) adds a 1D imaging capability for intracellular localization assays (HCS), (iii) has a high rare-cell sensitivity, and (iv) has an unusual capability for time-synchronized sampling. An inability to practically handle large sample numbers has restricted applications of conventional flow cytometers and microscopes in combinatorial cell assays, network biology, and drug discovery. The PMC promises to relieve a bottleneck in these previously constrained applications. The PMC may also be a powerful tool for finding rare primary cells in the clinic. The multichannel architecture of current PMC prototypes allows 384 unique samples for a cell-based screen to be read out in ∼6-10 min, about 30 times the speed of most current FCM systems. In 1D intracellular imaging, the PMC can obtain protein localization using HCS marker strategies at many times the sample throughput of charge-coupled device (CCD)-based microscopes or CCD-based single-channel flow cytometers. The PMC also permits the signal integration time to be varied over a larger range than is practical in conventional flow cytometers. The signal-to-noise advantages are useful, for example, in counting rare positive cells in the most difficult early stages of genome-wide screening. We review the status of parallel microfluidic cytometry and discuss some of the directions the new technology may take. Copyright © 2011 Elsevier Inc. All rights reserved.

  11. Parallel algorithms for mapping pipelined and parallel computations

    Science.gov (United States)

    Nicol, David M.

    1988-01-01

    Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm^3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm^2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.
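
    To make the mapping problem concrete, the sketch below treats one simple variant: a chain of m modules with known integer costs must be cut into at most n contiguous blocks, one per processor, so that the most heavily loaded processor is as light as possible. This is a textbook binary search with a greedy feasibility probe, assumed here purely for illustration; it is not the O(nm log m) algorithm of the paper, and the function names are hypothetical.

```python
def processors_needed(weights, cap):
    """Greedy probe: fewest contiguous groups whose sums are all <= cap."""
    groups, load = 1, 0
    for w in weights:
        if w > cap:
            return float("inf")  # a single module already exceeds the cap
        if load + w > cap:
            groups, load = groups + 1, w  # start a new processor with this module
        else:
            load += w
    return groups


def min_bottleneck(weights, n_processors):
    """Smallest achievable maximum load when a chain of modules with
    non-negative integer weights is cut into at most n_processors
    contiguous blocks."""
    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        if processors_needed(weights, mid) <= n_processors:
            hi = mid        # cap is feasible, try a smaller one
        else:
            lo = mid + 1    # cap is too tight
    return lo
```

    For example, `min_bottleneck([4, 2, 7, 3, 5, 1], 3)` returns 9, corresponding to the blocks [4, 2], [7] and [3, 5, 1]. The probe is linear in m, so the whole search costs O(m log W), where W is the total weight.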

  12. Parallel MR imaging.

    Science.gov (United States)

    Deshmane, Anagha; Gulani, Vikas; Griswold, Mark A; Seiberlich, Nicole

    2012-07-01

    Parallel imaging is a robust method for accelerating the acquisition of magnetic resonance imaging (MRI) data, and has made possible many new applications of MR imaging. Parallel imaging works by acquiring a reduced amount of k-space data with an array of receiver coils. These undersampled data can be acquired more quickly, but the undersampling leads to aliased images. One of several parallel imaging algorithms can then be used to reconstruct artifact-free images from either the aliased images (SENSE-type reconstruction) or from the undersampled data (GRAPPA-type reconstruction). The advantages of parallel imaging in a clinical setting include faster image acquisition, which can be used, for instance, to shorten breath-hold times resulting in fewer motion-corrupted examinations. In this article the basic concepts behind parallel imaging are introduced. The relationship between undersampling and aliasing is discussed and two commonly used parallel imaging methods, SENSE and GRAPPA, are explained in detail. Examples of artifacts arising from parallel imaging are shown and ways to detect and mitigate these artifacts are described. Finally, several current applications of parallel imaging are presented and recent advancements and promising research in parallel imaging are briefly reviewed. Copyright © 2012 Wiley Periodicals, Inc.
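
    The relationship between undersampling and aliasing mentioned above can be checked numerically in one dimension. The sketch below is a toy model only: it uses a naive discrete Fourier transform, a made-up eight-pixel row, and no coil sensitivities, so it shows how the fold-over arises but not how SENSE or GRAPPA remove it. All names and values are hypothetical.

```python
import cmath


def dft(x):
    """Naive forward DFT (O(N^2)); adequate for a small illustration."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse DFT matching dft() above."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]


def undersample_and_reconstruct(image_row, factor=2):
    """Keep every `factor`-th k-space sample, then inverse-transform the
    reduced data set. The result has len(image_row) // factor pixels."""
    kspace = dft(image_row)
    return idft(kspace[::factor])


row = [0, 0, 1, 3, 1, 0, 0, 0]  # toy 1D "object" (hypothetical values)
aliased = undersample_and_reconstruct(row, factor=2)
# Each pixel of the reduced field of view is the sum of two pixels that lie
# half a field of view apart in the original row.
folded = [row[m] + row[m + 4] for m in range(4)]
```

    Up to rounding error, `aliased` equals `folded`. A parallel imaging reconstruction has to separate the superimposed pixels again, which is where the differing spatial sensitivities of the receiver coils come in.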

  13. Parallel Algorithms and Patterns

    Energy Technology Data Exchange (ETDEWEB)

    Robey, Robert W. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2016-06-16

    This is a powerpoint presentation on parallel algorithms and patterns. A parallel algorithm is a well-defined, step-by-step computational procedure that emphasizes concurrency to solve a problem. Examples of problems include: Sorting, searching, optimization, matrix operations. A parallel pattern is a computational step in a sequence of independent, potentially concurrent operations that occurs in diverse scenarios with some frequency. Examples are: Reductions, prefix scans, ghost cell updates. We only touch on parallel patterns in this presentation. It really deserves its own detailed discussion which Gabe Rockefeller would like to develop.
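
    Two of the patterns listed above, reductions and prefix scans, can be sketched in a few lines. The code below is a sequential illustration, assumed for this listing rather than taken from the presentation: it runs on one core, but each round is written so that all of its operations are independent and could be executed concurrently. The operator is assumed to be associative.

```python
import operator


def tree_reduce(values, op=operator.add):
    """Pairwise (tree) reduction: each round halves the number of partial
    results, and the combinations within one round are independent."""
    items = list(values)
    if not items:
        raise ValueError("cannot reduce an empty sequence")
    while len(items) > 1:
        paired = [op(items[i], items[i + 1])
                  for i in range(0, len(items) - 1, 2)]
        if len(items) % 2:  # an odd element carries over to the next round
            paired.append(items[-1])
        items = paired
    return items[0]


def inclusive_scan(values, op=operator.add):
    """Hillis-Steele inclusive prefix scan: about log2(n) rounds with a
    doubling stride."""
    out = list(values)
    stride = 1
    while stride < len(out):
        # Read only the previous round's list, so all updates are independent.
        out = [out[i] if i < stride else op(out[i - stride], out[i])
               for i in range(len(out))]
        stride *= 2
    return out
```

    For instance, `inclusive_scan([3, 1, 4, 1, 5])` gives `[3, 4, 8, 9, 14]`. The scan does more total work than a sequential running sum, which is the usual trade of extra operations for a shorter critical path.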

  14. Application Portable Parallel Library

    Science.gov (United States)

    Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott

    1995-01-01

    Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also include heterogeneous collection of networked computers). Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.

  15. High performance parallel I/O

    CERN Document Server

    Prabhat

    2014-01-01

    Gain Critical Insight into the Parallel I/O Ecosystem. Parallel I/O is an integral component of modern high performance computing (HPC), especially in storing and processing very large datasets to facilitate scientific discovery. Revealing the state of the art in this field, High Performance Parallel I/O draws on insights from leading practitioners, researchers, software architects, developers, and scientists who shed light on the parallel I/O ecosystem. The first part of the book explains how large-scale HPC facilities scope, configure, and operate systems, with an emphasis on choices of I/O har

  16. Customizable Memory Schemes for Data Parallel Architectures

    NARCIS (Netherlands)

    Gou, C.

    2011-01-01

    Memory system efficiency is crucial for any processor to achieve high performance, especially in the case of data parallel machines. Processing capabilities of parallel lanes will be wasted, when data requests are not accomplished in a sustainable and timely manner. Irregular vector memory accesses

  17. Parallel fuzzy connected image segmentation on GPU

    OpenAIRE

    Zhuge, Ying; Cao, Yong; Udupa, Jayaram K.; Miller, Robert W.

    2011-01-01

    Purpose: Image segmentation techniques using fuzzy connectedness (FC) principles have shown their effectiveness in segmenting a variety of objects in several large applications. However, one challenge in these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays, commodity graphics hardware provides a highly parallel computing environment. In this paper, the authors present a parallel fuzzy connected image segmentation algorithm impleme...

  18. PSHED: a simplified approach to developing parallel programs

    International Nuclear Information System (INIS)

    Mahajan, S.M.; Ramesh, K.; Rajesh, K.; Somani, A.; Goel, M.

    1992-01-01

    This paper presents a simplified approach in the form of a tree-structured computational model for parallel application programs. An attempt is made to provide a standard user interface to execute programs on the BARC Parallel Processing System (BPPS), a scalable distributed memory multiprocessor. The interface package called PSHED provides a basic framework for representing and executing parallel programs on different parallel architectures. The PSHED package incorporates concepts from a broad range of previous research in programming environments and parallel computations. (author). 6 refs

  19. Parallel discrete event simulation

    NARCIS (Netherlands)

    Overeinder, B.J.; Hertzberger, L.O.; Sloot, P.M.A.; Withagen, W.J.

    1991-01-01

    In simulating applications for execution on specific computing systems, the simulation performance figures must be known in a short period of time. One basic approach to the problem of reducing the required simulation time is the exploitation of parallelism. However, in parallelizing the simulation

  20. Parallel reservoir simulator computations

    International Nuclear Information System (INIS)

    Hemanth-Kumar, K.; Young, L.C.

    1995-01-01

    The adaptation of a reservoir simulator for parallel computations is described. The simulator was originally designed for vector processors. It performs approximately 99% of its calculations in vector/parallel mode and relative to scalar calculations it achieves speedups of 65 and 81 for black oil and EOS simulations, respectively on the CRAY C-90
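
    One way to read the figures quoted above is through Amdahl's law, which is an assumption of this note and is not stated in the abstract. If a fraction f of the work runs s times faster and the rest is unchanged, the overall speedup is 1 / ((1 - f) + f / s). The sketch below evaluates this model and its inverse; the function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction, unit_speedup):
    """Overall speedup when `parallel_fraction` of the work runs
    `unit_speedup` times faster and the remainder is unchanged."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / unit_speedup)


def required_unit_speedup(parallel_fraction, overall_speedup):
    """Invert Amdahl's law: unit speedup needed to reach `overall_speedup`."""
    serial_fraction = 1.0 - parallel_fraction
    remaining = 1.0 / overall_speedup - serial_fraction
    if remaining <= 0:
        raise ValueError("overall speedup exceeds the Amdahl limit")
    return parallel_fraction / remaining


# Upper bound on the overall speedup for a 99% vector/parallel fraction.
limit = amdahl_speedup(0.99, float("inf"))
```

    Under this model a 99% fraction caps the overall speedup at about 100, so the reported speedups of 65 and 81 would be consistent with the stated fraction only if the vector/parallel portion itself ran far faster than scalar code. The "approximately 99%" in the abstract makes any more precise inference unsafe.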

  1. Totally parallel multilevel algorithms

    Science.gov (United States)

    Frederickson, Paul O.

    1988-01-01

    Four totally parallel algorithms for the solution of a sparse linear system have common characteristics which become quite apparent when they are implemented on a highly parallel hypercube such as the CM2. These four algorithms are Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, Robust Multigrid (RMG) of Hackbusch, the FFT based Spectral Algorithm, and Parallel Cyclic Reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, which is referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is recognized to be a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance.
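
    Of the four algorithms named above, Parallel Cyclic Reduction is the easiest to show in a few lines. The sketch below is the standard textbook form for a tridiagonal system, assumed here for illustration and not taken from the paper: at each level every equation eliminates its two neighbours at the current stride, so all equations can be updated at once, and after about log2(n) levels each unknown stands alone. No pivoting is done, so a well-conditioned (for example diagonally dominant) matrix is assumed.

```python
def parallel_cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system with sub-diagonal a, diagonal b,
    super-diagonal c and right-hand side d (a[0] and c[-1] must be 0).
    No pivoting: assumes, for example, a diagonally dominant matrix."""
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    stride = 1
    while stride < n:
        na, nb, nc, nd = a[:], b[:], c[:], d[:]
        for i in range(n):  # every i is independent within one level
            lo, hi = i - stride, i + stride
            nb[i], nd[i] = b[i], d[i]
            na[i] = nc[i] = 0.0
            if lo >= 0:  # eliminate the coupling to x[i - stride]
                alpha = -a[i] / b[lo]
                na[i] = alpha * a[lo]
                nb[i] += alpha * c[lo]
                nd[i] += alpha * d[lo]
            if hi < n:   # eliminate the coupling to x[i + stride]
                gamma = -c[i] / b[hi]
                nc[i] = gamma * c[hi]
                nb[i] += gamma * a[hi]
                nd[i] += gamma * d[hi]
        a, b, c, d = na, nb, nc, nd
        stride *= 2
    # All off-diagonal couplings are gone; each equation is b[i] * x[i] = d[i].
    return [d[i] / b[i] for i in range(n)]
```

    The inner loop over i is the part that would be spread across processors; only the outer loop over levels is inherently sequential.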

  2. Utilization of Mental Health Services and Mental Health Status Among Children Placed in Out-of-Home Care: A Parallel Process Latent Growth Modeling Approach.

    Science.gov (United States)

    Yampolskaya, Svetlana; Sharrock, Patty J; Clark, Colleen; Hanson, Ardis

    2017-10-01

    This longitudinal study examined the parallel trajectories of mental health service use and mental health status among children placed in Florida out-of-home care. The results of growth curve modeling suggested that children with greater mental health problems initially received more mental health services. Initial child mental health status, however, had no effect on subsequent service provision when all outpatient mental health services were included. When specific types of mental health services, such as basic outpatient, targeted case management, and intensive mental health services were examined, results suggested that children with compromised functioning during the baseline period received more intensive mental health services over time. However, this increased provision of intensive mental health services did not improve mental health status, rather it was significantly associated with progressively worse mental health functioning. These findings underscore the need for regular comprehensive mental health assessments focusing on specific needs of the child.

  3. Modern industrial simulation tools: Kernel-level integration of high performance parallel processing, object-oriented numerics, and adaptive finite element analysis. Final report, July 16, 1993--September 30, 1997

    Energy Technology Data Exchange (ETDEWEB)

    Deb, M.K.; Kennon, S.R.

    1998-04-01

    A cooperative R&D effort between industry and the US government, this project, under the HPPP (High Performance Parallel Processing) initiative of the Dept. of Energy, started the investigations into parallel object-oriented (OO) numerics. The basic goal was to research and utilize the emerging technologies to create a physics-independent computational kernel for applications using adaptive finite element method. The industrial team included Computational Mechanics Co., Inc. (COMCO) of Austin, TX (as the primary contractor), Scientific Computing Associates, Inc. (SCA) of New Haven, CT, Texaco and CONVEX. Sandia National Laboratory (Albq., NM) was the technology partner from the government side. COMCO had the responsibility of the main kernel design and development, SCA had the lead in parallel solver technology and guidance on OO technologies was Sandia's main expertise in this venture. CONVEX and Texaco supported the partnership by hardware resource and application knowledge, respectively. As such, a minimum of fifty-percent cost-sharing was provided by the industry partnership during this project. This report describes the R&D activities and provides some details about the prototype kernel and example applications.

  4. Operability probabilistic analysis: methodology for economic improvement through the parallelization of process plants; Analisis probabilistico de operatividad: metodologia para mejora economica a traves de la paralelizacion de plantas de proceso

    Energy Technology Data Exchange (ETDEWEB)

    Mendoza, A.; Francois, J. L.; Martin del Campo, C.; Nelson, P. F., E-mail: iqalexmdz@yahoo.com.mx [UNAM, Facultad de Ingenieria, Departamento de Sistemas Energeticos, Paseo Cuauhnahuac No. 8532, Col. Progreso, 62550 Jiutepec, Morelos (Mexico)

    2012-10-15

    One of the major challenges that emerging technologies must overcome is economic competitiveness with respect to the technologies established at the present time, since they must not only make efficient use of energy resources and raw materials in their production processes, but also maximize the use of the economic resources derived from the initial investment in the plant. In special cases, such as those related to electric power generation or fuels, the fixed cost represents a high percentage of the total cost, and a strong dependence on the plant factor is observed. This parameter is in turn susceptible to variations that are not planned but are predictable by means of analytic tools able to relate the failure rates of the elements present in the plant to the probability of downtime, such as Operability Probabilistic Analysis. This study evaluated the implications of changes in the plant configuration, in order to determine the economic advantages of a greater or lesser division of equipment in parallel (parallelization); the general objective function for evaluating the parallelization alternatives is established, and the basic concepts needed to carry out this methodology are presented. Finally, a case study is developed for the sulfuric acid decomposition section of a hydrogen production plant. (Author)

  5. Parallel Monte Carlo simulation of aerosol dynamics

    KAUST Repository

    Zhou, K.; He, Z.; Xiao, M.; Zhang, Z.

    2014-01-01

    is simulated with a stochastic method (Marcus-Lushnikov stochastic process). Operator splitting techniques are used to synthesize the deterministic and stochastic parts in the algorithm. The algorithm is parallelized using the Message Passing Interface (MPI

  6. Parallel programming practical aspects, models and current limitations

    CERN Document Server

    Tarkov, Mikhail S

    2014-01-01

    Parallel programming is designed for the use of parallel computer systems for solving time-consuming problems that cannot be solved on a sequential computer in a reasonable time. These problems can be divided into two classes: 1. Processing large data arrays (including processing images and signals in real time); 2. Simulation of complex physical processes and chemical reactions. For each of these classes, prospective methods are designed for solving problems. For data processing, one of the most promising technologies is the use of artificial neural networks. The particle-in-cell method and cellular automata are very useful for simulation. Problems of scalability of parallel algorithms and the transfer of existing parallel programs to future parallel computers are very acute now. An important task is to optimize the use of the equipment (including the CPU cache) of parallel computers. Along with parallelizing information processing, it is essential to ensure the processing reliability by the relevant organization ...

  7. Synchronization Of Parallel Discrete Event Simulations

    Science.gov (United States)

    Steinman, Jeffrey S.

    1992-01-01

    Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages. Algorithm processes events optimistically in time cycles adapting while simulation in progress. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.

  8. Novel techniques for data decomposition and load balancing for parallel processing of vision systems: Implementation and evaluation using a motion estimation system

    Science.gov (United States)

    Choudhary, Alok Nidhi; Leung, Mun K.; Huang, Thomas S.; Patel, Janak H.

    1989-01-01

    Computer vision systems employ a sequence of vision algorithms in which the output of an algorithm is the input of the next algorithm in the sequence. Algorithms that constitute such systems exhibit vastly different computational characteristics, and therefore, require different data decomposition techniques and efficient load balancing techniques for parallel implementation. However, since the input data for a task is produced as the output data of the previous task, this information can be exploited to perform knowledge based data decomposition and load balancing. Presented here are algorithms for a motion estimation system. The motion estimation is based on the point correspondence between the involved images which are a sequence of stereo image pairs. Researchers propose algorithms to obtain point correspondences by matching feature points among stereo image pairs at any two consecutive time instants. Furthermore, the proposed algorithms employ non-iterative procedures, which results in saving considerable amounts of computation time. The system consists of the following steps: (1) extraction of features; (2) stereo match of images in one time instant; (3) time match of images from consecutive time instants; (4) stereo match to compute final unambiguous points; and (5) computation of motion parameters.

  9. DUAL PARALLEL PROCESS IN CRISIS SITUATIONS: MOTIVATIONAL FOUNDATION/ EL DOBLE PROCESAMIENTO PARALELO EN SITUACIÓN DE CRISIS: FUNDAMENTACIÓN MOTIVACIONAL/ O DOBRO PROCESSAMENTO PARALELO EM SITUAÇÃO DE CRISE: FUNDAMENTAÇÃO MOTIVACIONAL

    Directory of Open Access Journals (Sweden)

    Carlos Gantiva

    2012-12-01

    Full Text Available The objective of this paper is to present a cognitive-behavioral model that makes it possible to explain the crisis situation (CS in terms of intense motivational involvement, and to propose a brief motivational intervention proposal in CS. The CS requires the person to implement coping strategies focused on the management of objective damage, as well as on the search for emotional relief, a consideration that gives rise to the name of dual parallel processing in CS (DPP-CS. Brief intervention is understood as the involvement of motivational processes to enable the person to make decisions regarding emotional and instrumental coping which move her in the direction of emotional relief or solution of the crisis. The paper concludes with a summary of the three basic sources taken from the psychological literature to inform the design of the DPP-CS: the dual extended parallel process model, the cognitive theory of stress and coping, and the formulation by levels in cognitive therapy.

  10. Parallel Architectures and Parallel Algorithms for Integrated Vision Systems. Ph.D. Thesis

    Science.gov (United States)

    Choudhary, Alok Nidhi

    1989-01-01

    Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to perform a high level application (e.g., object recognition). An IVS normally involves algorithms from low level, intermediate level, and high level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. Several issues are addressed in parallel architectures and parallel algorithms for integrated vision systems.

  11. Parallel magnetic resonance imaging

    International Nuclear Information System (INIS)

    Larkman, David J; Nunes, Rita G

    2007-01-01

    Parallel imaging has been the single biggest innovation in magnetic resonance imaging in the last decade. The use of multiple receiver coils to augment the time consuming Fourier encoding has reduced acquisition times significantly. This increase in speed comes at a time when other approaches to acquisition time reduction were reaching engineering and human limits. A brief summary of spatial encoding in MRI is followed by an introduction to the problem parallel imaging is designed to solve. There are a large number of parallel reconstruction algorithms; this article reviews a cross-section, SENSE, SMASH, g-SMASH and GRAPPA, selected to demonstrate the different approaches. Theoretical (the g-factor) and practical (coil design) limits to acquisition speed are reviewed. The practical implementation of parallel imaging is also discussed, in particular coil calibration. How to recognize potential failure modes and their associated artefacts is shown. Well-established applications including angiography, cardiac imaging and applications using echo planar imaging are reviewed and we discuss what makes a good application for parallel imaging. Finally, active research areas where parallel imaging is being used to improve data quality by repairing artefacted images are also reviewed. (invited topical review)

  12. Parallel R-matrix computation

    International Nuclear Information System (INIS)

    Heggarty, J.W.

    1999-06-01

    For almost thirty years, sequential R-matrix computation has been used by atomic physics research groups from around the world to model collision phenomena involving the scattering of electrons or positrons with atomic or molecular targets. As considerable progress has been made in the understanding of fundamental scattering processes, new data, obtained from more complex calculations, is of current interest to experimentalists. Performing such calculations, however, places considerable demands on the computational resources to be provided by the target machine, in terms of both processor speed and memory requirement. Indeed, in some instances the computational requirements are so great that the proposed R-matrix calculations are intractable, even when utilising contemporary classic supercomputers. Historically, increases in the computational requirements of R-matrix computation were accommodated by porting the problem codes to a more powerful classic supercomputer. Although this approach has been successful in the past, it is no longer considered to be a satisfactory solution due to the limitations of current (and future) Von Neumann machines. As a consequence, there has been considerable interest in the high performance multicomputers that have emerged over the last decade and appear to offer the computational resources required by contemporary R-matrix research. Unfortunately, developing codes for these machines is not as simple a task as it was to develop codes for successive classic supercomputers. The difficulty arises from the considerable differences in the computing models that exist between the two types of machine, and has led to the programming of multicomputers being widely acknowledged as a difficult, time consuming and error-prone task. Nevertheless, unless parallel R-matrix computation is realised, important theoretical and experimental atomic physics research will continue to be hindered. This thesis describes work that was undertaken in

  13. The STAPL Parallel Graph Library

    KAUST Repository

    Harshvardhan; Fidel, Adam; Amato, Nancy M.; Rauchwerger, Lawrence

    2013-01-01

    This paper describes the STAPL Parallel Graph Library, a high-level framework that abstracts the user from data-distribution and parallelism details and allows them to concentrate on parallel graph algorithm development. It includes a customizable

  14. Bayer image parallel decoding based on GPU

    Science.gov (United States)

    Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua

    2012-11-01

    In photoelectrical tracking systems, Bayer images are traditionally decompressed on the CPU. However, this is too slow when the images become large, for example 2K×2K×16bit. In order to accelerate Bayer image decoding, this paper introduces a parallel speedup method for NVIDIA's Graphics Processing Units (GPUs) that support the CUDA architecture. The decoding procedure can be divided into three parts: the first is a serial part, the second is a task-parallel part, and the last is a data-parallel part that includes inverse quantization, the inverse discrete wavelet transform (IDWT) and image post-processing. To reduce the execution time, the task-parallel part is optimized with OpenMP techniques. The data-parallel part gains efficiency by executing on the GPU as a CUDA parallel program. The optimization techniques include instruction optimization, shared memory access optimization, coalesced memory access optimization and texture memory optimization. In particular, the IDWT can be sped up significantly by rewriting the 2D (two-dimensional) serial IDWT as a 1D parallel IDWT. In experiments with a 1K×1K×16bit Bayer image, the data-parallel part is more than 10 times faster than the CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental results show that it achieves a 3 to 5 times speed increase compared to the CPU serial method.
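
    The gap between the roughly 10 times gain of one part and the 3 to 5 times gain of the whole system is the kind of result Amdahl's law describes when some of the work is not accelerated. The sketch below is not taken from the paper: it assumes that only one portion of the run time is sped up by a factor `s` while the rest is unchanged, and the function names are hypothetical.

```python
def amdahl_speedup(fraction: float, s: float) -> float:
    """Overall speedup when `fraction` of the original run time is sped up by `s`."""
    return 1.0 / ((1.0 - fraction) + fraction / s)


def accelerated_fraction(overall: float, s: float) -> float:
    """Invert Amdahl's law: the fraction of run time that must be sped up by `s`
    in order to observe the given overall speedup."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / s)


if __name__ == "__main__":
    # Illustrative only: which share of the run time would a 10x part need
    # to cover for the whole program to run 3x or 5x faster?
    for overall in (3.0, 5.0):
        f = accelerated_fraction(overall, s=10.0)
        print(f"overall {overall:.0f}x with a 10x part: fraction {f:.3f}")
```

    Under this simplified model, a part that runs 10 times faster would have to account for roughly 74% to 89% of the original run time to give an overall gain of 3 to 5 times. The real system also optimizes the task-parallel part, so these figures are only indicative.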

  15. Applications of the parallel computing system using network

    International Nuclear Information System (INIS)

    Ido, Shunji; Hasebe, Hiroki

    1994-01-01

    Parallel programming is applied to multiple processors connected by Ethernet. Data exchange between tasks located on each processing element is realized in two ways. One is sockets, a standard library on recent UNIX operating systems. The other is network connection software named Parallel Virtual Machine (PVM), free software developed by ORNL that allows many workstations connected to a network to be used as a parallel computer. This paper discusses the availability of parallel computing using a network of UNIX workstations, and compares it with specialized parallel systems (Transputer and iPSC/860) in a Monte Carlo simulation, which generally shows a high parallelization ratio. (author)

  16. Automatic Parallelization Tool: Classification of Program Code for Parallel Computing

    Directory of Open Access Journals (Sweden)

    Mustafa Basthikodi

    2016-04-01

    Performance growth of single-core processors has come to a halt in the past decade, but was re-enabled by the introduction of parallelism in processors. Multicore frameworks, along with graphics processing units, have broadly enhanced parallelism. A couple of compilers have been updated to address the emerging challenges of synchronization and threading. Appropriate program and algorithm classifications can greatly help software engineers find opportunities for effective parallelization. In the present work we investigated current species for the classification of algorithms; related work on classification is discussed, along with a comparison of the issues that challenge classification. A set of algorithms is chosen whose structures match the different issues and which perform a given task. We have tested these algorithms using existing automatic species extraction tools along with the Bones compiler. We have added functionality to the existing tool, providing a more detailed characterization. The contributions of our work include support for pointer arithmetic, conditional and incremental statements, user defined types, constants and mathematical functions. With this, we can retain significant data which is not captured by the original species of algorithms. We implemented the new theories in the tool, enabling automatic characterization of program code.

  17. Portable parallel programming in a Fortran environment

    International Nuclear Information System (INIS)

    May, E.N.

    1989-01-01

    Experience using the Argonne-developed PARMACs macro package to implement a portable parallel programming environment is described. Fortran programs with intrinsic parallelism of coarse and medium granularity are easily converted to parallel programs which are portable among a number of commercially available parallel processors in the class of shared-memory bus-based and local-memory network-based MIMD processors. The parallelism is implemented using standard UNIX (tm) tools and a small number of easily understood synchronization concepts (monitors and message-passing techniques) to construct and coordinate multiple cooperating processes on one or many processors. Benchmark results are presented for parallel computers such as the Alliant FX/8, the Encore MultiMax, the Sequent Balance, the Intel iPSC/2 Hypercube and a network of Sun 3 workstations. These parallel machines are typical MIMD types with from 8 to 30 processors, each rated at from 1 to 10 MIPS processing power. The demonstration code used for this work is a Monte Carlo simulation of the response to photons of a "nearly realistic" lead, iron and plastic electromagnetic and hadronic calorimeter, using the EGS4 code system. 6 refs., 2 figs., 2 tabs

  18. Simultaneous data pre-processing and SVM classification model selection based on a parallel genetic algorithm applied to spectroscopic data of olive oils.

    Science.gov (United States)

    Devos, Olivier; Downey, Gerard; Duponchel, Ludovic

    2014-04-01

    Classification is an important task in chemometrics. For several years now, support vector machines (SVMs) have proven to be powerful for infrared spectral data classification. However, such methods require optimisation of parameters in order to control the risk of overfitting and the complexity of the boundary. Furthermore, it is established that the prediction ability of classification models can be improved using pre-processing in order to remove unwanted variance in the spectra. In this paper we propose a new methodology based on a genetic algorithm (GA) for the simultaneous optimisation of SVM parameters and pre-processing (GENOPT-SVM). The method has been tested for the discrimination of the geographical origin of Italian olive oil (Ligurian and non-Ligurian) on the basis of near infrared (NIR) or mid infrared (FTIR) spectra. Different classification models (PLS-DA, SVM with mean centre data, GENOPT-SVM) have been tested and statistically compared using McNemar's statistical test. For the two datasets, SVM with optimised pre-processing gives models with higher accuracy than the one obtained with PLS-DA on pre-processed data. In the case of the NIR dataset, most of this accuracy improvement (86.3% compared with 82.8% for PLS-DA) occurred using only a single pre-processing step. For the FTIR dataset, three optimised pre-processing steps are required to obtain an SVM model with significant accuracy improvement (82.2%) compared to the one obtained with PLS-DA (78.6%). Furthermore, this study demonstrates that even SVM models have to be developed on the basis of well-corrected spectral data in order to obtain higher classification rates.
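
    McNemar's test compares two classifiers evaluated on the same samples by looking only at the samples on which they disagree. The abstract does not say which variant of the test was used, so the sketch below assumes the exact (binomial) two-sided form; the function name and the counts in the usage lines are hypothetical and are not data from the study.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: samples classified correctly by model A only.
    c: samples classified correctly by model B only.
    Under the null hypothesis each disagreement is equally likely to favour
    either model, so min(b, c) follows a Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree, so there is no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Hypothetical disagreement counts, for illustration only.
    print(mcnemar_exact(5, 5))
    print(mcnemar_exact(0, 10))
```

    Samples that both models classify correctly, or both misclassify, do not enter the statistic. Two models can therefore have different accuracies without the difference being significant if the number of disagreements is small.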

  19. Parallel programming with Python

    CERN Document Server

    Palach, Jan

    2014-01-01

    A fast, easy-to-follow and clear tutorial to help you develop parallel computing systems using Python. Along with explaining the fundamentals, the book will also introduce you to slightly advanced concepts and will help you in implementing these techniques in the real world. If you are an experienced Python programmer and are willing to utilize the available computing resources by parallelizing applications in a simple way, then this book is for you. You are required to have a basic knowledge of Python development to get the most out of this book.

  20. The kpx, a program analyzer for parallelization

    International Nuclear Information System (INIS)

    Matsuyama, Yuji; Orii, Shigeo; Ota, Toshiro; Kume, Etsuo; Aikawa, Hiroshi

    1997-03-01

    The kpx is a program analyzer, developed as a common technological basis for promoting parallel processing. The kpx consists of three tools. The first is ktool, which shows how much execution time is spent in program segments. The second is ptool, which shows parallelization overhead on the Paragon system. The last is xtool, which shows parallelization overhead on the VPP system. The kpx, designed to work for any FORTRAN code on any UNIX computer, is confirmed to work well after testing on Paragon, SP2, SR2201, VPP500, VPP300, Monte-4, SX-4 and T90. (author)

  1. Speedup predictions on large scientific parallel programs

    International Nuclear Information System (INIS)

    Williams, E.; Bobrowicz, F.

    1985-01-01

    How much speedup can we expect for large scientific parallel programs running on supercomputers? For insight into this problem we extend the parallel processing environment currently existing on the Cray X-MP (a shared memory multiprocessor with at most four processors) to a simulated N-processor environment, where N ≥ 1. Several large scientific parallel programs from Los Alamos National Laboratory were run in this simulated environment, and speedups were predicted. A speedup of 14.4 on 16 processors was measured for one of the three most used codes at the Laboratory.
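
    A speedup figure is easier to interpret when converted to parallel efficiency and to an implied serial fraction. The sketch below applies two standard definitions (efficiency as speedup divided by processor count, and the Karp-Flatt metric) to the reported 14.4 on 16 processors. The function names are hypothetical, and the calculation is an illustration rather than part of the original report.

```python
def efficiency(speedup: float, processors: int) -> float:
    """Parallel efficiency: the fraction of ideal linear speedup achieved."""
    return speedup / processors


def karp_flatt(speedup: float, processors: int) -> float:
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    if processors < 2:
        raise ValueError("the metric is defined for at least 2 processors")
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


if __name__ == "__main__":
    s, p = 14.4, 16
    print(f"efficiency      = {efficiency(s, p):.2f}")   # 0.90
    print(f"serial fraction = {karp_flatt(s, p):.4f}")   # 0.0074
```

    For these numbers the efficiency is 90% and the implied serial fraction is 1/135, or about 0.74% of the work.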

  2. Language constructs for modular parallel programs

    Energy Technology Data Exchange (ETDEWEB)

    Foster, I.

    1996-03-01

    We describe programming language constructs that facilitate the application of modular design techniques in parallel programming. These constructs allow us to isolate resource management and processor scheduling decisions from the specification of individual modules, which can themselves encapsulate design decisions concerned with concurrency, communication, process mapping, and data distribution. This approach permits development of libraries of reusable parallel program components and the reuse of these components in different contexts. In particular, alternative mapping strategies can be explored without modifying other aspects of program logic. We describe how these constructs are incorporated in two practical parallel programming languages, PCN and Fortran M. Compilers have been developed for both languages, allowing experimentation in substantial applications.

  3. Distributed Parallel Architecture for "Big Data"

    Directory of Open Access Journals (Sweden)

    Catalin BOJA

    2012-01-01

    This paper is an extension of the "Distributed Parallel Architecture for Storing and Processing Large Datasets" paper presented at the WSEAS SEPADS’12 conference in Cambridge. In its original version the paper went over the benefits of using a distributed parallel architecture to store and process large datasets. This paper analyzes the problem of storing, processing and retrieving meaningful insight from petabytes of data. It provides a survey of current distributed and parallel data processing technologies and, based on them, proposes an architecture that can be used to solve the analyzed problem. In this version more emphasis is put on distributed file systems and the ETL processes involved in a distributed environment.

  4. Massively parallel evolutionary computation on GPGPUs

    CERN Document Server

    Tsutsui, Shigeyoshi

    2013-01-01

    Evolutionary algorithms (EAs) are metaheuristics that learn from natural collective behavior and are applied to solve optimization problems in domains such as scheduling, engineering, bioinformatics, and finance. Such applications demand acceptable solutions with high-speed execution using finite computational resources. Therefore, there have been many attempts to develop platforms for running parallel EAs using multicore machines, massively parallel cluster machines, or grid computing environments. Recent advances in general-purpose computing on graphics processing units (GPGPU) have opened u

  5. Acoustic simulation in architecture with parallel algorithm

    Science.gov (United States)

    Li, Xiaohong; Zhang, Xinrong; Li, Dan

    2004-03-01

    To address the complexity of architectural environments and the need for real-time simulation of architectural acoustics, a parallel radiosity algorithm was developed. The distribution of sound energy in a scene is solved with this method. The impulse responses between sources and receivers in each frequency band, which are calculated by multiple processes, are then combined into the full frequency response. The numerical experiment shows that the parallel algorithm can improve the efficiency of acoustic simulation for complex scenes.

  6. Parallel Fast Legendre Transform

    NARCIS (Netherlands)

    Alves de Inda, M.; Bisseling, R.H.; Maslen, D.K.

    1998-01-01

    We discuss a parallel implementation of a fast algorithm for the discrete polynomial Legendre transform. We give an introduction to the Driscoll-Healy algorithm using polynomial arithmetic and present experimental results on the efficiency and accuracy of our implementation. The algorithms were

  7. Practical parallel programming

    CERN Document Server

    Bauer, Barr E

    2014-01-01

    This is the book that will teach programmers to write faster, more efficient code for parallel processors. The reader is introduced to a vast array of procedures and paradigms on which actual coding may be based. Examples and real-life simulations using these devices are presented in C and FORTRAN.

  8. Parallel universes beguile science

    CERN Multimedia

    2007-01-01

    A staple of mind-bending science fiction, the possibility of multiple universes has long intrigued hard-nosed physicists, mathematicians and cosmologists too. We may not be able -- at least not yet -- to prove they exist, many serious scientists say, but there are plenty of reasons to think that parallel dimensions are more than figments of eggheaded imagination.

  9. Parallel plate detectors

    International Nuclear Information System (INIS)

    Gardes, D.; Volkov, P.

    1981-01-01

    A 5×3 cm² (timing only) and a 15×5 cm² (timing and position) parallel plate avalanche counter (PPAC) are considered. The theory of operation and timing resolution is given. The measurement set-up and the curves of experimental results illustrate the possibilities of the two counters [fr

  10. Parallel hierarchical global illumination

    Energy Technology Data Exchange (ETDEWEB)

    Snell, Quinn O. [Iowa State Univ., Ames, IA (United States)

    1997-10-08

    Solving the global illumination problem is equivalent to determining the intensity of every wavelength of light in all directions at every point in a given scene. The complexity of the problem has led researchers to use approximation methods for solving the problem on serial computers. Rather than using an approximation method, such as backward ray tracing or radiosity, the authors have chosen to solve the Rendering Equation by direct simulation of light transport from the light sources. This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed memory or shared memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recent published methods involving bidirectional ray tracing or hybrid treatments of diffuse and specular surfaces. Like progressive radiosity methods, it dynamically refines the geometry decomposition where required, but does so without the excessive storage requirements for ray histories. The algorithm, called Photon, produces a scene which converges to the global illumination solution. This amounts to a huge task for a 1997-vintage serial computer, but using the power of a parallel supercomputer significantly reduces the time required to generate a solution. Currently, Photon can be run on most parallel environments from a shared memory multiprocessor to a parallel supercomputer, as well as on clusters of heterogeneous workstations.

  11. Hypergraph partitioning implementation for parallelizing matrix-vector multiplication using CUDA GPU-based parallel computing

    Science.gov (United States)

    Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.

    2017-07-01

    Matrix-vector multiplication in real-world problems often involves large matrices of arbitrary size. Parallelization is therefore needed to speed up a calculation that usually takes a long time. Graph partitioning techniques discussed in previous studies cannot be used to parallelize matrix-vector multiplication for matrices of arbitrary size, because they assume a square and symmetric matrix. Hypergraph partitioning techniques overcome this shortcoming. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (compute unified device architecture) is a parallel computing platform and programming model created by NVIDIA and implemented on its GPUs (graphics processing units).
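
    To make the partitioning idea concrete, the sketch below splits a rectangular matrix into contiguous row blocks and computes each block's part of the product independently. This is plain row-block partitioning, a much simpler stand-in for the hypergraph partitioning used in the paper, and it runs on CPU threads rather than CUDA. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def row_blocks(n_rows: int, n_parts: int) -> list:
    """Split range(n_rows) into n_parts contiguous, nearly equal blocks."""
    base, extra = divmod(n_rows, n_parts)
    blocks, start = [], 0
    for p in range(n_parts):
        size = base + (1 if p < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def block_matvec(matrix, vector, rows):
    """Multiply only the given rows of the matrix by the vector."""
    return [sum(a * x for a, x in zip(matrix[r], vector)) for r in rows]


def parallel_matvec(matrix, vector, n_parts: int = 2):
    """Each worker handles one row block; results are concatenated in row order."""
    blocks = row_blocks(len(matrix), n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = pool.map(lambda rows: block_matvec(matrix, vector, rows), blocks)
        return [y for part in partials for y in part]


if __name__ == "__main__":
    a = [[1, 2], [3, 4], [5, 6]]  # 3x2: neither square nor symmetric
    print(parallel_matvec(a, [1, 1], n_parts=2))
```

    Row blocks balance the number of rows but ignore the sparsity pattern. Partitioning models such as the hypergraph approach aim to also reduce the communication of vector entries between parts.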

  12. Parallel computation of rotating flows

    DEFF Research Database (Denmark)

    Lundin, Lars Kristian; Barker, Vincent A.; Sørensen, Jens Nørkær

    1999-01-01

    This paper deals with the simulation of 3‐D rotating flows based on the velocity‐vorticity formulation of the Navier‐Stokes equations in cylindrical coordinates. The governing equations are discretized by a finite difference method. The solution is advanced to a new time level by a two‐step process...... is that of solving a singular, large, sparse, over‐determined linear system of equations, and the iterative method CGLS is applied for this purpose. We discuss some of the mathematical and numerical aspects of this procedure and report on the performance of our software on a wide range of parallel computers. Darbe...

  13. 'Research and development of research information infrastructure'. Achievement report on development of parallel processing software technology for discrete value solving methods; Kenkyu joho kiban kenkyu kaihatsu seika hokokusho. Risanka suchi kaiho no tame no heiretsu shori software gijutsu kaihatsu

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2000-09-01

    Research and development has been performed on general purpose parallel processing software that can be utilized for discretized numerical solution methods, such as the finite element method, finite volume method and finite difference method. The achievements of the research and development may be summarized as follows: the parallel platform is parallelized on the basis of the domain decomposition method for the elements (calculation cells), and is applicable to any of the finite element method, finite volume method and finite difference method; a researcher who has developed a program can easily perform the parallelization work and obtain parallel performance; the platform can be used at the several levels of parallelism required by the user; with regard to the parallelization efficiency in large-size problems, it has become possible to execute at an efficiency of higher than 70% for the solver parts by using 32 processors of the SR8000 at the computation center of the Agency of Industrial Science and Technology; the stiffness matrix preparation part shows an efficiency close to 100%; and the developed parallel platform is under continued evaluation at the Machine Technology Research Institute and the Material Engineering Research Institute. (NEDO)

  14. Parallel and vector implementation of APROS simulator code

    International Nuclear Information System (INIS)

    Niemi, J.; Tommiska, J.

    1990-01-01

    In this paper the vector and parallel processing implementation of a general purpose simulator code is discussed. In this code the utilization of vector processing is straightforward. In addition to loop-level parallel processing, functional decomposition and domain decomposition have been considered. Results presented for a PWR-plant simulation illustrate the potential speed-up factors of the alternatives. It turns out that loop-level parallelism and domain decomposition are the most promising alternatives for employing parallel processing. (author)

  15. Parallel Planar-Processed and Ion-Induced Electrically Isolated Future Generation AlGaN/GaN HEMT for Gas Sensing and Opto-Telecommunication Applications

    International Nuclear Information System (INIS)

    Ahmed, S; Bokhari, S H; Amin, F; Khan, L A; Hussain, Z

    2013-01-01

    Ion-implanted AlGaN/GaN High Electron Mobility Transistors (HEMT) devices were studied thoroughly to look into the possibilities of enhancing efficiency for high-power and high-frequency electronic and gas sensing applications. A dedicated experimental design was created in order to study the influence of the physical parameters in response to high energy (by virtue of in-situ beam heating due to highly energetic implantation) ion implantation to the active device regions in nitride HEMT structures. Disorder or damage created in the HEMT structure was then studied carefully with electrical characterization techniques such as Hall, I-V and G-V measurements. The evolution of the electrical characteristics affecting the high-power, high-frequency and ultra-high efficiency gas sensing operations were also analyzed by subjecting the HEMT active device regions to progressive time-temperature annealing cycles. Our suggested model can also provide a functional process engineering window to control the extent of 2D Electron mobility in AlGaN/GaN HEMT devices undergoing a full cycle of thermal impact (i.e. from a desirable conductive region to a highly compensated one)

  16. Vector and parallel processors in computational science

    International Nuclear Information System (INIS)

    Duff, I.S.; Reid, J.K.

    1985-01-01

    This book presents the papers given at a conference which reviewed the new developments in parallel and vector processing. Topics considered at the conference included hardware (array processors, supercomputers), programming languages, software aids, numerical methods (e.g., Monte Carlo algorithms, iterative methods, finite elements, optimization), and applications (e.g., neutron transport theory, meteorology, image processing)

  17. Parallel grid population

    Science.gov (United States)

    Wald, Ingo; Ize, Santiago

    2015-07-28

    Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.
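
    The two-phase scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the grid is 2D with unit cells, objects are axis-aligned boxes given in integer cell coordinates (half-open ranges), grid portions are strips of columns, and threads stand in for processors. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items: list, n: int) -> list:
    """Split a list into n contiguous, nearly equal chunks."""
    base, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def populate_grid(width: int, height: int, boxes: list, n: int) -> dict:
    """boxes: (xmin, ymin, xmax, ymax) covering columns [xmin, xmax) and rows [ymin, ymax).
    Returns a dict mapping (col, row) to the list of box indices covering that cell."""
    strips = split(list(range(width)), n)             # n distinct grid portions
    object_sets = split(list(range(len(boxes))), n)   # n distinct sets of objects

    def bin_objects(object_ids):
        # Phase 1: for its own objects, a worker finds the strips they overlap.
        hits = [[] for _ in range(n)]
        for i in object_ids:
            xmin, _, xmax, _ = boxes[i]
            for s, cols in enumerate(strips):
                if cols and xmin <= cols[-1] and xmax > cols[0]:
                    hits[s].append(i)
        return hits

    def fill_strip(s):
        # Phase 2: a worker fills only its own strip, using every worker's bins.
        cells = {}
        for hits in binned:
            for i in hits[s]:
                xmin, ymin, xmax, ymax = boxes[i]
                for col in strips[s]:
                    if xmin <= col < xmax:
                        for row in range(max(ymin, 0), min(ymax, height)):
                            cells.setdefault((col, row), []).append(i)
        return cells

    with ThreadPoolExecutor(max_workers=n) as pool:
        binned = list(pool.map(bin_objects, object_sets))
        parts = list(pool.map(fill_strip, range(n)))

    grid = {}
    for part in parts:  # strips are disjoint, so the merge never overwrites a cell
        grid.update(part)
    return grid


if __name__ == "__main__":
    print(populate_grid(4, 2, [(0, 0, 2, 1), (1, 0, 4, 2)], n=2))
```

    Because each worker writes only to cells in its own strip during the second phase, no locking is needed on the grid. The cost is that an object spanning several strips is handled by several workers.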

  18. More parallel please

    DEFF Research Database (Denmark)

    Gregersen, Frans; Josephson, Olle; Kristoffersen, Gjert

    More parallel, please is the result of the work of an Inter-Nordic group of experts on language policy financed by the Nordic Council of Ministers 2014-17. The book presents all that is needed to plan, practice and revise a university language policy which takes as its point of departure that English may be used in parallel with the various local, in this case Nordic, languages. As such, the book integrates the challenge of internationalization faced by any university with the wish to improve quality in research, education and administration based on the local language(s). There are three layers in the text: First, you may read the extremely brief version of the in total 11 recommendations for best practice. Second, you may acquaint yourself with the extended version of the recommendations and finally, you may study the reasoning behind each of them. At the end of the text, we give...

  19. PARALLEL MOVING MECHANICAL SYSTEMS

    Directory of Open Access Journals (Sweden)

    Florian Ion Tiberius Petrescu

    2014-09-01

    Moving mechanical systems with parallel structures are solid, fast, and accurate. Among parallel systems, Stewart platforms stand out as the oldest: fast, solid and precise. The work outlines a few main elements of Stewart platforms. It begins with the geometry of the platform and its kinematic elements, and then presents a few items of dynamics. The primary dynamic element is the determination of the kinetic energy of the entire Stewart platform mechanism. The kinematics of the mobile platform is then recorded using a rotation matrix method. If a structural motoelement consists of two moving elements that translate relative to each other, then for the drive train, and especially for the dynamics, it is more convenient to represent the motoelement as a single moving component. We thus have seven moving parts (the six motoelements, or feet, to which the mobile platform 7 is added) and one fixed.

  20. Current distribution characteristics of superconducting parallel circuits

    International Nuclear Information System (INIS)

    Mori, K.; Suzuki, Y.; Hara, N.; Kitamura, M.; Tominaka, T.

    1994-01-01

    In order to increase the current carrying capacity of the current path of a superconducting magnet system, parts of the path are made as parallel circuits, such as insulated multi-strand cables or parallel persistent current switches (PCS). In such superconducting parallel circuits, the current distribution during the current sweep, the persistent mode, and the quench process was investigated. Two methods were used to measure the current distribution. (1) Each strand was surrounded with a pure iron core with an air gap, in which a Hall probe was located. The accuracy of this method was degraded by the magnetic hysteresis of iron. (2) A Rogowski coil without iron was used for the current measurement of each path in a 4-parallel PCS. As a result, it was shown that the current distribution characteristics of a parallel PCS are very similar to those of an insulated multi-strand cable for the quench process.

  1. 6th International Parallel Tools Workshop

    CERN Document Server

    Brinkmann, Steffen; Gracia, José; Resch, Michael; Nagel, Wolfgang

    2013-01-01

    The latest advances in High Performance Computing hardware have significantly raised the level of available compute performance. At the same time, the growing hardware capabilities of modern supercomputing architectures have increased the complexity of parallel application development. Despite numerous efforts to improve and simplify parallel programming, there is still a lot of manual debugging and tuning work required. This process is supported by special software tools, facilitating debugging, performance analysis, and optimization and thus making a major contribution to the development of robust and efficient parallel software. This book introduces a selection of the tools, which were presented and discussed at the 6th International Parallel Tools Workshop, held in Stuttgart, Germany, 25-26 September 2012.

  2. Parallel processor programs in the Federal Government

    Science.gov (United States)

    Schneck, P. B.; Austin, D.; Squires, S. L.; Lehmann, J.; Mizell, D.; Wallgren, K.

    1985-01-01

    In 1982, a report dealing with the nation's research needs in high-speed computing called for increased access to supercomputing resources for the research community, research in computational mathematics, and increased research in the technology base needed for the next generation of supercomputers. Since that time a number of programs addressing future generations of computers, particularly parallel processors, have been started by U.S. government agencies. The present paper provides a description of the largest government programs in parallel processing. Established in fiscal year 1985 by the Institute for Defense Analyses for the National Security Agency, the Supercomputing Research Center will pursue research to advance the state of the art in supercomputing. Attention is also given to the DOE applied mathematical sciences research program, the NYU Ultracomputer project, the DARPA multiprocessor system architectures program, NSF research on multiprocessor systems, ONR activities in parallel computing, and NASA parallel processor projects.

  3. High performance parallel computers for science

    International Nuclear Information System (INIS)

    Nash, T.; Areti, H.; Atac, R.; Biel, J.; Cook, A.; Deppe, J.; Edel, M.; Fischler, M.; Gaines, I.; Hance, R.

    1989-01-01

    This paper reports that Fermilab's Advanced Computer Program (ACP) has been developing cost effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors, each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor has been developed for theoretical physics. Each $4000 node is a FORTRAN or C programmable pipelined 20 Mflops (peak), 10 MByte single board computer. These are plugged into a 16 port crossbar switch crate which handles both inter- and intra-crate communication. The crates are connected in a hypercube. Site oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256 node, 5 GFlop system is under construction.

  4. Xyce parallel electronic simulator.

    Energy Technology Data Exchange (ETDEWEB)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.

    2010-05-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.

  5. Stability of parallel flows

    CERN Document Server

    Betchov, R

    2012-01-01

    Stability of Parallel Flows provides information pertinent to hydrodynamical stability. This book explores the stability problems that occur in various fields, including electronics, mechanics, oceanography, administration, economics, as well as naval and aeronautical engineering. Organized into two parts encompassing 10 chapters, this book starts with an overview of the general equations of a two-dimensional incompressible flow. This text then explores the stability of a laminar boundary layer and presents the equation of the inviscid approximation. Other chapters present the general equation

  6. Resistor Combinations for Parallel Circuits.

    Science.gov (United States)

    McTernan, James P.

    1978-01-01

    To help simplify both teaching and learning of parallel circuits, a high school electricity/electronics teacher presents and illustrates the use of tables of values for parallel resistive circuits in which total resistances are whole numbers. (MF)
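
    As a small illustration of the idea, the sketch below computes the total resistance of resistors in parallel, 1/R_total = 1/R_1 + 1/R_2 + ..., and searches for pairs whose total is a whole number. The function names and the search range are hypothetical choices made for this example; the article's own tables are not reproduced here.

```python
from fractions import Fraction

def parallel_resistance(*resistors):
    """Total resistance of resistors in parallel: 1/R = sum(1/R_i)."""
    return 1 / sum(Fraction(1, r) for r in resistors)

def whole_number_pairs(max_ohms):
    """Pairs (r1, r2, total) with r1 <= r2 <= max_ohms and a whole-number total."""
    pairs = []
    for r1 in range(1, max_ohms + 1):
        for r2 in range(r1, max_ohms + 1):
            total = parallel_resistance(r1, r2)
            # Exact rational arithmetic avoids floating-point false negatives.
            if total.denominator == 1:
                pairs.append((r1, r2, int(total)))
    return pairs
```

    For instance, `parallel_resistance(10, 15)` evaluates to exactly 6, and `whole_number_pairs(6)` lists the integer-valued pairs up to 6 ohms.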

  7. A parallelization study of the general purpose Monte Carlo code MCNP4 on a distributed memory highly parallel computer

    International Nuclear Information System (INIS)

    Yamazaki, Takao; Fujisaki, Masahide; Okuda, Motoi; Takano, Makoto; Masukawa, Fumihiro; Naito, Yoshitaka

    1993-01-01

    The general purpose Monte Carlo code MCNP4 has been implemented on the Fujitsu AP1000 distributed memory highly parallel computer. Parallelization techniques developed and studied are reported. A shielding analysis function of the MCNP4 code is parallelized in this study. A technique to map a history to each processor dynamically and to map the control process to a certain processor was applied. The efficiency of the parallelized code is up to 80% for a typical practical problem with 512 processors. These results demonstrate the advantages of a highly parallel computer over conventional computers in the field of shielding analysis by the Monte Carlo method. (orig.)
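
    For orientation, and assuming the usual definition of parallel efficiency as speedup divided by processor count (the abstract does not state which definition it uses), the reported figure would correspond to a speedup of roughly 410 on 512 processors:

```latex
E = \frac{S}{P}
\quad\Longrightarrow\quad
S = E \cdot P \le 0.80 \times 512 \approx 410
```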

  8. Seamless-merging-oriented parallel inverse lithography technology

    International Nuclear Information System (INIS)

    Yang Yiwei; Shi Zheng; Shen Shanhu

    2009-01-01

    Inverse lithography technology (ILT), a promising resolution enhancement technology (RET) used in next generations of IC manufacture, has the capability to push lithography to its limit. However, the existing methods of ILT are either time-consuming due to the large layout in a single process, or not accurate enough due to simple block merging in the parallel process. The seamless-merging-oriented parallel ILT method proposed in this paper is fast because of the parallel process; most importantly, convergence enhancement penalty terms (CEPT) introduced in the parallel ILT optimization process take the environment into consideration, as well as environmental change through target updating. This method increases the similarity of the overlapped area between guard-bands and work units, makes the merging process approach seamless, and hence reduces hot-spots. The experimental results show that seamless-merging-oriented parallel ILT not only accelerates the optimization process, but also significantly improves the quality of ILT.

  9. Parallel External Memory Graph Algorithms

    DEFF Research Database (Denmark)

    Arge, Lars Allan; Goodrich, Michael T.; Sitchinava, Nodari

    2010-01-01

    In this paper, we study parallel I/O efficient graph algorithms in the Parallel External Memory (PEM) model, one of the private-cache chip multiprocessor (CMP) models. We study the fundamental problem of list ranking which leads to efficient solutions to problems on trees, such as computing lowest...... an optimal speedup of Θ(P) in parallel I/O complexity and parallel computation time, compared to the single-processor external memory counterparts....

  10. Parallel processing Monte Carlo radiation transport codes

    International Nuclear Information System (INIS)

    McKinney, G.W.

    1994-01-01

    Issues related to distributed-memory multiprocessing as applied to Monte Carlo radiation transport are discussed. Measurements of communication overhead are presented for the radiation transport code MCNP, which employs the communication software package PVM, and average efficiency curves are provided for a homogeneous virtual machine.

  11. Developing Software to Use Parallel Processing Effectively

    Science.gov (United States)

    1988-10-01

    in which description of an algorithm is strictly separated from its implementation details. EDDA is a dataflow design language with graphical...notation similar to SADT and a semantic structure resembling Petri-Nets. Unlike Petri-Nets, EDDA can represent recursion. An EDDA design contains...predefined functional elements and obeys strict semantic rules, and can therefore be compiled into executable code. Programs are built in EDDA by

  12. Stateful load balancing for parallel stream processing

    DEFF Research Database (Denmark)

    Guo, Qingsong; Zhou, Yongluan

    2018-01-01

    -objective optimization problem, namely Minimum-Cost-Load-Balance (MCLB). We address MCLB with two approximate algorithms by a certain relaxation of the objectives: (1) a greedy algorithm ELB performs load balancing eagerly but relaxes the objective of load imbalance to a range; and (2) a periodic algorithm CLB aims...

  13. Parallel Processing with TreeClust

    Science.gov (United States)

    2017-09-01

    Francisco: Elsevier Inc. Hartigan, J. (1975). Clustering algorithms. New York, NY: John Wiley and Sons. Horstmann, C. (2008). Big java (3rd ed...Computer organization and design (3rd ed.). Burlington, MA: Elsevier. R Core Team. (2017, March). R: a language and environment for statistical

  14. Seismic processing using Parallel 3D FMM

    OpenAIRE

    Borlaug, Idar

    2007-01-01

    This thesis develops and tests 3D Fast Marching Method (FMM) algorithms and applies them to seismic simulations. The FMM is a general method for monotonically advancing fronts, originally developed by Sethian. It calculates the first arrival time for an advancing front or wave. FMM methods are used for a variety of applications, including fatigue cracks in materials, lymph node segmentation in CT images, computing skeletons and centerlines in 3D objects, and finding salt formations in seismi...
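
    To make the first-arrival-time idea concrete, here is a minimal serial sketch of a first-order Fast Marching Method on a 2D grid. It is not the thesis's parallel 3D implementation; the function name `fast_marching_2d`, the grid layout, and the uniform spacing `h` are assumptions made for illustration. Cells are accepted in order of increasing arrival time using a heap, and each tentative value is computed from already accepted neighbours only.

```python
import heapq
import math

def fast_marching_2d(speed, source, h=1.0):
    """First-arrival times on a 2D grid (first-order FMM, illustrative sketch)."""
    rows, cols = len(speed), len(speed[0])
    T = [[math.inf] * cols for _ in range(rows)]
    accepted = [[False] * cols for _ in range(rows)]
    T[source[0]][source[1]] = 0.0
    heap = [(0.0, source[0], source[1])]

    def accepted_min(cells):
        # Smallest arrival time among in-grid, already accepted cells.
        vals = [T[i][j] for i, j in cells
                if 0 <= i < rows and 0 <= j < cols and accepted[i][j]]
        return min(vals, default=math.inf)

    def update(i, j):
        # Upwind solution of |grad T| = 1/speed from accepted neighbours.
        a = accepted_min([(i - 1, j), (i + 1, j)])
        b = accepted_min([(i, j - 1), (i, j + 1)])
        f = h / speed[i][j]
        if abs(a - b) >= f:
            return min(a, b) + f  # front arrives from one direction only
        return 0.5 * (a + b + math.sqrt(2.0 * f * f - (a - b) ** 2))

    while heap:
        _, i, j = heapq.heappop(heap)
        if accepted[i][j]:
            continue  # stale heap entry left over from an earlier update
        accepted[i][j] = True
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= ni < rows and 0 <= nj < cols and not accepted[ni][nj]:
                t_new = update(ni, nj)
                if t_new < T[ni][nj]:
                    T[ni][nj] = t_new
                    heapq.heappush(heap, (t_new, ni, nj))
    return T
```

    With unit speed everywhere and the source in a corner, arrival times along the grid axes equal the distance from the source, while diagonal cells are slightly overestimated, as expected for a first-order scheme.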

  15. Xyce parallel electronic simulator design.

    Energy Technology Data Exchange (ETDEWEB)

    Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.

    2010-09-01

    This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator. As such, having software quality engineering (SQE) procedures in place to ensure a high level of code quality and robustness is essential. Version control, issue tracking, customer support, C++ style guidelines and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, and the original focus of Xyce development has primarily been related to circuits for nuclear weapons. However, this has not been the only focus and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort, which involves a number of researchers, engineers, scientists, mathematicians and computer scientists. In addition to diversity of background, it is to be expected on long term projects for there to be a certain amount of staff turnover, as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document a number of the software quality practices followed by the Xyce team in one place. Also, it is hoped that this document will be a good source of information for new developers.

  16. Parallel inter channel interaction mechanisms

    International Nuclear Information System (INIS)

    Jovic, V.; Afgan, N.; Jovic, L.

    1995-01-01

    Interactions between parallel channels are examined. Results of phenomenon analysis and the mechanisms of parallel channel interaction are presented for experimental investigations of nonstationary flow regimes in three parallel vertical channels, under adiabatic conditions, for single-phase fluid and two-phase mixture flow. (author)

  17. Abstract Level Parallelization of Finite Difference Methods

    Directory of Open Access Journals (Sweden)

    Edwin Vollebregt

    1997-01-01

    Full Text Available A formalism is proposed for describing finite difference calculations in an abstract way. The formalism consists of index sets and stencils, for characterizing the structure of sets of data items and interactions between data items (“neighbouring relations”). The formalism provides a means for lifting programming to a more abstract level. This simplifies the tasks of performance analysis and verification of correctness, and opens the way for automatic code generation. The notation is particularly useful in parallelization, for the systematic construction of parallel programs in a process/channel programming paradigm (e.g., message passing). This is important because message passing, unfortunately, still is the only approach that leads to acceptable performance for many more unstructured or irregular problems on parallel computers that have non-uniform memory access times. It will be shown that the use of index sets and stencils greatly simplifies the determination of which data must be exchanged between different computing processes.

  18. Massively Parallel QCD

    International Nuclear Information System (INIS)

    Soltz, R; Vranas, P; Blumrich, M; Chen, D; Gara, A; Giampap, M; Heidelberger, P; Salapura, V; Sexton, J; Bhanot, G

    2007-01-01

    The theory of the strong nuclear force, Quantum Chromodynamics (QCD), can be numerically simulated from first principles on massively-parallel supercomputers using the method of Lattice Gauge Theory. We describe the special programming requirements of lattice QCD (LQCD) as well as the optimal supercomputer hardware architectures that it suggests. We demonstrate these methods on the BlueGene massively-parallel supercomputer and argue that LQCD and the BlueGene architecture are a natural match. This can be traced to the simple fact that LQCD is a regular lattice discretization of space into lattice sites while the BlueGene supercomputer is a discretization of space into compute nodes, and that both are constrained by requirements of locality. This simple relation is both technologically important and theoretically intriguing. The main result of this paper is the speedup of LQCD using up to 131,072 CPUs on the largest BlueGene/L supercomputer. The speedup is perfect with sustained performance of about 20% of peak. This corresponds to a maximum of 70.5 sustained TFlop/s. At these speeds LQCD and BlueGene are poised to produce the next generation of strong interaction physics theoretical results.

  19. Eliminating graphs by means of parallel knock-out schemes

    NARCIS (Netherlands)

    Broersma, H.J.; Fomin, F.V.; Královic, R.; Woeginger, G.J.

    2007-01-01

    In 1997 Lampert and Slater introduced parallel knock-out schemes, an iterative process on graphs that goes through several rounds. In each round of this process, every vertex eliminates exactly one of its neighbors. The parallel knock-out number of a graph is the minimum number of rounds after which
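
    The round structure described here can be explored on very small graphs with a brute-force search. The sketch below is illustrative only: the abstract is cut off before it finishes the definition, so the code assumes the knock-out number is the minimum number of rounds after which every vertex has been eliminated, and it returns `None` when some surviving vertex has no surviving neighbour left to eliminate. The function name `knockout_number` and the adjacency-dictionary input are hypothetical choices, and the search is exponential in the graph size.

```python
from functools import lru_cache
from itertools import product

def knockout_number(adj):
    """Minimum number of rounds that eliminates every vertex, or None."""
    @lru_cache(maxsize=None)
    def solve(alive):
        if not alive:
            return 0
        verts = sorted(alive)
        # Each surviving vertex must pick exactly one surviving neighbour.
        options = [[u for u in adj[v] if u in alive] for v in verts]
        if any(not opts for opts in options):
            return None  # some vertex has nobody left to eliminate
        best = None
        for choice in product(*options):
            # All chosen vertices are eliminated simultaneously.
            rounds = solve(alive - frozenset(choice))
            if rounds is not None and (best is None or rounds + 1 < best):
                best = rounds + 1
        return best
    return solve(frozenset(adj))
```

    On a triangle every vertex can eliminate the next one around the cycle, so one round suffices; on a three-vertex path a single isolated vertex always survives the first round, so no valid scheme exists under the assumed definition.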

  20. Eliminating graphs by means of parallel knock-out schemes

    NARCIS (Netherlands)

    Broersma, Haitze J.; Fomin, F.V.; Královič, R.; Woeginger, Gerhard

    In 1997 Lampert and Slater introduced parallel knock-out schemes, an iterative process on graphs that goes through several rounds. In each round of this process, every vertex eliminates exactly one of its neighbors. The parallel knock-out number of a graph is the minimum number of rounds after which

  1. Towards a streaming model for nested data parallelism

    DEFF Research Database (Denmark)

    Madsen, Frederik Meisner; Filinski, Andrzej

    2013-01-01

    The language-integrated cost semantics for nested data parallelism pioneered by NESL provides an intuitive, high-level model for predicting performance and scalability of parallel algorithms with reasonable accuracy. However, this predictability, obtained through a uniform, parallelism-flattening......-processable in a streaming fashion. This semantics is directly compatible with previously proposed piecewise execution models for nested data parallelism, but allows the expected space usage to be reasoned about directly at the source-language level. The language definition and implementation are still very much work...

  2. Intranode data communications in a parallel computer

    Science.gov (United States)

    Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Ratterman, Joseph D; Smith, Brian E

    2014-01-07

    Intranode data communications in a parallel computer that includes compute nodes configured to execute processes, where the data communications include: allocating, upon initialization of a first process of a compute node, a region of shared memory; establishing, by the first process, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; sending, to a second process on the same compute node, a data communications message without determining whether the second process has been initialized, including storing the data communications message in the message buffer of the second process; and upon initialization of the second process: retrieving, by the second process, a pointer to the second process's message buffer; and retrieving, by the second process from the second process's message buffer in dependence upon the pointer, the data communications message sent by the first process.

  3. Intranode data communications in a parallel computer

    Science.gov (United States)

    Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Ratterman, Joseph D; Smith, Brian E

    2013-07-23

    Intranode data communications in a parallel computer that includes compute nodes configured to execute processes, where the data communications include: allocating, upon initialization of a first process of a compute node, a region of shared memory; establishing, by the first process, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; sending, to a second process on the same compute node, a data communications message without determining whether the second process has been initialized, including storing the data communications message in the message buffer of the second process; and upon initialization of the second process: retrieving, by the second process, a pointer to the second process's message buffer; and retrieving, by the second process from the second process's message buffer in dependence upon the pointer, the data communications message sent by the first process.

  4. Parallel Computing in SCALE

    International Nuclear Information System (INIS)

    DeHart, Mark D.; Williams, Mark L.; Bowman, Stephen M.

    2010-01-01

    The SCALE computational architecture has remained basically the same since its inception 30 years ago, although constituent modules and capabilities have changed significantly. This SCALE concept was intended to provide a framework whereby independent codes can be linked to provide a more comprehensive capability than possible with the individual programs - allowing flexibility to address a wide variety of applications. However, the current system was designed originally for mainframe computers with a single CPU and with significantly less memory than today's personal computers. It has been recognized that the present SCALE computation system could be restructured to take advantage of modern hardware and software capabilities, while retaining many of the modular features of the present system. Preliminary work is being done to define specifications and capabilities for a more advanced computational architecture. This paper describes the state of current SCALE development activities and plans for future development. With the release of SCALE 6.1 in 2010, a new phase of evolutionary development will be available to SCALE users within the TRITON and NEWT modules. The SCALE (Standardized Computer Analyses for Licensing Evaluation) code system developed by Oak Ridge National Laboratory (ORNL) provides a comprehensive and integrated package of codes and nuclear data for a wide range of applications in criticality safety, reactor physics, shielding, isotopic depletion and decay, and sensitivity/uncertainty (S/U) analysis. Over the last three years, since the release of version 5.1 in 2006, several important new codes have been introduced within SCALE, and significant advances applied to existing codes. Many of these new features became available with the release of SCALE 6.0 in early 2009. However, beginning with SCALE 6.1, a first generation of parallel computing is being introduced. In addition to near-term improvements, a plan for longer term SCALE enhancement

  5. Parallel Polarization State Generation.

    Science.gov (United States)

    She, Alan; Capasso, Federico

    2016-05-17

    The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially-separated polarization components of a laser using a digital micromirror device that are subsequently beam combined. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security.

  6. Parallelizing More Loops with Compiler Guided Refactoring

    DEFF Research Database (Denmark)

    Larsen, Per; Ladelsky, Razya; Lidman, Jacob

    2012-01-01

    an interactive compilation feedback system that guides programmers in iteratively modifying their application source code. This helps leverage the compiler’s ability to generate loop-parallel code. We employ our system to modify two sequential benchmarks dealing with image processing and edge detection...

  7. Partitions in languages and parallel computations

    Energy Technology Data Exchange (ETDEWEB)

    Burgin, M S; Burgina, E S

    1982-05-01

    Partitions of entries (linguistic structures) are studied that are intended for parallel data processing. The representations of formal languages with the aid of such structures is examined, and the relationships are considered between partitions of entries and abstract families of languages and automata. 18 references.

  8. Parallel object-oriented specification language

    NARCIS (Netherlands)

    Florescu, O.; Voeten, J.P.M.; Theelen, B.D.; Geilen, M.C.W.; Corporaal, H.; Burns, Alan

    2008-01-01

    The Parallel Object-Oriented Specification Language (POOSL) is an expressive modelling language for hardware/software systems [10]. It was originally defined in [7] as an object-oriented extension of process algebra CCS [6], supporting (conditional) synchronous message passing between

  9. A Model for Speedup of Parallel Programs

    Science.gov (United States)

    1997-01-01

    Sanjeev K. Setia. The interaction between memory allocation and adaptive partitioning in message-passing multicomputers. In IPPS '95 Workshop on Job...Scheduling Strategies for Parallel Processing, pages 89-99, 1995. [15] Sanjeev K. Setia and Satish K. Tripathi. A comparative analysis of static

  10. Mathematical Abstraction: Constructing Concept of Parallel Coordinates

    Science.gov (United States)

    Nurhasanah, F.; Kusumah, Y. S.; Sabandar, J.; Suryadi, D.

    2017-09-01

    Mathematical abstraction is an important process in teaching and learning mathematics, so pre-service mathematics teachers need to understand and experience this process. One of the theoretical-methodological frameworks for studying this process is Abstraction in Context (AiC). In this framework, the abstraction process comprises observable epistemic actions, Recognition, Building-With, Construction, and Consolidation, known as the RBC + C model. This study investigates and analyzes how pre-service mathematics teachers constructed and consolidated the concept of Parallel Coordinates in a group discussion. It uses the AiC framework to analyze the mathematical abstraction of a group of four pre-service teachers learning Parallel Coordinates concepts. The data were collected through video recordings, students’ worksheets, a test, and field notes. The results show that the students’ prior knowledge of the Cartesian coordinate concept plays a significant role in constructing the Parallel Coordinates concept as new knowledge. The consolidation process is influenced by the social interaction between group members. The abstraction processes that took place in this group were dominated by empirical abstraction, which emphasizes identifying characteristics of manipulated or imagined objects during recognizing and building-with.

  11. A Massively Parallel Face Recognition System

    Directory of Open Access Journals (Sweden)

    Lahdenoja Olli

    2007-01-01

    Full Text Available We present methods for processing the LBPs (local binary patterns) with massively parallel hardware, especially with CNN-UM (cellular nonlinear network-universal machine). In particular, we present a framework for implementing a massively parallel face recognition system, including a dedicated highly accurate algorithm suitable for various types of platforms (e.g., CNN-UM and digital FPGA). We study in detail a dedicated mixed-mode implementation of the algorithm and estimate its implementation cost in the view of its performance and accuracy restrictions.

  12. Temporal fringe pattern analysis with parallel computing

    International Nuclear Information System (INIS)

    Tuck Wah Ng; Kar Tien Ang; Argentini, Gianluca

    2005-01-01

    Temporal fringe pattern analysis is invaluable in transient phenomena studies but necessitates long processing times. Here we describe a parallel computing strategy based on the single-program multiple-data model and hyperthreading processor technology to reduce the execution time. In a two-node cluster workstation configuration we found that execution periods were reduced by 1.6 times when four virtual processors were used. To allow even lower execution times with an increasing number of processors, the time allocated for data transfer, data read, and waiting should be minimized. Parallel computing is found here to present a feasible approach to reduce execution times in temporal fringe pattern analysis.

  13. A Massively Parallel Face Recognition System

    Directory of Open Access Journals (Sweden)

    Ari Paasio

    2006-12-01

    Full Text Available We present methods for processing the LBPs (local binary patterns) with massively parallel hardware, especially with CNN-UM (cellular nonlinear network-universal machine). In particular, we present a framework for implementing a massively parallel face recognition system, including a dedicated highly accurate algorithm suitable for various types of platforms (e.g., CNN-UM and digital FPGA). We study in detail a dedicated mixed-mode implementation of the algorithm and estimate its implementation cost in the view of its performance and accuracy restrictions.

  14. Parallel processor for fast event analysis

    International Nuclear Information System (INIS)

    Hensley, D.C.

    1983-01-01

    Current maximum data rates from the Spin Spectrometer of approx. 5000 events/s (up to 1.3 MBytes/s) and minimum analysis requiring at least 3000 operations/event require a CPU cycle time near 70 ns. In order to achieve an effective cycle time of 70 ns, a parallel processing device is proposed where up to 4 independent processors will be implemented in parallel. The individual processors are designed around the Am2910 Microsequencer, the Am29116 μP, and the Am29517 Multiplier. Satellite histogramming in a mass memory system will be managed by a commercial 16-bit μP system.
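
    The quoted cycle time follows from the stated rates if one assumes, as a simplification not spelled out in the abstract, that each operation takes one processor cycle:

```latex
5000\ \tfrac{\text{events}}{\text{s}} \times 3000\ \tfrac{\text{operations}}{\text{event}}
  = 1.5 \times 10^{7}\ \tfrac{\text{operations}}{\text{s}},
\qquad
t_{\text{cycle}} \approx \frac{1}{1.5 \times 10^{7}\ \text{s}^{-1}} \approx 67\ \text{ns}
```

    Under the further assumption that the work divides evenly, four processors running in parallel could each use a cycle time of roughly 4 × 67 ns ≈ 267 ns while still delivering the same effective rate.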

  15. Structured building model reduction toward parallel simulation

    Energy Technology Data Exchange (ETDEWEB)

    Dobbs, Justin R. [Cornell University; Hencey, Brondon M. [Cornell University

    2013-08-26

    Building energy model reduction exchanges accuracy for improved simulation speed by reducing the number of dynamical equations. Parallel computing aims to improve simulation times without loss of accuracy but is poorly utilized by contemporary simulators and is inherently limited by inter-processor communication. This paper bridges these disparate techniques to implement efficient parallel building thermal simulation. We begin with a survey of three structured reduction approaches that compares their performance to a leading unstructured method. We then use structured model reduction to find thermal clusters in the building energy model and allocate processing resources. Experimental results demonstrate faster simulation and low error without any interprocessor communication.

  16. Discrete Hadamard transformation algorithm's parallelism analysis and achievement

    Science.gov (United States)

    Hu, Hui

    2009-07-01

    The Discrete Hadamard Transformation (DHT) is widely applied in real-time signal processing, but it is limited by the operating speed of DSPs. This article studies the parallelization of the DHT and analyzes its parallel performance. Based on the programming structure of the TMS320C80 multiprocessor platform, two kinds of parallel DHT algorithms are implemented. Several experiments demonstrate the effectiveness of the proposed algorithms.
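
    The abstract does not describe its two parallel algorithms, so the following is only a sketch of the standard fast Walsh-Hadamard butterfly (unnormalized, natural Hadamard ordering), written serially. It is included to show where the parallelism comes from: within each stage, every butterfly reads and writes its own pair of elements, so the butterflies of one stage are independent of each other. The function name `fwht` is a choice made for this example.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform of a power-of-two-length list."""
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # One stage: all butterflies below touch disjoint pairs (j, j + h),
        # so they could be distributed across processors.
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                u, v = a[j], a[j + h]
                a[j], a[j + h] = u + v, u - v
        h *= 2
    return a
```

    Applying `fwht` twice returns the input scaled by its length, which gives a convenient correctness check.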

  17. Parallelization of ITOUGH2 using PVM

    International Nuclear Information System (INIS)

    Finsterle, Stefan

    1998-01-01

    ITOUGH2 inversions are computationally intensive because the forward problem must be solved many times to evaluate the objective function for different parameter combinations or to numerically calculate sensitivity coefficients. Most of these forward runs are independent from each other and can therefore be performed in parallel. Message passing based on the Parallel Virtual Machine (PVM) system has been implemented into ITOUGH2 to enable parallel processing of ITOUGH2 jobs on a heterogeneous network of Unix workstations. This report describes the PVM system and its implementation into ITOUGH2. Instructions are given for installing PVM, compiling ITOUGH2-PVM for use on a workstation cluster, preparing an ITOUGH2 input file under PVM, and executing an ITOUGH2-PVM application. Examples are discussed, demonstrating the use of ITOUGH2-PVM.
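
    The observation that the forward runs are mutually independent can be sketched as follows. This is not ITOUGH2 or PVM code: `forward_model` is a hypothetical stand-in for one forward simulation, and a thread pool from the Python standard library replaces message passing between workstations. The sketch computes forward-difference sensitivity coefficients by submitting the base run and one perturbed run per parameter as independent tasks.

```python
from concurrent.futures import ThreadPoolExecutor

def forward_model(params):
    """Hypothetical stand-in for one forward run; returns a scalar output."""
    a, b = params
    return a * a + 3.0 * b

def sensitivities(model, params, step=1e-6, workers=4):
    """Forward-difference sensitivities, one independent run per parameter."""
    perturbed = []
    for k in range(len(params)):
        p = list(params)
        p[k] += step  # perturb only parameter k
        perturbed.append(p)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The base run and all perturbed runs do not depend on each other.
        results = list(pool.map(model, [list(params)] + perturbed))
    base = results[0]
    return [(r - base) / step for r in results[1:]]
```

    For a real, CPU-bound simulator the tasks would be dispatched to separate processes or machines rather than threads; the structure of the loop stays the same.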

  18. Parallel optoelectronic trinary signed-digit division

    Science.gov (United States)

    Alam, Mohammad S.

    1999-03-01

    The trinary signed-digit (TSD) number system has been found to be very useful for parallel addition and subtraction of any arbitrary length operands in constant time. Using the TSD addition and multiplication modules as the basic building blocks, we develop an efficient algorithm for performing parallel TSD division in constant time. The proposed division technique uses one TSD subtraction and two TSD multiplication steps. An optoelectronic correlator based architecture is suggested for implementation of the proposed TSD division algorithm, which fully exploits the parallelism and high processing speed of optics. An efficient spatial encoding scheme is used to ensure better utilization of space bandwidth product of the spatial light modulators used in the optoelectronic implementation.

  19. Parallel Monte Carlo reactor neutronics

    International Nuclear Information System (INIS)

    Blomquist, R.N.; Brown, F.B.

    1994-01-01

    The issues affecting implementation of parallel algorithms for large-scale engineering Monte Carlo neutron transport simulations are discussed. For nuclear reactor calculations, these include load balancing, recoding effort, reproducibility, domain decomposition techniques, I/O minimization, and strategies for different parallel architectures. Two codes were parallelized and tested for performance. The architectures employed include SIMD, MIMD-distributed memory, and workstation network with uneven interactive load. Speedups linear with the number of nodes were achieved.

  20. Anti-parallel triplexes

    DEFF Research Database (Denmark)

    Kosbar, Tamer R.; Sofan, Mamdouh A.; Waly, Mohamed A.

    2015-01-01

    about 6.1 °C when the TFO strand was modified with Z and the Watson-Crick strand with adenine-LNA (AL). The molecular modeling results showed that, in case of nucleobases Y and Z a hydrogen bond (1.69 and 1.72 Å, respectively) was formed between the protonated 3-aminopropyn-1-yl chain and one...... of the phosphate groups in Watson-Crick strand. Also, it was shown that the nucleobase Y made a good stacking and binding with the other nucleobases in the TFO and Watson-Crick duplex, respectively. In contrast, the nucleobase Z with LNA moiety was forced to twist out of plane of Watson-Crick base pair which......The phosphoramidites of DNA monomers of 7-(3-aminopropyn-1-yl)-8-aza-7-deazaadenine (Y) and 7-(3-aminopropyn-1-yl)-8-aza-7-deazaadenine LNA (Z) are synthesized, and the thermal stability at pH 7.2 and 8.2 of anti-parallel triplexes modified with these two monomers is determined. When, the anti...

  1. Parallel consensual neural networks.

    Science.gov (United States)

    Benediktsson, J A; Sveinsson, J R; Ersoy, O K; Swain, P H

    1997-01-01

    A new type of a neural-network architecture, the parallel consensual neural network (PCNN), is introduced and applied in classification/data fusion of multisource remote sensing and geographic data. The PCNN architecture is based on statistical consensus theory and involves using stage neural networks with transformed input data. The input data are transformed several times and the different transformed data are used as if they were independent inputs. The independent inputs are first classified using the stage neural networks. The output responses from the stage networks are then weighted and combined to make a consensual decision. In this paper, optimization methods are used in order to weight the outputs from the stage networks. Two approaches are proposed to compute the data transforms for the PCNN, one for binary data and another for analog data. The analog approach uses wavelet packets. The experimental results obtained with the proposed approach show that the PCNN outperforms both a conjugate-gradient backpropagation neural network and conventional statistical methods in terms of overall classification accuracy of test data.

  2. Effects of parallel planning on agreement production.

    Science.gov (United States)

    Veenstra, Alma; Meyer, Antje S; Acheson, Daniel J

    2015-11-01

    An important issue in current psycholinguistics is how the time course of utterance planning affects the generation of grammatical structures. The current study investigated the influence of parallel activation of the components of complex noun phrases on the generation of subject-verb agreement. Specifically, the lexical interference account (Gillespie & Pearlmutter, 2011b; Solomon & Pearlmutter, 2004) predicts more agreement errors (i.e., attraction) for subject phrases in which the head and local noun mismatch in number (e.g., the apple next to the pears) when nouns are planned in parallel than when they are planned in sequence. We used a speeded picture description task that yielded sentences such as the apple next to the pears is red. The objects mentioned in the noun phrase were either semantically related or unrelated. To induce agreement errors, pictures sometimes mismatched in number. In order to manipulate the likelihood of parallel processing of the objects and to test the hypothesized relationship between parallel processing and the rate of agreement errors, the pictures were either placed close together or far apart. Analyses of the participants' eye movements and speech onset latencies indicated slower processing of the first object and stronger interference from the related (compared to the unrelated) second object in the close than in the far condition. Analyses of the agreement errors yielded an attraction effect, with more errors in mismatching than in matching conditions. However, the magnitude of the attraction effect did not differ across the close and far conditions. Thus, spatial proximity encouraged parallel processing of the pictures, which led to interference of the associated conceptual and/or lexical representation, but, contrary to the prediction, it did not lead to more attraction errors. Copyright © 2015 Elsevier B.V. All rights reserved.

  3. A Parallel Particle Swarm Optimizer

    National Research Council Canada - National Science Library

    Schutte, J. F; Fregly, B. J; Haftka, R. T; George, A. D

    2003-01-01

    .... Motivated by a computationally demanding biomechanical system identification problem, we introduce a parallel implementation of a stochastic population based global optimizer, the Particle Swarm...

  4. Patterns for Parallel Software Design

    CERN Document Server

    Ortega-Arjona, Jorge Luis

    2010-01-01

    Essential reading to understand patterns for parallel programming. Software patterns have revolutionized the way we think about how software is designed, built, and documented, and the design of parallel software requires you to consider other particular design aspects and special skills. From clusters to supercomputers, success heavily depends on the design skills of software developers. Patterns for Parallel Software Design presents a pattern-oriented software architecture approach to parallel software design. This approach is not a design method in the classic sense, but a new way of managin

  5. Seeing or moving in parallel

    DEFF Research Database (Denmark)

    Christensen, Mark Schram; Ehrsson, H Henrik; Nielsen, Jens Bo

    2013-01-01

    a different network, involving bilateral dorsal premotor cortex (PMd), primary motor cortex, and SMA, was more active when subjects viewed parallel movements while performing either symmetrical or parallel movements. Correlations between behavioral instability and brain activity were present in right lateral...... adduction-abduction movements symmetrically or in parallel with real-time congruent or incongruent visual feedback of the movements. One network, consisting of bilateral superior and middle frontal gyrus and supplementary motor area (SMA), was more active when subjects performed parallel movements, whereas...

  6. Parallelization methods study of thermal-hydraulics codes

    International Nuclear Information System (INIS)

    Gaudart, Catherine

    2000-01-01

    The variety of parallelization methods and machines gives programmers a wide choice. In this study we suggest, in an industrial context, some solutions drawn from experience with different parallelization methods. The study covers several scientific codes that simulate a large variety of thermal-hydraulics phenomena. A bibliography on parallelization methods and a first analysis of the codes showed how difficult it would be to apply our process to the whole set of applications under study. It was therefore necessary to identify and extract a representative part of these applications and parallelization methods. The linear solver part of the codes was the natural choice. Several parallelization methods were used on this part. From these developments one could estimate the work required for a novice programmer to parallelize an application, and the impact of the development constraints. The parallelization methods tested are the numerical library PETSc, the parallelizer PAF, the language HPF, the formalism PEI and the communications libraries MPI and PVM. In order to test several methods on different applications while keeping modifications to the codes to a minimum, a tool called SPS (Server of Parallel Solvers) was developed. We describe the constraints on code optimization in an industrial context, present the solutions provided by the SPS tool, show the development of the linear solver part with the tested parallelization methods, and lastly compare the results against the imposed criteria. (author)

  7. IMPLEMENTATION OF SERIAL AND PARALLEL BUBBLE SORT ON FPGA

    Directory of Open Access Journals (Sweden)

    Dwi Marhaendro Jati Purnomo

    2016-06-01

    Sorting is a common process in computing, used in many fields from research to industry. Many sorting algorithms exist today. One of the simplest yet still useful is bubble sort. In this study, bubble sort is implemented on an FPGA using both a serial and a parallel approach. The serial and parallel bubble sorts are then compared in terms of memory, execution time, and utilization, which comprises slices and LUTs. The experiments show that serial bubble sort requires less memory and lower utilization than parallel bubble sort, while parallel bubble sort runs faster than serial bubble sort.
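
    The abstract does not say how the parallel version is structured. A common parallel formulation of bubble sort is odd-even transposition sort, sketched below in Python next to the serial version; treating it as the design used in the study is an assumption. In each phase all compare-exchange operations act on disjoint neighbor pairs, which is what hardware such as an FPGA could execute simultaneously at the cost of more comparators.

```python
def bubble_sort_serial(items):
    """Classic bubble sort: one compare-exchange at a time."""
    data = list(items)
    for end in range(len(data) - 1, 0, -1):
        for i in range(end):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data


def odd_even_transposition_sort(items):
    """Parallel-friendly variant: n phases of independent compare-exchanges."""
    data = list(items)
    n = len(data)
    for phase in range(n):
        # Even phases compare pairs (0,1), (2,3), ...; odd phases (1,2), (3,4), ...
        # The pairs within a phase do not overlap, so they could all be
        # processed at once; here they are simulated sequentially.
        for i in range(phase % 2, n - 1, 2):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data
```

    The serial version needs on the order of n² sequential steps, whereas the parallel variant needs n phases when each phase runs in one step, which is consistent with the time-versus-resources trade-off reported above.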

  8. PRISMA/DB: A Parallel Main-Memory Relational DBMS

    NARCIS (Netherlands)

    Apers, Peter M.G.; Flokstra, Jan; van den Berg, Carel A.; Grefen, P.W.P.J.; Wilschut, A.N.; Kersten, Martin L.

    1992-01-01

    PRISMA/DB, a full-fledged parallel, main memory relational database management system (DBMS) is described. PRISMA/DB's high performance is obtained by the use of parallelism for query processing and main memory storage of the entire database. A flexible architecture for experimenting with

  9. Parallel and Serial Grouping of Image Elements in Visual Perception

    Science.gov (United States)

    Houtkamp, Roos; Roelfsema, Pieter R.

    2010-01-01

    The visual system groups image elements that belong to an object and segregates them from other objects and the background. Important cues for this grouping process are the Gestalt criteria, and most theories propose that these are applied in parallel across the visual scene. Here, we find that Gestalt grouping can indeed occur in parallel in some…

  10. A high-speed linear algebra library with automatic parallelism

    Science.gov (United States)

    Boucher, Michael L.

    1994-01-01

    Parallel or distributed processing is key to getting the highest performance from workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely limited, even though there are numerous computationally demanding programs that would significantly benefit from application of parallel processing. This paper describes DSSLIB, which is a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.

  11. Comparison of some parallelization strategies of thermalhydraulic codes on GPUs

    International Nuclear Information System (INIS)

    Jendoubi, T.; Bergeaud, V.; Geay, A.

    2013-01-01

    Modern supercomputer architecture is now often based on hybrid concepts combining distributed-memory parallelism, shared-memory parallelism and GPUs (Graphics Processing Units). In this work, we propose a new approach to take advantage of these graphics cards in thermohydraulics algorithms. (authors)

  12. Duality-based algorithms for scheduling on unrelated parallel machines

    NARCIS (Netherlands)

    van de Velde, S.L.

    1993-01-01

    We consider the following parallel machine scheduling problem. Each of n independent jobs has to be scheduled on one of m unrelated parallel machines. The processing of job $J_j$ on machine $M_i$ requires an uninterrupted period of positive length $p_{ij}$. The objective is to find an assignment of

  13. Parallelism Effects and Verb Activation: The Sustained Reactivation Hypothesis

    Science.gov (United States)

    Callahan, Sarah M.; Shapiro, Lewis P.; Love, Tracy

    2010-01-01

    This study investigated the processes underlying parallelism by evaluating the activation of a parallel element (i.e., a verb) throughout "and"-coordinated sentences. Four points were tested: (1) approximately 1,600ms after the verb in the first conjunct (PP1), (2) immediately following the conjunction (PP2), (3) approximately 1,100ms after the…

  14. PARALLEL IMPORT: REALITY FOR RUSSIA

    Directory of Open Access Journals (Sweden)

    Т. А. Сухопарова

    2014-01-01

    Parallel import is currently an urgent issue. Legalizing parallel import in Russia is expedient; this conclusion is based on an analysis of opposing expert opinions. At the same time, the negative consequences of this decision must be considered and measures applied to minimize them.

  15. Integrated Task And Data Parallel Programming: Language Design

    Science.gov (United States)

    Grimshaw, Andrew S.; West, Emily A.

    1998-01-01

    with Andrew Grimshaw and Adam Ferrari to write a book chapter which will be included in Parallel Processing in C++ edited by Gregory Wilson. I also finished two courses, Compilers and Advanced Compilers, in 1995. These courses complete my class requirements at the University of Virginia. I have only my dissertation research and defense to complete.

  16. The Galley Parallel File System

    Science.gov (United States)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

    Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/O requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.

  17. Parallelization of the FLAPW method

    International Nuclear Information System (INIS)

    Canning, A.; Mannstadt, W.; Freeman, A.J.

    1999-01-01

    The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about one hundred atoms due to a lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel computer.

  18. Parallelization of the FLAPW method

    Science.gov (United States)

    Canning, A.; Mannstadt, W.; Freeman, A. J.

    2000-08-01

    The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining structural, electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work, we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel supercomputer.

  19. Parallel Computing for Brain Simulation.

    Science.gov (United States)

    Pastur-Romay, L A; Porto-Pazos, A B; Cedron, F; Pazos, A

    2017-01-01

    The human brain is the most complex system in the known universe and is therefore one of the greatest mysteries. It provides human beings with extraordinary abilities. However, it is not yet understood how and why most of these abilities are produced. For decades, researchers have been trying to make computers reproduce these abilities, focusing both on understanding the nervous system and on processing data more efficiently than before. Their aim is to make computers process information similarly to the brain. Important technological developments and vast multidisciplinary projects have made it possible to create the first simulation with a number of neurons similar to that of a human brain. This paper presents an up-to-date review of the main research projects that are trying to simulate and/or emulate the human brain. They employ different types of computational models using parallel computing: digital models, analog models and hybrid models. This review includes the current applications of these works, as well as future trends. It is focused on various works that look for advanced progress in Neuroscience and still others which seek new discoveries in Computer Science (neuromorphic hardware, machine learning techniques). Their most outstanding characteristics are summarized and the latest advances and future plans are presented. In addition, this review points out the importance of considering not only neurons: computational models of the brain should also include glial cells, given the proven importance of astrocytes in information processing. Copyright © Bentham Science Publishers.

  20. Configuration affects parallel stent grafting results.

    Science.gov (United States)

    Tanious, Adam; Wooster, Mathew; Armstrong, Paul A; Zwiebel, Bruce; Grundy, Shane; Back, Martin R; Shames, Murray L

    2018-05-01

    A number of adjunctive "off-the-shelf" procedures have been described to treat complex aortic diseases. Our goal was to evaluate parallel stent graft configurations and to determine an optimal formula for these procedures. This is a retrospective review of all patients at a single medical center treated with parallel stent grafts from January 2010 to September 2015. Outcomes were evaluated on the basis of parallel graft orientation, type, and main body device. Primary end points included parallel stent graft compromise and overall endovascular aneurysm repair (EVAR) compromise. There were 78 patients treated with a total of 144 parallel stents for a variety of pathologic processes. There was a significant correlation between main body oversizing and snorkel compromise (P = .0195) and overall procedural complication (P = .0019) but not with endoleak rates. Patients were organized into the following oversizing groups for further analysis: 0% to 10%, 10% to 20%, and >20%. Those oversized into the 0% to 10% group had the highest rate of overall EVAR complication (73%; P = .0003). There were no significant correlations between any one particular configuration and overall procedural complication. There was also no significant correlation between total number of parallel stents employed and overall complication. Composite EVAR configuration had no significant correlation with individual snorkel compromise, endoleak, or overall EVAR or procedural complication. The configuration most prone to individual snorkel compromise and overall EVAR complication was a four-stent configuration with two stents in an antegrade position and two stents in a retrograde position (60% complication rate). The configuration most prone to endoleak was one or two stents in retrograde position (33% endoleak rate), followed by three stents in an all-antegrade position (25%). There was a significant correlation between individual stent configuration and stent compromise (P = .0385), with 31

  1. A parallel solution for high resolution histological image analysis.

    Science.gov (United States)

    Bueno, G; González, R; Déniz, O; García-Rojo, M; González-García, J; Fernández-Carrobles, M M; Vállez, N; Salido, J

    2012-10-01

    This paper describes a general methodology for developing parallel image processing algorithms based on message passing for high resolution images (on the order of several Gigabytes). These algorithms have been applied to histological images and must be executed on massively parallel processing architectures. Advances in new technologies for complete slide digitalization in pathology have been combined with developments in biomedical informatics. However, the efficient use of these digital slide systems is still a challenge. The image processing that these slides are subject to is still limited both in terms of data processed and processing methods. The work presented here focuses on the need to design and develop parallel image processing tools capable of obtaining and analyzing the entire gamut of information included in digital slides. Tools have been developed to assist pathologists in image analysis and diagnosis, and they cover low and high-level image processing methods applied to histological images. Code portability, reusability and scalability have been tested by using the following parallel computing architectures: distributed memory with massive parallel processors and two networks, INFINIBAND and Myrinet, composed of 17 and 1024 nodes respectively. The parallel framework proposed is a flexible, high-performance solution, and it shows that the efficient processing of digital microscopic images is possible and may offer important benefits to pathology laboratories. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  2. Parallel VLSI Architecture

    Science.gov (United States)

    Truong, T. K.; Reed, I.; Yeh, C.; Shao, H.

    1985-01-01

    The Fermat number transform convolves two digital data sequences. In very-large-scale integration (VLSI) applications, such as image and radar signal processing, X-ray reconstruction, and spectrum shaping, linear convolution of two digital data sequences of arbitrary lengths is accomplished using the Fermat number transform (FNT).
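
    A minimal Python sketch of the idea follows, under stated assumptions: the modulus is the Fermat prime 65537, the transform root is derived from the primitive root 3, and the transform is written as a direct O(n²) sum for readability. The names `transform` and `fnt_convolve` are hypothetical, and none of this reflects the VLSI architecture in the record. Zero-padding both inputs to a common power-of-two length is what turns the cyclic convolution of the transform into a linear convolution of sequences of arbitrary lengths.

```python
P = 65537  # Fermat prime F_4 = 2**16 + 1 (assumed modulus for this sketch)
G = 3      # 3 is a primitive root modulo 65537


def transform(seq, root):
    """Direct number-theoretic transform of seq modulo P."""
    n = len(seq)
    return [sum(x * pow(root, j * k, P) for j, x in enumerate(seq)) % P
            for k in range(n)]


def fnt_convolve(a, b):
    """Linear convolution of two non-empty integer sequences.

    Exact only while every true output value lies in the range 0..P-1.
    """
    out_len = len(a) + len(b) - 1
    n = 1
    while n < out_len:                      # pad to a power of two
        n *= 2
    root = pow(G, (P - 1) // n, P)          # element of multiplicative order n
    fa = transform(list(a) + [0] * (n - len(a)), root)
    fb = transform(list(b) + [0] * (n - len(b)), root)
    product = [x * y % P for x, y in zip(fa, fb)]
    inverse_root = pow(root, P - 2, P)      # modular inverses via Fermat's little theorem
    inverse_n = pow(n, P - 2, P)
    result = transform(product, inverse_root)
    return [v * inverse_n % P for v in result][:out_len]
```

    All arithmetic is on integers modulo P, so the result carries no rounding error, unlike a floating-point FFT.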

  3. Frontiers of massively parallel scientific computation

    International Nuclear Information System (INIS)

    Fischer, J.R.

    1987-07-01

    Practical applications using massively parallel computer hardware first appeared during the 1980s. Their development was motivated by the need for computing power orders of magnitude beyond that available today for tasks such as numerical simulation of complex physical and biological processes, generation of interactive visual displays, satellite image analysis, and knowledge based systems. Representative of the first generation of this new class of computers is the Massively Parallel Processor (MPP). A team of scientists was provided the opportunity to test and implement their algorithms on the MPP. The first results are presented. The research spans a broad variety of applications including Earth sciences, physics, signal and image processing, computer science, and graphics. The performance of the MPP was very good. Results obtained using the Connection Machine and the Distributed Array Processor (DAP) are presented.

  4. Multi-petascale highly efficient parallel supercomputer

    Science.gov (United States)

    Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen-Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng

    2018-05-15

    A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaflop-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes are interconnected by a five dimensional torus network that optimally maximizes the throughput of packet communications between nodes and minimizes latency. The network implements a collective network and a global asynchronous network that provides global barrier and notification functions. Integrated in the node design is a list-based prefetcher. The memory system implements transaction memory, thread level speculation, and a multiversioning cache that improves soft error rate at the same time and supports DMA functionality allowing for parallel processing message-passing.

  5. Is Monte Carlo embarrassingly parallel?

    Energy Technology Data Exchange (ETDEWEB)

    Hoogenboom, J. E. [Delft Univ. of Technology, Mekelweg 15, 2629 JB Delft (Netherlands); Delft Nuclear Consultancy, IJsselzoom 2, 2902 LB Capelle aan den IJssel (Netherlands)

    2012-07-01

    Monte Carlo is often stated as being embarrassingly parallel. However, running a Monte Carlo calculation, especially a reactor criticality calculation, in parallel using tens of processors shows a serious limitation in speedup and the execution time may even increase beyond a certain number of processors. In this paper the main causes of the loss of efficiency when using many processors are analyzed using a simple Monte Carlo program for criticality. The basic mechanism for parallel execution is MPI. One of the bottlenecks turns out to be the rendez-vous points in the parallel calculation used for synchronization and exchange of data between processors. This happens at least at the end of each cycle for fission source generation in order to collect the full fission source distribution for the next cycle and to estimate the effective multiplication factor, which is not only part of the requested results, but also input to the next cycle for population control. Basic improvements to overcome this limitation are suggested and tested. Also other time losses in the parallel calculation are identified. Moreover, the threading mechanism, which allows the parallel execution of tasks based on shared memory using OpenMP, is analyzed in detail. Recommendations are given to get the maximum efficiency out of a parallel Monte Carlo calculation. (authors)

  6. Is Monte Carlo embarrassingly parallel?

    International Nuclear Information System (INIS)

    Hoogenboom, J. E.

    2012-01-01

    Monte Carlo is often stated as being embarrassingly parallel. However, running a Monte Carlo calculation, especially a reactor criticality calculation, in parallel using tens of processors shows a serious limitation in speedup and the execution time may even increase beyond a certain number of processors. In this paper the main causes of the loss of efficiency when using many processors are analyzed using a simple Monte Carlo program for criticality. The basic mechanism for parallel execution is MPI. One of the bottlenecks turns out to be the rendez-vous points in the parallel calculation used for synchronization and exchange of data between processors. This happens at least at the end of each cycle for fission source generation in order to collect the full fission source distribution for the next cycle and to estimate the effective multiplication factor, which is not only part of the requested results, but also input to the next cycle for population control. Basic improvements to overcome this limitation are suggested and tested. Also other time losses in the parallel calculation are identified. Moreover, the threading mechanism, which allows the parallel execution of tasks based on shared memory using OpenMP, is analyzed in detail. Recommendations are given to get the maximum efficiency out of a parallel Monte Carlo calculation. (authors)

  7. Parallel integer sorting with medium and fine-scale parallelism

    Science.gov (United States)

    Dagum, Leonardo

    1993-01-01

    Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort is designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128 processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.

  8. Template based parallel checkpointing in a massively parallel computer system

    Science.gov (United States)

    Archer, Charles Jens [Rochester, MN; Inglett, Todd Alan [Rochester, MN

    2009-01-13

    A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
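
    The block-and-checksum comparison can be sketched in Python as follows. This is a simplified illustration, not the patented method: it compares blocks at fixed offsets rather than using rsync-style rolling checksums, it omits the network broadcast, and the names `make_delta` and `restore` and the tiny block size are assumptions chosen for readability. Only blocks whose checksum differs from the template are stored, and those are compressed losslessly.

```python
import hashlib
import zlib

BLOCK = 4  # deliberately tiny block size, for illustration only


def blocks(data, size=BLOCK):
    """Split a bytes object into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def checksum(block):
    return hashlib.sha256(block).hexdigest()


def make_delta(node_data, template):
    """Keep only the blocks of node_data that differ from the template."""
    template_sums = [checksum(b) for b in blocks(template)]
    changed = {}
    for index, block in enumerate(blocks(node_data)):
        if index >= len(template_sums) or checksum(block) != template_sums[index]:
            # Store the block's checksum together with its compressed contents.
            changed[index] = (checksum(block), zlib.compress(block))
    return {"length": len(node_data), "blocks": changed}


def restore(delta, template):
    """Rebuild the node's checkpoint from the template plus the stored blocks."""
    template_blocks = blocks(template)
    count = -(-delta["length"] // BLOCK)  # ceiling division
    parts = []
    for index in range(count):
        if index in delta["blocks"]:
            parts.append(zlib.decompress(delta["blocks"][index][1]))
        else:
            parts.append(template_blocks[index])
    return b"".join(parts)[:delta["length"]]
```

    When most nodes hold state close to the template, each delta contains few blocks, which is the source of the reduction in transmitted and stored data that the abstract describes.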

  9. A Soft Parallel Kinematic Mechanism.

    Science.gov (United States)

    White, Edward L; Case, Jennifer C; Kramer-Bottiglio, Rebecca

    2018-02-01

    In this article, we describe a novel holonomic soft robotic structure based on a parallel kinematic mechanism. The design is based on the Stewart platform, which uses six sensors and actuators to achieve full six-degree-of-freedom motion. Our design is much less complex than a traditional platform, since it replaces the 12 spherical and universal joints found in a traditional Stewart platform with a single highly deformable elastomer body and flexible actuators. This reduces the total number of parts in the system and simplifies the assembly process. Actuation is achieved through coiled shape-memory alloy actuators. State observation and feedback are accomplished through the use of capacitive elastomer strain gauges. The main structural element is an elastomer joint that provides antagonistic force. We report the response of the actuators and sensors individually, then report the response of the complete assembly. We show that the completed robotic system is able to achieve full position control, and we discuss the limitations associated with using responsive material actuators. We believe that control demonstrated on a single body in this work could be extended to chains of such bodies to create complex soft robots.

  10. Explorations of the implementation of a parallel IDW interpolation algorithm in a Linux cluster-based parallel GIS

    Science.gov (United States)

    Huang, Fang; Liu, Dingsheng; Tan, Xicheng; Wang, Jian; Chen, Yunping; He, Binbin

    2011-04-01

    To design and implement an open-source parallel GIS (OP-GIS) based on a Linux cluster, the parallel inverse distance weighting (IDW) interpolation algorithm has been chosen as an example to explore the working model and the principle of algorithm parallel pattern (APP), one of the parallelization patterns for OP-GIS. Based on an analysis of the serial IDW interpolation algorithm of GRASS GIS, this paper proposes and designs a specific parallel IDW interpolation algorithm, incorporating both single program, multiple data (SPMD) and master/slave (M/S) programming modes. The main steps of the parallel IDW interpolation algorithm are: (1) the master node packages the related information, and then broadcasts it to the slave nodes; (2) each node calculates its assigned data extent along one row using the serial algorithm; (3) the master node gathers the data from all nodes; and (4) iterations continue until all rows have been processed, after which the results are output. According to the experiments performed in the course of this work, the parallel IDW interpolation algorithm can attain an efficiency greater than 0.93 compared with similar algorithms, which indicates that the parallel algorithm can greatly reduce processing time and maximize speed and performance.
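    The four steps listed in this abstract can be sketched roughly as follows. This is a structural illustration only, not the OP-GIS or GRASS GIS code: a thread pool stands in for the cluster nodes (the paper targets a Linux cluster, and Python threads give no real speedup for CPU-bound work), the grid cells are assumed to sit at integer coordinates, and the function names and the power parameter of 2 are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from math import hypot


def idw_value(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (sx, sy, value) samples."""
    num = den = 0.0
    for sx, sy, value in samples:
        d = hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # the cell coincides with a sample point
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def idw_row(row, ncols, samples):
    """Step 2: one worker interpolates every cell of one row with the serial algorithm."""
    return [idw_value(float(col), float(row), samples) for col in range(ncols)]


def parallel_idw(nrows, ncols, samples, workers=4):
    """Steps 1, 3 and 4: share the samples, hand out rows, gather results in row order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: idw_row(r, ncols, samples), range(nrows)))


samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
grid = parallel_idw(nrows=2, ncols=3, samples=samples)
```

    Because each row depends only on the shared sample set, rows can be computed independently, which is what makes the row-wise split a natural fit for the master/slave mode.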

  11. Parallel Monte Carlo simulation of aerosol dynamics

    KAUST Repository

    Zhou, K.

    2014-01-01

    A highly efficient Monte Carlo (MC) algorithm is developed for the numerical simulation of aerosol dynamics, that is, nucleation, surface growth, and coagulation. Nucleation and surface growth are handled with deterministic means, while coagulation is simulated with a stochastic method (Marcus-Lushnikov stochastic process). Operator splitting techniques are used to synthesize the deterministic and stochastic parts in the algorithm. The algorithm is parallelized using the Message Passing Interface (MPI). The parallel computing efficiency is investigated through numerical examples. Nearly 60% parallel efficiency is achieved for the largest test case, with 3.7 million MC particles running on 93 parallel computing nodes. The algorithm is verified by simulating various test cases and comparing the simulation results with available analytical and/or other numerical solutions. Generally, it is found that only a small number (hundreds or thousands) of MC particles is necessary to accurately predict the aerosol particle number density, volume fraction, and so forth, that is, low order moments of the Particle Size Distribution (PSD) function. Accurately predicting the high order moments of the PSD requires dramatically increasing the number of MC particles. © 2014 Kun Zhou et al.
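    As a rough reading of the efficiency figure, assume the abstract uses the standard definition of parallel efficiency and that the number of processes $p$ equals the 93 nodes (neither assumption is confirmed by the abstract). With $T_1$ the serial run time and $T_p$ the run time on $p$ processes:

```latex
E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}
\qquad\Longrightarrow\qquad
S_p = E_p \, p \approx 0.60 \times 93 \approx 56
```

    Under those assumptions, the reported efficiency corresponds to a speedup of roughly 56 over a single node.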

  12. Performance studies of the parallel VIM code

    International Nuclear Information System (INIS)

    Shi, B.; Blomquist, R.N.

    1996-01-01

    In this paper, the authors evaluate the performance of the parallel version of the VIM Monte Carlo code on the IBM SPx at the High Performance Computing Research Facility at ANL. Three test problems with contrasting computational characteristics were used to assess effects in performance. A statistical method for estimating the inefficiencies due to load imbalance and communication is also introduced. VIM is a large scale continuous energy Monte Carlo radiation transport program and was parallelized using history partitioning, the master/worker approach, and the p4 message passing library. Dynamic load balancing is accomplished when the master processor assigns chunks of histories to workers that have completed a previously assigned task, accommodating variations in the lengths of histories, processor speeds, and worker loads. At the end of each batch (generation), the fission sites and tallies are sent from each worker to the master process, contributing to the parallel inefficiency. All communications are between master and workers, and are serial. The SPx is a scalable 128-node parallel supercomputer with high-performance Omega switches of 63 microsec latency and 35 MBytes/sec bandwidth. For uniform and reproducible performance, they used only the 120 identical regular processors (IBM RS/6000) and excluded the remaining eight planet nodes, which may be loaded by others' jobs.
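    The dynamic load balancing described in this abstract can be sketched as a work queue. This is not the VIM or p4 code: a shared in-process queue plays the role of the master, threads play the workers, and `simulate_history` is a hypothetical stand-in for tracking one particle history of variable length.

```python
import queue
import random
import threading


def simulate_history(seed: int) -> float:
    """Hypothetical stand-in for one particle history; returns a tally score."""
    rng = random.Random(seed)
    steps = rng.randint(1, 20)  # histories vary in length
    return sum(rng.random() for _ in range(steps))


def run_batch(n_histories: int, chunk_size: int, n_workers: int):
    """Run one batch; return (combined tally, histories completed per worker)."""
    chunks = queue.Queue()
    for start in range(0, n_histories, chunk_size):
        chunks.put(range(start, min(start + chunk_size, n_histories)))

    tallies = [0.0] * n_workers
    counts = [0] * n_workers

    def worker(rank: int) -> None:
        while True:
            try:
                chunk = chunks.get_nowait()  # an idle worker takes the next chunk
            except queue.Empty:
                return
            for seed in chunk:
                tallies[rank] += simulate_history(seed)
                counts[rank] += 1

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # End of batch: the per-worker tallies are combined on the master side.
    return sum(tallies), counts


total, counts = run_batch(n_histories=100, chunk_size=8, n_workers=4)
```

    Because chunks are handed out on demand rather than split evenly up front, a worker that draws short histories or runs on a faster processor simply processes more chunks. The per-worker counts may therefore differ from run to run while the combined tally stays the same.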

  13. Parallel generation of architecture on the GPU

    KAUST Repository

    Steinberger, Markus

    2014-05-01

    In this paper, we present a novel approach for the parallel evaluation of procedural shape grammars on the graphics processing unit (GPU). Unlike previous approaches that are either limited in the kind of shapes they allow, the amount of parallelism they can take advantage of, or both, our method supports state of the art procedural modeling including stochasticity and context-sensitivity. To increase parallelism, we explicitly express independence in the grammar, reduce inter-rule dependencies required for context-sensitive evaluation, and introduce intra-rule parallelism. Our rule scheduling scheme avoids unnecessary back and forth between CPU and GPU and reduces round trips to slow global memory by dynamically grouping rules in on-chip shared memory. Our GPU shape grammar implementation is multiple orders of magnitude faster than the standard in CPU-based rule evaluation, while offering equal expressive power. In comparison to the state of the art in GPU shape grammar derivation, our approach is nearly 50 times faster, while adding support for geometric context-sensitivity. © 2014 The Author(s) Computer Graphics Forum © 2014 The Eurographics Association and John Wiley & Sons Ltd. Published by John Wiley & Sons Ltd.

  14. The 2nd Symposium on the Frontiers of Massively Parallel Computations

    Science.gov (United States)

    Mills, Ronnie (Editor)

    1988-01-01

    Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.

  15. Asynchronous Parallelization of a CFD Solver

    OpenAIRE

    Abdi, Daniel S.; Bitsuamlak, Girma T.

    2015-01-01

    The article of record as published may be found at http://dx.doi.org/10.1155/2015/295393. A Navier-Stokes equations solver is parallelized to run on a cluster of computers using the domain decomposition method. Two approaches of communication and computation are investigated, namely, synchronous and asynchronous methods. Asynchronous communication between subdomains is not commonly used in CFD codes; however, it has a potential to alleviate scaling bottlenecks incurred due to process...

  16. A Topological Model for Parallel Algorithm Design

    Science.gov (United States)

    1991-09-01

    effort should be directed to planning, requirements analysis, specification and design, with 20% invested into the actual coding, and then the final 40...be one more language to learn. And by investing the effort into improving the utility of an existing language instead of creating a new one, this...193) it abandons the notion of a process as a fundamental concept of parallel program design and that it facilitates program derivation by rigorously

  17. Parallel education: what is it?

    OpenAIRE

    Amos, Michelle Peta

    2017-01-01

    In the history of education it has long been discussed that single-sex and coeducation are the two models of education present in schools. With the introduction of parallel schools over the last 15 years, there has been very little research into this 'new model'. Many people do not understand what it means for a school to be parallel or they confuse a parallel model with co-education, due to the presence of both boys and girls within the one institution. Therefore, the main obj...

  18. Balanced, parallel operation of flashlamps

    International Nuclear Information System (INIS)

    Carder, B.M.; Merritt, B.T.

    1979-01-01

    A new energy store, the Compensated Pulsed Alternator (CPA), promises to be a cost effective substitute for capacitors to drive flashlamps that pump large Nd:glass lasers. Because the CPA is large and discrete, it will be necessary that it drive many parallel flashlamp circuits, presenting a problem in equal current distribution. Current division to ±20% between parallel flashlamps has been achieved, but this is marginal for laser pumping. A method is presented here that provides equal current sharing to about 1%, and it includes fused protection against short circuit faults. The method was tested with eight parallel circuits, including both open-circuit and short-circuit fault tests.

  19. Parallel Ada benchmarks for the SVMS

    Science.gov (United States)

    Collard, Philippe E.

    1990-01-01

    The use of the parallel processing paradigm to design and develop faster and more reliable computers appears to clearly mark the future of information processing. NASA started the development of such an architecture: the Spaceborne VHSIC Multi-processor System (SVMS). Ada will be one of the languages used to program the SVMS. One of the unique characteristics of Ada is that it supports parallel processing at the language level through the tasking constructs. It is important for the SVMS project team to assess how efficiently the SVMS architecture will be implemented, as well as how efficiently the Ada environment will be ported to the SVMS. AUTOCLASS II, a Bayesian classifier written in Common Lisp, was selected as one of the benchmarks for SVMS configurations. The purpose of the R and D effort was to provide the SVMS project team with a version of AUTOCLASS II, written in Ada, that would make use of Ada tasking constructs as much as possible so as to constitute a suitable benchmark. Additionally, a set of programs was developed that would measure Ada tasking efficiency on parallel architectures as well as determine the critical parameters influencing tasking efficiency. All this was designed to provide the SVMS project team with a set of suitable tools in the development of the SVMS architecture.

  20. Intelligent spatial ecosystem modeling using parallel processors

    International Nuclear Information System (INIS)

    Maxwell, T.; Costanza, R.

    1993-01-01

    Spatial modeling of ecosystems is essential if one's modeling goals include developing a relatively realistic description of past behavior and predictions of the impacts of alternative management policies on future ecosystem behavior. Development of these models has been limited in the past by the large amount of input data required and the difficulty of even large mainframe serial computers in dealing with large spatial arrays. These two limitations have begun to erode with the increasing availability of remote sensing data and GIS systems to manipulate it, and the development of parallel computer systems which allow computation of large, complex, spatial arrays. Although many forms of dynamic spatial modeling are highly amenable to parallel processing, the primary focus in this project is on process-based landscape models. These models simulate spatial structure by first compartmentalizing the landscape into some geometric design and then describing flows within compartments and spatial processes between compartments according to location-specific algorithms. The authors are currently building and running parallel spatial models at the regional scale for the Patuxent River region in Maryland, the Everglades in Florida, and Barataria Basin in Louisiana. The authors are also planning a project to construct a series of spatially explicit linked ecological and economic simulation models aimed at assessing the long-term potential impacts of global climate change.