parallel processing perspective: Topics by WorldWideScience.org

Sample records for parallel processing perspective

Massively Parallel Computing: A Sandia Perspective

Energy Technology Data Exchange (ETDEWEB)

Dosanjh, Sudip S.; Greenberg, David S.; Hendrickson, Bruce; Heroux, Michael A.; Plimpton, Steve J.; Tomkins, James L.; Womble, David E.

1999-05-06

The computing power available to scientists and engineers has increased dramatically in the past decade, due in part to progress in making massively parallel computing practical and available. The expectation for these machines has been great. The reality is that progress has been slower than expected. Nevertheless, massively parallel computing is beginning to realize its potential for enabling significant breakthroughs in science and engineering. This paper provides a perspective on the state of the field, colored by the authors' experiences using large-scale parallel machines at Sandia National Laboratories. We address trends in hardware, system software and algorithms, and we also offer our view of the forces shaping the parallel computing industry.
Oxytocin: parallel processing in the social brain?

Science.gov (United States)

Dölen, Gül

2015-06-01

Early studies attempting to disentangle the network complexity of the brain exploited the accessibility of sensory receptive fields to reveal circuits made up of synapses connected both in series and in parallel. More recently, extension of this organisational principle beyond the sensory systems has been made possible by the advent of modern molecular, viral and optogenetic approaches. Here, evidence supporting parallel processing of social behaviours mediated by oxytocin is reviewed. Understanding oxytocinergic signalling from this perspective has significant implications for the design of oxytocin-based therapeutic interventions aimed at disorders such as autism, where disrupted social function is a core clinical feature. Moreover, identification of opportunities for novel technology development will require a better appreciation of the complexity of the circuit-level organisation of the social brain. © 2015 The Authors. Journal of Neuroendocrinology published by John Wiley & Sons Ltd on behalf of British Society for Neuroendocrinology.
Parallelism and array processing

International Nuclear Information System (INIS)

Zacharov, V.

1983-01-01

Modern computing, as well as the historical development of computing, has been dominated by sequential monoprocessing. Yet there is the alternative of parallelism, where several processes may be in concurrent execution. This alternative is discussed in a series of lectures, in which the main developments involving parallelism are considered, both from the standpoint of computing systems and that of applications that can exploit such systems. The lectures seek to discuss parallelism in a historical context, and to identify all the main aspects of concurrency in computation right up to the present time. Included will be consideration of the important question as to what use parallelism might be in the field of data processing. (orig.)
Parallel processing for fluid dynamics applications

International Nuclear Information System (INIS)

Johnson, G.M.

1989-01-01

The impact of parallel processing on computational science and, in particular, on computational fluid dynamics is growing rapidly. In this paper, particular emphasis is given to developments which have occurred within the past two years. Parallel processing is defined and the reasons for its importance in high-performance computing are reviewed. Parallel computer architectures are classified according to the number and power of their processing units, their memory, and the nature of their connection scheme. Architectures which show promise for fluid dynamics applications are emphasized. Fluid dynamics problems are examined for parallelism inherent at the physical level. CFD algorithms and their mappings onto parallel architectures are discussed. Several examples are presented to document the performance of fluid dynamics applications on present-generation parallel processing devices.
Neural Parallel Engine: A toolbox for massively parallel neural signal processing.

Science.gov (United States)

Tam, Wing-Kin; Yang, Zhi

2018-05-01

Large-scale neural recordings provide detailed information on neuronal activities and can help elicit the underlying neural mechanisms of the brain. However, the computational burden is also formidable when we try to process the huge data stream generated by such recordings. In this study, we report the development of Neural Parallel Engine (NPE), a toolbox for massively parallel neural signal processing on graphical processing units (GPUs). It offers a selection of the most commonly used routines in neural signal processing such as spike detection and spike sorting, including advanced algorithms such as exponential-component-power-component (EC-PC) spike detection and binary pursuit spike sorting. We also propose a new method for detecting peaks in parallel through a parallel compact operation. Our toolbox is able to offer a 5× to 110× speedup compared with its CPU counterparts depending on the algorithms. A user-friendly MATLAB interface is provided to allow easy integration of the toolbox into existing workflows. Previous efforts on GPU neural signal processing only focus on a few rudimentary algorithms, are not well-optimized and often do not provide a user-friendly programming interface to fit into existing workflows. There is a strong need for a comprehensive toolbox for massively parallel neural signal processing. A new toolbox for massively parallel neural signal processing has been created. It can offer significant speedup in processing signals from large-scale recordings up to thousands of channels. Copyright © 2018 Elsevier B.V. All rights reserved.
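The abstract above mentions detecting peaks in parallel through a parallel compact operation but gives no details. The following is a minimal sketch of the generic flag, scan and scatter pattern usually meant by stream compaction, written serially in Python for readability. The function name, the peak criterion and the threshold parameter are illustrative assumptions, not the NPE implementation; on a GPU each of the three steps would run as a data-parallel kernel.

```python
from itertools import accumulate


def find_peaks_by_compaction(signal, threshold):
    """Return indices of local maxima above threshold via flag/scan/scatter.

    Hypothetical helper illustrating generic stream compaction; it is not
    taken from the NPE toolbox.
    """
    n = len(signal)
    # Step 1 (map): flag each interior sample that is a peak. Every sample
    # can be tested independently of the others.
    flags = [0] * n
    for i in range(1, n - 1):
        if signal[i] > threshold and signal[i - 1] < signal[i] >= signal[i + 1]:
            flags[i] = 1
    # Step 2 (scan): an exclusive prefix sum assigns each peak its output slot.
    inclusive = list(accumulate(flags))
    slots = [s - f for s, f in zip(inclusive, flags)]
    # Step 3 (scatter): each flagged sample writes its index to its own slot,
    # so no two writes collide.
    total = inclusive[-1] if n else 0
    out = [None] * total
    for i in range(n):
        if flags[i]:
            out[slots[i]] = i
    return out


peaks = find_peaks_by_compaction([0, 1, 5, 1, 0, 7, 7, 2, 9, 0], threshold=3)
```

The prefix sum is what makes the output dense without any coordination between samples: each peak learns its position in the result from the count of peaks before it.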
Parallel Framework for Cooperative Processes

Directory of Open Access Journals (Sweden)

Mitică Craus

2005-01-01

This paper describes an object-oriented framework designed for the parallelization of a set of related algorithms. The idea behind the system is to have a reusable framework for running several sequential algorithms in a parallel environment. The algorithms that the framework can be used with have several things in common: they have to run in cycles, and it must be possible to split the work between several "processing units". The parallel framework uses the message-passing communication paradigm and is organized as a master-slave system. Two applications are presented: an Ant Colony Optimization (ACO) parallel algorithm for the Travelling Salesman Problem (TSP) and an Image Processing (IP) parallel algorithm for the Symmetrical Neighborhood Filter (SNF). The implementations of these applications by means of the parallel framework show good performance: approximately linear speedup and low communication cost.
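The cyclic master-slave organization described above can be sketched as follows. This is a toy illustration only: Python threads and queues stand in for the message-passing layer, and the per-chunk work (a sum of squares) and the state update between cycles are placeholder assumptions, not the ACO or SNF algorithms of the paper. All names are hypothetical.

```python
import queue
import threading


def worker(tasks, results):
    """Worker loop: receive (task_id, chunk) messages until a None sentinel."""
    while True:
        msg = tasks.get()
        if msg is None:
            break
        task_id, chunk = msg
        # Placeholder for one processing unit's share of the cycle's work.
        results.put((task_id, sum(x * x for x in chunk)))


def run_master(data, n_workers, n_cycles):
    """Master: in each cycle, split the data, send chunks, gather results."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    totals = []
    for _ in range(n_cycles):
        chunks = [data[i::n_workers] for i in range(n_workers)]
        for task_id, chunk in enumerate(chunks):
            tasks.put((task_id, chunk))
        # Block until every chunk of this cycle has been answered.
        partial = [results.get() for _ in chunks]
        totals.append(sum(value for _, value in partial))
        data = [x + 1 for x in data]  # toy state update between cycles
    for _ in threads:
        tasks.put(None)  # one shutdown message per worker
    for t in threads:
        t.join()
    return totals


totals = run_master([1, 2, 3, 4], n_workers=2, n_cycles=2)
```

The gather step at the end of each cycle acts as a barrier, which matches the requirement that the algorithms run in cycles: no worker starts cycle k+1 before the master has merged cycle k.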
Parallel processing for artificial intelligence 1

CERN Document Server

Kanal, LN; Kumar, V; Suttner, CB

1994-01-01

Parallel processing for AI problems is of great current interest because of its potential for alleviating the computational demands of AI procedures. The articles in this book consider parallel processing for problems in several areas of artificial intelligence: image processing, knowledge representation in semantic networks, production rules, mechanization of logic, constraint satisfaction, parsing of natural language, data filtering and data mining. The publication is divided into six sections. The first addresses parallel computing for processing and understanding images. The second discus
Z-buffer image assembly processing in high parallel visualization processing

International Nuclear Information System (INIS)

Kaneko, Isamu; Muramatsu, Kazuhiro

2000-03-01

On parallel computers with many processors, domain decomposition is a popular means of parallel processing. As simulations grow larger and take more time, visualization performed simultaneously with the computation is increasingly needed, and for real-time visualization in particular the domain decomposition technique is indispensable. In parallel rendering, the rendered results must be gathered on one processor in the last stage to compose the integrated picture. This integration is usually carried out using Z-buffer values. This process, however, causes serious problems of much slower processing and local memory shortage when more than several tens of processors are used. In this report, two new solutions are proposed. One is the adoption of a special operator (Reduce operator) in the parallelization process, and the other is buffer compression by deleting the background information. This report includes performance results for these new techniques, measured on the Paragon parallel computer, to investigate their effect. (author)
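The two ideas in this abstract, Z-buffer merging expressed as a reduction and compression by dropping background pixels, can be illustrated with a small sketch. It assumes each partial image is a flat list of (depth, color) pairs and that background pixels carry infinite depth; these representations and all function names are assumptions made for illustration, not the data layout used in the report.

```python
from functools import reduce

BACKGROUND = (float("inf"), None)  # assumed (depth, color) of an empty pixel


def z_merge(image_a, image_b):
    """Per pixel, keep the fragment nearer to the viewer (smaller depth)."""
    return [a if a[0] <= b[0] else b for a, b in zip(image_a, image_b)]


def composite(partial_images):
    """Fold all partial images with z_merge, as a reduction would."""
    return reduce(z_merge, partial_images)


def compress(image):
    """Keep only non-background pixels as (pixel_index, pixel) pairs."""
    return [(i, px) for i, px in enumerate(image) if px[0] != float("inf")]


def decompress(sparse, n_pixels):
    """Rebuild a full image, filling missing pixels with the background."""
    image = [BACKGROUND] * n_pixels
    for i, px in sparse:
        image[i] = px
    return image


img0 = [(1.0, "r"), BACKGROUND, (5.0, "r"), BACKGROUND]
img1 = [(2.0, "g"), (3.0, "g"), BACKGROUND, BACKGROUND]
img2 = [BACKGROUND, (1.5, "b"), (4.0, "b"), BACKGROUND]
final = composite([img0, img1, img2])
```

Because the visible pixel at each position is the one with the minimum depth regardless of grouping, the merge can be applied pairwise in a tree across processors rather than by gathering every buffer on one node.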
Researching the Parallel Process in Supervision and Psychotherapy

DEFF Research Database (Denmark)

Jacobsen, Claus Haugaard

Reflects upon how to do process research in supervision and in the parallel process. A single case study is presented illustrating how a study on parallel process can be carried out.
Advanced parallel processing with supercomputer architectures

International Nuclear Information System (INIS)

Hwang, K.

1987-01-01

This paper investigates advanced parallel processing techniques and innovative hardware/software architectures that can be applied to boost the performance of supercomputers. Critical issues on architectural choices, parallel languages, compiling techniques, resource management, concurrency control, programming environment, parallel algorithms, and performance enhancement methods are examined and the best answers are presented. The authors cover advanced processing techniques suitable for supercomputers, high-end mainframes, minisupers, and array processors. The coverage emphasizes vectorization, multitasking, multiprocessing, and distributed computing. In order to achieve these operation modes, parallel languages, smart compilers, synchronization mechanisms, load balancing methods, mapping parallel algorithms, operating system functions, application library, and multidiscipline interactions are investigated to ensure high performance. At the end, they assess the potentials of optical and neural technologies for developing future supercomputers
Parallelism and Scalability in an Image Processing Application

DEFF Research Database (Denmark)

Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

2008-01-01

The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chip. This means that parallel processing is required in application areas that traditionally have not used parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately...
Parallelism and Scalability in an Image Processing Application

DEFF Research Database (Denmark)

Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

2009-01-01

The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chips. This means that parallel processing is required in application areas that traditionally have not used parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately...
Reconstruction for Time-Domain In Vivo EPR 3D Multigradient Oximetric Imaging—A Parallel Processing Perspective

Directory of Open Access Journals (Sweden)

Christopher D. Dharmaraj

2009-01-01

Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. It is also possible with fast imaging to track the changes in tissue oxygenation in response to the oxygen content in the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment involving digital band pass filtering and background noise subtraction, followed by 3D Fourier reconstruction. This process is rather slow in a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared memory processors, to reduce the computational burden of the filtration task significantly. The results show that the parallel code for filtration has achieved a speed up factor of 46.66 as against the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved during the reconstruction process and oximetry computation, for a data set with 23×23×23 gradient steps. The execution time has been computed for both the serial and parallel implementations using different dimensions of the data and presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through local internet (NIHnet). The experimental results demonstrate that the parallel computing provides a source of high computational power to obtain biophysical parameters from 3D EPR oximetric imaging, almost in real-time.
Reconstruction for time-domain in vivo EPR 3D multigradient oximetric imaging--a parallel processing perspective.

Science.gov (United States)

Dharmaraj, Christopher D; Thadikonda, Kishan; Fletcher, Anthony R; Doan, Phuc N; Devasahayam, Nallathamby; Matsumoto, Shingo; Johnson, Calvin A; Cook, John A; Mitchell, James B; Subramanian, Sankaran; Krishna, Murali C

2009-01-01

Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. It is also possible with fast imaging to track the changes in tissue oxygenation in response to the oxygen content in the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment involving digital band pass filtering and background noise subtraction, followed by 3D Fourier reconstruction. This process is rather slow in a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared memory processors, to reduce the computational burden of the filtration task significantly. The results show that the parallel code for filtration has achieved a speed up factor of 46.66 as against the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved during the reconstruction process and oximetry computation, for a data set with 23 x 23 x 23 gradient steps. The execution time has been computed for both the serial and parallel implementations using different dimensions of the data and presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through local internet (NIHnet). The experimental results demonstrate that the parallel computing provides a source of high computational power to obtain biophysical parameters from 3D EPR oximetric imaging, almost in real-time.
A qualitative single case study of parallel processes

DEFF Research Database (Denmark)

Jacobsen, Claus Haugaard

2007-01-01

Parallel process in psychotherapy and supervision is a phenomenon manifest in relationships and interactions, that originates in one setting and is reflected in another. This article presents an explorative single case study of parallel processes based on qualitative analyses of two successive...... randomly chosen psychotherapy sessions with a schizophrenic patient and the supervision session given in between. The author's analysis is verified by an independent examiner's analysis. Parallel processes are identified and described. Reflections on the dynamics of parallel processes and supervisory...
Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

Science.gov (United States)

Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

2014-08-12

Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Parallel processing of two-dimensional Sn transport calculations

International Nuclear Information System (INIS)

Uematsu, M.

1997-01-01

A parallel processing method for the two-dimensional Sn transport code DOT3.5 has been developed to achieve a drastic reduction in computation time. In the proposed method, parallelization is achieved with angular domain decomposition and/or space domain decomposition. The calculational speed of parallel processing by angular domain decomposition is largely influenced by frequent communications between processing elements. To assess parallelization efficiency, sample problems with up to 32 x 32 spatial meshes were solved with a Sun workstation using the PVM message-passing library. As a result, parallel calculation using 16 processing elements, for example, was found to be nine times as fast as that with one processing element. As for parallel processing by geometry segmentation, the influence of processing element communications on computation time is small; however, discontinuity at the segment boundary degrades convergence speed. To accelerate the convergence, an alternate sweep of angular flux in conjunction with space domain decomposition and a two-step rescaling method consisting of segmentwise rescaling and ordinary pointwise rescaling have been developed. By applying the developed method, the number of iterations needed to obtain a converged flux solution was reduced by a factor of 2. As a result, parallel calculation using 16 processing elements was found to be 5.98 times as fast as the original DOT3.5 calculation.
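The two speedup figures quoted above can be turned into parallel efficiencies using the standard definition, efficiency = speedup / number of processing elements. The short calculation below is a sketch under that definition; the function name is hypothetical, and reading the ninefold figure as the angular decomposition result and the 5.98 figure as the space decomposition result is an interpretation of the abstract. Note that the second figure is measured against the original DOT3.5 run rather than a one-element parallel run, so the two efficiencies are not strictly comparable.

```python
def parallel_efficiency(speedup, n_processors):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / n_processors


# Speedup values as quoted in the abstract, both with 16 processing elements.
angular_efficiency = parallel_efficiency(9.0, 16)
spatial_efficiency = parallel_efficiency(5.98, 16)
```

Under these assumptions the efficiencies come out at roughly 56% and 37% respectively.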
Parallel processing of structural integrity analysis codes

International Nuclear Information System (INIS)

Swami Prasad, P.; Dutta, B.K.; Kushwaha, H.S.

1996-01-01

Structural integrity analysis plays an important role in assessing and demonstrating the safety of nuclear reactor components. This analysis is performed using analytical tools such as the Finite Element Method (FEM) with the help of digital computers. The complexity of the problems involved in nuclear engineering demands high-speed computation facilities to obtain solutions in a reasonable amount of time. Parallel processing systems such as ANUPAM provide an efficient platform for realising high-speed computation. The development and implementation of software on parallel processing systems is an interesting and challenging task. The data and algorithm structure of the codes plays an important role in exploiting the capabilities of parallel processing systems. Structural analysis codes based on FEM can be divided into two categories with respect to their implementation on parallel processing systems. Codes in the first category, such as those used for harmonic analysis and mechanistic fuel performance codes, do not require parallelisation of their individual modules. Codes in the second category, such as conventional FEM codes, require parallelisation of individual modules. In this category, parallelisation of the equation solution module poses major difficulties. Different solution schemes such as the domain decomposition method (DDM), the parallel active column solver and the substructuring method are currently used on parallel processing systems. Two codes, FAIR and TABS, one belonging to each of these categories, have been implemented on ANUPAM. The implementation details of these codes and the performance of different equation solvers are highlighted. (author). 5 refs., 12 figs., 1 tab
Applications of Parallel Processing in Mobile Banking

Directory of Open Access Journals (Sweden)

2007-01-01

The future of mobile banking will be represented by applications that support mobile banking, Internet banking and EFT (Electronic Funds Transfer) transactions in a single user interface. In this way, mobile banking will be able to cover all the types of applications demanded by the market. The parallel processing of credit card bank transactions could be performed with the help of a grid network. Apart from some limitations, grid processing offers huge opportunities to exploit parallelism. For this reason, many applications of waiting queues in grid processing have been developed in recent years. Grid networks represent a distinctive and very modern field of parallel and distributed processing.
Spatially parallel processing of within-dimension conjunctions.

Science.gov (United States)

Linnell, K J; Humphreys, G W

2001-01-01

Within-dimension conjunction search for red-green targets amongst red-blue, and blue-green, nontargets is extremely inefficient (Wolfe et al, 1990 Journal of Experimental Psychology: Human Perception and Performance 16 879-892). We tested whether pairs of red-green conjunction targets can nevertheless be processed spatially in parallel. Participants made speeded detection responses whenever a red-green target was present. Across trials where a second identical target was present, the distribution of detection times was compatible with the assumption that targets were processed in parallel (Miller, 1982 Cognitive Psychology 14 247-279). We show that this was not an artifact of response-competition or feature-based processing. We suggest that within-dimension conjunctions can be processed spatially in parallel. Visual search for such items may be inefficient owing to within-dimension grouping between items.

Efficient multitasking: parallel versus serial processing of multiple tasks.

Science.gov (United States)

Fischer, Rico; Plessow, Franziska

2015-01-01

In the context of performance optimizations in multitasking, a central debate has unfolded in multitasking research around whether cognitive processes related to different tasks proceed only sequentially (one at a time), or can operate in parallel (simultaneously). This review features a discussion of theoretical considerations and empirical evidence regarding parallel versus serial task processing in multitasking. In addition, we highlight how methodological differences and theoretical conceptions determine the extent to which parallel processing in multitasking can be detected, to guide their employment in future research. Parallel and serial processing of multiple tasks are not mutually exclusive. Therefore, questions focusing exclusively on either task-processing mode are too simplified. We review empirical evidence and demonstrate that shifting between more parallel and more serial task processing critically depends on the conditions under which multiple tasks are performed. We conclude that efficient multitasking is reflected by the ability of individuals to adjust multitasking performance to environmental demands by flexibly shifting between different processing strategies of multiple task-component scheduling.
Parallel processing of Monte Carlo code MCNP for particle transport problem

Energy Technology Data Exchange (ETDEWEB)

Higuchi, Kenji; Kawasaki, Takuji

1996-06-01

It is possible to vectorize or parallelize Monte Carlo codes (MC codes) for photon and neutron transport problems by making use of the independence of the calculation for each particle. The applicability of existing MC codes to parallel processing is discussed. As parallel computers, we used both a vector-parallel processor and a scalar-parallel processor in the performance evaluation. We carried out (i) vector-parallel processing of the MCNP code on the Monte Carlo machine Monte-4 with four vector processors, and (ii) parallel processing on a Paragon XP/S with 256 processors. In this report we describe the methodology and results for parallel processing on two types of parallel or distributed memory computers. In addition, we mention the evaluation of parallel programming environments for the parallel computers used in the present work, as part of the work developing STA (Seamless Thinking Aid) Basic Software. (author)
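The independence of particle histories that this abstract relies on can be shown with a toy example. The sketch below is not MCNP: it assumes a one-parameter model in which each collision absorbs the particle with a fixed probability, and it tallies the mean number of scattering collisions per particle. Batches get their own seeded random stream so they can run in any order on any worker. The names, the model and the seeding scheme are illustrative assumptions. A thread pool is used so the example runs anywhere; for real CPU-bound speedup in Python a process pool would be substituted.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_batch(seed, n_particles, absorb_prob=0.3):
    """Simulate independent histories; tally scatterings before absorption."""
    rng = random.Random(seed)  # private stream keeps batches independent
    scatterings = 0
    for _ in range(n_particles):
        # Each collision either absorbs the particle or scatters it.
        while rng.random() >= absorb_prob:
            scatterings += 1
    return scatterings, n_particles


def mean_scatterings(n_batches, particles_per_batch, base_seed=1234):
    """Run batches concurrently and combine their tallies."""
    seeds = [base_seed + b for b in range(n_batches)]
    sizes = [particles_per_batch] * n_batches
    with ThreadPoolExecutor(max_workers=n_batches) as pool:
        results = list(pool.map(run_batch, seeds, sizes))
    total_scatterings = sum(s for s, _ in results)
    total_particles = sum(n for _, n in results)
    return total_scatterings / total_particles


estimate = mean_scatterings(n_batches=4, particles_per_batch=5000)
```

For this toy model the exact expectation is (1 - p) / p, about 2.33 for p = 0.3, which gives a simple check on the combined tally. Only the final sums need to be communicated, which is why this class of code parallelizes with so little overhead.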
The Multi-perspective Process Explorer

NARCIS (Netherlands)

Mannhardt, Felix; De Leoni, Massimiliano; Reijers, Hajo A.

2015-01-01

Organizations use process mining techniques to analyze event data recorded by their information systems. Multi-perspective process mining techniques make use of data attributes attached to events to analyze processes from multiple perspectives. Applying those multi-perspective process mining
Parallel-Processing Test Bed For Simulation Software

Science.gov (United States)

Blech, Richard; Cole, Gary; Townsend, Scott

1996-01-01

Second-generation Hypercluster computing system is multiprocessor test bed for research on parallel algorithms for simulation in fluid dynamics, electromagnetics, chemistry, and other fields with large computational requirements but relatively low input/output requirements. Built from standard, off-the-shelf hardware readily upgraded as improved technology becomes available. System used for experiments with such parallel-processing concepts as message-passing algorithms, debugging software tools, and computational steering. First-generation Hypercluster system described in "Hypercluster Parallel Processor" (LEW-15283).
The study of image processing of parallel digital signal processor

International Nuclear Information System (INIS)

Liu Jie

2000-01-01

The author analyzes the basic characteristics of the parallel DSP (digital signal processor) TMS320C80 and proposes a related optimized image algorithm and a parallel processing method based on the parallel DSP. Real-time performance for many image processing tasks can be achieved in this way.
A multitransputer parallel processing system (MTPPS)

International Nuclear Information System (INIS)

Jethra, A.K.; Pande, S.S.; Borkar, S.P.; Khare, A.N.; Ghodgaonkar, M.D.; Bairi, B.R.

1993-01-01

This report describes the design and implementation of a 16-node Multi Transputer Parallel Processing System (MTPPS), which is a platform for parallel program development. It is a MIMD machine based on the message-passing paradigm. The basic compute engine is an Inmos Transputer IMS T800-20. A transputer with local memory constitutes the processing element (NODE) of this MIMD architecture. Multiple NODES can be connected to each other in an identifiable network topology through the high-speed serial links of the transputer. A Network Configuration Unit (NCU) incorporates the necessary hardware to provide software-controlled network configuration. The system is modularly expandable, and more NODES can be added to achieve the required processing power. The system is a back end to an IBM PC, which has been integrated into the system to provide the user I/O interface. PC resources are available to the programmer. The interface hardware between the PC and the network of transputers is INMOS compatible. Therefore, all commercially available development software compatible with INMOS products can run on this system. While giving the details of design and implementation, this report briefly summarises MIMD architectures, transputer architecture and parallel processing software development issues. LINPACK performance evaluation of the system and solutions of neutron physics and plasma physics problems are discussed along with results. (author). 12 refs., 22 figs., 3 tabs., 3 appendixes
Parallel workflow tools to facilitate human brain MRI post-processing

Directory of Open Access Journals (Sweden)

Zaixu eCui

2015-05-01

Multi-modal magnetic resonance imaging (MRI) techniques are widely applied in human brain studies. To obtain specific brain measures of interest from MRI datasets, a number of complex image post-processing steps are typically required. Parallel workflow tools have recently been developed, concatenating individual processing steps and enabling fully automated processing of raw MRI data to obtain the final results. These workflow tools are also designed to make optimal use of available computational resources and to support the parallel processing of different subjects or of independent processing steps for a single subject. Automated, parallel MRI post-processing tools can greatly facilitate relevant brain investigations and are being increasingly applied. In this review, we briefly summarize these parallel workflow tools and discuss relevant issues.
Parallel finite elements with domain decomposition and its pre-processing

International Nuclear Information System (INIS)

Yoshida, A.; Yagawa, G.; Hamada, S.

1993-01-01

This paper describes a parallel finite element analysis using a domain decomposition method, and the pre-processing for the parallel calculation. Computer simulations are about to replace experiments in various fields, and the scale of the models to be simulated tends to be extremely large. At the same time, the computational environment has changed drastically in recent years. In particular, parallel processing on massively parallel computers or computer networks is considered a promising technique. To achieve high efficiency in such a parallel computation environment, large task granularity and a well-balanced workload distribution are key issues. It is also important to reduce the cost of pre-processing in such parallel FEM. From this point of view, the authors developed a domain decomposition FEM with an automatic and dynamic task-allocation mechanism and an automatic mesh generation/domain subdivision system for it. (author)
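Why dynamic task allocation helps workload balance can be seen in a small scheduling model. The sketch below compares the total finish time (makespan) of a fixed round-robin assignment of subdomains with a dynamic scheme in which each subdomain goes to whichever worker becomes free first. The cost values and function names are made-up illustrations; the abstract does not describe the authors' actual allocation mechanism.

```python
import heapq


def static_makespan(task_costs, n_workers):
    """Finish time when tasks are pre-assigned round-robin, ignoring cost."""
    loads = [0.0] * n_workers
    for i, cost in enumerate(task_costs):
        loads[i % n_workers] += cost
    return max(loads)


def dynamic_makespan(task_costs, n_workers):
    """Finish time when each task goes to the worker that frees up first."""
    finish_times = [0.0] * n_workers
    heapq.heapify(finish_times)
    for cost in task_costs:
        earliest = heapq.heappop(finish_times)  # next worker to become idle
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)


# Hypothetical subdomain solve costs with strongly uneven sizes.
costs = [8, 1, 8, 1, 8, 1]
static_time = static_makespan(costs, n_workers=2)
dynamic_time = dynamic_makespan(costs, n_workers=2)
```

With these costs the round-robin assignment puts all three expensive subdomains on the same worker, while the dynamic scheme spreads them out and finishes sooner. The gain depends on the task order and granularity, which is consistent with the abstract naming both as key issues.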
Examination of Speed Contribution of Parallelization for Several Fingerprint Pre-Processing Algorithms

Directory of Open Access Journals (Sweden)

GORGUNOGLU, S.

2014-05-01

In the analysis of minutiae-based fingerprint systems, fingerprints need to be pre-processed. The pre-processing is carried out to enhance the quality of the fingerprint and to obtain more accurate minutiae points. Reducing the pre-processing time is important for identification and verification in real-time systems, especially for databases holding large amounts of fingerprint information. Parallel processing and parallel CPU computing can be considered as the distribution of processes over a multi-core processor. This is done by using parallel programming techniques. Reducing the execution time is the main objective in parallel processing. In this study, the pre-processing of a minutiae-based fingerprint system is implemented by parallel processing on multi-core computers using OpenMP and on a graphics processor using CUDA to improve execution time. The execution times and speedup ratios are compared with those of a single-core processor. The results show that execution time is substantially improved by using parallel processing. The improvement ratios obtained for different pre-processing algorithms allowed us to make suggestions on the more suitable approaches for parallelization.
Parallel Task Processing on a Multicore Platform in a PC-based Control System for Parallel Kinematics

Directory of Open Access Journals (Sweden)

Harald Michalik

2009-02-01

Multicore platforms have one physical processor chip with multiple cores interconnected via a chip-level bus. Because they deliver greater computing power through concurrency and offer greater system density, multicore platforms are well qualified to address the performance bottleneck encountered in PC-based control systems for parallel kinematic robots with heavy CPU load. Heavy-load control tasks are generated by new control approaches that include features such as singularity prediction, structure control algorithms, vision data integration and similar tasks. In this paper we introduce the parallel task scheduling extension of a communication architecture specially tailored for the development of PC-based control of parallel kinematics. The scheduling is specially designed for processing on a multicore platform. It breaks down the serial task processing of the robot control cycle and extends it with parallel task processing paths in order to enhance the overall control performance.
Parallel processing of neutron transport in fuel assembly calculation

International Nuclear Information System (INIS)

Song, Jae Seung

1992-02-01

Group constants, which are used for reactor analyses by the nodal method, are generated by fuel assembly calculations based on neutron transport theory, since one or a quarter of the fuel assembly corresponds to a unit mesh in the current nodal calculation. The group constant calculation for a fuel assembly is performed through spectrum calculations, a two-dimensional fuel assembly calculation, and depletion calculations. The purpose of this study is to develop a parallel algorithm to be used on a parallel processor for the fuel assembly calculation and the depletion calculations of the group constant generation. A serial program, which solves the neutron integral transport equation using the transmission probability method and the linear depletion equation, was prepared and verified by a benchmark calculation. Small changes to the serial program were enough to parallelize the depletion calculation, which has inherently parallel characteristics. In the fuel assembly calculation, however, efficient parallelization is not straightforward because of the many coupling parameters in the calculation and the data communications among CPUs. In this study, the group distribution method is introduced for the parallel processing of the fuel assembly calculation to minimize the data communications. The parallel processing was performed on a Quadputer with 4 CPUs operating in the NURAD Lab. at KAIST. Efficiencies of 54.3 % and 78.0 % were obtained in the fuel assembly calculation and the depletion calculation, respectively, which led to an overall speedup of about 2.5. As a result, it is concluded that the computing time consumed for the group constant generation can be easily reduced by parallel processing on a parallel computer with small-size CPUs.
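The relation between the reported efficiencies and speedups can be checked with a short calculation. The sketch below assumes the usual definition of parallel efficiency, E = S / p, with p = 4 CPUs. The function names are ours, and the serial time fraction used in the last step (0.57) is purely an illustrative assumption, not a figure from the abstract.

```python
def speedup_from_efficiency(efficiency: float, processors: int) -> float:
    """Speedup implied by parallel efficiency, using E = S / p."""
    return efficiency * processors


def overall_speedup(fraction_a: float, speedup_a: float, speedup_b: float) -> float:
    """Combined speedup of two phases.

    fraction_a is the share of the serial run time spent in phase A;
    the remainder is assumed to be spent in phase B.
    """
    parallel_time = fraction_a / speedup_a + (1.0 - fraction_a) / speedup_b
    return 1.0 / parallel_time


PROCESSORS = 4
s_assembly = speedup_from_efficiency(0.543, PROCESSORS)   # fuel assembly phase
s_depletion = speedup_from_efficiency(0.780, PROCESSORS)  # depletion phase

# ASSUMPTION: 57 % of the serial time in the fuel assembly phase.
# This value is hypothetical; the abstract does not report the time split.
s_total = overall_speedup(0.57, s_assembly, s_depletion)
```

Under these assumptions the two phase speedups come out at roughly 2.2 and 3.1, so an overall figure of about 2.5 is plausible whenever the fuel assembly phase takes a little more than half of the serial run time.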
Parallel processing for artificial intelligence 2

CERN Document Server

Kumar, V; Suttner, CB

1994-01-01

With the increasing availability of parallel machines and rising interest in large-scale and real-world applications, research on parallel processing for Artificial Intelligence (AI) is gaining greater importance in the computer science environment. Many applications have been implemented and delivered, but the field is still considered to be in its infancy. This book assembles diverse aspects of research in the area, providing an overview of the current state of technology. It also aims to promote further growth across the discipline. Contributions have been grouped according to their
Study and simulation of a parallel numerical processing machine

International Nuclear Information System (INIS)

Bel Hadj, Slaheddine

1981-12-01

This study was carried out with a view to implementing the NEPTUNIX package (software for the resolution of very large algebraic-differential equation systems) on a minicomputer. With the aim of increasing system performance, previous research showed the necessity of reducing the execution time of certain frequently used numerical computation tasks. It also demonstrated the feasibility of handling these tasks with efficient parallel algorithms. The present work deals with the study and simulation of a parallel architecture processor adapted to the fast execution of these algorithms. A minicomputer fitted with a connection to such a parallel processor has greatly extended computing power. The architecture of a parallel numerical processor, based on the use of VLSI microprocessors and co-processors, is then described. Its design aims at the best cost/performance ratio. The last part deals with the simulation of the processor with the 'CHAMBOR' program. Results show a speed increase by a factor of 30 in comparison with execution on a MITRA 15 minicomputer. Moreover, the importance of conflicts, mainly at the level of access to a shared resource, is evaluated. Although this implementation was designed with a dedicated application in mind, other uses could be envisaged, particularly for the simulation of nuclear reactors: operator guiding systems, behavioural studies under accident conditions, etc. (author)
Investigation of Mediational Processes Using Parallel Process Latent Growth Curve Modeling

Science.gov (United States)

Cheong, JeeWon; MacKinnon, David P.; Khoo, Siek Toon

2010-01-01

This study investigated a method to evaluate mediational processes using latent growth curve modeling. The mediator and the outcome measured across multiple time points were viewed as 2 separate parallel processes. The mediational process was defined as the independent variable influencing the growth of the mediator, which, in turn, affected the growth of the outcome. To illustrate modeling procedures, empirical data from a longitudinal drug prevention program, Adolescents Training and Learning to Avoid Steroids, were used. The program effects on the growth of the mediator and the growth of the outcome were examined first in a 2-group structural equation model. The mediational process was then modeled and tested in a parallel process latent growth curve model by relating the prevention program condition, the growth rate factor of the mediator, and the growth rate factor of the outcome. PMID:20157639
The Acoustic and Perceptual Effects of Series and Parallel Processing

Directory of Open Access Journals (Sweden)

Melinda C. Anderson

2009-01-01

Temporal envelope (TE) cues provide a great deal of speech information. This paper explores how spectral subtraction and dynamic-range compression gain modifications affect TE fluctuations for parallel and series configurations. In parallel processing, algorithms compute gains based on the same input signal, and the gains in dB are summed. In series processing, output from the first algorithm forms the input to the second algorithm. Acoustic measurements show that the parallel arrangement produces more gain fluctuations, introducing more changes to the TE than the series configurations. Intelligibility tests for normal-hearing (NH) and hearing-impaired (HI) listeners show that (1) parallel processing gives significantly poorer speech understanding than an unprocessed (UNP) signal and the series arrangement and (2) series processing and UNP yield similar results. Speech quality tests show that UNP is preferred to both parallel and series arrangements, although spectral subtraction is the most preferred. No significant differences exist in sound quality between the series and parallel arrangements, or between the NH group and the HI group. These results indicate that gain modifications affect intelligibility and sound quality differently. Listeners appear to have a higher tolerance for gain modifications with regard to intelligibility, while judgments for sound quality appear to be more affected by smaller amounts of gain modification.
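The difference between the two configurations can be illustrated with a minimal sketch. The two gain rules below are toy stand-ins invented for this example (a fixed 10 dB attenuation below 50 dB and a simple 2:1 compressor referenced to 80 dB); they are not the spectral subtraction or compression algorithms used in the study, and the sketch makes no claim about which arrangement fluctuates more.

```python
def noise_reduction_gain_db(level_db: float) -> float:
    """Toy rule (hypothetical): attenuate low-level input by 10 dB."""
    return -10.0 if level_db < 50.0 else 0.0


def compression_gain_db(level_db: float) -> float:
    """Toy 2:1 compressor (hypothetical): add half the distance below 80 dB."""
    return 0.5 * max(80.0 - level_db, 0.0)


def parallel_gain_db(level_db: float) -> float:
    """Parallel: both gains are computed from the same input and summed in dB."""
    return noise_reduction_gain_db(level_db) + compression_gain_db(level_db)


def series_gain_db(level_db: float) -> float:
    """Series: the second algorithm sees the output of the first."""
    first = noise_reduction_gain_db(level_db)
    second = compression_gain_db(level_db + first)
    return first + second
```

For a 40 dB input, the parallel arrangement computes both gains from 40 dB, whereas the series arrangement computes the compressor gain from the already attenuated 30 dB signal, so the two total gains differ. For a 70 dB input the first stage applies no gain and both arrangements agree.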
Test generation for digital circuits using parallel processing

Science.gov (United States)

Hartmann, Carlos R.; Ali, Akhtar-Uz-Zaman M.

1990-12-01

The problem of test generation for digital logic circuits is an NP-Hard problem. Recently, the availability of low cost, high performance parallel machines has spurred interest in developing fast parallel algorithms for computer-aided design and test. This report describes a method of applying a 15-valued logic system for digital logic circuit test vector generation in a parallel programming environment. A concept called fault site testing allows for test generation, in parallel, that targets more than one fault at a given location. The multi-valued logic system allows results obtained by distinct processors and/or processes to be merged by means of simple set intersections. A machine-independent description is given for the proposed algorithm.
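Merging by set intersection can be sketched as follows. The encoding is our assumption, not a detail given in the abstract: each logic value is taken to be a nonempty subset of four base values (0, 1, D, D'), which yields 2^4 - 1 = 15 values. The signal names and the `merge` helper are hypothetical.

```python
from itertools import combinations

# ASSUMPTION: the 15 values are the nonempty subsets of these four base values.
BASE = ("0", "1", "D", "D'")
VALUES = [frozenset(c) for r in range(1, len(BASE) + 1)
          for c in combinations(BASE, r)]
UNCONSTRAINED = frozenset(BASE)


def merge(result_a: dict, result_b: dict) -> dict:
    """Merge two partial assignments by intersecting the value sets per line.

    A line missing from one result is treated as unconstrained there.
    An empty intersection signals that the two results conflict.
    """
    merged = {}
    for line in result_a.keys() | result_b.keys():
        merged[line] = (result_a.get(line, UNCONSTRAINED)
                        & result_b.get(line, UNCONSTRAINED))
    return merged


def has_conflict(assignment: dict) -> bool:
    """True if any line has no remaining candidate value."""
    return any(len(values) == 0 for values in assignment.values())
```

Because intersection is commutative and associative, partial results from any number of processes can be merged in any order under this encoding.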
Integrative Dynamic Reconfiguration in a Parallel Stream Processing Engine

DEFF Research Database (Denmark)

Madsen, Kasper Grud Skat; Zhou, Yongluan; Cao, Jianneng

2017-01-01

Load balancing, operator instance collocations and horizontal scaling are critical issues in Parallel Stream Processing Engines to achieve low data processing latency, optimized cluster utilization and minimized communication cost respectively. In previous work, these issues are typically tackled...... solution called ALBIC, which support general jobs. We implement the proposed techniques on top of Apache Storm, an open-source Parallel Stream Processing Engine. The extensive experimental results over both synthetic and real datasets show that our techniques clearly outperform existing approaches....
The Process Perspective on Trust in Organizations

DEFF Research Database (Denmark)

Jagd, Søren

Several trust researchers have pointed to the relevance of the process perspective for understanding trust in organizations. In this paper I review the attempts to apply the process perspective in trust research distinguishing between a soft and a strong process perspective. It appears that nearly......’s history? A few examples of studies applying a strong process perspective to trust in organizations is discussed. Summarising, I argue that the process perspective, in general, may contribute furthering the understanding of the complexities of the development of trust over time and that the strong process...
A Novel Least Significant Bit First Processing Parallel CRC Circuit

Directory of Open Access Journals (Sweden)

Xiujie Qu

2013-01-01

In the HDLC serial communication protocol, CRC calculation can process either the most or the least significant bit of the data first. Currently, most CRC calculation is based on most significant bit (MSB) first processing. This paper proposes an algorithm for least significant bit (LSB) first parallel CRC processing. Starting from the general expression for LSB-first serial CRC and using the state equation method for linear systems, we derive a recursive formula by mathematical deduction. The recursive formula is applicable to any number of bits processed in parallel and to any generator polynomial. Based on the formula, we present the parallel CRC circuit and implement it in VHDL on an FPGA. The results verify the accuracy and effectiveness of this method.
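The equivalence between bit-serial and multi-bit LSB-first CRC updates can be demonstrated in software. The sketch below uses the familiar lookup-table formulation for 8 bits per step, not the state-equation derivation or the VHDL circuit of the paper, which the abstract does not spell out. The parameters are those of the CRC-16/X-25 variant associated with HDLC (reflected polynomial 0x8408, initial value 0xFFFF, final XOR 0xFFFF); the function names are ours.

```python
POLY = 0x8408  # bit-reversed form of x^16 + x^12 + x^5 + 1


def crc_serial(data: bytes, crc: int = 0xFFFF) -> int:
    """LSB-first CRC, one data bit per step."""
    for byte in data:
        for i in range(8):
            bit = (byte >> i) & 1          # least significant bit first
            feedback = (crc ^ bit) & 1
            crc >>= 1
            if feedback:
                crc ^= POLY
    return crc ^ 0xFFFF


def _advance_8_bits(value: int) -> int:
    """Run the shift register for 8 steps with zero data input."""
    for _ in range(8):
        value = (value >> 1) ^ POLY if value & 1 else value >> 1
    return value


# Precomputed effect of 8 serial steps for every possible low byte.
TABLE = [_advance_8_bits(v) for v in range(256)]


def crc_parallel8(data: bytes, crc: int = 0xFFFF) -> int:
    """LSB-first CRC, eight data bits per step."""
    for byte in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFF
```

Both functions return the same value for any input, because the shift register is linear over GF(2): eight serial steps collapse into one combined update, which is the same property a parallel hardware circuit exploits.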
Evidence of Parallel Processing During Translation

DEFF Research Database (Denmark)

Balling, Laura Winther; Hvelplund, Kristian Tangsgaard; Sjørup, Annette Camilla

2014-01-01

conclude that translation is a parallel process and that literal translation is likely to be a universal initial default strategy in translation. This conclusion is strengthened by the fact that all three experiments were relatively naturalistic, due to the combination of remote eye tracking and mixed...

Advanced optical signal processing of broadband parallel data signals

DEFF Research Database (Denmark)

Oxenløwe, Leif Katsuo; Hu, Hao; Kjøller, Niels-Kristian

2016-01-01

Optical signal processing may aid in reducing the number of active components in communication systems with many parallel channels, by e.g. using telescopic time lens arrangements to perform format conversion and allow for WDM regeneration.
Processing data communications events by awakening threads in parallel active messaging interface of a parallel computer

Science.gov (United States)

Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

2016-03-15

Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.
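The wait-and-awaken pattern described here can be sketched with a condition variable. This is a generic illustration only: the `Context` class and its `post` and `advance` methods are hypothetical names chosen for this example and do not represent the actual PAMI interface.

```python
import threading
from collections import deque


class Context:
    """Hypothetical communications context holding pending events."""

    def __init__(self):
        self._events = deque()
        self._cond = threading.Condition()

    def post(self, event):
        """Record a new event and awaken a thread waiting in advance()."""
        with self._cond:
            self._events.append(event)
            self._cond.notify()

    def advance(self, handler):
        """Process one event, sleeping while none is actionable."""
        with self._cond:
            while not self._events:      # no actionable events pending
                self._cond.wait()        # place the thread in a wait state
            event = self._events.popleft()
        handler(event)                   # process outside the lock


def demo():
    """Start a waiting thread, post one event, and return what was handled."""
    context = Context()
    handled = []
    worker = threading.Thread(target=context.advance, args=(handled.append,))
    worker.start()
    context.post("message-arrived")
    worker.join(timeout=5)
    return handled
```

Waiting inside a `while` loop rather than a single `if` guards against spurious wakeups, and the sketch behaves the same whether the event is posted before or after the worker starts waiting.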
Parallel and distributed processing in power system simulation and control

Energy Technology Data Exchange (ETDEWEB)

Falcao, Djalma M [Universidade Federal, Rio de Janeiro, RJ (Brazil). Coordenacao dos Programas de Pos-graduacao de Engenharia

1994-12-31

Recent advances in computer technology will certainly have a great impact on the methodologies used in power system expansion and operational planning as well as in real-time control. Parallel and distributed processing are among the new technologies that present great potential for application in these areas. Parallel computers use multiple functional or processing units to speed up computation, while distributed processing computer systems are collections of computers joined together by high-speed communication networks, with many objectives and advantages. The paper presents some ideas for the use of parallel and distributed processing in power system simulation and control. It also comments on some of the current research work on these topics and presents a summary of the work presently being developed at COPPE. (author) 53 refs., 2 figs.
Visual analysis of inter-process communication for large-scale parallel computing.

Science.gov (United States)

Muelder, Chris; Gygi, Francois; Ma, Kwan-Liu

2009-01-01

In serial computation, program profiling is often helpful for optimization of key sections of code. When moving to parallel computation, not only does the code execution need to be considered but also communication between the different processes, which can induce delays that are detrimental to performance. As the number of processes increases, so does the impact of the communication delays on performance. For large-scale parallel applications, it is critical to understand how the communication impacts performance in order to make the code more efficient. There are several tools available for visualizing program execution and communications on parallel systems. These tools generally provide either views which statistically summarize the entire program execution or process-centric views. However, process-centric visualizations do not scale well as the number of processes gets very large. In particular, the most common representation of parallel processes is a Gantt chart with a row for each process. As the number of processes increases, these charts can become difficult to work with and can even exceed screen resolution. We propose a new visualization approach that affords more scalability and then demonstrate it on systems running with up to 16,384 processes.
Fast image processing on parallel hardware

International Nuclear Information System (INIS)

Bittner, U.

1988-01-01

Current digital imaging modalities in the medical field incorporate parallel hardware which is heavily used in the image formation stage, as in CT/MR image reconstruction or DSA real-time subtraction. In order to make image post-processing as efficient as image acquisition, new software approaches have to be found which take full advantage of the parallel hardware architecture. This paper describes the implementation of a two-dimensional median filter which can serve as an example for the development of such an algorithm. The algorithm is analyzed by viewing it as a complete parallel sort of the k pixel values in the chosen window, which leads to a generalization to rank order operators and other closely related filters reported in the literature. A section on the theoretical basis of the algorithm gives hints on how to characterize operations suitable for implementation on pipeline processors and how to find the appropriate algorithms. Finally, some results on the computation time and usefulness of median filtering in radiographic imaging are given.
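The view of the median filter as a sort of the k window values, and its generalization to rank order operators, can be sketched in a few lines. This is a plain sequential illustration, not the pipeline-processor implementation of the paper; the function names and the edge-replication border handling are our assumptions.

```python
def rank_filter(image, size, rank):
    """Replace each pixel by the value of the given rank in its size x size window.

    image is a list of equal-length rows. Borders are handled by replicating
    the nearest edge pixel (an assumption made for this sketch).
    """
    height, width = len(image), len(image[0])
    radius = size // 2
    output = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            ]
            window.sort()                 # complete sort of the k window values
            output[y][x] = window[rank]   # pick the requested rank
    return output


def median_filter(image, size=3):
    """Median filter: the middle rank of the k = size * size sorted values."""
    return rank_filter(image, size, (size * size) // 2)
```

Choosing rank 0 gives a minimum filter and rank k - 1 a maximum filter, which shows how one sorting step covers the whole family of rank order operators. Each output pixel depends only on its own window, so the per-pixel work is independent and can be distributed across parallel hardware.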
Leveraging Parallel Data Processing Frameworks with Verified Lifting

Directory of Open Access Journals (Sweden)

Maaz Bin Safeer Ahmad

2016-11-01

Many parallel data frameworks have been proposed in recent years that let sequential programs access parallel processing. To capitalize on the benefits of such frameworks, existing code must often be rewritten to the domain-specific languages that each framework supports. This rewriting, which is tedious and error-prone, also requires developers to choose the framework that best optimizes performance given a specific workload. This paper describes Casper, a novel compiler that automatically retargets sequential Java code for execution on Hadoop, a parallel data processing framework that implements the MapReduce paradigm. Given a sequential code fragment, Casper uses verified lifting to infer a high-level summary expressed in our program specification language that is then compiled for execution on Hadoop. We demonstrate that Casper automatically translates Java benchmarks into Hadoop. The translated results execute on average 3.3x faster than the sequential implementations and also scale better to larger datasets.
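The idea of summarizing a sequential loop as map and reduce operations can be illustrated with a hand-written example. The sketch below is not Casper output and does not use its specification language or Hadoop; it only shows, for a hypothetical sum-by-key loop, the kind of equivalence between a sequential fragment and a MapReduce formulation. All function names are ours.

```python
from collections import defaultdict
from functools import reduce


def sequential_totals(records):
    """Sequential fragment: accumulate a running total per key."""
    totals = {}
    for key, value in records:
        totals[key] = totals.get(key, 0) + value
    return totals


def map_phase(records):
    """Map: emit one (key, value) pair per input record."""
    return [(key, value) for key, value in records]


def shuffle(pairs):
    """Shuffle: group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine each group with an associative operation (here, +)."""
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}


def mapreduce_totals(records):
    """MapReduce formulation intended to match sequential_totals."""
    return reduce_phase(shuffle(map_phase(records)))
```

The rewrite is valid here because addition is associative and commutative, so the order in which a framework combines partial results does not change the answer. Establishing that kind of property automatically is, broadly, what a verification step has to do before a translation can be trusted.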
What is "the patient perspective" in patient engagement programs? Implicit logics and parallels to feminist theories.

Science.gov (United States)

Rowland, Paula; McMillan, Sarah; McGillicuddy, Patti; Richards, Joy

2017-01-01

Public and patient involvement (PPI) in health care may refer to many different processes, ranging from participating in decision-making about one's own care to participating in health services research, health policy development, or organizational reforms. Across these many forms of public and patient involvement, the conceptual and theoretical underpinnings remain poorly articulated. Instead, most public and patient involvement programs rely on policy initiatives as their conceptual frameworks. This lack of conceptual clarity participates in dilemmas of program design, implementation, and evaluation. This study contributes to the development of theoretical understandings of public and patient involvement. In particular, we focus on the deployment of patient engagement programs within health service organizations. To develop a deeper understanding of the conceptual underpinnings of these programs, we examined the concept of "the patient perspective" as used by patient engagement practitioners and participants. Specifically, we focused on the way this phrase was used in the singular: "the" patient perspective or "the" patient voice. From qualitative analysis of interviews with 20 patient advisers and 6 staff members within a large urban health network in Canada, we argue that "the patient perspective" is referred to as a particular kind of situated knowledge, specifically an embodied knowledge of vulnerability. We draw parallels between this logic of patient perspective and the logic of early feminist theory, including the concepts of standpoint theory and strong objectivity. We suggest that champions of patient engagement may learn much from the way feminist theorists have constructed their arguments and addressed critique.
A model for dealing with parallel processes in supervision

Directory of Open Access Journals (Sweden)

Lilja Cajvert

2011-03-01

Supervision in social work is essential for successful outcomes when working with clients. In social work, unconscious difficulties may arise and similar difficulties may occur in supervision as parallel processes. In this article, the development of a practice-based model of supervision to deal with parallel processes in supervision is described. The model has six phases. In the first phase, the focus is on the supervisor’s inner world, his/her own reflections and observations. In the second phase, the supervision situation is “frozen”, and the supervisees are invited to join the supervisor in taking a meta-perspective on the current situation of supervision. The focus in the third phase is on the inner world of all the group members as well as the visualization and identification of reflections and feelings that arose during the supervision process. Phase four focuses on the supervisee who presented a case, and in phase five the focus shifts to the common understanding and theorization of the supervision process as well as the definition and identification of possible parallel processes. In the final phase, the supervisee, with the assistance of the supervisor and other members of the group, develops a solution and determines how to proceed with the client in treatment. This article uses phenomenological concepts to provide a theoretical framework for the supervision model. Phenomenological reduction is an important approach to examine and to externalize and visualize the inner worlds of the supervisor and supervisees. A model for handling parallel processes during supervision: To be successful in providing help to clients, supervision is crucial in social work. During the helping process, implicit difficulties may arise, and similar difficulties sometimes also emerge during supervision. These are called parallel processes. This article describes a practice-based model to [handle] such parallel
Parallel processing approach to transform-based image coding

Science.gov (United States)

Normile, James O.; Wright, Dan; Chu, Ken; Yeh, Chia L.

1991-06-01

This paper describes a flexible parallel processing architecture designed for use in real-time video processing. The system consists of floating point DSP processors connected to each other via fast serial links; each processor has access to a globally shared memory. A multiple bus architecture in combination with a dual ported memory allows communication with a host control processor. The system has been applied to prototyping of video compression and decompression algorithms. The decomposition of transform-based algorithms for decompression into a form suitable for parallel processing is described. A technique for automatic load balancing among the processors is developed and discussed, and results are presented with image statistics and data rates. Finally, techniques for accelerating the system throughput are analyzed and results from the application of one such modification are described.
Decomposition based parallel processing technique for efficient collaborative optimization

International Nuclear Information System (INIS)

Park, Hyung Wook; Kim, Sung Chan; Kim, Min Soo; Choi, Dong Hoon

2000-01-01

In practical design studies, most designers solve multidisciplinary problems with a complex design structure. These multidisciplinary problems involve hundreds of analyses and thousands of variables. The sequence of processes used to solve these problems affects the speed of the total design cycle. Thus it is very important for the designer to reorder the original design processes to minimize total cost and time. This is accomplished by decomposing a large multidisciplinary problem into several MultiDisciplinary Analysis SubSystems (MDASS) and processing them in parallel. This paper proposes a new strategy for parallel decomposition of multidisciplinary problems that raises design efficiency by using a genetic algorithm, and shows the relationship between decomposition and Multidisciplinary Design Optimization (MDO) methodology.
An Educational Tool for Interactive Parallel and Distributed Processing

DEFF Research Database (Denmark)

Pagliarini, Luigi; Lund, Henrik Hautop

2011-01-01

In this paper we try to describe how the Modular Interactive Tiles System (MITS) can be a valuable tool for introducing students to interactive parallel and distributed processing programming. This is done by providing an educational hands-on tool that allows a change of representation...... of the abstract problems related to designing interactive parallel and distributed systems. Indeed, MITS seems to bring a series of goals into the education, such as parallel programming, distributedness, communication protocols, master dependency, software behavioral models, adaptive interactivity, feedback......, connectivity, topology, island modeling, user and multiuser interaction, which can hardly be found in other tools. Finally, we introduce the system of modular interactive tiles as a tool for easy, fast, and flexible hands-on exploration of these issues, and through examples show how to implement interactive...
Abstract processing and observer vantage perspective in dysphoria.

Science.gov (United States)

Hart-Smith, Ly; Moulds, Michelle L

2018-05-07

Abstract processing and observer vantage perspective have been associated with negative consequences in depression. We investigated the relationship between mode of processing and vantage perspective bidirectionally in high and low dysphoric individuals, using abstract and concrete descriptions of experimenter-provided everyday actions. When vantage perspective was manipulated and processing mode was measured (Study 1a), participants who adopted a field perspective did not differ from those who adopted an observer perspective in their preference for abstract descriptions, irrespective of dysphoria status. When processing mode was manipulated and vantage perspective was measured (Study 1b), participants provided with abstract descriptions had a greater tendency to adopt an observer perspective than those provided with concrete descriptions, irrespective of dysphoria status. These results were replicated in larger online samples (Studies 2a and 2b). Together, they indicate a unidirectional causal relationship, whereby processing mode causally influences vantage perspective, in contrast to the bidirectional relationship previously reported in an unselected sample (Libby, Shaeffer, & Eibach, 2009). Further, these findings demonstrate that abstract processing increases the likelihood of adopting an observer perspective, and support targeting abstract processing in the treatment of depression to address the negative consequences associated with both abstract processing and recalling/imagining events from an observer perspective. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Parallel and distributed processing: applications to power systems

Energy Technology Data Exchange (ETDEWEB)

Wu, Felix; Murphy, Liam [California Univ., Berkeley, CA (United States). Dept. of Electrical Engineering and Computer Sciences

1994-12-31

Applications of parallel and distributed processing to power systems problems are still in the early stages. Rapid progress in computing and communications promises a revolutionary increase in the capacity of distributed processing systems. In this paper, the state of the art in distributed processing technology and applications is reviewed and future trends are discussed. (author) 14 refs., 1 tab.
Parallel factor analysis PARAFAC of process affected water

Energy Technology Data Exchange (ETDEWEB)

Ewanchuk, A.M.; Ulrich, A.C.; Sego, D. [Alberta Univ., Edmonton, AB (Canada). Dept. of Civil and Environmental Engineering; Alostaz, M. [Thurber Engineering Ltd., Calgary, AB (Canada)

2010-07-01

A parallel factor analysis (PARAFAC) of oil sands process-affected water was presented. Naphthenic acids (NA) are traditionally described as monobasic carboxylic acids. Research has indicated that oil sands NA do not fit classical definitions of NA. Oil sands organic acids have toxic and corrosive properties. When analyzed by fluorescence technology, oil sands process-affected water displays a characteristic peak at 290 nm excitation and approximately 346 nm emission. In this study, PARAFAC was used to decompose process-affected water multi-way data into components representing analytes, chemical compounds, and groups of compounds. Water samples from various oil sands operations were analyzed in order to obtain excitation-emission matrices (EEMs). The EEMs were then arranged into a large matrix in order of decreasing process-affected water content for PARAFAC. Data were divided into 5 components. A comparison with commercially prepared NA samples suggested that oil sands NA is fundamentally different. Further research is needed to determine what each of the 5 components represents. tabs., figs.
Parallel processing based decomposition technique for efficient collaborative optimization

International Nuclear Information System (INIS)

Park, Hyung Wook; Kim, Sung Chan; Kim, Min Soo; Choi, Dong Hoon

2001-01-01

In practical design studies, most designers solve multidisciplinary problems with large and complex design systems. These multidisciplinary problems involve hundreds of analyses and thousands of variables. The sequence of processes used to solve these problems affects the speed of the total design cycle. Thus it is very important for the designer to reorder the original design processes to minimize total computational cost. This is accomplished by decomposing a large multidisciplinary problem into several MultiDisciplinary Analysis SubSystems (MDASS) and processing them in parallel. This paper proposes a new strategy for parallel decomposition of multidisciplinary problems that raises design efficiency by using a genetic algorithm, and shows the relationship between decomposition and Multidisciplinary Design Optimization (MDO) methodology.
Parallel asynchronous systems and image processing algorithms

Science.gov (United States)

Coon, D. D.; Perera, A. G. U.

1989-01-01

A new hardware approach to implementation of image processing algorithms is described. The approach is based on silicon devices which would permit an independent analog processing channel to be dedicated to every pixel. A laminar architecture consisting of a stack of planar arrays of the device would form a two-dimensional array processor with a 2-D array of inputs located directly behind a focal plane detector array. A 2-D image data stream would propagate in neuronlike asynchronous pulse coded form through the laminar processor. Such systems would integrate image acquisition and image processing. Acquisition and processing would be performed concurrently as in natural vision systems. The research is aimed at implementation of algorithms, such as the intensity dependent summation algorithm and pyramid processing structures, which are motivated by the operation of natural vision systems. Implementation of natural vision algorithms would benefit from the use of neuronlike information coding and the laminar, 2-D parallel, vision system type architecture. Besides providing a neural network framework for implementation of natural vision algorithms, a 2-D parallel approach could eliminate the serial bottleneck of conventional processing systems. Conversion to serial format would occur only after raw intensity data has been substantially processed. An interesting challenge arises from the fact that the mathematical formulation of natural vision algorithms does not specify the means of implementation, so that hardware implementation poses intriguing questions involving vision science.
The role of parallelism in the real-time processing of anaphora.

Science.gov (United States)

Poirier, Josée; Walenski, Matthew; Shapiro, Lewis P

2012-06-01

Parallelism effects refer to the facilitated processing of a target structure when it follows a similar, parallel structure. In coordination, a parallelism-related conjunction triggers the expectation that a second conjunct with the same structure as the first conjunct should occur. It has been proposed that parallelism effects reflect the use of the first structure as a template that guides the processing of the second. In this study, we examined the role of parallelism in real-time anaphora resolution by charting activation patterns in coordinated constructions containing anaphora, Verb-Phrase Ellipsis (VPE) and Noun-Phrase Traces (NP-traces). Specifically, we hypothesised that an expectation of parallelism would incite the parser to assume a structure similar to the first conjunct in the second, anaphora-containing conjunct. The speculation of a similar structure would result in early postulation of covert anaphora. Experiment 1 confirms that following a parallelism-related conjunction, first-conjunct material is activated in the second conjunct. Experiment 2 reveals that an NP-trace in the second conjunct is posited immediately where licensed, which is earlier than previously reported in the literature. In light of our findings, we propose an intricate relation between structural expectations and anaphor resolution.
The parallel processing of EGS4 code on distributed memory scalar parallel computer: Intel Paragon XP/S15-256

Energy Technology Data Exchange (ETDEWEB)

Takemiya, Hiroshi; Ohta, Hirofumi; Honma, Ichirou

1996-03-01

The parallelization of the Electro-Magnetic Cascade Monte Carlo Simulation Code, EGS4, on a distributed memory scalar parallel computer, the Intel Paragon XP/S15-256, is described. In EGS4, the calculation time varies widely from one incident particle to another because of the dynamic generation of secondary particles and the different behavior of each particle. Granularity for parallel processing, the parallel programming model and the algorithm for parallel random number generation are discussed, and two methods, which allocate particles dynamically or statically, are used to realize high speed parallel processing of this code. Among four problems chosen for performance evaluation, speedup factors of nearly 100 times were attained with 128 processors for three problems. It has been found that when both the calculation time for each incident particle and its dispersion are large, it is preferable to use the dynamic particle allocation method, which can average the load across processors. It has also been found that when they are small, it is preferable to use the static particle allocation method, which reduces the communication overhead. Moreover, it is pointed out that to obtain accurate results, it is necessary to use double precision variables in the EGS4 code. Finally, the workflow of program parallelization is analyzed, and tools for program parallelization are discussed on the basis of the experience of the EGS4 parallelization. (author).
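The trade-off between the two allocation methods described in this abstract can be illustrated with a small scheduling model. The following is a minimal sketch, not the EGS4 implementation: the function names, the round-robin rule for static allocation, and the `overhead` parameter (a stand-in for the per-request communication cost) are all assumptions made for illustration.

```python
import heapq


def static_makespan(costs, n_workers):
    """Static allocation: particle i always goes to worker i % n_workers.

    Returns the finish time of the slowest worker.
    """
    loads = [0.0] * n_workers
    for i, cost in enumerate(costs):
        loads[i % n_workers] += cost
    return max(loads)


def dynamic_makespan(costs, n_workers, overhead=0.0):
    """Dynamic allocation: the next particle goes to the first free worker.

    `overhead` is a hypothetical communication cost paid per particle request.
    """
    free_at = [0.0] * n_workers          # min-heap of times at which workers become free
    heapq.heapify(free_at)
    for cost in costs:
        start = heapq.heappop(free_at)   # earliest available worker
        heapq.heappush(free_at, start + overhead + cost)
    return max(free_at)


# One expensive particle among cheap ones: dynamic allocation evens out the load.
skewed = [10, 1, 1, 1, 1, 1, 1, 1]
skewed_static = static_makespan(skewed, 2)
skewed_dynamic = dynamic_makespan(skewed, 2)

# Uniform cheap particles: the per-request overhead makes dynamic allocation slower.
uniform = [1] * 8
uniform_static = static_makespan(uniform, 2)
uniform_dynamic = dynamic_makespan(uniform, 2, overhead=0.5)
```

In this toy model, dynamic allocation wins when per-particle costs are large and widely dispersed, and static allocation wins when they are small and uniform, which matches the qualitative finding reported in the abstract.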
ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers.

Science.gov (United States)

Xing, Yuting; Wu, Chengkun; Yang, Xi; Wang, Wei; Zhu, En; Yin, Jianping

2018-04-27

A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods on unstructured texts. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, utilizing three types of named entity recognition tasks as demonstration. Results show that, in most cases, the processing efficiency can be greatly improved with parallel processing, and the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to other tasks of biomedical text mining besides NER.
Heterogeneous Multicore Parallel Programming for Graphics Processing Units

Directory of Open Access Journals (Sweden)

Francois Bodin

2009-01-01

Hybrid parallel multicore architectures based on graphics processing units (GPUs) can provide tremendous computing power. Current NVIDIA and AMD Graphics Product Group hardware displays a peak performance of hundreds of gigaflops. However, exploiting GPUs from existing applications is a difficult task that requires non-portable rewriting of the code. In this paper, we present HMPP, a Heterogeneous Multicore Parallel Programming workbench with compilers, developed by CAPS entreprise, that allows the integration of heterogeneous hardware accelerators in an unintrusive manner while preserving the legacy code.

Highly scalable parallel processing of extracellular recordings of Multielectrode Arrays.

Science.gov (United States)

Gehring, Tiago V; Vasilaki, Eleni; Giugliano, Michele

2015-01-01

Technological advances of Multielectrode Arrays (MEAs) used for multisite, parallel electrophysiological recordings, lead to an ever increasing amount of raw data being generated. Arrays with hundreds up to a few thousands of electrodes are slowly seeing widespread use and the expectation is that more sophisticated arrays will become available in the near future. In order to process the large data volumes resulting from MEA recordings there is a pressing need for new software tools able to process many data channels in parallel. Here we present a new tool for processing MEA data recordings that makes use of new programming paradigms and recent technology developments to unleash the power of modern highly parallel hardware, such as multi-core CPUs with vector instruction sets or GPGPUs. Our tool builds on and complements existing MEA data analysis packages. It shows high scalability and can be used to speed up some performance critical pre-processing steps such as data filtering and spike detection, helping to make the analysis of larger data sets tractable.
Aspects of parallel processing and control engineering

OpenAIRE

McKittrick, Brendan J

1991-01-01

The concept of parallel processing is not a new one, but the application of it to control engineering tasks is a relatively recent development, made possible by contemporary hardware and software innovation. It has long been accepted that, if properly orchestrated several processors/CPUs when combined can form a powerful processing entity. What prevented this from being implemented in commercial systems was the adequacy of the microprocessor for most tasks and hence the expense of a multi-pro...
Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

Science.gov (United States)

Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R; Ratterman, Joseph D; Smith, Brian E

2014-11-11

Endpoint-based parallel data processing with non-blocking collective instructions in a PAMI of a parallel computer is disclosed. The PAMI is composed of data communications endpoints, each including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task. The compute nodes are coupled for data communications through the PAMI. The parallel application establishes a data communications geometry specifying a set of endpoints that are used in collective operations of the PAMI by associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.
Initial Assessment of Parallelization of Monte Carlo Calculation using Graphics Processing Units

International Nuclear Information System (INIS)

Choi, Sung Hoon; Joo, Han Gyu

2009-01-01

Monte Carlo (MC) simulation is an effective tool for calculating neutron transport in complex geometry. However, because Monte Carlo simulates the behavior of each neutron one by one, it takes a very long computing time if enough neutrons are used for high-precision calculation. Accordingly, methods that reduce the computing time are required. Parallel calculation is well suited to a Monte Carlo code, since the behavior of each neutron is simulated independently. The parallelization of Monte Carlo codes, however, has so far been done using multiple CPUs. Driven by the global demand for high quality 3D graphics, the Graphics Processing Unit (GPU) has developed into a highly parallel, multi-core processor. This parallel processing capability of GPUs can be made available to engineering computing once a suitable interface is provided. Recently, NVIDIA introduced CUDA™, a general purpose parallel computing architecture. CUDA is a software environment that allows developers to manage the GPU using C/C++ or other languages. In this work, a GPU-based Monte Carlo code is developed and an initial assessment of its parallel performance is made.
Method of parallel processing in SANPO real time system

International Nuclear Information System (INIS)

Ostrovnoj, A.I.; Salamatin, I.M.

1981-01-01

A method of parallel processing in the SANPO real time system is described. Algorithms of data accumulation and preliminary processing in this system, expressed as parallel processes in a specialized high level programming language, are described. A hierarchy of elementary processes is also described. It provides the synchronization of concurrent processes without semaphores. The developed means are applied to systems of experiment automation using SM-3 minicomputers [ru]
Regional-scale calculation of the LS factor using parallel processing

Science.gov (United States)

Liu, Kai; Tang, Guoan; Jiang, Ling; Zhu, A.-Xing; Yang, Jianyi; Song, Xiaodong

2015-05-01

With the increase of data resolution and the increasing application of USLE over large areas, the existing serial implementation of algorithms for computing the LS factor is becoming a bottleneck. In this paper, a parallel processing model based on the message passing interface (MPI) is presented for the calculation of the LS factor, so that massive datasets at a regional scale can be processed efficiently. The parallel model contains algorithms for calculating flow direction, flow accumulation, drainage network, slope, slope length and the LS factor. Depending on whether data dependence exists, the algorithms are divided into local algorithms and global algorithms. Parallel strategies are designed according to the characteristics of the algorithms, including a decomposition method for maintaining the integrity of the results, an optimized workflow for reducing the time taken for exporting unnecessary intermediate data, and a buffer-communication-computation strategy for improving the communication efficiency. Experiments on a multi-node system show that the proposed parallel model allows efficient calculation of the LS factor at a regional scale with a massive dataset.
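For the local algorithms mentioned in this abstract, a decomposition that preserves the integrity of the results can be sketched as row strips padded with a one-row buffer (halo). The example below is an illustrative assumption, not the authors' MPI code: `local_relief` is a hypothetical stand-in for a neighbourhood operator such as slope, and the loop over strips runs serially where a real implementation would assign each strip to an MPI rank.

```python
def local_relief(grid):
    """Largest elevation difference between each cell and its 4-neighbours.

    A hypothetical stand-in for a local terrain operator such as slope.
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    out[r][c] = max(out[r][c], abs(grid[r][c] - grid[rr][cc]))
    return out


def strip_with_halo(grid, start, stop, halo=1):
    """Return rows start..stop-1 plus `halo` buffer rows on each side.

    Also returns the offset of the first owned row inside the padded strip.
    """
    lo = max(0, start - halo)
    hi = min(len(grid), stop + halo)
    return grid[lo:hi], start - lo


def tiled_relief(grid, n_strips):
    """Process the grid strip by strip and stitch the owned rows back together."""
    rows = len(grid)
    result = []
    for i in range(n_strips):            # each strip could run on its own MPI rank
        start = i * rows // n_strips
        stop = (i + 1) * rows // n_strips
        sub, offset = strip_with_halo(grid, start, stop)
        part = local_relief(sub)
        # Discard the buffer rows: their values are incomplete by construction.
        result.extend(part[offset:offset + (stop - start)])
    return result


dem = [[(r * 7 + c * 3) % 11 for c in range(4)] for r in range(6)]
same = tiled_relief(dem, 3) == local_relief(dem)
```

The buffer rows give every owned cell the same neighbours it would see in the full grid, so the stitched result matches the serial one. Global algorithms such as flow accumulation have dependencies that cross strip boundaries and would need additional communication, which this sketch does not attempt.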
Real-time SHVC software decoding with multi-threaded parallel processing

Science.gov (United States)

Gudumasu, Srinivas; He, Yuwen; Ye, Yan; He, Yong; Ryu, Eun-Seok; Dong, Jie; Xiu, Xiaoyu

2014-09-01

This paper proposes a parallel decoding framework for scalable HEVC (SHVC). Various optimization technologies are implemented on the basis of SHVC reference software SHM-2.0 to achieve real-time decoding speed for the two layer spatial scalability configuration. SHVC decoder complexity is analyzed with profiling information. The decoding process at each layer and the up-sampling process are designed in parallel and scheduled by a high level application task manager. Within each layer, multi-threaded decoding is applied to accelerate the layer decoding speed. Entropy decoding, reconstruction, and in-loop processing are designed as a pipeline with multiple threads based on groups of coding tree units (CTUs). A group of CTUs is treated as a processing unit in each pipeline stage to achieve a better trade-off between parallelism and synchronization. Motion compensation, inverse quantization, and inverse transform modules are further optimized with SSE4 SIMD instructions. Simulations on a desktop with an Intel i7 processor 2600 running at 3.4 GHz show that the parallel SHVC software decoder is able to decode 1080p spatial 2x at up to 60 fps (frames per second) and 1080p spatial 1.5x at up to 50 fps for those bitstreams generated with SHVC common test conditions in the JCT-VC standardization group. The decoding performance at various bitrates with different optimization technologies and different numbers of threads is compared in terms of decoding speed and resource usage, including processor and memory.
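The pipeline organisation described in this abstract, with one thread per stage and a group of CTUs as the unit handed between stages, can be sketched with standard queues. This is a generic illustration under stated assumptions, not the SHM-based decoder: `run_pipeline` and the stage callables are hypothetical, and real stages would be entropy decoding, reconstruction and in-loop processing rather than arithmetic.

```python
import queue
import threading


def run_pipeline(ctus, group_size, stages):
    """Push groups of CTUs through `stages`, one thread per stage.

    A larger `group_size` means fewer queue hand-offs (less synchronization)
    but coarser overlap between stages (less parallelism).
    """
    done = object()                                    # sentinel marking end of stream
    queues = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(stage, inbox, outbox):
        while True:
            group = inbox.get()
            if group is done:
                outbox.put(done)                       # pass the sentinel downstream
                return
            outbox.put([stage(ctu) for ctu in group])

    threads = [
        threading.Thread(target=worker, args=(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for thread in threads:
        thread.start()

    for start in range(0, len(ctus), group_size):
        queues[0].put(ctus[start:start + group_size])
    queues[0].put(done)

    out = []
    while True:
        group = queues[-1].get()
        if group is done:
            break
        out.extend(group)
    for thread in threads:
        thread.join()
    return out


# Stand-in stages; each one only transforms a number.
demo_stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
decoded = run_pipeline(list(range(10)), group_size=4, stages=demo_stages)
```

Because each stage has a single thread and the queues are first-in, first-out, groups leave the pipeline in the order they entered. The sketch does not handle a stage that raises an exception, which a production design would need to address.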
Application of parallel processing for automatic inspection of printed circuits

International Nuclear Information System (INIS)

Lougheed, R.M.

1986-01-01

Automated visual inspection of printed electronic circuits is a challenging application for image processing systems. Detailed inspection requires high speed analysis of gray scale imagery along with high quality optics, lighting, and sensing equipment. A prototype system has been developed and demonstrated at the Environmental Research Institute of Michigan (ERIM) for inspection of multilayer thick-film circuits. The central problem of real-time image processing is solved by a special-purpose parallel processor which includes a new high-speed Cytocomputer. In this chapter the inspection process and the algorithms used are summarized, along with the functional requirements of the machine vision system. Next, the parallel processor is described in detail and then performance on this application is given
Linear parallel processing machines I

Energy Technology Data Exchange (ETDEWEB)

Von Kunze, M

1984-01-01

As is well-known, non-context-free grammars for generating formal languages happen to be of a certain intrinsic computational power that presents serious difficulties to efficient parsing algorithms as well as for the development of an algebraic theory of context-sensitive languages. In this paper a framework is given for the investigation of the computational power of formal grammars, in order to start a thorough analysis of grammars consisting of derivation rules of the form $aB \to A_1 \cdots A_n b_1 \cdots b_m$. These grammars may be thought of as automata by means of parallel processing, if one considers the variables as operators acting on the terminals while reading them right-to-left. This kind of automata and their 2-dimensional programming language prove to be useful by allowing a concise linear-time algorithm for integer multiplication. Linear parallel processing machines (LP-machines) which are, in their general form, equivalent to Turing machines, include finite automata and pushdown automata (with states encoded) as special cases. Bounded LP-machines yield deterministic accepting automata for nondeterministic context-free languages, and they define an interesting class of context-sensitive languages. A characterization of this class in terms of generating grammars is established by using derivation trees with crossings as a helpful tool. From the algebraic point of view, deterministic LP-machines are effectively represented semigroups with distinguished subsets. Concerning the dualism between generating and accepting devices of formal languages within the algebraic setting, the concept of accepting automata turns out to reduce essentially to embeddability in an effectively represented extension monoid, even in the classical cases.
Parallel Careers and their Consequences for Companies in Brazil

Directory of Open Access Journals (Sweden)

Maria Candida Baumer Azevedo

2014-04-01

Given the relevance of the need to manage parallel careers to attract and retain people in organizations, this paper provides insight into this phenomenon from an organizational perspective. The parallel career concept, introduced by Alboher (2007) and recently addressed by Schuiling (2012), has previously been examined only from the perspective of the parallel career holder (PC holder). The paper provides insight from both individual and organizational perspectives on the phenomenon of parallel careers and considers how it can function as an important tool for attracting and retaining people by contributing to human development. This paper employs a qualitative approach that includes 30 semi-structured one-on-one interviews. The organizational perspective arises from the 15 interviews with human resources (HR) executives from different companies. The individual viewpoint originates from the interviews with 15 executives who are also PC holders. An inductive content analysis approach was used to examine Brazilian companies and the Brazilian office of multinationals. Companies that are concerned about having the best talent on their teams can benefit from a deeper understanding of parallel careers, which can be used to attract, develop, and retain talent. Limitations and directions for future research are discussed.
Big Data GPU-Driven Parallel Processing Spatial and Spatio-Temporal Clustering Algorithms

Science.gov (United States)

Konstantaras, Antonios; Skounakis, Emmanouil; Kilty, James-Alexander; Frantzeskakis, Theofanis; Maravelakis, Emmanuel

2016-04-01

Advances in graphics processing units' technology towards encompassing parallel architectures [1], comprised of thousands of cores and multiples of parallel threads, provide the foundation in terms of hardware for the rapid processing of various parallel applications regarding seismic big data analysis. Seismic data are normally stored as collections of vectors in massive matrices, growing rapidly in size as wider areas are covered, denser recording networks are being established and decades of data are being compiled together [2]. Yet, many processes regarding seismic data analysis are performed on each seismic event independently or as distinct tiles [3] of specific grouped seismic events within a much larger data set. Such processes, being independent of one another, can be performed in parallel, narrowing down processing times drastically [1,3]. This research work presents the development and implementation of three parallel processing algorithms using Cuda C [4] for the investigation of potentially distinct seismic regions [5,6] present in the vicinity of the southern Hellenic seismic arc. The algorithms, programmed and executed in parallel comparatively, are: fuzzy k-means clustering with expert knowledge [7] in assigning overall clusters' number; density-based clustering [8]; and a self-developed spatio-temporal clustering algorithm encompassing expert [9] and empirical knowledge [10] for the specific area under investigation. Indexing terms: GPU parallel programming, Cuda C, heterogeneous processing, distinct seismic regions, parallel clustering algorithms, spatio-temporal clustering References [1] Kirk, D. and Hwu, W.: 'Programming massively parallel processors - A hands-on approach', 2nd Edition, Morgan Kaufman Publisher, 2013 [2] Konstantaras, A., Valianatos, F., Varley, M.R. and Makris, J.P.: 'Soft-Computing Modelling of Seismicity in the Southern Hellenic Arc', Geoscience and Remote Sensing Letters, vol. 5 (3), pp. 323-327, 2008 [3] Papadakis, S. and
Digital intermediate frequency QAM modulator using parallel processing

Science.gov (United States)

Pao, Hsueh-Yuan [Livermore, CA]; Tran, Binh-Nien [San Ramon, CA]

2008-05-27

The digital Intermediate Frequency (IF) modulator applies to various modulation types and offers a simple and low cost method to implement a high-speed digital IF modulator using field programmable gate arrays (FPGAs). The architecture eliminates multipliers and sequential processing by storing the pre-computed modulated cosine and sine carriers in ROM look-up-tables (LUTs). The high-speed input data stream is parallel processed using the corresponding LUTs, which reduces the main processing speed, allowing the use of low cost FPGAs.
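The look-up-table idea in this abstract can be illustrated in software: if the carrier samples for every amplitude level are multiplied out in advance, modulation reduces to table reads and one subtraction per sample. The sketch below assumes a 16-QAM constellation, eight samples per symbol, and one carrier cycle per symbol; none of these parameters come from the source, and the table and function names are hypothetical.

```python
import math

SAMPLES_PER_SYMBOL = 8        # assumed: one carrier cycle spans one symbol
LEVELS = (-3, -1, 1, 3)       # assumed 16-QAM amplitude levels on each axis

# "ROM" contents: one pre-multiplied carrier period per amplitude level.
COS_LUT = {
    level: [level * math.cos(2 * math.pi * n / SAMPLES_PER_SYMBOL)
            for n in range(SAMPLES_PER_SYMBOL)]
    for level in LEVELS
}
SIN_LUT = {
    level: [level * math.sin(2 * math.pi * n / SAMPLES_PER_SYMBOL)
            for n in range(SAMPLES_PER_SYMBOL)]
    for level in LEVELS
}


def modulate(symbols):
    """Map (I, Q) level pairs to IF samples using table reads only.

    Each symbol yields I*cos(wt) - Q*sin(wt) with no multiplication at run time.
    """
    out = []
    for i_level, q_level in symbols:
        cos_row = COS_LUT[i_level]    # whole row fetched at once, as parallel hardware could
        sin_row = SIN_LUT[q_level]
        out.extend(c - s for c, s in zip(cos_row, sin_row))
    return out


waveform = modulate([(3, 1), (-1, -3)])
```

Because a whole row of samples is available per table read, hardware can emit several output samples per clock cycle, which is one plausible reading of how the design lowers the required processing speed.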
Parallel processing is good for your scientific codes...But massively parallel processing is so much better

International Nuclear Information System (INIS)

Thomas, B.; Domain, Ch.; Souffez, Y.; Eon-Duval, P.

1998-01-01

Harnessing the power of many computers to solve difficult scientific problems concurrently is one of the most innovative trends in High Performance Computing. At EDF, we have invested in parallel computing and have achieved significant results. First we improved the processing speed of strategic codes, in order to extend their scope. Then we turned to numerical simulations at the atomic scale. These computations, which we had never dreamt of before, provided us with a better understanding of metallurgical phenomena. More precisely, we were able to trace defects in alloys that are used in nuclear power plants. (author)
Adapting high-level language programs for parallel processing using data flow

Science.gov (United States)

Standley, Hilda M.

1988-01-01

EASY-FLOW, a very high-level data flow language, is introduced for the purpose of adapting programs written in a conventional high-level language to a parallel environment. The level of parallelism provided is of the large-grained variety in which parallel activities take place between subprograms or processes. A program written in EASY-FLOW is a set of subprogram calls as units, structured by iteration, branching, and distribution constructs. A data flow graph may be deduced from an EASY-FLOW program.
Design of a family of integrated parallel co-processors for images processing

International Nuclear Information System (INIS)

Court, Thierry

1991-01-01

The design of parallel image processing systems that combine sophisticated microprocessors and specialised operators in the same architecture is a difficult task, because of the various problems to be taken into account. The current study identifies a way of realizing such dedicated operators and interfacing them to a microprocessor-type central unit. The two guidelines of this work are the search for polyvalent, specialized and reconfigurable operators, and their connection to a system bus rather than to specialized video buses. This research work proposes an architecture of circuits dedicated to image processing and two proposals for realizing them. One of them was realized in this study by using silicon compiler tools. This work belongs to a larger project, whose aim is the development of a high-performance, modular industrial image processing system based on the parallelization, in MIMD structures, of an elementary, autonomous image processing unit integrating a microprocessor equipped with a parallel coprocessor suited to image processing. (author) [fr]
An educational tool for interactive parallel and distributed processing

DEFF Research Database (Denmark)

Pagliarini, Luigi; Lund, Henrik Hautop

2012-01-01

In this article we try to describe how the modular interactive tiles system (MITS) can be a valuable tool for introducing students to interactive parallel and distributed processing programming. This is done by providing a hands-on educational tool that allows a change in the representation...... of abstract problems related to designing interactive parallel and distributed systems. Indeed, the MITS seems to bring a series of goals into education, such as parallel programming, distributedness, communication protocols, master dependency, software behavioral models, adaptive interactivity, feedback......, connectivity, topology, island modeling, and user and multi-user interaction which can rarely be found in other tools. Finally, we introduce the system of modular interactive tiles as a tool for easy, fast, and flexible hands-on exploration of these issues, and through examples we show how to implement...
Parallel Processing of Images in Mobile Devices using BOINC

Science.gov (United States)

Curiel, Mariela; Calle, David F.; Santamaría, Alfredo S.; Suarez, David F.; Flórez, Leonardo

2018-04-01

Medical image processing helps health professionals make decisions for the diagnosis and treatment of patients. Since some algorithms for processing images require substantial amounts of resources, one could take advantage of distributed or parallel computing. A mobile grid can be an adequate computing infrastructure for this problem. A mobile grid is a grid that includes mobile devices as resource providers. In a previous step of this research, we selected BOINC as the infrastructure to build our mobile grid. However, parallel processing of images in mobile devices poses at least two important challenges: the execution of standard libraries for processing images and obtaining adequate performance when compared to desktop computer grids. By the time we started our research, the use of BOINC in mobile devices also involved two issues: a) the execution of programs in mobile devices required modifying the code to insert calls to the BOINC API, and b) the division of the image among the mobile devices, as well as its merging, required additional code in some BOINC components. This article presents answers to these four challenges.
Parallel Processing of Images in Mobile Devices using BOINC

Directory of Open Access Journals (Sweden)

Curiel Mariela

2018-04-01

Medical image processing helps health professionals make decisions for the diagnosis and treatment of patients. Since some algorithms for processing images require substantial amounts of resources, one could take advantage of distributed or parallel computing. A mobile grid can be an adequate computing infrastructure for this problem. A mobile grid is a grid that includes mobile devices as resource providers. In a previous step of this research, we selected BOINC as the infrastructure to build our mobile grid. However, parallel processing of images in mobile devices poses at least two important challenges: the execution of standard libraries for processing images and obtaining adequate performance when compared to desktop computer grids. By the time we started our research, the use of BOINC in mobile devices also involved two issues: a) the execution of programs in mobile devices required modifying the code to insert calls to the BOINC API, and b) the division of the image among the mobile devices, as well as its merging, required additional code in some BOINC components. This article presents answers to these four challenges.
A new decomposition method for parallel processing multi-level optimization

International Nuclear Information System (INIS)

Park, Hyung Wook; Kim, Min Soo; Choi, Dong Hoon

2002-01-01

In practical designs, most multidisciplinary problems involve a large and complicated design system. Since multidisciplinary problems have hundreds of analyses and thousands of variables, the grouping of analyses and the order of the analyses in each group affect the speed of the total design cycle. Therefore, it is very important to reorder and regroup the original design processes in order to minimize the total computational cost by decomposing large multidisciplinary problems into several MultiDisciplinary Analysis SubSystems (MDASS) and by processing them in parallel. In this study, a new decomposition method is proposed for parallel processing of multidisciplinary design optimization methods, such as Collaborative Optimization (CO) and the Individual Discipline Feasible (IDF) method. Numerical results for two example problems are presented to show the feasibility of the proposed method.
Parallel transaction processing in functional languages, towards practical functional databases

NARCIS (Netherlands)

Wevers, L.; Huisman, Marieke; de Keijzer, Ander

2013-01-01

This paper shows how functional languages can be adapted for transaction processing, and discusses the implementation of a parallel runtime system for such functional transaction processing languages. We extend functional languages with current state variables and result state variables to allow the

Parallel processing data network of master and slave transputers controlled by a serial control network

Science.gov (United States)

Crosetto, Dario B.

1996-01-01

The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor (100) to a plurality of slave processors (200) to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer (104), a digital signal processor (114), a parallel transfer controller (106), and two three-port memory devices. A communication switch (108) within each node (100) connects it to a fast parallel hardware channel (70) through which all high density data arrives or leaves the node.
Parallel processing method for high-speed real time digital pulse processing for gamma-ray spectroscopy

International Nuclear Information System (INIS)

Fernandes, A.M.; Pereira, R.C.; Sousa, J.; Neto, A.; Carvalho, P.; Batista, A.J.N.; Carvalho, B.B.; Varandas, C.A.F.; Tardocchi, M.; Gorini, G.

2010-01-01

A new data acquisition (DAQ) system was developed to fulfil the requirements of the gamma-ray spectrometer (GRS) JET-EP2 (Joint European Torus enhancement project 2), providing high-resolution spectroscopy at very high count rates (up to a few MHz). The system is based on the Advanced Telecommunications Computing Architecture™ (ATCA™) and includes a transient record (TR) module with 8 channels of 14-bit resolution at a 400 MSamples/s (MSPS) sampling rate, 4 GB of local memory, and 2 field programmable gate arrays (FPGAs) able to perform real time algorithms for data reduction and digital pulse processing. Although at 400 MSPS only fast programmable devices such as FPGAs can be used for both data processing and data transfer, FPGA resources also present speed limitations in some specific tasks, leading to unavoidable data loss when demanding algorithms are applied. To overcome this problem, and foreseeing an increase in algorithm complexity, a new digital parallel filter was developed, aiming to perform real time pulse processing in the FPGAs of the TR module at the stated sampling rate. The filter is based on the conventional digital time-invariant trapezoidal shaper operating with parallelized data while performing pulse height analysis (PHA) and pile up rejection (PUR). The incoming sampled data is successively parallelized and fed into the processing algorithm block at one fourth of the sampling rate. The subsequent data processing and data transfer are also performed at one fourth of the sampling rate. The algorithm based on the data parallelization technique was implemented and tested at the JET facilities, where a spectrum was obtained. In view of the observed results, the PHA algorithm will be improved by implementing pulse pile up discrimination.
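The data parallelization idea in this abstract can be sketched in software as a shaper whose main loop takes four samples per iteration, so the loop itself runs at one fourth of the sample rate. The following is a simplified model, not the JET firmware: it assumes ideal step-like input pulses, omits the decay correction used by practical trapezoidal shapers, and all function and parameter names are hypothetical.

```python
def shaper_output(samples, n, rise, flat):
    """Trapezoidal shaper value at index n.

    Computed as the difference of two moving averages of length `rise`,
    separated by a gap of `flat` samples. Samples before index 0 count as zero.
    """
    def at(i):
        return samples[i] if i >= 0 else 0.0

    lead = sum(at(n - i) for i in range(rise))
    lag = sum(at(n - rise - flat - i) for i in range(rise))
    return (lead - lag) / rise


def shape_serial(samples, rise, flat):
    """Reference version: one output per iteration, loop runs at the sample rate."""
    return [shaper_output(samples, n, rise, flat) for n in range(len(samples))]


def shape_parallel(samples, rise, flat, lanes=4):
    """Each iteration ("clock tick") consumes `lanes` samples.

    The lanes are independent of one another, so hardware could evaluate them
    side by side. Returns the outputs and the number of ticks used.
    """
    out = []
    ticks = 0
    for base in range(0, len(samples), lanes):
        last = min(base + lanes, len(samples))
        out.extend(shaper_output(samples, n, rise, flat) for n in range(base, last))
        ticks += 1
    return out, ticks


# A step of height 5 arriving at sample 10; the shaped pulse peaks at that height.
pulse = [0] * 10 + [5] * 30
shaped, tick_count = shape_parallel(pulse, rise=4, flat=2)
pulse_height = max(shaped)
```

The parallel version produces the same output as the serial one while using one quarter as many loop iterations. Pulse height analysis then reduces to reading the flat top of the trapezoid, taken here as the maximum of the shaped signal.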
A learnable parallel processing architecture towards unity of memory and computing.

Science.gov (United States)

Li, H; Gao, B; Chen, Z; Zhao, Y; Huang, P; Ye, H; Liu, L; Liu, X; Kang, J

2015-08-14

Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named "iMemComp", where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped "iMemComp" with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on "iMemComp" can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area.
A learnable parallel processing architecture towards unity of memory and computing

Science.gov (United States)

Li, H.; Gao, B.; Chen, Z.; Zhao, Y.; Huang, P.; Ye, H.; Liu, L.; Liu, X.; Kang, J.

2015-08-01

Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named “iMemComp”, where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped “iMemComp” with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on “iMemComp” can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area.
Connectionism, parallel constraint satisfaction processes, and gestalt principles: (re) introducing cognitive dynamics to social psychology.

Science.gov (United States)

Read, S J; Vanman, E J; Miller, L C

1997-01-01

We argue that recent work in connectionist modeling, in particular the parallel constraint satisfaction processes that are central to many of these models, has great importance for understanding issues of both historical and current concern for social psychologists. We first provide a brief description of connectionist modeling, with particular emphasis on parallel constraint satisfaction processes. Second, we examine the tremendous similarities between parallel constraint satisfaction processes and the Gestalt principles that were the foundation for much of modern social psychology. We propose that parallel constraint satisfaction processes provide a computational implementation of the principles of Gestalt psychology that were central to the work of such seminal social psychologists as Asch, Festinger, Heider, and Lewin. Third, we then describe how parallel constraint satisfaction processes have been applied to three areas that were key to the beginnings of modern social psychology and remain central today: impression formation and causal reasoning, cognitive consistency (balance and cognitive dissonance), and goal-directed behavior. We conclude by discussing implications of parallel constraint satisfaction principles for a number of broader issues in social psychology, such as the dynamics of social thought and the integration of social information within the narrow time frame of social interaction.
Load balancing in highly parallel processing of Monte Carlo code for particle transport

International Nuclear Information System (INIS)

Higuchi, Kenji; Takemiya, Hiroshi; Kawasaki, Takuji

1998-01-01

In parallel processing of Monte Carlo (MC) codes for neutron, photon and electron transport problems, particle histories are assigned to processors, making use of the independence of the calculation for each particle. Although the main part of an MC code can easily be parallelized by this method, it is necessary, and in practice difficult, to optimize the code with respect to load balancing in order to attain a high speedup ratio in highly parallel processing. In fact, the speedup ratio with 128 processors remains at nearly one hundred when using the test bed for performance evaluation. Through parallel processing of the MCNP code, which is widely used in the nuclear field, it is shown that it is difficult to attain high performance by static load balancing, especially in neutron transport problems, and that a load balancing method which dynamically changes the number of assigned particles so as to minimize the sum of the computational and communication costs overcomes the difficulty, resulting in a reduction of nearly fifteen percent in execution time. (author)
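The contrast between static and dynamic load balancing in this abstract can be illustrated with a toy Python simulation. This is a minimal sketch under stated assumptions: the per-history costs are made-up numbers, communication cost is ignored, and the function names are hypothetical. It is not the method implemented for MCNP.

```python
import heapq

def static_makespan(costs, workers):
    """Assign histories in equal contiguous blocks up front; return the slowest worker's time."""
    block = -(-len(costs) // workers)  # ceiling division
    loads = [sum(costs[i:i + block]) for i in range(0, len(costs), block)]
    return max(loads)

def dynamic_makespan(costs, workers, batch=1):
    """Hand out batches on demand: the next batch goes to the worker that becomes free first."""
    finish = [0.0] * workers           # time at which each worker becomes free
    heapq.heapify(finish)
    for i in range(0, len(costs), batch):
        free_at = heapq.heappop(finish)
        heapq.heappush(finish, free_at + sum(costs[i:i + batch]))
    return max(finish)

# Hypothetical workload: four expensive histories followed by four cheap ones.
costs = [10, 10, 10, 10, 1, 1, 1, 1]
static_time = static_makespan(costs, workers=2)
dynamic_time = dynamic_makespan(costs, workers=2, batch=1)
```

With these assumed costs the static split gives one worker all the expensive histories, while on-demand assignment spreads them out. A smaller `batch` balances load better but would raise communication cost in a real run, which is the trade-off the abstract refers to.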
Parallel processing from applications to systems

CERN Document Server

Moldovan, Dan I

1993-01-01

This text provides one of the broadest presentations of parallel processing available, including the structure of parallel processors and parallel algorithms. The emphasis is on mapping algorithms to highly parallel computers, with extensive coverage of array and multiprocessor architectures. Early chapters provide insightful coverage on the analysis of parallel algorithms and program transformations, effectively integrating a variety of material previously scattered throughout the literature. Theory and practice are well balanced across diverse topics in this concise presentation. For exceptional cla
Toward a model framework of generalized parallel componential processing of multi-symbol numbers.

Science.gov (United States)

Huber, Stefan; Cornelsen, Sonja; Moeller, Korbinian; Nuerk, Hans-Christoph

2015-05-01

In this article, we propose and evaluate a new model framework of parallel componential multi-symbol number processing, generalizing the idea of parallel componential processing of multi-digit numbers to the case of negative numbers by considering the polarity signs similar to single digits. In a first step, we evaluated this account by defining and investigating a sign-decade compatibility effect for the comparison of positive and negative numbers, which extends the unit-decade compatibility effect in 2-digit number processing. Then, we evaluated whether the model is capable of accounting for previous findings in negative number processing. In a magnitude comparison task, in which participants had to single out the larger of 2 integers, we observed a reliable sign-decade compatibility effect with prolonged reaction times for incompatible (e.g., -97 vs. +53; in which the number with the larger decade digit has the smaller, i.e., negative polarity sign) as compared with sign-decade compatible number pairs (e.g., -53 vs. +97). Moreover, an analysis of participants' eye fixation behavior corroborated our model of parallel componential processing of multi-symbol numbers. These results are discussed in light of concurrent theoretical notions about negative number processing. On the basis of the present results, we propose a generalized integrated model framework of parallel componential multi-symbol processing. (c) 2015 APA, all rights reserved.
Preliminary Study on the Enhancement of Reconstruction Speed for Emission Computed Tomography Using Parallel Processing

International Nuclear Information System (INIS)

Park, Min Jae; Lee, Jae Sung; Kim, Soo Mee; Kang, Ji Yeon; Lee, Dong Soo; Park, Kwang Suk

2009-01-01

Conventional image reconstruction uses simplified physical models of projection. However, realistic physical models, for example in 3D reconstruction, take too long to process all the data in the clinic and cannot be run on a common reconstruction machine because of the large memory required for complex physical models. We propose a realistic distributed-memory model of fast reconstruction using parallel processing on personal computers to enable such large-scale techniques. Preliminary feasibility tests on virtual machines and various performance tests on the commercial supercomputer Tachyon were performed. The expectation maximization algorithm was tested with a common 2D projection and with realistic 3D lines of response. Since processing slowed down (by up to 6 times) after a certain iteration, compiler optimization was performed to maximize the efficiency of parallelization. Parallel processing of a program on multiple computers was available on Linux with MPICH and NFS. We verified that the differences between the parallel-processed image and the single-processed image at the same iterations were below the significant digits of a floating point number, about 6 bit. Two processors showed good efficiency (1.96 times) of parallel computing. The delay phenomenon was solved by vectorization using SSE. Through this study, a realistic parallel computing system for the clinic was established, able to reconstruct with plenty of memory using realistic physical models that were impossible to simplify.
Parallel processing for nonlinear dynamics simulations of structures including rotating bladed-disk assemblies

Science.gov (United States)

Hsieh, Shang-Hsien

1993-01-01

The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.
Surface topography of parallel grinding process for nonaxisymmetric aspheric lens

International Nuclear Information System (INIS)

Zhang Ningning; Wang Zhenzhong; Pan Ri; Wang Chunjin; Guo Yinbiao

2012-01-01

Workpiece surface profile, texture and roughness can be predicted by modeling the topography of the wheel surface and the kinematics of the grinding process, which together form an important part of precision grinding process theory. Parallel grinding technology is an important method for nonaxisymmetric aspheric lens machining, but there are few reports on relevant simulation. In this paper, a simulation method based on parallel grinding for precision machining of aspheric lenses is proposed. The method combines modeling of the random wheel surface with modeling of the single grain track based on arc wheel contact points. Then, a mathematical algorithm for surface topography is proposed and applied under different machining parameters. The consistency between the results of simulation and test shows that the algorithm is correct and efficient. (authors)
A New Tool for Intelligent Parallel Processing of Radar/SAR Remotely Sensed Imagery

Directory of Open Access Journals (Sweden)

A. Castillo Atoche

2013-01-01

A novel parallel tool for large-scale image enhancement/reconstruction and postprocessing of radar/SAR sensor systems is addressed. The proposed parallel tool performs the following intelligent processing steps: image formation, for the application of different system-level effects of image degradation with a particular remote sensing (RS) system and simulation of random noising effects, enhancement/reconstruction by employing nonparametric robust high-resolution techniques, and image postprocessing using the fuzzy anisotropic diffusion technique which incorporates a better edge-preserving noise removal effect and faster diffusion process. This innovative tool allows the processing of high-resolution images provided with different radar/SAR sensor systems as required by RS endusers for environmental monitoring, risk prevention, and resource management. To verify the performance implementation of the proposed parallel framework, the processing steps are developed and specifically tested on graphic processing units (GPU), achieving considerable speedups compared to the serial version of the same techniques implemented in C language.
Processing optimization with parallel computing for the J-PET scanner

Directory of Open Access Journals (Sweden)

Krzemień Wojciech

2015-12-01

The Jagiellonian Positron Emission Tomograph (J-PET) collaboration is developing a prototype time of flight (TOF) positron emission tomograph (PET) detector based on long polymer scintillators. This novel approach exploits the excellent time properties of the plastic scintillators, which permit very precise time measurements. The very fast field programmable gate array (FPGA)-based front-end electronics and the data acquisition system, as well as low- and high-level reconstruction algorithms were specially developed to be used with the J-PET scanner. The TOF-PET data processing and reconstruction are time and resource demanding operations, especially in the case of a large acceptance detector that works in triggerless data acquisition mode. In this article, we discuss the parallel computing methods applied to optimize the data processing for the J-PET detector. We begin with general concepts of parallel computing and then we discuss several applications of those techniques in the J-PET data processing.
Implementation of parallel processing in the basf2 framework for Belle II

International Nuclear Information System (INIS)

Itoh, Ryosuke; Lee, Soohyung; Katayama, N; Mineo, S; Moll, A; Kuhr, T; Heck, M

2012-01-01

Recent PC servers are equipped with multi-core CPUs, and it is desirable to utilize their full processing power for data analysis in large-scale HEP experiments. The software framework basf2 is being developed for use in the Belle II experiment, a new-generation B-factory experiment at KEK, and parallel event processing that utilizes multi-core CPUs is part of its design for use in massive data production. The details of the implementation of event parallel processing in the basf2 framework are discussed, together with a report of a preliminary performance study under realistic use on a 32-core PC server.
Age-related perspectives and emotion processing.

Science.gov (United States)

Lynchard, Nicholas A; Radvansky, Gabriel A

2012-12-01

Emotion is processed differently in younger and older adults. Older adults show a positivity effect, whereas younger adults show a negativity effect. Socioemotional selectivity theory suggests that these effects can be elicited in any age group when age-related perspectives are manipulated. To examine this, younger and older adults were oriented to actual and age-contrasting possible selves. Emotion activations were assessed using lexical decision. In line with socioemotional selectivity theory, shifts in emotion orientation varied according to perspective, with both younger and older adults showing a negativity effect when a younger adult perspective was taken and a positivity effect when an older adult perspective was taken. 2013 APA, all rights reserved
A dataflow analysis tool for parallel processing of algorithms

Science.gov (United States)

Jones, Robert L., III

1993-01-01

A graph-theoretic design process and software tool is presented for selecting a multiprocessing scheduling solution for a class of computational problems. The problems of interest are those that can be described using a dataflow graph and are intended to be executed repetitively on a set of identical parallel processors. Typical applications include signal processing and control law problems. Graph analysis techniques are introduced and shown to effectively determine performance bounds, scheduling constraints, and resource requirements. The software tool is shown to facilitate the application of the design process to a given problem.
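To give a concrete sense of the kind of performance bound a dataflow graph yields, the sketch below computes two standard lower bounds on the time to run one iteration of a task graph on identical processors: the critical path length, and the total work divided by the number of processors. The task names and durations are made-up assumptions, and the code handles only an acyclic graph for a single iteration; it is an illustration of the general idea, not the tool described in this abstract.

```python
from graphlib import TopologicalSorter

def critical_path(durations, preds):
    """Longest path through the task graph: no schedule can finish sooner than this."""
    graph = {node: preds.get(node, ()) for node in durations}
    finish = {}
    for node in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in graph[node]), default=0)
        finish[node] = start + durations[node]
    return max(finish.values())

def time_lower_bound(durations, preds, processors):
    """Larger of the critical path and the evenly divided total work."""
    total_work = sum(durations.values())
    return max(critical_path(durations, preds), total_work / processors)

# Hypothetical graph: task c needs a and b, task d needs c.
durations = {"a": 2, "b": 3, "c": 1, "d": 4}
preds = {"c": ["a", "b"], "d": ["c"]}
bound_two = time_lower_bound(durations, preds, processors=2)
bound_one = time_lower_bound(durations, preds, processors=1)
```

In this assumed example a second processor cannot push the run time below the critical path, which is the sort of scheduling constraint a graph analysis makes visible before any code is run.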
The Processing of Somatosensory Information shifts from an early parallel into a serial processing mode: a combined fMRI/MEG study.

Directory of Open Access Journals (Sweden)

Carsten Michael Klingner

2016-12-01

The question regarding whether somatosensory inputs are processed in parallel or in series has not been clearly answered. Several studies that have applied dynamic causal modeling (DCM) to fMRI data have arrived at seemingly divergent conclusions. However, these divergent results could be explained by the hypothesis that the processing route of somatosensory information changes with time. Specifically, we suggest that somatosensory stimuli are processed in parallel only during the early stage, whereas the processing is later dominated by serial processing. This hypothesis was revisited in the present study based on fMRI analyses of tactile stimuli and the application of DCM to magnetoencephalographic (MEG) data collected during sustained (260 ms) tactile stimulation. Bayesian model comparisons were used to infer the processing stream. We demonstrated that the favored processing stream changes over time. We found that the neural activity elicited in the first 100 ms following somatosensory stimuli is best explained by models that support a parallel processing route, whereas a serial processing route is subsequently favored. These results suggest that the secondary somatosensory area (SII) receives information regarding a new stimulus in parallel with the primary somatosensory area (SI), whereas later processing in the SII is dominated by the preprocessed input from the SI.
The Processing of Somatosensory Information Shifts from an Early Parallel into a Serial Processing Mode: A Combined fMRI/MEG Study.

Science.gov (United States)

Klingner, Carsten M; Brodoehl, Stefan; Huonker, Ralph; Witte, Otto W

2016-01-01

The question regarding whether somatosensory inputs are processed in parallel or in series has not been clearly answered. Several studies that have applied dynamic causal modeling (DCM) to fMRI data have arrived at seemingly divergent conclusions. However, these divergent results could be explained by the hypothesis that the processing route of somatosensory information changes with time. Specifically, we suggest that somatosensory stimuli are processed in parallel only during the early stage, whereas the processing is later dominated by serial processing. This hypothesis was revisited in the present study based on fMRI analyses of tactile stimuli and the application of DCM to magnetoencephalographic (MEG) data collected during sustained (260 ms) tactile stimulation. Bayesian model comparisons were used to infer the processing stream. We demonstrated that the favored processing stream changes over time. We found that the neural activity elicited in the first 100 ms following somatosensory stimuli is best explained by models that support a parallel processing route, whereas a serial processing route is subsequently favored. These results suggest that the secondary somatosensory area (SII) receives information regarding a new stimulus in parallel with the primary somatosensory area (SI), whereas later processing in the SII is dominated by the preprocessed input from the SI.
Partitioning sparse rectangular matrices for parallel processing

Energy Technology Data Exchange (ETDEWEB)

Kolda, T.G.

1998-05-01

The authors are interested in partitioning sparse rectangular matrices for parallel processing. The partitioning problem has been well-studied in the square symmetric case, but the rectangular problem has received very little attention. They will formalize the rectangular matrix partitioning problem and discuss several methods for solving it. They will extend the spectral partitioning method for symmetric matrices to the rectangular case and compare this method to three new methods -- the alternating partitioning method and two hybrid methods. The hybrid methods will be shown to be best.
Managing internode data communications for an uninitialized process in a parallel computer

Science.gov (United States)

Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Parker, Jeffrey J; Ratterman, Joseph D; Smith, Brian E

2014-05-20

A parallel computer includes nodes, each having main memory and a messaging unit (MU). Each MU includes computer memory, which in turn includes MU message buffers. Each MU message buffer is associated with an uninitialized process on the compute node. In the parallel computer, managing internode data communications for an uninitialized process includes: receiving, by an MU of a compute node, one or more data communications messages in an MU message buffer associated with an uninitialized process on the compute node; determining, by an application agent, that the MU message buffer associated with the uninitialized process is full prior to initialization of the uninitialized process; establishing, by the application agent, a temporary message buffer for the uninitialized process in main computer memory; and moving, by the application agent, data communications messages from the MU message buffer associated with the uninitialized process to the temporary message buffer in main computer memory.

Parallel processing of genomics data

Science.gov (United States)

Agapito, Giuseppe; Guzzi, Pietro Hiram; Cannataro, Mario

2016-10-01

The availability of high-throughput experimental platforms for the analysis of biological samples, such as mass spectrometry, microarrays and Next Generation Sequencing, has made it possible to analyze a whole genome in a single experiment. Such platforms produce an enormous volume of data per experiment, and the analysis of this enormous flow of data poses several challenges in terms of data storage, preprocessing, and analysis. To face those issues, efficient, possibly parallel, bioinformatics software needs to be used to preprocess and analyze data, for instance to highlight genetic variation associated with complex diseases. In this paper we present a parallel algorithm for the preprocessing and statistical analysis of genomics data, able to cope with high-dimensional data and yielding good response times. The proposed system is able to find statistically significant biological markers that discriminate classes of patients who respond to drugs in different ways. Experiments performed on real and synthetic genomic datasets show good speed-up and scalability.
Parallels between a Collaborative Research Process and the Middle Level Philosophy

Science.gov (United States)

Dever, Robin; Ross, Diane; Miller, Jennifer; White, Paula; Jones, Karen

2014-01-01

The characteristics of the middle level philosophy as described in This We Believe closely parallel the collaborative research process. The journey of one research team is described in relationship to these characteristics. The collaborative process includes strengths such as professional relationships, professional development, courageous…
Strong Bisimilarity and Regularity of Basic Parallel Processes is PSPACE-Hard

DEFF Research Database (Denmark)

Srba, Jirí

2002-01-01

We show that the problem of checking whether two processes definable in the syntax of Basic Parallel Processes (BPP) are strongly bisimilar is PSPACE-hard. We also demonstrate that there is a polynomial time reduction from the strong bisimilarity checking problem of regular BPP to the strong...
Process-Oriented Parallel Programming with an Application to Data-Intensive Computing

OpenAIRE

Givelberg, Edward

2014-01-01

We introduce process-oriented programming as a natural extension of object-oriented programming for parallel computing. It is based on the observation that every class of an object-oriented language can be instantiated as a process, accessible via a remote pointer. The introduction of process pointers requires no syntax extension, identifies processes with programming objects, and enables processes to exchange information simply by executing remote methods. Process-oriented programming is a h...
Exploring the Potential of Dynamic Perspective Taking on Business Processes

Directory of Open Access Journals (Sweden)

Florian Krenn

2016-10-01

Although many organizations have started to work with business process models in their operational practice, they have not explored the entire potential of intertwining business process modeling with organizational development. Process specifications contain workflows that require execution, in order to achieve business objectives and support business operation effectively. With the advent of Subject-oriented and Social Business Process Management, communication and stakeholder interaction have become novel perspectives on how to design and implement processes. They go beyond formal responsibilities encoded in functional roles, and are not very common across organizational hierarchies. However, stakeholders, including organizational developers and IT specialists, can be supported in looking at processes and their execution from either perspective, namely, from a traditional one, focusing on functions and task accomplishment, and from an interactional perspective, focusing on communication among stakeholders and system interactions. The introduced dual-mode workflow execution engine UeberFlow allows considering both perspectives during process runtime, thus checking operational completeness from either perspective. Stakeholders can start modeling with a perspective they are familiar with and subsequently proceed with the other one by switching dynamically to an alternate mode of execution. The presented meta-model and architecture of such a dual-mode support tool enables coupling business process management directly with organizational development.
Graphics Processing Unit Enhanced Parallel Document Flocking Clustering

Energy Technology Data Exchange (ETDEWEB)

Cui, Xiaohui [ORNL; Potok, Thomas E [ORNL; ST Charles, Jesse Lee [ORNL

2010-01-01

Analyzing and clustering documents is a complex problem. One explored method of solving this problem borrows from nature, imitating the flocking behavior of birds. One limitation of this method of document clustering is its complexity, O(n^2). As the number of documents grows, it becomes increasingly difficult to generate results in a reasonable amount of time. In the last few years, the graphics processing unit (GPU) has received attention for its ability to solve highly-parallel and semi-parallel problems much faster than the traditional sequential processor. In this paper, we have conducted research to exploit this architecture and apply its strengths to the flocking based document clustering problem. Using the CUDA platform from NVIDIA, we developed a document flocking implementation to be run on the NVIDIA GEFORCE GPU. Performance gains ranged from thirty-six to nearly sixty times improvement of the GPU over the CPU implementation.
Supertracker: A Programmable Parallel Pipeline Arithmetic Processor For Auto-Cueing Target Processing

Science.gov (United States)

Mack, Harold; Reddi, S. S.

1980-04-01

Supertracker represents a programmable parallel pipeline computer architecture that has been designed to meet the real time image processing requirements of auto-cueing target data processing. The prototype breadboard currently under development will be designed to perform input video preprocessing and processing for 525-line and 875-line TV-format FLIR video, automatic display gain and contrast control, and automatic target cueing, classification, and tracking. The video preprocessor is capable of performing operations on full frames of video data in real time, e.g., frame integration, storage, 3 x 3 convolution, and neighborhood processing. The processor architecture is being implemented using bit-slice microprogrammable arithmetic processors, operating in parallel. Each processor is capable of up to 20 million operations per second. Multiple frame memories are used for additional flexibility.
Parallel Algorithm of Geometrical Hashing Based on NumPy Package and Processes Pool

Directory of Open Access Journals (Sweden)

Klyachin Vladimir Aleksandrovich

2015-10-01

The article considers the problem of multi-dimensional geometric hashing. The paper describes a mathematical model of geometric hashing and considers an example of its use in point localization problems. A method of constructing the corresponding hash matrix by a parallel algorithm is considered. An algorithm for parallel geometric hashing using the «process pool» design pattern is proposed. The algorithm is implemented in the Python programming language with the NumPy package for manipulating multidimensional data. To implement the process pool, it is proposed to use the class ProcessPoolExecutor imported from the module concurrent.futures, which has been included in the Python interpreter distribution since version 3.2. All the solutions are presented in the paper by corresponding UML class diagrams. The designed GeomNash package includes the classes Data, Result, GeomHash, and Job. The results of the developed program are presented in corresponding graphs. The article also presents a theoretical justification for using a process pool to implement parallel algorithms. A condition for the appropriateness of a process pool is obtained: t2 > (p/(p-1))*t1. Here t1 is the time to transmit a unit of data between processes, and t2 is the time to process a unit of data on one processor.
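The process-pool pattern named in this abstract might look roughly like the Python sketch below. It is an illustration only: the grid-cell hash, the chunking scheme, and the names `hash_chunk`, `parallel_hash` and `pool_is_worthwhile` are assumptions rather than the package's actual classes, and p is taken here to be the number of processes, which the abstract does not state.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def hash_chunk(points, cell):
    """Map each point to the integer index of the grid cell that contains it."""
    return np.floor(np.asarray(points, dtype=float) / cell).astype(int)

def pool_is_worthwhile(t1, t2, p):
    """Condition quoted in the abstract, t2 > (p / (p - 1)) * t1, for p > 1.

    t1: time to transmit a unit of data between processes.
    t2: time to process a unit of data on one processor.
    p:  assumed to be the number of processes.
    """
    return p > 1 and t2 > (p / (p - 1)) * t1

def parallel_hash(points, cell, workers=2):
    """Split the points into chunks and hash each chunk in a separate process."""
    chunks = np.array_split(np.asarray(points, dtype=float), workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(hash_chunk, chunks, [cell] * workers))
    return np.vstack(parts)
```

On platforms that start worker processes by spawning a fresh interpreter, `parallel_hash` should be called from under an `if __name__ == "__main__":` guard. The condition function reflects the trade-off in the abstract: a pool only pays off when processing a unit of data takes sufficiently longer than shipping it to another process.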
Effects of visual information regarding allocentric processing in haptic parallelity matching.

Science.gov (United States)

Van Mier, Hanneke I

2013-10-01

Research has revealed that haptic perception of parallelity deviates from physical reality. Large and systematic deviations have been found in haptic parallelity matching most likely due to the influence of the hand-centered egocentric reference frame. Providing information that increases the influence of allocentric processing has been shown to improve performance on haptic matching. In this study allocentric processing was stimulated by providing informative vision in haptic matching tasks that were performed using hand- and arm-centered reference frames. Twenty blindfolded participants (ten men, ten women) explored the orientation of a reference bar with the non-dominant hand and subsequently matched (task HP) or mirrored (task HM) its orientation on a test bar with the dominant hand. Visual information was provided by means of informative vision with participants having full view of the test bar, while the reference bar was blocked from their view (task VHP). To decrease the egocentric bias of the hands, participants also performed a visual haptic parallelity drawing task (task VHPD) using an arm-centered reference frame, by drawing the orientation of the reference bar. In all tasks, the distance between and orientation of the bars were manipulated. A significant effect of task was found; performance improved from task HP, to VHP to VHPD, and HM. Significant effects of distance were found in the first three tasks, whereas orientation and gender effects were only significant in tasks HP and VHP. The results showed that stimulating allocentric processing by means of informative vision and reducing the egocentric bias by using an arm-centered reference frame led to most accurate performance on parallelity matching. © 2013 Elsevier B.V. All rights reserved.
The “Axial” (“Vanishing Axis”) Perspective

Directory of Open Access Journals (Sweden)

Daniel Sofron

2015-11-01

Full Text Available The present paper approaches the axial perspective, a method of spatial representation that precedes the invention of the Renaissance geometrical perspective. Despite being typical of ancient Greek and Roman art, the axial perspective can also be identified during the Middle Ages and the early Renaissance period, and it represents the first form of systematic convergence of parallel lines. At the same time, the paper presents Erwin Panofsky's theories on this method of spatial suggestion. Trying to offer it a scientific foundation, the researcher builds a system that he calls "the vanishing axis perspective" and puts forward a series of arguments in favour of the existence of such a perspective. Although the axial perspectival constructions imply awkward superimpositions of planes that might seem geometrically inaccurate, this method of spatial structuring of the image constitutes an important stage in the process of identifying solutions for the faithful reproduction of concrete reality and an essential step in the development process of the vanishing point perspective.
Fast phase processing in off-axis holography by CUDA including parallel phase unwrapping.

Science.gov (United States)

Backoach, Ohad; Kariv, Saar; Girshovitz, Pinhas; Shaked, Natan T

2016-02-22

We present a parallel processing implementation for rapid extraction of quantitative phase maps from off-axis holograms on the Graphics Processing Unit (GPU) of the computer using compute unified device architecture (CUDA) programming. To obtain an efficient implementation, we parallelized both the wrapped phase map extraction algorithm and the two-dimensional phase unwrapping algorithm. In contrast to previous implementations, we utilized an unweighted least squares phase unwrapping algorithm that better suits parallelism. We compared the run times of the proposed algorithm on the CPU and the GPU of the computer for various sizes of off-axis holograms. Using the GPU implementation, we extracted the unwrapped phase maps from the recorded off-axis holograms at 35 frames per second (fps) for 4 megapixel holograms, and at 129 fps for 1 megapixel holograms, which, to the best of our knowledge, are the fastest processing frame rates obtained so far. We then used common-path off-axis interferometric imaging to quantitatively capture the phase maps of a micro-organism with rapid flagellum movements.
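To make the notion of phase unwrapping concrete, here is a small one-dimensional sketch. It is not the two-dimensional CUDA solver of the paper: it only illustrates that, in one dimension, the unweighted least-squares solution coincides with integrating the wrapped neighbour differences. The function names wrap and unwrap_1d are hypothetical, and the sketch assumes that neighbouring true phase values differ by less than pi.

```python
import math


def wrap(angle):
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def unwrap_1d(wrapped):
    """Integrate wrapped neighbour differences, starting from the first sample.

    Assumes the true phase changes by less than pi between neighbours.
    """
    if not wrapped:
        return []
    out = [wrapped[0]]
    for prev, cur in zip(wrapped, wrapped[1:]):
        out.append(out[-1] + wrap(cur - prev))
    return out
```

Each wrapped difference depends only on two neighbouring samples, so that step is independent per sample; this independence is the kind of structure a GPU implementation can exploit.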
Tuning of tool dynamics for increased stability of parallel (simultaneous) turning processes

Science.gov (United States)

Ozturk, E.; Comak, A.; Budak, E.

2016-01-01

Parallel (simultaneous) turning operations make use of more than one cutting tool acting on a common workpiece, offering potential for higher productivity. However, dynamic interaction between the tools and workpiece and the resulting chatter vibrations may create quality problems on machined surfaces. In order to determine chatter-free cutting process parameters, stability models can be employed. In this paper, stability of parallel turning processes is formulated in the frequency and time domains for two different parallel turning cases. Predictions of the frequency and time domain methods demonstrated reasonable agreement with each other. In addition, the predicted stability limits are verified experimentally. Simulation and experimental results show multi-regional stability diagrams which can be used to select the most favorable set of process parameters for higher stable material removal rates. In addition to parameter selection, the developed models can be used to determine the best natural frequency ratio of the tools, resulting in the highest stable depths of cut. It is concluded that the most stable operations are obtained when the natural frequencies of the tools are slightly off each other, and the worst stability occurs when the natural frequencies of the tools are exactly the same.
Tolerating correlated failures in Massively Parallel Stream Processing Engines

DEFF Research Database (Denmark)

Su, L.; Zhou, Y.

2016-01-01

Fault-tolerance techniques for stream processing engines can be categorized into passive and active approaches. A typical passive approach periodically checkpoints a processing task's runtime states and can recover a failed task by restoring its runtime state using its latest checkpoint. On the other hand, an active approach usually employs backup nodes to run replicated tasks. Upon failure, the active replica can take over the processing of the failed task with minimal latency. However, both approaches have their own inadequacies in Massively Parallel Stream Processing Engines (MPSPE...
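The passive approach described above can be illustrated with a toy sketch. Everything here is hypothetical (the CountingTask class, the run_with_recovery helper, the simulated failure), and it assumes the input stream can be replayed from the checkpointed position; it is not the mechanism proposed in the paper.

```python
import copy


class CountingTask:
    """Toy stateful stream task that keeps a running count per record value."""

    def __init__(self):
        self.state = {}
        self.position = 0  # number of input records consumed so far

    def process(self, record):
        self.state[record] = self.state.get(record, 0) + 1
        self.position += 1

    def checkpoint(self):
        """Return a snapshot of the runtime state and the input position."""
        return copy.deepcopy(self.state), self.position

    def restore(self, snapshot):
        state, position = snapshot
        self.state = copy.deepcopy(state)
        self.position = position


def run_with_recovery(stream, interval, fail_at=None):
    """Process `stream`, checkpointing every `interval` records.

    If `fail_at` is given, one failure is simulated just before the record
    at that index is processed; the task restarts from its latest checkpoint.
    """
    task = CountingTask()
    snapshot = task.checkpoint()
    failed = False
    while task.position < len(stream):
        if fail_at is not None and not failed and task.position == fail_at:
            failed = True
            task = CountingTask()    # the failed task's in-memory state is lost
            task.restore(snapshot)   # passive recovery from the latest checkpoint
            continue
        task.process(stream[task.position])
        if task.position % interval == 0:
            snapshot = task.checkpoint()
    return task.state
```

Records processed after the latest checkpoint are replayed on recovery, which is where the recovery latency of a passive scheme comes from.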
Using Motivational Interviewing Techniques to Address Parallel Process in Supervision

Science.gov (United States)

Giordano, Amanda; Clarke, Philip; Borders, L. DiAnne

2013-01-01

Supervision offers a distinct opportunity to experience the interconnection of counselor-client and counselor-supervisor interactions. One product of this network of interactions is parallel process, a phenomenon by which counselors unconsciously identify with their clients and subsequently present to their supervisors in a similar fashion…
Introduction to parallel programming

CERN Document Server

Brawer, Steven

1989-01-01

Introduction to Parallel Programming focuses on the techniques, processes, methodologies, and approaches involved in parallel programming. The book first offers information on Fortran, hardware and operating system models, and processes, shared memory, and simple parallel programs. Discussions focus on processes and processors, joining processes, shared memory, time-sharing with multiple processors, hardware, loops, passing arguments in function/subroutine calls, program structure, and arithmetic expressions. The text then elaborates on basic parallel programming techniques, barriers and race
Strong Bisimilarity and Regularity of Basic Parallel Processes is PSPACE-Hard

DEFF Research Database (Denmark)

Srba, Jirí

2002-01-01

We show that the problem of checking whether two processes definable in the syntax of Basic Parallel Processes (BPP) are strongly bisimilar is PSPACE-hard. We also demonstrate that there is a polynomial time reduction from the strong bisimilarity checking problem of regular BPP to the strong regularity (finiteness) checking of BPP. This implies that strong regularity of BPP is also PSPACE-hard.
Processing communications events in parallel active messaging interface by awakening thread from wait state

Science.gov (United States)

Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

2013-10-22

Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.
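The wait-and-awaken pattern described above can be sketched with a condition variable. This is only an analogy in Python threads; the Context class and its post and advance methods are hypothetical names and do not reproduce the PAMI interface.

```python
import threading
from collections import deque


class Context:
    """Hypothetical stand-in for a messaging context with an event queue."""

    def __init__(self):
        self._events = deque()
        self._cond = threading.Condition()
        self.processed = []

    def post(self, event):
        """Enqueue an event and wake a thread waiting in advance()."""
        with self._cond:
            self._events.append(event)
            self._cond.notify()

    def advance(self):
        """Process one event, sleeping in a wait state while none is pending."""
        with self._cond:
            while not self._events:   # nothing actionable: enter the wait state
                self._cond.wait()     # releases the lock until notified
            event = self._events.popleft()
        self.processed.append(event)  # handle the event outside the lock
        return event
```

A worker thread running ctx.advance() consumes no CPU time while it waits; a later ctx.post("msg") from another thread wakes it, and the event is then processed.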
The Extended Parallel Process Model: Illuminating the Gaps in Research

Science.gov (United States)

Popova, Lucy

2012-01-01

This article examines constructs, propositions, and assumptions of the extended parallel process model (EPPM). Review of the EPPM literature reveals that its theoretical concepts are thoroughly developed, but the theory lacks consistency in operational definitions of some of its constructs. Out of the 12 propositions of the EPPM, a few have not…
Parallel processing architecture for H.264 deblocking filter on multi-core platforms

Science.gov (United States)

Prasad, Durga P.; Sonachalam, Sekar; Kunchamwar, Mangesh K.; Gunupudi, Nageswara Rao

2012-03-01

filter for multi core platforms such as HyperX technology. Parallel techniques such as parallel processing of independent macroblocks, sub blocks, and pixel row level are examined in this work. The deblocking architecture consists of a basic cell called the deblocking filter unit (DFU) and a dependent data buffer manager (DFM). The DFU can be used in several instances, catering to different performance needs; the DFM serves the data required for the different number of DFUs, and also manages all the neighboring data required for future data processing of DFUs. This approach achieves the scalability, flexibility, and performance excellence required in deblocking filters.
Personality and self-regulation: trait and information-processing perspectives.

Science.gov (United States)

Hoyle, Rick H

2006-12-01

This article introduces the special issue of Journal of Personality on personality and self-regulation. The goal of the issue is to illustrate and inspire research that integrates personality and process-oriented accounts of self-regulation. The article begins by discussing the trait perspective on self-regulation--distinguishing between temperament and personality accounts--and the information-processing perspective. Three approaches to integrating these perspectives are then presented. These range from methodological approaches, in which constructs representing the two perspectives are examined in integrated statistical models, to conceptual approaches, in which the two perspectives are unified in a holistic theoretical model of self-regulation. The article concludes with an overview of the special issue contributions, which are organized in four sections: broad, integrative models of personality and self-regulation; models that examine the developmental origins of self-regulation and self-regulatory styles; focused programs of research that concern specific aspects or applications of self-regulation; and strategies for increasing the efficiency and effectiveness of self-regulation.

Psychodrama: A Creative Approach for Addressing Parallel Process in Group Supervision

Science.gov (United States)

Hinkle, Michelle Gimenez

2008-01-01

This article provides a model for using psychodrama to address issues of parallel process during group supervision. Information on how to utilize the specific concepts and techniques of psychodrama in relation to group supervision is discussed. A case vignette of the model is provided.
Parallel processing and non-uniform grids in global air quality modeling

NARCIS (Netherlands)

Berkvens, P.J.F.; Bochev, Mikhail A.

2002-01-01

A large-scale global air quality model, running efficiently on a single vector processor, is enhanced to make more realistic and more long-term simulations feasible. Two strategies are combined: non-uniform grids and parallel processing. The communication through the hierarchy of non-uniform grids
Recent development for the ITS code system: Parallel processing and visualization

International Nuclear Information System (INIS)

Fan, W.C.; Turner, C.D.; Halbleib, J.A. Sr.; Kensek, R.P.

1996-01-01

A brief overview is given for two software developments related to the ITS code system. These developments provide parallel processing and visualization capabilities and thus allow users to perform ITS calculations more efficiently. Timing results and a graphical example are presented to demonstrate these capabilities
GPU: the biggest key processor for AI and parallel processing

Science.gov (United States)

Baji, Toru

2017-07-01

Two types of processors exist in the market: the conventional CPU and the Graphics Processing Unit (GPU). A typical CPU is composed of 1 to 8 cores, while a GPU has thousands of cores. The CPU is good for sequential processing, while the GPU is good at accelerating software with heavily parallel execution. The GPU was initially dedicated to 3D graphics. However, from 2006, when GPUs started to adopt general-purpose cores, it was noticed that this architecture can be used as a general-purpose massively parallel processor. NVIDIA developed a software framework, Compute Unified Device Architecture (CUDA), that makes it possible to easily program the GPU for these applications. With CUDA, GPUs started to be used widely in workstations and supercomputers. Recently, two key technologies have been highlighted in the industry: Artificial Intelligence (AI) and autonomous driving cars. AI requires massively parallel operations to train many-layered neural networks. With CPUs alone, it was impossible to finish the training in a practical time. The latest multi-GPU systems with the P100 make it possible to finish the training in a few hours. For autonomous driving cars, TOPS-class performance is required to implement perception, localization, and path planning processing, and again an SoC with an integrated GPU will play a key role there. In this paper, the evolution of the GPU, which is one of the biggest commercial devices requiring state-of-the-art fabrication technology, is introduced. An overview of key GPU-demanding applications like the ones described above is also given.
Plagiarism Detection for Indonesian Language using Winnowing with Parallel Processing

Science.gov (United States)

Arifin, Y.; Isa, S. M.; Wulandhari, L. A.; Abdurachman, E.

2018-03-01

Plagiarism takes many forms: not only copy-and-paste, but also changing passive voice to active voice or paraphrasing without appropriate acknowledgment. It occurs in all languages, including Indonesian. Many previous studies have addressed plagiarism detection in Indonesian using different methods, but there is still room for improvement. This research proposes a solution that improves the plagiarism detection technique so that it can detect not only the copy-and-paste form but also more advanced forms. The proposed solution uses Winnowing with some additional steps in the pre-processing stage: stemming for the Indonesian language, and fingerprint generation with parallel processing, which saves processing time, to produce the plagiarism result for the suspected document.
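The fingerprinting step of Winnowing can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' pipeline: the function names and the parameters k (k-gram length) and w (window size) are hypothetical choices, CRC32 is used only as a convenient deterministic hash, and the Indonesian stemming and parallel stages are omitted.

```python
import re
import zlib


def normalize(text):
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def kgram_hashes(text, k):
    """Hash every k-character substring of the normalized text."""
    s = normalize(text)
    return [zlib.crc32(s[i:i + k].encode("utf-8"))
            for i in range(len(s) - k + 1)]


def winnow(text, k=5, w=4):
    """Return the set of (hash, position) fingerprints chosen by winnowing."""
    hashes = kgram_hashes(text, k)
    if not hashes:
        return set()
    span = min(w, len(hashes))  # short texts are treated as a single window
    fingerprints = set()
    for start in range(len(hashes) - span + 1):
        window = hashes[start:start + span]
        smallest = min(window)
        # On ties, keep the rightmost occurrence of the minimum in the window.
        offset = span - 1 - window[::-1].index(smallest)
        fingerprints.add((smallest, start + offset))
    return fingerprints


def similarity(a, b, k=5, w=4):
    """Jaccard similarity of the fingerprint hash values of two texts."""
    fa = {h for h, _ in winnow(a, k, w)}
    fb = {h for h, _ in winnow(b, k, w)}
    if not fa and not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)
```

Because the fingerprints of each document are computed independently of every other document, the winnow calls could in principle be mapped over a pool of workers, which is one plausible place for the parallel processing mentioned in the abstract.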
Multi states electromechanical switch for energy efficient parallel data processing

KAUST Repository

Kloub, Hussam; Smith, Casey; Hussain, Muhammad Mustafa

2011-01-01

We present a design, simulation results and fabrication of electromechanical switches enabling parallel data processing and multi functionality. The device is applied in logic gates AND, NOR, XNOR, and Flip-Flops. The device footprint size is 2μm by 0.5μm, and has a pull-in voltage of 5.15V which is verified by FEM simulation. © 2011 IEEE.
Multi states electromechanical switch for energy efficient parallel data processing

KAUST Repository

Kloub, Hussam

2011-04-01

We present a design, simulation results and fabrication of electromechanical switches enabling parallel data processing and multi functionality. The device is applied in logic gates AND, NOR, XNOR, and Flip-Flops. The device footprint size is 2μm by 0.5μm, and has a pull-in voltage of 5.15V which is verified by FEM simulation. © 2011 IEEE.
A Hybrid FPGA/Coarse Parallel Processing Architecture for Multi-modal Visual Feature Descriptors

DEFF Research Database (Denmark)

Jensen, Lars Baunegaard With; Kjær-Nielsen, Anders; Alonso, Javier Díaz

2008-01-01

This paper describes the hybrid architecture developed for speeding up the processing of so-called multi-modal visual primitives which are sparse image descriptors extracted along contours. In the system, the first stages of visual processing are implemented on FPGAs due to their highly parallel...
Practical parallel computing

CERN Document Server

Morse, H Stephen

1994-01-01

Practical Parallel Computing provides information pertinent to the fundamental aspects of high-performance parallel processing. This book discusses the development of parallel applications on a variety of equipment. Organized into three parts encompassing 12 chapters, this book begins with an overview of the technology trends that converge to favor massively parallel hardware over traditional mainframes and vector machines. This text then gives a tutorial introduction to parallel hardware architectures. Other chapters provide worked-out examples of programs using several parallel languages. Thi
Solution-processed parallel tandem polymer solar cells using silver nanowires as intermediate electrode.

Science.gov (United States)

Guo, Fei; Kubis, Peter; Li, Ning; Przybilla, Thomas; Matt, Gebhard; Stubhan, Tobias; Ameri, Tayebeh; Butz, Benjamin; Spiecker, Erdmann; Forberich, Karen; Brabec, Christoph J

2014-12-23

Tandem architecture is the most relevant concept to overcome the efficiency limit of single-junction photovoltaic solar cells. Series-connected tandem polymer solar cells (PSCs) have advanced rapidly during the past decade. In contrast, the development of parallel-connected tandem cells is lagging far behind due to the big challenge in establishing an efficient interlayer with high transparency and high in-plane conductivity. Here, we report all-solution fabrication of parallel tandem PSCs using silver nanowires as intermediate charge collecting electrode. Through a rational interface design, a robust interlayer is established, enabling the efficient extraction and transport of electrons from subcells. The resulting parallel tandem cells exhibit high fill factors of ∼60% and enhanced current densities which are identical to the sum of the current densities of the subcells. These results suggest that solution-processed parallel tandem configuration provides an alternative avenue toward high performance photovoltaic devices.
Parallel processing algorithms for hydrocodes on a computer with MIMD architecture (DENELCOR's HEP)

International Nuclear Information System (INIS)

Hicks, D.L.

1983-11-01

In real time simulation/prediction of complex systems such as water-cooled nuclear reactors, if reactor operators had fast simulator/predictors to check the consequences of their operations before implementing them, events such as the incident at Three Mile Island might be avoided. However, existing simulator/predictors such as RELAP run slower than real time on serial computers. It appears that the only way to overcome the barrier to higher computing rates is to use computers with architectures that allow concurrent computations or parallel processing. The computer architecture with the greatest degree of parallelism is labeled Multiple Instruction Stream, Multiple Data Stream (MIMD). An example of a machine of this type is the HEP computer by DENELCOR. It appears that hydrocodes are very well suited for parallelization on the HEP. It is a straightforward exercise to parallelize explicit, one-dimensional Lagrangean hydrocodes in a zone-by-zone parallelization. Similarly, implicit schemes can be parallelized in a zone-by-zone fashion via an a priori, symbolic inversion of the tridiagonal matrix that arises in an implicit scheme. These techniques are extended to Eulerian hydrocodes by using Harlow's rezone technique. The extension from single-phase Eulerian to two-phase Eulerian is straightforward. This step-by-step extension leads to hydrocodes with zone-by-zone parallelization that are capable of two-phase flow simulation. Extensions to two and three spatial dimensions can be achieved by operator splitting. It appears that a zone-by-zone parallelization is the best way to utilize the capabilities of an MIMD machine. 40 references
Understanding decimal proportions: discrete representations, parallel access, and privileged processing of zero.

Science.gov (United States)

Varma, Sashank; Karl, Stacy R

2013-05-01

Much of the research on mathematical cognition has focused on the numbers 1, 2, 3, 4, 5, 6, 7, 8, and 9, with considerably less attention paid to more abstract number classes. The current research investigated how people understand decimal proportions--rational numbers between 0 and 1 expressed in the place-value symbol system. The results demonstrate that proportions are represented as discrete structures and processed in parallel. There was a semantic interference effect: When understanding a proportion expression (e.g., "0.29"), both the correct proportion referent (e.g., 0.29) and the incorrect natural number referent (e.g., 29) corresponding to the visually similar natural number expression (e.g., "29") are accessed in parallel, and when these referents lead to conflicting judgments, performance slows. There was also a syntactic interference effect, generalizing the unit-decade compatibility effect for natural numbers: When comparing two proportions, their tenths and hundredths components are processed in parallel, and when the different components lead to conflicting judgments, performance slows. The results also reveal that zero decimals--proportions ending in zero--serve multiple cognitive functions, including eliminating semantic interference and speeding processing. The current research also extends the distance, semantic congruence, and SNARC effects from natural numbers to decimal proportions. These findings inform how people understand the place-value symbol system, and the mental implementation of mathematical symbol systems more generally. Copyright © 2013 Elsevier Inc. All rights reserved.
Massively parallel data processing for quantitative total flow imaging with optical coherence microscopy and tomography

Science.gov (United States)

Sylwestrzak, Marcin; Szlag, Daniel; Marchand, Paul J.; Kumar, Ashwin S.; Lasser, Theo

2017-08-01

We present an application of massively parallel processing of quantitative flow measurements data acquired using spectral optical coherence microscopy (SOCM). The need for massive signal processing of these particular datasets has been a major hurdle for many applications based on SOCM. In view of this difficulty, we implemented and adapted quantitative total flow estimation algorithms on graphics processing units (GPU) and achieved a 150 fold reduction in processing time when compared to a former CPU implementation. As SOCM constitutes the microscopy counterpart to spectral optical coherence tomography (SOCT), the developed processing procedure can be applied to both imaging modalities. We present the developed DLL library integrated in MATLAB (with an example) and have included the source code for adaptations and future improvements. Catalogue identifier: AFBT_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AFBT_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU GPLv3 No. of lines in distributed program, including test data, etc.: 913552 No. of bytes in distributed program, including test data, etc.: 270876249 Distribution format: tar.gz Programming language: CUDA/C, MATLAB. Computer: Intel x64 CPU, GPU supporting CUDA technology. Operating system: 64-bit Windows 7 Professional. Has the code been vectorized or parallelized?: Yes, CPU code has been vectorized in MATLAB, CUDA code has been parallelized. RAM: Dependent on users parameters, typically between several gigabytes and several tens of gigabytes Classification: 6.5, 18. Nature of problem: Speed up of data processing in optical coherence microscopy Solution method: Utilization of GPU for massively parallel data processing Additional comments: Compiled DLL library with source code and documentation, example of utilization (MATLAB script with raw data) Running time: 1,8 s for one B-scan (150 × faster in comparison to the CPU
Image processing with massively parallel computer Quadrics Q1

International Nuclear Information System (INIS)

Della Rocca, A.B.; La Porta, L.; Ferriani, S.

1995-05-01

To evaluate the image processing capabilities of the massively parallel computer Quadrics Q1, a convolution algorithm was implemented and is described in this report. First, the mathematical definition of discrete convolution is recalled together with the main Q1 hardware and software features. Then the different coded forms of the algorithm are described, and the Q1 performance is compared with that obtained on different computers. Finally, the conclusions report the main results and suggestions.
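For reference, the standard definition of two-dimensional discrete convolution can be written directly as nested loops. The sketch below is plain serial Python, not the Q1 code from the report; the function name convolve2d and the choice of a "full" output size are assumptions made for illustration.

```python
def convolve2d(image, kernel):
    """Full 2D discrete convolution of two non-empty lists of lists.

    out[y][x] = sum over (m, n) of image[y - m][x - n] * kernel[m][n],
    where terms with indices outside the image are treated as zero.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(ih + kh - 1):
        row = []
        for x in range(iw + kw - 1):
            acc = 0
            for m in range(kh):
                for n in range(kw):
                    i, j = y - m, x - n
                    if 0 <= i < ih and 0 <= j < iw:
                        acc += image[i][j] * kernel[m][n]
            row.append(acc)
        out.append(row)
    return out
```

Each output element is computed from the inputs alone, without reading other outputs, so the two outer loops can be distributed over many processing elements.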
Application of parallel computing to seismic damage process simulation of an arch dam

International Nuclear Information System (INIS)

Zhong Hong; Lin Gao; Li Jianbo

2010-01-01

The simulation of the damage process of a high arch dam subjected to strong earthquake shocks is significant for the evaluation of its performance and seismic safety, considering the catastrophic effect of dam failure. However, such numerical simulation demands substantial computational capacity. Conventional serial computing falls short of that, and parallel computing is a fairly promising solution to this problem. The parallel finite element code PDPAD was developed for the damage prediction of arch dams, utilizing a damage model that considers the heterogeneity of concrete. Developed in the Fortran programming language, the code uses a master/slave programming mode, the domain decomposition method for allocation of tasks, MPI (Message Passing Interface) for communication, and solvers from the AZTEC library for the solution of large-scale equations. A speedup test showed that the performance of PDPAD was quite satisfactory. The code was employed to study the damage process of an arch dam under construction, running on a 4-node PC cluster, with more than one million degrees of freedom considered. The obtained damage mode was quite similar to that of the shaking table test, indicating that the proposed procedure and the parallel code PDPAD have good potential for simulating the seismic damage mode of arch dams. With the rapidly growing need for massive computation arising from engineering problems, parallel computing will find more and more applications in pertinent areas.
Gestalt and Adventure Therapy: Parallels and Perspectives.

Science.gov (United States)

Gilsdorf, Rudiger

This paper calls attention to parallels in the literature of adventure education and that of Gestalt therapy, demonstrating that both are rooted in an experiential tradition. The philosophies of adventure or experiential education and Gestalt therapy have the following areas in common: (1) emphasis on personal growth and the development of present…
The Masterson Approach with play therapy: a parallel process between mother and child.

Science.gov (United States)

Mulherin, M A

2001-01-01

This paper discusses a case in which the Masterson Approach was used with play therapy to treat a child with a developing personality disorder. It describes the parallel progression of the child and mother in adjunct therapy throughout a six-year period. The unique value of the Masterson Approach is that it provides the therapist with a framework and tool to diagnose and treat a child during the dynamic process of play. The case describes the mother-child dyad throughout therapy. It traces their parallel processes that involve separation, individuation, rapprochement, and the recovery of real self-capacities. Each stage of treatment is described, including verbal interventions. The child's internal affective state and intrapsychic structure during the various stages of treatment are illustrated by representative pictures.
Parallel processing implementation for the coupled transport of photons and electrons using OpenMP

Science.gov (United States)

Doerner, Edgardo

2016-05-01

In this work the use of OpenMP to implement the parallel processing of the Monte Carlo (MC) simulation of the coupled transport of photons and electrons is presented. This implementation was carried out using a modified EGSnrc platform which enables the use of the Microsoft Visual Studio 2013 (VS2013) environment, together with the development tools available in the Intel Parallel Studio XE 2015 (XE2015). The performance study of this new implementation was carried out on a desktop PC with a multi-core CPU, taking as a reference the performance of the original platform. The results were satisfactory in terms of both scalability and parallelization efficiency.
An Inconvenient Truth: An Application of the Extended Parallel Process Model

Science.gov (United States)

Goodall, Catherine E.; Roberto, Anthony J.

2008-01-01

"An Inconvenient Truth" is an Academy Award-winning documentary about global warming presented by Al Gore. This documentary is appropriate for a lesson on fear appeals and the extended parallel process model (EPPM). The EPPM is concerned with the effects of perceived threat and efficacy on behavior change. Perceived threat is composed of an…
Distributed system for parallel data processing of ECT signals for electromagnetic flaw detection in materials

International Nuclear Information System (INIS)

Guliashki, Vassil; Marinova, Galia

2002-01-01

The paper proposes a distributed system for parallel data processing of ECT signals for flaw detection in materials. The measured data are stored in files on a host computer, where a JAVA server is located. The host computer is connected through the Internet to a set of geographically distributed client computers. The data are distributed from the host computer by means of the JAVA server to the client computers according to their requests. The software necessary for the data processing is installed on each client computer in advance. Organizing the data processing on many computers working simultaneously in parallel greatly reduces processing time, especially in cases when a huge amount of data must be processed in a very short time. (Author)

Parallel-hierarchical processing and classification of laser beam profile images based on the GPU-oriented architecture

Science.gov (United States)

Yarovyi, Andrii A.; Timchenko, Leonid I.; Kozhemiako, Volodymyr P.; Kokriatskaia, Nataliya I.; Hamdi, Rami R.; Savchuk, Tamara O.; Kulyk, Oleksandr O.; Surtel, Wojciech; Amirgaliyev, Yedilkhan; Kashaganova, Gulzhan

2017-08-01

The paper deals with the problem of insufficient performance of existing computing tools for large image processing, which do not meet the modern requirements posed by the resource-intensive computing tasks of laser beam profiling. The research concentrated on one of the profiling problems, namely, real-time processing of spot images of the laser beam profile. Development of a theory of parallel-hierarchic transformation made it possible to produce models for high-performance parallel-hierarchical processes, as well as algorithms and software for their implementation based on the GPU-oriented architecture using GPGPU technologies. The analyzed performance of the suggested computerized tools for processing and classification of laser beam profile images makes it possible to perform real-time processing of dynamic images of various sizes.
Parallel Distributed Processing theory in the age of deep networks

OpenAIRE

Bowers, Jeffrey

2017-01-01

Parallel Distributed Processing (PDP) models in psychology are the precursors of deep networks used in computer science. However, only PDP models are associated with two core psychological claims, namely, that all knowledge is coded in a distributed format, and cognition is mediated by non-symbolic computations. These claims have long been debated within cognitive science, and recent work with deep networks speaks to this debate. Specifically, single-unit recordings show that deep networks le...
About Parallel Programming: Paradigms, Parallel Execution and Collaborative Systems

Directory of Open Access Journals (Sweden)

Loredana MOCEAN

2009-01-01

In recent years, efforts have been made to delineate a stable and unified framework in which the problems of logical parallel processing can find solutions, at least at the level of imperative languages. The results obtained so far are not commensurate with the efforts made. This paper aims to make a small contribution to these efforts. We propose an overview of parallel programming, parallel execution and collaborative systems.
Density functional theory and parallel processing

International Nuclear Information System (INIS)

Ward, R.C.; Geist, G.A.; Butler, W.H.

1987-01-01

The authors demonstrate a method for obtaining the ground state energies and charge densities of a system of atoms described within density functional theory using simulated annealing on a parallel computer
What is a good public participation process? Five perspectives from the public.

Science.gov (United States)

Webler, T; Tuler, S; Krueger, R

2001-03-01

It is now widely accepted that members of the public should be involved in environmental decision-making. This has inspired many to search for principles that characterize good public participation processes. In this paper we report on a study that identifies discourses about what defines a good process. Our case study was a forest planning process in northern New England and New York. We employed Q methodology to learn how participants characterize a good process differently, by selecting, defining, and privileging different principles. Five discourses, or perspectives, about good process emerged from our study. One perspective emphasizes that a good process acquires and maintains popular legitimacy. A second sees a good process as one that facilitates an ideological discussion. A third focuses on the fairness of the process. A fourth perspective conceptualizes participatory processes as a power struggle--in this instance a power play between local land-owning interests and outsiders. A fifth perspective highlights the need for leadership and compromise. Dramatic differences among these views suggest an important challenge for those responsible for designing and carrying out public participation processes. Conflicts may emerge about process designs because people disagree about what is good in specific contexts.
Organizational Change Perspectives on Software Process Improvement

DEFF Research Database (Denmark)

Müller, Sune Dueholm; Mathiassen, Lars; Balshøj, Hans Henrik

Many software organizations have engaged in Software Process Improvement (SPI) and experienced the challenges related to managing such complex organizational change efforts. As a result, there is an increasing body of research investigating change management in SPI. To provide an overview of what......, and brain perspectives. Practitioners may use these articles as a guide to SPI insights relevant to their improvement initiatives. In contrast, the impact of culture, dominance, psychic prison, flux and transformation, and politics in SPI have only received scant attention. We argue that these perspectives...
Exact stationary state for an asymmetric exclusion process with fully parallel dynamics

NARCIS (Netherlands)

Gier, J.C.|info:eu-repo/dai/nl/170218430; Nienhuis, B.

The exact stationary state of an asymmetric exclusion process with fully parallel dynamics is obtained using the matrix product ansatz. We give a simple derivation for the deterministic case by a physical interpretation of the dimension of the matrices. We prove the stationarity via a cancellation
Performance of MPI parallel processing implemented by MCNP5/ MCNPX for criticality benchmark problems

International Nuclear Information System (INIS)

Mark Dennis Usang; Mohd Hairie Rabir; Mohd Amin Sharifuldin Salleh; Mohamad Puad Abu

2012-01-01

MPI parallelism is implemented on a SUN workstation for running MCNPX and on the High Performance Computing Facility (HPC) for running MCNP5. 23 input files obtained from the MCNP Criticality Validation Suite are used to evaluate the speed-up achievable with the parallel capabilities of MPI. More importantly, we study the economics of using more processors and the types of problem where the performance gains are obvious. This is important to enable better resource-sharing practices, especially for the HPC facility's processing time. Future endeavours in this direction might even reveal clues to the best MCNP5/MCNPX coding practices for optimum performance of MPI parallelism. (author)
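The "economics of using more processors" mentioned above is usually judged with two standard quantities, speed-up and parallel efficiency. The following is a minimal sketch of that bookkeeping; the function names and the timings are hypothetical illustrations and are not taken from the paper.

```python
def speedup(t_serial, t_parallel):
    """Speed-up: how many times faster the parallel run is than the serial run."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency: speed-up divided by the number of processors used."""
    return speedup(t_serial, t_parallel) / n_procs


# Hypothetical wall-clock times in seconds, keyed by processor count.
# These numbers are made up for illustration only.
timings = {1: 1200.0, 2: 640.0, 4: 360.0, 8: 240.0}

t1 = timings[1]
for n, t in sorted(timings.items()):
    print(n, round(speedup(t1, t), 2), round(efficiency(t1, t, n), 2))
```

In a table like this, falling efficiency at higher processor counts is the usual sign that extra processors are no longer paying for themselves.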
Hardware system of parallel processing for fast CT image reconstruction based on circular shifting float memory architecture

International Nuclear Information System (INIS)

Wang Shi; Kang Kejun; Wang Jingjin

1995-01-01

Computerized Tomography (CT) is expected to become an inevitable diagnostic technique in the future. However, the long time required to reconstruct an image has been one of the major drawbacks of this technique. Parallel processing is one of the best ways to solve this problem. This paper gives the architecture and hardware design of PIRS-4 (4-processor Parallel Image Reconstruction System), a parallel processing system for fast 3D-CT image reconstruction based on a circular shifting float memory architecture. It covers the structure and components of the system, the design of the crossbar switch and details of the control model. The test results are described.
Understanding the IT/business partnership - a business process perspective

DEFF Research Database (Denmark)

Siurdyban, Artur

2014-01-01

From a business process perspective, the business value of information technologies (IT) stems from how they improve or enable business processes. At the same time, in the field of strategic IT/business alignment, the locus of discussion has been how IT/business partnerships enhance the value of IT....... Despite this apparent relationship, the business process perspective has been absent from the IT/business alignment discussion. In this paper, we use the case of an industrial company to develop a model for understanding IT/business partnerships in business process terms. Based on our findings, we define...... these partnerships by allocating responsibilities between central IT and the local business during two stages of a process lifecycle: formation and standardization. The significance of the findings lies in how the model’s configuration leads to different types of IT units’ process centricity. This in turn affects...
The concept of parallel input/output processing for an electron linac

International Nuclear Information System (INIS)

Emoto, Takashi

1993-01-01

The instrumentation of and the control system for the PNC 10 MeV CW electron linac are described. A new concept of parallel input/output processing for the linac has been introduced. It is based on a substantial number of input/output processors (IOP) using beam control and diagnostics. The flexibility and simplicity of the hardware and software are significant advantages of this scheme. (author)
Emotion response coherence: A dual-process perspective

NARCIS (Netherlands)

Evers, C.; Hopp, H.; Gross, J.J.; Fischer, A.H.; Manstead, A.S.R.; Mauss, I.B.

2014-01-01

Emotions are widely thought to involve coordinated responses across multiple responses (e.g., experiential, behavioral, and physiological). However, empirical support for this general "response coherence" postulate is inconsistent. The present research takes a dual-process perspective, suggesting
Emotion response coherence : a dual-process perspective

NARCIS (Netherlands)

Evers, Catharine; Hopp, Henrik; Gross, James J; Fischer, Agneta H; Manstead, Antony S R; Mauss, Iris B

Emotions are widely thought to involve coordinated responses across multiple responses (e.g., experiential, behavioral, and physiological). However, empirical support for this general "response coherence" postulate is inconsistent. The present research takes a dual-process perspective, suggesting
Parallel asynchronous hardware implementation of image processing algorithms

Science.gov (United States)

Coon, Darryl D.; Perera, A. G. U.

1990-01-01

Research is being carried out on hardware for a new approach to focal plane processing. The hardware involves silicon injection mode devices. These devices provide a natural basis for parallel asynchronous focal plane image preprocessing. The simplicity and novel properties of the devices would permit an independent analog processing channel to be dedicated to every pixel. A laminar architecture built from arrays of the devices would form a two-dimensional (2-D) array processor with a 2-D array of inputs located directly behind a focal plane detector array. A 2-D image data stream would propagate in neuron-like asynchronous pulse-coded form through the laminar processor. No multiplexing, digitization, or serial processing would occur in the preprocessing state. High performance is expected, based on pulse coding of input currents down to one picoampere with noise referred to input of about 10 femtoamperes. Linear pulse coding has been observed for input currents ranging up to seven orders of magnitude. Low power requirements suggest utility in space and in conjunction with very large arrays. Very low dark current and multispectral capability are possible because of hardware compatibility with the cryogenic environment of high performance detector arrays. The aforementioned hardware development effort is aimed at systems which would integrate image acquisition and image processing.
Parallel Distributed Processing at 25: Further Explorations in the Microstructure of Cognition

Science.gov (United States)

Rogers, Timothy T.; McClelland, James L.

2014-01-01

This paper introduces a special issue of "Cognitive Science" initiated on the 25th anniversary of the publication of "Parallel Distributed Processing" (PDP), a two-volume work that introduced the use of neural network models as vehicles for understanding cognition. The collection surveys the core commitments of the PDP…
Morphological evidence for parallel processing of information in rat macula

Science.gov (United States)

Ross, M. D.

1988-01-01

Study of montages, tracings and reconstructions prepared from a series of 570 consecutive ultrathin sections shows that rat maculas are morphologically organized for parallel processing of linear acceleratory information. Type II cells of one terminal field distribute information to neighboring terminals as well. The findings are examined in light of physiological data which indicate that macular receptor fields have a preferred directional vector, and are interpreted by analogy to a computer technology known as an information network.
MapReduce Based Parallel Bayesian Network for Manufacturing Quality Control

Science.gov (United States)

Zheng, Mao-Kuan; Ming, Xin-Guo; Zhang, Xian-Yu; Li, Guo-Ming

2017-09-01

The increasing complexity of industrial products and manufacturing processes has challenged conventional statistics-based quality management approaches under dynamic production conditions. An approach integrating a Bayesian network with big data analytics is proposed for manufacturing process quality analysis and control. Based on the Hadoop distributed architecture and the MapReduce parallel computing model, the large volume and variety of quality-related data generated during the manufacturing process can be handled. Artificial intelligence algorithms, including Bayesian network learning, classification and reasoning, are embedded into the Reduce process. Relying on the ability of the Bayesian network to deal with dynamic and uncertain problems and on the parallel computing power of MapReduce, a Bayesian network of factors affecting quality is built from the prior probability distribution and modified with the posterior probability distribution. A case study on hull segment manufacturing precision management for ship and offshore platform building shows that computing speed increases almost in direct proportion to the number of computing nodes. The proposed model is also shown to be feasible for locating and reasoning about root causes, forecasting manufacturing outcomes, and making intelligent decisions for solving precision problems. The integration of big data analytics and the BN method offers a whole new perspective on manufacturing quality control.
Level 2 Perspective Taking Entails Two Processes: Evidence from PRP Experiments

Science.gov (United States)

Janczyk, Markus

2013-01-01

In many situations people need to mentally adopt the (spatial) perspective of other persons, an ability that is referred to as "Level 2 perspective taking." Its underlying processes have been ascribed to mental self-rotation that can be dissociated from mental object-rotation. Recent findings suggest that perspective taking/self-rotation…
Fear Control and Danger Control: A Test of the Extended Parallel Process Model (EPPM).

Science.gov (United States)

Witte, Kim

1994-01-01

Explores cognitive and emotional mechanisms underlying success and failure of fear appeals in context of AIDS prevention. Offers general support for Extended Parallel Process Model. Suggests that cognitions lead to fear appeal success (attitude, intention, or behavior changes) via danger control processes, whereas the emotion fear leads to fear…
Next Generation Parallelization Systems for Processing and Control of PDS Image Node Assets

Science.gov (United States)

Verma, R.

2017-06-01

We present next-generation parallelization tools to help Planetary Data System (PDS) Imaging Node (IMG) better monitor, process, and control changes to nearly 650 million file assets and over a dozen machines on which they are referenced or stored.

Fraud Detection in Credit Card Transactions; Using Parallel Processing of Anomalies in Big Data

Directory of Open Access Journals (Sweden)

Mohammad Reza Taghva

2016-10-01

In parallel with the increasing use of electronic cards, especially in the banking industry, the volume of transactions using these cards has grown rapidly. Moreover, the financial nature of these cards has made them attractive targets for fraud. The present study applied the Kohonen neural network model, with a MapReduce approach and parallel processing, to detect anomalies in bank card transactions. For this purpose, it was first proposed to classify all transactions as fraudulent or legal, which showed better performance compared with other methods. In the next step, we transformed the Kohonen model into a parallel task, which demonstrated appropriate performance in terms of time, so it is expected to work well for transactions under Big Data assumptions.
Parallel processing at the SSC: The fact and the fiction

International Nuclear Information System (INIS)

Bourianoff, G.; Cole, B.

1991-10-01

Accurately modelling the behavior of particles circulating in accelerators is a computationally demanding task. The particle tracking code currently in use at SSC is based upon a "thin element" analysis (TEAPOT). In this model each magnet in the lattice is described by a thin element at which the particle experiences an impulsive kick. Each kick requires approximately 200 floating point operations ("FLOP"). For the SSC collider lattice consisting of 10^4 elements, performing a tracking study for a set of 100 particles for 10^7 turns would require 2 x 10^15 FLOP. Even on a machine capable of 100 MFLOP/sec (MFLOPS), this would require 2 x 10^7 seconds, and many such runs are necessary. It should be noted that the accuracy with which the kicks are to be calculated is important: the large number of iterations involved will magnify the effects of small errors. The inability of current computational resources to effectively perform the full calculation motivates the migration of this calculation to the most powerful computers available. A survey of the current research into new technologies for supercomputing reveals that the supercomputers of the future will be parallel in nature. Further, numerous such machines exist today, and are being used to solve other difficult problems. Thus it seems clear that it is not too early to begin developing tracking codes for parallel architectures. This report discusses implementing parallel processing at the SSC.
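The cost estimate in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the abstract; the variable names and the conversion to days are added for illustration.

```python
# Figures quoted in the abstract.
FLOP_PER_KICK = 200                 # floating point operations per thin-element kick
ELEMENTS = 10**4                    # thin elements in the collider lattice
PARTICLES = 100                     # particles tracked in one study
TURNS = 10**7                       # turns around the ring
MACHINE_FLOP_PER_SEC = 100 * 10**6  # a 100 MFLOPS machine

# One kick per particle, per element, per turn.
total_flop = FLOP_PER_KICK * ELEMENTS * PARTICLES * TURNS
seconds = total_flop / MACHINE_FLOP_PER_SEC
days = seconds / 86400              # 86400 seconds per day (illustrative conversion)

print(f"total work: {total_flop:.1e} FLOP")
print(f"run time:   {seconds:.1e} s, about {days:.0f} days")
```

A single run therefore occupies such a machine for roughly two thirds of a year, which is the motivation the abstract gives for moving to parallel architectures.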
Smoldyn on graphics processing units: massively parallel Brownian dynamics simulations.

Science.gov (United States)

Dematté, Lorenzo

2012-01-01

Space is a very important aspect in the simulation of biochemical systems; recently, the need for simulation algorithms able to cope with space has become more and more compelling. Complex and detailed models of biochemical systems need to deal with the movement of single molecules and particles, taking into consideration localized fluctuations, transportation phenomena, and diffusion. A common drawback of spatial models lies in their complexity: models can become very large, and their simulation can be time consuming, especially if we want to capture the system's behavior in a reliable way using stochastic methods in conjunction with a high spatial resolution. In order to deliver on the promise made by systems biology to understand a system as a whole, we need to scale up the size of the models we are able to simulate, moving from sequential to parallel simulation algorithms. In this paper, we analyze Smoldyn, a widely used algorithm for stochastic simulation of chemical reactions with spatial resolution and single molecule detail, and we propose an alternative, innovative implementation that exploits the parallelism of Graphics Processing Units (GPUs). The implementation executes the most computationally demanding steps (computation of diffusion, unimolecular, and bimolecular reaction, as well as the most common cases of molecule-surface interaction) on the GPU, computing them in parallel on each molecule of the system. The implementation offers good speed-ups and real-time, high-quality graphics output.
Parallel algorithms for mapping pipelined and parallel computations

Science.gov (United States)

Nicol, David M.

1988-01-01

Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm^3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm^2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.
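To make the mapping problem concrete, the sketch below solves one simple variant: a chain of m modules with known integer costs is split into at most n contiguous blocks, one per processor, so that the most heavily loaded processor (the pipeline bottleneck) is as light as possible. It uses a textbook binary search with a greedy feasibility probe. This is an illustrative stand-in and not the algorithm of the paper; the function names and the example costs are hypothetical.

```python
def processors_needed(costs, cap):
    """Greedy probe: number of contiguous blocks needed if no block may exceed cap.

    Assumes cap >= max(costs), so every single module fits in a block.
    """
    count, load = 1, 0
    for c in costs:
        if load + c > cap:
            count += 1      # start a new block on the next processor
            load = c
        else:
            load += c
    return count


def min_bottleneck(costs, n_procs):
    """Smallest achievable maximum block load for a chain of integer costs."""
    lo, hi = max(costs), sum(costs)
    while lo < hi:
        mid = (lo + hi) // 2
        if processors_needed(costs, mid) <= n_procs:
            hi = mid        # feasible: try a tighter bottleneck
        else:
            lo = mid + 1    # infeasible: the bottleneck must be larger
    return lo


# Hypothetical module costs for a six-stage pipeline mapped onto 3 processors.
print(min_bottleneck([4, 2, 7, 3, 5, 1], 3))
```

For these costs the best split is [4, 2], [7], [3, 5, 1], giving a bottleneck of 9.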
Evaluating parallel optimization on transputers

Directory of Open Access Journals (Sweden)

A.G. Chalmers

2003-12-01

The faster processing power of modern computers and the development of efficient algorithms have made it possible for operations researchers to tackle a much wider range of problems than ever before. Further improvements in processing speed can be achieved by utilising relatively inexpensive transputers to process components of an algorithm in parallel. The Davidon-Fletcher-Powell method is one of the most successful and widely used optimisation algorithms for unconstrained problems. This paper examines the algorithm and identifies the components that can be processed in parallel. The results of some experiments with these components are presented, which indicate under what conditions parallel processing with an inexpensive configuration is likely to be faster than the traditional sequential implementations. The performance of the whole algorithm with its parallel components is then compared with the original sequential algorithm. The implementation serves to illustrate the practicalities of speeding up typical OR algorithms in terms of difficulty, effort and cost. The results give an indication of the savings in time a given parallel implementation can be expected to yield.
When fast logic meets slow belief: Evidence for a parallel-processing model of belief bias

OpenAIRE

Trippas, Dries; Thompson, Valerie A.; Handley, Simon J.

2016-01-01

Two experiments pitted the default-interventionist account of belief bias against a parallel-processing model. According to the former, belief bias occurs because a fast, belief-based evaluation of the conclusion pre-empts a working-memory demanding logical analysis. In contrast, according to the latter both belief-based and logic-based responding occur in parallel. Participants were given deductive reasoning problems of variable complexity and instructed to decide whether the conclusion was ...
A Review of Parallel Processing Approaches to Robot Kinematics and Jacobian

OpenAIRE

Henrich, Dominik; Karl, Joachim; Wörn, Heinz

1997-01-01

Due to continuously increasing demands in the area of advanced robot control, it became necessary to speed up the computation. One way to reduce the computation time is to distribute the computation onto several processing units. In this survey we present different approaches to parallel computation of robot kinematics and Jacobian. Thereby, we discuss both the forward and the reverse problem. We introduce a classification scheme and class...
Data structures and languages in support of parallel image processing for astronomy

International Nuclear Information System (INIS)

Tanimoto, S.L.

1985-01-01

This paper discusses data structures, and aspects of programming languages and systems, that are relevant to image processing of astronomy data. Emphasis is on image processing computations, because this kind of data processing is obviously ripe for parallelism and is important in astronomy. However, some discussion of general possibilities is also presented. The role of algorithms is examined since they are not dependent on a particular language. As an implementation of an algorithm a program is equally tied to data structure, operations, architecture and language, and therefore the issue of programming resides in the center of the tetrahedron.
The parallel processing impact in the optimization of the reactors neutronic by genetic algorithms

International Nuclear Information System (INIS)

Pereira, Claudio M.N.A.; Universidade Federal, Rio de Janeiro, RJ; Lapa, Celso M.F.; Mol, Antonio C.A.

2002-01-01

Nowadays, many optimization problems found in nuclear engineering have been solved through genetic algorithms (GA). The robustness of such methods is strongly related to the nature of the search process, which is based on populations of solution candidates, and this fact implies a high computational cost in the optimization process. The use of GA becomes more critical when the evaluation of a solution candidate is highly time consuming. Problems of this nature are common in nuclear engineering; an example is reactor design optimization, where neutronic codes, which consume high CPU time, must be run. To investigate the impact of parallel computation on the solution, through GA, of a reactor design optimization problem, a parallel genetic algorithm (PGA) using the Island Model was developed. Exhaustive experiments (1500 processing hours on 550 MHz personal computers) have been done in order to compare the conventional GA with the PGA. Such experiments have demonstrated the superiority of the PGA not only in terms of execution time but also in the optimization results. (author)
Design of parallel intersector weld/cut robot for machining processes in ITER vacuum vessel

International Nuclear Information System (INIS)

Wu Huapeng; Handroos, Heikki; Kovanen, Janne; Rouvinen, Asko; Hannukainen, Petri; Saira, Tanja; Jones, Lawrence

2003-01-01

This paper presents a new parallel robot, Penta-WH, which has five degrees of freedom driven by hydraulic cylinders. The manipulator has a large, singularity-free workspace and high stiffness, and it acts as a transport device for welding, machining and inspection end-effectors inside the ITER vacuum vessel. The presented kinematic structure of a parallel robot is particularly suitable for the ITER environment. Analyses of the machining process for ITER, such as the machining methods and forces, are given, and kinematic analyses, such as workspace and force capacity, are discussed.
Resistance to awareness of the supervisor's transferences with special reference to the parallel process.

Science.gov (United States)

Stimmel, B

1995-06-01

Supervision is an essential part of psychoanalytic education. Although not taken for granted, it is not studied with the same critical eye as is the analytic process. This paper examines the supervision specifically with a focus on the supervisor's transference towards the supervisee. The point is made, in the context of clinical examples, that one of the ways these transference reactions may be rationalised is within the setting of the parallel process so often encountered in supervision. Parallel process, a very familiar term, is used frequently and easily when discussing supervision. It may be used also as a resistance to awareness of transference phenomena within the supervisor in relation to the supervisee, particularly because of its clinical presentation. It is an enactment between supervisor and supervisee, thus ripe with possibilities for disguise, displacement and gratification. While transference reactions of the supervisee are often discussed, those of the supervisor are notably missing in our literature.
Expressing Parallelism with ROOT

Energy Technology Data Exchange (ETDEWEB)

Piparo, D. [CERN; Tejedor, E. [CERN; Guiraud, E. [CERN; Ganis, G. [CERN; Mato, P. [CERN; Moneta, L. [CERN; Valls Pla, X. [CERN; Canal, P. [Fermilab

2017-11-22

The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
Expressing Parallelism with ROOT

Science.gov (United States)

Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.

2017-10-01

The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP)

CERN Document Server

Calafiura, Paolo; The ATLAS collaboration; Seuster, Rolf; Tsulaia, Vakhtang; van Gemmeren, Peter

2015-01-01

AthenaMP is a multi-process version of the ATLAS reconstruction and data analysis framework Athena. By leveraging Linux fork and copy-on-write, it allows the sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service), which allows AthenaMP to run inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of AthenaMP in the...
Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP)

CERN Document Server

Calafiura, Paolo; Seuster, Rolf; Tsulaia, Vakhtang; van Gemmeren, Peter

2015-01-01

AthenaMP is a multi-process version of the ATLAS reconstruction, simulation and data analysis framework Athena. By leveraging Linux fork and copy-on-write, it allows for sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service), which allows AthenaMP to run inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of Ath...
Information Technology Process Improvement Decision-Making: An Exploratory Study from the Perspective of Process Owners and Process Managers

Science.gov (United States)

Lamp, Sandra A.

2012-01-01

There is information available in the literature that discusses information technology (IT) governance and investment decision making from an executive-level perception, yet there is little information available that offers the perspective of process owners and process managers pertaining to their role in IT process improvement and investment…
Parallel k-means++

Energy Technology Data Exchange (ETDEWEB)

2017-04-04

A parallelization of the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms allow people to cluster multidimensional data, by attempting to minimize the mean distance of data points within a cluster. K-means++ improved upon traditional k-means by using a more intelligent approach to selecting the initial seeds for the clustering process. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique. We have developed original C++ code for parallelizing the algorithm on three unique hardware architectures: GPU using NVidia's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. By parallelizing the process for these platforms, we are able to perform k-means++ clustering much more quickly than it could be done before.
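The seed selection step described in this record can be illustrated with a minimal serial sketch in Python. This is not the authors' C++ code: the names `kmeans_pp_seeds` and `sq_dist` are hypothetical, and the fixed random seed is an assumption made only for repeatability. The per-point distance update and the weight sum are the data-parallel steps that a GPU or OpenMP version would presumably distribute.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_seeds(points, k, rng=None):
    """Choose k initial centers with k-means++ (D^2-weighted sampling)."""
    rng = rng or random.Random(0)  # assumption: fixed seed for repeatability
    centers = [rng.choice(points)]
    # d2[i] holds the squared distance from points[i] to its nearest center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        total = sum(d2)
        if total == 0:
            # Every point coincides with a center; fall back to a uniform pick.
            centers.append(rng.choice(points))
            continue
        r = rng.random() * total
        acc = 0.0
        # Fallback guards against floating-point rounding in the running sum.
        chosen = max(range(len(points)), key=d2.__getitem__)
        for i, w in enumerate(d2):
            acc += w
            if acc > r:
                chosen = i
                break
        centers.append(points[chosen])
        # Data-parallel step: each point updates its nearest-center distance.
        d2 = [min(old, sq_dist(p, points[chosen])) for old, p in zip(d2, points)]
    return centers
```

Because a point that is already a center has weight zero, it cannot be drawn again while any other point has a positive distance, which is what spreads the seeds apart.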
MASSIVELY PARALLEL LATENT SEMANTIC ANALYSES USING A GRAPHICS PROCESSING UNIT

Energy Technology Data Exchange (ETDEWEB)

Cavanagh, J.; Cui, S.

2009-01-01

Latent Semantic Analysis (LSA) aims to reduce the dimensions of large term-document datasets using Singular Value Decomposition. However, with the ever-expanding size of datasets, current implementations are not fast enough to quickly and easily compute the results on a standard PC. A graphics processing unit (GPU) can solve some highly parallel problems much faster than a traditional sequential processor or central processing unit (CPU). Thus, a deployable system using a GPU to speed up large-scale LSA processes would be a much more effective choice (in terms of cost/performance ratio) than using a PC cluster. Due to the GPU’s application-specific architecture, harnessing the GPU’s computational prowess for LSA is a great challenge. We presented a parallel LSA implementation on the GPU, using NVIDIA® Compute Unified Device Architecture and Compute Unified Basic Linear Algebra Subprograms software. The performance of this implementation is compared to traditional LSA implementation on a CPU using an optimized Basic Linear Algebra Subprograms library. After implementation, we discovered that the GPU version of the algorithm was twice as fast for large matrices (1000x1000 and above) that had dimensions not divisible by 16. For large matrices that did have dimensions divisible by 16, the GPU algorithm ran five to six times faster than the CPU version. The large variation is due to architectural benefits of the GPU for matrices divisible by 16. It should be noted that the overall speeds for the CPU version did not vary from relative normal when the matrix dimensions were divisible by 16. Further research is needed in order to produce a fully implementable version of LSA. With that in mind, the research we presented shows that the GPU is a viable option for increasing the speed of LSA, in terms of cost/performance ratio.
Passive and partially active fault tolerance for massively parallel stream processing engines

DEFF Research Database (Denmark)

Su, Li; Zhou, Yongluan

2018-01-01

. On the other hand, an active approach usually employs backup nodes to run replicated tasks. Upon failure, the active replica can take over the processing of the failed task with minimal latency. However, both approaches have their own inadequacies in Massively Parallel Stream Processing Engines (MPSPE...... also propose effective and efficient algorithms to optimize a partially active replication plan to maximize the quality of tentative outputs. We implemented PPA on top of Storm, an open-source MPSPE and conducted extensive experiments using both real and synthetic datasets to verify the effectiveness...
The Perspective Structure of Visual Space

Science.gov (United States)

2015-01-01

Luneburg’s model has been the reference for experimental studies of visual space for almost seventy years. His claim for a curved visual space has been a source of inspiration for visual scientists as well as philosophers. The conclusion of many experimental studies has been that Luneburg’s model does not describe visual space in various tasks and conditions. Remarkably, no alternative model has been suggested. The current study explores perspective transformations of Euclidean space as a model for visual space. Computations show that the geometry of perspective spaces is considerably different from that of Euclidean space. Collinearity but not parallelism is preserved in perspective space and angles are not invariant under translation and rotation. Similar relationships have been shown to be properties of visual space. Alley experiments performed early in the nineteenth century have been instrumental in hypothesizing curved visual spaces. Alleys were computed in perspective space and compared with reconstructed alleys of Blumenfeld. Parallel alleys were accurately described by perspective geometry. Accurate distance alleys were derived from parallel alleys by adjusting the interstimulus distances according to the size-distance invariance hypothesis. Agreement between computed and experimental alleys and accommodation of experimental results that rejected Luneburg’s model show that perspective space is an appropriate model for how we perceive orientations and angles. The model is also appropriate for perceived distance ratios between stimuli but fails to predict perceived distances. PMID:27648222

Parallel, multi-stage processing of colors, faces and shapes in macaque inferior temporal cortex

Science.gov (United States)

Lafer-Sousa, Rosa; Conway, Bevil R.

2014-01-01

Visual-object processing culminates in inferior temporal (IT) cortex. To assess the organization of IT, we measured fMRI responses in alert monkey to achromatic images (faces, fruit, bodies, places) and colored gratings. IT contained multiple color-biased regions, which were typically ventral to face patches and, remarkably, yoked to them, spaced regularly at four locations predicted by known anatomy. Color and face selectivity increased for more anterior regions, indicative of a broad hierarchical arrangement. Responses to non-face shapes were found across IT, but were stronger outside color-biased regions and face patches, consistent with multiple parallel streams. IT also contained multiple coarse eccentricity maps: face patches overlapped central representations; color-biased regions spanned mid-peripheral representations; and place-biased regions overlapped peripheral representations. These results suggest that IT comprises parallel, multi-stage processing networks subject to one organizing principle. PMID:24141314
Lamb wave propagation modelling and simulation using parallel processing architecture and graphical cards

International Nuclear Information System (INIS)

Paćko, P; Bielak, T; Staszewski, W J; Uhl, T; Spencer, A B; Worden, K

2012-01-01

This paper demonstrates new parallel computation technology and an implementation for Lamb wave propagation modelling in complex structures. A graphical processing unit (GPU) and compute unified device architecture (CUDA), available in low-cost graphical cards in standard PCs, are used for Lamb wave propagation numerical simulations. The local interaction simulation approach (LISA) wave propagation algorithm has been implemented as an example. Other algorithms suitable for parallel discretization can also be used in practice. The method is illustrated using examples related to damage detection. The results demonstrate good accuracy and effective computational performance of very large models. The wave propagation modelling presented in the paper can be used in many practical applications of science and engineering. (paper)
Bessel functions: parallel display and processing.

Science.gov (United States)

Lohmann, A W; Ojeda-Castañeda, J; Serrano-Heredia, A

1994-01-01

We present an optical setup that converts planar binary curves into two-dimensional amplitude distributions, which are proportional, along one axis, to the Bessel function of order n, whereas along the other axis the order n increases. This Bessel displayer can be used for parallel Bessel transformation of a signal. Experimental verifications are included.
Computer-Aided Parallelizer and Optimizer

Science.gov (United States)

Jin, Haoqiang

2011-01-01

The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.
Discovering and Exploring State-based Models for Multi-perspective Processes

NARCIS (Netherlands)

van Eck, M.L.; Sidorova, N.; van der Aalst, W.M.P.; La Rosa, M.; Loos, P.; Pastor, O.

2016-01-01

Process mining provides fact-based insights into process behaviour captured in event data. In this work we aim to discover models for processes where different facets, or perspectives, of the process can be identified. Instead of focussing on the events or activities that are executed in the context
The Temporal Dynamics of Visual Search: Evidence for Parallel Processing in Feature and Conjunction Searches

Science.gov (United States)

McElree, Brian; Carrasco, Marisa

2012-01-01

Feature and conjunction searches have been argued to delineate parallel and serial operations in visual processing. The authors evaluated this claim by examining the temporal dynamics of the detection of features and conjunctions. The 1st experiment used a reaction time (RT) task to replicate standard mean RT patterns and to examine the shapes of the RT distributions. The 2nd experiment used the response-signal speed–accuracy trade-off (SAT) procedure to measure discrimination (asymptotic detection accuracy) and detection speed (processing dynamics). Set size affected discrimination in both feature and conjunction searches but affected detection speed only in the latter. Fits of models to the SAT data that included a serial component overpredicted the magnitude of the observed dynamics differences. The authors concluded that both features and conjunctions are detected in parallel. Implications for the role of attention in visual processing are discussed. PMID:10641310
Parallel programming practical aspects, models and current limitations

CERN Document Server

Tarkov, Mikhail S

2014-01-01

Parallel programming is designed for the use of parallel computer systems for solving time-consuming problems that cannot be solved on a sequential computer in a reasonable time. These problems can be divided into two classes: 1. Processing large data arrays (including processing images and signals in real time); 2. Simulation of complex physical processes and chemical reactions. For each of these classes, prospective methods are designed for solving problems. For data processing, one of the most promising technologies is the use of artificial neural networks. The particle-in-cell method and cellular automata are very useful for simulation. Problems of scalability of parallel algorithms and the transfer of existing parallel programs to future parallel computers are very acute now. An important task is to optimize the use of the equipment (including the CPU cache) of parallel computers. Along with parallelizing information processing, it is essential to ensure the processing reliability by the relevant organization ...
The parallel processing system for fast 3D-CT image reconstruction by circular shifting float memory architecture

International Nuclear Information System (INIS)

Wang Shi; Kang Kejun; Wang Jingjin

1996-01-01

Computerized Tomography (CT) is expected to become an inevitable diagnostic technique in the future. However, the long time required to reconstruct an image has been one of the major drawbacks associated with this technique. Parallel processing is one of the best ways to solve this problem. This paper gives the architecture, hardware and software design of PIRS-4 (4-processor Parallel Image Reconstruction System), which is a parallel processing system for fast 3D-CT image reconstruction by circular shifting float memory architecture. It includes the structure and components of the system, the design of the crossbar switch and details of the control model, the description of RPBP image reconstruction, the choice of OS (operating system) and language, the principle of imitating EMS, direct memory R/W of float and programming in protected mode. Finally, the test results are given.
Hypergraph partitioning implementation for parallelizing matrix-vector multiplication using CUDA GPU-based parallel computing

Science.gov (United States)

Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.

2017-07-01

Calculation of matrix-vector multiplication in real-world problems often involves large matrices of arbitrary size. Therefore, parallelization is needed to speed up the calculation process, which usually takes a long time. Graph partitioning techniques that have been discussed in previous studies cannot be used to complete the parallelized calculation of matrix-vector multiplication with arbitrary size. This is because graph partitioning techniques assume a square and symmetric matrix. Hypergraph partitioning techniques overcome this shortcoming of graph partitioning. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (compute unified device architecture) is a parallel computing platform and programming model that was created by NVIDIA and is implemented by the GPU (graphics processing unit).
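To make concrete why a hypergraph can describe a rectangular matrix, the following Python sketch assumes the widely used column-net model, which the abstract does not name: each row is a vertex, each column is a net joining the rows that have a nonzero in it, and the communication volume of a row partition is the sum over nets of (number of parts touched minus one). The function name `comm_volume` and the toy matrix are hypothetical, and the sketch only scores a given partition; it does not compute one.

```python
def comm_volume(nonzeros, row_part):
    """Connectivity-minus-one cost of a row partition (column-net model).

    nonzeros: iterable of (row, col) positions of nonzero entries.
    row_part: mapping from row index to the part that owns the row.
    """
    parts_per_col = {}
    for r, c in nonzeros:
        # A column's net touches every part owning a row with a nonzero in it.
        parts_per_col.setdefault(c, set()).add(row_part[r])
    # Each extra part touched by a net needs one copy of that x entry.
    return sum(len(parts) - 1 for parts in parts_per_col.values())


# Toy 4x3 (non-square) sparsity pattern, chosen only for illustration.
pattern = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2), (3, 2)]
contiguous = {0: 0, 1: 0, 2: 1, 3: 1}   # rows 0-1 on part 0, rows 2-3 on part 1
interleaved = {0: 0, 1: 1, 2: 0, 3: 1}  # alternating rows
```

On this pattern the contiguous partition cuts only the middle column, while the interleaved one cuts all three, so a partitioner minimizing this metric would prefer the former.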
Application of the parallel processing computer to a nuclear disaster prevention support system

Energy Technology Data Exchange (ETDEWEB)

Shigehiro, Nukatsuka; Osami, Watanabe [Mitsubishi Heavy Industries, LTD (Japan)

2003-07-01

At the time of a nuclear emergency, it is important to identify the type and the cause of the accident. Besides these, it is also important to provide adequate information for the emergency response organization to support decision making by predicting and evaluating the development of the event and the influence of the release of radioactivity on the environment. Recently, a new type of nuclear disaster prevention support system called MEASURES (Multiple Radiological Emergency Assistance System for Urgent Response) was developed, which provides not only the current state of the nuclear power plant and the influence of the radioactivity on the environment, but also a prediction of the future development of the accident. In order to provide accurate results of these analyses quickly, MEASURES utilizes various techniques, such as a multiple nesting method, which narrows down the calculation area gradually, and a parallel processing computer for three-dimensional analyses, such as air current distribution analysis. In this paper, the outline and the features of MEASURES are presented, with a focus on the use of the parallel processing computer for the three-dimensional air current distribution analysis. (authors)
Application of the parallel processing computer to a nuclear disaster prevention support system

International Nuclear Information System (INIS)

Shigehiro, Nukatsuka; Osami, Watanabe

2003-01-01

At the time of a nuclear emergency, it is important to identify the type and the cause of the accident. Besides these, it is also important to provide adequate information for the emergency response organization to support decision making by predicting and evaluating the development of the event and the influence of the release of radioactivity on the environment. Recently, a new type of nuclear disaster prevention support system called MEASURES (Multiple Radiological Emergency Assistance System for Urgent Response) was developed, which provides not only the current state of the nuclear power plant and the influence of the radioactivity on the environment, but also a prediction of the future development of the accident. In order to provide accurate results of these analyses quickly, MEASURES utilizes various techniques, such as a multiple nesting method, which narrows down the calculation area gradually, and a parallel processing computer for three-dimensional analyses, such as air current distribution analysis. In this paper, the outline and the features of MEASURES are presented, with a focus on the use of the parallel processing computer for the three-dimensional air current distribution analysis. (authors)
The application of image processing in the measurement for three-light-axis parallelity of laser ranger

Science.gov (United States)

Wang, Yang; Wang, Qianqian

2008-12-01

When a laser ranger is transported or used in field operations, the transmitting axis, receiving axis and aiming axis may not be parallel. The nonparallelism of the three light axes will affect the range-measuring ability or prevent the laser ranger from being operated accurately. So testing and adjusting the three-light-axis parallelity in the production and maintenance of a laser ranger is important to ensure that it can be used reliably. The paper proposes a new measurement method using digital image processing, based on a comparison of some common measurement methods for the three-light-axis parallelity. It uses a large-aperture off-axis paraboloid reflector to get the images of the laser spot and the white light cross line, and then processes the images on the LabVIEW platform. The center of the white light cross line can be obtained by the matching algorithm in a LabVIEW DLL. The center of the laser spot can be obtained by gradation transformation, binarization and area filtering in turn. The software system can set the CCD, detect the off-axis paraboloid reflector, measure the parallelity of the transmitting axis and aiming axis and control the attenuation device. The hardware system selects the SAA7111A, a programmable video decoding chip, to perform A/D conversion. A FIFO (first-in first-out) is selected as the buffer. A USB bus is used to transmit data to the PC. The three-light-axis parallelity can be obtained from the position bias between them. A device based on this method is already in use. The application shows that this method offers high precision, speed and automation.
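The three-stage laser spot localization named in this abstract (gray-level transformation, binarization, area filter) can be sketched in plain Python. The original system runs on LabVIEW, so everything here is an illustrative assumption: the function name `spot_center`, the linear contrast stretch, the 0.5 threshold, the 4-connected labelling and the minimum area of 4 pixels are all hypothetical choices, not the authors' parameters.

```python
from collections import deque


def spot_center(image, threshold=0.5, min_area=4):
    """Return the (row, col) centroid of the largest bright blob, or None."""
    height, width = len(image), len(image[0])
    # 1. Gray-level transformation: stretch intensities linearly to [0, 1].
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    span = (hi - lo) or 1
    # 2. Binarization against a fixed threshold (an assumed value).
    mask = [[(image[y][x] - lo) / span >= threshold for x in range(width)]
            for y in range(height)]
    # 3. Area filter: label 4-connected blobs, discard those below min_area.
    seen = [[False] * width for _ in range(height)]
    best = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            blob, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(blob) >= min_area and len(blob) > len(best):
                best = blob
    if not best:
        return None
    # The spot center is taken as the centroid of the surviving blob.
    return (sum(p[0] for p in best) / len(best),
            sum(p[1] for p in best) / len(best))
```

The area filter is what keeps a single hot pixel or a small reflection from shifting the estimated center, since only the largest sufficiently large blob contributes to the centroid.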
A Case Study of a Hybrid Parallel 3D Surface Rendering Graphics Architecture

DEFF Research Database (Denmark)

Holten-Lund, Hans Erik; Madsen, Jan; Pedersen, Steen

1997-01-01

This paper presents a case study in the design strategy used in building a graphics computer, for drawing very complex 3D geometric surfaces. The goal is to build a PC based computer system capable of handling surfaces built from about 2 million triangles, and to be able to render a perspective view...... of these on a computer display at interactive frame rates, i.e. processing around 50 million triangles per second. The paper presents a hardware/software architecture called HPGA (Hybrid Parallel Graphics Architecture) which is likely to be able to carry out this task. The case study focuses on techniques to increase...
Seamless-merging-oriented parallel inverse lithography technology

International Nuclear Information System (INIS)

Yang Yiwei; Shi Zheng; Shen Shanhu

2009-01-01

Inverse lithography technology (ILT), a promising resolution enhancement technology (RET) used in next generations of IC manufacture, has the capability to push lithography to its limit. However, the existing methods of ILT are either time-consuming due to the large layout in a single process, or not accurate enough due to simple block merging in the parallel process. The seamless-merging-oriented parallel ILT method proposed in this paper is fast because of the parallel process; and most importantly, convergence enhancement penalty terms (CEPT) introduced in the parallel ILT optimization process take the environment into consideration as well as environmental change through target updating. This method increases the similarity of the overlapped area between guard-bands and work units, makes the merging process approach seamless and hence reduces hot-spots. The experimental results show that seamless-merging-oriented parallel ILT not only accelerates the optimization process, but also significantly improves the quality of ILT.
Vector-Parallel processing of the successive overrelaxation method

International Nuclear Information System (INIS)

Yokokawa, Mitsuo

1988-02-01

The successive overrelaxation (SOR) method is one of the iterative methods for solving linear systems of equations, and it has been computed serially with a natural ordering in many nuclear codes. After the appearance of vector processors, this natural SOR method was replaced by parallel algorithms such as the hyperplane or red-black method, in which the calculation order is modified. These methods are suitable for vector processors, and faster calculation can be obtained than with the natural SOR method on vector processors. In this report, a new scheme named the 4-colors SOR method is proposed. We find that the 4-colors SOR method can be executed on vector-parallel processors and that it gives the fastest calculation among all SOR methods according to results of the vector-parallel execution on the Alliant FX/8 multiprocessor system. It is also shown that the theoretical optimal acceleration parameters are equal among the five different ordering SOR methods, and the differences between the convergence rates of these SOR methods are examined. (author)
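Of the reorderings named in this abstract, the red-black ordering is the simplest to illustrate. The Python sketch below applies it to the five-point Laplace stencil on a rectangular grid with fixed boundary values; the 4-colors scheme proposed in the report is not reproduced because the abstract gives no details. The function name `red_black_sor`, the relaxation factor 1.5 and the sweep count are assumptions chosen for the example.

```python
def red_black_sor(u, omega=1.5, sweeps=200):
    """Relax the interior of grid u in place using red-black SOR.

    u is a list of rows; its outer ring holds fixed Dirichlet boundary values.
    """
    n_rows, n_cols = len(u), len(u[0])
    for _ in range(sweeps):
        # Points with (i + j) even are one color, odd the other. Within a
        # color, no point reads another point of the same color, so each
        # half-sweep could be executed as one vector or parallel operation.
        for color in (0, 1):
            for i in range(1, n_rows - 1):
                for j in range(1, n_cols - 1):
                    if (i + j) % 2 != color:
                        continue
                    gauss_seidel = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                           + u[i][j - 1] + u[i][j + 1])
                    u[i][j] += omega * (gauss_seidel - u[i][j])
    return u


# Usage: boundary values equal to the column index; the harmonic solution
# is then u[i][j] = j throughout the interior.
n = 8
grid = [[float(j) if i in (0, n - 1) or j in (0, n - 1) else 0.0
         for j in range(n)] for i in range(n)]
red_black_sor(grid)
```

The serial loops here only show the update order; the point of the coloring is that the inner two loops for a single color carry no data dependence between iterations.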
A parallel buffer tree

DEFF Research Database (Denmark)

Sitchinava, Nodar; Zeh, Norbert

2012-01-01

We present the parallel buffer tree, a parallel external memory (PEM) data structure for batched search problems. This data structure is a non-trivial extension of Arge's sequential buffer tree to a private-cache multiprocessor environment and reduces the number of I/O operations by the number of...... in the optimal O(psort(N) + K/(PB)) parallel I/O complexity, where K is the size of the output reported in the process and psort(N) is the parallel I/O complexity of sorting N elements using P processors....
Parallel transmission techniques in magnetic resonance imaging: experimental realization, applications and perspectives

International Nuclear Information System (INIS)

Ullmann, P.

2007-06-01

The primary objective of this work was the first experimental realization of parallel RF transmission for accelerating spatially selective excitation in magnetic resonance imaging. Furthermore, basic aspects regarding the performance of this technique were investigated, potential risks regarding the specific absorption rate (SAR) were considered and feasibility studies under application-oriented conditions as first steps towards a practical utilisation of this technique were undertaken. At first, based on the RF electronics platform of the Bruker Avance MRI systems, the technical foundations were laid to perform simultaneous transmission of individual RF waveforms on different RF channels. Another essential requirement for the realization of Parallel Excitation (PEX) was the design and construction of suitable RF transmit arrays with elements driven by separate transmit channels. In order to image the PEX results two imaging methods were implemented based on a spin-echo and a gradient-echo sequence, in which a parallel spatially selective pulse was included as an excitation pulse. In the course of this work PEX experiments were successfully performed on three different MRI systems, a 4.7 T and a 9.4 T animal system and a 3 T human scanner, using 5 different RF coil setups in total. In the last part of this work investigations regarding possible applications of Parallel Excitation were performed. A first study comprised experiments of slice-selective B1 inhomogeneity correction by using 3D-selective Parallel Excitation. The investigations were performed in a phantom as well as in a rat fixed in paraformaldehyde solution. In conjunction with these experiments a novel method of calculating RF pulses for spatially selective excitation based on a so-called Direct Calibration approach was developed, which is particularly suitable for this type of experiments. In the context of these experiments it was demonstrated how to combine the advantages of parallel transmission
Enacting outsourcing: An innovation process perspective

Directory of Open Access Journals (Sweden)

Benedikte Borgström

2010-11-01

Much research has been done on outsourcing. However, we still know little about outsourcing based on a social process view on innovation. Outsourcing is an innovation as a logistics-related process that is perceived as new by the adopter. We will explore and analyse an outsourcing idea, its development and implementation, from an innovation perspective. The development is studied in a qualitative, long-term process study. The analysis draws on Hoholm’s (2009) model of innovation processes. It extends the applicability of this innovation model and its methodology of following the action into logistics and supply chain management (SCM) research. The outsourcing process is an innovation that develops in a rational way based on the incremental process because of its actor-network and simultaneous reflection. Interactions and confrontations come about because of involved contrary forces such as competing objectives. The model explains the practice of SCM innovation and increases the understanding of dynamics and complexity. The process study brings insights to cause-effect relations in the development of outsourcing that are consequential to innovative logistics and SCM.
A Parallel Processing Algorithm for Remote Sensing Classification

Science.gov (United States)

Gualtieri, J. Anthony

2005-01-01

A current thread in parallel computation is the use of cluster computers created by networking a few to thousands of commodity general-purpose workstation-level computers using the Linux operating system. For example, on the Medusa cluster at NASA/GSFC, this provides supercomputing performance, 130 Gflops (Linpack Benchmark), at moderate cost, $370K. However, to be useful for scientific computing in the area of Earth science, issues of ease of programming, access to existing scientific libraries, and portability of existing code need to be considered. In this paper, I address these issues in the context of tools for rendering earth science remote sensing data into useful products. In particular, I focus on a problem that can be decomposed into a set of independent tasks, which on a serial computer would be performed sequentially, but with a cluster computer can be performed in parallel, giving an obvious speedup. To make the ideas concrete, I consider the problem of classifying hyperspectral imagery where some ground truth is available to train the classifier. In particular I will use the Support Vector Machine (SVM) approach as applied to hyperspectral imagery. The approach will be to introduce notions about parallel computation and then to restrict the development to the SVM problem. Pseudocode (an outline of the computation) will be described and then details specific to the implementation will be given. Then timing results will be reported to show what speedups are possible using parallel computation. The paper will close with a discussion of the results.
New Product Development (Npd) Process In Subsidiary: Information Perspectives

OpenAIRE

Firmanzah

2008-01-01

Information is an important resource for the new product development (NPD) process in subsidiaries. However, we still lack research that analyzes the NPD process from an information perspective in the subsidiary context. This research is exploratory and examined 8 cases of the NPD process in consumer goods subsidiaries operating in the Indonesian market. Three types of information have been identified and analyzed in the NPD process: global, regional and local information. The result of this research ...

Building an Elastic Parallel OGC Web Processing Service on a Cloud-Based Cluster: A Case Study of Remote Sensing Data Processing Service

Directory of Open Access Journals (Sweden)

Xicheng Tan

2015-10-01

Since the Open Geospatial Consortium (OGC) proposed the geospatial Web Processing Service (WPS), standard OGC Web Service (OWS)-based geospatial processing has become the major type of distributed geospatial application. However, improving the performance and sustainability of the distributed geospatial applications has become the dominant challenge for OWSs. This paper presents the construction of an elastic parallel OGC WPS service on a cloud-based cluster and the designs of a high-performance, cloud-based WPS service architecture, the scalability scheme of the cloud, and the algorithm of the elastic parallel geoprocessing. Experiments of the remote sensing data processing service demonstrate that our proposed method can provide a higher-performance WPS service that uses less computing resources. Our proposed method can also help institutions reduce hardware costs, raise the rate of hardware usage, and conserve energy, which is important in building green and sustainable geospatial services or applications.
SPINning parallel systems software

International Nuclear Information System (INIS)

Matlin, O.S.; Lusk, E.; McCune, W.

2002-01-01

We describe our experiences in using Spin to verify parts of the Multi Purpose Daemon (MPD) parallel process management system. MPD is a distributed collection of processes connected by Unix network sockets. MPD is dynamic: processes and connections among them are created and destroyed as MPD is initialized, runs user processes, recovers from faults, and terminates. This dynamic nature is easily expressible in the Spin/Promela framework but poses performance and scalability challenges. We present here the results of expressing some of the parallel algorithms of MPD and executing both simulation and verification runs with Spin.
Explorations of the implementation of a parallel IDW interpolation algorithm in a Linux cluster-based parallel GIS

Science.gov (United States)

Huang, Fang; Liu, Dingsheng; Tan, Xicheng; Wang, Jian; Chen, Yunping; He, Binbin

2011-04-01

To design and implement an open-source parallel GIS (OP-GIS) based on a Linux cluster, the parallel inverse distance weighting (IDW) interpolation algorithm has been chosen as an example to explore the working model and the principle of algorithm parallel pattern (APP), one of the parallelization patterns for OP-GIS. Based on an analysis of the serial IDW interpolation algorithm of GRASS GIS, this paper has proposed and designed a specific parallel IDW interpolation algorithm, incorporating both single program, multiple data (SPMD) and master/slave (M/S) programming modes. The main steps of the parallel IDW interpolation algorithm are: (1) the master node packages the related information, and then broadcasts it to the slave nodes; (2) each node calculates its assigned data extent along one row using the serial algorithm; (3) the master node gathers the data from all nodes; and (4) iterations continue until all rows have been processed, after which the results are output. According to the experiments performed in the course of this work, the parallel IDW interpolation algorithm can attain an efficiency greater than 0.93 compared with similar algorithms, which indicates that the parallel algorithm can greatly reduce processing time and maximize speed and performance.
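The four steps listed in this abstract can be mimicked on a single machine with a short Python sketch. It is a loose analogy and not the GRASS-based cluster implementation: a thread pool stands in for the slave nodes, passing `samples` to every task stands in for the broadcast, and collecting the mapped rows stands in for the gather. The names `idw_value`, `idw_row` and `idw_grid`, the integer grid coordinates and the default power of 2 are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def idw_value(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    samples is a list of (sx, sy, value) tuples.
    """
    weighted_sum = weight_total = 0.0
    for sx, sy, value in samples:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        if d2 == 0:
            return value  # the query point coincides with a sample
        weight = d2 ** (-power / 2.0)  # equals 1 / distance**power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def idw_row(row, n_cols, samples, power=2.0):
    """Step 2: a worker interpolates one assigned row with the serial code."""
    return [idw_value(col, row, samples, power) for col in range(n_cols)]


def idw_grid(n_rows, n_cols, samples, workers=4):
    """Steps 1, 3 and 4: hand out rows, gather them in order, return the grid."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: idw_row(r, n_cols, samples),
                             range(n_rows)))
```

Rows are a natural unit of work here because each output cell depends only on the shared sample list, so no worker needs results from another. In CPython, threads will not speed up this CPU-bound loop; the pool is used only to show the structure of the decomposition.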
Parallel and vector implementation of APROS simulator code

International Nuclear Information System (INIS)

Niemi, J.; Tommiska, J.

1990-01-01

In this paper the vector and parallel processing implementation of a general purpose simulator code is discussed. In this code the utilization of vector processing is straightforward. In addition to loop-level parallel processing, functional decomposition and domain decomposition have been considered. Results presented for a PWR-plant simulation illustrate the potential speed-up factors of the alternatives. It turns out that loop-level parallelism and domain decomposition are the most promising alternatives for employing parallel processing. (author)
Adaptive Dynamic Process Scheduling on Distributed Memory Parallel Computers

Directory of Open Access Journals (Sweden)

Wei Shu

1994-01-01

One of the challenges in programming distributed memory parallel machines is deciding how to allocate work to processors. This problem is particularly important for computations with unpredictable dynamic behaviors or irregular structures. We present a scheme for dynamic scheduling of medium-grained processes that is useful in this context. The adaptive contracting within neighborhood (ACWN) is a dynamic, distributed, load-dependent, and scalable scheme. It deals with dynamic and unpredictable creation of processes and adapts to different systems. The scheme is described and contrasted with two other schemes that have been proposed in this context, namely the randomized allocation and the gradient model. The performance of the three schemes on an Intel iPSC/2 hypercube is presented and analyzed. The experimental results show that even though the ACWN algorithm incurs somewhat larger overhead than the randomized allocation, it achieves better performance in most cases due to its adaptiveness. Its feature of quickly spreading the work helps it outperform the gradient model in performance and scalability.
Eighth SIAM conference on parallel processing for scientific computing: Final program and abstracts

Energy Technology Data Exchange (ETDEWEB)

NONE

1997-12-31

This SIAM conference is the premier forum for developments in parallel numerical algorithms, a field that has seen very lively and fruitful developments over the past decade, and whose health is still robust. Themes for this conference were: combinatorial optimization; data-parallel languages; large-scale parallel applications; message-passing; molecular modeling; parallel I/O; parallel libraries; parallel software tools; parallel compilers; particle simulations; problem-solving environments; and sparse matrix computations.
Distributed Parallel Architecture for "Big Data"

Directory of Open Access Journals (Sweden)

Catalin BOJA

2012-01-01

This paper is an extension to the "Distributed Parallel Architecture for Storing and Processing Large Datasets" paper presented at the WSEAS SEPADS’12 conference in Cambridge. In its original version the paper went over the benefits of using a distributed parallel architecture to store and process large datasets. This paper analyzes the problem of storing, processing and retrieving meaningful insight from petabytes of data. It provides a survey of current distributed and parallel data processing technologies and, based on them, proposes an architecture that can be used to solve the analyzed problem. In this version there is more emphasis on distributed file systems and the ETL processes involved in a distributed environment.
Overtaking CPU DBMSes with a GPU in whole-query analytic processing with parallelism-friendly execution plan optimization

NARCIS (Netherlands)

A. Agbaria (Adnan); D. Minor (David); N. Peterfreund (Natan); E. Rozenberg (Eyal); O. Rosenberg (Ofer); Huawei Research

2016-01-01

Existing work on accelerating analytic DB query processing with (discrete) GPUs fails to fully realize their potential for speedup through parallelism: Published results do not achieve significant speedup over more performant CPU-only DBMSes when processing complete queries. This
Parallel Architectures and Parallel Algorithms for Integrated Vision Systems. Ph.D. Thesis

Science.gov (United States)

Choudhary, Alok Nidhi

1989-01-01

Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to perform a high-level application (e.g., object recognition). An IVS normally involves algorithms from low-level, intermediate-level, and high-level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. Several issues are addressed in parallel architectures and parallel algorithms for integrated vision systems.
Real-time data acquisition and parallel data processing solution for TJ-II Bolometer arrays diagnostic

Energy Technology Data Exchange (ETDEWEB)

Barrera, E. [Departamento de Sistemas Electronicos y de Control, Universidad Politecnica de Madrid, Crta. Valencia Km. 7, 28031 Madrid (Spain)]. E-mail: eduardo.barrera@upm.es; Ruiz, M. [Grupo de Investigacion en Instrumentacion y Acustica Aplicada, Universidad Politecnica de Madrid, Crta. Valencia Km. 7, 28031 Madrid (Spain); Lopez, S. [Departamento de Sistemas Electronicos y de Control, Universidad Politecnica de Madrid, Crta. Valencia Km. 7, 28031 Madrid (Spain); Machon, D. [Departamento de Sistemas Electronicos y de Control, Universidad Politecnica de Madrid, Crta. Valencia Km. 7, 28031 Madrid (Spain); Vega, J. [Asociacion EURATOM/CIEMAT para Fusion, 28040 Madrid (Spain); Ochando, M. [Asociacion EURATOM/CIEMAT para Fusion, 28040 Madrid (Spain)

2006-07-15

Maps of local plasma emissivity of TJ-II plasmas are determined using three-array cameras of silicon photodiodes (AXUV type from IRD). They are assigned to the top and side ports of the same sector of the vacuum vessel. Each array consists of 20 unfiltered detectors. The signals from each of these detectors are the inputs to an iterative algorithm of tomographic reconstruction. Currently, these signals are acquired by a PXI standard system at approximately 50 kS/s, with 12 bits of resolution, and are stored for off-line processing. A 0.5 s discharge generates 3 Mbytes of raw data. The algorithm's load exceeds the CPU capacity of the PXI system's controller in continuous mode, making it unfeasible to process the samples in parallel with their acquisition in a standard PXI system. A new architecture model has been developed, making it possible to add one or several processing cards to a standard PXI system. With this model, it is possible to define how to distribute, in real time, the data from all acquired signals in the system among the processing cards and the PXI controller. In this way, by distributing the data processing among the system controller and two processing cards, the data processing can be done in parallel with the acquisition. Hence, this system configuration would be able to measure even in long-pulse devices.
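The 3 Mbytes figure quoted in this abstract can be checked with a little arithmetic. The sketch below assumes that all three arrays of 20 detectors are sampled at 50 kS/s and that each 12-bit sample is stored in a 16-bit word; the storage width is an assumption, since the abstract gives only the resolution.

```python
arrays = 3
detectors_per_array = 20
sample_rate_hz = 50_000   # approximately 50 kS/s per channel
discharge_s = 0.5
bytes_per_sample = 2      # assumption: each 12-bit sample occupies a 16-bit word

channels = arrays * detectors_per_array
samples_per_channel = int(sample_rate_hz * discharge_s)
raw_bytes = channels * samples_per_channel * bytes_per_sample
raw_mbytes = raw_bytes / 1e6

print(f"{channels} channels x {samples_per_channel} samples -> {raw_mbytes} Mbytes")
```

Under these assumptions the 60 channels produce 3,000,000 bytes per discharge, which agrees with the quoted figure.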
An intelligent allocation algorithm for parallel processing

Science.gov (United States)

Carroll, Chester C.; Homaifar, Abdollah; Ananthram, Kishan G.

1988-01-01

The problem of allocating nodes of a program graph to processors in a parallel processing architecture is considered. The algorithm is based on critical path analysis, some allocation heuristics, and the execution granularity of nodes in a program graph. These factors, and the structure of the interprocessor communication network, influence the allocation. To achieve realistic estimates of the execution durations of allocations, the algorithm considers the fact that nodes in a program graph have to communicate through varying numbers of tokens. Coarse and fine granularities have been implemented, with interprocessor token-communication durations varying from zero up to values comparable to the execution durations of individual nodes. The effect of communication network structure on allocation is demonstrated by performing allocations for crossbar (non-blocking) and star (blocking) networks. The algorithm assumes the availability of as many processors as it needs for the optimal allocation of any program graph. Hence, the focus of allocation has been on varying token-communication durations rather than varying the number of processors. The algorithm always utilizes as many processors as necessary for the optimal allocation of any program graph, depending upon granularity and characteristics of the interprocessor communication network.
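To illustrate only the critical-path ingredient mentioned in this abstract, the following sketch computes the longest path through a small program graph when every edge carries a number of tokens and each token costs a fixed communication duration. It is not the authors' allocation algorithm; the graph, the durations and the function name `critical_path_length` are hypothetical, and the graph is assumed to be acyclic.

```python
from functools import lru_cache

def critical_path_length(exec_time, edges, token_cost):
    """Longest path through a program graph (assumed to be a DAG).

    exec_time:  {node: execution duration}
    edges:      {(src, dst): number of tokens sent from src to dst}
    token_cost: communication duration per token
    """
    succs = {}
    for (src, dst), tokens in edges.items():
        succs.setdefault(src, []).append((dst, tokens))

    @lru_cache(maxsize=None)
    def longest_from(node):
        tail = 0.0
        for dst, tokens in succs.get(node, []):
            tail = max(tail, tokens * token_cost + longest_from(dst))
        return exec_time[node] + tail

    return max(longest_from(n) for n in exec_time)

# Hypothetical four-node graph: a feeds b and c, which both feed d.
exec_time = {"a": 2.0, "b": 3.0, "c": 1.0, "d": 2.0}
edges = {("a", "b"): 1, ("a", "c"): 4, ("b", "d"): 1, ("c", "d"): 1}

free_comm = critical_path_length(exec_time, edges, token_cost=0.0)
costly_comm = critical_path_length(exec_time, edges, token_cost=1.0)
```

With free communication the path through `b` is critical; once each token costs one time unit, the four tokens on the edge from `a` to `c` move the critical path through `c`. This is the kind of sensitivity to token-communication duration that the abstract describes.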
High-speed parallel solution of the neutron diffusion equation with the hierarchical domain decomposition boundary element method incorporating parallel communications

International Nuclear Information System (INIS)

Tsuji, Masashi; Chiba, Gou

2000-01-01

A hierarchical domain decomposition boundary element method (HDD-BEM) for solving the multiregion neutron diffusion equation (NDE) has been fully parallelized, both for numerical computations and for data communications, to accomplish a high parallel efficiency on distributed memory message passing parallel computers. Data exchanges between node processors that are repeated during iteration processes of HDD-BEM are implemented, without any intervention of the host processor that was used to supervise parallel processing in the conventional parallelized HDD-BEM (P-HDD-BEM). Thus, the parallel processing can be executed with only cooperative operations of node processors. The communication overhead was even the dominant time consuming part in the conventional P-HDD-BEM, and the parallelization efficiency decreased steeply with the increase of the number of processors. With the parallel data communication, the efficiency is affected only by the number of boundary elements assigned to decomposed subregions, and the communication overhead can be drastically reduced. This feature can be particularly advantageous in the analysis of three-dimensional problems where a large number of processors are required. The proposed P-HDD-BEM offers a promising solution to the deterioration problem of parallel efficiency and opens a new path to parallel computations of NDEs on distributed memory message passing parallel computers. (author)
Massively parallel mathematical sieves

Energy Technology Data Exchange (ETDEWEB)

Montry, G.R.

1989-01-01

The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
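As a rough illustration of how a sieve can be split among workers, the sketch below sieves contiguous blocks of the range independently after a short serial pass finds the primes up to the square root of the limit. Note that this uses a simple block decomposition, not the scattered decomposition studied in the paper, and that a thread pool merely stands in for hypercube nodes. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import math

def base_primes(limit):
    """Serial Sieve of Eratosthenes for all primes <= limit."""
    flags = [True] * (limit + 1)
    flags[0:2] = [False] * min(2, limit + 1)
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            for m in range(p * p, limit + 1, p):
                flags[m] = False
    return [i for i, is_p in enumerate(flags) if is_p]

def sieve_block(lo, hi, small_primes):
    """Sieve the half-open range [lo, hi) using the shared base primes."""
    flags = [True] * (hi - lo)
    for p in small_primes:
        # First multiple of p inside the block, but never below p * p.
        start = max(p * p, ((lo + p - 1) // p) * p)
        for m in range(start, hi, p):
            flags[m - lo] = False
    return [lo + i for i, is_p in enumerate(flags) if is_p and lo + i >= 2]

def parallel_sieve(n, workers=4):
    """All primes <= n, with one block of the range per worker."""
    small = base_primes(math.isqrt(n))
    size = -(-(n + 1) // workers)  # ceiling division
    blocks = [(lo, min(lo + size, n + 1)) for lo in range(0, n + 1, size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda b: sieve_block(b[0], b[1], small), blocks))
    return [p for part in parts for p in part]

primes = parallel_sieve(1000)

# Parallel efficiency implied by the speedups quoted in the abstract.
fixed_size_efficiency = 800 / 1024
scaled_efficiency = 980 / 1024
```

The quoted speedups correspond to efficiencies of roughly 0.78 for the fixed-size problem and roughly 0.96 when the problem size per processor is fixed.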
Parallel computing works

Energy Technology Data Exchange (ETDEWEB)

1991-10-23

An account of the Caltech Concurrent Computation Program (C³P), a five year project that focused on answering the question: "Can parallel computers be used to do large-scale scientific computations?" As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C³P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C³P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.
Extending the POSIX I/O interface: a parallel file system perspective.

Energy Technology Data Exchange (ETDEWEB)

Vilayannur, M.; Lang, S.; Ross, R.; Klundt, R.; Ward, L.; Mathematics and Computer Science; VMWare, Inc.; SNL

2008-12-11

The POSIX interface does not lend itself well to enabling good performance for high-end applications. Extensions are needed in the POSIX I/O interface so that high-concurrency HPC applications running on top of parallel file systems perform well. This paper presents the rationale, design, and evaluation of a reference implementation of a subset of the POSIX I/O interfaces on a widely used parallel file system (PVFS) on clusters. Experimental results on a set of micro-benchmarks confirm that the extensions to the POSIX interface greatly improve scalability and performance.
SOFTWARE FOR DESIGNING PARALLEL APPLICATIONS

Directory of Open Access Journals (Sweden)

M. K. Bouza

2017-01-01

The object of research is the tools that support the development of parallel programs in C/C++. Methods and software that automate the process of designing parallel applications are proposed.
A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale

Energy Technology Data Exchange (ETDEWEB)

Moreland, Kenneth [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Geveci, Berk [Kitware, Inc., Clifton Park, NY (United States)

2014-11-01

The evolution of the computing world from teraflop to petaflop has been relatively effortless, with several of the existing programming models scaling effectively to the petascale. The migration to exascale, however, poses considerable challenges. All industry trends indicate that the exascale machine will be built using processors containing hundreds to thousands of cores per chip. It can be inferred that efficient concurrency on exascale machines requires a massive number of concurrent threads, each performing many operations on a localized piece of data. Currently, visualization libraries and applications are based on what is known as the visualization pipeline. In the pipeline model, algorithms are encapsulated as filters with inputs and outputs. These filters are connected by setting the output of one component to the input of another. Parallelism in the visualization pipeline is achieved by replicating the pipeline for each processing thread. This works well for today’s distributed memory parallel computers but cannot be sustained when operating on processors with thousands of cores. Our project investigates a new visualization framework designed to exhibit the pervasive parallelism necessary for extreme scale machines. Our framework achieves this by defining algorithms in terms of worklets, which are localized stateless operations. Worklets are atomic operations that execute when invoked, unlike filters, which execute when a pipeline request occurs. The worklet design allows execution on a massive number of lightweight threads with minimal overhead. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for efficient computation on an exascale machine.
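The worklet idea described in this abstract, a localized stateless operation that runs when invoked, can be sketched as a pure function plus a dispatcher that applies it to every element of a field. This is not the API of the authors' framework; `magnitude_worklet` and `invoke` are hypothetical names, and a small thread pool stands in for the lightweight threads of an exascale processor.

```python
from concurrent.futures import ThreadPoolExecutor

def magnitude_worklet(vector):
    """A stateless operation on one localized piece of data (a single vector)."""
    x, y, z = vector
    return (x * x + y * y + z * z) ** 0.5

def invoke(worklet, field, workers=8):
    """Hypothetical dispatcher: run the worklet once per element of the field."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(worklet, field))

field = [(3.0, 4.0, 0.0), (0.0, 0.0, 2.0), (1.0, 2.0, 2.0)]
magnitudes = invoke(magnitude_worklet, field)
```

Because the worklet keeps no state between calls, the dispatcher is free to schedule the elements on any number of threads and in any order.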
A Networked Perspective on the Engineering Design Process: At the Intersection of Process and Organisation Architectures

DEFF Research Database (Denmark)

Parraguez, Pedro

projects often fail to be on time, on budget, and meeting specifications. Despite the wealth of process models available, previous approaches have been insufficient to provide a networked perspective that allows the challenging combination of organisational and process complexity to unfold. The lack...
Information-Limited Parallel Processing in Difficult Heterogeneous Covert Visual Search

Science.gov (United States)

Dosher, Barbara Anne; Han, Songmei; Lu, Zhong-Lin

2010-01-01

Difficult visual search is often attributed to time-limited serial attention operations, although neural computations in the early visual system are parallel. Using probabilistic search models (Dosher, Han, & Lu, 2004) and a full time-course analysis of the dynamics of covert visual search, we distinguish unlimited capacity parallel versus serial…
Sustainability Attitudes and Behavioral Motivations of College Students: Testing the Extended Parallel Process Model

Science.gov (United States)

Perrault, Evan K.; Clark, Scott K.

2018-01-01

Purpose: A planet that can no longer sustain life is a frightening thought--and one that is often present in mass media messages. Therefore, this study aims to test the components of a classic fear appeal theory, the extended parallel process model (EPPM) and to determine how well its constructs predict sustainability behavioral intentions. This…

Leveraging human oversight and intervention in large-scale parallel processing of open-source data

Science.gov (United States)

Casini, Enrico; Suri, Niranjan; Bradshaw, Jeffrey M.

2015-05-01

The popularity of cloud computing along with the increased availability of cheap storage have led to the necessity of elaboration and transformation of large volumes of open-source data, all in parallel. One way to handle such extensive volumes of information properly is to take advantage of distributed computing frameworks like Map-Reduce. Unfortunately, an entirely automated approach that excludes human intervention is often unpredictable and error prone. Highly accurate data processing and decision-making can be achieved by supporting an automatic process through human collaboration, in a variety of environments such as warfare, cyber security and threat monitoring. Although this mutual participation seems easily exploitable, human-machine collaboration in the field of data analysis presents several challenges. First, due to the asynchronous nature of human intervention, it is necessary to verify that once a correction is made, all the necessary reprocessing is done in a chain. Second, it is often necessary to minimize the amount of reprocessing in order to optimize the usage of resources due to limited availability. To meet these strict requirements, this paper introduces improvements to an innovative approach for human-machine collaboration in the processing of large amounts of open-source data in parallel.
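The two requirements named in this abstract, reprocessing everything that depends on a corrected item and nothing else, can be illustrated with a dependency graph. The sketch below is not the authors' system. It assumes the processing steps form an acyclic graph, and the step names and the function `reprocessing_order` are hypothetical.

```python
from collections import deque

def reprocessing_order(downstream, corrected):
    """Items that depend, directly or indirectly, on `corrected`, listed in an
    order that respects dependencies. Unaffected items are left out."""
    # 1. Collect only the affected items with a breadth-first walk.
    affected = set()
    queue = deque([corrected])
    while queue:
        node = queue.popleft()
        for nxt in downstream.get(node, []):
            if nxt not in affected:
                affected.add(nxt)
                queue.append(nxt)

    # 2. Order the affected subgraph with Kahn's topological sort.
    indegree = {n: 0 for n in affected}
    for n in affected:
        for nxt in downstream.get(n, []):
            indegree[nxt] += 1
    ready = deque(sorted(n for n in affected if indegree[n] == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for nxt in downstream.get(n, []):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return order

# Hypothetical pipeline: each key lists the steps that consume its output.
downstream = {
    "raw": ["parsed"],
    "parsed": ["entities", "geo"],
    "entities": ["report"],
    "geo": ["report"],
    "other": ["summary"],
}
order = reprocessing_order(downstream, "parsed")
```

After a human corrects the `parsed` step, only `entities`, `geo` and `report` are scheduled again, and `report` runs last because it needs both of the others. The unrelated `summary` step is not touched.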
High Performance Parallel Processing Project: Industrial computing initiative. Progress reports for fiscal year 1995

Energy Technology Data Exchange (ETDEWEB)

Koniges, A.

1996-02-09

This project is a package of 11 individual CRADAs plus hardware. This innovative project established a three-year multi-party collaboration that is significantly accelerating the availability of commercial massively parallel processing computing software technology to U.S. government, academic, and industrial end-users. This report contains individual presentations from nine principal investigators along with overall program information.
High-Performance Parallel and Stream Processing of X-ray Microdiffraction Data on Multicores

International Nuclear Information System (INIS)

Bauer, Michael A; McIntyre, Stewart; Xie Yuzhen; Biem, Alain; Tamura, Nobumichi

2012-01-01

We present the design and implementation of a high-performance system for processing synchrotron X-ray microdiffraction (XRD) data in IBM InfoSphere Streams on multicore processors. We report on the parallel and stream processing techniques that we use to harvest the power of clusters of multicores to analyze hundreds of gigabytes of synchrotron XRD data in order to reveal the microtexture of polycrystalline materials. The timing to process one XRD image using one pipeline is about ten times faster than the best C program at present. With the support of InfoSphere Streams platform, our software is able to be scaled up to operate on clusters of multi-cores for processing multiple images concurrently. This system provides a high-performance processing kernel to achieve near real-time data analysis of image data from synchrotron experiments.
NEW PERSPECTIVES ON STRATEGIC MANAGEMENT PROCESS

Directory of Open Access Journals (Sweden)

POP Zenovia Cristiana

2013-07-01

For developing economies the development of enterprises should be a strategic goal; this way of thinking may become viable only as a result of a combination of judicious analysis based on specific local economic aspects and a set of actions taken by the managers to correct any slippage or amplify existing development trends. Better leadership would unequivocally lead to a better strategy, but sometimes the lack of information, first about the external environment, which continuously undergoes quick and radical changes, the political problems and the complexity of implementing the strategy, or the costs that it implies are not taken into consideration. Therefore managers have two options: to establish strategies that would lead to the achievement of the objectives and evaluate them on the basis of economic efficiency, or to identify an already existing strategy and adapt it to the changes in the environment in which the enterprise carries on its activity. This paper aims at discussing and explaining, from a theoretical perspective, the evolution and the advantages and disadvantages of the strategic management process, in order to convey to managers a way to achieve competitiveness and evaluate the position of the firm. In the first section, we explain the necessity of the strategic management process. In the second section we present the different evolution stages. The third section presents our conclusions regarding the advantages and disadvantages of the strategic management process, fundamental for the success of the strategy. The financial crisis did affect the Romanian economy and Romanian enterprises early in 2009, with the impact of the crisis visible in the need of managers to rethink their strategies and to improve their management skills and perspectives on the role of the employees after the crisis. In this paper we try to underline the evolution stages of the strategic management process with its own characteristics by which both
High performance parallel I/O

CERN Document Server

Prabhat

2014-01-01

Gain Critical Insight into the Parallel I/O Ecosystem. Parallel I/O is an integral component of modern high performance computing (HPC), especially in storing and processing very large datasets to facilitate scientific discovery. Revealing the state of the art in this field, High Performance Parallel I/O draws on insights from leading practitioners, researchers, software architects, developers, and scientists who shed light on the parallel I/O ecosystem. The first part of the book explains how large-scale HPC facilities scope, configure, and operate systems, with an emphasis on choices of I/O har
Mobile Devices and GPU Parallelism in Ionospheric Data Processing

Science.gov (United States)

Mascharka, D.; Pankratius, V.

2015-12-01

Scientific data acquisition in the field is often constrained by data transfer backchannels to analysis environments. Geoscientists are therefore facing practical bottlenecks with increasing sensor density and variety. Mobile devices, such as smartphones and tablets, offer promising solutions to key problems in scientific data acquisition, pre-processing, and validation by providing advanced capabilities in the field. This is due to affordable network connectivity options and the increasing mobile computational power. This contribution exemplifies a scenario faced by scientists in the field and presents the "Mahali TEC Processing App" developed in the context of the NSF-funded Mahali project. Aimed at atmospheric science and the study of ionospheric Total Electron Content (TEC), this app is able to gather data from various dual-frequency GPS receivers. It demonstrates parsing of full-day RINEX files on mobile devices and on-the-fly computation of vertical TEC values based on satellite ephemeris models that are obtained from NASA. Our experiments show how parallel computing on the mobile device GPU enables fast processing and visualization of up to 2 million datapoints in real-time using OpenGL. GPS receiver bias is estimated through minimum TEC approximations that can be interactively adjusted by scientists in the graphical user interface. Scientists can also perform approximate computations for "quickviews" to reduce CPU processing time and memory consumption. In the final stage of our mobile processing pipeline, scientists can upload data to the cloud for further processing. Acknowledgements: The Mahali project (http://mahali.mit.edu) is funded by the NSF INSPIRE grant no. AGS-1343967 (PI: V. Pankratius). We would like to acknowledge our collaborators at Boston College, Virginia Tech, Johns Hopkins University, Colorado State University, as well as the support of UNAVCO for loans of dual-frequency GPS receivers for use in this project, and Intel for loans of
Simulation and Analysis of Complex Biological Processes: an Organisation Modelling Perspective

NARCIS (Netherlands)

Bosse, T.; Jonker, C.M.; Treur, J.

2005-01-01

This paper explores how the dynamics of complex biological processes can be modelled and simulated as an organisation of multiple agents. This modelling perspective identifies organisational structure occurring in complex decentralised processes and handles complexity of the analysis of the dynamics
A tomograph VMEbus parallel processing data acquisition system

International Nuclear Information System (INIS)

Atkins, M.S.; Wilkinson, N.A.; Rogers, J.G.

1988-11-01

This paper describes a VME based data acquisition system suitable for the development of Positron Volume Imaging tomographs which use 3-D data for improved image resolution over slice-oriented tomographs. The data acquisition must be flexible enough to accommodate several 3-D reconstruction algorithms; hence, a software-based system is most suitable. Furthermore, because of the increased dimensions and resolution of volume imaging tomographs, the raw data event rate is greater than that of slice-oriented machines. These dual requirements are met by our data acquisition system. Flexibility is achieved through an array of processors connected over a VMEbus, operating asynchronously and in parallel. High raw data throughput is achieved using a dedicated high speed data transfer device available for the VMEbus. The device can attain a raw data rate of 2.5 million coincidence events per second for raw events which are 64 bits wide. Real-time data acquisition and pre-processing requirements can be met by about forty 20 MHz Motorola 68020/68881 processors.
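The figures quoted in this abstract imply a raw byte rate and a per-processor event load that are easy to work out. The sketch below assumes the events are spread evenly over the forty processors, which the abstract does not state.

```python
events_per_second = 2_500_000   # quoted raw coincidence event rate
bits_per_event = 64             # quoted event width
processors = 40                 # "about forty" processors

bytes_per_second = events_per_second * bits_per_event // 8
mbytes_per_second = bytes_per_second / 1e6
# Assumption: events are divided evenly among the processors.
events_per_processor = events_per_second / processors

print(f"{mbytes_per_second} Mbytes/s, {events_per_processor} events/s per processor")
```

Under that assumption the transfer device moves 20 Mbytes per second, and each processor handles 62,500 events per second.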
Initial perspectives on process threat management

International Nuclear Information System (INIS)

Whiteley, James R. Rob; Mannan, M. Sam

2004-01-01

Terrorist and criminal acts are now considered credible risks in the process industries. Deliberate attacks on the nation's petroleum refineries and chemical plants would pose a significant threat to public welfare, national security, and the US economy. To date, the primary response of government and industry has focused on improved security to prevent attacks and the associated consequences. While prevention is clearly preferred, the potential for successful attacks must be addressed. If plant security is breached, the extent of the inflicted damage is determined by the available plant safety systems and procedures. We refer to this 'inside the gate' response as process threat management. The authors have initiated a joint industry/academia study to address: - the level of safety provided by existing plant equipment and safety systems in response to a terrorist act, and; - identification of process (rather than security) needs or opportunities to address this new safety concern. This paper describes the initial perspectives and issues identified by the team at the beginning of the study.
Belief–logic conflict resolution in syllogistic reasoning: Inspection-time evidence for a parallel process model

OpenAIRE

Stupple, Edward J.N; Ball, Linden

2008-01-01

An experiment is reported examining dual-process models of belief bias in syllogistic reasoning using a problem complexity manipulation and an inspection-time method to monitor processing latencies for premises and conclusions. Endorsement rates indicated increased belief bias on complex problems, a finding that runs counter to the “belief-first” selective scrutiny model, but which is consistent with other theories, including “reasoning-first” and “parallel-process” models. Inspection-time da...
History Matching in Parallel Computational Environments

Energy Technology Data Exchange (ETDEWEB)

Steven Bryant; Sanjay Srinivasan; Alvaro Barrera; Sharad Yadav

2004-08-31

In the probabilistic approach for history matching, the information from the dynamic data is merged with the prior geologic information in order to generate permeability models consistent with the observed dynamic data as well as the prior geology. The relationship between dynamic response data and reservoir attributes may vary in different regions of the reservoir due to spatial variations in reservoir attributes, fluid properties, well configuration, flow constraints on wells, etc. This implies that the probabilistic approach should update different regions of the reservoir in different ways. This necessitates delineation of multiple reservoir domains in order to increase the accuracy of the approach. The research focuses on a probabilistic approach to integrate dynamic data that ensures consistency between reservoir models developed from one stage to the next. The algorithm relies on efficient parameterization of the dynamic data integration problem and permits rapid assessment of the updated reservoir model at each stage. The report also outlines various domain decomposition schemes from the perspective of increasing the accuracy of the probabilistic approach to history matching. Research progress in three important areas of the project is discussed: (1) validation and testing of the probabilistic approach to incorporating production data in reservoir models; (2) development of a robust scheme for identifying reservoir regions that will result in a more robust parameterization of the history matching process; and (3) testing commercial simulators for parallel capability and development of a parallel algorithm for history matching.
Reward processing and mood-related symptoms: An RDoC and translational neuroscience perspective.

Science.gov (United States)

Nusslock, Robin; Alloy, Lauren B

2017-07-01

Two objectives of the NIMH Research Domain Criteria (RDoC) initiative are to identify (a) mechanisms that are common to multiple psychiatric disorders, and (b) mechanisms that are unique to specific psychiatric symptoms, and that reflect markers of differential risk for these symptoms. With respect to these objectives, a brain-behavior dimension that has received considerable attention and that is directly relevant to the Positive Valence Systems domain of the RDoC initiative involves reward processing. The present review paper first examines the relationship between reward processing and mood-related symptoms from an RDoC perspective. We then place this work in a larger context by examining the relationship between reward processing abnormalities and psychiatric symptoms defined broadly, including mood-related symptoms, schizophrenia, and addiction. Our review suggests that reward hyposensitivity relates to a subtype of anhedonia characterized by motivational deficits in unipolar depression, and reward hypersensitivity relates to a cluster of hypo/manic symptoms characterized by excessive approach motivation in the context of bipolar disorder. Integrating this perspective with research on reward processing abnormalities in schizophrenia and addiction, we further argue that the principles of equifinality and multifinality may be preferable to a transdiagnostic perspective for conceptualizing the relationship between reward processing and psychiatric symptoms defined broadly. We propose that vulnerability to either motivational anhedonia or approach-related hypo/manic symptoms involves extreme and opposite profiles of reward processing. We further propose that an equifinality and multifinality perspective may serve as a useful framework for future research on reward processing abnormalities and psychiatric symptoms. Copyright © 2017. Published by Elsevier B.V.
When fast logic meets slow belief: Evidence for a parallel-processing model of belief bias.

Science.gov (United States)

Trippas, Dries; Thompson, Valerie A; Handley, Simon J

2017-05-01

Two experiments pitted the default-interventionist account of belief bias against a parallel-processing model. According to the former, belief bias occurs because a fast, belief-based evaluation of the conclusion pre-empts a working-memory demanding logical analysis. In contrast, according to the latter both belief-based and logic-based responding occur in parallel. Participants were given deductive reasoning problems of variable complexity and instructed to decide whether the conclusion was valid on half the trials or to decide whether the conclusion was believable on the other half. When belief and logic conflict, the default-interventionist view predicts that it should take less time to respond on the basis of belief than logic, and that the believability of a conclusion should interfere with judgments of validity, but not the reverse. The parallel-processing view predicts that beliefs should interfere with logic judgments only if the processing required to evaluate the logical structure exceeds that required to evaluate the knowledge necessary to make a belief-based judgment, and vice versa otherwise. Consistent with this latter view, for the simplest reasoning problems (modus ponens), judgments of belief resulted in lower accuracy than judgments of validity, and believability interfered more with judgments of validity than the converse. For problems of moderate complexity (modus tollens and single-model syllogisms), the interference was symmetrical, in that validity interfered with belief judgments to the same degree that believability interfered with validity judgments. For the most complex (three-term multiple-model syllogisms), conclusion believability interfered more with judgments of validity than vice versa, in spite of the significant interference from conclusion validity on judgments of belief.
Parallel Distributed Processing Theory in the Age of Deep Networks.

Science.gov (United States)

Bowers, Jeffrey S

2017-12-01

Parallel distributed processing (PDP) models in psychology are the precursors of deep networks used in computer science. However, only PDP models are associated with two core psychological claims, namely that all knowledge is coded in a distributed format and cognition is mediated by non-symbolic computations. These claims have long been debated in cognitive science, and recent work with deep networks speaks to this debate. Specifically, single-unit recordings show that deep networks learn units that respond selectively to meaningful categories, and researchers are finding that deep networks need to be supplemented with symbolic systems to perform some tasks. Given the close links between PDP and deep networks, it is surprising that research with deep networks is challenging PDP theory. Copyright © 2017. Published by Elsevier Ltd.
Impact of chronotype and time perspective on the processing of scripts.

Science.gov (United States)

Nowack, Kati; van der Meer, Elke

2014-05-01

Little is known about the impact of temporal orientation (chronotype; time perspective) on cognitive performance. This study adopted a psychophysiological approach to explore how chronotype (morningness-eveningness) and time perspective (present; future) influence time succession as another aspect of psychological time that is entailed within script knowledge. In a temporal judgment task, participants decided which of the two presented sub-events (e.g., get new batteries-set right time on alarm clock) comes earlier (or later) within a given script (e.g., changing batteries in an alarm clock). Behavioral and pupillary data suggest a differential impact of chronotype and time perspective on script knowledge and cognitive performance. The impact of time perspective on the processing of temporal information entailed in script knowledge appears linked to match or mismatch conditions between a strong focus on future outcomes associated with future time perspective and the task of identifying either the later (future-oriented) or the earlier (past-oriented) sub-event. Concerning the chronotype, evening types process items in which chronological time succession is violated (i.e., reversely presented items) more accurately than morning types. Indexed by pupillary data, the impact of chronotype may relate to more general cognitive abilities. The psychophysiological data derived in this study suggest that evening types typically outperform morning types in various measures such as working memory capacity and verbal intelligence simply because they invest more cognitive resources than morning types. Copyright © 2014 Elsevier B.V. All rights reserved.
Parallelization of ultrasonic field simulations for non destructive testing

International Nuclear Information System (INIS)

Lambert, Jason

2015-01-01

The Non Destructive Testing field increasingly uses simulation. It is used at every step of the control process of an industrial part, from speeding up control development to helping experts understand results. During this thesis, a simulation tool dedicated to the fast computation of an ultrasonic field radiated by a phased array probe in an isotropic specimen has been developed. Its performance enables interactive use. To benefit from the commonly available parallel architectures, a regular model (aimed at removing divergent branching) derived from the generic CIVA model has been developed. First, a reference implementation was developed to validate this model against CIVA results, and to analyze its performance behaviour before optimization. The resulting code has been optimized for three kinds of parallel architectures commonly available in workstations: general purpose processors (GPP), many-core co-processors (Intel MIC) and graphics processing units (nVidia GPU). On the GPP and the MIC, the algorithm was reorganized and implemented to benefit from both parallelism levels, multithreading and vector instructions. On the GPU, the multiple steps of field computing have been divided into multiple successive CUDA kernels. Moreover, libraries dedicated to each architecture were used to speed up Fast Fourier Transforms: Intel MKL on GPP and MIC, and nVidia cuFFT on GPU. Performance and hardware suitability of the produced codes were thoroughly studied for each architecture. On multiple realistic control configurations, interactive performance was reached. Perspectives to address more complex configurations were drawn. Finally, the integration and the industrialization of this code in the commercial NDT platform CIVA is discussed. (author)
Instrument Variables for Reducing Noise in Parallel MRI Reconstruction

Directory of Open Access Journals (Sweden)

Yuchou Chang

2017-01-01

Generalized autocalibrating partially parallel acquisition (GRAPPA) has been a widely used parallel MRI technique. However, noise degrades the reconstructed image when the reduction factor increases, or even at a low reduction factor for some noisy datasets. Noise, initially generated by the scanner, propagates noise-related errors through the fitting and interpolation procedures of GRAPPA and distorts the final reconstructed image quality. The basic idea we propose to improve GRAPPA is to remove noise from a system identification perspective. In this paper, we first analyze the GRAPPA noise problem from a noisy input-output system perspective; then, a new framework based on the errors-in-variables (EIV) model is developed for analyzing the noise generation mechanism in GRAPPA and designing a concrete method, instrument variables (IV) GRAPPA, to remove noise. The proposed EIV framework opens the possibility that noiseless GRAPPA reconstruction could be achieved by existing methods that solve the EIV problem other than the IV method. Experimental results show that the proposed reconstruction algorithm removes noise better than conventional GRAPPA, as validated with both phantom and in vivo brain data.
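The instrumental-variable idea behind the errors-in-variables framing can be illustrated on a scalar regression. The sketch below is not GRAPPA and not the authors' algorithm; it is a minimal toy example, assuming a single regressor observed with noise and a second, independently noisy measurement used as the instrument. All names (`ols_slope`, `iv_slope`, `demo`) and all noise levels are hypothetical.

```python
import random

def ols_slope(x, y):
    """Least-squares slope through the origin: (x'y) / (x'x)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def iv_slope(z, x, y):
    """Instrumental-variable slope: (z'y) / (z'x), with z as the instrument."""
    return sum(a * b for a, b in zip(z, y)) / sum(a * b for a, b in zip(z, x))

def demo(n=20000, true_slope=2.0, seed=0):
    """Toy errors-in-variables problem (all values here are assumptions)."""
    rng = random.Random(seed)
    clean = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Regressor as observed: clean signal plus measurement noise.
    x = [c + rng.gauss(0.0, 1.0) for c in clean]
    # Instrument: a second measurement whose noise is independent of x's noise.
    z = [c + rng.gauss(0.0, 1.0) for c in clean]
    # Output depends on the clean signal, not on the noisy observation.
    y = [true_slope * c + rng.gauss(0.0, 0.1) for c in clean]
    return ols_slope(x, y), iv_slope(z, x, y)

if __name__ == "__main__":
    ols, iv = demo()
    print(f"OLS slope: {ols:.3f}  IV slope: {iv:.3f}  (true slope 2.0)")
```

In this toy setting the ordinary least-squares slope is pulled toward zero by the noise in the regressor, while the instrumental-variable estimate stays close to the true slope because the instrument's noise is uncorrelated with the regressor's noise.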
Parallel transmission techniques in magnetic resonance imaging: experimental realization, applications and perspectives; Parallele Sendetechniken in der Magnetresonanztomographie: experimentelle Realisierung, Anwendungen und Perspektiven

Energy Technology Data Exchange (ETDEWEB)

Ullmann, P.

2007-06-15

The primary objective of this work was the first experimental realization of parallel RF transmission for accelerating spatially selective excitation in magnetic resonance imaging. Furthermore, basic aspects regarding the performance of this technique were investigated, potential risks regarding the specific absorption rate (SAR) were considered and feasibility studies under application-oriented conditions as first steps towards a practical utilisation of this technique were undertaken. At first, based on the RF electronics platform of the Bruker Avance MRI systems, the technical foundations were laid to perform simultaneous transmission of individual RF waveforms on different RF channels. Another essential requirement for the realization of Parallel Excitation (PEX) was the design and construction of suitable RF transmit arrays with elements driven by separate transmit channels. In order to image the PEX results two imaging methods were implemented based on a spin-echo and a gradient-echo sequence, in which a parallel spatially selective pulse was included as an excitation pulse. In the course of this work PEX experiments were successfully performed on three different MRI systems, a 4.7 T and a 9.4 T animal system and a 3 T human scanner, using 5 different RF coil setups in total. In the last part of this work investigations regarding possible applications of Parallel Excitation were performed. A first study comprised experiments of slice-selective B1 inhomogeneity correction by using 3D-selective Parallel Excitation. The investigations were performed in a phantom as well as in a rat fixed in paraformaldehyde solution. In conjunction with these experiments a novel method of calculating RF pulses for spatially selective excitation based on a so-called Direct Calibration approach was developed, which is particularly suitable for this type of experiments. In the context of these experiments it was demonstrated how to combine the advantages of parallel transmission
A novel conceptual design of parallel nitrogen expansion liquefaction process for small-scale LNG (liquefied natural gas) plant in skid-mount packages

International Nuclear Information System (INIS)

He, Tianbiao; Ju, Yonglin

2014-01-01

The utilization of unconventional natural gas is still a great challenge for China due to its distribution locations and small reserves. Thus, liquefying the unconventional natural gas by using small-scale LNG plants in skid-mount packages is a good choice with great economic benefits. A novel conceptual design of a parallel nitrogen expansion liquefaction process for small-scale plants in skid-mount packages has been proposed. First, a process configuration is designed. Then, thermodynamic analysis of the process is conducted. Next, an optimization model using a genetic algorithm is developed to optimize the process. Finally, the flexibility of the process is tested with two different feed gases. In conclusion, the proposed parallel nitrogen expansion liquefaction process can be used in small-scale LNG plants in skid-mount packages with high exergy efficiency and great economic benefits. - Highlights: • A novel design of parallel nitrogen expansion liquefaction process is proposed. • Genetic algorithm is applied to optimize the novel process. • The unit energy consumption of the optimized process is 0.5163 kWh/Nm³. • The exergy efficiency of the optimized case is 0.3683. • The novel process has good flexibility for different feed gas conditions.
A Parallel Algebraic Multigrid Solver on Graphics Processing Units

KAUST Repository

Haase, Gundolf

2010-01-01

The paper presents a multi-GPU implementation of the preconditioned conjugate gradient algorithm with an algebraic multigrid preconditioner (PCG-AMG) for an elliptic model problem on a 3D unstructured grid. An efficient parallel sparse matrix-vector multiplication scheme underlying the PCG-AMG algorithm is presented for the many-core GPU architecture. A performance comparison of the parallel solver shows that a single Nvidia Tesla C1060 GPU board delivers the performance of a sixteen-node Infiniband cluster, and a multi-GPU configuration with eight GPUs is about 100 times faster than a typical server CPU core. © 2010 Springer-Verlag.
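The sparse matrix-vector product mentioned above is the kernel that dominates each conjugate gradient iteration. The abstract does not say which storage format or scheme the paper uses, so the following is only a serial sketch, assuming the common compressed sparse row (CSR) layout; the function name `csr_matvec` is hypothetical. Each output row is computed independently of the others, which is the property a parallel implementation can exploit.

```python
def csr_matvec(data, indices, indptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    data    -- nonzero values, row by row
    indices -- column index of each nonzero
    indptr  -- indptr[r]:indptr[r+1] is the slice of data/indices for row r
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        # Rows do not depend on each other, so this loop is the parallel one.
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y

if __name__ == "__main__":
    # A = [[4, 1, 0],
    #      [1, 3, 0],
    #      [0, 0, 2]]
    data = [4.0, 1.0, 1.0, 3.0, 2.0]
    indices = [0, 1, 0, 1, 2]
    indptr = [0, 2, 4, 5]
    print(csr_matvec(data, indices, indptr, [1.0, 2.0, 3.0]))
```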

The vector and parallel processing of MORSE code on Monte Carlo Machine

International Nuclear Information System (INIS)

Hasegawa, Yukihiro; Higuchi, Kenji.

1995-11-01

MORSE, a multi-group Monte Carlo code for particle transport, is modified for high performance computing on the Monte Carlo Machine Monte-4. The method and the results are described. Monte-4 was specially developed to realize high performance computing of Monte Carlo codes for particle transport, for which high performance has been difficult to obtain in vector processing on conventional vector processors. Monte-4 has four vector processor units with special hardware called Monte Carlo pipelines. The vectorization and parallelization of the MORSE code and the performance evaluation on Monte-4 are described. (author)
Is orthographic information from multiple parafoveal words processed in parallel: An eye-tracking study.

Science.gov (United States)

Cutter, Michael G; Drieghe, Denis; Liversedge, Simon P

2017-08-01

In the current study we investigated whether orthographic information available from 1 upcoming parafoveal word influences the processing of another parafoveal word. Across 2 experiments we used the boundary paradigm (Rayner, 1975) to present participants with an identity preview of the 2 words after the boundary (e.g., hot pan), a preview in which 2 letters were transposed between these words (e.g., hop tan), or a preview in which the same 2 letters were substituted (e.g., hob fan). We hypothesized that if these 2 words were processed in parallel in the parafovea then we may observe significant preview benefits for the condition in which the letters were transposed between words relative to the condition in which the letters were substituted. However, no such effect was observed, with participants fixating the words for the same amount of time in both conditions. This was the case both when the transposition was made between the final and first letter of the 2 words (e.g., hop tan as a preview of hot pan; Experiment 1) and when the transposition maintained within-word letter position (e.g., pit hop as a preview of hit pop; Experiment 2). The implications of these findings are considered in relation to serial and parallel lexical processing during reading. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
A high-speed linear algebra library with automatic parallelism

Science.gov (United States)

Boucher, Michael L.

1994-01-01

Parallel or distributed processing is key to getting the highest performance from workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely limited even though there are numerous computationally demanding programs that would significantly benefit from parallel processing. This paper describes DSSLIB, which is a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.
Haptic teleoperation systems signal processing perspective

CERN Document Server

Lee, Jae-young

2015-01-01

This book examines the signal processing perspective in haptic teleoperation systems. The text covers the topics of prediction, estimation, architecture, data compression, and error correction that can be applied to haptic teleoperation systems. The authors begin with an overview of haptic teleoperation systems, then look at a Bayesian approach to haptic teleoperation systems. They move on to a discussion of haptic data compression, haptic data digitization and forward error correction. · Presents haptic data prediction/estimation methods that compensate for unreliable networks · Discusses haptic data compression that reduces haptic data size over limited network bandwidth and haptic data error correction that compensates for packet loss · Provides signal processing techniques used with existing control architectures.
A further extension of the Extended Parallel Process Model (E-EPPM): implications of cognitive appraisal theory of emotion and dispositional coping style.

Science.gov (United States)

So, Jiyeon

2013-01-01

For two decades, the extended parallel process model (EPPM; Witte, 1992) has been one of the most widely used theoretical frameworks in health risk communication. The model has gained much popularity because it recognizes that, ironically, preceding fear appeal models do not incorporate the concept of fear as a legitimate and central part of them. As a remedy to this situation, the EPPM aims at "putting the fear back into fear appeals" (Witte, 1992, p. 330). Despite this attempt, however, this article argues that the EPPM still does not fully capture the essence of fear as an emotion. Specifically, drawing upon Lazarus's (1991) cognitive appraisal theory of emotion and the concept of dispositional coping style (Miller, 1995), this article seeks to further extend the EPPM. The revised EPPM incorporates a more comprehensive perspective on risk perceptions as a construct involving both cognitive and affective aspects (i.e., fear and anxiety) and integrates the concept of monitoring and blunting coping style as a moderator of further information seeking regarding a given risk topic.
Parallelizing flow-accumulation calculations on graphics processing units—From iterative DEM preprocessing algorithm to recursive multiple-flow-direction algorithm

Science.gov (United States)

Qin, Cheng-Zhi; Zhan, Lijun

2012-06-01

As one of the important tasks in digital terrain analysis, the calculation of flow accumulations from gridded digital elevation models (DEMs) usually involves two steps in a real application: (1) using an iterative DEM preprocessing algorithm to remove the depressions and flat areas commonly contained in real DEMs, and (2) using a recursive flow-direction algorithm to calculate the flow accumulation for every cell in the DEM. Because both algorithms are computationally intensive, quick calculation of the flow accumulations from a DEM (especially for a large area) presents a practical challenge to personal computer (PC) users. In recent years, rapid increases in hardware capacity of the graphics processing units (GPUs) provided in modern PCs have made it possible to meet this challenge in a PC environment. Parallel computing on GPUs using a compute-unified-device-architecture (CUDA) programming model has been explored to speed up the execution of the single-flow-direction algorithm (SFD). However, the parallel implementation on a GPU of the multiple-flow-direction (MFD) algorithm, which generally performs better than the SFD algorithm, has not been reported. Moreover, GPU-based parallelization of the DEM preprocessing step in the flow-accumulation calculations has not been addressed. This paper proposes a parallel approach to calculate flow accumulations (including both iterative DEM preprocessing and a recursive MFD algorithm) on a CUDA-compatible GPU. For the parallelization of an MFD algorithm (MFD-md), two different parallelization strategies using a GPU are explored. The first parallelization strategy, which has been used in the existing parallel SFD algorithm on GPU, has the problem of computing redundancy. Therefore, we designed a parallelization strategy based on graph theory. The application results show that the proposed parallel approach to calculate flow accumulations on a GPU performs much faster than either sequential algorithms or other parallel GPU
Category specific spatial dissociations of parallel processes underlying visual naming.

Science.gov (United States)

Conner, Christopher R; Chen, Gang; Pieters, Thomas A; Tandon, Nitin

2014-10-01

The constituent elements and dynamics of the networks responsible for word production are a central issue to understanding human language. Of particular interest is their dependency on lexical category, particularly the possible segregation of nouns and verbs into separate processing streams. We applied a novel mixed-effects, multilevel analysis to electrocorticographic data collected from 19 patients (1942 electrodes) to examine the activity of broadly disseminated cortical networks during the retrieval of distinct lexical categories. This approach was designed to overcome the issues of sparse sampling and individual variability inherent to invasive electrophysiology. Both noun and verb generation evoked overlapping, yet distinct nonhierarchical processes favoring ventral and dorsal visual streams, respectively. Notable differences in activity patterns were noted in Broca's area and superior lateral temporo-occipital regions (verb > noun) and in parahippocampal and fusiform cortices (noun > verb). Comparisons with functional magnetic resonance imaging (fMRI) results yielded a strong correlation of blood oxygen level-dependent signal and gamma power and an independent estimate of group size needed for fMRI studies of cognition. Our findings imply parallel, lexical category-specific processes and reconcile discrepancies between lesional and functional imaging studies. © The Author 2013. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
An Information Processing Perspective on Divergence and Convergence in Collaborative Learning

Science.gov (United States)

Jorczak, Robert L.

2011-01-01

This paper presents a model of collaborative learning that takes an information processing perspective of learning by social interaction. The collaborative information processing model provides a theoretical basis for understanding learning principles associated with social interaction and explains why peer-to-peer discussion is potentially more…
Synchronization Of Parallel Discrete Event Simulations

Science.gov (United States)

Steinman, Jeffrey S.

1992-01-01

Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages. Algorithm processes events optimistically in time cycles adapting while simulation in progress. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.
Parallel Implementation of the Discrete Green's Function Formulation of the FDTD Method on a Multicore Central Processing Unit

Directory of Open Access Journals (Sweden)

T. Stefański

2014-12-01

A parallel implementation of the discrete Green's function formulation of the finite-difference time-domain (DGF-FDTD) method was developed on a multicore central processing unit. DGF-FDTD avoids computations of the electromagnetic field in free-space cells and does not require domain termination by absorbing boundary conditions. Computed DGF-FDTD solutions are compatible with the FDTD grid, enabling the perfect hybridization of FDTD with the use of time-domain integral equation methods. The developed implementation can be applied to simulations of antenna characteristics. As an example, arrays of Yagi-Uda antennas were simulated with the use of parallel DGF-FDTD. The efficiency of parallel computations was investigated as a function of the number of current elements in the FDTD grid. Although the developed method does not apply the fast Fourier transform for convolution computations, advantages stemming from the application of DGF-FDTD instead of FDTD can be demonstrated for one-dimensional wire antennas when simulation results are post-processed by the near-to-far-field transformation.
Parallel coupling of symmetric and asymmetric exclusion processes

International Nuclear Information System (INIS)

Tsekouras, K; Kolomeisky, A B

2008-01-01

A system consisting of two parallel coupled channels where particles in one of them follow the rules of totally asymmetric exclusion processes (TASEP) and in another one move as in symmetric simple exclusion processes (SSEP) is investigated theoretically. Particles interact with each other via hard-core exclusion potential, and in the asymmetric channel they can only hop in one direction, while on the symmetric lattice particles jump in both directions with equal probabilities. Inter-channel transitions are also allowed at every site of both lattices. Stationary state properties of the system are solved exactly in the limit of strong couplings between the channels. It is shown that strong symmetric couplings between totally asymmetric and symmetric channels lead to an effective partially asymmetric simple exclusion process (PASEP) and properties of both channels become almost identical. However, strong asymmetric couplings between symmetric and asymmetric channels yield an effective TASEP with nonzero particle flux in the asymmetric channel and zero flux on the symmetric lattice. For intermediate strength of couplings between the lattices a vertical-cluster mean-field method is developed. This approximate approach treats exactly particle dynamics during the vertical transitions between the channels and it neglects the correlations along the channels. Our calculations show that in all cases there are three stationary phases defined by particle dynamics at entrances, at exits or in the bulk of the system, while phase boundaries depend on the strength and symmetry of couplings between the channels. Extensive Monte Carlo computer simulations strongly support our theoretical predictions. Theoretical calculations and computer simulations predict that inter-channel couplings have a strong effect on stationary properties. It is also argued that our results might be relevant for understanding multi-particle dynamics of motor proteins
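The hopping rules described above can be sketched as a small Monte Carlo simulation. This is an illustrative toy, not the authors' model: it assumes periodic boundaries instead of the entrances and exits discussed in the abstract, a single symmetric coupling probability `w`, and random-sequential updates. The names `sweep` and `simulate` and all parameter values are hypothetical.

```python
import random

def sweep(asym, sym, w, rng):
    """Perform 2*n random-sequential update attempts on two coupled rings.

    asym -- occupancy list (0/1) of the totally asymmetric channel
    sym  -- occupancy list (0/1) of the symmetric channel
    w    -- probability that a chosen particle attempts an inter-channel hop
    """
    n = len(asym)
    for _ in range(2 * n):
        lane = rng.randrange(2)
        cur, other = (asym, sym) if lane == 0 else (sym, asym)
        i = rng.randrange(n)
        if cur[i] == 0:
            continue  # no particle at the chosen site
        if rng.random() < w:
            # Vertical transition, allowed only if the target site is empty.
            if other[i] == 0:
                cur[i], other[i] = 0, 1
            continue
        if lane == 0:
            j = (i + 1) % n  # asymmetric channel: hop in one direction only
        else:
            j = (i + rng.choice((-1, 1))) % n  # symmetric: either direction
        if cur[j] == 0:  # hard-core exclusion
            cur[i], cur[j] = 0, 1

def simulate(n=50, density=0.3, w=0.2, sweeps=200, seed=0):
    """Run the toy model from a random initial state and return both lattices."""
    rng = random.Random(seed)
    asym = [1 if rng.random() < density else 0 for _ in range(n)]
    sym = [1 if rng.random() < density else 0 for _ in range(n)]
    for _ in range(sweeps):
        sweep(asym, sym, w, rng)
    return asym, sym

if __name__ == "__main__":
    a, s = simulate()
    print("asymmetric channel density:", sum(a) / len(a))
    print("symmetric channel density: ", sum(s) / len(s))
```

With periodic boundaries the total particle number is conserved, which gives a simple sanity check; reproducing the phase behaviour described in the abstract would require open boundaries with entry and exit rates.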
Using a Design Science Perspective to Understand a Complex Design-Based Research Process

DEFF Research Database (Denmark)

Bækgaard, Lars

2012-01-01

The purpose of the paper is to demonstrate how a design science perspective can be used to describe and understand a set of related design-based research processes. We describe and analyze a case study in a manner that is inspired by design science. The case study involves the design of modeling tools and the redesign of an information service in a library. We use a set of guidelines from a design science perspective to organize the description and analysis of the case study. By doing this we demonstrate the usefulness of design science as an analytical tool for understanding related design-based research processes. And we argue that a design science perspective may be useful for both researchers and practitioners.
Cocaine Use and Delinquent Behavior among High-Risk Youths: A Growth Model of Parallel Processes

Science.gov (United States)

Dembo, Richard; Sullivan, Christopher

2009-01-01

We report the results of a parallel-process, latent growth model analysis examining the relationships between cocaine use and delinquent behavior among youths. The study examined a sample of 278 justice-involved juveniles completing at least one of three follow-up interviews as part of a National Institute on Drug Abuse-funded study. The results…
"Let's Move" campaign: applying the extended parallel process model.

Science.gov (United States)

Batchelder, Alicia; Matusitz, Jonathan

2014-01-01

This article examines Michelle Obama's health campaign, "Let's Move," through the lens of the extended parallel process model (EPPM). "Let's Move" aims to reduce the childhood obesity epidemic in the United States. Developed by Kim Witte, EPPM rests on the premise that people's attitudes can be changed when fear is exploited as a factor of persuasion. Fear appeals work best (a) when a person feels a concern about the issue or situation, and (b) when he or she believes to have the capability of dealing with that issue or situation. Overall, the analysis found that "Let's Move" is based on past health campaigns that have been successful. An important element of the campaign is the use of fear appeals (as it is postulated by EPPM). For example, part of the campaign's strategies is to explain the severity of the diseases associated with obesity. By looking at the steps of EPPM, readers can also understand the strengths and weaknesses of "Let's Move."
Mozambican Adolescents' Perspectives on the Academic Procrastination Process

Science.gov (United States)

Fulano, Celso; Cunha, Jennifer; Núñez, José Carlos; Pereira, Beatriz; Rosário, Pedro

2018-01-01

The current study explored Mozambican adolescents' perspectives of the process of academic procrastination, focusing on three key aspects: Type of tasks where youth usually procrastinate, antecedents, and the perceived consequences. Twenty-four adolescents from a pool of 300 (11th and 12th graders) reported high levels of procrastination and were…
A Parallel Process Growth Model of Avoidant Personality Disorder Symptoms and Personality Traits

Science.gov (United States)

Wright, Aidan G. C.; Pincus, Aaron L.; Lenzenweger, Mark F.

2012-01-01

Background: Avoidant personality disorder (AVPD), like other personality disorders, has historically been construed as a highly stable disorder. However, results from a number of longitudinal studies have found that the symptoms of AVPD demonstrate marked change over time. Little is known about which other psychological systems are related to this change. Although cross-sectional research suggests a strong relationship between AVPD and personality traits, no work has examined the relationship of their change trajectories. The current study sought to establish the longitudinal relationship between AVPD and basic personality traits using parallel process growth curve modeling. Methods: Parallel process growth curve modeling was applied to the trajectories of AVPD and basic personality traits from the Longitudinal Study of Personality Disorders (Lenzenweger, 2006), a naturalistic, prospective, multiwave, longitudinal study of personality disorder, temperament, and normal personality. The focus of these analyses is on the relationship between the rates of change in both AVPD symptoms and basic personality traits. Results: AVPD symptom trajectories demonstrated significant negative relationships with the trajectories of interpersonal dominance and affiliation, and a significant positive relationship to rates of change in neuroticism. Conclusions: These results provide some of the first compelling evidence that trajectories of change in PD symptoms and personality traits are linked. These results have important implications for the ways in which temporal stability is conceptualized in AVPD specifically, and PD in general. PMID:22506627
Multi-mode sensor processing on a dynamically reconfigurable massively parallel processor array

Science.gov (United States)

Chen, Paul; Butts, Mike; Budlong, Brad; Wasson, Paul

2008-04-01

This paper introduces a novel computing architecture that can be reconfigured in real time to adapt on demand to multi-mode sensor platforms' dynamic computational and functional requirements. This 1 teraOPS reconfigurable Massively Parallel Processor Array (MPPA) has 336 32-bit processors. The programmable 32-bit communication fabric provides streamlined inter-processor connections with deterministically high performance. Software programmability, scalability, ease of use, and fast reconfiguration time (ranging from microseconds to milliseconds) are the most significant advantages over FPGAs and DSPs. This paper introduces the MPPA architecture, its programming model, and methods of reconfigurability. An MPPA platform for reconfigurable computing is based on a structural object programming model. Objects are software programs running concurrently on hundreds of 32-bit RISC processors and memories. They exchange data and control through a network of self-synchronizing channels. A common application design pattern on this platform, called a work farm, is a parallel set of worker objects, with one input and one output stream. Statically configured work farms with homogeneous and heterogeneous sets of workers have been used in video compression and decompression, network processing, and graphics applications.
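The work-farm pattern named in this abstract can be sketched in ordinary software terms. The Python below is only an illustration under stated assumptions: threads and `queue.Queue` stand in for the MPPA's processor objects and self-synchronizing channels, and the names `work_farm`, `worker`, and `num_workers` are hypothetical rather than part of the platform's programming model.

```python
import queue
import threading

_STOP = object()  # sentinel marking the end of the input stream


def work_farm(items, func, num_workers=4):
    """Apply func to every item using a farm of worker threads.

    One input stream feeds all workers and one output stream collects
    their results. Results are returned in input order.
    """
    inbox = queue.Queue()   # single input stream (assumed stand-in for a channel)
    outbox = queue.Queue()  # single output stream

    def worker():
        while True:
            msg = inbox.get()
            if msg is _STOP:
                break
            index, value = msg
            outbox.put((index, func(value)))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    count = 0
    for index, value in enumerate(items):
        inbox.put((index, value))
        count += 1
    for _ in threads:
        inbox.put(_STOP)  # one stop message per worker
    for t in threads:
        t.join()

    results = [None] * count
    for _ in range(count):
        index, result = outbox.get()
        results[index] = result
    return results
```

Tagging each item with its index lets the single output stream be reassembled in order even though workers finish at different times.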
Current distribution characteristics of superconducting parallel circuits

International Nuclear Information System (INIS)

Mori, K.; Suzuki, Y.; Hara, N.; Kitamura, M.; Tominaka, T.

1994-01-01

In order to increase the current-carrying capacity of the current path in a superconducting magnet system, parts of the path are built as parallel circuits, such as insulated multi-strand cables or parallel persistent current switches (PCS). For superconducting parallel circuits of both kinds, the current distribution during the current sweep, the persistent mode, and the quench process was investigated. Two methods were used to measure the current distribution. (1) Each strand was surrounded by a pure iron core with an air gap, and a Hall probe was placed in the gap. The accuracy of this method was degraded by the magnetic hysteresis of the iron. (2) A Rogowski coil without iron was used to measure the current in each path of a 4-parallel PCS. The results showed that the current distribution characteristics of a parallel PCS are very similar to those of an insulated multi-strand cable for the quench process.
Leveraging Non-Uniform Resources for Parallel Query Processing

DEFF Research Database (Denmark)

Mayr, Tobias; Bonnet, Philippe; Gehrke, Johannes

2003-01-01

Modular clusters are now composed of non-uniform nodes with different CPUs, disks or network cards so that customers can adapt the cluster configuration to the changing technologies and to their changing needs. This challenges dataflow parallelism as the primary load balancing technique of exist...
Design and simulation of parallel and distributed architectures for images processing

International Nuclear Information System (INIS)

Pirson, Alain

1990-01-01

The exploitation of visual information requires special computers. The diversity of operations and the computing power involved lead to structures founded on the concepts of concurrency and distributed processing. This work models a vision computer as an association of dedicated intelligent entities that exchange messages according to the model of parallelism introduced by the language Occam. It puts forward an architecture of the 'enriched processor network' type, which consists of a classical multiprocessor structure where each node is provided with specific devices. These devices perform processing tasks as well as inter-node dialogues. Such an architecture benefits from the homogeneity of multiprocessor networks and the power of dedicated resources. Its implementation corresponds to that of a distributed structure, with tasks allocated to each computing element. This approach culminates in an original architecture called ATILA. This modular structure is based on a transputer network supplied with vision-dedicated co-processors and powerful communication devices. (author)

Parallel and distributed processing in two SGBDS: A case study

Directory of Open Access Journals (Sweden)

Francisco Javier Moreno

2017-04-01

Context: One of the strategies for managing large volumes of data is distributed and parallel computing. Among the tools that allow applying these characteristics are some Data Base Management Systems (DBMS), such as Oracle, DB2, and SQL Server. Method: In this paper we present a case study where we evaluate the performance of an SQL query in two of these DBMS. The evaluation is done through various forms of data distribution in a computer network with different degrees of parallelism. Results: The tests of the SQL query evidenced the performance differences between the two DBMS analyzed. However, more thorough testing and a wider variety of queries are needed. Conclusions: The differences in performance between the two DBMSs analyzed show that when evaluating this aspect, it is necessary to consider the particularities of each DBMS and the degree of parallelism of the queries.
A parallel solution for high resolution histological image analysis.

Science.gov (United States)

Bueno, G; González, R; Déniz, O; García-Rojo, M; González-García, J; Fernández-Carrobles, M M; Vállez, N; Salido, J

2012-10-01

This paper describes a general methodology for developing parallel image processing algorithms based on message passing for high resolution images (on the order of several Gigabytes). These algorithms have been applied to histological images and must be executed on massively parallel processing architectures. Advances in new technologies for complete slide digitalization in pathology have been combined with developments in biomedical informatics. However, the efficient use of these digital slide systems is still a challenge. The image processing that these slides are subject to is still limited both in terms of data processed and processing methods. The work presented here focuses on the need to design and develop parallel image processing tools capable of obtaining and analyzing the entire gamut of information included in digital slides. Tools have been developed to assist pathologists in image analysis and diagnosis, and they cover low and high-level image processing methods applied to histological images. Code portability, reusability and scalability have been tested by using the following parallel computing architectures: distributed memory with massively parallel processors and two networks, INFINIBAND and Myrinet, composed of 17 and 1024 nodes respectively. The parallel framework proposed is a flexible, high-performance solution, and it shows that the efficient processing of digital microscopic images is possible and may offer important benefits to pathology laboratories. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Ultrascalable petaflop parallel supercomputer

Science.gov (United States)

Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Chiu, George [Cross River, NY; Cipolla, Thomas M [Katonah, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Hall, Shawn [Pleasantville, NY; Haring, Rudolf A [Cortlandt Manor, NY; Heidelberger, Philip [Cortlandt Manor, NY; Kopcsay, Gerard V [Yorktown Heights, NY; Ohmacht, Martin [Yorktown Heights, NY; Salapura, Valentina [Chappaqua, NY; Sugavanam, Krishnan [Mahopac, NY; Takken, Todd [Brewster, NY

2010-07-20

A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.
Scientific programming on massively parallel processor CP-PACS

International Nuclear Information System (INIS)

Boku, Taisuke

1998-01-01

The massively parallel processor CP-PACS targets a range of problems in computational physics, and its architecture has been designed to support various kinds of numerical processing. This report outlines the CP-PACS and presents a programming example based on the Kernel CG benchmark from the NAS Parallel Benchmarks (NPB), version 1. It describes two major features of the CP-PACS architecture: the pseudo vector processing mechanism, and the tuning of parallel scientific and technical computation using the three-dimensional hyper crossbar network. The PUs of the CP-PACS are based on a RISC processor extended with a pseudo vector processor. Pseudo vector processing is realized as loop processing by scalar instructions. The features of the network connecting the PUs are explained. The algorithm of the NPB version 1 Kernel CG is shown. The most time-consuming part of the main loop is the product of matrix and vector (matvec), and the parallel processing of the matvec is explained. The time for the computation by the CPU is determined. The performance evaluation reports the execution time, the short vector processing of the pseudo vector processor based on slide windows, and a comparison with other parallel computers. (K.I.)
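Since the abstract singles out the matrix-vector product as the dominant cost in the CG main loop, a small sketch of how such a product can be split across workers may help. This is a generic row-block decomposition, not the CP-PACS implementation, which the abstract does not detail; the function names and the `num_blocks` parameter are hypothetical, and a thread pool is assumed as a stand-in for the machine's processing units.

```python
from concurrent.futures import ThreadPoolExecutor


def matvec_rows(rows, x):
    """Serial product of a block of matrix rows with the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]


def parallel_matvec(matrix, x, num_blocks=2):
    """Compute matrix * x by splitting the rows into contiguous blocks.

    Each block only needs its own rows plus the full vector x, so the
    blocks can be processed independently and concatenated at the end.
    """
    n = len(matrix)
    if n == 0:
        return []
    num_blocks = max(1, min(num_blocks, n))
    size = -(-n // num_blocks)  # ceiling division: rows per block
    blocks = [matrix[i:i + size] for i in range(0, n, size)]
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        # pool.map preserves block order, so results concatenate correctly
        parts = pool.map(matvec_rows, blocks, [x] * len(blocks))
        return [y for part in parts for y in part]
```

In CPython, threads will not speed up pure-Python arithmetic; the sketch only shows the decomposition. On a distributed-memory machine the vector `x` would also have to be communicated to each block owner, which is where the interconnect matters.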
A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale

Energy Technology Data Exchange (ETDEWEB)

Ma, Kwan-Liu [Univ. of California, Davis, CA (United States)

2017-02-01

Most of today’s visualization libraries and applications are based on what is known today as the visualization pipeline. In the visualization pipeline model, algorithms are encapsulated as “filtering” components with inputs and outputs. These components can be combined by connecting the outputs of one filter to the inputs of another filter. The visualization pipeline model is popular because it provides a convenient abstraction that allows users to combine algorithms in powerful ways. Unfortunately, the visualization pipeline cannot run effectively on exascale computers. Experts agree that the exascale machine will comprise processors that contain many cores. Furthermore, physical limitations will prevent data movement in and out of the chip (that is, between main memory and the processing cores) from keeping pace with improvements in overall compute performance. To use these processors to their fullest capability, it is essential to carefully consider memory access. This is where the visualization pipeline fails. Each filtering component in the visualization library is expected to take a data set in its entirety, perform some computation across all of the elements, and output the complete results. The process of iterating over all elements must be repeated in each filter, which is one of the worst possible ways to traverse memory when trying to maximize the number of executions per memory access. This project investigates a new type of visualization framework that exhibits the pervasive parallelism necessary to run on exascale machines. Our framework achieves this by defining algorithms in terms of functors, which are localized, stateless operations. Functors can be composited in much the same way as filters in the visualization pipeline. However, the functors’ design allows them to run concurrently on massive numbers of lightweight threads. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for
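The contrast between whole-data-set filters and composable per-element functors can be illustrated with a short sketch. The Python below is not the project's framework or API; `run_filters`, `compose`, and `run_fused` are hypothetical names, and plain lists stand in for the data set.

```python
def run_filters(data, filters):
    """Pipeline style: each filter traverses the whole data set in turn."""
    for f in filters:
        data = [f(value) for value in data]  # one full memory pass per filter
    return data


def compose(*functors):
    """Fuse stateless per-element functors into a single functor."""
    def fused(value):
        for f in functors:
            value = f(value)
        return value
    return fused


def run_fused(data, functors):
    """Functor style: one pass, with every operation applied per element."""
    fused = compose(*functors)
    # Each element is independent, so this loop is the part that could be
    # spread across many lightweight threads.
    return [fused(value) for value in data]
```

Both functions return the same values; the difference is that the fused version touches each element once, so the number of operations per memory access grows with the number of composed functors.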
Parallel assembling and equation solving via graph algorithms with an application to the FE simulation of metal extrusion processes

CERN Document Server

Unterkircher, A

2005-01-01

We propose methods for parallel assembling and iterative equation solving based on graph algorithms. The assembling technique is independent of dimension, element type and model shape. As a parallel solving technique we construct a multiplicative symmetric Schwarz preconditioner for the conjugate gradient method. Both methods have been incorporated into a non-linear FE code to simulate 3D metal extrusion processes. We illustrate the efficiency of these methods on shared memory computers by realistic examples.
Parallel and distributed processing in two SGBDS: A case study

OpenAIRE

Francisco Javier Moreno; Nataly Castrillón Charari; Camilo Taborda Zuluaga

2017-01-01

Context: One of the strategies for managing large volumes of data is distributed and parallel computing. Among the tools that allow applying these characteristics are some Data Base Management Systems (DBMS), such as Oracle, DB2, and SQL Server. Method: In this paper we present a case study where we evaluate the performance of an SQL query in two of these DBMS. The evaluation is done through various forms of data distribution in a computer network with different degrees of parallelism. ...
Parallel photonic information processing at gigabyte per second data rates using transient states

Science.gov (United States)

Brunner, Daniel; Soriano, Miguel C.; Mirasso, Claudio R.; Fischer, Ingo

2013-01-01

The increasing demands on information processing require novel computational concepts and true parallelism. Nevertheless, hardware realizations of unconventional computing approaches never exceeded a marginal existence. While the application of optics in super-computing receives reawakened interest, new concepts, partly neuro-inspired, are being considered and developed. Here we experimentally demonstrate the potential of a simple photonic architecture to process information at unprecedented data rates, implementing a learning-based approach. A semiconductor laser subject to delayed self-feedback and optical data injection is employed to solve computationally hard tasks. We demonstrate simultaneous spoken digit and speaker recognition and chaotic time-series prediction at data rates beyond 1 Gbyte/s. We identify all digits with very low classification errors and perform chaotic time-series prediction with 10% error. Our approach bridges the areas of photonic information processing, cognitive and information science.
Overview of the Force Scientific Parallel Language

Directory of Open Access Journals (Sweden)

Gita Alaghband

1994-01-01

The Force parallel programming language designed for large-scale shared-memory multiprocessors is presented. The language provides a number of parallel constructs as extensions to the ordinary Fortran language and is implemented as a two-level macro preprocessor to support portability across shared memory multiprocessors. The global parallelism model on which the Force is based provides a powerful parallel language. The parallel constructs, generic synchronization, and freedom from process management supported by the Force have resulted in structured parallel programs that are ported to the many multiprocessors on which the Force is implemented. Two new parallel constructs for looping and functional decomposition are discussed. Several programming examples to illustrate some parallel programming approaches using the Force are also presented.
Portable parallel programming in a Fortran environment

International Nuclear Information System (INIS)

May, E.N.

1989-01-01

Experience using the Argonne-developed PARMACs macro package to implement a portable parallel programming environment is described. Fortran programs with intrinsic parallelism of coarse and medium granularity are easily converted to parallel programs which are portable among a number of commercially available parallel processors in the class of shared-memory bus-based and local-memory network-based MIMD processors. The parallelism is implemented using standard UNIX (tm) tools and a small number of easily understood synchronization concepts (monitors and message-passing techniques) to construct and coordinate multiple cooperating processes on one or many processors. Benchmark results are presented for parallel computers such as the Alliant FX/8, the Encore MultiMax, the Sequent Balance, the Intel iPSC/2 Hypercube and a network of Sun 3 workstations. These parallel machines are typical MIMD types with from 8 to 30 processors, each rated at from 1 to 10 MIPS processing power. The demonstration code used for this work is a Monte Carlo simulation of the response to photons of a "nearly realistic" lead, iron and plastic electromagnetic and hadronic calorimeter, using the EGS4 code system. 6 refs., 2 figs., 2 tabs.
Parallelization of a spherical Sn transport theory algorithm

International Nuclear Information System (INIS)

Haghighat, A.

1989-01-01

The work described in this paper derives a parallel algorithm for an R-dependent spherical S_N transport theory algorithm and studies its performance by testing different sample problems. The S_N transport method is one of the most accurate techniques used to solve the linear Boltzmann equation. Several studies have been done on the vectorization of the S_N algorithms; however, very few studies have been performed on the parallelization of this algorithm. Weinke and Hommoto have looked at the parallel processing of the different energy groups, and Azmy recently studied the parallel processing of the inner iterations of an X-Y S_N nodal transport theory method. Both studies have reported very encouraging results, which have prompted us to look at the parallel processing of an R-dependent S_N spherical geometry algorithm. This geometry was chosen because, in spite of its simplicity, it contains the complications of the curvilinear geometries (i.e., redistribution of neutrons over the discretized angular bins).
CUBESIM, Hypercube and Denelcor Hep Parallel Computer Simulation

International Nuclear Information System (INIS)

Dunigan, T.H.

1988-01-01

1 - Description of program or function: CUBESIM is a set of subroutine libraries and programs for the simulation of message-passing parallel computers and shared-memory parallel computers. Subroutines are supplied to simulate the Intel hypercube and the Denelcor HEP parallel computers. The system permits a user to develop and test parallel programs written in C or FORTRAN on a single processor. The user may alter such hypercube parameters as message startup times, packet size, and the computation-to-communication ratio. The simulation generates a trace file that can be used for debugging, performance analysis, or graphical display. 2 - Method of solution: The CUBESIM simulator is linked with the user's parallel application routines to run as a single UNIX process. The simulator library provides a small operating system to perform process and message management. 3 - Restrictions on the complexity of the problem: Up to 128 processors can be simulated with a virtual memory limit of 6 million bytes. Up to 1000 processes can be simulated
Parameters that affect parallel processing for computational electromagnetic simulation codes on high performance computing clusters

Science.gov (United States)

Moon, Hongsik

What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited from the increased computing power the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared with the performance using benchmark software and the metric was FLoating-point Operations Per Second (FLOPS), which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore systems? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPS and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. This code was developed to address the needs of the
A DIALECTICAL PERSPECTIVE OF TRAUMA PROCESSING

Directory of Open Access Journals (Sweden)

Brurit Laub

2014-03-01

This article presents a dialectical perspective, which attempts to elucidate the integrative components of trauma processing in therapy. It is proposed that the inherent movement toward greater integration is an expanding dialectical movement. It is conceived as a spiral resulting from the synergy of two dialectical movements. The horizontal line moves between the opposite aspects of the individual (thesis vs. antithesis) toward a synthesis. The vertical line moves upward via whole/part shifts toward greater integration, or downward toward disintegration and fragmentation. It is proposed that the complementary processes of differentiation and linking are the building blocks of the integrative/dialectical movement. Differentiation relates to the separation of parts and linking relates to their connection. The role of differentiation and linking in three basic interacting systems of trauma work is discussed. It is proposed that the dialectical principles are applicable to various therapeutic approaches and clinical vignettes are included to illustrate.
Massive Parallelism of Monte-Carlo Simulation on Low-End Hardware using Graphic Processing Units

International Nuclear Information System (INIS)

Mburu, Joe Mwangi; Hah, Chang Joo Hah

2014-01-01

Within the past decade, research on GPU massive parallelization in core simulation has produced impressive results, but unfortunately there has been little commercial application in the nuclear field, especially in reactor core simulation. The purpose of this paper is to introduce the topic and illustrate the potential of exploiting the massively parallel nature of GPU computing on a simple Monte Carlo simulation with very minimal hardware specifications. For a comparative analysis, a simple two-dimensional Monte Carlo simulation is implemented on both the CPU and the GPU in order to evaluate the performance gain across the computing devices. The heterogeneous platform used in this analysis is a slow notebook with only a 1 GHz processor. The results are quite surprising: the speedups obtained are almost a factor of 10. In this work, we have used heterogeneous computing in a GPU-based approach to calculations with potentially high arithmetic intensity. By running a complex Monte Carlo simulation on the GPU platform, we have sped up the computational process by almost a factor of 10 for one million neutrons. This shows how easy, cheap and efficient it is to use GPUs to accelerate scientific computing, and the results should encourage further exploration of this avenue, especially in nuclear reactor physics simulation, where deterministic and stochastic calculations are well suited to parallelization.
Massive Parallelism of Monte-Carlo Simulation on Low-End Hardware using Graphic Processing Units

Energy Technology Data Exchange (ETDEWEB)

Mburu, Joe Mwangi; Hah, Chang Joo Hah [KEPCO International Nuclear Graduate School, Ulsan (Korea, Republic of)

2014-05-15

Within the past decade, research on GPU massive parallelization in core simulation has produced impressive results, but unfortunately there has been little commercial application in the nuclear field, especially in reactor core simulation. The purpose of this paper is to introduce the topic and illustrate the potential of exploiting the massively parallel nature of GPU computing on a simple Monte Carlo simulation with very minimal hardware specifications. For a comparative analysis, a simple two-dimensional Monte Carlo simulation is implemented on both the CPU and the GPU in order to evaluate the performance gain across the computing devices. The heterogeneous platform used in this analysis is a slow notebook with only a 1 GHz processor. The results are quite surprising: the speedups obtained are almost a factor of 10. In this work, we have used heterogeneous computing in a GPU-based approach to calculations with potentially high arithmetic intensity. By running a complex Monte Carlo simulation on the GPU platform, we have sped up the computational process by almost a factor of 10 for one million neutrons. This shows how easy, cheap and efficient it is to use GPUs to accelerate scientific computing, and the results should encourage further exploration of this avenue, especially in nuclear reactor physics simulation, where deterministic and stochastic calculations are well suited to parallelization.
One Factor or Two Parallel Processes? Comorbidity and Development of Adolescent Anxiety and Depressive Disorder Symptoms

Science.gov (United States)

Hale, William W., III; Raaijmakers, Quinten A. W.; Muris, Peter; van Hoof, Anne; Meeus, Wim H. J.

2009-01-01

Background: This study investigates whether anxiety and depressive disorder symptoms of adolescents from the general community are best described by a model that assumes they are indicative of one general factor or by a model that assumes they are two distinct disorders with parallel growth processes. Additional analyses were conducted to explore…
Algorithms for parallel computers

International Nuclear Information System (INIS)

Churchhouse, R.F.

1985-01-01

Until relatively recently almost all the algorithms for use on computers had been designed on the (usually unstated) assumption that they were to be run on single processor, serial machines. With the introduction of vector processors, array processors and interconnected systems of mainframes, minis and micros, however, various forms of parallelism have become available. The advantage of parallelism is that it offers increased overall processing speed but it also raises some fundamental questions, including: (i) which, if any, of the existing 'serial' algorithms can be adapted for use in the parallel mode. (ii) How close to optimal can such adapted algorithms be and, where relevant, what are the convergence criteria. (iii) How can we design new algorithms specifically for parallel systems. (iv) For multi-processor systems how can we handle the software aspects of the interprocessor communications. Aspects of these questions illustrated by examples are considered in these lectures. (orig.)
Massively Parallel Signal Processing using the Graphics Processing Unit for Real-Time Brain-Computer Interface Feature Extraction.

Science.gov (United States)

Wilson, J Adam; Williams, Justin C

2009-01-01

The clock speeds of modern computer processors have nearly plateaued in the past 5 years. Consequently, neural prosthetic systems that rely on processing large quantities of data in a short period of time face a bottleneck, in that it may not be possible to process all of the data recorded from an electrode array with high channel counts and bandwidth, such as electrocorticographic grids or other implantable systems. Therefore, in this study a method of using the processing capabilities of a graphics card [graphics processing unit (GPU)] was developed for real-time neural signal processing of a brain-computer interface (BCI). The NVIDIA CUDA system was used to offload processing to the GPU, which is capable of running many operations in parallel, potentially greatly increasing the speed of existing algorithms. The BCI system records many channels of data, which are processed and translated into a control signal, such as the movement of a computer cursor. This signal processing chain involves computing a matrix-matrix multiplication (i.e., a spatial filter), followed by calculating the power spectral density on every channel using an auto-regressive method, and finally classifying appropriate features for control. In this study, the first two computationally intensive steps were implemented on the GPU, and the speed was compared to both the current implementation and a central processing unit-based implementation that uses multi-threading. Significant performance gains were obtained with GPU processing: the current implementation processed 1000 channels of 250 ms in 933 ms, while the new GPU method took only 27 ms, an improvement of nearly 35 times.
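The figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated above (250 ms data blocks, 933 ms and 27 ms processing times); the helper names `speedup` and `keeps_up` are hypothetical, and the real-time criterion shown (processing must finish within the block duration) is an assumption about what "real-time" means here.

```python
def speedup(baseline_ms, new_ms):
    """Ratio of the baseline processing time to the new processing time."""
    return baseline_ms / new_ms


def keeps_up(block_ms, processing_ms):
    """Assumed real-time criterion: a block is processed before the next arrives."""
    return processing_ms <= block_ms


BLOCK_MS = 250   # duration of each data block, from the abstract
CPU_MS = 933     # processing time of the current implementation
GPU_MS = 27      # processing time of the GPU method

ratio = speedup(CPU_MS, GPU_MS)          # about 34.6, i.e. "nearly 35 times"
cpu_real_time = keeps_up(BLOCK_MS, CPU_MS)
gpu_real_time = keeps_up(BLOCK_MS, GPU_MS)
```

Under that assumed criterion, the 933 ms implementation falls behind the incoming 250 ms blocks for 1000 channels, while the 27 ms GPU method leaves most of each block period free.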
A parallel approximate string matching under Levenshtein distance on graphics processing units using warp-shuffle operations.

Directory of Open Access Journals (Sweden)

ThienLuan Ho

Approximate string matching with k-differences has a number of practical applications, ranging from pattern recognition to computational biology. This paper proposes an efficient memory-access algorithm for parallel approximate string matching with k-differences on Graphics Processing Units (GPUs). In the proposed algorithm, all threads in the same GPU warp share data using warp-shuffle operations instead of accessing the shared memory. Moreover, we implement the proposed algorithm by exploiting the memory structure of GPUs to optimize its performance. Experimental results for real DNA packages revealed that the proposed algorithm and its implementation achieved speedups of up to 122.64 and 1.53 times over the sequential algorithm on the CPU and a previous parallel approximate string matching algorithm on GPUs, respectively.
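For readers unfamiliar with the k-differences problem, the following sketch shows the classic sequential dynamic-programming formulation under Levenshtein distance. It is offered only as a reference point for what is being computed; it is not the paper's GPU algorithm and uses no warp-shuffle operations. The function name and the choice to report 1-based end positions are assumptions made for illustration.

```python
def k_difference_matches(pattern, text, k):
    """Return 1-based end positions in text where pattern matches
    some substring with at most k edits (insert, delete, substitute)."""
    m = len(pattern)
    # column[i] = fewest edits needed to match pattern[:i] against some
    # substring of the text ending at the current text position
    column = list(range(m + 1))
    matches = []
    for j, ch in enumerate(text, start=1):
        prev_diag = column[0]  # value for (i-1, j-1)
        column[0] = 0          # a match may start at any text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new = min(column[i] + 1,      # extra text character
                      column[i - 1] + 1,  # missing pattern character
                      prev_diag + cost)   # match or substitution
            prev_diag = column[i]
            column[i] = new
        if column[m] <= k:
            matches.append(j)
    return matches
```

For example, `k_difference_matches("abc", "xxabcxx", 1)` returns `[4, 5, 6]`: the exact occurrence ends at position 5, and the neighbouring end positions are each one edit away. Each text position depends on the previous column, which is the data dependency that a parallel version has to work around.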

The STAPL Parallel Graph Library

KAUST Repository

Harshvardhan,

2013-01-01

This paper describes the stapl Parallel Graph Library, a high-level framework that abstracts the user from data-distribution and parallelism details and allows them to concentrate on parallel graph algorithm development. It includes a customizable distributed graph container and a collection of commonly used parallel graph algorithms. The library introduces pGraph pViews that separate algorithm design from the container implementation. It supports three graph processing algorithmic paradigms, level-synchronous, asynchronous and coarse-grained, and provides common graph algorithms based on them. Experimental results demonstrate improved scalability in performance and data size over existing graph libraries on more than 16,000 cores and on internet-scale graphs containing over 16 billion vertices and 250 billion edges. © Springer-Verlag Berlin Heidelberg 2013.
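Of the three paradigms listed, the level-synchronous one is the easiest to show in a few lines. The sketch below is plain sequential Python, not the STAPL API; the dictionary-based `adjacency` representation and the name `bfs_levels` are assumptions chosen for illustration.

```python
def bfs_levels(adjacency, source):
    """Return {vertex: level} for every vertex reachable from source.

    The traversal advances one level at a time: all vertices in the
    current frontier are expanded before any vertex of the next level.
    """
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        # In a parallel setting, the vertices of one frontier would be
        # expanded concurrently.
        for u in frontier:
            for v in adjacency.get(u, ()):
                if v not in level:
                    level[v] = depth
                    next_frontier.append(v)
        # Swapping frontiers plays the role of the global synchronization
        # point between levels.
        frontier = next_frontier
    return level
```

An asynchronous variant would drop the per-level synchronization and let updates propagate as soon as they are available, at the cost of possibly revisiting vertices.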
Process quality in the Trade Finance unit from the perspective of corporate banking employees

OpenAIRE

Mikkola, Henri

2013-01-01

This thesis examines the quality of the processes in the Trade Finance unit of Pohjola Bank, from the perspective of the corporate banking employees at Helsinki OP Bank. The Trade Finance unit provides methods of payment for foreign trade. Such services are intended for companies and the perspective investigated in this thesis is that of corporate banking employees. The purpose of this thesis is to define the quality of the processes and to develop solutions for difficulties discovered. The q...
Online measurement for geometrical parameters of wheel set based on structure light and CUDA parallel processing

Science.gov (United States)

Wu, Kaihua; Shao, Zhencheng; Chen, Nian; Wang, Wenjie

2018-01-01

The degree of wear of the wheel set tread is one of the main factors that influence the safety and stability of a running train. The geometrical parameters mainly include flange thickness and flange height. A line-structured laser light was projected on the wheel tread surface, and the geometrical parameters can be deduced from the profile image. An online image acquisition system was designed based on asynchronous reset of the CCD and a CUDA parallel processing unit. Image acquisition was performed in hardware interrupt mode. A high-efficiency parallel segmentation algorithm based on CUDA was proposed. The algorithm first divides the image into smaller squares and then extracts the squares of the target by fusing the k-means and STING clustering image segmentation algorithms. Segmentation time is less than 0.97 ms. A considerable acceleration ratio compared with the CPU serial calculation was obtained, which greatly improved the real-time image processing capacity. When the wheel set is running at a limited speed, the system placed along the railway line can measure the geometrical parameters automatically. The maximum measuring speed is 120 km/h.
Parallel Algorithm for GPU Processing; for use in High Speed Machine Vision Sensing of Cotton Lint Trash

Directory of Open Access Journals (Sweden)

Mathew G. Pelletier

2008-02-01

One of the main hurdles standing in the way of optimal cleaning of cotton lint is the lack of sensing systems that can react fast enough to provide the control system with real-time information as to the level of trash contamination of the cotton lint. This research examines the use of programmable graphic processing units (GPU) as an alternative to the PC's traditional use of the central processing unit (CPU). The use of the GPU, as an alternative computation platform, allowed for the machine vision system to gain a significant improvement in processing time. By improving the processing time, this research seeks to address the lack of availability of rapid trash sensing systems and thus alleviate a situation in which the current systems view the cotton lint either well before, or after, the cotton is cleaned. This extended lag/lead time that is currently imposed on the cotton trash cleaning control systems is what is responsible for system operators utilizing a very large dead-band safety buffer in order to ensure that the cotton lint is not under-cleaned. Unfortunately, the utilization of a large dead-band buffer results in the majority of the cotton lint being over-cleaned, which in turn causes lint fiber damage as well as significant losses of the valuable lint due to the excessive use of cleaning machinery. This research estimates that upwards of a 30% reduction in lint loss could be gained through the use of a tightly coupled trash sensor to the cleaning machinery control systems. This research seeks to improve processing times through the development of a new algorithm for cotton trash sensing that allows for implementation on a highly parallel architecture. Additionally, by moving the new parallel algorithm onto an alternative computing platform, the graphic processing unit "GPU", for processing of the cotton trash images, a speed up of over 6.5 times, over optimized code running on the PC's central processing
Parallel computing in genomic research: advances and applications.

Science.gov (United States)

Ocaña, Kary; de Oliveira, Daniel

2015-01-01

Today's genomic experiments have to process the so-called "biological big data" that is now reaching the size of Terabytes and Petabytes. To process this huge amount of data, scientists may require weeks or months if they use their own workstations. Parallelism techniques and high-performance computing (HPC) environments can be applied to reduce the total processing time and to ease the management, treatment, and analysis of this data. However, running bioinformatics experiments in HPC environments such as clouds, grids, clusters, and graphics processing units requires expertise from scientists to integrate computational, biological, and mathematical techniques and technologies. Several solutions have already been proposed to allow scientists to process their genomic experiments using HPC capabilities and parallelism techniques. This article presents a systematic review of the literature that surveys the most recently published research involving genomics and parallel computing. Our objective is to gather the main characteristics, benefits, and challenges that can be considered by scientists when running their genomic experiments to benefit from parallelism techniques and HPC capabilities.
Parallel algorithms for continuum dynamics

International Nuclear Information System (INIS)

Hicks, D.L.; Liebrock, L.M.

1987-01-01

Simply porting existing parallel programs to a new parallel processor may not achieve the full speedup possible; achieving maximum efficiency may require redesigning the parallel algorithms for the specific architecture. The authors discuss here parallel algorithms that were developed first for the HEP processor and then ported to the CRAY X-MP/4, the ELXSI/10, and the Intel iPSC/32. Focus is mainly on the most recent parallel processing results produced, i.e., those on the Intel Hypercube. The applications are simulations of continuum dynamics in which the momentum and stress gradients are important. Examples of these are inertial confinement fusion experiments, severe breaks in the coolant system of a reactor, weapons physics, and shock-wave physics. Speedup efficiencies on the Intel iPSC Hypercube are very sensitive to the ratio of communication to computation. Great care must be taken in designing algorithms for this machine to avoid global communication. This is much more critical on the iPSC than it was on the three previous parallel processors.
Teaching ethics to engineers: ethical decision making parallels the engineering design process.

Science.gov (United States)

Bero, Bridget; Kuhlman, Alana

2011-09-01

In order to fulfill ABET requirements, Northern Arizona University's Civil and Environmental engineering programs incorporate professional ethics in several of its engineering courses. This paper discusses an ethics module in a 3rd year engineering design course that focuses on the design process and technical writing. Engineering students early in their student careers generally possess good black/white critical thinking skills on technical issues. Engineering design is the first time students are exposed to "grey" or multiple possible solution technical problems. To identify and solve these problems, the engineering design process is used. Ethical problems are also "grey" problems and present similar challenges to students. Students need a practical tool for solving these ethical problems. The step-wise engineering design process was used as a model to demonstrate a similar process for ethical situations. The ethical decision making process of Martin and Schinzinger was adapted for parallelism to the design process and presented to students as a step-wise technique for identification of the pertinent ethical issues, relevant moral theories, possible outcomes and a final decision. Students had greatest difficulty identifying the broader, global issues presented in an ethical situation, but by the end of the module, were better able to not only identify the broader issues, but also to more comprehensively assess specific issues, generate solutions and a desired response to the issue.
Parallel processing for a 1-D time-dependent solution to impurity rate equations for fusion plasma simulations

International Nuclear Information System (INIS)

Veerasingam, R.

1990-01-01

In fusion plasmas, impurities such as carbon, oxygen or nickel can contaminate the plasma and degrade the performance of a fusion device through radiation. However, impurities can also be used as diagnostics to obtain information about a plasma through spectroscopic experiments, which can then be used in plasma modeling and simulations. In the past, serial algorithms have been described for either the time-dependent or the steady-state problem. In this paper, we describe a parallel procedure adopted to solve the time-dependent problem. It can be shown that for the steady-state problem a parallel procedure would not be a useful application of parallelization, because a few seconds of Central Processing Unit time on a CRAY-XMP or IBM 3090/600S would suffice to obtain the solution, while this is not the case for the time-dependent problem. In order to study the effects of low-Z and high-Z impurities on the final state of a plasma, time-dependent solutions are necessary. For purposes of diagnostics and comparisons with experiments, a fast turnaround time of the simulations would be advantageous. We have implemented a parallel algorithm on an IBM 3090/600S and tested its performance for a typical set of fusion plasma parameters. 4 refs., 1 tab
A high performance image processing platform based on CPU-GPU heterogeneous cluster with parallel image reconstructions for micro-CT

International Nuclear Information System (INIS)

Ding Yu; Qi Yujin; Zhang Xuezhu; Zhao Cuilan

2011-01-01

In this paper, we report the development of a high-performance image processing platform based on a CPU-GPU heterogeneous cluster. Currently, it consists of Dell Precision T7500 and HP XW8600 workstations with a parallel programming and runtime environment that uses the message-passing interface (MPI) and CUDA (Compute Unified Device Architecture). We succeeded in developing parallel image processing techniques for 3D image reconstruction in X-ray micro-CT imaging. The results show that a GPU is about 194 times faster than a single CPU, and that the CPU-GPU cluster is about 46 times faster than the CPU cluster. These results meet the requirements of rapid 3D image reconstruction and real-time image display. In conclusion, a CPU-GPU heterogeneous cluster is an effective way to build a high-performance image processing platform. (authors)
PSHED: a simplified approach to developing parallel programs

International Nuclear Information System (INIS)

Mahajan, S.M.; Ramesh, K.; Rajesh, K.; Somani, A.; Goel, M.

1992-01-01

This paper presents a simplified approach in the form of a tree-structured computational model for parallel application programs. An attempt is made to provide a standard user interface for executing programs on the BARC Parallel Processing System (BPPS), a scalable distributed-memory multiprocessor. The interface package, called PSHED, provides a basic framework for representing and executing parallel programs on different parallel architectures. The PSHED package incorporates concepts from a broad range of previous research in programming environments and parallel computations. (author). 6 refs
Indonesian EFL Students' Perspective on Writing Process: A Pilot Study

Science.gov (United States)

Hermilinda Abas, Imelda; Aziz, Noor Hashima Abd

2016-01-01

The study was aimed at understanding the EFL Indonesian students' perspective on the writing process. The pilot study involved two male Indonesian postgraduate students in Universiti Utara Malaysia. The Indonesian students were selected based on the following criteria: (1) had enough knowledge in English writing, indicated by the completion of…
Massively parallel signal processing using the graphics processing unit for real-time brain-computer interface feature extraction

Directory of Open Access Journals (Sweden)

J. Adam Wilson

2009-07-01

The clock speeds of modern computer processors have nearly plateaued in the past five years. Consequently, neural prosthetic systems that rely on processing large quantities of data in a short period of time face a bottleneck, in that it may not be possible to process all of the data recorded from an electrode array with high channel counts and bandwidth, such as electrocorticographic grids or other implantable systems. Therefore, in this study a method of using the processing capabilities of a graphics card (GPU) was developed for real-time neural signal processing of a brain-computer interface (BCI). The NVIDIA CUDA system was used to offload processing to the GPU, which is capable of running many operations in parallel, potentially greatly increasing the speed of existing algorithms. The BCI system records many channels of data, which are processed and translated into a control signal, such as the movement of a computer cursor. This signal processing chain involves computing a matrix-matrix multiplication (i.e., a spatial filter), followed by calculating the power spectral density on every channel using an auto-regressive method, and finally classifying appropriate features for control. In this study, the first two computationally-intensive steps were implemented on the GPU, and the speed was compared to both the current implementation and a CPU-based implementation that uses multi-threading. Significant performance gains were obtained with GPU processing: the current implementation processed 1000 channels in 933 ms, while the new GPU method took only 27 ms, an improvement of nearly 35 times.
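The spatial-filter step described above is a matrix-matrix product of a weight matrix with a block of multichannel samples. The sketch below shows that product in plain Python and recomputes the reported speedup from the two timings in the abstract. It is an illustrative CPU sketch, not the authors' CUDA implementation; the function name and the example weights are assumptions.

```python
def apply_spatial_filter(weights: list[list[float]],
                         samples: list[list[float]]) -> list[list[float]]:
    """Multiply `weights` (outputs x channels) by `samples`
    (channels x time points) and return an outputs x time points matrix."""
    n_channels = len(samples)
    n_points = len(samples[0])
    return [
        [sum(row[c] * samples[c][t] for c in range(n_channels))
         for t in range(n_points)]
        for row in weights
    ]


# Hypothetical two-channel example: a difference and an average of channels.
example_weights = [[1.0, -1.0], [0.5, 0.5]]
example_samples = [[1.0, 2.0, 3.0], [1.0, 0.0, 1.0]]
filtered = apply_spatial_filter(example_weights, example_samples)

# Speedup implied by the timings quoted in the abstract (933 ms vs 27 ms).
speedup = 933 / 27
```

Every output element is an independent dot product, which is why this step maps naturally onto many GPU threads.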
Parallel Processing and Applied Mathematics. 10th International Conference, PPAM 2013. Revised Selected Papers

DEFF Research Database (Denmark)

The following topics are dealt with: parallel scientific computing; numerical algorithms; parallel nonnumerical algorithms; cloud computing; evolutionary computing; metaheuristics; applied mathematics; GPU computing; multicore systems; hybrid architectures; hierarchical parallelism; HPC systems; power monitoring; energy monitoring; and distributed computing.
Parallel computing in genomic research: advances and applications

Directory of Open Access Journals (Sweden)

Ocaña K

2015-11-01

Kary Ocaña (1), Daniel de Oliveira (2). (1) National Laboratory of Scientific Computing, Petrópolis, Rio de Janeiro; (2) Institute of Computing, Fluminense Federal University, Niterói, Brazil. Abstract: Today's genomic experiments have to process the so-called "biological big data" that is now reaching the size of Terabytes and Petabytes. To process this huge amount of data, scientists may require weeks or months if they use their own workstations. Parallelism techniques and high-performance computing (HPC) environments can be applied to reduce the total processing time and to ease the management, treatment, and analysis of this data. However, running bioinformatics experiments in HPC environments such as clouds, grids, clusters, and graphics processing units requires expertise from scientists to integrate computational, biological, and mathematical techniques and technologies. Several solutions have already been proposed to allow scientists to process their genomic experiments using HPC capabilities and parallelism techniques. This article presents a systematic review of the literature that surveys the most recently published research involving genomics and parallel computing. Our objective is to gather the main characteristics, benefits, and challenges that can be considered by scientists when running their genomic experiments to benefit from parallelism techniques and HPC capabilities. Keywords: high-performance computing, genomic research, cloud computing, grid computing, cluster computing, parallel computing
A parallel process growth model of avoidant personality disorder symptoms and personality traits.

Science.gov (United States)

Wright, Aidan G C; Pincus, Aaron L; Lenzenweger, Mark F

2013-07-01

Avoidant personality disorder (AVPD), like other personality disorders, has historically been construed as a highly stable disorder. However, results from a number of longitudinal studies have found that the symptoms of AVPD demonstrate marked change over time. Little is known about which other psychological systems are related to this change. Although cross-sectional research suggests a strong relationship between AVPD and personality traits, no work has examined the relationship of their change trajectories. The current study sought to establish the longitudinal relationship between AVPD and basic personality traits using parallel process growth curve modeling. Parallel process growth curve modeling was applied to the trajectories of AVPD and basic personality traits from the Longitudinal Study of Personality Disorders (Lenzenweger, M. F., 2006, The longitudinal study of personality disorders: History, design considerations, and initial findings. Journal of Personality Disorders, 20, 645-670. doi:10.1521/pedi.2006.20.6.645), a naturalistic, prospective, multiwave, longitudinal study of personality disorder, temperament, and normal personality. The focus of these analyses is on the relationship between the rates of change in both AVPD symptoms and basic personality traits. AVPD symptom trajectories demonstrated significant negative relationships with the trajectories of interpersonal dominance and affiliation, and a significant positive relationship to rates of change in neuroticism. These results provide some of the first compelling evidence that trajectories of change in PD symptoms and personality traits are linked. These results have important implications for the ways in which temporal stability is conceptualized in AVPD specifically, and PD in general.
Applications of the parallel computing system using network

International Nuclear Information System (INIS)

Ido, Shunji; Hasebe, Hiroki

1994-01-01

Parallel programming is applied to multiple processors connected by Ethernet. Data exchange between the tasks located on each processing element is realized in two ways. One is sockets, a standard library on recent UNIX operating systems. The other is Parallel Virtual Machine (PVM), free network software developed by ORNL that allows many workstations connected to a network to be used as a parallel computer. This paper discusses the feasibility of parallel computing using a network of UNIX workstations and compares it with specialized parallel systems (Transputer and iPSC/860) in a Monte Carlo simulation, which generally shows a high parallelization ratio. (author)
Rubus: A compiler for seamless and extensible parallelism

Science.gov (United States)

Adnan, Muhammad; Aslam, Faisal; Sarwar, Syed Mansoor

2017-01-01

Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer’s expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84 times has been
Rubus: A compiler for seamless and extensible parallelism.

Directory of Open Access Journals (Sweden)

Muhammad Adnan

Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer's expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84
Mathematical Abstraction: Constructing Concept of Parallel Coordinates

Science.gov (United States)

Nurhasanah, F.; Kusumah, Y. S.; Sabandar, J.; Suryadi, D.

2017-09-01

Mathematical abstraction is an important process in teaching and learning mathematics, so pre-service mathematics teachers need to understand and experience this process. One of the theoretical-methodological frameworks for studying this process is Abstraction in Context (AiC). In this framework, the abstraction process comprises the observable epistemic actions Recognition, Building-With, Construction, and Consolidation, known as the RBC + C model. This study investigates and analyzes how pre-service mathematics teachers constructed and consolidated the concept of Parallel Coordinates in a group discussion. It uses the AiC framework to analyze the mathematical abstraction of a group of four pre-service teachers learning Parallel Coordinates concepts. The data were collected through video recording, students' worksheets, a test, and field notes. The results show that the students' prior knowledge of the Cartesian coordinate concept has a significant role in the process of constructing the Parallel Coordinates concept as new knowledge. The consolidation process is influenced by the social interaction between group members. The abstraction processes that took place in this group were dominated by empirical abstraction, which emphasizes identifying the characteristics of manipulated or imagined objects during the processes of recognizing and building-with.
Parallel processing using an optical delay-based reservoir computer

Science.gov (United States)

Van der Sande, Guy; Nguimdo, Romain Modeste; Verschaffelt, Guy

2016-04-01

Delay systems subject to delayed optical feedback have recently shown great potential in solving computationally hard tasks. By implementing a neuro-inspired computational scheme relying on the transient response to optical data injection, high processing speeds have been demonstrated. However, reservoir computing systems based on delay dynamics discussed in the literature are designed by coupling many different stand-alone components, which leads to bulky, non-monolithic systems that lack long-term stability. Here we numerically investigate the possibility of implementing reservoir computing schemes based on semiconductor ring lasers (SRLs), which are semiconductor lasers where the laser cavity consists of a ring-shaped waveguide. SRLs are highly integrable and scalable, making them ideal candidates for key components in photonic integrated circuits. SRLs can generate light in two counterpropagating directions between which bistability has been demonstrated. We demonstrate that two independent machine learning tasks, even with inputs of a different nature and different input data signals, can be simultaneously computed using a single photonic nonlinear node relying on the parallelism offered by photonics. We illustrate the performance on simultaneous chaotic time series prediction and classification for nonlinear channel equalization. We take advantage of the different directional modes to process individual tasks: each directional mode processes one task, which mitigates possible crosstalk between the tasks. Our results indicate that prediction/classification with errors comparable to state-of-the-art performance can be obtained even with noise, despite the two tasks being computed simultaneously. We also find that good performance is obtained for both tasks over a broad range of the parameters. The results are discussed in detail in [Nguimdo et al., IEEE Trans. Neural Netw. Learn. Syst. 26, pp. 3301-3307, 2015

The Development of Reading and Spelling in Arabic Orthography: Two Parallel Processes?

Science.gov (United States)

Taha, Haitham

2016-01-01

The parallels between reading and spelling skills in Arabic were tested. One-hundred forty-three native Arab students, with typical reading development, from second, fourth, and sixth grades were tested with reading, spelling and orthographic decision tasks. The results indicated a full parallel between the reading and spelling performances within…
Optimization approaches to mpi and area merging-based parallel buffer algorithm

Directory of Open Access Journals (Sweden)

Junfu Fan

In buffer zone construction, the rasterization-based dilation method inevitably introduces errors, and the double-sided parallel line method involves a series of complex operations. In this paper, we propose a parallel buffer algorithm based on area merging and MPI (Message Passing Interface) to improve the performance of buffer analyses on large datasets. Experimental results reveal three major performance bottlenecks that significantly affect the efficiency of serial and parallel buffer construction: the area merging strategy, the task load balance method and the MPI inter-process results merging strategy. Corresponding optimization approaches, namely a tree-like area merging strategy, a vertex-number-oriented parallel task partition method and an inter-process results merging strategy, are suggested to overcome these bottlenecks. Experiments were carried out to examine the performance of the optimized parallel algorithm. The results suggest that the optimization approaches could provide high performance and processing ability for buffer construction in a cluster parallel environment. Our method could provide insights into the parallelization of spatial analysis algorithms.
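The tree-like merging strategy named above can be pictured as a pairwise reduction: neighbouring partial results are merged in rounds until one result remains. The sketch below illustrates only that reduction pattern, using unions of 1-D intervals as a simplified stand-in for polygon buffer zones. It involves no MPI and is not the authors' implementation; all names are assumptions.

```python
Interval = tuple[float, float]


def union_intervals(a: list[Interval], b: list[Interval]) -> list[Interval]:
    """Merge two interval lists into one sorted list of disjoint intervals."""
    merged: list[Interval] = []
    for start, end in sorted(a + b):
        if merged and start <= merged[-1][1]:
            # Overlapping or touching: extend the last interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def tree_merge(parts: list[list[Interval]]) -> list[Interval]:
    """Merge partial results pairwise, round by round, like a binary tree."""
    if not parts:
        return []
    parts = [union_intervals(p, []) for p in parts]  # normalize each part
    while len(parts) > 1:
        next_round = [union_intervals(parts[i], parts[i + 1])
                      for i in range(0, len(parts) - 1, 2)]
        if len(parts) % 2:
            next_round.append(parts[-1])  # odd part waits for the next round
        parts = next_round
    return parts[0]
```

The merges within one round are independent of each other, so they could be assigned to different processes, and the number of rounds grows logarithmically with the number of parts.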
A Parallel Algebraic Multigrid Solver on Graphics Processing Units

KAUST Repository

Haase, Gundolf; Liebmann, Manfred; Douglas, Craig C.; Plank, Gernot

2010-01-01

-vector multiplication scheme underlying the PCG-AMG algorithm is presented for the many-core GPU architecture. A performance comparison of the parallel solver shows that a single Nvidia Tesla C1060 GPU board delivers the performance of a sixteen node Infiniband cluster.
Analysis of parameters for technological equipment of parallel kinematics based on rods of variable length for processing accuracy assurance

Science.gov (United States)

Koltsov, A. G.; Shamutdinov, A. H.; Blokhin, D. A.; Krivonos, E. V.

2018-01-01

A new classification of parallel kinematics mechanisms on symmetry coefficient, being proportional to mechanism stiffness and accuracy of the processing product using the technological equipment under study, is proposed. A new version of the Stewart platform with a high symmetry coefficient is presented for analysis. The workspace of the mechanism under study is described, this space being a complex solid figure. The workspace end points are reached by the center of the mobile platform which moves in parallel related to the base plate. Parameters affecting the processing accuracy, namely the static and dynamic stiffness, natural vibration frequencies are determined. The capability assessment of the mechanism operation under various loads, taking into account resonance phenomena at different points of the workspace, was conducted. The study proved that stiffness and therefore, processing accuracy with the use of the above mentioned mechanisms are comparable with the stiffness and accuracy of medium-sized series-produced machines.
Parallel algorithms on the ASTRA SIMD machine

International Nuclear Information System (INIS)

Odor, G.; Rohrbach, F.; Vesztergombi, G.; Varga, G.; Tatrai, F.

1996-01-01

In view of the tremendous jump in computing power of modern RISC processors, interest in parallel computing seems to be thinning out. Why use a complicated system of parallel processors if the problem can be solved by a single powerful micro-chip? It is a general law, however, that exponential growth will always end in some kind of saturation, and then parallelism will again become a hot topic. We try to prepare ourselves for this eventuality. The MPPC project started in 1990 in the heyday of parallelism and produced four ASTRA machines (presented at CHEP'92) with 4k processors (which are expandable to 16k) based on yesterday's chip technology (chip presented at CHEP'91). These machines now provide excellent test-beds for algorithmic developments in a complete, real environment. We are developing, for example, fast pattern recognition algorithms which could be used in high-energy physics experiments at the LHC (planned to be operational after 2004 at CERN) for triggering and data reduction. The basic feature of our ASP (Associate String Processor) approach is to use extremely simple (thus very cheap) processor elements but in huge quantities (up to millions of processors) connected together by a very simple string-like communication chain. In this paper we present powerful algorithms based on this architecture, indicating the performance perspectives if the hardware quality reaches present or even future technology levels. (author)
A parallelization study of the general purpose Monte Carlo code MCNP4 on a distributed memory highly parallel computer

International Nuclear Information System (INIS)

Yamazaki, Takao; Fujisaki, Masahide; Okuda, Motoi; Takano, Makoto; Masukawa, Fumihiro; Naito, Yoshitaka

1993-01-01

The general purpose Monte Carlo code MCNP4 has been implemented on the Fujitsu AP1000 distributed memory highly parallel computer. Parallelization techniques developed and studied are reported. A shielding analysis function of the MCNP4 code is parallelized in this study. A technique was applied that maps histories to processors dynamically and maps the control process to a particular processor. The efficiency of the parallelized code is up to 80% for a typical practical problem with 512 processors. These results demonstrate the advantages of a highly parallel computer over conventional computers in the field of shielding analysis by the Monte Carlo method. (orig.)
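The dynamic mapping of histories to processors described in this abstract can be illustrated with a minimal sketch. It is not the MCNP4 or AP1000 implementation: the toy slab physics, the function names (`run_history`, `run_batch`, `transmitted`) and the batch size are all assumptions made for illustration, and Python threads are used only to show the scheduling pattern, not to obtain a real speedup.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_history(seed: int) -> int:
    """Toy particle history (hypothetical physics, not MCNP4).

    The particle crosses a slab of 5 layers and is absorbed in each
    layer with probability 0.3. Returns 1 if it is transmitted.
    """
    rng = random.Random(seed)  # independent random stream per history
    for _ in range(5):
        if rng.random() < 0.3:
            return 0
    return 1


def run_batch(seeds) -> int:
    """Run a batch of histories and return the number transmitted."""
    return sum(run_history(s) for s in seeds)


def transmitted(n_histories: int, n_workers: int, batch: int = 100) -> int:
    """Count transmitted particles using dynamically assigned batches.

    The calling thread plays the role of the control process: it queues
    the batches, and each idle worker takes the next batch from the queue.
    """
    batches = [range(i, min(i + batch, n_histories))
               for i in range(0, n_histories, batch)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(run_batch, batches))


if __name__ == "__main__":
    print(transmitted(2000, n_workers=4))
```

Because every history draws from its own seeded stream, the tally does not depend on how many workers are used or on which worker runs which batch, which makes the parallel run easy to check against a serial one.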
Synchronization Techniques in Parallel Discrete Event Simulation

OpenAIRE

Lindén, Jonatan

2018-01-01

Discrete event simulation is an important tool for evaluating system models in many fields of science and engineering. To improve the performance of large-scale discrete event simulations, several techniques to parallelize discrete event simulation have been developed. In parallel discrete event simulation, the work of a single discrete event simulation is distributed over multiple processing elements. A key challenge in parallel discrete event simulation is to ensure that causally dependent ...
Programming massively parallel processors a hands-on approach

CERN Document Server

Kirk, David B

2010-01-01

Programming Massively Parallel Processors discusses basic concepts about parallel programming and GPU architecture. "Massively parallel" refers to the use of a large number of processors to perform a set of computations in a coordinated parallel way. The book details various techniques for constructing parallel programs. It also discusses the development process, performance level, floating-point format, parallel patterns, and dynamic parallelism. The book serves as a teaching guide where parallel programming is the main topic of the course. It builds on the basics of C programming for CUDA, a parallel programming environment that is supported on NVIDIA GPUs. Composed of 12 chapters, the book begins with basic information about the GPU as a parallel computer source. It also explains the main concepts of CUDA, data parallelism, and the importance of memory access efficiency using CUDA. The target audience of the book is graduate and undergraduate students from all science and engineering disciplines who ...
Understanding Quality in Process Modelling: Towards a Holistic Perspective

Directory of Open Access Journals (Sweden)

Jan Recker

2007-09-01

Quality is one of the main topics in current conceptual modelling research, as is the field of business process modelling. Yet, widely acknowledged academic contributions towards an understanding or measurement of business process model quality are limited at best. In this paper I argue that the development of methodical theories concerning the measurement or establishment of process model quality must be preceded by methodological elaborations on business process modelling. I further argue that existing epistemological foundations of process modelling are insufficient for describing all extrinsic and intrinsic traits of model quality. This in turn has led to a lack of holistic understanding of process modelling. Taking into account the inherent social and purpose-oriented character of process modelling in contemporary organizations I present a socio-pragmatic constructionist methodology of business process modelling and sketch out implications of this perspective towards an understanding of process model quality. I anticipate that, based on this research, theories can be developed that facilitate the evaluation of the ’goodness’ of a business process model.
Otolaryngology residency selection process. Medical student perspective.

Science.gov (United States)

Stringer, S P; Cassisi, N J; Slattery, W H

1992-04-01

In an effort to improve the otolaryngology matching process at the University of Florida, Gainesville, we sought to obtain the medical student's perspective of the current system. All students who interviewed here over a 3-year period were surveyed regarding the application, interview, and ranking process. In addition, suggestions for improving the system were sought from the students. The application and interviewing patterns of the students surveyed were found to be similar to those of the entire otolaryngology residency applicant pool. We were unable to identify any factors that influence a student's rank list that could be prospectively used to help select applicants for interview. A variety of suggestions for improvements in the match were received, several of which could easily be instituted. A uniform interview invitation date as requested by the students could be rapidly implemented and would provide benefits for both the students and the residency programs.
Simulating electron wave dynamics in graphene superlattices exploiting parallel processing advantages

Science.gov (United States)

Rodrigues, Manuel J.; Fernandes, David E.; Silveirinha, Mário G.; Falcão, Gabriel

2018-01-01

This work introduces a parallel computing framework to characterize the propagation of electron waves in graphene-based nanostructures. The electron wave dynamics is modeled using both "microscopic" and effective medium formalisms and the numerical solution of the two-dimensional massless Dirac equation is determined using a Finite-Difference Time-Domain scheme. The propagation of electron waves in graphene superlattices with localized scattering centers is studied, and the role of the symmetry of the microscopic potential in the electron velocity is discussed. The computational methodologies target the parallel capabilities of heterogeneous multi-core CPU and multi-GPU environments and are built with the OpenCL parallel programming framework which provides a portable, vendor agnostic and high throughput-performance solution. The proposed heterogeneous multi-GPU implementation achieves speedup ratios up to 75x when compared to multi-thread and multi-core CPU execution, reducing simulation times from several hours to a couple of minutes.
Parallel hierarchical radiosity rendering

Energy Technology Data Exchange (ETDEWEB)

Carter, Michael [Iowa State Univ., Ames, IA (United States)

1993-07-01

In this dissertation, the step-by-step development of a scalable parallel hierarchical radiosity renderer is documented. First, a new look is taken at the traditional radiosity equation, and a new form is presented in which the matrix of linear system coefficients is transformed into a symmetric matrix, thereby simplifying the problem and enabling a new solution technique to be applied. Next, the state-of-the-art hierarchical radiosity methods are examined for their suitability to parallel implementation, and scalability. Significant enhancements are also discovered which both improve their theoretical foundations and improve the images they generate. The resultant hierarchical radiosity algorithm is then examined for sources of parallelism, and for an architectural mapping. Several architectural mappings are discussed. A few key algorithmic changes are suggested during the process of making the algorithm parallel. Next, the performance, efficiency, and scalability of the algorithm are analyzed. The dissertation closes with a discussion of several ideas which have the potential to further enhance the hierarchical radiosity method, or provide an entirely new forum for the application of hierarchical methods.
Towards a streaming model for nested data parallelism

DEFF Research Database (Denmark)

Madsen, Frederik Meisner; Filinski, Andrzej

2013-01-01

The language-integrated cost semantics for nested data parallelism pioneered by NESL provides an intuitive, high-level model for predicting performance and scalability of parallel algorithms with reasonable accuracy. However, this predictability, obtained through a uniform, parallelism-flattening... -processable in a streaming fashion. This semantics is directly compatible with previously proposed piecewise execution models for nested data parallelism, but allows the expected space usage to be reasoned about directly at the source-language level. The language definition and implementation are still very much work...
Customizable Memory Schemes for Data Parallel Architectures

NARCIS (Netherlands)

Gou, C.

2011-01-01

Memory system efficiency is crucial for any processor to achieve high performance, especially in the case of data parallel machines. Processing capabilities of parallel lanes will be wasted, when data requests are not accomplished in a sustainable and timely manner. Irregular vector memory accesses
Diagnostic Perspectives on the Family: Process, Structural and Historical Contextual Models.

Science.gov (United States)

Levant, Ronald F.

1983-01-01

Describes diagnostic perspectives for viewing dysfunctional families. Presents three general types of models (process, structural, and historical) and organizes them along a continuum from most descriptive to most inferential. Presented at the 39th Annual Conference of the American Association for Marriage and Family Therapy, October-November…
The Holistic Processing Account of Visual Expertise in Medical Image Perception: A Review.

Science.gov (United States)

Sheridan, Heather; Reingold, Eyal M

2017-01-01

In the field of medical image perception, the holistic processing perspective contends that experts can rapidly extract global information about the image, which can be used to guide their subsequent search of the image (Swensson, 1980; Nodine and Kundel, 1987; Kundel et al., 2007). In this review, we discuss the empirical evidence supporting three different predictions that can be derived from the holistic processing perspective: Expertise in medical image perception is domain-specific, experts use parafoveal and/or peripheral vision to process large regions of the image in parallel, and experts benefit from a rapid initial glimpse of an image. In addition, we discuss a pivotal recent study (Litchfield and Donovan, 2016) that seems to contradict the assumption that experts benefit from a rapid initial glimpse of the image. To reconcile this finding with the existing literature, we suggest that global processing may serve multiple functions that extend beyond the initial glimpse of the image. Finally, we discuss future research directions, and we highlight the connections between the holistic processing account and similar theoretical perspectives and findings from other domains of visual expertise.
Stampi: a message passing library for distributed parallel computing. User's guide

International Nuclear Information System (INIS)

Imamura, Toshiyuki; Koide, Hiroshi; Takemiya, Hiroshi

1998-11-01

A new message passing library, Stampi, has been developed to enable computations that span arbitrary combinations of different kinds of parallel computers, with MPI (Message Passing Interface) as the single interface for communication. Stampi is based on the MPI2 specification. It realizes dynamic process creation on different machines and communication with the spawned processes within the scope of MPI semantics. Vendors have implemented MPI as a closed system within one parallel machine and have not supported these two functions: process creation on, and communication with, external machines. Stampi supports both functions and enables distributed parallel computing. Stampi has currently been implemented on COMPACS (COMplex PArallel Computer System), introduced at CCSE, which comprises five parallel computers and one graphics workstation, and any communication among them can be processed. (author)
Parallelization of the model-based iterative reconstruction algorithm DIRA

International Nuclear Information System (INIS)

Oertenberg, A.; Sandborg, M.; Alm Carlsson, G.; Malusek, A.; Magnusson, M.

2016-01-01

New paradigms for parallel programming have been devised to simplify software development on multi-core processors and many-core graphical processing units (GPU). Despite their obvious benefits, the parallelization of existing computer programs is not an easy task. In this work, the use of the Open Multiprocessing (OpenMP) and Open Computing Language (OpenCL) frameworks is considered for the parallelization of the model-based iterative reconstruction algorithm DIRA with the aim to significantly shorten the code's execution time. Selected routines were parallelized using OpenMP and OpenCL libraries; some routines were converted from MATLAB to C and optimised. Parallelization of the code with the OpenMP was easy and resulted in an overall speedup of 15 on a 16-core computer. Parallelization with OpenCL was more difficult owing to differences between the central processing unit and GPU architectures. The resulting speedup was substantially lower than the theoretical peak performance of the GPU; the cause was explained. (authors)
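The reported OpenMP result (a speedup of 15 on 16 cores) can be put in context with a short worked calculation. The sketch below is not taken from the DIRA paper: it applies the standard definitions of parallel efficiency, the Karp-Flatt experimentally determined serial fraction, and Amdahl's law, and the function names are chosen here for illustration.

```python
def efficiency(speedup: float, cores: int) -> float:
    """Parallel efficiency: achieved speedup divided by the core count."""
    return speedup / cores


def serial_fraction(speedup: float, cores: int) -> float:
    """Karp-Flatt metric: serial fraction implied by a measured speedup."""
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


def amdahl_speedup(serial: float, cores: int) -> float:
    """Amdahl's law: speedup for a given serial fraction and core count."""
    return 1.0 / (serial + (1.0 - serial) / cores)


if __name__ == "__main__":
    s, p = 15.0, 16  # figures quoted in the abstract
    print(f"efficiency       = {efficiency(s, p):.4f}")
    print(f"serial fraction  = {serial_fraction(s, p):.6f}")
    print(f"Amdahl round trip = {amdahl_speedup(serial_fraction(s, p), p):.2f}")
```

Under these assumptions a speedup of 15 on 16 cores corresponds to an efficiency of 93.75% and an implied serial fraction of 1/225 (about 0.44%), which is consistent with the abstract's remark that the OpenMP parallelization was easy.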
6th International Parallel Tools Workshop

CERN Document Server

Brinkmann, Steffen; Gracia, José; Resch, Michael; Nagel, Wolfgang

2013-01-01

The latest advances in the High Performance Computing hardware have significantly raised the level of available compute performance. At the same time, the growing hardware capabilities of modern supercomputing architectures have caused an increasing complexity of the parallel application development. Despite numerous efforts to improve and simplify parallel programming, there is still a lot of manual debugging and tuning work required. This process is supported by special software tools, facilitating debugging, performance analysis, and optimization and thus making a major contribution to the development of robust and efficient parallel software. This book introduces a selection of the tools, which were presented and discussed at the 6th International Parallel Tools Workshop, held in Stuttgart, Germany, 25-26 September 2012.
Eliminating graphs by means of parallel knock-out schemes

NARCIS (Netherlands)

Broersma, H.J.; Fomin, F.V.; Královic, R.; Woeginger, G.J.

2007-01-01

In 1997 Lampert and Slater introduced parallel knock-out schemes, an iterative process on graphs that goes through several rounds. In each round of this process, every vertex eliminates exactly one of its neighbors. The parallel knock-out number of a graph is the minimum number of rounds after which
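One round of the process described above can be sketched in a few lines. This is an illustrative reading of the definition in the abstract, not code from the paper; the function name `knockout_round` and the dictionary-of-sets graph representation are assumptions.

```python
def knockout_round(adj, choice):
    """Perform one round of a parallel knock-out scheme.

    adj    -- dict mapping each vertex to the set of its neighbours
    choice -- dict mapping each vertex to the one neighbour it eliminates
    Returns the adjacency dict of the graph that survives the round.
    """
    for v, nbrs in adj.items():
        if choice.get(v) not in nbrs:
            raise ValueError(f"vertex {v!r} must choose one of its neighbours")
    # All eliminations happen simultaneously, so collect them first.
    removed = {choice[v] for v in adj}
    return {v: nbrs - removed for v, nbrs in adj.items() if v not in removed}


if __name__ == "__main__":
    # 4-cycle: every vertex knocks out its successor, so one round suffices.
    cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
    print(knockout_round(cycle, {0: 1, 1: 2, 2: 3, 3: 0}))

    # Path on 3 vertices: an isolated vertex survives and can never be
    # eliminated, because it has no neighbour left to eliminate it.
    path = {0: {1}, 1: {0, 2}, 2: {1}}
    print(knockout_round(path, {0: 1, 1: 0, 2: 1}))
```

In the first example the graph is eliminated in a single round; in the second, vertex 2 is left isolated, so this particular choice of eliminations does not eliminate the path.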

Parallel computing by Monte Carlo codes MVP/GMVP

International Nuclear Information System (INIS)

Nagaya, Yasunobu; Nakagawa, Masayuki; Mori, Takamasa

2001-01-01

The general-purpose Monte Carlo codes MVP/GMVP are well vectorized and thus enable us to perform high-speed Monte Carlo calculations. In order to achieve further speedups, we parallelized the codes on different types of parallel computing platforms, or by using the standard parallelization library MPI. The platforms used for benchmark calculations are the distributed-memory vector-parallel computer Fujitsu VPP500, the distributed-memory massively parallel computer Intel Paragon, and the distributed-memory scalar-parallel computers Hitachi SR2201 and IBM SP2. As is generally reported, linear speedup could be obtained for large-scale problems, but parallelization efficiency decreased as the batch size per processing element (PE) became smaller. It was also found that the statistical uncertainty for assembly powers was less than 0.1% in a PWR full-core calculation with more than 10 million histories, which took about 1.5 hours by massively parallel computing. (author)
Discrete Hadamard transformation algorithm's parallelism analysis and achievement

Science.gov (United States)

Hu, Hui

2009-07-01

The Discrete Hadamard Transformation (DHT) is widely applied in real-time signal processing, but its use is limited by the operating speed of DSPs. This article studies the parallelization of the DHT and analyzes its parallel performance. Based on the programming structure of the TMS320C80 multiprocessor platform, two kinds of parallel DHT algorithms are implemented. Several experiments demonstrate the effectiveness of the proposed algorithms.
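To show where the parallelism in a Hadamard transform comes from, here is a minimal sketch of the standard fast Walsh-Hadamard transform in natural (Hadamard) order. It is a generic textbook formulation, not either of the two TMS320C80 algorithms from the article, and the function name `fwht` is an assumption.

```python
def fwht(x):
    """Unnormalised fast Walsh-Hadamard transform (natural order).

    The input length must be a power of two. Returns a new list.
    """
    a = list(x)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # All butterflies within one stage touch disjoint pairs of
        # elements, so they could be computed in parallel.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                u, v = a[i], a[i + h]
                a[i], a[i + h] = u + v, u - v
        h *= 2
    return a


if __name__ == "__main__":
    signal = [1, 0, 1, 0, 0, 1, 1, 0]
    spectrum = fwht(signal)
    print(spectrum)
    # Applying the transform twice returns the input scaled by its length.
    print([v // len(signal) for v in fwht(spectrum)])
```

The transform runs in log2(n) stages of n/2 independent butterflies each; a parallel implementation would typically distribute the butterflies of one stage across processors and synchronise between stages.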
Automatic Parallelization Tool: Classification of Program Code for Parallel Computing

Directory of Open Access Journals (Sweden)

Mustafa Basthikodi

2016-04-01

Performance growth of single-core processors has come to a halt in the past decade, but was re-enabled by the introduction of parallelism in processors. Multicore frameworks, along with Graphical Processing Units, have broadly enhanced parallelism. A couple of compilers have been updated to address the emerging challenges of synchronization and threading. Appropriate program and algorithm classifications will greatly help software engineers find opportunities for effective parallelization. In the present work we investigated current species for the classification of algorithms; related work on classification is discussed, along with a comparison of the issues that challenge the classification. A set of algorithms was chosen whose structure matches different issues and which perform a given task. We tested these algorithms using existing automatic species extraction tools along with the Bones compiler. We added functionality to the existing tool, providing a more detailed characterization. The contributions of our work include support for pointer arithmetic, conditional and incremental statements, user defined types, constants and mathematical functions. With this, we can retain significant data which is not captured by the original species of algorithms. We implemented the new theory in the tool, enabling automatic characterization of program code.
Reliable and Efficient Parallel Processing Algorithms and Architectures for Modern Signal Processing. Ph.D. Thesis

Science.gov (United States)

Liu, Kuojuey Ray

1990-01-01

Least-squares (LS) estimations and spectral decomposition algorithms constitute the heart of modern signal processing and communication problems. Implementations of recursive LS and spectral decomposition algorithms onto parallel processing architectures such as systolic arrays with efficient fault-tolerant schemes are the major concerns of this dissertation. There are four major results in this dissertation. First, we propose the systolic block Householder transformation with application to the recursive least-squares minimization. It is successfully implemented on a systolic array with a two-level pipelined implementation at the vector level as well as at the word level. Second, a real-time algorithm-based concurrent error detection scheme based on the residual method is proposed for the QRD RLS systolic array. The fault diagnosis, order degraded reconfiguration, and performance analysis are also considered. Third, the dynamic range, stability, error detection capability under finite-precision implementation, order degraded performance, and residual estimation under faulty situations for the QRD RLS systolic array are studied in detail. Finally, we propose the use of multi-phase systolic algorithms for spectral decomposition based on the QR algorithm. Two systolic architectures, one based on a triangular array and another based on a rectangular array, are presented for the multi-phase operations with fault-tolerant considerations. Eigenvectors and singular vectors can be easily obtained by using the multi-phase operations. Performance issues are also considered.
Parallel Programming with Intel Parallel Studio XE

CERN Document Server

Blair-Chappell, Stephen

2012-01-01

Optimize code for multi-core processors with Intel's Parallel Studio. Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write programs for multicore and leverage the power of multicore in your programs. Sharing hands-on case studies and real-world examples, the
Novel encoding and updating of positional, or directional, spatial cues are processed by distinct hippocampal subfields: Evidence for parallel information processing and the "what" stream.

Science.gov (United States)

Hoang, Thu-Huong; Aliane, Verena; Manahan-Vaughan, Denise

2018-05-01

The specific roles of hippocampal subfields in spatial information processing and encoding are, as yet, unclear. The parallel map theory postulates that whereas the CA1 processes discrete environmental features (positional cues used to generate a "sketch map"), the dentate gyrus (DG) processes large navigation-relevant landmarks (directional cues used to generate a "bearing map"). Additionally, the two-streams hypothesis suggests that hippocampal subfields engage in differentiated processing of information from the "where" and the "what" streams. We investigated these hypotheses by analyzing the effect of exploration of discrete "positional" features and large "directional" spatial landmarks on hippocampal neuronal activity in rats. As an indicator of neuronal activity we measured the mRNA induction of the immediate early genes (IEGs), Arc and Homer1a. We observed an increase of this IEG mRNA in CA1 neurons of the distal neuronal compartment and in proximal CA3, after novel spatial exploration of discrete positional cues, whereas novel exploration of directional cues led to increases in IEG mRNA in the lower blade of the DG and in proximal CA3. Strikingly, the CA1 did not respond to directional cues and the DG did not respond to positional cues. Our data provide evidence for both the parallel map theory and the two-streams hypothesis and suggest a precise compartmentalization of the encoding and processing of "what" and "where" information occurs within the hippocampal subfields. © 2018 The Authors. Hippocampus Published by Wiley Periodicals, Inc.
Adding Perspective: Predicting Adolescent Sunscreen Use with an Extended Health Action Process Approach.

Science.gov (United States)

Schüz, Natalie; Schüz, Benjamin; Eid, Michael

2016-07-01

Diseases such as skin cancer often have a very long latency period. For adolescents, especially, it may be difficult to grasp that current risk behavior is related to future health outcomes. This study examines the role of health-related time perspective (i.e. the degree to which short-term outcomes are discounted over long-time health benefits) within the Health Action Process Approach (HAPA). More specifically, based on expectancy*value theory, we tested whether time perspective interacts with self-efficacy, the central variable in this approach. A longitudinal study with three measurement points across one year assessed 156 high school students. Data were analyzed using structural equation models. While time perspective had no direct association with sunscreen use intentions, there was an interaction effect with self-efficacy; the shorter the time perspective, the smaller the association of self-efficacy with intention. Intention in turn predicted planning and sunscreen use at Time 3 (one year later). In order to maximise the impact of early onset measures for skin cancer prevention targeting the motivation for sunscreen use in adolescents, time perspective should be addressed in comprehensive sun protection interventions. © 2016 The International Association of Applied Psychology.
Study on Parallel Processing for Efficient Flexible Multibody Analysis based on Subsystem Synthesis Method

Energy Technology Data Exchange (ETDEWEB)

Han, Jong-Boo; Song, Hajun; Kim, Sung-Soo [Chungnam Nat’l Univ., Daejeon (Korea, Republic of)

2017-06-15

Flexible multibody simulations are widely used in the industry to design mechanical systems. In flexible multibody dynamics, deformation coordinates are described either relatively in the body reference frame that is floating in the space or in the inertial reference frame. Moreover, these deformation coordinates are generated based on the discretization of the body according to the finite element approach. Therefore, the formulation of the flexible multibody system always deals with a huge number of degrees of freedom and the numerical solution methods require a substantial amount of computational time. Parallel computational methods are a solution for efficient computation. However, most of the parallel computational methods are focused on the efficient solution of large-sized linear equations. For multibody analysis, we need to develop an efficient formulation that could be suitable for parallel computation. In this paper, we developed a subsystem synthesis method for a flexible multibody system and proposed efficient parallel computational schemes based on the OpenMP API in order to achieve efficient computation. Simulations of a rotating blade system, which consists of three identical blades, were carried out with two different parallel computational schemes. Actual CPU times were measured to investigate the efficiency of the proposed parallel schemes.
Single product lot-sizing on unrelated parallel machines with non-decreasing processing times

Science.gov (United States)

Eremeev, A.; Kovalyov, M.; Kuznetsov, P.

2018-01-01

We consider a problem in which at least a given quantity of a single product has to be partitioned into lots, and lots have to be assigned to unrelated parallel machines for processing. In one version of the problem, the maximum machine completion time should be minimized, in another version of the problem, the sum of machine completion times is to be minimized. Machine-dependent lower and upper bounds on the lot size are given. The product is either assumed to be continuously divisible or discrete. The processing time of each machine is defined by an increasing function of the lot volume, given as an oracle. Setup times and costs are assumed to be negligibly small, and therefore, they are not considered. We derive optimal polynomial time algorithms for several special cases of the problem. An NP-hard case is shown to admit a fully polynomial time approximation scheme. An application of the problem in energy efficient processors scheduling is considered.
Parallel processing in the brain’s visual form system: An fMRI study

Directory of Open Access Journals (Sweden)

Yoshihito Shigihara

2014-07-01

We here extend and complement our earlier time-based, magneto-encephalographic (MEG) study of the processing of forms by the visual brain (Shigihara and Zeki, 2013) with a functional magnetic resonance imaging (fMRI) study, in order to better localize the activity produced in early visual areas when subjects view simple geometric stimuli of increasing perceptual complexity (lines, angles, rhomboids) constituted from the same elements (lines). Our results show that all three categories of form activate all three visual areas with which we were principally concerned (V1, V2, V3), with angles producing the strongest and rhomboids the weakest activity in all three. The difference between the activity produced by angles and rhomboids was significant, that between lines and rhomboids was trend significant, while that between lines and angles was not. Taken together with our earlier MEG results, the present ones suggest that a parallel strategy is used in processing forms, in addition to the well-documented hierarchical strategy.
Involving patients in health technology funding decisions: stakeholder perspectives on processes used in Australia.

Science.gov (United States)

Lopes, Edilene; Street, Jackie; Carter, Drew; Merlin, Tracy

2016-04-01

Governments use a variety of processes to incorporate public perspectives into policymaking, but few studies have evaluated these processes from participants' point of view. The objective of this study was twofold: to understand the perspectives of selected stakeholders with regard to involvement processes used by Australian Advisory Committees to engage the public and patients; and to identify barriers and facilitators to participation. Twelve semi-structured interviews were conducted with representatives of different stakeholder groups involved in health technology funding decisions in Australia. Data were collected and analysed using a theoretical framework created by Rowe and Frewer, but adapted to more fully acknowledge issues of power and influence. Stakeholder groups disagreed as to what constitutes effective and inclusive patient involvement. Barriers reported by interviewees included poor communication, a lack of transparency, unworkable deadlines, and inadequate representativeness. Also described were problems associated with defining the task for patients and their advocates and with the timing of patient input in the decision-making process. Interviewees suggested that patient participation could be improved by increasing the number of patient organizations engaged in processes and including those organizations at different stages of decision making, especially earlier. The different evaluations made by stakeholder groups appear to be underpinned by contrasting conceptions of public involvement and its value, in line with Graham Martin's work which distinguishes between 'technocratic' and 'democratic' public involvement. Understanding stakeholders' perspectives and the contrasting conceptions of public involvement could foster future agreement on which processes should be used to involve the public in decision making. © 2015 John Wiley & Sons Ltd.
Optimization Solutions for Improving the Performance of the Parallel Reduction Algorithm Using Graphics Processing Units

Directory of Open Access Journals (Sweden)

Ion LUNGU

2012-01-01

In this paper, we research, analyze and develop optimization solutions for the parallel reduction function using graphics processing units (GPUs) that implement the Compute Unified Device Architecture (CUDA), a modern and novel approach for improving the software performance of data processing applications and algorithms. Many of these applications and algorithms make use of the reduction function in their computational steps. After having designed the function and its algorithmic steps in CUDA, we have progressively developed and implemented optimization solutions for the reduction function. In order to confirm, test and evaluate the solutions' efficiency, we have developed a custom tailored benchmark suite. We have analyzed the obtained experimental results regarding: the comparison of the execution time and bandwidth when using graphics processing units covering the main CUDA architectures (Tesla GT200, Fermi GF100, Kepler GK104) and a central processing unit; the influence of the data type; and the influence of the binary operator.
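The access pattern behind a tree-based parallel reduction can be sketched on the CPU. The following is a sequential Python model of the generic sequential-addressing scheme, not the authors' CUDA kernels or any of their optimizations; the function name `tree_reduce` is an assumption. Each iteration of the inner loop corresponds to work that one GPU thread would do, and each pass of the outer loop corresponds to one synchronised step.

```python
import operator


def tree_reduce(values, op):
    """Reduce a non-empty sequence with a tree-shaped access pattern.

    The operator is assumed to be associative and commutative, because
    elements are combined in a different order than a left-to-right fold.
    """
    buf = list(values)
    if not buf:
        raise ValueError("cannot reduce an empty sequence")
    n = len(buf)  # number of still-active elements
    while n > 1:
        half = (n + 1) // 2
        # The n // 2 combinations below are independent of each other,
        # so on a GPU each one could be handled by a separate thread.
        for i in range(n // 2):
            buf[i] = op(buf[i], buf[i + half])
        n = half
    return buf[0]


if __name__ == "__main__":
    print(tree_reduce(range(1, 101), operator.add))
    print(tree_reduce([3, 9, 2, 7, 5], max))
```

The number of synchronised steps grows with the logarithm of the input length, which is why the data type and the binary operator, the two factors examined in the abstract, both affect how much work each step carries.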
Design of web questionnaires : An information-processing perspective for the effect of response categories

NARCIS (Netherlands)

Toepoel, V.; Vis, C.M.; Das, J.W.M.; van Soest, A.H.O.

2009-01-01

In this article, an information-processing perspective is used to explore the impact of response categories on the answers respondents provide in Web surveys. Response categories have a significant effect on response formulation in questions that are difficult to process, whereas in easier questions
PRISMA/DB: A Parallel Main-Memory Relational DBMS

NARCIS (Netherlands)

Apers, Peter M.G.; Flokstra, Jan; van den Berg, Carel A.; Grefen, P.W.P.J.; Wilschut, A.N.; Kersten, Martin L.; van den Berg, C.A.

1992-01-01

PRISMA/DB, a full-fledged parallel, main memory relational database management system (DBMS) is described. PRISMA/DB's high performance is obtained by the use of parallelism for query processing and main memory storage of the entire database. A flexible architecture for experimenting with
Parallel, Asynchronous Executive (PAX): System concepts, facilities, and architecture

Science.gov (United States)

Jones, W. H.

1983-01-01

The Parallel, Asynchronous Executive (PAX) is a software operating system simulation that allows many computers to work on a single problem at the same time. PAX is currently implemented on a UNIVAC 1100/42 computer system. Independent UNIVAC runstreams are used to simulate independent computers. Data are shared among independent UNIVAC runstreams through shared mass-storage files. PAX has achieved the following: (1) applied several computing processes simultaneously to a single, logically unified problem; (2) resolved most parallel processor conflicts by careful work assignment; (3) resolved by means of worker requests to PAX all conflicts not resolved by work assignment; (4) provided fault isolation and recovery mechanisms to meet the problems of an actual parallel, asynchronous processing machine. Additionally, one real-life problem has been constructed for the PAX environment. This is CASPER, a collection of aerodynamic and structural dynamic problem simulation routines. CASPER is not discussed in this report except to provide examples of parallel-processing techniques.
Parallelization of MCNP 4, a Monte Carlo neutron and photon transport code system, in highly parallel distributed memory type computer

International Nuclear Information System (INIS)

Masukawa, Fumihiro; Takano, Makoto; Naito, Yoshitaka; Yamazaki, Takao; Fujisaki, Masahide; Suzuki, Koichiro; Okuda, Motoi.

1993-11-01

In order to improve the accuracy and calculation speed of shielding analyses, MCNP 4, a Monte Carlo neutron and photon transport code system, has been parallelized and its efficiency measured on the highly parallel distributed-memory computer AP1000. The code was analyzed statically and dynamically, and a suitable parallelization algorithm was then determined for the shielding analysis functions of MCNP 4. This includes a strategy in which a new history is dynamically assigned to an idle processor element during execution. Furthermore, to avoid congestion in communication processing, the batch concept, in which multiple histories are processed as a unit, has been introduced. By analyzing a sample cask problem with 2,000,000 histories on the AP1000 with 512 processor elements, a parallelization efficiency of 82% was achieved, and the calculation speed has been estimated to be around 50 times that of the FACOM M-780. (author)
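The two ideas in this abstract, handing work to whichever processor element is idle and grouping histories into batches, can be sketched in Python as follows. This is a minimal illustration, not the MCNP 4 implementation: `simulate_history` is a hypothetical physics-free stand-in, and a thread pool stands in for the AP1000 processor elements. Under the usual definition of parallel efficiency as speedup divided by processor count (assumed here; the abstract does not state its definition), 82% on 512 processor elements corresponds to a speedup of roughly 420 over a single element.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List


def simulate_history(history_id: int) -> float:
    """Hypothetical stand-in for one particle history; returns a tally value."""
    rng = random.Random(history_id)  # per-history seed keeps results reproducible
    return rng.random()


def run_batch(batch: range) -> float:
    """Process several histories as one unit to reduce scheduling traffic."""
    return sum(simulate_history(h) for h in batch)


def make_batches(n_histories: int, batch_size: int) -> List[range]:
    """Split history ids 0..n_histories-1 into consecutive batches."""
    return [range(start, min(start + batch_size, n_histories))
            for start in range(0, n_histories, batch_size)]


def run_dynamic(n_histories: int, batch_size: int, n_workers: int) -> float:
    """Each worker takes the next pending batch as soon as it becomes idle."""
    batches = make_batches(n_histories, batch_size)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() yields results in batch order, so the total is reproducible.
        return sum(pool.map(run_batch, batches))


def parallel_efficiency(speedup: float, n_processors: int) -> float:
    """Efficiency under the assumed definition speedup / processor count."""
    return speedup / n_processors


if __name__ == "__main__":
    print(run_dynamic(n_histories=1000, batch_size=50, n_workers=4))
    print(0.82 * 512)  # speedup implied by 82% efficiency on 512 elements
```

Larger batches reduce how often workers have to ask for more work, at the cost of coarser load balancing near the end of the run; the batch size used here is arbitrary.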
Triboelectric effect: A new perspective on electron transfer process

Science.gov (United States)

Pan, Shuaihang; Zhang, Zhinan

2017-10-01

As interest in the triboelectric effect increases in line with the development of tribo-electrification related devices, the mechanisms involved in this phenomenon require more systematic review from the dual perspectives of developed classical insights and emerging quantum understanding. In this paper, a clear process of energy change and transfer for electrons is proposed from the quantum point of view as the trigger for the charging initiation process in the triboelectric effect, and the phonon modes on the friction surfaces are believed to hold great importance as one of the main driving forces. Compatible with Maxwell displacement current theory, a complete account of the charging steady state is given, i.e., the competition mechanisms between the breakdown process and the continuous charging process, and the balance mechanisms of phonon-electron interaction, built voltage, and induced polarization. In brief, the proposed theory emphasizes the fundamental role of electron transfer in tribo-electrical fields. The theory is justified by comparison with experimental results from previous studies.
Parallel/vector algorithms for the spherical SN transport theory method

International Nuclear Information System (INIS)

Haghighat, A.; Mattis, R.E.

1990-01-01

This paper discusses vector and parallel processing of a 1-D curvilinear (i.e. spherical) SN transport theory algorithm on the Cornell National SuperComputer Facility (CNSF) IBM 3090/600E. Two different vector algorithms were developed and parallelized based on angular decomposition. It is shown that significant speedups are attainable. For example, for problems with large granularity, using 4 processors, the parallel/vector algorithm achieves speedups (for wall-clock time) of more than 4.5 relative to the old serial/scalar algorithm. Furthermore, this work has demonstrated the existing potential for the development of faster processing vector and parallel algorithms for multidimensional curvilinear geometries. (author)
Parallel, but Dissociable, Processing in Discrete Corticostriatal Inputs Encodes Skill Learning.

Science.gov (United States)

Kupferschmidt, David A; Juczewski, Konrad; Cui, Guohong; Johnson, Kari A; Lovinger, David M

2017-10-11

Changes in cortical and striatal function underlie the transition from novel actions to refined motor skills. How discrete, anatomically defined corticostriatal projections function in vivo to encode skill learning remains unclear. Using novel fiber photometry approaches to assess real-time activity of associative inputs from medial prefrontal cortex to dorsomedial striatum and sensorimotor inputs from motor cortex to dorsolateral striatum, we show that associative and sensorimotor inputs co-engage early in action learning and disengage in a dissociable manner as actions are refined. Disengagement of associative, but not sensorimotor, inputs predicts individual differences in subsequent skill learning. Divergent somatic and presynaptic engagement in both projections during early action learning suggests potential learning-related in vivo modulation of presynaptic corticostriatal function. These findings reveal parallel processing within associative and sensorimotor circuits that challenges and refines existing views of corticostriatal function and expose neuronal projection- and compartment-specific activity dynamics that encode and predict action learning. Published by Elsevier Inc.

Parallel computation with molecular-motor-propelled agents in nanofabricated networks.

Science.gov (United States)

Nicolau, Dan V; Lard, Mercy; Korten, Till; van Delft, Falco C M J M; Persson, Malin; Bengtsson, Elina; Månsson, Alf; Diez, Stefan; Linke, Heiner; Nicolau, Dan V

2016-03-08

The combinatorial nature of many important mathematical problems, including nondeterministic-polynomial-time (NP)-complete problems, places a severe limitation on the problem size that can be solved with conventional, sequentially operating electronic computers. There have been significant efforts in conceiving parallel-computation approaches in the past, for example: DNA computation, quantum computation, and microfluidics-based computation. However, these approaches have not proven, so far, to be scalable and practical from a fabrication and operational perspective. Here, we report the foundations of an alternative parallel-computation system in which a given combinatorial problem is encoded into a graphical, modular network that is embedded in a nanofabricated planar device. Exploring the network in a parallel fashion using a large number of independent, molecular-motor-propelled agents then solves the mathematical problem. This approach uses orders of magnitude less energy than conventional computers, thus addressing issues related to power consumption and heat dissipation. We provide a proof-of-concept demonstration of such a device by solving, in a parallel fashion, the small instance {2, 5, 9} of the subset sum problem, which is a benchmark NP-complete problem. Finally, we discuss the technical advances necessary to make our system scalable with presently available technology.
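The way such a network encodes the subset sum problem can be mimicked in software. The Python sketch below is a simplified model and not the authors' device or code: it assumes that each number corresponds to one layer of the network, and that at each layer an agent either shifts sideways by that number (the number is included) or continues straight (the number is excluded). The function names are hypothetical.

```python
from typing import Iterable, List, Set


def reachable_exits(numbers: Iterable[int]) -> List[int]:
    """Return every exit position an agent can reach in the modelled network.

    An exit position equals the sum of the numbers whose layers shifted
    the agent sideways, so the set of exits is the set of subset sums.
    """
    positions: Set[int] = {0}  # all agents enter at position 0
    for n in numbers:
        # Every agent independently either keeps its position or shifts by n.
        positions = positions | {p + n for p in positions}
    return sorted(positions)


def has_subset_with_sum(numbers: Iterable[int], target: int) -> bool:
    """True if some subset of `numbers` adds up to `target`."""
    return target in reachable_exits(numbers)


if __name__ == "__main__":
    print(reachable_exits([2, 5, 9]))
    print(has_subset_with_sum([2, 5, 9], 11))
    print(has_subset_with_sum([2, 5, 9], 10))
```

For the instance {2, 5, 9} mentioned in the abstract, this model yields the eight exits 0, 2, 5, 7, 9, 11, 14 and 16. In the physical device many agents explore the paths simultaneously, whereas this sketch tracks the set of occupied positions layer by layer.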
Cellular automata a parallel model

CERN Document Server

Mazoyer, J

1999-01-01

Cellular automata can be viewed both as computational models and as modelling systems of real processes. This volume emphasises the first aspect. In articles written by leading researchers, sophisticated massively parallel algorithms (firing squad, life, Fischer's primes recognition) are treated. Their computational power and the specific complexity classes they determine are surveyed, while some recent results in relation to chaos from a new dynamic systems point of view are also presented. Audience: This book will be of interest to specialists of theoretical computer science and the parallelism challenge.
Multibus-based parallel processor for simulation

Science.gov (United States)

Ogrady, E. P.; Wang, C.-H.

1983-01-01

A Multibus-based parallel processor simulation system is described. The system is intended to serve as a vehicle for gaining hands-on experience, testing system and application software, and evaluating parallel processor performance during development of a larger system based on the horizontal/vertical-bus interprocessor communication mechanism. The prototype system consists of up to seven Intel iSBC 86/12A single-board computers which serve as processing elements, a multiple transmission controller (MTC) designed to support system operation, and an Intel Model 225 Microcomputer Development System which serves as the user interface and input/output processor. All components are interconnected by a Multibus/IEEE 796 bus. An important characteristic of the system is that it provides a mechanism for a processing element to broadcast data to other selected processing elements. This parallel transfer capability is provided through the design of the MTC and a minor modification to the iSBC 86/12A board. The operation of the MTC, the basic hardware-level operation of the system, and pertinent details about the iSBC 86/12A and the Multibus are described.
High-throughput fabrication of micrometer-sized compound parabolic mirror arrays by using parallel laser direct-write processing

International Nuclear Information System (INIS)

Yan, Wensheng; Gu, Min; Cumming, Benjamin P

2015-01-01

Micrometer-sized parabolic mirror arrays have significant applications in both light emitting diodes and solar cells. However, low fabrication throughput has been identified as a major obstacle to large-scale applications of the mirror arrays, owing to the serial nature of the conventional method. Here, the mirror arrays are fabricated using parallel laser direct-write processing, which addresses this barrier. In addition, it is demonstrated that parallel writing can fabricate complex arrays as well as simple arrays and thus offers wider applications. Optical measurements show that each single mirror confines the full-width at half-maximum value to as small as 17.8 μm at the height of 150 μm whilst providing a transmittance of up to 68.3% at a wavelength of 633 nm, in good agreement with the calculated values. (paper)
Parallel computation

International Nuclear Information System (INIS)

Jejcic, A.; Maillard, J.; Maurel, G.; Silva, J.; Wolff-Bacha, F.

1997-01-01

Work in the field of parallel processing has developed as research activities using several numerical Monte Carlo simulations related to basic or applied current problems of nuclear and particle physics. For the applications utilizing the GEANT code, development or improvement work was done on the parts simulating low-energy physical phenomena such as radiation, transport and interaction. The problem of actinide burning by means of accelerators was approached using a simulation with the GEANT code. A program for neutron tracking in the range of low energies up to the thermal region has been developed. It is coupled to the GEANT code and permits, in a single pass, the simulation of a hybrid reactor core receiving a proton burst. Other work in this field concerns simulations for nuclear medicine applications such as the development of biological probes, the evaluation and characterization of gamma cameras (collimators, crystal thickness) and the method for dosimetric calculations. These calculations are particularly suited to a geometrical parallelization approach especially adapted to parallel machines of the TN310 type. Other work mentioned in the same field concerns simulation of electron channelling in crystals and simulation of the beam-beam interaction effect in colliders. The GEANT code was also used to simulate the operation of germanium detectors designed for monitoring natural and artificial radioactivity in the environment.
Effects of parallel planning on agreement production.

Science.gov (United States)

Veenstra, Alma; Meyer, Antje S; Acheson, Daniel J

2015-11-01

An important issue in current psycholinguistics is how the time course of utterance planning affects the generation of grammatical structures. The current study investigated the influence of parallel activation of the components of complex noun phrases on the generation of subject-verb agreement. Specifically, the lexical interference account (Gillespie & Pearlmutter, 2011b; Solomon & Pearlmutter, 2004) predicts more agreement errors (i.e., attraction) for subject phrases in which the head and local noun mismatch in number (e.g., the apple next to the pears) when nouns are planned in parallel than when they are planned in sequence. We used a speeded picture description task that yielded sentences such as the apple next to the pears is red. The objects mentioned in the noun phrase were either semantically related or unrelated. To induce agreement errors, pictures sometimes mismatched in number. In order to manipulate the likelihood of parallel processing of the objects and to test the hypothesized relationship between parallel processing and the rate of agreement errors, the pictures were either placed close together or far apart. Analyses of the participants' eye movements and speech onset latencies indicated slower processing of the first object and stronger interference from the related (compared to the unrelated) second object in the close than in the far condition. Analyses of the agreement errors yielded an attraction effect, with more errors in mismatching than in matching conditions. However, the magnitude of the attraction effect did not differ across the close and far conditions. Thus, spatial proximity encouraged parallel processing of the pictures, which led to interference of the associated conceptual and/or lexical representation, but, contrary to the prediction, it did not lead to more attraction errors. Copyright © 2015 Elsevier B.V. All rights reserved.
The Computational Processing of Intonational Prominence: A Functional Prosody Perspective

OpenAIRE

Nakatani, Christine Hisayo

1997-01-01

Intonational prominence, or accent, is a fundamental prosodic feature that is said to contribute to discourse meaning. This thesis outlines a new, computational theory of the discourse interpretation of prominence, from a FUNCTIONAL PROSODY perspective. Functional prosody makes the following two important assumptions: first, there is an aspect of prominence interpretation that centrally concerns discourse processes, namely the discourse focusing nature of prominence; and second, the role of p...
The position dependent influence that sensitivity correction processing gives the signal-to-noise ratio measurement in parallel imaging

International Nuclear Information System (INIS)

Murakami, Koichi; Yoshida, Koji; Yanagimoto, Shinichi

2012-01-01

We studied the position-dependent influence of sensitivity correction processing on signal-to-noise ratio (SNR) measurement in parallel imaging (PI). Sensitivity correction processing that referred to the sensitivity distribution of the body coil improved regional uniformity more than the sensitivity uniformity correction filter with a fixed correction factor. In addition, the position-dependent influence on the SNR measurement in PI differed depending on the sensitivity correction processing. Therefore, by dividing the SNR of the sensitivity-corrected image by the SNR of the original image at each pixel to obtain an SNR ratio, we can show the position-dependent influence of sensitivity correction processing on the SNR measurement in PI. This ratio serves as an index of the precision of the sensitivity correction processing. (author)
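The per-pixel SNR ratio described in this abstract amounts to an element-wise division of two SNR maps. The Python sketch below shows that step only; it assumes the two SNR maps have already been measured and are given as nested lists of equal shape, and the function name `snr_ratio_map` and the choice to return `None` where the original SNR is zero are illustrative assumptions, not part of the study.

```python
from typing import List, Optional

Image = List[List[float]]


def snr_ratio_map(snr_corrected: Image, snr_original: Image) -> List[List[Optional[float]]]:
    """Divide the corrected-image SNR by the original-image SNR at each pixel.

    Pixels where the original SNR is zero have no defined ratio and are
    returned as None (an assumption made for this sketch).
    """
    if len(snr_corrected) != len(snr_original):
        raise ValueError("SNR maps must have the same number of rows")
    ratio_map: List[List[Optional[float]]] = []
    for row_c, row_o in zip(snr_corrected, snr_original):
        if len(row_c) != len(row_o):
            raise ValueError("SNR maps must have the same number of columns")
        ratio_map.append([c / o if o != 0 else None for c, o in zip(row_c, row_o)])
    return ratio_map


if __name__ == "__main__":
    corrected = [[20.0, 30.0], [10.0, 5.0]]
    original = [[10.0, 20.0], [10.0, 0.0]]
    for row in snr_ratio_map(corrected, original):
        print(row)
```

A ratio close to 1 at a pixel indicates that the correction left the measured SNR essentially unchanged there, while values that vary across the image make the position dependence visible.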
Massively parallel multicanonical simulations

Science.gov (United States)

Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard

2018-03-01

Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with of the order of 10^4 parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as starting point and reference for practitioners in the field.
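The communication step between independent walkers can be sketched as follows. This Python fragment is not the authors' GPU code; it assumes the simplest form of the multicanonical recursion, in which all walkers sample with the same weights, their energy histograms are summed, and each weight is divided by the merged histogram count. The function names and the dictionary representation are illustrative assumptions.

```python
import math
from typing import Dict, List


def merge_histograms(histograms: List[Dict[int, int]]) -> Dict[int, int]:
    """Sum the energy histograms collected by the independent walkers."""
    merged: Dict[int, int] = {}
    for hist in histograms:
        for energy, count in hist.items():
            merged[energy] = merged.get(energy, 0) + count
    return merged


def update_log_weights(log_w: Dict[int, float], merged: Dict[int, int]) -> Dict[int, float]:
    """Apply W_new(E) = W_old(E) / H(E) in log space.

    Energies that were never visited keep their previous weight, because
    the histogram carries no information about them.
    """
    new_log_w = dict(log_w)
    for energy, count in merged.items():
        if count > 0:
            new_log_w[energy] = new_log_w.get(energy, 0.0) - math.log(count)
    return new_log_w


if __name__ == "__main__":
    walker_histograms = [{-4: 2, 0: 1}, {-4: 2, 4: 1}]  # made-up toy counts
    merged = merge_histograms(walker_histograms)
    print(merged)
    print(update_log_weights({-4: 0.0, 0: 0.0, 4: 0.0}, merged))
```

Working in log space avoids overflow and underflow when weights span many orders of magnitude. Between two such updates the walkers run without communicating, which is what makes the scheme suitable for very large numbers of parallel walkers.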
BPMN as a Communication Language for the Process- and Event-Oriented Perspectives in Fact-Oriented Conceptual Models

Science.gov (United States)

Bollen, Peter

In this paper we will show how the OMG specification of BPMN (Business Process Modeling Notation) can be used to model the process- and event-oriented perspectives of an application subject area. We will illustrate how the fact-oriented conceptual models for the information-, process- and event perspectives can be used in a 'bottom-up' approach for creating a BPMN model in combination with other approaches, e.g. the use of a textual description. We will use the common doctor's office example as a running example in this article.
Enabling Chemistry Technologies and Parallel Synthesis-Accelerators of Drug Discovery Programmes.

Science.gov (United States)

Vasudevan, A; Bogdan, A R; Koolman, H F; Wang, Y; Djuric, S W

There is a pressing need to improve overall productivity in the pharmaceutical industry. Judicious investments in chemistry technologies can have a significant impact on cycle times, cost of goods and probability of technical success. This perspective describes some of these technologies developed and implemented at AbbVie, and their applications to the synthesis of novel scaffolds and to parallel synthesis. © 2017 Elsevier B.V. All rights reserved.
Cooperative storage of shared files in a parallel computing system with dynamic block size

Science.gov (United States)

Bent, John M.; Faibish, Sorin; Grider, Gary

2015-11-10

Improved techniques are provided for parallel writing of data to a shared object in a parallel computing system. A method is provided for storing data generated by a plurality of parallel processes to a shared object in a parallel computing system. The method is performed by at least one of the processes and comprises: dynamically determining a block size for storing the data; exchanging a determined amount of the data with at least one additional process to achieve a block of the data having the dynamically determined block size; and writing the block of the data having the dynamically determined block size to a file system. The determined block size comprises, e.g., a total amount of the data to be stored divided by the number of parallel processes. The file system comprises, for example, a log structured virtual parallel file system, such as a Parallel Log-Structured File System (PLFS).
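The block-size rule stated in this abstract (total data divided by the number of processes) and the resulting data exchange can be sketched in Python. This is an illustration under stated assumptions, not the patented method: it assumes the processes' data are laid out back to back in rank order in the shared object, rounds the block size up when the total is not evenly divisible, and uses the hypothetical function names `block_size` and `exchange_plan`.

```python
from typing import List, Tuple


def block_size(amounts: List[int]) -> int:
    """Total amount of data divided by the number of processes, rounded up."""
    if not amounts:
        raise ValueError("at least one process is required")
    return -(-sum(amounts) // len(amounts))  # ceiling division


def exchange_plan(amounts: List[int]) -> List[Tuple[int, int, int]]:
    """Return (sender, receiver, n_bytes) transfers that even out the data.

    Process r ends up responsible for bytes [r * size, (r + 1) * size) of
    the shared object, where size is the dynamically determined block size.
    """
    size = block_size(amounts)
    plan: List[Tuple[int, int, int]] = []
    offset = 0  # position of the sender's first byte in the shared object
    for sender, amount in enumerate(amounts):
        start, end = offset, offset + amount
        while start < end:
            receiver = start // size
            chunk = min(end, (receiver + 1) * size) - start
            if receiver != sender:  # data already in place needs no transfer
                plan.append((sender, receiver, chunk))
            start += chunk
        offset = end
    return plan


if __name__ == "__main__":
    amounts = [10, 2, 6]  # bytes produced by processes 0, 1 and 2 (made up)
    print(block_size(amounts))
    print(exchange_plan(amounts))
```

With the made-up amounts 10, 2 and 6, the block size is 6 and a single transfer of 4 bytes from process 0 to process 1 leaves every process holding one full block to write.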
DIMACS Workshop on Interconnection Networks and Mapping, and Scheduling Parallel Computations

CERN Document Server

Rosenberg, Arnold L; Sotteau, Dominique; NSF Science and Technology Center in Discrete Mathematics and Theoretical Computer Science; Interconnection networks and mapping and scheduling parallel computations

1995-01-01

The interconnection network is one of the most basic components of a massively parallel computer system. Such systems consist of hundreds or thousands of processors interconnected to work cooperatively on computations. One of the central problems in parallel computing is the task of mapping a collection of processes onto the processors and routing network of a parallel machine. Once this mapping is done, it is critical to schedule computations within and communication among processors so that inputs for a process are available where and when the process is scheduled to be computed. This book contains the refereed pro... from universities and laboratories, as well as practitioners involved in the design, implementation, and application of massively parallel systems. Focusing on interconnection networks of parallel architectures of today and of the near future, the book includes topics such as network topologies, network properties, message routing, network embeddings, network emulation, mappings, and efficient scheduling.
Efficient high-precision matrix algebra on parallel architectures for nonlinear combinatorial optimization

KAUST Repository

Gunnels, John; Lee, Jon; Margulies, Susan

2010-01-01

We provide a first demonstration of the idea that matrix-based algorithms for nonlinear combinatorial optimization problems can be efficiently implemented. Such algorithms were mainly conceived by theoretical computer scientists for proving efficiency. We are able to demonstrate the practicality of our approach by developing an implementation on a massively parallel architecture, and exploiting scalable and efficient parallel implementations of algorithms for ultra high-precision linear algebra. Additionally, we have delineated and implemented the necessary algorithmic and coding changes required in order to address problems several orders of magnitude larger, dealing with the limits of scalability from memory footprint, computational efficiency, reliability, and interconnect perspectives. © Springer and Mathematical Programming Society 2010.
Peeling the Onion of Auditory Processing Disorder: A Language/Curricular-Based Perspective

Science.gov (United States)

Wallach, Geraldine P.

2011-01-01

Purpose: This article addresses auditory processing disorder (APD) from a language-based perspective. The author asks speech-language pathologists to evaluate the functionality (or not) of APD as a diagnostic category for children and adolescents with language-learning and academic difficulties. Suggestions are offered from a…
Fast parallel event reconstruction

CERN Multimedia

CERN. Geneva

2010-01-01

On-line processing of large data volumes produced in modern HEP experiments requires using maximum capabilities of modern and future many-core CPU and GPU architectures. One such powerful feature is a SIMD instruction set, which allows packing several data items in one register and operating on all of them, thus achieving more operations per clock cycle. Motivated by the idea of using the SIMD unit of modern processors, the KF based track fit has been adapted for parallelism, including memory optimization, numerical analysis, vectorization with inline operator overloading, and optimization using SDKs. The speed of the algorithm has been increased by a factor of 120000, to 0.1 ms/track, running in parallel on 16 SPEs of a Cell Blade computer. Running on a Nehalem CPU with 8 cores it shows the processing speed of 52 ns/track using the Intel Threading Building Blocks. The same KF algorithm running on an Nvidia GTX 280 in the CUDA framework provi...
Parallel computation for distributed parameter system-from vector processors to Adena computer

Energy Technology Data Exchange (ETDEWEB)

Nogi, T

1983-04-01

Research on advanced parallel hardware and software architectures for very high-speed computation deserves and needs more support and attention to fulfil its promise. Novel architectures for parallel processing are being made ready. Architectures for parallel processing can be roughly divided into two groups. One is a vector processor in which a single central processing unit involves multiple vector-arithmetic registers. The other is a processor array in which slave processors are connected to a host processor to perform parallel computation. In this review, the concept and data structure of the Adena (alternating-direction edition nexus array) architecture, which is conformable to distributed-parameter simulation algorithms, are described. 5 references.
Parallelization methods study of thermal-hydraulics codes

International Nuclear Information System (INIS)

Gaudart, Catherine

2000-01-01

The variety of parallelization methods and machines gives programmers a wide range of choices. In this study we suggest, in an industrial context, some solutions drawn from experience acquired with different parallelization methods. The study concerns several scientific codes which simulate a large variety of thermal-hydraulics phenomena. A bibliography on parallelization methods and a first analysis of the codes showed the difficulty of applying our process to the whole set of applications under study. Therefore, it was necessary to identify and extract a representative part of these applications and parallelization methods. The linear solver part of the codes emerged as the obvious choice. Several parallelization methods were applied to this particular part. From these developments one can estimate the work needed for an inexperienced programmer to parallelize an application, and the impact of the development constraints. The parallelization methods tested are the numerical library PETSc, the parallelizer PAF, the language HPF, the formalism PEI and the communications library MPI and PYM. In order to test several methods on different applications and to respect the constraint of minimizing modifications to the codes, a tool called SPS (Server of Parallel Solvers) has been developed. We propose to describe the different constraints on the optimization of codes in an industrial context, to present the solutions given by the tool SPS, to show the development of the linear solver part with the tested parallelization methods and lastly to compare the results against the imposed criteria. (author)
Parallel optoelectronic trinary signed-digit division

Science.gov (United States)

Alam, Mohammad S.

1999-03-01

The trinary signed-digit (TSD) number system has been found to be very useful for parallel addition and subtraction of any arbitrary length operands in constant time. Using the TSD addition and multiplication modules as the basic building blocks, we develop an efficient algorithm for performing parallel TSD division in constant time. The proposed division technique uses one TSD subtraction and two TSD multiplication steps. An optoelectronic correlator based architecture is suggested for implementation of the proposed TSD division algorithm, which fully exploits the parallelism and high processing speed of optics. An efficient spatial encoding scheme is used to ensure better utilization of space bandwidth product of the spatial light modulators used in the optoelectronic implementation.

Language constructs for modular parallel programs

Energy Technology Data Exchange (ETDEWEB)

Foster, I.

1996-03-01

We describe programming language constructs that facilitate the application of modular design techniques in parallel programming. These constructs allow us to isolate resource management and processor scheduling decisions from the specification of individual modules, which can themselves encapsulate design decisions concerned with concurrency, communication, process mapping, and data distribution. This approach permits development of libraries of reusable parallel program components and the reuse of these components in different contexts. In particular, alternative mapping strategies can be explored without modifying other aspects of program logic. We describe how these constructs are incorporated in two practical parallel programming languages, PCN and Fortran M. Compilers have been developed for both languages, allowing experimentation in substantial applications.
Investigation of the applicability of a functional programming model to fault-tolerant parallel processing for knowledge-based systems

Science.gov (United States)

Harper, Richard

1989-01-01

In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.
Multi-petascale highly efficient parallel supercomputer

Science.gov (United States)

Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen-Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng

2018-05-15

A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaflop-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes are interconnected by a five-dimensional torus network that maximizes the throughput of packet communications between nodes and minimizes latency. The network implements a collective network and a global asynchronous network that provides global barrier and notification functions. The node design includes a list-based prefetcher. The memory system implements transactional memory, thread-level speculation, and a multiversioning cache that improves the soft error rate at the same time, and it supports DMA functionality, allowing for parallel-processing message passing.
Massively parallel evolutionary computation on GPGPUs

CERN Document Server

Tsutsui, Shigeyoshi

2013-01-01

Evolutionary algorithms (EAs) are metaheuristics that learn from natural collective behavior and are applied to solve optimization problems in domains such as scheduling, engineering, bioinformatics, and finance. Such applications demand acceptable solutions with high-speed execution using finite computational resources. Therefore, there have been many attempts to develop platforms for running parallel EAs using multicore machines, massively parallel cluster machines, or grid computing environments. Recent advances in general-purpose computing on graphics processing units (GPGPU) have opened u
The kpx, a program analyzer for parallelization

International Nuclear Information System (INIS)

Matsuyama, Yuji; Orii, Shigeo; Ota, Toshiro; Kume, Etsuo; Aikawa, Hiroshi.

1997-03-01

The kpx is a program analyzer, developed as a common technological basis for promoting parallel processing. The kpx consists of three tools. The first is ktool, which shows how much execution time is spent in program segments. The second is ptool, which shows parallelization overhead on the Paragon system. The last is xtool, which shows parallelization overhead on the VPP system. The kpx, designed to work for any FORTRAN code on any UNIX computer, is confirmed to work well after testing on Paragon, SP2, SR2201, VPP500, VPP300, Monte-4, SX-4 and T90. (author)
Speedup predictions on large scientific parallel programs

International Nuclear Information System (INIS)

Williams, E.; Bobrowicz, F.

1985-01-01

How much speedup can we expect for large scientific parallel programs running on supercomputers? For insight into this problem we extend the parallel processing environment currently existing on the Cray X-MP (a shared memory multiprocessor with at most four processors) to a simulated N-processor environment, where N is greater than or equal to 1. Several large scientific parallel programs from Los Alamos National Laboratory were run in this simulated environment, and speedups were predicted. A speedup of 14.4 on 16 processors was measured for one of the three most used codes at the Laboratory.
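The speedup figure quoted above can be put in context with two standard measures: parallel efficiency and the Karp-Flatt serial fraction. The sketch below is not taken from the report; the function names are illustrative, and the extrapolation through Amdahl's law assumes a fixed problem size and a serial fraction that does not change with the processor count.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Fraction of ideal linear speedup that was actually achieved."""
    return speedup / processors


def karp_flatt_serial_fraction(speedup: float, processors: int) -> float:
    """Serial fraction implied by a measured speedup (Karp-Flatt metric)."""
    if processors < 2:
        raise ValueError("the metric needs at least two processors")
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Speedup predicted by Amdahl's law for a fixed problem size."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


if __name__ == "__main__":
    s, p = 14.4, 16  # figures quoted in the abstract
    f = karp_flatt_serial_fraction(s, p)
    print(f"efficiency      = {parallel_efficiency(s, p):.3f}")
    print(f"serial fraction = {f:.4f}")
    # Hypothetical extrapolation; assumes f stays constant as p grows.
    print(f"Amdahl speedup on 64 processors = {amdahl_speedup(f, 64):.1f}")
```

Feeding the implied serial fraction back into Amdahl's law for 16 processors reproduces the quoted speedup, which is a quick consistency check on the two formulas.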
Options for Parallelizing a Planning and Scheduling Algorithm

Science.gov (United States)

Clement, Bradley J.; Estlin, Tara A.; Bornstein, Benjamin D.

2011-01-01

Space missions have a growing interest in putting multi-core processors onboard spacecraft. For many missions, limited processing power significantly slows operations. We investigate how continual planning and scheduling algorithms can exploit multi-core processing and outline different potential design decisions for a parallelized planning architecture. This organization of choices and challenges helps us with an initial design for parallelizing the CASPER planning system for a mesh multi-core processor. This work extends that presented at another workshop with some preliminary results.
Parallel Ada benchmarks for the SVMS

Science.gov (United States)

Collard, Philippe E.

1990-01-01

The use of the parallel processing paradigm to design and develop faster and more reliable computers appears to clearly mark the future of information processing. NASA started the development of such an architecture: the Spaceborne VHSIC Multi-processor System (SVMS). Ada will be one of the languages used to program the SVMS. One of the unique characteristics of Ada is that it supports parallel processing at the language level through the tasking constructs. It is important for the SVMS project team to assess how efficiently the SVMS architecture will be implemented, as well as how efficiently the Ada environment will be ported to the SVMS. AUTOCLASS II, a Bayesian classifier written in Common Lisp, was selected as one of the benchmarks for SVMS configurations. The purpose of the R and D effort was to provide the SVMS project team with a version of AUTOCLASS II, written in Ada, that would make use of Ada tasking constructs as much as possible so as to constitute a suitable benchmark. Additionally, a set of programs was developed that would measure Ada tasking efficiency on parallel architectures as well as determine the critical parameters influencing tasking efficiency. All this was designed to provide the SVMS project team with a set of suitable tools in the development of the SVMS architecture.
Parallel computers and three-dimensional computational electromagnetics

International Nuclear Information System (INIS)

Madsen, N.K.

1994-01-01

The authors have continued to enhance their ability to use new massively parallel processing computers to solve time-domain electromagnetic problems. New vectorization techniques have improved the performance of their code DSI3D by factors of 5 to 15, depending on the computer used. New radiation boundary conditions and far-field transformations now allow the computation of radar cross-section values for complex objects. A new parallel-data extraction code has been developed that allows the extraction of data subsets from large problems, which have been run on parallel computers, for subsequent post-processing on workstations with enhanced graphics capabilities. A new charged-particle-pushing version of DSI3D is under development. Finally, DSI3D has become a focal point for several new Cooperative Research and Development Agreement activities with industrial companies such as Lockheed Advanced Development Company, Varian, Hughes Electron Dynamics Division, General Atomic, and Cray
ISP: an optimal out-of-core image-set processing streaming architecture for parallel heterogeneous systems.

Science.gov (United States)

Ha, Linh Khanh; Krüger, Jens; Dihl Comba, João Luiz; Silva, Cláudio T; Joshi, Sarang

2012-06-01

Image population analysis is the class of statistical methods that plays a central role in understanding the development, evolution, and disease of a population. However, these techniques often require excessive computational power and memory that are compounded with a large number of volumetric inputs. Restricted access to supercomputing power limits its influence in general research and practical applications. In this paper we introduce ISP, an Image-Set Processing streaming framework that harnesses the processing power of commodity heterogeneous CPU/GPU systems and attempts to solve this computational problem. In ISP, we introduce specially designed streaming algorithms and data structures that provide an optimal solution for out-of-core multiimage processing problems both in terms of memory usage and computational efficiency. ISP makes use of the asynchronous execution mechanism supported by parallel heterogeneous systems to efficiently hide the inherent latency of the processing pipeline of out-of-core approaches. Consequently, with computationally intensive problems, the ISP out-of-core solution can achieve the same performance as the in-core solution. We demonstrate the efficiency of the ISP framework on synthetic and real datasets.
Implementation science: a role for parallel dual processing models of reasoning?

Science.gov (United States)

Sladek, Ruth M; Phillips, Paddy A; Bond, Malcolm J

2006-05-25

A better theoretical base for understanding professional behaviour change is needed to support evidence-based changes in medical practice. Traditionally strategies to encourage changes in clinical practices have been guided empirically, without explicit consideration of underlying theoretical rationales for such strategies. This paper considers a theoretical framework for reasoning from within psychology for identifying individual differences in cognitive processing between doctors that could moderate the decision to incorporate new evidence into their clinical decision-making. Parallel dual processing models of reasoning posit two cognitive modes of information processing that are in constant operation as humans reason. One mode has been described as experiential, fast and heuristic; the other as rational, conscious and rule based. Within such models, the uptake of new research evidence can be represented by the latter mode; it is reflective, explicit and intentional. On the other hand, well practiced clinical judgments can be positioned in the experiential mode, being automatic, reflexive and swift. Research suggests that individual differences between people in both cognitive capacity (e.g., intelligence) and cognitive processing (e.g., thinking styles) influence how both reasoning modes interact. This being so, it is proposed that these same differences between doctors may moderate the uptake of new research evidence. Such dispositional characteristics have largely been ignored in research investigating effective strategies in implementing research evidence. Whilst medical decision-making occurs in a complex social environment with multiple influences and decision makers, it remains true that an individual doctor's judgment still retains a key position in terms of diagnostic and treatment decisions for individual patients. This paper argues therefore, that individual differences between doctors in terms of reasoning are important considerations in any
JIT and supply chain management: an information processing perspective

Directory of Open Access Journals (Sweden)

Guilherme Guedes Xavier

1998-06-01

Recently, as part of the restructuring of the global economy, new technologies and new consumer trends have led to changes in markets around the world. As a result of this restructuring, some markets have faced extremely high degrees of competition, which led to the emergence of new concepts in the productive sector. One of the most important was the spread of JIT systems outside Japan in the early 1980s. Currently, another prominent concept, named supply-chain management, is emerging. In this paper, I take an information processing perspective to analyze both and conclude that, although complementary in dealing with uncertainties, they use different information processing strategies. Understanding the concepts and the relationship between them is important to their further development and diffusion among the practitioner and scholarly communities.
Parallel fuzzy connected image segmentation on GPU

OpenAIRE

Zhuge, Ying; Cao, Yong; Udupa, Jayaram K.; Miller, Robert W.

2011-01-01

Purpose: Image segmentation techniques using fuzzy connectedness (FC) principles have shown their effectiveness in segmenting a variety of objects in several large applications. However, one challenge in these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays, commodity graphics hardware provides a highly parallel computing environment. In this paper, the authors present a parallel fuzzy connected image segmentation algorithm impleme...
High performance parallel computers for science

International Nuclear Information System (INIS)

Nash, T.; Areti, H.; Atac, R.; Biel, J.; Cook, A.; Deppe, J.; Edel, M.; Fischler, M.; Gaines, I.; Hance, R.

1989-01-01

This paper reports that Fermilab's Advanced Computer Program (ACP) has been developing cost effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor has been developed for theoretical physics. Each $4000 node is a FORTRAN or C programmable pipelined 20 Mflops (peak), 10 MByte single board computer. These are plugged into a 16 port crossbar switch crate which handles both inter- and intra-crate communication. The crates are connected in a hypercube. Site oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256 node, 5 GFlop, system is under construction.
Parallel object-oriented specification language

NARCIS (Netherlands)

Florescu, O.; Voeten, J.P.M.; Theelen, B.D.; Geilen, M.C.W.; Corporaal, H.; Burns, Alan

2008-01-01

The Parallel Object-Oriented Specification Language (POOSL) is an expressive modelling language for hardware/software systems [10]. It was originally defined in [7] as an object-oriented extension of process algebra CCS [6], supporting (conditional) synchronous message passing between
Bayer image parallel decoding based on GPU

Science.gov (United States)

Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua

2012-11-01

In photoelectrical tracking systems, Bayer images are traditionally decompressed on the CPU. However, this is too slow when the images become large, for example, 2K×2K×16bit. In order to accelerate Bayer image decoding, this paper introduces a parallel speedup method for NVIDIA's Graphics Processing Unit (GPU), which supports the CUDA architecture. The decoding procedure can be divided into three parts: the first is a serial part, the second is a task-parallel part, and the last is a data-parallel part that includes inverse quantization, the inverse discrete wavelet transform (IDWT), and image post-processing. To reduce the execution time, the task-parallel part is optimized with OpenMP techniques. The data-parallel part gains efficiency by executing on the GPU as a CUDA parallel program. The optimization techniques include instruction optimization, shared memory access optimization, coalesced memory access optimization, and texture memory optimization. In particular, the IDWT can be sped up significantly by rewriting the 2D (two-dimensional) serial IDWT as a 1D parallel IDWT. In experiments with a 1K×1K×16bit Bayer image, the data-parallel part is more than 10 times faster than the CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental results show that it achieves a 3 to 5 times speed increase compared to the CPU serial method.
Design of high-performance parallelized gene predictors in MATLAB.

Science.gov (United States)

Rivard, Sylvain Robert; Mailloux, Jean-Gabriel; Beguenane, Rachid; Bui, Hung Tien

2012-04-10

This paper proposes a method of implementing parallel gene prediction algorithms in MATLAB. The proposed designs are based on either Goertzel's algorithm or on FFTs and have been implemented using varying amounts of parallelism on a central processing unit (CPU) and on a graphics processing unit (GPU). Results show that an implementation using a straightforward approach can require over 4.5 h to process 15 million base pairs (bps) whereas a properly designed one could perform the same task in less than five minutes. In the best case, a GPU implementation can yield these results in 57 s. The present work shows how parallelism can be used in MATLAB for gene prediction in very large DNA sequences to produce results that are over 270 times faster than a conventional approach. This is significant as MATLAB is typically overlooked due to its apparent slow processing time even though it offers a convenient environment for bioinformatics. From a practical standpoint, this work proposes two strategies for accelerating genome data processing which rely on different parallelization mechanisms. Using a CPU, the work shows that direct access to the MEX function increases execution speed and that the PARFOR construct should be used in order to take full advantage of the parallelizable Goertzel implementation. When the target is a GPU, the work shows that data needs to be segmented into manageable sizes within the GFOR construct before processing in order to minimize execution time.
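Goertzel's algorithm, named in the abstract above, evaluates the power of a single DFT bin with one multiply-add recurrence per sample, which is why it suits problems where only one frequency matters. The sketch below is in Python rather than MATLAB and is not the authors' code. Applying it to the period-3 frequency of four base-indicator sequences is an assumption about how such predictors typically work, and the names `goertzel_power` and `period3_power` are illustrative.

```python
import math


def goertzel_power(samples, k):
    """Return |X[k]|^2 of the DFT of `samples`, computed with Goertzel's recurrence."""
    n = len(samples)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = 0.0   # s[n-1]
    s_prev2 = 0.0  # s[n-2]
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def period3_power(window):
    """Sum the power at frequency N/3 over the A, C, G and T indicator sequences.

    Assumption: the window length is a multiple of 3, so N/3 is an exact bin.
    """
    n = len(window)
    if n == 0 or n % 3 != 0:
        raise ValueError("window length must be a positive multiple of 3")
    k = n // 3
    total = 0.0
    for base in "ACGT":
        indicator = [1.0 if b == base else 0.0 for b in window]
        total += goertzel_power(indicator, k)
    return total


if __name__ == "__main__":
    print(period3_power("ATG" * 4))  # strongly periodic window
    print(period3_power("A" * 12))   # no period-3 structure
```

Each window is processed independently of every other window, so a loop over windows is the natural place to apply a parallel-for construct of the kind the abstract describes.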
A solution for automatic parallelization of sequential assembly code

Directory of Open Access Journals (Sweden)

Kovačević Đorđe

2013-01-01

Since modern multicore processors can execute existing sequential programs only on a single core, there is a strong need for automatic parallelization of program code. Relying on existing algorithms, this paper describes a new software tool for parallelizing sequential assembly code. The main goal of this paper is to develop a parallelizer that reads sequential assembly code and outputs parallelized code for a MIPS processor with multiple cores. The idea is the following: the parser translates the assembly input file into program objects suitable for further processing. The code is then converted to static single assignment form. Based on the data flow graph, the parallelization algorithm distributes instructions across different cores. Once the sequential code has been parallelized, registers are allocated with a linear allocation algorithm, and the final result is distributed assembly code for each of the cores. In the paper we evaluate the speedup of a matrix multiplication example processed by the parallelizer. The result is almost linear speedup of code execution, which increases with the number of cores. The speedup on two cores is 1.99, while on 16 cores the speedup is 13.88.
Parallel rendering

Science.gov (United States)

Crockett, Thomas W.

1995-01-01

This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.
Parallelizing an electron transport Monte Carlo simulator (MOCASIN 2.0)

International Nuclear Information System (INIS)

Schwetman, H.; Burdick, S.

1988-01-01

Electron transport simulators are tools for studying electrical properties of semiconducting materials and devices. As demands for modeling more complex devices and new materials have emerged, so have demands for more processing power. This paper documents a project to convert an electron transport simulator (MOCASIN 2.0) to a parallel processing environment. In addition to describing the conversion, the paper presents PPL, a parallel programming version of C running on a Sequent multiprocessor system. In timing tests, models that simulated the movement of 2,000 particles for 100 time steps were executed on ten processors, with a parallel efficiency of over 97%

Abstract Level Parallelization of Finite Difference Methods

Directory of Open Access Journals (Sweden)

Edwin Vollebregt

1997-01-01

A formalism is proposed for describing finite difference calculations in an abstract way. The formalism consists of index sets and stencils, for characterizing the structure of sets of data items and interactions between data items (“neighbouring relations”). The formalism provides a means for lifting programming to a more abstract level. This simplifies the tasks of performance analysis and verification of correctness, and opens the way for automatic code generation. The notation is particularly useful in parallelization, for the systematic construction of parallel programs in a process/channel programming paradigm (e.g., message passing). This is important because message passing, unfortunately, still is the only approach that leads to acceptable performance for many more unstructured or irregular problems on parallel computers that have non-uniform memory access times. It will be shown that the use of index sets and stencils greatly simplifies the determination of which data must be exchanged between different computing processes.
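The last claim, that index sets and stencils make it easy to work out which data must be exchanged, can be illustrated with a small one-dimensional sketch. This is not the paper's notation: here an index set is a plain Python collection of integers, a stencil is a tuple of integer offsets, and `halo_exchange` is a hypothetical helper name.

```python
def halo_exchange(partition, stencil):
    """Return {(src, dst): sorted indices that src must send to dst}.

    partition: dict mapping a process rank to the index set it owns.
    stencil:   offsets j - i of the neighbours read when updating index i.
    """
    owner = {i: rank for rank, indices in partition.items() for i in indices}
    messages = {}
    for dst, indices in partition.items():
        for i in indices:
            for offset in stencil:
                j = i + offset
                src = owner.get(j)  # None means j lies outside the global index set
                if src is not None and src != dst:
                    messages.setdefault((src, dst), set()).add(j)
    return {pair: sorted(needed) for pair, needed in messages.items()}


if __name__ == "__main__":
    # Eight grid points split over two processes, three-point stencil.
    partition = {0: range(0, 4), 1: range(4, 8)}
    print(halo_exchange(partition, stencil=(-1, 0, 1)))
```

The same function works for irregular partitions, because the exchange pattern is derived only from set membership and offsets rather than from a hand-written neighbour table.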
Block-Parallel Data Analysis with DIY2

Energy Technology Data Exchange (ETDEWEB)

Morozov, Dmitriy [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Peterka, Tom [Argonne National Lab. (ANL), Argonne, IL (United States)

2017-08-30

DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial, parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.
Parallel computations

CERN Document Server

1982-01-01

Parallel Computations focuses on parallel computation, with emphasis on algorithms used in a variety of numerical and physical applications and for many different types of parallel computers. Topics covered range from vectorization of fast Fourier transforms (FFTs) and of the incomplete Cholesky conjugate gradient (ICCG) algorithm on the Cray-1 to calculation of table lookups and piecewise functions. Single tridiagonal linear systems and vectorized computation of reactive flow are also discussed. Comprised of 13 chapters, this volume begins by classifying parallel computers and describing techn
Vectorization, parallelization and porting of nuclear codes on the VPP500 system (parallelization). Progress report fiscal 1996

Energy Technology Data Exchange (ETDEWEB)

Watanabe, Hideo; Kawai, Wataru; Nemoto, Toshiyuki [Fujitsu Ltd., Tokyo (Japan); and others

1997-12-01

Several computer codes in the nuclear field have been vectorized, parallelized and ported to the FUJITSU VPP500 system at the Center for Promotion of Computational Science and Engineering in the Japan Atomic Energy Research Institute. These results are reported in 3 parts, i.e., the vectorization part, the parallelization part and the porting part. In this report, we describe the parallelization. In this parallelization part, the parallelization of the 2-dimensional relativistic electromagnetic particle code EM2D, the Cylindrical Direct Numerical Simulation code CYLDNS and the molecular dynamics code for simulating radiation damage in diamond crystals DGR is described. In the vectorization part, the vectorization of the two- and three-dimensional discrete ordinates simulation code DORT-TORT, the gas dynamics analysis code FLOWGR and the relativistic Boltzmann-Uehling-Uhlenbeck simulation code RBUU is described. In the porting part, the porting of the reactor safety analysis codes RELAP5/MOD3.2 and RELAP5/MOD3.2.1.2, the nuclear data processing system NJOY and the 2-D multigroup discrete ordinate transport code TWOTRAN-II is described. A survey for the porting of the command-driven interactive data analysis plotting program IPLOT is also described. (author)
Parallel Processing of Big Point Clouds Using Z-Order Partitioning

Science.gov (United States)

Alis, C.; Boehm, J.; Liu, K.

2016-06-01

As laser scanning technology improves and costs are coming down, the amount of point cloud data being generated can be prohibitively difficult and expensive to process on a single machine. This data explosion is not only limited to point cloud data. Voluminous amounts of high-dimensionality and quickly accumulating data, collectively known as Big Data, such as those generated by social media, Internet of Things devices and commercial transactions, are becoming more prevalent as well. New computing paradigms and frameworks are being developed to efficiently handle the processing of Big Data, many of which utilize a compute cluster composed of several commodity grade machines to process chunks of data in parallel. A central concept in many of these frameworks is data locality. By its nature, Big Data is large enough that the entire dataset would not fit on the memory and hard drives of a single node hence replicating the entire dataset to each worker node is impractical. The data must then be partitioned across worker nodes in a manner that minimises data transfer across the network. This is a challenge for point cloud data because there exist different ways to partition data and they may require data transfer. We propose a partitioning based on Z-order which is a form of locality-sensitive hashing. The Z-order or Morton code is computed by dividing each dimension to form a grid then interleaving the binary representation of each dimension. For example, the Z-order code for the grid square with coordinates (x = 1 = 01₂, y = 3 = 11₂) is 1011₂ = 11. The number of points in each partition is controlled by the number of bits per dimension: the more bits, the fewer the points. The number of bits per dimension also controls the level of detail with more bits yielding finer partitioning. We present this partitioning method by implementing it on Apache Spark and investigating how different parameters affect the accuracy and running time of the k nearest neighbour algorithm
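The bit interleaving described above can be sketched in a few lines. The version below is two-dimensional to match the worked example in the abstract, whereas the paper itself targets point clouds on Apache Spark. The bit order (each y bit placed above the matching x bit) is inferred from that example, and `morton_encode_2d` and `partition_points` are illustrative names, not the authors' API.

```python
from collections import defaultdict


def morton_encode_2d(x, y, bits):
    """Interleave the low `bits` bits of x and y, with y in the higher position."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def partition_points(points, lo, hi, bits):
    """Group 2-D points by the Morton code of the grid cell that contains them.

    Assumption: every coordinate lies in the interval [lo, hi].
    """
    cells = 1 << bits
    scale = cells / (hi - lo)
    parts = defaultdict(list)
    for px, py in points:
        gx = min(int((px - lo) * scale), cells - 1)  # clamp px == hi into the last cell
        gy = min(int((py - lo) * scale), cells - 1)
        parts[morton_encode_2d(gx, gy, bits)].append((px, py))
    return dict(parts)


if __name__ == "__main__":
    print(morton_encode_2d(1, 3, bits=2))  # the example from the abstract
    print(partition_points([(0.1, 0.1), (0.9, 0.9), (0.2, 0.1)], 0.0, 1.0, bits=1))
```

Raising `bits` produces more cells and therefore fewer points per partition, which is the trade-off the abstract describes.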
Parallel sorting algorithms

CERN Document Server

Akl, Selim G

1985-01-01

Parallel Sorting Algorithms explains how to use parallel algorithms to sort a sequence of items on a variety of parallel computers. The book reviews the sorting problem, the parallel models of computation, parallel algorithms, and the lower bounds on the parallel sorting problems. The text also presents twenty different algorithms for architectures such as linear arrays, mesh-connected computers, and cube-connected computers. Another example where an algorithm can be applied is on the shared-memory SIMD (single instruction stream multiple data stream) computers in which the whole sequence to be sorted can fit in the
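As an illustration of sorting on a linear array, the sketch below simulates odd-even transposition sort, a well-known algorithm for that architecture. It is offered as a generic example and is not quoted from the book. The simulation is sequential, but the compare-exchange steps within one phase touch disjoint pairs, so on a real linear array each pair would be handled by its own processor at the same time.

```python
def odd_even_transposition_sort(values):
    """Sort by alternating compare-exchange phases on even and odd neighbour pairs."""
    a = list(values)
    n = len(a)
    for phase in range(n):  # n phases are enough to sort n items
        start = phase % 2   # even phases pair (0,1),(2,3)...; odd phases pair (1,2),(3,4)...
        for i in range(start, n - 1, 2):
            # Pairs within one phase are disjoint, so they could run in parallel.
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


if __name__ == "__main__":
    print(odd_even_transposition_sort([5, 1, 4, 2, 3]))
```

With one processor per item the algorithm takes n parallel steps, compared with the roughly n log n comparisons of a good sequential sort.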
Parallel R-matrix computation

International Nuclear Information System (INIS)

Heggarty, J.W.

1999-06-01

For almost thirty years, sequential R-matrix computation has been used by atomic physics research groups, from around the world, to model collision phenomena involving the scattering of electrons or positrons with atomic or molecular targets. As considerable progress has been made in the understanding of fundamental scattering processes, new data, obtained from more complex calculations, is of current interest to experimentalists. Performing such calculations, however, places considerable demands on the computational resources to be provided by the target machine, in terms of both processor speed and memory requirement. Indeed, in some instances the computational requirements are so great that the proposed R-matrix calculations are intractable, even when utilising contemporary classic supercomputers. Historically, increases in the computational requirements of R-matrix computation were accommodated by porting the problem codes to a more powerful classic supercomputer. Although this approach has been successful in the past, it is no longer considered to be a satisfactory solution due to the limitations of current (and future) Von Neumann machines. As a consequence, there has been considerable interest in the high performance multicomputers that have emerged over the last decade, which appear to offer the computational resources required by contemporary R-matrix research. Unfortunately, developing codes for these machines is not as simple a task as it was to develop codes for successive classic supercomputers. The difficulty arises from the considerable differences in the computing models that exist between the two types of machine and results in the programming of multicomputers being widely acknowledged as a difficult, time consuming and error-prone task. Nevertheless, unless parallel R-matrix computation is realised, important theoretical and experimental atomic physics research will continue to be hindered. This thesis describes work that was undertaken in
A Massively Parallel Face Recognition System

Directory of Open Access Journals (Sweden)

Lahdenoja Olli

2007-01-01

We present methods for processing the LBPs (local binary patterns) with massively parallel hardware, especially with the CNN-UM (cellular nonlinear network-universal machine). In particular, we present a framework for implementing a massively parallel face recognition system, including a dedicated highly accurate algorithm suitable for various types of platforms (e.g., CNN-UM and digital FPGA). We study in detail a dedicated mixed-mode implementation of the algorithm and estimate its implementation cost in the view of its performance and accuracy restrictions.
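A local binary pattern is computed independently at every pixel, which is what makes it a natural fit for massively parallel hardware. The sketch below shows the basic 3×3 operator in plain Python. It does not reproduce the CNN-UM or FPGA implementation from the paper, and the neighbour ordering and the greater-or-equal threshold convention are assumptions chosen for illustration.

```python
# Neighbour offsets (row, col), clockwise from the top-left; this order is an assumption.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, r, c):
    """8-bit LBP code of the interior pixel (r, c) of a 2-D list of intensities."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_image(image):
    """LBP codes for every interior pixel; each code depends only on a 3x3 patch."""
    rows, cols = len(image), len(image[0])
    return [[lbp_code(image, r, c) for c in range(1, cols - 1)]
            for r in range(1, rows - 1)]


if __name__ == "__main__":
    patch = [[9, 0, 0],
             [0, 5, 0],
             [0, 0, 0]]
    print(lbp_image(patch))
```

A face descriptor is then usually built from histograms of these codes over image regions, a step that is omitted here.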
A Massively Parallel Face Recognition System

Directory of Open Access Journals (Sweden)

Ari Paasio

2006-12-01

We present methods for processing the LBPs (local binary patterns) with massively parallel hardware, especially with the CNN-UM (cellular nonlinear network-universal machine). In particular, we present a framework for implementing a massively parallel face recognition system, including a dedicated highly accurate algorithm suitable for various types of platforms (e.g., CNN-UM and digital FPGA). We study in detail a dedicated mixed-mode implementation of the algorithm and estimate its implementation cost in the view of its performance and accuracy restrictions.
A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor

Science.gov (United States)

Rao, Hariprasad Nannapaneni

1989-01-01

The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.
Performing a local reduction operation on a parallel computer

Science.gov (United States)

Blocksome, Michael A.; Faraj, Daniel A.

2012-12-11

A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
Demand-side management process evaluations - the management perspective

International Nuclear Information System (INIS)

Perrault, G.A.; Barrett, L.B.

1993-01-01

A demand-side management (DSM) process evaluation is a qualitative, expert assessment of how a utility marketing program is being conducted. It reviews the efficiency and effectiveness with which a utility plans, manages, executes, and monitors the delivery of DSM programs to its marketplace. Process evaluations, which includes load impact, customer satisfaction and cost-effectiveness analysis, are becoming an increasingly significant component. The process evaluation focus is on the program planning and delivery process as opposed to the energy impacts resulting from the specific measures or products of the program. Because of this process-oriented focus, such evaluations can identify important opportunities for improving the cost-effectiveness of a program without significantly changing product lines. The evaluation may identify administrative or delivery process improvements. In addition, the evaluation may identify ways of improving the degree to which the customer is satisfied with the program or the utility. Since process evaluations are usually conducted as part of a utility's mandated DSM measurement and evaluation plan, they tend to focus mainly on the stated needs of the regulator as opposed to company management. This can be a problem. Although the regulatory perspective is important, in an increasingly competitive business environment, utilities must not overlook management's business and operational needs for specific information regarding DSM program planning, control, execution, and evaluation. This paper discusses some of the conflicts that exist between the regulator's and management's needs for DSM program evaluation results and presents some approaches for assuring that both needs are met. It is organized to first discuss the scope of a process evaluation, then the evaluation issues, the management concerns, and finally reporting of results.
Decreasing Data Analytics Time: Hybrid Architecture MapReduce-Massive Parallel Processing for a Smart Grid

Directory of Open Access Journals (Sweden)

Abdeslam Mehenni

2017-03-01

As our populations grow in a world of limited resources, enterprises seek ways to lighten our load on the planet. The idea of modifying consumer behavior appears as a foundation for smart grids. Enterprises demonstrate the value available from deep analysis of electricity consumption histories, consumers' messages, outage alerts, etc. Enterprises mine massive structured and unstructured data. In a nutshell, smart grids result in a flood of data that needs to be analyzed in order to better adjust to demand and give customers more ability to delve into their power consumption. Simply put, smart grids will increasingly have a flexible data warehouse attached to them. The key driver for the adoption of data management strategies is clearly the need to handle and analyze the large amounts of information utilities are now faced with. New approaches to data integration are gaining momentum; Hadoop is in fact now being used by utilities to help manage the huge growth in data whilst maintaining coherence of the data warehouse. In this paper we define a new Meter Data Management System (MDMS) architecture repository that differs from three leading MDMSs, in which we use the MapReduce programming model for ETL and a parallel DBMS for query statements (Massive Parallel Processing, MPP).
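To make the MapReduce half of this architecture concrete, here is a minimal single-process sketch of a map, shuffle, and reduce pass that totals consumption per meter. The record layout, the meter identifiers, and all function names are hypothetical; the paper's actual ETL jobs and Hadoop configuration are not described in the abstract.

```python
from collections import defaultdict


def map_reading(record):
    """Map step: emit a (meter_id, kWh) pair from one raw reading."""
    meter_id, kwh = record
    return meter_id, float(kwh)


def shuffle(pairs):
    """Group mapped values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_total(values):
    """Reduce step: total consumption for one meter."""
    return sum(values)


def total_consumption(records):
    grouped = shuffle(map(map_reading, records))
    return {meter: reduce_total(vals) for meter, vals in grouped.items()}


# Hypothetical raw feed: (meter id, kWh as text).
readings = [("m1", "1.5"), ("m2", "0.5"), ("m1", "2.0")]
totals = total_consumption(readings)
```

In a real cluster the map calls and the per-key reduce calls would run on different nodes; only the shuffle requires moving data between them.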
Acoustic simulation in architecture with parallel algorithm

Science.gov (United States)

Li, Xiaohong; Zhang, Xinrong; Li, Dan

2004-03-01

To address the complexity of architectural environments and the need for real-time simulation of architectural acoustics, a parallel radiosity algorithm was developed. The distribution of sound energy in a scene is solved with this method. The impulse responses between sources and receivers in each frequency segment, which are calculated by multiple processes, are then combined into the whole frequency response. The numerical experiment shows that parallel computation can improve the efficiency of acoustic simulation for complex scenes.
Hybrid parallel computing architecture for multiview phase shifting

Science.gov (United States)

Zhong, Kai; Li, Zhongwei; Zhou, Xiaohui; Shi, Yusheng; Wang, Congjun

2014-11-01

The multiview phase-shifting method shows powerful capability in achieving high-resolution three-dimensional (3-D) shape measurement. Unfortunately, this ability results in very high computation costs, and 3-D computations have to be processed offline. To realize real-time 3-D shape measurement, a hybrid parallel computing architecture is proposed for multiview phase shifting. In this architecture, the central processing unit can co-operate with the graphics processing unit (GPU) to achieve hybrid parallel computing. The high computation cost procedures, including lens distortion rectification, phase computation, correspondence, and 3-D reconstruction, are implemented on the GPU, and a three-layer kernel function model is designed to simultaneously realize coarse-grained and fine-grained parallel computing. Experimental results verify that the developed system can perform 50 fps (frames per second) real-time 3-D measurement with 260 K 3-D points per frame. A speedup of up to 180 times is obtained for the proposed technique using an NVIDIA GT560Ti graphics card rather than a sequential C implementation on a 3.4 GHz Intel Core i7 3770.
Parallel MR imaging.

Science.gov (United States)

Deshmane, Anagha; Gulani, Vikas; Griswold, Mark A; Seiberlich, Nicole

2012-07-01

Parallel imaging is a robust method for accelerating the acquisition of magnetic resonance imaging (MRI) data, and has made possible many new applications of MR imaging. Parallel imaging works by acquiring a reduced amount of k-space data with an array of receiver coils. These undersampled data can be acquired more quickly, but the undersampling leads to aliased images. One of several parallel imaging algorithms can then be used to reconstruct artifact-free images from either the aliased images (SENSE-type reconstruction) or from the undersampled data (GRAPPA-type reconstruction). The advantages of parallel imaging in a clinical setting include faster image acquisition, which can be used, for instance, to shorten breath-hold times resulting in fewer motion-corrupted examinations. In this article the basic concepts behind parallel imaging are introduced. The relationship between undersampling and aliasing is discussed and two commonly used parallel imaging methods, SENSE and GRAPPA, are explained in detail. Examples of artifacts arising from parallel imaging are shown and ways to detect and mitigate these artifacts are described. Finally, several current applications of parallel imaging are presented and recent advancements and promising research in parallel imaging are briefly reviewed. Copyright © 2012 Wiley Periodicals, Inc.
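The link between undersampling and aliasing mentioned in this abstract can be checked on a toy one-dimensional example. The sketch below is an assumption-laden illustration (a hand-written DFT, a made-up 8-sample "object", acceleration factor 2) and does not model receiver coils: keeping every second k-space sample and reconstructing at half length folds the two halves of the object on top of each other.

```python
import cmath


def dft(x):
    """Discrete Fourier transform of a sequence (toy k-space acquisition)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]


def idft(X):
    """Inverse discrete Fourier transform (toy image reconstruction)."""
    n = len(X)
    return [sum(X[m] * cmath.exp(2j * cmath.pi * m * k / n) for m in range(n)) / n
            for k in range(n)]


signal = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 5.0]  # made-up 1-D object
kspace = dft(signal)
undersampled = kspace[::2]            # keep every second k-space sample
aliased = [v.real for v in idft(undersampled)]

# Folding: each reconstructed sample is the sum of two object samples
# that lie half a field of view apart.
half = len(signal) // 2
folded = [signal[k] + signal[k + half] for k in range(half)]
```

Methods such as SENSE use the differing sensitivities of the receiver coils to separate the superimposed samples again; that unfolding step is not shown here.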
Comparison of multihardware parallel implementations for a phase unwrapping algorithm

Science.gov (United States)

Hernandez-Lopez, Francisco Javier; Rivera, Mariano; Salazar-Garibay, Adan; Legarda-Sáenz, Ricardo

2018-04-01

Phase unwrapping is an important problem in the areas of optical metrology, synthetic aperture radar (SAR) image analysis, and magnetic resonance imaging (MRI) analysis. These images are becoming larger in size and, particularly, the availability and need for processing of SAR and MRI data have increased significantly with the acquisition of remote sensing data and the popularization of magnetic resonators in clinical diagnosis. Therefore, it is important to develop faster and more accurate phase unwrapping algorithms. We propose a parallel multigrid algorithm for a phase unwrapping method named accumulation of residual maps, which builds on a serial algorithm that consists of the minimization of a cost function, achieved by means of a serial Gauss-Seidel type algorithm. Our algorithm also optimizes the original cost function but, unlike the original work, it is a parallel Jacobi-class algorithm with alternated minimizations. This strategy is known as the chessboard type, where red pixels can be updated in parallel in the same iteration since they are independent. Similarly, black pixels can be updated in parallel in an alternating iteration. We present parallel implementations of our algorithm for different parallel multicore architectures such as multicore CPU, Xeon Phi coprocessor, and Nvidia graphics processing unit. In all cases, our parallel algorithm outperforms the original serial version. In addition, we present a detailed performance comparison of the developed parallel versions.
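The chessboard (red-black) update order described above can be sketched on a much simpler problem. The code below applies it to a plain 4-neighbour averaging update with fixed boundary values; this stand-in update is an assumption of the sketch and is not the accumulation-of-residual-maps cost function from the paper.

```python
def checkerboard_sweep(grid):
    """One red sweep followed by one black sweep of a 4-neighbour average, in place."""
    rows, cols = len(grid), len(grid[0])
    for colour in (0, 1):  # 0 = "red" pixels, 1 = "black" pixels
        # Pixels of one colour read only pixels of the other colour, so all
        # updates inside this colour pass are independent of each other.
        for r in range(1, rows - 1):
            for c in range(1, cols - 1):
                if (r + c) % 2 == colour:
                    grid[r][c] = 0.25 * (grid[r - 1][c] + grid[r + 1][c]
                                         + grid[r][c - 1] + grid[r][c + 1])
    return grid


def relax(grid, sweeps=200):
    """Repeat checkerboard sweeps a fixed number of times."""
    for _ in range(sweeps):
        checkerboard_sweep(grid)
    return grid
```

On a multicore CPU or GPU, the two inner loops of each colour pass are what would be distributed across threads, with a barrier between the red and black passes.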
Frontiers of massively parallel scientific computation

International Nuclear Information System (INIS)

Fischer, J.R.

1987-07-01

Practical applications using massively parallel computer hardware first appeared during the 1980s. Their development was motivated by the need for computing power orders of magnitude beyond that available today for tasks such as numerical simulation of complex physical and biological processes, generation of interactive visual displays, satellite image analysis, and knowledge based systems. Representative of the first generation of this new class of computers is the Massively Parallel Processor (MPP). A team of scientists was provided the opportunity to test and implement their algorithms on the MPP. The first results are presented. The research spans a broad variety of applications including Earth sciences, physics, signal and image processing, computer science, and graphics. The performance of the MPP was very good. Results obtained using the Connection Machine and the Distributed Array Processor (DAP) are presented
High-Performance Psychometrics: The Parallel-E Parallel-M Algorithm for Generalized Latent Variable Models. Research Report. ETS RR-16-34

Science.gov (United States)

von Davier, Matthias

2016-01-01

This report presents results on a parallel implementation of the expectation-maximization (EM) algorithm for multidimensional latent variable models. The developments presented here are based on code that parallelizes both the E step and the M step of the parallel-E parallel-M algorithm. Examples presented in this report include item response…
Parallel algorithms for interactive manipulation of digital terrain models

Science.gov (United States)

Davis, E. W.; Mcallister, D. F.; Nagaraj, V.

1988-01-01

Interactive three-dimensional graphics applications, such as terrain data representation and manipulation, require extensive arithmetic processing. Massively parallel machines are attractive for this application since they offer high computational rates, and grid connected architectures provide a natural mapping for grid based terrain models. Presented here are algorithms for data movement on the massive parallel processor (MPP) in support of pan and zoom functions over large data grids. It is an extension of earlier work that demonstrated real-time performance of graphics functions on grids that were equal in size to the physical dimensions of the MPP. When the dimensions of a data grid exceed the processing array size, data is packed in the array memory. Windows of the total data grid are interactively selected for processing. Movement of packed data is needed to distribute items across the array for efficient parallel processing. Execution time for data movement was found to exceed that for arithmetic aspects of graphics functions. Performance figures are given for routines written in MPP Pascal.

Distributed parallel computing in stochastic modeling of groundwater systems.

Science.gov (United States)

Dong, Yanhui; Li, Guomin; Xu, Haizhen

2013-03-01

Stochastic modeling is a rapidly evolving, popular approach to the study of the uncertainty and heterogeneity of groundwater systems. However, the use of Monte Carlo-type simulations to solve practical groundwater problems often encounters computational bottlenecks that hinder the acquisition of meaningful results. To improve the computational efficiency, a system that combines stochastic model generation with MODFLOW-related programs and distributed parallel processing is investigated. The distributed computing framework, called the Java Parallel Processing Framework, is integrated into the system to allow the batch processing of stochastic models in distributed and parallel systems. As an example, the system is applied to the stochastic delineation of well capture zones in the Pinggu Basin in Beijing. Through the use of 50 processing threads on a cluster with 10 multicore nodes, the execution times of 500 realizations are reduced to 3% compared with those of a serial execution. Through this application, the system demonstrates its potential in solving difficult computational problems in practical stochastic modeling. © 2012, The Author(s). Groundwater © 2012, National Ground Water Association.
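The figures quoted in this abstract imply a speedup and a parallel efficiency that are easy to work out. The small helper below is a sketch that assumes the reported 3% is the ratio of parallel to serial wall-clock time and that all 50 threads were in use; the function and variable names are illustrative.

```python
def speedup_and_efficiency(time_ratio, workers):
    """Speedup = T_serial / T_parallel; efficiency = speedup / workers."""
    speedup = 1.0 / time_ratio
    return speedup, speedup / workers


# Parallel time is 3% of serial time, using 50 processing threads.
speedup, efficiency = speedup_and_efficiency(0.03, 50)
```

Under those assumptions the run is roughly 33 times faster than serial execution, which corresponds to a parallel efficiency of about 67% on 50 threads.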
Parallel processor programs in the Federal Government

Science.gov (United States)

Schneck, P. B.; Austin, D.; Squires, S. L.; Lehmann, J.; Mizell, D.; Wallgren, K.

1985-01-01

In 1982, a report dealing with the nation's research needs in high-speed computing called for increased access to supercomputing resources for the research community, research in computational mathematics, and increased research in the technology base needed for the next generation of supercomputers. Since that time a number of programs addressing future generations of computers, particularly parallel processors, have been started by U.S. government agencies. The present paper provides a description of the largest government programs in parallel processing. Established in fiscal year 1985 by the Institute for Defense Analyses for the National Security Agency, the Supercomputing Research Center will pursue research to advance the state of the art in supercomputing. Attention is also given to the DOE applied mathematical sciences research program, the NYU Ultracomputer project, the DARPA multiprocessor system architectures program, NSF research on multiprocessor systems, ONR activities in parallel computing, and NASA parallel processor projects.
IMPLEMENTATION OF SERIAL AND PARALLEL BUBBLE SORT ON FPGA

Directory of Open Access Journals (Sweden)

Dwi Marhaendro Jati Purnomo

2016-06-01

Sorting is a common process in computing, used in many fields from research to industry. Many sorting algorithms exist today; one of the simplest, yet powerful, is bubble sort. In this study, bubble sort is implemented on an FPGA using both serial and parallel approaches. The serial and parallel implementations are then compared in terms of memory, execution time, and utility, which comprises slices and LUTs. The experiments show that serial bubble sort requires less memory and utility than parallel bubble sort, while parallel bubble sort runs faster than serial bubble sort.
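One common way to parallelize bubble sort is odd-even transposition sort, sketched below in software. The abstract does not say which parallel formulation the authors implemented on the FPGA, so this is offered only as a representative example, and the function name is an assumption.

```python
def odd_even_transposition_sort(values):
    """Sort a list using alternating even and odd compare-exchange phases."""
    data = list(values)
    n = len(data)
    for phase in range(n):  # n phases are enough to sort n items
        start = phase % 2   # even phase: pairs (0,1),(2,3)...; odd phase: (1,2),(3,4)...
        # The pairs within one phase are disjoint, so hardware could
        # compare-exchange all of them at the same time.
        for i in range(start, n - 1, 2):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data
```

The trade-off matches the one reported in the abstract: each phase needs roughly n/2 comparators working at once (more hardware), but the number of sequential steps drops from about n squared to n.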
Combined spatial/angular domain decomposition SN algorithms for shared memory parallel machines

International Nuclear Information System (INIS)

Hunter, M.A.; Haghighat, A.

1993-01-01

Several parallel processing algorithms on the basis of spatial and angular domain decomposition methods are developed and incorporated into a two-dimensional discrete ordinates transport theory code. These algorithms divide the spatial and angular domains into independent subdomains so that the flux calculations within each subdomain can be processed simultaneously. Two spatial parallel algorithms (Block-Jacobi, red-black), one angular parallel algorithm (η-level), and their combinations are implemented on an eight processor CRAY Y-MP. Parallel performances of the algorithms are measured using a series of fixed source RZ geometry problems. Some of the results are also compared with those executed on an IBM 3090/600J machine. (orig.)
South Korea's aid to North Korea's transformation process: Social market perspective

OpenAIRE

Jang, Tae-seok

2007-01-01

South Korea's aid to North Korea deviates from the international trend in development aid. As a stylized fact, we find that South Korea's policy of maintaining an economic relationship with North Korea was inconsistent and ineffective during the last decade. Since South Korea played a major role in promoting the economic transformation process in North Korea, perspectives from social market economy, open economy, stabilization, and investment in infrastructure provide insights in dealing with developmen...
Implementation science: a role for parallel dual processing models of reasoning?

Directory of Open Access Journals (Sweden)

Phillips Paddy A

2006-05-01

Background: A better theoretical base for understanding professional behaviour change is needed to support evidence-based changes in medical practice. Traditionally, strategies to encourage changes in clinical practices have been guided empirically, without explicit consideration of underlying theoretical rationales for such strategies. This paper considers a theoretical framework for reasoning from within psychology for identifying individual differences in cognitive processing between doctors that could moderate the decision to incorporate new evidence into their clinical decision-making. Discussion: Parallel dual processing models of reasoning posit two cognitive modes of information processing that are in constant operation as humans reason. One mode has been described as experiential, fast and heuristic; the other as rational, conscious and rule based. Within such models, the uptake of new research evidence can be represented by the latter mode; it is reflective, explicit and intentional. On the other hand, well-practiced clinical judgments can be positioned in the experiential mode, being automatic, reflexive and swift. Research suggests that individual differences between people in both cognitive capacity (e.g., intelligence) and cognitive processing (e.g., thinking styles) influence how both reasoning modes interact. This being so, it is proposed that these same differences between doctors may moderate the uptake of new research evidence. Such dispositional characteristics have largely been ignored in research investigating effective strategies in implementing research evidence. Whilst medical decision-making occurs in a complex social environment with multiple influences and decision makers, it remains true that an individual doctor's judgment still retains a key position in terms of diagnostic and treatment decisions for individual patients. This paper argues therefore, that individual differences between doctors in terms of
Curious parallels and curious connections--phylogenetic thinking in biology and historical linguistics.

Science.gov (United States)

Atkinson, Quentin D; Gray, Russell D

2005-08-01

In The Descent of Man (1871), Darwin observed "curious parallels" between the processes of biological and linguistic evolution. These parallels mean that evolutionary biologists and historical linguists seek answers to similar questions and face similar problems. As a result, the theory and methodology of the two disciplines have evolved in remarkably similar ways. In addition to Darwin's curious parallels of process, there are a number of equally curious parallels and connections between the development of methods in biology and historical linguistics. Here we briefly review the parallels between biological and linguistic evolution and contrast the historical development of phylogenetic methods in the two disciplines. We then look at a number of recent studies that have applied phylogenetic methods to language data and outline some current problems shared by the two fields.
MEDUSA - An overset grid flow solver for network-based parallel computer systems

Science.gov (United States)

Smith, Merritt H.; Pallis, Jani M.

1993-01-01

Continuing improvement in processing speed has made it feasible to solve the Reynolds-Averaged Navier-Stokes equations for simple three-dimensional flows on advanced workstations. Combining multiple workstations into a network-based heterogeneous parallel computer allows the application of programming principles learned on MIMD (Multiple Instruction Multiple Data) distributed memory parallel computers to the solution of larger problems. An overset-grid flow solution code has been developed which uses a cluster of workstations as a network-based parallel computer. Inter-process communication is provided by the Parallel Virtual Machine (PVM) software. Solution speed equivalent to one-third of a Cray-YMP processor has been achieved from a cluster of nine commonly used engineering workstation processors. Load imbalance and communication overhead are the principal impediments to parallel efficiency in this application.
User-friendly parallelization of GAUDI applications with Python

International Nuclear Information System (INIS)

Mato, Pere; Smith, Eoin

2010-01-01

GAUDI is a software framework in C++ used to build event data processing applications using a set of standard components with well-defined interfaces. Simulation, high-level trigger, reconstruction, and analysis programs used by several experiments are developed using GAUDI. These applications can be configured and driven by simple Python scripts. Given that a considerable amount of existing software has been developed using serial methodology, and has existed in some cases for many years, implementation of parallelization techniques at the framework level may offer a way of exploiting current multi-core technologies to maximize performance and reduce latencies without re-writing thousands/millions of lines of code. In the solution we have developed, the parallelization techniques are introduced to the high-level Python scripts which configure and drive the applications, such that the core C++ application code requires no modification, and end users need make only minimal changes to their scripts. The developed solution leverages existing generic Python modules that support parallel processing. Naturally, the parallel version of a given program should produce results consistent with its serial execution. The evaluation of several prototypes incorporating various parallelization techniques is presented and discussed.
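The idea of adding parallelism at the script level, while leaving the per-event code untouched, can be sketched with a generic standard-library pool. Everything here is hypothetical: `process_event` merely stands in for the unchanged application code, no GAUDI API is used, and a thread pool is chosen only to keep the sketch portable (for CPU-bound work a process pool, which offers the same `map` interface, would be the more usual choice).

```python
from concurrent.futures import ThreadPoolExecutor


def process_event(event):
    """Hypothetical stand-in for the unchanged per-event processing code."""
    return sum(hit * hit for hit in event)


def run_serial(events):
    """Reference serial driver loop."""
    return [process_event(e) for e in events]


def run_parallel(events, workers=4):
    """Parallel driver: only this script-level loop changes."""
    # Executor.map returns results in input order, so the output lines up
    # with the serial run regardless of which worker finishes first.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_event, events))


events = [[1, 2, 3], [4, 5], [6], []]  # toy events as lists of hit values
serial_results = run_serial(events)
parallel_results = run_parallel(events)
```

Comparing `parallel_results` with `serial_results` is a small-scale version of the consistency requirement stated in the abstract.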
User-friendly parallelization of GAUDI applications with Python

Energy Technology Data Exchange (ETDEWEB)

Mato, Pere; Smith, Eoin, E-mail: pere.mato@cern.c [PH Department, CERN, 1211 Geneva 23 (Switzerland)

2010-04-01

GAUDI is a software framework in C++ used to build event data processing applications using a set of standard components with well-defined interfaces. Simulation, high-level trigger, reconstruction, and analysis programs used by several experiments are developed using GAUDI. These applications can be configured and driven by simple Python scripts. Given that a considerable amount of existing software has been developed using serial methodology, and has existed in some cases for many years, implementation of parallelization techniques at the framework level may offer a way of exploiting current multi-core technologies to maximize performance and reduce latencies without re-writing thousands/millions of lines of code. In the solution we have developed, the parallelization techniques are introduced to the high-level Python scripts which configure and drive the applications, such that the core C++ application code requires no modification, and end users need make only minimal changes to their scripts. The developed solution leverages existing generic Python modules that support parallel processing. Naturally, the parallel version of a given program should produce results consistent with its serial execution. The evaluation of several prototypes incorporating various parallelization techniques is presented and discussed.
Parallelization Experience with Four Canonical Econometric Models Using ParMitISEM

Directory of Open Access Journals (Sweden)

Nalan Baştürk

2016-03-01

This paper presents the parallel computing implementation of the MitISEM algorithm, labeled Parallel MitISEM. The basic MitISEM algorithm provides an automatic and flexible method to approximate a non-elliptical target density using adaptive mixtures of Student-t densities, where only a kernel of the target density is required. The approximation can be used as a candidate density in Importance Sampling or Metropolis-Hastings methods for Bayesian inference on model parameters and probabilities. We present and discuss four canonical econometric models using a Graphics Processing Unit version and a multi-core Central Processing Unit version of the MitISEM algorithm. The results show that the parallelization of the MitISEM algorithm on Graphics Processing Units and multi-core Central Processing Units is straightforward and fast to program using MATLAB. Moreover, the speed performance of the Graphics Processing Unit version is much higher than that of the Central Processing Unit version.
Integrated Task And Data Parallel Programming: Language Design

Science.gov (United States)

Grimshaw, Andrew S.; West, Emily A.

1998-01-01

with Andrew Grimshaw and Adam Ferrari to write a book chapter which will be included in Parallel Processing in C++ edited by Gregory Wilson. I also finished two courses, Compilers and Advanced Compilers, in 1995. These courses complete my class requirements at the University of Virginia. I have only my dissertation research and defense to complete.
Duality-based algorithms for scheduling on unrelated parallel machines

NARCIS (Netherlands)

van de Velde, S.L.

1993-01-01

We consider the following parallel machine scheduling problem. Each of n independent jobs has to be scheduled on one of m unrelated parallel machines. The processing of job J_j on machine M_i requires an uninterrupted period of positive length p_ij. The objective is to find an assignment of
Parallelization of ITOUGH2 using PVM

International Nuclear Information System (INIS)

Finsterle, Stefan

1998-01-01

ITOUGH2 inversions are computationally intensive because the forward problem must be solved many times to evaluate the objective function for different parameter combinations or to numerically calculate sensitivity coefficients. Most of these forward runs are independent of each other and can therefore be performed in parallel. Message passing based on the Parallel Virtual Machine (PVM) system has been implemented in ITOUGH2 to enable parallel processing of ITOUGH2 jobs on a heterogeneous network of Unix workstations. This report describes the PVM system and its implementation in ITOUGH2. Instructions are given for installing PVM, compiling ITOUGH2-PVM for use on a workstation cluster, preparing an ITOUGH2 input file under PVM, and executing an ITOUGH2-PVM application. Examples are discussed, demonstrating the use of ITOUGH2-PVM.
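The reason these forward runs parallelize so easily can be seen in a small finite-difference sketch. The toy `forward_model`, the forward-difference formula, and the step-size rule below are assumptions made for illustration; they are not ITOUGH2's actual model or perturbation scheme, and no PVM calls are shown.

```python
def forward_model(params):
    """Hypothetical stand-in for one forward simulation returning a scalar."""
    a, b = params
    return a * a + 3.0 * b


def finite_difference_sensitivities(model, params, rel_step=1e-6):
    """Approximate d(output)/d(param_i) with one perturbed run per parameter."""
    base = model(params)
    jobs = []
    for i, value in enumerate(params):
        step = rel_step * max(abs(value), 1.0)
        shifted = list(params)
        shifted[i] = value + step
        jobs.append((shifted, step))
    # Each perturbed run needs only its own parameter set, so this list of
    # runs is the part that could be farmed out to separate workstations.
    outputs = [model(shifted) for shifted, _ in jobs]
    return [(out - base) / step for out, (_, step) in zip(outputs, jobs)]


sens = finite_difference_sensitivities(forward_model, [2.0, 1.0])
```

With p parameters this needs p + 1 forward runs, and all p perturbed runs can proceed at the same time once the parameter sets have been sent out.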
Industry and government perspectives on First Nations' participation in the British Columbia environmental assessment process

International Nuclear Information System (INIS)

Booth, Annie L.; Skelton, Norm W.

2011-01-01

Research was conducted with West Moberly First Nations, Halfway First Nation and the Treaty 8 Tribal Association (located in northeastern British Columbia, Canada) on effective engagement in environmental assessment processes. As part of this research, we examined the perspectives of a subset of resource industry proponents and their consultants, as well as staff from the British Columbia Environmental Assessment Office on their experiences with the requirement to consult with Canada's indigenous peoples. Research into the perspectives of industry proponents and consultants is almost non-existent, yet industry and governments are key participants within environmental assessments. This research found that industry proponents were disenfranchised by the British Columbia environmental assessment process and its mechanisms for consulting with First Nations, and that they sought changes to that process. Their concerns and their implications are documented and some recommendations are offered for addressing those concerns. Understanding industry and government views on First Nations engagement could suggest not only potential improvements in EA processes that facilitate all parties but provide common grounds for mutually engaging to resolve challenges.
Circuit and bond polytopes on series–parallel graphs

OpenAIRE

Borne, Sylvie; Fouilhoux, Pierre; Grappe, Roland; Lacroix, Mathieu; Pesneau, Pierre

2015-01-01

In this paper, we describe the circuit polytope on series–parallel graphs. We first show the existence of a compact extended formulation. Though not being explicit, its construction process helps us to inductively provide the description in the original space. As a consequence, using the link between bonds and circuits in planar graphs, we also describe the bond polytope on series–parallel graphs.
Research on Gear Shifting Process without Disengaging Clutch for a Parallel Hybrid Electric Vehicle Equipped with AMT

Directory of Open Access Journals (Sweden)

Hui-Long Yu

2014-01-01

Dynamic models of a single-shaft parallel hybrid electric vehicle (HEV) equipped with automated mechanical transmission (AMT) are described for the different working stages of a gear shifting process without disengaging the clutch. Parameters affecting the gear shifting time, component life, and gear shifting jerk in different transient states during a gear shifting process are analyzed in depth. Mathematical models that account for the detailed synchronizer working process, and that can explain gear shifting failure, long gear shifting times, and frequent synchronizer failure in HEVs, are derived. A dynamic coordinated control strategy for the engine, motor, and actuators in different transient states, considering the detailed working stages of the synchronizer during gear shifting in an HEV, is proposed, for the first time according to the state-of-the-art references. Bench test and real road test results show that the proposed control strategy significantly improves the gear shifting quality in all its evaluation indexes.
A SPECT reconstruction method for extending parallel to non-parallel geometries

International Nuclear Information System (INIS)

Wen Junhai; Liang Zhengrong

2010-01-01

Due to its simplicity, parallel-beam geometry is usually assumed for the development of image reconstruction algorithms. The established reconstruction methodologies are then extended to fan-beam, cone-beam and other non-parallel geometries for practical application. This situation occurs for quantitative SPECT (single photon emission computed tomography) imaging in inverting the attenuated Radon transform. Novikov reported an explicit parallel-beam formula for the inversion of the attenuated Radon transform in 2000. Thereafter, a formula for fan-beam geometry was reported by Bukhgeim and Kazantsev (2002 Preprint N. 99 Sobolev Institute of Mathematics). At the same time, we presented a formula for varying focal-length fan-beam geometry. Sometimes, the reconstruction formula is so implicit that we cannot obtain the explicit reconstruction formula in the non-parallel geometries. In this work, we propose a unified reconstruction framework for extending parallel-beam geometry to any non-parallel geometry using ray-driven techniques. Studies by computer simulations demonstrated the accuracy of the presented unified reconstruction framework for extending parallel-beam to non-parallel geometries in inverting the attenuated Radon transform.
A review of advanced small-scale parallel bioreactor technology for accelerated process development: current state and future need.

Science.gov (United States)

Bareither, Rachel; Pollard, David

2011-01-01

The pharmaceutical and biotech industries face continued pressure to reduce development costs and accelerate process development. This challenge occurs alongside the need for increased upstream experimentation to support quality by design initiatives and the pursuit of predictive models from systems biology. A small scale system enabling multiple reactions in parallel (n ≥ 20), with automated sampling and integrated to purification, would provide significant improvement (four to fivefold) to development timelines. State of the art attempts to pursue high throughput process development include shake flasks, microfluidic reactors, microtiter plates and small-scale stirred reactors. The limitations of these systems are compared to desired criteria to mimic large scale commercial processes. The comparison shows that significant technological improvement is still required to provide automated solutions that can speed upstream process development. Copyright © 2010 American Institute of Chemical Engineers (AIChE).
The language parallel Pascal and other aspects of the massively parallel processor

Science.gov (United States)

Reeves, A. P.; Bruner, J. D.

1982-01-01

A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.

LMFAO! Humor as a Response to Fear: Decomposing Fear Control within the Extended Parallel Process Model

Science.gov (United States)

Abril, Eulàlia P.; Szczypka, Glen; Emery, Sherry L.

2017-01-01

This study seeks to analyze fear control responses to the 2012 Tips from Former Smokers campaign using the Extended Parallel Process Model (EPPM). The goal is to examine the occurrence of ancillary fear control responses, like humor. In order to explore individuals’ responses in an organic setting, we use Twitter data—tweets—collected via the Firehose. Content analysis of relevant fear control tweets (N = 14,281) validated the existence of boomerang responses within the EPPM: denial, defensive avoidance, and reactance. More importantly, results showed that humor tweets were not only a significant occurrence but constituted the majority of fear control responses. PMID:29527092
Parallel Atomistic Simulations

Energy Technology Data Exchange (ETDEWEB)

HEFFELFINGER,GRANT S.

2000-01-18

Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed: replicated data decomposition, spatial decomposition, and force decomposition. For Monte Carlo simulations, the parallel algorithms that have been developed fall into two categories: those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods, such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination, are also reviewed, and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains, are discussed.
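Of the three molecular dynamics strategies named above, spatial decomposition is the easiest to sketch: each processor owns the atoms in one region of the simulation box. The snippet below is a hedged illustration only; the function name, the cubic box, and the uniform cell grid are assumptions, and no communication of boundary atoms is shown.

```python
def spatial_decomposition(positions, box_length, cells_per_side):
    """Assign atom indices to cubic cells, one cell per (hypothetical) processor.

    positions:      list of (x, y, z) tuples inside [0, box_length)
    box_length:     edge length of the cubic simulation box
    cells_per_side: number of cells along each axis
    Returns a dict mapping a cell index (i, j, k) to the atoms it owns.
    """
    cell_size = box_length / cells_per_side
    owners = {}
    for atom, coords in enumerate(positions):
        # Clamp so an atom exactly on the upper face stays in the last cell.
        cell = tuple(min(int(c / cell_size), cells_per_side - 1) for c in coords)
        owners.setdefault(cell, []).append(atom)
    return owners


atoms = [(0.1, 0.1, 0.1), (0.9, 0.9, 0.9), (0.2, 0.1, 0.3)]
owners = spatial_decomposition(atoms, box_length=1.0, cells_per_side=2)
```

In contrast, a replicated data scheme would give every processor all positions and split only the force loop, and a force decomposition would partition the pairwise force matrix rather than space.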
Energy-Efficient FPGA-Based Parallel Quasi-Stochastic Computing

Directory of Open Access Journals (Sweden)

Ramu Seva

2017-11-01

The high performance of the FPGA (Field Programmable Gate Array) in image processing applications is justified by its flexible reconfigurability, its inherently parallel nature and the availability of a large amount of internal memory. Lately, the Stochastic Computing (SC) paradigm has been found to be significantly advantageous in certain application domains, including image processing, because of its lower hardware complexity and power consumption. However, its viability is deemed to be limited by its serial bitstream processing and the excessive run time required for convergence. To address these issues, a novel approach is proposed in this work in which an energy-efficient implementation of SC is accomplished by introducing fast-converging Quasi-Stochastic Number Generators (QSNGs) and parallel stochastic bitstream processing, which are well suited to leverage the FPGA's reconfigurability and abundant internal memory resources. The proposed approach has been tested on the Virtex-4 FPGA, and results have been compared with the serial and parallel implementations of conventional stochastic computation using the well-known SC edge detection and multiplication circuits. Results show that with this approach, execution time and power consumption are reduced by factors of 3.5 and 4.5 for the edge detection circuit and the multiplication circuit, respectively.
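The multiplication circuit mentioned above rests on a simple property of unipolar stochastic computing: if two independent bitstreams carry ones with probabilities a and b, a bitwise AND carries ones with probability close to a·b. The sketch below emulates this in software. Using van der Corput sequences in bases 2 and 3 as the quasi-random source is an assumption made for illustration; the abstract does not specify how its QSNGs are built.

```python
def van_der_corput(index, base):
    """Return the index-th element of the base-`base` van der Corput sequence."""
    value, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom
    return value


def bitstream(probability, length, base):
    """Unipolar bitstream: bit i is 1 when the quasi-random sample is below p."""
    return [1 if van_der_corput(i, base) < probability else 0
            for i in range(length)]


def sc_multiply(p_a, p_b, length=1024):
    """Estimate p_a * p_b by AND-ing two bitstreams (one AND gate per bit)."""
    a = bitstream(p_a, length, base=2)   # different bases keep the two
    b = bitstream(p_b, length, base=3)   # streams (nearly) uncorrelated
    return sum(x & y for x, y in zip(a, b)) / length


estimate = sc_multiply(0.5, 0.5)  # should land close to 0.25
```

Because each bit position is independent of the others, the AND and the final count can be split across parallel lanes, which is the property a parallel hardware implementation would exploit.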
Exploring Time Perspective in Greek Young Adults: Validation of the Zimbardo Time Perspective Inventory and Relationships with Mental Health Indicators

Science.gov (United States)

Anagnostopoulos, Fotios; Griva, Fay

2012-01-01

In this article we examine the factorial structure of the Greek version of the Zimbardo Time Perspective Inventory (ZTPI; Zimbardo and Boyd in "J Personal Soc Psychol" 77:1271-1288, 1999), in a sample of 337 university students, using principal axis factoring (PAF) with oblique rotation, and its dimensionality using parallel analysis.…
Parallelism Effects and Verb Activation: The Sustained Reactivation Hypothesis

Science.gov (United States)

Callahan, Sarah M.; Shapiro, Lewis P.; Love, Tracy

2010-01-01

This study investigated the processes underlying parallelism by evaluating the activation of a parallel element (i.e., a verb) throughout "and"-coordinated sentences. Four points were tested: (1) approximately 1,600ms after the verb in the first conjunct (PP1), (2) immediately following the conjunction (PP2), (3) approximately 1,100ms after the…
Applicant Perspectives on the Otolaryngology Residency Application Process.

Science.gov (United States)

Ward, Matthew; Pingree, Christian; Laury, Adrienne M; Bowe, Sarah N

2017-08-01

It has been nearly 25 years since medical students were queried regarding their perspectives on otolaryngology-head and neck surgery (OHNS) residency selection. Understanding this viewpoint is critical to improving the current application process. The objective was to evaluate the perceptions of 2016 OHNS residency applicants regarding the application process and offer suggestions for reform. In this cross-sectional study of anonymous online survey data, a 14-question survey was designed based on resources obtained from a computerized PubMed, Ovid, and GoogleScholar database search of the English-language literature from January 1, 1990, through December 31, 2015, conducted using the following search terms: (medical student OR applicant) AND (application OR match) AND otolaryngology. The survey was administered to 2016 OHNS residency applicants to examine 4 primary areas: current attitudes toward the match, effect of the new Otolaryngology Program Directors Organization personal statement mandate, sources of advice and information, and suggestions for improvement. In January 2016, an email was sent to 100 program directors asking them to distribute the survey to current OHNS applicants at their institution. One follow-up reminder email was sent in February 2016. A link to the survey was posted on the Otomatch.com homepage on January 28, 2016, with the last response received on March 28, 2016. The outcomes were survey responses regarding the residency application process. A total of 150 of 370 residency applicants (40.5%) responded to the survey. Of these, 125 respondents (90.6%) noted applying to programs in which they had no specific interest simply to improve their chances of matching. Applicants intended to apply to more programs than they actually did (63.6 vs 60.8; r = 0.19; 95% CI, -0.03 to 0.40). Program directors advised fewer applications than other sources; however, 58 respondents (38.7%) did not receive advice from a program director. A total of 121 respondents (80.7%) found online program
Comparison of microbial community shifts in two parallel multi-step drinking water treatment processes.

Science.gov (United States)

Xu, Jiajiong; Tang, Wei; Ma, Jun; Wang, Hong

2017-07-01

Drinking water treatment processes remove undesirable chemicals and microorganisms from source water, which is vital to public health protection. The purpose of this study was to investigate the effects of treatment processes and configuration on the microbiome by comparing microbial community shifts in two series of different treatment processes operated in parallel within a full-scale drinking water treatment plant (DWTP) in Southeast China. Illumina sequencing of 16S rRNA genes of water samples demonstrated little effect of coagulation/sedimentation and pre-oxidation steps on bacterial communities, in contrast to dramatic and concurrent microbial community shifts during ozonation, granular activated carbon treatment, sand filtration, and disinfection for both series. A large number of unique operational taxonomic units (OTUs) at these four treatment steps further illustrated their strong power to shape the drinking water microbial communities. Interestingly, multidimensional scaling analysis revealed tight clustering of biofilm samples collected from different treatment steps, with Nitrospira, the nitrite-oxidizing bacteria, noted at higher relative abundances in biofilm than in water samples. Overall, this study provides a snapshot of step-to-step microbial community change in multi-step drinking water treatment systems, and the results provide insight into the control and manipulation of the drinking water microbiome via optimization of DWTP design and operation.
Perspectives of using spin waves for computing and signal processing

Energy Technology Data Exchange (ETDEWEB)

Csaba, György, E-mail: gcsaba@gmail.com [Center for Nano Science and Technology, University of Notre Dame (United States); Faculty for Information Technology and Bionics, Pázmány Péter Catholic University (Hungary); Papp, Ádám [Center for Nano Science and Technology, University of Notre Dame (United States); Faculty for Information Technology and Bionics, Pázmány Péter Catholic University (Hungary); Porod, Wolfgang [Center for Nano Science and Technology, University of Notre Dame (United States)

2017-05-03

Highlights: • We give an overview of spin wave-based computing with emphasis on non-Boolean signal processors. • Spin waves can combine the best of electronics and photonics and do it in an on-chip and integrable way. • Copying successful approaches from microelectronics may not be the best way toward spin-wave based computing. • Practical devices can be constructed by minimizing the number of required magneto-electric interconnections. - Abstract: Almost all the world's information is processed and transmitted by either electric currents or photons. Now they may get a serious contender: spin-wave-based devices may just perform some information-processing tasks in a lot more efficient and practical way. In this article, we give an engineering perspective of the potential of spin-wave-based devices. After reviewing various flavors for spin-wave-based processing devices, we argue that the niche for spin-wave-based devices is low-power, compact and high-speed signal-processing devices, where most traditional electronics show poor performance.
An Automatic Instruction-Level Parallelization of Machine Code

Directory of Open Access Journals (Sweden)

MARINKOVIC, V.

2018-02-01

Prevailing multicores and novel manycores have created a major present-day challenge: the parallelization of embedded software that is still written as sequential code. In this paper, automatic code parallelization is considered, focusing on the development of a parallelization tool at the binary level and on the validation of this approach. A novel instruction-level parallelization algorithm for assembly code is developed; it uses the register names after conversion to static single assignment (SSA) form to find independent blocks of code and then schedules those blocks using METIS to achieve good load balance. Sequential consistency is verified, and validation is done by measuring the program execution time on the target architecture. Great speedup, taken as the performance measure in the validation process, and optimal load balancing are achieved for multicore RISC processors with 2 to 16 cores (e.g. MIPS, MicroBlaze, etc.). In particular, for 16 cores, the average speedup is 7.92x, while in some cases it reaches 14x. The approach to automatic parallelization presented in this paper is useful to researchers and developers in the area of parallelization as a basis for further optimizations, as the back-end of a compiler, or as a code parallelization tool for an embedded system.
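One way to picture "using the register names after SSA to find independent blocks" is to treat each instruction as a node, link an instruction to the one that defined each register it reads, and take the connected components. The sketch below does this with a union-find structure. It is an illustration under stated assumptions, not the paper's algorithm: the `(dest, sources)` instruction encoding is hypothetical, memory dependences are ignored, and the METIS scheduling step is not shown.

```python
def independent_blocks(instructions):
    """Group SSA instructions into blocks with no register dependences between them.

    instructions: list of (dest_register, [source_registers]) in program order;
                  SSA form means every dest_register is written exactly once.
    Returns a sorted list of blocks, each a list of instruction indices.
    """
    parent = list(range(len(instructions)))

    def find(i):
        # Path-halving find for the union-find forest.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    defined_by = {}  # register name -> index of the instruction that wrote it
    for idx, (dest, sources) in enumerate(instructions):
        for src in sources:
            if src in defined_by:
                # Reader and writer of the same register must share a block.
                parent[find(idx)] = find(defined_by[src])
        defined_by[dest] = idx

    blocks = {}
    for idx in range(len(instructions)):
        blocks.setdefault(find(idx), []).append(idx)
    return sorted(blocks.values())


program = [
    ("r1", []),        # 0: r1 = load a
    ("r2", []),        # 1: r2 = load b
    ("r3", ["r1"]),    # 2: r3 = r1 + 1
    ("r4", ["r2"]),    # 3: r4 = r2 * 2
    ("r5", ["r3"]),    # 4: r5 = r3 - 4
]
blocks = independent_blocks(program)
```

For scale, the reported average speedup of 7.92x on 16 cores corresponds to a parallel efficiency of roughly 0.5 (7.92 / 16).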
Comparison of some parallelization strategies of thermalhydraulic codes on GPUs

International Nuclear Information System (INIS)

Jendoubi, T.; Bergeaud, V.; Geay, A.

2013-01-01

Modern supercomputer architectures are now often based on hybrid concepts that combine distributed-memory parallelism, shared-memory parallelism, and GPUs (graphics processing units). In this work, we propose a new approach to take advantage of these graphics cards in thermohydraulics algorithms. (authors)
The 2nd Symposium on the Frontiers of Massively Parallel Computations

Science.gov (United States)

Mills, Ronnie (Editor)

1988-01-01

Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.
Minimizing makespan in a two-stage flow shop with parallel batch-processing machines and re-entrant jobs

Science.gov (United States)

Huang, J. D.; Liu, J. J.; Chen, Q. X.; Mao, N.

2017-06-01

Against a background of heat-treatment operations in mould manufacturing, a two-stage flow-shop scheduling problem is described for minimizing makespan with parallel batch-processing machines and re-entrant jobs. The weights and release dates of jobs are non-identical, but job processing times are equal. A mixed-integer linear programming model is developed and tested with small-scale scenarios. Given that the problem is NP hard, three heuristic construction methods with polynomial complexity are proposed. The worst case of the new constructive heuristic is analysed in detail. A method for computing lower bounds is proposed to test heuristic performance. Heuristic efficiency is tested with sets of scenarios. Compared with the two improved heuristics, the performance of the new constructive heuristic is superior.
Parallel processor for fast event analysis

International Nuclear Information System (INIS)

Hensley, D.C.

1983-01-01

Current maximum data rates from the Spin Spectrometer of approx. 5000 events/s (up to 1.3 MBytes/s) and minimum analysis requiring at least 3000 operations/event require a CPU cycle time near 70 ns. In order to achieve an effective cycle time of 70 ns, a parallel processing device is proposed in which up to 4 independent processors will be implemented in parallel. The individual processors are designed around the Am2910 Microsequencer, the Am29116 μP, and the Am29517 Multiplier. Satellite histogramming in a mass memory system will be managed by a commercial 16-bit μP system.
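The 70 ns figure follows directly from the quoted rates. The short calculation below reproduces it, under the simplifying assumptions (not stated in the abstract) that one operation takes one cycle and that events are distributed perfectly evenly over the processors.

```python
events_per_second = 5000        # quoted maximum event rate
ops_per_event = 3000            # quoted minimum analysis cost
bytes_per_second = 1.3e6        # quoted maximum data rate
processors = 4                  # quoted maximum number of parallel processors

# Total operation throughput the analysis must sustain.
required_ops_per_second = events_per_second * ops_per_event

# Cycle time a single processor would need (one operation per cycle assumed).
effective_cycle_ns = 1e9 / required_ops_per_second

# With ideal load sharing, each of the parallel processors can be this slow.
per_processor_cycle_ns = effective_cycle_ns * processors

# Average event size implied by the two quoted rates.
bytes_per_event = bytes_per_second / events_per_second

print(f"{required_ops_per_second:,} ops/s")
print(f"effective cycle: {effective_cycle_ns:.1f} ns")
print(f"per-processor cycle with {processors} processors: {per_processor_cycle_ns:.1f} ns")
print(f"average event size: {bytes_per_event:.0f} bytes")
```

The effective cycle time comes out just under 67 ns, consistent with "near 70 ns", and four processors relax the per-processor requirement to roughly 267 ns.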
HPC parallel programming model for gyrokinetic MHD simulation

International Nuclear Information System (INIS)

Naitou, Hiroshi; Yamada, Yusuke; Tokuda, Shinji; Ishii, Yasutomo; Yagi, Masatoshi

2011-01-01

The 3-dimensional gyrokinetic PIC (particle-in-cell) code for MHD simulation, Gpic-MHD, was installed on SR16000 (“Plasma Simulator”), which is a scalar cluster system consisting of 8,192 logical cores. The Gpic-MHD code advances particle and field quantities in time. In order to distribute calculations over a large number of logical cores, the total simulation domain in cylindrical geometry was broken up into $N_{\text{DD-r}} \times N_{\text{DD-z}}$ (number of radial decompositions times number of axial decompositions) small domains containing approximately the same number of particles. The axial direction was uniformly decomposed, while the radial direction was non-uniformly decomposed. $N_{\text{RP}}$ replicas (copies) of each decomposed domain were used (“particle decomposition”). A hybrid parallelization model of multiple threads and multiple processes was employed: threads were parallelized by auto-parallelization, and the $N_{\text{DD-r}} \times N_{\text{DD-z}} \times N_{\text{RP}}$ processes were parallelized by MPI (message-passing interface). The parallelization performance of Gpic-MHD was investigated for a medium-size system of $N_r \times N_\theta \times N_z = 1025 \times 128 \times 128$ mesh points with 4.196 or 8.192 billion particles. The highest speed for a fixed number of logical cores was obtained for two threads, the maximum number of $N_{\text{DD-z}}$, and the optimum combination of $N_{\text{DD-r}}$ and $N_{\text{RP}}$. The observed optimum speeds demonstrated good scaling up to 8,192 logical cores. (author)
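The bookkeeping behind this hybrid scheme is that the logical cores must equal threads per process times the number of MPI processes, and the process count factors into radial domains, axial domains and replicas. The sketch below enumerates the admissible combinations for a fixed axial decomposition. The function name and the example value of 128 axial domains are hypothetical; the abstract does not give the values actually used.

```python
def decompositions(total_cores, threads_per_process, n_dd_z):
    """List (n_dd_r, n_dd_z, n_rp) with threads * n_dd_r * n_dd_z * n_rp == cores.

    total_cores:         logical cores available
    threads_per_process: threads inside each MPI process
    n_dd_z:              number of axial domains (held fixed here)
    """
    if total_cores % threads_per_process:
        return []
    processes = total_cores // threads_per_process
    if processes % n_dd_z:
        return []
    remaining = processes // n_dd_z  # must be split as n_dd_r * n_rp
    return [(n_dd_r, n_dd_z, remaining // n_dd_r)
            for n_dd_r in range(1, remaining + 1)
            if remaining % n_dd_r == 0]


# 8,192 logical cores with two threads per process leave 4,096 MPI processes.
options = decompositions(total_cores=8192, threads_per_process=2, n_dd_z=128)
for n_dd_r, n_dd_z, n_rp in options:
    print(f"N_DD-r={n_dd_r:3d}  N_DD-z={n_dd_z}  N_RP={n_rp:3d}")
```

Finding the "optimum combination" in the abstract's sense would then mean timing the code for each such tuple, which this sketch does not attempt.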
Parallel integer sorting with medium and fine-scale parallelism

Science.gov (United States)

Dagum, Leonardo

1993-01-01

Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort is designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128 processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.
Scalable Parallel Distributed Coprocessor System for Graph Searching Problems with Massive Data

Directory of Open Access Journals (Sweden)

Wanrong Huang

2017-01-01

Internet applications, such as network searching, electronic commerce, and modern medical applications, produce and process massive data. Considerable data parallelism exists in the computation processes of data-intensive applications. A traversal algorithm, breadth-first search (BFS), is fundamental in many graph processing applications and metrics when a graph grows in scale. A variety of scientific programming methods have been proposed for accelerating and parallelizing BFS because of the poor temporal and spatial locality caused by inherently irregular memory access patterns. However, new parallel hardware could provide better improvement for scientific methods. To address small-world graph problems, we propose a scalable and novel field-programmable gate array-based heterogeneous multicore system for scientific programming. Each core is multithreaded for streaming processing, and the InfiniBand communication network is adopted for scalability. We design a binary search algorithm for address mapping to unify all processor addresses. Within the limits permitted by the Graph500 test bench after 1D parallel hybrid BFS algorithm testing, our 8-core and 8-thread-per-core system achieved superior performance and efficiency compared with the prior work under the same degree of parallelism. Our system is efficient not as a special acceleration unit but as a processor platform that deals with graph searching applications.
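A "hybrid" BFS in the usual sense switches between a top-down step (expand from the frontier) and a bottom-up step (let each unvisited vertex look for a parent in the frontier) depending on frontier size. The sequential sketch below shows that switch for an undirected graph. It is not the system described above: the `switch_fraction` heuristic is a hypothetical simplification, and the 1D partitioning, which would assign each processor a contiguous range of vertices, is only noted in comments.

```python
def hybrid_bfs(adjacency, source, switch_fraction=0.25):
    """Level-synchronous BFS that alternates top-down and bottom-up steps.

    adjacency:       list of neighbour lists for an undirected graph
    source:          start vertex
    switch_fraction: go bottom-up when the frontier exceeds this share of vertices
    Returns the BFS level of every vertex (-1 if unreachable).
    """
    n = len(adjacency)
    level = [-1] * n
    level[source] = 0
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        next_frontier = set()
        if len(frontier) > switch_fraction * n:
            # Bottom-up: every unvisited vertex checks for a frontier neighbour.
            # In a 1D-partitioned version each processor scans its own vertices.
            for v in range(n):
                if level[v] == -1 and any(u in frontier for u in adjacency[v]):
                    level[v] = depth
                    next_frontier.add(v)
        else:
            # Top-down: frontier vertices claim their unvisited neighbours.
            for u in frontier:
                for v in adjacency[u]:
                    if level[v] == -1:
                        level[v] = depth
                        next_frontier.add(v)
        frontier = next_frontier
    return level


# Small undirected example: 0-1, 0-2, 1-3, 2-3, 3-4
graph = [[1, 2], [0, 3], [0, 3], [1, 2, 4], [3]]
levels = hybrid_bfs(graph, source=0)
```

The bottom-up step pays off on small-world graphs because, once the frontier is large, most unvisited vertices find a parent after inspecting only a few neighbours.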
An extension of the extended parallel process model (EPPM) in television health news: the influence of health consciousness on individual message processing and acceptance.

Science.gov (United States)

Hong, Hyehyun

2011-06-01

The purpose of this study is to examine the role of health consciousness in processing TV news that contains potential health threats and preventive recommendations. Based on the extended parallel process model (Witte, 1992), relationships among health consciousness, perceived severity, perceived susceptibility, perceived response efficacy, perceived self-efficacy, and message acceptance/rejection were hypothesized. Responses collected from 175 participants after viewing four TV health news stories were analyzed using the bootstrapping analysis (Preacher & Hayes, 2008). Results confirmed three mediators (i.e., perceived severity, response efficacy, self-efficacy) in the influence of health consciousness on message acceptance. A negative association found between health consciousness and perceived susceptibility is discussed in relation to characteristics of health conscious individuals and optimistic bias of health risks.
Prospective Elementary School Teachers’ Views about Socioscientific Issues: A Concurrent Parallel Design Study

OpenAIRE

Muhammet ÖZDEN

2015-01-01

The purpose of this research is to examine the prospective elementary school teachers’ perceptions on socioscientific issues. The research was conducted on prospective elementary school teachers studying at a university located in western Turkey. The researcher first taught the subjects of global warming and nuclear power plants from a perspective of socioscientific issues in the science and technology education course and then conducted the research. Concurrent parallel design, one of the mi...
3D Body Scanning Measurement System Associated with RF Imaging, Zero-padding and Parallel Processing

Directory of Open Access Journals (Sweden)

Kim Hyung Tae

2016-04-01

This work presents a novel signal processing method for high-speed 3D body measurements using millimeter waves with a general processing unit (GPU) and zero-padding fast Fourier transform (ZPFFT). The proposed measurement system consists of a radio-frequency (RF) antenna array for a penetrable measurement, a high-speed analog-to-digital converter (ADC) for significant data acquisition, and a general processing unit for fast signal processing. The RF waves of the transmitter and the receiver are converted to real and imaginary signals that are sampled by a high-speed ADC and synchronized with the kinematic positions of the scanner. Because the distance between the surface and the antenna is related to the peak frequency of the conjugate signals, a fast Fourier transform (FFT) is applied to the signal processing after the sampling. The sampling time is finite owing to a short scanning time, and the physical resolution needs to be increased; further, zero-padding is applied to interpolate the spectra of the sampled signals to consider a 1/m floating point frequency. The GPU and a parallel algorithm are applied to accelerate the ZPFFT because of its large number of additional mathematical operations. 3D body images are finally obtained from spectrograms, which are arrangements of the ZPFFT in a 3D space.
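The role of zero-padding described above can be shown on a toy signal: padding a short record with zeros before the transform samples the same spectrum on a finer frequency grid, so the located peak moves closer to the true tone. The sketch below uses a direct DFT from the standard library to stay self-contained; a real system would use an FFT, and the record length, tone and padding factor here are arbitrary illustrative values.

```python
import cmath


def zero_padded_dft(samples, padded_length):
    """DFT of `samples` extended with zeros to `padded_length` points."""
    n = len(samples)
    # Zero samples contribute nothing, so the sum runs over real samples only.
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / padded_length)
                for t in range(n))
            for k in range(padded_length)]


def peak_frequency(samples, padded_length):
    """Peak location in cycles per record, on a grid of n / padded_length."""
    spectrum = zero_padded_dft(samples, padded_length)
    peak_bin = max(range(padded_length), key=lambda k: abs(spectrum[k]))
    return peak_bin * len(samples) / padded_length


record_length = 32
true_tone = 10.3  # cycles per record, deliberately between two DFT bins
signal = [cmath.exp(2j * cmath.pi * true_tone * t / record_length)
          for t in range(record_length)]

coarse = peak_frequency(signal, padded_length=32)    # grid step 1 cycle
fine = peak_frequency(signal, padded_length=256)     # grid step 1/8 cycle
```

Padding by a factor of 8 multiplies the number of output bins, and hence the arithmetic, by the same factor, which is the extra work the abstract hands to the GPU.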
Parallel computing works!

CERN Document Server

Fox, Geoffrey C; Messina, Guiseppe C

2014-01-01

A clear illustration of how parallel computers can be successfully applied to large-scale scientific computations. This book demonstrates how a variety of applications in physics, biology, mathematics and other sciences were implemented on real parallel computers to produce new scientific results. It investigates issues of fine-grained parallelism relevant for future supercomputers with particular emphasis on hypercube architecture. The authors describe how they used an experimental approach to configure different massively parallel machines, design and implement basic system software, and develop

Multi-objective problem of the modified distributed parallel machine and assembly scheduling problem (MDPMASP) with eligibility constraints

Science.gov (United States)

Amallynda, I.; Santosa, B.

2017-11-01

This paper proposes a new generalization of the distributed parallel machine and assembly scheduling problem (DPMASP) with eligibility constraints, referred to as the modified distributed parallel machine and assembly scheduling problem (MDPMASP) with eligibility constraints. Within this generalization, we assume that there is a set of non-identical factories or production lines, each with a set of unrelated parallel machines with different processing speeds, feeding a single assembly machine in series. A set of different products is manufactured through an assembly program from a set of components (jobs) according to the requested demand. Each product requires several kinds of jobs of different sizes. In addition, we consider the multi-objective problem (MOP) of simultaneously minimizing the mean flow time and the number of tardy products. This problem is known to be NP-hard and is important in practice, as these criteria reflect the customer's demand and the manufacturer's perspective. Because this is a realistic and complex problem with a wide range of possible solutions, we propose four simple heuristics and two metaheuristics to solve it. Various parameters of the proposed metaheuristic algorithms are discussed and calibrated by means of the Taguchi technique. All proposed algorithms are tested in Matlab. Our computational experiments indicate that the proposed problem and fourth proposed algorithms can be implemented and used to solve moderately sized instances, giving efficient solutions that are close to the optimum in most cases.
Parallel computing solution of Boltzmann neutron transport equation

International Nuclear Information System (INIS)

Ansah-Narh, T.

2010-01-01

The focus of the research was on developing a parallel computing algorithm for solving the eigenvalues of the Boltzmann Neutron Transport Equation (BNTE) in a slab geometry using a multi-grid approach. In response to the problem of slow execution of serial computing when solving large problems, such as the BNTE, the study focused on the design of parallel computing systems, an evolution of serial computing that uses multiple processing elements simultaneously to solve complex physical and mathematical problems. The finite element method (FEM) was used for the spatial discretization scheme, while angular discretization was accomplished by expanding the angular dependence in terms of Legendre polynomials. The eigenvalues representing the multiplication factors in the BNTE were determined by the power method. MATLAB Compiler Version 4.1 (R2009a) was used to compile the MATLAB codes of the BNTE. The implemented parallel algorithms were enabled with matlabpool, a Parallel Computing Toolbox function. The option UseParallel was set to 'always'; the default value of the option was 'never'. When those conditions held, the solvers computed estimated gradients in parallel. The parallel computing system was used to handle all the bottlenecks in the matrix generated from the finite element scheme and in each domain generated by the power method. The parallel algorithm was implemented on a Symmetric Multi Processor (SMP) cluster machine, which had Intel 32-bit quad-core x86 processors. Convergence rates and timings for the algorithm on the SMP cluster machine were obtained. Numerical experiments indicated that the designed parallel algorithm could reach perfect speedup and had good stability and scalability. (au)
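The power method named above finds the dominant eigenvalue by repeatedly applying the operator and renormalizing. The sketch below shows the iteration on a small dense matrix used purely as a stand-in; it is written in Python rather than the MATLAB of the study, and it involves no transport physics, finite elements or parallelism.

```python
def power_method(matrix, iterations=200, tolerance=1e-12):
    """Estimate the dominant eigenvalue and eigenvector of a square matrix.

    Assumes the dominant eigenvalue is positive and simple, and that the
    starting vector of ones is not orthogonal to its eigenvector.
    """
    n = len(matrix)
    vector = [1.0] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        # One matrix-vector product; rows are independent, so this loop is
        # the natural place to distribute work across processing elements.
        product = [sum(matrix[i][j] * vector[j] for j in range(n))
                   for i in range(n)]
        estimate = max(abs(x) for x in product)   # max-norm as eigenvalue estimate
        vector = [x / estimate for x in product]  # renormalise to max-norm 1
        converged = abs(estimate - eigenvalue) < tolerance
        eigenvalue = estimate
        if converged:
            break
    return eigenvalue, vector


stand_in = [[2.0, 1.0],
            [1.0, 3.0]]
dominant, mode = power_method(stand_in)  # exact value is (5 + sqrt(5)) / 2
```

In a criticality calculation the converged eigenvalue would play the role of the multiplication factor, and the matrix-vector product would be replaced by a transport sweep over the discretized slab.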
The Mediation of Teaching and Learning Processes through "Identity Artefacts." A Vygotskian Perspective

Science.gov (United States)

Subero, David; Llopart, Mariona; Siqués, Carina; Esteban-Guitart, Moises

2018-01-01

The aim of this paper is to address the teaching and learning processes in schools from a Vygotskian perspective based on the notion of "identity artefacts" (IAs) which, for our purposes, consist of documents created by the learners about themselves, in which they try to capture all the things that make sense and are meaningful to them…
THREE PERSPECTIVES ON MANAGING FRONT END INNOVATION

DEFF Research Database (Denmark)

Jensen, Anna Rose Vagn; Clausen, Christian; Gish, Liv

2018-01-01

This paper presents three complementary perspectives on the management of front end innovation: A process model perspective, a knowledge perspective and a translational perspective. While the first two perspectives are well established in literature, we offer the translation perspective ... as a complementary perspective. The paper combines a literature review with an empirical examination of the application of these multiple perspectives across three cases of front end of innovation (FEI) management in mature product developing companies. While the process models represent the dominant, albeit rather ... to represent an emergent approach in managing FEI where process models, knowledge strategies and objects become integrated elements in more advanced navigational strategies for key players.
A program system for ab initio MO calculations on vector and parallel processing machines. Pt. 3

International Nuclear Information System (INIS)

Wiest, R.; Demuynck, J.; Benard, M.; Rohmer, M.M.; Ernenwein, R.

1991-01-01

This series of three papers presents a program system for ab initio molecular orbital calculations on vector and parallel computers. Part III is devoted to the four-index transformation on a molecular orbital basis of size NMO of the file of two-electron integrals $(pq\|rs)$ generated by a contracted Gaussian set of size NATO (number of atomic orbitals). A fast Yoshimine algorithm first sorts the $(pq\|rs)$ integrals with respect to index pq only. This file of half-sorted integrals labelled by their rs-index can be processed without further modification to generate either the transformed integrals or the supermatrix elements. The large memory available on the CRAY-2 has made it possible to implement the transformation algorithm proposed by Bender in 1972, which requires a core-storage allocation varying as $(\mathrm{NATO})^3$. Two versions of Bender's algorithm are included in the present program. The first version is an in-core version, where the complete file of accumulated contributions to transformed integrals is stored and updated in central memory. This version has been parallelized by distributing over a limited number of logical tasks the NATO steps corresponding to the scanning of the outermost loop. The second version is an out-of-core version, in which twin files are used alternately as input and output for the accumulated contributions to transformed integrals. This version is not parallel. The choice of one version or the other and (for version 1) the determination of the number of tasks depend upon the balance between the available and the requested amounts of storage. The storage management and the choice of the proper version are carried out automatically using dynamic storage allocation. Both versions are vectorized and take advantage of the molecular symmetry. (orig.)
An object-oriented programming paradigm for parallelization of computational fluid dynamics

International Nuclear Information System (INIS)

Ohta, Takashi.

1997-03-01

We propose an object-oriented programming paradigm for parallelization of scientific computing programs, and show that the approach can be a very useful strategy. Generally, parallelization of scientific programs tends to be complicated and unportable due to the specific requirements of each parallel computer or compiler. In this paper, we show that an object-oriented program design, which separates the parallel processing parts from the solver of the application, can greatly improve the maintainability of the code as well as its portability. We design a program for the two-dimensional Euler equations according to the paradigm, and evaluate its parallel performance on an IBM SP2. (author)
Intelligent spatial ecosystem modeling using parallel processors

International Nuclear Information System (INIS)

Maxwell, T.; Costanza, R.

1993-01-01

Spatial modeling of ecosystems is essential if one's modeling goals include developing a relatively realistic description of past behavior and predictions of the impacts of alternative management policies on future ecosystem behavior. Development of these models has been limited in the past by the large amount of input data required and the difficulty of even large mainframe serial computers in dealing with large spatial arrays. These two limitations have begun to erode with the increasing availability of remote sensing data and GIS systems to manipulate it, and the development of parallel computer systems which allow computation of large, complex, spatial arrays. Although many forms of dynamic spatial modeling are highly amenable to parallel processing, the primary focus in this project is on process-based landscape models. These models simulate spatial structure by first compartmentalizing the landscape into some geometric design and then describing flows within compartments and spatial processes between compartments according to location-specific algorithms. The authors are currently building and running parallel spatial models at the regional scale for the Patuxent River region in Maryland, the Everglades in Florida, and Barataria Basin in Louisiana. The authors are also planning a project to construct a series of spatially explicit linked ecological and economic simulation models aimed at assessing the long-term potential impacts of global climate change
The FORCE: A highly portable parallel programming language

Science.gov (United States)

Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

1989-01-01

Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
Efficient numerical methods for the large-scale, parallel solution of elastoplastic contact problems

KAUST Repository

Frohne, Jörg; Heister, Timo; Bangerth, Wolfgang

2015-01-01

© 2016 John Wiley & Sons, Ltd. Quasi-static elastoplastic contact problems are ubiquitous in many industrial processes and other contexts, and their numerical simulation is consequently of great interest in accurately describing and optimizing production processes. The key component in these simulations is the solution of a single load step of a time iteration. From a mathematical perspective, the problems to be solved in each time step are characterized by the difficulties of variational inequalities for both the plastic behavior and the contact problem. Computationally, they also often lead to very large problems. In this paper, we present and evaluate a complete set of methods that are (1) designed to work well together and (2) allow for the efficient solution of such problems. In particular, we use adaptive finite element meshes with linear and quadratic elements, a Newton linearization of the plasticity, active set methods for the contact problem, and multigrid-preconditioned linear solvers. Through a sequence of numerical experiments, we show the performance of these methods. This includes highly accurate solutions of a three-dimensional benchmark problem and scaling our methods in parallel to 1024 cores and more than a billion unknowns.
Efficient numerical methods for the large-scale, parallel solution of elastoplastic contact problems

KAUST Repository

Frohne, Jörg

2015-08-06

© 2016 John Wiley & Sons, Ltd. Quasi-static elastoplastic contact problems are ubiquitous in many industrial processes and other contexts, and their numerical simulation is consequently of great interest in accurately describing and optimizing production processes. The key component in these simulations is the solution of a single load step of a time iteration. From a mathematical perspective, the problems to be solved in each time step are characterized by the difficulties of variational inequalities for both the plastic behavior and the contact problem. Computationally, they also often lead to very large problems. In this paper, we present and evaluate a complete set of methods that are (1) designed to work well together and (2) allow for the efficient solution of such problems. In particular, we use adaptive finite element meshes with linear and quadratic elements, a Newton linearization of the plasticity, active set methods for the contact problem, and multigrid-preconditioned linear solvers. Through a sequence of numerical experiments, we show the performance of these methods. This includes highly accurate solutions of a three-dimensional benchmark problem and scaling our methods in parallel to 1024 cores and more than a billion unknowns.
Design of Web Questionnaires : An Information Processing Perspective for the Effect of Response Categories

NARCIS (Netherlands)

Toepoel, V.; Vis, C.M.; Das, J.W.M.; van Soest, A.H.O.

2006-01-01

In this study we use an information-processing perspective to explore the impact of response scales on respondents' answers in a web survey. This paper has four innovations compared to the existing literature: research is based on a different mode of administration (web), we use an open-ended format
PARALLEL IMPLEMENTATION OF MORPHOLOGICAL PROFILE BASED SPECTRAL-SPATIAL CLASSIFICATION SCHEME FOR HYPERSPECTRAL IMAGERY

Directory of Open Access Journals (Sweden)

B. Kumar

2016-06-01

The extended morphological profile (EMP) is a good technique for extracting spectral-spatial information from images, but the large size of hyperspectral images is an important concern when creating EMPs. However, with the availability of modern multi-core processors and commodity parallel processing systems such as graphics processing units (GPUs) at the desktop level, parallel computing provides a viable option to significantly accelerate the execution of such computations. In this paper, a parallel implementation of an EMP-based spectral-spatial classification method for hyperspectral imagery is presented. The parallel implementation is done on both a multi-core CPU and a GPU. The impact of parallelization on speedup and classification accuracy is analyzed. For the GPU, the implementation is done in Compute Unified Device Architecture (CUDA) C. The experiments are carried out on two well-known hyperspectral images. The experimental results show that the GPU implementation provides a speedup of about 7 times, while the parallel implementation on the multi-core CPU results in a speedup of about 3 times. It is also observed that the parallel implementation has no adverse impact on classification accuracy.
Constituent order and semantic parallelism in online comprehension: eye-tracking evidence from German.

Science.gov (United States)

Knoeferle, Pia; Crocker, Matthew W

2009-12-01

Reading times for the second conjunct of and-coordinated clauses are faster when the second conjunct parallels the first conjunct in its syntactic or semantic (animacy) structure than when its structure differs (Frazier, Munn, & Clifton, 2000; Frazier, Taft, Roeper, & Clifton, 1984). What remains unclear, however, is the time course of parallelism effects, their scope, and the kinds of linguistic information to which they are sensitive. Findings from the first two eye-tracking experiments revealed incremental constituent order parallelism across the board, both during structural disambiguation (Experiment 1) and in sentences with unambiguously case-marked constituent order (Experiment 2), as well as for both marked and unmarked constituent orders (Experiments 1 and 2). Findings from Experiment 3 revealed effects of both constituent order and subtle semantic (noun phrase similarity) parallelism. Together our findings provide evidence for an across-the-board account of parallelism for processing and-coordinated clauses, in which both constituent order and semantic aspects of representations contribute towards incremental parallelism effects. We discuss our findings in the context of existing findings on parallelism and priming, as well as mechanisms of sentence processing.
PARALLEL PROCESSING OF BIG POINT CLOUDS USING Z-ORDER-BASED PARTITIONING

Directory of Open Access Journals (Sweden)

C. Alis

2016-06-01

As laser scanning technology improves and costs come down, the amount of point cloud data being generated can be prohibitively difficult and expensive to process on a single machine. This data explosion is not limited to point cloud data. Voluminous amounts of high-dimensional and quickly accumulating data, collectively known as Big Data, such as those generated by social media, Internet of Things devices and commercial transactions, are becoming more prevalent as well. New computing paradigms and frameworks are being developed to efficiently handle the processing of Big Data, many of which utilize a compute cluster composed of several commodity-grade machines to process chunks of data in parallel. A central concept in many of these frameworks is data locality. By its nature, Big Data is large enough that the entire dataset would not fit in the memory and on the hard drives of a single node; hence, replicating the entire dataset to each worker node is impractical. The data must then be partitioned across worker nodes in a manner that minimises data transfer across the network. This is a challenge for point cloud data because there exist different ways to partition the data, and they may require data transfer. We propose a partitioning based on Z-order, which is a form of locality-sensitive hashing. The Z-order or Morton code is computed by dividing each dimension to form a grid and then interleaving the binary representations of the grid coordinates in each dimension. For example, the Z-order code for the grid square with coordinates (x = 1 = 01₂, y = 3 = 11₂) is 1011₂ = 11. The number of points in each partition is controlled by the number of bits per dimension: the more bits, the fewer the points. The number of bits per dimension also controls the level of detail, with more bits yielding finer partitioning. We present this partitioning method by implementing it on Apache Spark and investigating how different parameters affect the accuracy and running time of the k nearest
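The bit interleaving described above can be sketched in a few lines of Python. This is a minimal two-dimensional illustration, not the authors' Spark implementation; the function names `morton_code` and `partition_key` and the bounding-box argument are hypothetical. The y bit is placed above the x bit in each pair, which is the ordering that reproduces the abstract's worked example.

```python
def morton_code(x, y, bits):
    """Interleave the low `bits` bits of x and y (y bit above x bit in each pair)."""
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (2 * b)        # x bit goes to the even position
        code |= ((y >> b) & 1) << (2 * b + 1)    # y bit goes to the odd position
    return code

def partition_key(px, py, bounds, bits):
    """Map a point to the Morton code of its grid cell.

    bounds = (xmin, ymin, xmax, ymax) is an assumed bounding box of the data;
    each dimension is divided into 2**bits cells.
    """
    xmin, ymin, xmax, ymax = bounds
    cells = 1 << bits
    gx = min(int((px - xmin) / (xmax - xmin) * cells), cells - 1)
    gy = min(int((py - ymin) / (ymax - ymin) * cells), cells - 1)
    return morton_code(gx, gy, bits)

# The abstract's example: x = 1, y = 3 with two bits per dimension.
print(morton_code(1, 3, 2))  # prints 11
```

Points that share a key fall in the same grid cell, so grouping by key yields spatially compact partitions; raising `bits` produces more, smaller cells.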
Algorithmically specialized parallel computers

CERN Document Server

Snyder, Lawrence; Gannon, Dennis B

1985-01-01

Algorithmically Specialized Parallel Computers focuses on the concept and characteristics of an algorithmically specialized computer. This book discusses the algorithmically specialized computers, algorithmic specialization using VLSI, and innovative architectures. The architectures and algorithms for digital signal, speech, and image processing and specialized architectures for numerical computations are also elaborated. Other topics include the model for analyzing generalized inter-processor, pipelined architecture for search tree maintenance, and specialized computer organization for raster
Vector and parallel processors in computational science

International Nuclear Information System (INIS)

Duff, I.S.; Reid, J.K.

1985-01-01

This book presents the papers given at a conference which reviewed the new developments in parallel and vector processing. Topics considered at the conference included hardware (array processors, supercomputers), programming languages, software aids, numerical methods (e.g., Monte Carlo algorithms, iterative methods, finite elements, optimization), and applications (e.g., neutron transport theory, meteorology, image processing)
Parallelization of a hydrological model using the message passing interface

Science.gov (United States)

Wu, Yiping; Li, Tiejian; Sun, Liqun; Chen, Ji

2013-01-01

With increasing knowledge about natural processes, hydrological models such as the Soil and Water Assessment Tool (SWAT) are becoming larger and more complex, with increasing computation time. Additionally, other procedures such as model calibration, which may require thousands of model iterations, can increase running time and thus further hinder rapid modeling and analysis. Using the widely applied SWAT as an example, this study demonstrates how to parallelize a serial hydrological model in a Windows® environment using a parallel programming technology, the Message Passing Interface (MPI). With a case study, we derived the optimal values for the two parameters (the number of processes and the corresponding percentage of work to be distributed to the master process) of the parallel SWAT (P-SWAT) on an ordinary personal computer and a workstation. Our study indicates that model execution time can be reduced by 42%–70% (a speedup of 1.74–3.36) using multiple processes (two to five) with a proper task-distribution scheme (between the master and slave processes). Although the computation time decreases with an increasing number of processes (from two to five), the gain diminishes because of the accompanying increase in demand for message passing between the master and all slave processes. Our case study demonstrates that P-SWAT may reach its maximum speedup with a five-process run, and that the performance can be quite stable (fairly independent of project size). Overall, P-SWAT can help reduce the computation time substantially for an individual model run, manual and automatic calibration procedures, and optimization of best management practices. In particular, the parallelization method we used and the scheme for deriving the optimal parameters in this study can be valuable and easily applied to other hydrological or environmental models.
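The relation between a reduction in execution time and the corresponding speedup can be checked with a short calculation. The sketch below uses only the standard definitions (speedup = serial time / parallel time, efficiency = speedup / number of processes); the function names are hypothetical, and the figures fed in are the rounded values quoted in the abstract.

```python
def reduction_from_speedup(speedup):
    """Fraction of execution time saved for a given speedup."""
    return 1.0 - 1.0 / speedup

def speedup_from_reduction(reduction):
    """Speedup implied by a fractional reduction in execution time."""
    return 1.0 / (1.0 - reduction)

def efficiency(speedup, n_processes):
    """Parallel efficiency: speedup per process."""
    return speedup / n_processes

for s in (1.74, 3.36):
    print(f"speedup {s}: time reduced by {reduction_from_speedup(s):.1%}")
```

A speedup of 1.74 corresponds to a time reduction of about 42.5% and a speedup of 3.36 to about 70.2%, consistent with the quoted 42%–70% range up to rounding. If the 3.36 figure is the five-process run, its efficiency would be about 0.67, in line with the abstract's remark that message passing erodes the gain as processes are added.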
New Parallel Algorithms for Landscape Evolution Model

Science.gov (United States)

Jin, Y.; Zhang, H.; Shi, Y.

2017-12-01

Most landscape evolution models (LEMs) developed in the last two decades solve the diffusion equation to simulate the transport of surface sediments. This numerical approach is difficult to parallelize because of the computation of the drainage area for each node, which requires a huge amount of communication when run in parallel. To overcome this difficulty, we developed two parallel algorithms for LEMs with a stream net. One algorithm partitions the grid with traditional methods and applies an efficient global reduction algorithm to compute the drainage areas and transport rates for the stream net; the other is based on a new partition algorithm, which first partitions the nodes in catchments between processes and then partitions the cells according to the partition of nodes. Both methods focus on decreasing communication between processes and take advantage of massive computing techniques, and numerical experiments show that both are adequate for large-scale problems with millions of cells. We implemented the two algorithms in our program based on the widely used finite element library deal.II, so that it can be easily coupled with ASPECT.
Partitions in languages and parallel computations

Energy Technology Data Exchange (ETDEWEB)

Burgin, M S; Burgina, E S

1982-05-01

Partitions of entries (linguistic structures) intended for parallel data processing are studied. The representation of formal languages with the aid of such structures is examined, and the relationships between partitions of entries and abstract families of languages and automata are considered. 18 references.

Parallel graded attention in reading: A pupillometric study

NARCIS (Netherlands)

Snell, Joshua; Mathot, Sebastiaan; Mirault, Jonathan; Grainger, Jonathan

2018-01-01

There are roughly two lines of theory to account for recent evidence that word processing is influenced by adjacent orthographic information. One line assumes that multiple words can be processed simultaneously through a parallel graded distribution of visuo-spatial attention. The other line assumes
Feed-forward volume rendering algorithm for moderately parallel MIMD machines

Science.gov (United States)

Yagel, Roni

1993-01-01

Algorithms for direct volume rendering on parallel and vector processors are investigated. Volumes are transformed efficiently on parallel processors by dividing the data into slices and beams of voxels. Equal-sized sets of slices along one axis are distributed to processors. Parallelism is achieved at two levels. Because each slice can be transformed independently of the others, processors transform their assigned slices with no communication, thus providing the maximum possible parallelism at the first level. Within each slice, consecutive beams are incrementally transformed using coherency in the transformation computation. Also, coherency across slices can be exploited to further enhance performance. This coherency yields the second level of parallelism through the use of vector processing or pipelining. Other ongoing efforts include investigations into image reconstruction techniques, load balancing strategies, and improving performance.
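The two ideas in this abstract, distributing equal-sized sets of slices and transforming voxels incrementally, can be sketched as follows. This is a hedged illustration rather than the paper's algorithm: it assumes a linear 3x3 transformation, takes "coherency" to mean that stepping one voxel along x adds a constant vector, and the names `slice_ranges` and `transform_beam` are hypothetical.

```python
def slice_ranges(n_slices, n_procs):
    """Split slice indices into contiguous, near-equal half-open ranges, one per processor."""
    base, extra = divmod(n_slices, n_procs)
    ranges, start = [], 0
    for rank in range(n_procs):
        count = base + (1 if rank < extra else 0)  # first `extra` ranks take one more
        ranges.append((start, start + count))
        start += count
    return ranges

def transform_beam(matrix, start, n):
    """Transform n voxels along the x axis, beginning at `start`.

    Only the first voxel needs a full matrix-vector product; each following
    voxel is obtained by adding the first column of the matrix.
    """
    p = [sum(matrix[r][c] * start[c] for c in range(3)) for r in range(3)]
    step = [matrix[r][0] for r in range(3)]
    out = []
    for _ in range(n):
        out.append(tuple(p))
        p = [p[r] + step[r] for r in range(3)]
    return out

print(slice_ranges(10, 4))  # prints [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Because every range can be processed without reference to the others, each processor can work through its slices with no communication, which is the first level of parallelism the abstract describes.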
Implementation of a parallel version of a regional climate model

Energy Technology Data Exchange (ETDEWEB)

Gerstengarbe, F.W. [ed.; Kuecken, M. [Potsdam-Institut fuer Klimafolgenforschung (PIK), Potsdam (Germany); Schaettler, U. [Deutscher Wetterdienst, Offenbach am Main (Germany). Geschaeftsbereich Forschung und Entwicklung

1997-10-01

A regional climate model developed by the Max Planck Institute for Meteorology and the German Climate Computing Centre in Hamburg, based on the 'Europa' and 'Deutschland' models of the German Weather Service, has been parallelized and implemented on the IBM RS/6000 SP computer system of the Potsdam Institute for Climate Impact Research, including parallel input/output processing, the explicit Eulerian time-step, the semi-implicit corrections, the normal-mode initialization and the physical parameterizations of the German Weather Service. The implementation utilizes Fortran 90 and the Message Passing Interface. The parallelization strategy used is a 2D domain decomposition. This report describes the parallelization strategy, the parallel I/O organization, the influence of different domain decomposition approaches for static and dynamic load imbalances and first numerical results. (orig.)
Parallel worlds? French and anglophone perspectives on health geography.

Science.gov (United States)

Hoyez, Anne-Cécile; Collins, Damian; Fleuret, Sébastien

2016-09-01

In this short commentary, we address the following question: setting aside the issue of translation, do health geographers in France speak the same language as their English-speaking counterparts in various parts of the world? Specifically, do they have comparable empirical, theoretical and political concerns? We briefly survey the 'states of knowledge' in both fields for points of difference and similarity. We devote particular attention to the diverse contexts in which health geography is practiced. Our overarching goal is not to oppose two 'blocks'; nor is it to produce syntheses of two bodies of work. Rather, we seek to identify the diverse contexts in which geographic knowledge of health issues is produced, and to encourage reflection on what these mean for current and future collaboration across linguistic boundaries. We contend that meaningful Anglo-French comparative work will need to be particularly attentive to takes on theory that is 'the same, but different', to alternative spatial lenses (territory vs. place), and to sometimes sharply distinct perspectives on social difference. Copyright © 2016 Elsevier Ltd. All rights reserved.
A comparison of parallel dust and fibre measurements of airborne chrysotile asbestos in a large mine and processing factories in the Russian Federation

NARCIS (Netherlands)

Feletto, Eleonora; Schonfeld, Sara J; Kovalevskiy, Evgeny V; Bukhtiyarov, Igor V; Kashanskiy, Sergey V; Moissonnier, Monika; Straif, Kurt; Kromhout, Hans

2017-01-01

INTRODUCTION: Historic dust concentrations are available in a large-scale cohort study of workers in a chrysotile mine and processing factories in Asbest, Russian Federation. Parallel dust (gravimetric) and fibre (phase-contrast optical microscopy) concentrations collected in 1995, 2007 and 2013/14
A tool for simulating parallel branch-and-bound methods

Science.gov (United States)

Golubeva, Yana; Orlov, Yury; Posypkin, Mikhail

2016-01-01

The Branch-and-Bound (B&B) method is known as one of the most powerful but very resource-consuming global optimization methods. Parallel and distributed computing can efficiently cope with this issue. The major difficulty in the parallel B&B method is the need for dynamic load redistribution. Therefore, the design and study of load balancing algorithms is a separate and very important research topic. This paper presents a tool for simulating the parallel Branch-and-Bound method. The simulator allows one to run load balancing algorithms with various numbers of processors, sizes of the search tree, and characteristics of the supercomputer's interconnect, thereby fostering deep study of load distribution strategies. The process of solving the optimization problem by the B&B method is replaced by a stochastic branching process. Data exchanges are modeled using the concept of logical time. The user-friendly graphical interface to the simulator provides efficient visualization and convenient performance analysis.
Parallel Monte Carlo simulation of aerosol dynamics

KAUST Repository

Zhou, K.; He, Z.; Xiao, M.; Zhang, Z.

2014-01-01

is simulated with a stochastic method (Marcus-Lushnikov stochastic process). Operator splitting techniques are used to synthesize the deterministic and stochastic parts in the algorithm. The algorithm is parallelized using the Message Passing Interface (MPI
Modular and efficient ozone systems based on massively parallel chemical processing in microchannel plasma arrays: performance and commercialization

Science.gov (United States)

Kim, M.-H.; Cho, J. H.; Park, S.-J.; Eden, J. G.

2017-08-01

Plasmachemical systems based on the production of a specific molecule (O3) in literally thousands of microchannel plasmas simultaneously have been demonstrated, developed and engineered over the past seven years, and commercialized. At the heart of this new plasma technology is the plasma chip, a flat aluminum strip fabricated by photolithographic and wet chemical processes and comprising 24-48 channels, micromachined into nanoporous aluminum oxide, with embedded electrodes. By integrating 4-6 chips into a module, the mass output of an ozone microplasma system is scaled linearly with the number of modules operating in parallel. A 115 g/hr (2.7 kg/day) ozone system, for example, is realized by the combined output of 18 modules comprising 72 chips and 1,800 microchannels. The implications of this plasma processing architecture for scaling ozone production capability, and reducing capital and service costs when introducing redundancy into the system, are profound. In contrast to conventional ozone generator technology, microplasma systems operate reliably (albeit with reduced output) in ambient air and humidity levels up to 90%, a characteristic attributable to the water adsorption/desorption properties and electrical breakdown strength of nanoporous alumina. Extensive testing has documented chip and system lifetimes (MTBF) beyond 5,000 hours, and efficiencies >130 g/kWh when oxygen is the feedstock gas. Furthermore, the weight and volume of microplasma systems are a factor of 3-10 lower than those for conventional ozone systems of comparable output. Massively-parallel plasmachemical processing offers functionality, performance, and commercial value beyond that afforded by conventional technology, and is currently in operation in more than 30 countries worldwide.
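The scaling figures quoted for the 115 g/hr system can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated in the abstract; the function name `module_breakdown` is hypothetical, and the per-module output assumes that all modules contribute equally, as the stated linear scaling suggests.

```python
def module_breakdown(modules, chips, channels, grams_per_hour):
    """Derive per-module and per-chip figures from system-level totals."""
    return {
        "chips_per_module": chips / modules,
        "channels_per_chip": channels / chips,
        "grams_per_hour_per_module": grams_per_hour / modules,
        "kg_per_day": grams_per_hour * 24 / 1000,
    }

for name, value in module_breakdown(18, 72, 1800, 115).items():
    print(f"{name}: {value:.2f}")
```

The result is 4 chips per module and 25 channels per chip, both inside the stated ranges of 4-6 chips per module and 24-48 channels per chip, and a daily output of 2.76 kg, which matches the quoted 2.7 kg/day if that figure is truncated.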
A Parallel Algorithm for Connected Component Labelling of Gray-scale Images on Homogeneous Multicore Architectures

International Nuclear Information System (INIS)

Niknam, Mehdi; Thulasiraman, Parimala; Camorlinga, Sergio

2010-01-01

Connected component labelling is an essential step in image processing. We provide a parallel version of Suzuki's sequential connected component algorithm in order to speed up the labelling process. We also modify the algorithm to enable the labelling of gray-scale images. Because of the data dependencies in the algorithm, we used a pipeline-like method to exploit parallelism. The parallel algorithm achieved a speedup of 2.5 for an image size of 256 x 256 pixels using 4 processing threads.
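For readers unfamiliar with the task, the following sequential Python sketch labels the connected components of a small gray-scale image by breadth-first flood fill. It is not Suzuki's algorithm and not the parallel method of the paper; it assumes, purely for illustration, that two pixels are connected when they are 4-neighbours with equal gray values, and the function name `label_components` is hypothetical.

```python
from collections import deque

def label_components(image):
    """Return (labels, count): a label grid (labels start at 1) and the number of components."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]  # 0 means "not yet labelled"
    count = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not labels[ny][nx]
                            and image[ny][nx] == image[y][x]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count

image = [[1, 1, 2],
         [3, 1, 2],
         [3, 3, 1]]
print(label_components(image))
# prints ([[1, 1, 2], [3, 1, 2], [3, 3, 4]], 4)
```

The isolated pixel in the bottom-right corner has the same gray value as the first component but touches none of its pixels, so it receives its own label.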
Distributed and parallel approach for handle and perform huge datasets

Science.gov (United States)

Konopko, Joanna

2015-12-01

Big Data refers to large, dynamic and disparate volumes of data that come from many different sources (tools, machines, sensors, mobile devices) and are uncorrelated with each other. It requires new, innovative and scalable technology to collect, host and analytically process the vast amount of data. A proper architecture is needed for systems that process huge data sets. In this paper, distributed and parallel system architectures are compared using the example of the MapReduce (MR) Hadoop platform and a parallel database platform (DBMS). This paper also analyzes the problem of processing and handling valuable information from petabytes of data. Both paradigms, MapReduce and parallel DBMS, are described and compared. A hybrid architecture approach is also proposed, which could be used to solve the analyzed problem of storing and processing Big Data.
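The MapReduce programming model mentioned above can be illustrated with a single-process Python sketch of its three phases (map, shuffle, reduce) applied to word counting. This is a conceptual illustration only: it does not use the Hadoop API, nothing is distributed, and the function names are hypothetical.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to every record and stream out (key, value) pairs."""
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}

def word_count(lines):
    """Count word occurrences, ignoring case."""
    def mapper(line):
        return ((word.lower(), 1) for word in line.split())
    def reducer(_key, values):
        return sum(values)
    return reduce_phase(shuffle(map_phase(lines, mapper)), reducer)

print(word_count(["big data", "Big cluster"]))
# prints {'big': 2, 'data': 1, 'cluster': 1}
```

In a real deployment the map and reduce calls run on different worker nodes and the shuffle moves data across the network, which is where the architectural comparison with parallel databases becomes relevant.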
Fluid dynamics parallel computer development at NASA Langley Research Center

Science.gov (United States)

Townsend, James C.; Zang, Thomas A.; Dwoyer, Douglas L.

1987-01-01

To accomplish more detailed simulations of highly complex flows, such as the transition to turbulence, fluid dynamics research requires computers much more powerful than any available today. Only parallel processing on multiple-processor computers offers hope for achieving the required effective speeds. Looking ahead to the use of these machines, the fluid dynamicist faces three issues: algorithm development for near-term parallel computers, architecture development for future computer power increases, and assessment of possible advantages of special purpose designs. Two projects at NASA Langley address these issues. Software development and algorithm exploration is being done on the FLEX/32 Parallel Processing Research Computer. New architecture features are being explored in the special purpose hardware design of the Navier-Stokes Computer. These projects are complementary and are producing promising results.
Kinetic-Monte-Carlo-Based Parallel Evolution Simulation Algorithm of Dust Particles

Directory of Open Access Journals (Sweden)

Xiaomei Hu

2014-01-01

The evolution simulation of dust particles provides an important way to analyze the impact of dust on the environment. A KMC-based parallel algorithm is proposed to simulate the evolution of dust particles. In the parallel evolution simulation algorithm, a data distribution scheme and a communication optimization strategy are proposed to balance the load across processes and reduce the communication cost among them. The experimental results show that the simulation of diffusion, sedimentation, and resuspension of dust particles in a virtual campus is realized and that the parallel algorithm shortens the simulation time, which overcomes the limitations of serial computing and makes the simulation of large-scale virtual environments possible.
Parallel computing for event reconstruction in high-energy physics

International Nuclear Information System (INIS)

Wolbers, S.

1993-01-01

Parallel computing has been recognized as a solution to large computing problems. In High Energy Physics, offline event reconstruction of detector data is a very large computing problem that has been solved with parallel computing techniques. A review is given of the parallel programming package CPS (Cooperative Processes Software), developed and used at Fermilab for offline reconstruction of Terabytes of data requiring the delivery of hundreds of Vax-Years per experiment. The Fermilab UNIX farms, consisting of 180 Silicon Graphics workstations and 144 IBM RS6000 workstations, are used to provide the computing power for the experiments. Fermilab has had a long history of providing production parallel computing, starting with the ACP (Advanced Computer Project) Farms in 1986. The Fermilab UNIX Farms have been in production for over 2 years with 24 hour/day service to experimental user groups. Additional tools for management, control and monitoring of these large systems will be described. Possible future directions for parallel computing in High Energy Physics will be given.
Scheduling Parallel Jobs Using Migration and Consolidation in the Cloud

Directory of Open Access Journals (Sweden)

Xiaocheng Liu

2012-01-01

An increasing number of high-performance computing parallel applications leverage the power of the cloud for parallel processing. How to schedule parallel applications so as to improve the quality of service is the key to successfully hosting parallel applications in the cloud. The large scale of the cloud makes parallel job scheduling more complicated, as even a simple parallel job scheduling problem is NP-complete. In this paper, we propose a parallel job scheduling algorithm named MEASY. MEASY adopts migration and consolidation to enhance the most popular EASY scheduling algorithm. Our extensive experiments on well-known workloads show that our algorithm takes very good care of the quality of service. For two common parallel job scheduling objectives, our algorithm improves the average response time by up to 41.1% (23.1% on average) and the average slowdown by up to 82.9% (69.3% on average). Our algorithm remains robust even with inaccurate CPU usage estimation and high migration cost. Our approach involves only a trivial modification of EASY and requires no additional technique; it is practical and effective in the cloud environment.
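The two objectives named in this abstract can be made concrete with a small Python sketch. It uses the definitions common in the job scheduling literature (response time = completion time minus submission time; slowdown = response time divided by run time), which the paper itself may refine, for example with a bounded slowdown. The function names and the job tuples are hypothetical, and this is not an implementation of EASY or MEASY.

```python
def schedule_metrics(jobs):
    """Compute (average response time, average slowdown) for a finished schedule.

    jobs is a list of (submit_time, start_time, run_time) tuples.
    """
    responses = [start + run - submit for submit, start, run in jobs]
    slowdowns = [resp / run for resp, (_, _, run) in zip(responses, jobs)]
    n = len(jobs)
    return sum(responses) / n, sum(slowdowns) / n

def improvement(baseline, new):
    """Relative improvement of `new` over `baseline` for a lower-is-better metric."""
    return (baseline - new) / baseline

# Two jobs submitted at time 0; the second waits for the first to finish.
avg_response, avg_slowdown = schedule_metrics([(0, 0, 10), (0, 10, 5)])
print(avg_response, avg_slowdown)  # prints 12.5 2.0
```

Short jobs that wait a long time dominate the average slowdown, which is one reason a scheduler can improve slowdown by a much larger percentage than response time, as in the figures quoted above.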
Parallel embedded systems: where real-time and low-power meet

DEFF Research Database (Denmark)

Karakehayov, Zdravko; Guo, Yu

2008-01-01

This paper introduces a combination of models and proofs for optimal power management via Dynamic Frequency Scaling and Dynamic Voltage Scaling. The approach is suitable for systems on a chip or microcontrollers where processors run in parallel with embedded peripherals. We have developed...... a software tool, called CASTLE, to provide computer assistance in the design process of energy-aware embedded systems. The tool considers single processor and parallel architectures. An example shows an energy reduction of 23% when the tool allocates two microcontrollers for parallel execution....
Stampi: a message passing library for distributed parallel computing. User's guide, second edition

International Nuclear Information System (INIS)

Imamura, Toshiyuki; Koide, Hiroshi; Takemiya, Hiroshi

2000-02-01

A new message passing library, Stampi, has been developed to enable computation across arbitrary combinations of different kinds of parallel computers, with MPI (Message Passing Interface) as the single interface for communication. Stampi is based on the MPI2 specification; it realizes dynamic process creation on different machines and communication with the spawned processes within the scope of MPI semantics. The main features of Stampi are: (i) automatic switching between external and internal communication, (ii) message routing/relaying with a routing module, (iii) dynamic process creation, (iv) support for two types of connection, Master/Slave and Client/Server, and (v) support for communication with Java applets. Vendors have implemented MPI libraries as closed systems within one parallel machine or their own systems, and have not supported both functions, namely process creation on and communication with external machines. Stampi supports both functions and enables distributed parallel computing. Stampi has currently been implemented on COMPACS (COMplex PArallel Computer System) introduced in CCSE, comprising five parallel computers and one graphics workstation, and on eight further kinds of parallel machines, fourteen systems in total. Stampi provides MPI communication functionality on them. This report mainly describes the usage of Stampi. (author)
Parallel phase model : a programming model for high-end parallel machines with manycores.

Energy Technology Data Exchange (ETDEWEB)

Wu, Junfeng (Syracuse University, Syracuse, NY); Wen, Zhaofang; Heroux, Michael Allen; Brightwell, Ronald Brian

2009-04-01

This paper presents a parallel programming model, Parallel Phase Model (PPM), for next-generation high-end parallel machines based on a distributed memory architecture consisting of a networked cluster of nodes with a large number of cores on each node. PPM has a unified high-level programming abstraction that facilitates the design and implementation of parallel algorithms to exploit both the parallelism of the many cores and the parallelism at the cluster level. The programming abstraction will be suitable for expressing both fine-grained and coarse-grained parallelism. It includes a few high-level parallel programming language constructs that can be added as an extension to an existing (sequential or parallel) programming language such as C; and the implementation of PPM also includes a light-weight runtime library that runs on top of an existing network communication software layer (e.g. MPI). Design philosophy of PPM and details of the programming abstraction are also presented. Several unstructured applications that inherently require high-volume random fine-grained data accesses have been implemented in PPM with very promising results.
Systematic approach for deriving feasible mappings of parallel algorithms to parallel computing platforms

NARCIS (Netherlands)

Arkin, Ethem; Tekinerdogan, Bedir; Imre, Kayhan M.

2017-01-01

The need for high-performance computing together with the increasing trend from single processor to parallel computer architectures has leveraged the adoption of parallel computing. To benefit from parallel computing power, usually parallel algorithms are defined that can be mapped and executed
Parallel Object-Oriented Computation Applied to a Finite Element Problem

Directory of Open Access Journals (Sweden)

Jon B. Weissman

1993-01-01

The conventional wisdom in the scientific computing community is that the best way to solve large-scale numerically intensive scientific problems on today's parallel MIMD computers is to use Fortran or C programmed in a data-parallel style using low-level message-passing primitives. This approach inevitably leads to nonportable codes and extensive development time, and restricts parallel programming to the domain of the expert programmer. We believe that these problems are not inherent to parallel computing but are the result of the programming tools used. We will show that comparable performance can be achieved with little effort if better tools that present higher level abstractions are used. The vehicle for our demonstration is a 2D electromagnetic finite element scattering code we have implemented in Mentat, an object-oriented parallel processing system. We briefly describe the application, Mentat, and the implementation, and present performance results for both a Mentat and a hand-coded parallel Fortran version.
Mechanisms mediating parallel action monitoring in fronto-striatal circuits.

Science.gov (United States)

Beste, Christian; Ness, Vanessa; Lukas, Carsten; Hoffmann, Rainer; Stüwe, Sven; Falkenstein, Michael; Saft, Carsten

2012-08-01

Flexible response adaptation and the control of conflicting information play a pivotal role in daily life. Yet, little is known about the neuronal mechanisms mediating parallel control of these processes. We examined these mechanisms using a multi-methodological approach that integrated data from event-related potentials (ERPs) with structural MRI data and source localisation using sLORETA. Moreover, we calculated evoked wavelet oscillations. We applied this multi-methodological approach in healthy subjects and patients in a prodromal phase of a major basal ganglia disorder (i.e., Huntington's disease), to directly focus on fronto-striatal networks. Behavioural data indicated, especially the parallel execution of conflict monitoring and flexible response adaptation was modulated across the examined cohorts. When both processes do not co-incide a high integrity of fronto-striatal loops seems to be dispensable. The neurophysiological data suggests that conflict monitoring (reflected by the N2 ERP) and working memory processes (reflected by the P3 ERP) differentially contribute to this pattern of results. Flexible response adaptation under the constraint of high conflict processing affected the N2 and P3 ERP, as well as their delta frequency band oscillations. Yet, modulatory effects were strongest for the N2 ERP and evoked wavelet oscillations in this time range. The N2 ERPs were localized in the anterior cingulate cortex (BA32, BA24). Modulations of the P3 ERP were localized in parietal areas (BA7). In addition, MRI-determined caudate head volume predicted modulations in conflict monitoring, but not working memory processes. The results show how parallel conflict monitoring and flexible adaptation of action is mediated via fronto-striatal networks. While both, response monitoring and working memory processes seem to play a role, especially response selection processes and ACC-basal ganglia networks seem to be the driving force in mediating parallel conflict

Fast robot kinematics modeling by using a parallel simulator (PSIM)

International Nuclear Information System (INIS)

El-Gazzar, H.M.; Ayad, N.M.A.

2002-01-01

High-speed computers are strongly needed not only for solving scientific and engineering problems, but also for numerous industrial applications. Such applications include computer-aided design, oil exploration, weather prediction, space applications and safety of nuclear reactors. The rapid development in VLSI technology makes it possible to implement time-consuming algorithms in real-time situations. Parallel processing approaches can now be used to reduce the processing time for models of very high mathematical structure, such as the kinematics modeling of robot manipulators. This system is used to construct and evaluate the performance and cost effectiveness of several proposed methods to solve the Jacobian algorithm. Parallelism is introduced into the algorithms by using different task allocations and dividing the whole job into subtasks. Detailed analysis is performed and results are obtained for the case of a six-DOF (degree of freedom) robot arm (Stanford Arm). Execution time comparisons between Von Neumann (uniprocessor) and parallel processor architectures, obtained using the parallel simulator package (PSIM), are presented. The results strongly favour the parallel techniques, with improvements of at least fifty percent. Further studies are needed to determine the convenient and optimum number of processors.
Fast robot kinematics modeling by using a parallel simulator (PSIM)

Energy Technology Data Exchange (ETDEWEB)

El-Gazzar, H M; Ayad, N M.A. [Atomic Energy Authority, Reactor Dept., Computer and Control Lab., P.O. Box no 13759 (Egypt)

2002-09-15

High-speed computers are strongly needed not only for solving scientific and engineering problems, but also for numerous industrial applications. Such applications include computer-aided design, oil exploration, weather prediction, space applications and safety of nuclear reactors. The rapid development in VLSI technology makes it possible to implement time-consuming algorithms in real-time situations. Parallel processing approaches can now be used to reduce the processing time for models of very high mathematical structure, such as the kinematics modeling of robot manipulators. This system is used to construct and evaluate the performance and cost effectiveness of several proposed methods to solve the Jacobian algorithm. Parallelism is introduced into the algorithms by using different task allocations and dividing the whole job into subtasks. Detailed analysis is performed and results are obtained for the case of a six-DOF (degree of freedom) robot arm (Stanford Arm). Execution time comparisons between Von Neumann (uniprocessor) and parallel processor architectures, obtained using the parallel simulator package (PSIM), are presented. The results strongly favour the parallel techniques, with improvements of at least fifty percent. Further studies are needed to determine the convenient and optimum number of processors.
Optimization of parameter calculation in the production history matching process using Parallel Virtual Machine (PVM); Otimizacao do calculo de parametros no processo de ajuste de historicos de producao usando PVM

Energy Technology Data Exchange (ETDEWEB)

Vargas Cuervo, Carlos Hernan

1997-03-01

The main objective of this work is to develop a methodology to optimize the simultaneous computation of two parameters in the process of production history matching. This work describes a procedure to minimize an objective function established to find the values of the parameters which are modified in the process. The parameters are chosen after a sensitivity analysis. Two optimization methods are tested: a Region Search Method (MBR) and the Polytope Method. Both are based on direct search methods, which do not require the derivative of the function. The software PVM (Parallel Virtual Machine) is used to parallelize the simulation runs, accelerating the process and allowing the search for multiple solutions. The methodology is validated on two reservoir models: one homogeneous and the other heterogeneous. The advantages of each method and of the parallelization are also presented. (author)
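The idea in the record above, a derivative-free search over two parameters in which the trial evaluations are run concurrently, can be illustrated with a short sketch. Everything here is an assumption made for illustration: a simple compass search stands in for the Region Search and Polytope methods, Python's `concurrent.futures` stands in for PVM, and the `mismatch` function is a hypothetical quadratic replacing a real reservoir simulation run.

```python
from concurrent.futures import ThreadPoolExecutor

def mismatch(params):
    """Hypothetical stand-in for one simulation run: squared distance from (3, -1)."""
    x, y = params
    return (x - 3.0) ** 2 + (y + 1.0) ** 2

def parallel_compass_search(objective, start, step=1.0, tol=1e-6, workers=4):
    """Derivative-free search over two parameters; trial points run concurrently."""
    best, best_val = tuple(start), objective(start)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while step > tol:
            x, y = best
            trial = [(x + step, y), (x - step, y), (x, y + step), (x, y - step)]
            values = list(pool.map(objective, trial))   # four independent evaluations
            val, point = min(zip(values, trial))
            if val < best_val:
                best, best_val = point, val             # move to the better point
            else:
                step /= 2.0                             # no improvement: refine the step
    return best, best_val
```

Calling `parallel_compass_search(mismatch, (0.0, 0.0))` converges to the minimum of the stand-in function near (3, -1). A thread pool keeps the sketch portable; if each evaluation were a CPU-bound Python function rather than an external simulator run, a process pool would be the more suitable choice.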
A tomograph VMEbus parallel processing data acquisition system

International Nuclear Information System (INIS)

Wilkinson, N.A.; Rogers, J.G.; Atkins, M.S.

1989-01-01

This paper describes a VME-based data acquisition system suitable for the development of Positron Volume Imaging tomographs which use 3-D data for improved image resolution over slice-oriented tomographs. The data acquisition must be flexible enough to accommodate several 3-D reconstruction algorithms; hence, a software-based system is most suitable. Furthermore, because of the increased dimensions and resolution of volume imaging tomographs, the raw data event rate is greater than that of slice-oriented machines. These dual requirements are met by our data acquisition system. Flexibility is achieved through an array of processors connected over a VMEbus, operating asynchronously and in parallel. High raw data throughput is achieved using a dedicated high speed data transfer device available for the VMEbus. The device can attain a raw data rate of 2.5 million coincidence events per second for raw events which are 64 bits wide.
Benefits of Parallel I/O in Ab Initio Nuclear Physics Calculations

International Nuclear Information System (INIS)

Laghave, Nikhil; Sosonkina, Masha; Maris, Pieter; Vary, James P.

2009-01-01

Many modern scientific applications rely on highly parallel calculations, which scale to tens of thousands of processors. However, most applications do not concentrate on parallelizing input/output operations. In particular, sequential I/O has been identified as a bottleneck for the highly scalable MFDn (Many Fermion Dynamics for nuclear structure) code performing ab initio nuclear structure calculations. In this paper, we develop interfaces and parallel I/O procedures to use a well-known parallel I/O library in MFDn. As a result, we gain efficient input/output of large datasets along with their portability and ease of use in the downstream processing.
Parallel algorithms

CERN Document Server

Casanova, Henri; Robert, Yves

2008-01-01

""…The authors of the present book, who have extensive credentials in both research and instruction in the area of parallelism, present a sound, principled treatment of parallel algorithms. … This book is very well written and extremely well designed from an instructional point of view. … The authors have created an instructive and fascinating text. The book will serve researchers as well as instructors who need a solid, readable text for a course on parallelism in computing. Indeed, for anyone who wants an understandable text from which to acquire a current, rigorous, and broad vi
Relationships between music training, speech processing, and word learning: a network perspective.

Science.gov (United States)

Elmer, Stefan; Jäncke, Lutz

2018-03-15

Numerous studies have documented the behavioral advantages conferred on professional musicians and children undergoing music training in processing speech sounds varying in the spectral and temporal dimensions. These beneficial effects have previously often been associated with local functional and structural changes in the auditory cortex (AC). However, this perspective is oversimplified, in that it does not take into account the intrinsic organization of the human brain, namely, neural networks and oscillatory dynamics. Therefore, we propose a new framework for extending these previous findings to a network perspective by integrating multimodal imaging, electrophysiology, and neural oscillations. In particular, we provide concrete examples of how functional and structural connectivity can be used to model simple neural circuits exerting a modulatory influence on AC activity. In addition, we describe how such a network approach can be used for better comprehending the beneficial effects of music training on more complex speech functions, such as word learning. © 2018 New York Academy of Sciences.
Vectorization, parallelization and porting of nuclear codes. 2001

International Nuclear Information System (INIS)

Akiyama, Mitsunaga; Katakura, Fumishige; Kume, Etsuo; Nemoto, Toshiyuki; Tsuruoka, Takuya; Adachi, Masaaki

2003-07-01

Several computer codes in the nuclear field have been vectorized, parallelized and ported to the supercomputer system at the Center for Promotion of Computational Science and Engineering of the Japan Atomic Energy Research Institute. We dealt with 10 codes in fiscal 2001. In this report, the parallelization of Neutron Radiography for 3 Dimensional CT code NR3DCT, the vectorization of unsteady-state heat conduction code THERMO3D, the porting of initial program of MHD simulation, the tuning of Heat And Mass Balance Analysis Code HAMBAC, the porting and parallelization of Monte Carlo N-Particle transport code MCNP4C3, the porting and parallelization of Monte Carlo N-Particle transport code system MCNPX2.1.5, the porting of induced activity calculation code CINAC-V4, the use of VisLink library in multidimensional two-fluid model code ACD3D and the porting of experiment data processing code from GS8500 to SR8000 are described. (author)
Configuration affects parallel stent grafting results.

Science.gov (United States)

Tanious, Adam; Wooster, Mathew; Armstrong, Paul A; Zwiebel, Bruce; Grundy, Shane; Back, Martin R; Shames, Murray L

2018-05-01

A number of adjunctive "off-the-shelf" procedures have been described to treat complex aortic diseases. Our goal was to evaluate parallel stent graft configurations and to determine an optimal formula for these procedures. This is a retrospective review of all patients at a single medical center treated with parallel stent grafts from January 2010 to September 2015. Outcomes were evaluated on the basis of parallel graft orientation, type, and main body device. Primary end points included parallel stent graft compromise and overall endovascular aneurysm repair (EVAR) compromise. There were 78 patients treated with a total of 144 parallel stents for a variety of pathologic processes. There was a significant correlation between main body oversizing and snorkel compromise (P = .0195) and overall procedural complication (P = .0019) but not with endoleak rates. Patients were organized into the following oversizing groups for further analysis: 0% to 10%, 10% to 20%, and >20%. Those oversized into the 0% to 10% group had the highest rate of overall EVAR complication (73%; P = .0003). There were no significant correlations between any one particular configuration and overall procedural complication. There was also no significant correlation between total number of parallel stents employed and overall complication. Composite EVAR configuration had no significant correlation with individual snorkel compromise, endoleak, or overall EVAR or procedural complication. The configuration most prone to individual snorkel compromise and overall EVAR complication was a four-stent configuration with two stents in an antegrade position and two stents in a retrograde position (60% complication rate). The configuration most prone to endoleak was one or two stents in retrograde position (33% endoleak rate), followed by three stents in an all-antegrade position (25%). There was a significant correlation between individual stent configuration and stent compromise (P = .0385), with 31
Parallel and Serial Grouping of Image Elements in Visual Perception

Science.gov (United States)

Houtkamp, Roos; Roelfsema, Pieter R.

2010-01-01

The visual system groups image elements that belong to an object and segregates them from other objects and the background. Important cues for this grouping process are the Gestalt criteria, and most theories propose that these are applied in parallel across the visual scene. Here, we find that Gestalt grouping can indeed occur in parallel in some…
A tool for simulating parallel branch-and-bound methods

Directory of Open Access Journals (Sweden)

Golubeva Yana

2016-01-01

The Branch-and-Bound (B&B) method is known as one of the most powerful but also most resource-consuming global optimization methods. Parallel and distributed computing can efficiently cope with this issue. The major difficulty in the parallel B&B method is the need for dynamic load redistribution. The design and study of load balancing algorithms is therefore a separate and very important research topic. This paper presents a tool for simulating the parallel Branch-and-Bound method. The simulator allows one to run load balancing algorithms with various numbers of processors, sizes of the search tree, and characteristics of the supercomputer's interconnect, thereby fostering deep study of load distribution strategies. The process of solving the optimization problem by the B&B method is replaced by a stochastic branching process. Data exchanges are modeled using the concept of logical time. The user-friendly graphical interface to the simulator provides efficient visualization and convenient performance analysis.
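The two modeling ideas named in the record above, replacing the actual B&B solve by a stochastic branching process and counting logical time steps, can be sketched in a few lines. This is a toy model under stated assumptions, not the tool described in the paper: the names `branches` and `simulate`, the branching probability, the depth limit, and the "idle processor takes one node from the most loaded processor" balancing rule are all hypothetical, and communication cost is not modeled.

```python
import random

def branches(node_id, depth, p, max_depth, seed):
    """Stochastic branching rule: a node spawns two children with probability p.

    Seeding by node id keeps the tree identical for any number of processors.
    """
    if depth >= max_depth:
        return False
    return random.Random(seed * 1_000_003 + node_id).random() < p

def simulate(num_procs, p=0.7, max_depth=12, seed=1):
    """Return (logical time steps, nodes processed) for a simulated parallel B&B run."""
    stacks = [[] for _ in range(num_procs)]
    stacks[0].append((0, 0))                  # (node_id, depth) of the root
    ticks = processed = 0
    while any(stacks):
        # Load balancing: each idle processor takes one node from the most loaded one.
        for s in stacks:
            if not s:
                donor = max(stacks, key=len)
                if len(donor) >= 2:
                    s.append(donor.pop(0))    # hand over the oldest pending node
        # One logical time step: every busy processor expands one node.
        for s in stacks:
            if s:
                node_id, depth = s.pop()
                processed += 1
                if branches(node_id, depth, p, max_depth, seed):
                    s.append((2 * node_id + 1, depth + 1))
                    s.append((2 * node_id + 2, depth + 1))
        ticks += 1
    return ticks, processed
```

Comparing `simulate(1)` with `simulate(4)` on the same seed gives the same number of processed nodes but, in general, fewer logical time steps with more processors, which is the kind of measurement a load balancing study would build on.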
Efficient parallel implementation of active appearance model fitting algorithm on GPU.

Science.gov (United States)

Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou

2014-01-01

The active appearance model (AAM) is one of the most powerful model-based object detection and tracking methods and has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine-grained parallelism, in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experimental results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures.
History and future perspectives of the Monte Carlo shell model -from Alphleet to K computer-

International Nuclear Information System (INIS)

Shimizu, Noritaka; Otsuka, Takaharu; Utsuno, Yutaka; Mizusaki, Takahiro; Honma, Michio; Abe, Takashi

2013-01-01

We report a history of the development of the Monte Carlo shell model (MCSM). The MCSM was proposed in order to perform large-scale shell-model calculations that the direct diagonalization method cannot reach. Since 1999, PC clusters have been used for parallel computation of the MCSM. Since 2011, we have participated in the High Performance Computing Infrastructure Strategic Program and developed a new MCSM code for current massively parallel computers such as the K computer. We discuss future perspectives concerning a new framework and parallel computation of the MCSM by incorporating the conjugate gradient method and energy-variance extrapolation.
Development and control towards a parallel water hydraulic weld/cut robot for machining processes in ITER vacuum vessel

International Nuclear Information System (INIS)

Wu Huapeng; Handroos, Heikki; Pessi, Pekka; Kilkki, Juha; Jones, Lawrence

2005-01-01

This paper presents a special robot, able to carry out welding and machining processes from inside the ITER vacuum vessel (VV), consisting of a five degree-of-freedom parallel mechanism, mounted on a carriage driven by two electric motors on a rack. The kinematic design of the robot has been optimised for ITER access and a hydraulically actuated pre-prototype built. A hybrid controller is designed for the robot, including position, speed and pressure feedback loops to achieve high accuracy and high dynamic performances. Finally, the experimental tests are given and discussed.
Neural nets for massively parallel optimization

Science.gov (United States)

Dixon, Laurence C. W.; Mills, David

1992-07-01

To apply massively parallel processing systems to the solution of large scale optimization problems it is desirable to be able to evaluate any function $f(z)$, $z \in \mathbb{R}^n$, in a parallel manner. The theorem of Cybenko, Hecht-Nielsen, Hornik, Stinchcombe and White, and Funahashi shows that this can be achieved by a neural network with one hidden layer. In this paper we address the problem of the number of nodes required in the layer to achieve a given accuracy in the function and gradient values at all points within a given $n$-dimensional interval. The type of activation function needed to obtain nonsingular Hessian matrices is described and a strategy for obtaining accurate minimal networks presented.
Multirate-based fast parallel algorithms for 2-D DHT-based real-valued discrete Gabor transform.

Science.gov (United States)

Tao, Liang; Kwan, Hon Keung

2012-07-01

Novel algorithms for the multirate and fast parallel implementation of the 2-D discrete Hartley transform (DHT)-based real-valued discrete Gabor transform (RDGT) and its inverse transform are presented in this paper. A 2-D multirate-based analysis convolver bank is designed for the 2-D RDGT, and a 2-D multirate-based synthesis convolver bank is designed for the 2-D inverse RDGT. The parallel channels in each of the two convolver banks have a unified structure and can apply the 2-D fast DHT algorithm to speed up their computations. The computational complexity of each parallel channel is low and is independent of the Gabor oversampling rate. All the 2-D RDGT coefficients of an image are computed in parallel during the analysis process and can be reconstructed in parallel during the synthesis process. The computational complexity and time of the proposed parallel algorithms are analyzed and compared with those of the existing fastest algorithms for 2-D discrete Gabor transforms. The results indicate that the proposed algorithms are the fastest, which make them attractive for real-time image processing.
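For readers unfamiliar with the transform underlying the record above, the 2-D discrete Hartley transform can be written directly from its definition with the real-valued kernel cas(t) = cos(t) + sin(t). The sketch below is only that direct definition, evaluated in O(M²N²) operations; it is not the fast DHT algorithm or the multirate convolver-bank structure of the paper, and the function names `cas`, `dht2` and `idht2` are chosen here for illustration.

```python
import math

def cas(t):
    """Hartley kernel: cas(t) = cos(t) + sin(t)."""
    return math.cos(t) + math.sin(t)

def dht2(img):
    """2-D discrete Hartley transform of an M x N real array (list of lists)."""
    M, N = len(img), len(img[0])
    return [[sum(img[x][y] * cas(2 * math.pi * (u * x / M + v * y / N))
                 for x in range(M) for y in range(N))
             for v in range(N)]
            for u in range(M)]

def idht2(coef):
    """Inverse transform: the DHT is its own inverse up to a factor 1/(M*N)."""
    M, N = len(coef), len(coef[0])
    return [[value / (M * N) for value in row] for row in dht2(coef)]
```

Because the kernel is real, a real image yields real coefficients, and applying `idht2` to `dht2(img)` recovers `img` up to rounding error. Each output coefficient is an independent sum, which is what makes the transform amenable to parallel evaluation.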
Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments

Energy Technology Data Exchange (ETDEWEB)

Jin, Shuangshuang; Chen, Yousu; Wu, Di; Diao, Ruisheng; Huang, Zhenyu

2015-12-09

Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operations. It involves a large set of differential and algebraic equations, which is computationally intensive and challenging to solve with a single-processor dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation, using Open Multi-processing (OpenMP) on a shared-memory platform and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performance in running parallel dynamic simulation is compared and demonstrated.
Parallel implementation of DNA sequences matching algorithms using PWM on GPU architecture.

Science.gov (United States)

Sharma, Rahul; Gupta, Nitin; Narang, Vipin; Mittal, Ankush

2011-01-01

Positional Weight Matrices (PWMs) are widely used in the representation and detection of Transcription Factor Binding Sites (TFBSs) on DNA. We implement an online PWM search algorithm on a parallel architecture. Large PWM data sets can be processed in parallel on Graphics Processing Unit (GPU) systems, which helps match sequences at a faster rate. Our method makes extensive use of the highly multithreaded architecture and shared memory of multi-core GPUs. Efficient use of shared memory is required to optimise parallel reduction in CUDA. Our optimised method achieves a speedup of 230-280x over the linear implementation on a GeForce GTX 280 GPU.
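The basic PWM search that the record above accelerates can be sketched sequentially as follows. This is a plain CPU illustration, not the authors' CUDA implementation: the function name `pwm_scan`, the list-of-dictionaries layout of the matrix, and the integer scores in the example are assumptions. Each window is scored independently of the others, which is the property a GPU version would presumably exploit.

```python
def pwm_scan(sequence, pwm, threshold):
    """Slide a positional weight matrix over a DNA sequence.

    pwm: list with one dict per motif position, mapping base -> score.
    Returns (start_index, score) for every window scoring >= threshold.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        # Score of one window: sum of the per-position weights of its bases.
        score = sum(pwm[i][sequence[start + i]] for i in range(width))
        if score >= threshold:
            hits.append((start, score))
    return hits

# Hypothetical 3-column matrix favouring the motif "ACG".
EXAMPLE_PWM = [
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 2, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": 2, "T": -1},
]
```

With this example matrix, `pwm_scan("TTACGACGT", EXAMPLE_PWM, 6)` returns `[(2, 6), (5, 6)]`, the two exact occurrences of the motif.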
Mapping brain structure and function: cellular resolution, global perspective.

Science.gov (United States)

Zupanc, Günther K H

2017-04-01

A comprehensive understanding of the brain requires analysis, although from a global perspective, with cellular, and even subcellular, resolution. An important step towards this goal involves the establishment of three-dimensional high-resolution brain maps, incorporating brain-wide information about the cells and their connections, as well as the chemical architecture. The progress made in such anatomical brain mapping in recent years has been paralleled by the development of physiological techniques that enable investigators to generate global neural activity maps, also with cellular resolution, while simultaneously recording the organism's behavioral activity. Combination of the high-resolution anatomical and physiological maps, followed by theoretical systems analysis of the deduced network, will offer unprecedented opportunities for a better understanding of how the brain, as a whole, processes sensory information and generates behavior.
Final Report: Migration Mechanisms for Large-scale Parallel Applications

Energy Technology Data Exchange (ETDEWEB)

Jason Nieh

2009-10-30

Process migration is the ability to transfer a process from one machine to another. It is a useful facility in distributed computing environments, especially as computing devices become more pervasive and Internet access becomes more ubiquitous. The potential benefits of process migration, among others, are fault resilience by migrating processes off of faulty hosts, data access locality by migrating processes closer to the data, better system response time by migrating processes closer to users, dynamic load balancing by migrating processes to less loaded hosts, and improved service availability and administration by migrating processes before host maintenance so that applications can continue to run with minimal downtime. Although process migration provides substantial potential benefits and many approaches have been considered, achieving transparent process migration functionality has been difficult in practice. To address this problem, our work has designed, implemented, and evaluated new and powerful transparent process checkpoint-restart and migration mechanisms for desktop, server, and parallel applications that operate across heterogeneous cluster and mobile computing environments. A key aspect of this work has been to introduce lightweight operating system virtualization to provide processes with private, virtual namespaces that decouple and isolate processes from dependencies on the host operating system instance. This decoupling enables processes to be transparently checkpointed and migrated without modifying, recompiling, or relinking applications or the operating system. Building on this lightweight operating system virtualization approach, we have developed novel technologies that enable (1) coordinated, consistent checkpoint-restart and migration of multiple processes, (2) fast checkpointing of process and file system state to enable restart of multiple parallel execution environments and time travel, (3) process migration across heterogeneous
