Vera, I.
The National Electric Company of Venezuela, C.A.D.A.F.E., is sponsoring the development of this experiment, which represents Venezuela's first scientific experiment in space. The apparatus for the automatic casting of polymer thin films will be contained in NASA's payload No. G-559 of the Get Away Special program for a future orbital space flight in the U.S. Space Shuttle. Semi-permeable polymer membranes have important applications in a variety of fields, such as medicine, energy, and pharmaceuticals, and in general fluid separation processes such as reverse osmosis, ultra-filtration, and electro-dialysis. The casting of semi-permeable membranes in space will help to identify the roles of convection in determining the structure of these membranes.
Treatment of radioactive liquid wastes on semi-permeable membranes
Antonescu, M.; Deleanu, N.; Nechifor, G.
1997-01-01
At present, among the separation processes applied worldwide, those using membranes are considered the most advanced owing to their advantages: high efficiency, cost-effectiveness in application, universality of the equipment used, and operation under non-destructive and non-polluting conditions. The most significant results of the treatment experiments are: - a reduction of more than 70% in the chemical oxygen consumption for the solution simulating the POD waste; - the results for the solution simulating the secondary waste from decontamination by the POD procedure appear to be the best (with retentions of 88.5%, 76.5% and 65.7% for strontium, cobalt and manganese, respectively). Significant cost reductions and efficient technological schemes can be obtained by combining semi-permeable membrane separation techniques with other efficient, currently used separation, concentration and purification procedures suited to the given situation.
Modelling the effects of porous and semi-permeable layers on corrosion processes
King, F.; Kolar, M.; Shoesmith, D.W.
1996-09-01
Porous and semi-permeable layers play a role in many corrosion processes. Porous layers may simply affect the rate of corrosion by affecting the rate of mass transport of reactants and products to and from the corroding surface. Semi-permeable layers can further affect the corrosion process by reacting with products and/or reactants. Reactions in semi-permeable layers include redox processes involving electron transfer, adsorption, ion-exchange and complexation reactions and precipitation/dissolution processes. Examples of porous and semi-permeable layers include non-reactive salt films, precipitate layers consisting of redox-active species in multiple oxidation states (e.g., Fe oxide films), clay and soil layers and biofilms. Examples of these various types of processes will be discussed and modelling techniques developed from studies for the disposal of high-level nuclear waste presented. (author). 48 refs., 1 tab., 12 figs
A framework for understanding semi-permeable barrier effects on migratory ungulates
Sawyer, Hall; Kauffman, Matthew J.; Middleton, Arthur D.; Morrison, Thomas A.; Nielson, Ryan M.; Wyckoff, Teal B.
2013-01-01
1. Impermeable barriers to migration can greatly constrain the set of possible routes and ranges used by migrating animals. For ungulates, however, many forms of development are semi-permeable, and making informed management decisions about their potential impacts to the persistence of migration routes is difficult because our knowledge of how semi-permeable barriers affect migratory behaviour and function is limited. 2. Here, we propose a general framework to advance the understanding of barrier effects on ungulate migration by emphasizing the need to (i) quantify potential barriers in terms that allow behavioural thresholds to be considered, (ii) identify and measure behavioural responses to semi-permeable barriers and (iii) consider the functional attributes of the migratory landscape (e.g. stopovers) and how the benefits of migration might be reduced by behavioural changes. 3. We used global positioning system (GPS) data collected from two subpopulations of mule deer Odocoileus hemionus to evaluate how different levels of gas development influenced migratory behaviour, including movement rates and stopover use at the individual level, and intensity of use and width of migration route at the population level. We then characterized the functional landscape of migration routes as either stopover habitat or movement corridors and examined how the observed behavioural changes affected the functionality of the migration route in terms of stopover use. 4. We found migratory behaviour to vary with development intensity. Our results suggest that mule deer can migrate through moderate levels of development without any noticeable effects on migratory behaviour. However, in areas with more intensive development, animals often detoured from established routes, increased their rate of movement and reduced stopover use, while the overall use and width of migration routes decreased. 5. Synthesis and applications. In contrast to impermeable barriers that impede animal movement
Levy, W.; Henkelmann, B.; Pfister, G.; Bernhoeft, S.; Kirchner, M.; Jakobi, G. [Helmholtz Zentrum Muenchen, German Research Center for Environmental Health, Ingolstaedter Landstrasse 1, D-85764 Neuherberg (Germany); Bassan, R. [Regional Agency for Environmental Prevention and Protection of Veneto, Via Matteotti 27, 35137 Padova (Italy); Kraeuchi, N. [WSL-Swiss Federal Institute for Forest, Snow and Landscape Research, Zuercherstrasse 111, CH-8903 Birmensdorf (Switzerland); Schramm, K.-W., E-mail: schramm@helmholtz-muenchen.d [Helmholtz Zentrum Muenchen, German Research Center for Environmental Health, Ingolstaedter Landstrasse 1, D-85764 Neuherberg (Germany); TUM-Technische Universitaet Muenchen, Department fuer Biowissenschaftliche Grundlagen Weihenstephaner Steig 23, D-85350 Freising (Germany)
2009-12-15
Atmospheric sampling of organochlorine pesticides (OCPs) was conducted using Semi Permeable Membrane Devices (SPMDs) deployed in the Alps at different altitudinal transects for two consecutive exposure periods of half a year and a third simultaneous year-long period. Along all the altitude profiles, the sequestered amounts of OCPs increased in general with altitude. SPMDs were still working as kinetic samplers after half a year for the majority of the OCPs. However, compounds with the lowest octanol-air partition coefficient (K_oa) reached equilibrium within six months. This change in the SPMD uptake was determined for the temperature gradient along the altitude profile influencing K_oa, OCPs availability in the gaseous phase, and SPMD performance. In sum, it seems two effects are working in parallel along the altitude profiles: the change in SPMD performance and the different availability of OCPs along the altitudinal transects determined by their compound properties and concentrations in air. - SPMDs were in different uptake stages regarding OCPs, as they were influenced by the temperature (season, triolein state, and altitude) and K_oa.
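The distinction between kinetic and equilibrium sampling in the abstract above can be illustrated with a generic first-order (one-compartment) uptake model. This is a minimal sketch and not the authors' calculation: the function names, the rate constant k_e, the sampler-air partition coefficient k_sa, the sampler volume v_s and the 50%/95% stage thresholds are all assumptions chosen for illustration.

```python
import math


def sequestered_amount(c_air, k_sa, v_s, k_e, t_days):
    """Amount taken up by a passive sampler under first-order kinetics.

    N(t) = k_sa * v_s * c_air * (1 - exp(-k_e * t))
    All parameter names and units are illustrative assumptions.
    """
    return k_sa * v_s * c_air * (1.0 - math.exp(-k_e * t_days))


def uptake_stage(k_e, t_days):
    """Classify the uptake stage from the fraction of equilibrium reached.

    The 0.5 and 0.95 cut-offs are hypothetical thresholds, not values
    taken from the study.
    """
    fraction = 1.0 - math.exp(-k_e * t_days)
    if fraction < 0.5:
        return "kinetic"
    if fraction < 0.95:
        return "curvilinear"
    return "equilibrium"


# A slowly equilibrating compound versus a fast one, both after ~half a year.
print(uptake_stage(k_e=0.001, t_days=180))
print(uptake_stage(k_e=0.05, t_days=180))
```

In this simple picture a compound with a small partition coefficient has a small sampler capacity and therefore a larger k_e, which is one way to read the observation that the compounds with the lowest K_oa reached equilibrium first.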
Portable, parallel, reusable Krylov space codes
Smith, B.; Gropp, W. [Argonne National Lab., IL (United States)
1994-12-31
Krylov space accelerators are an important component of many algorithms for the iterative solution of linear systems. Each Krylov space method has its own particular advantages and disadvantages; therefore it is desirable to have a variety of them available, all with an identical, easy-to-use interface. A common complaint application programmers have with available software libraries for the iterative solution of linear systems is that they require the programmer to use the data structures provided by the library. The library is not able to work with the data structures of the application code. Hence, application programmers find themselves constantly recoding the Krylov space algorithms. The Krylov space package (KSP) is a data-structure-neutral implementation of a variety of Krylov space methods including preconditioned conjugate gradient, GMRES, BiCG-Stab, transpose-free QMR and CGS. Unlike all other software libraries for linear systems that the authors are aware of, KSP will work with any application code's data structures, in Fortran or C. Due to its data-structure-neutral design, KSP runs unchanged on both sequential and parallel machines. KSP has been tested on workstations, the Intel i860 and Paragon, Thinking Machines CM-5 and the IBM SP1.
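The idea of a data-structure-neutral Krylov solver can be sketched with a preconditioned conjugate gradient routine that never sees the matrix, only callbacks supplied by the application. This is an illustrative sketch, not the KSP interface: the names pcg, apply_a and apply_m_inv are hypothetical, and vectors are plain Python lists for simplicity.

```python
import math


def dot(u, v):
    """Inner product of two equally sized sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def pcg(apply_a, b, apply_m_inv=None, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for a symmetric positive definite operator.

    apply_a(v) returns A*v and apply_m_inv(r) applies the preconditioner;
    the solver itself never touches the matrix storage.
    """
    precond = apply_m_inv if apply_m_inv is not None else (lambda r: list(r))
    x = [0.0] * len(b)
    r = list(b)                      # residual for the zero initial guess
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for iteration in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, iteration
        ap = apply_a(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def laplacian_1d(v):
    """Matrix-free 1D Laplacian stencil (2 on the diagonal, -1 off it)."""
    n = len(v)
    return [2.0 * v[i]
            - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def jacobi(r):
    """Diagonal (Jacobi) preconditioner for the stencil above."""
    return [ri / 2.0 for ri in r]


solution, iterations = pcg(laplacian_1d, [1.0, 0.0, 0.0, 1.0], jacobi)
print(solution, iterations)
```

Because the operator is passed as a function, the same routine would work whether the application stores its matrix as a stencil, a sparse format, or distributed blocks, which is the property the abstract emphasizes.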
Fernández-Castro, M; Martín-Gil, B; Peña-García, I; López-Vallecillo, M; García-Puig, M E
2017-11-01
The aim of this systematic review is to assess the available evidence concerning the effectiveness of semi-permeable dressings on the full range of skin reactions related to radiation therapy in cancer patients, from local erythema to moist desquamation, including subjective symptoms such as pain, discomfort, itchiness and burning, and the effect on daily life activities. The bibliographic search looked for Randomised Clinical Trials (RCTs) indexed in PubMed, CINAHL, Cochrane plus and Biblioteca Nacional de Salud, published in English or Spanish between 2010 and 2015. Data extraction and evaluation of study quality were undertaken by peer reviewers using the Critical Appraisal Skills Programme (CASP). Of 181 studies, nine full texts were assessed. Finally, six RCTs were included in the final synthesis: three analysed the application of Mepilex® Lite in breast cancer and head & neck cancer; one evaluated the application of Mepitel® Film in breast cancer; and two assessed the use of silver nylon dressings in breast cancer and in patients with lower gastrointestinal cancer. The results show that semi-permeable dressings are beneficial in the management of skin toxicity related to radiation therapy. However, rigorous trials showing stronger evidence are needed. © 2017 John Wiley & Sons Ltd.
Positive zeta potential of a negatively charged semi-permeable plasma membrane
Sinha, Shayandev; Jing, Haoyuan; Das, Siddhartha
2017-08-01
The negative charge of the plasma membrane (PM) severely affects the nature of moieties that may enter or leave the cells and controls a large number of ion-interaction-mediated intracellular and extracellular events. In this letter, we report our discovery of a most fascinating scenario, where one interface (e.g., membrane-cytosol interface) of the negatively charged PM shows a positive surface (or ζ) potential, while the other interface (e.g., membrane-electrolyte interface) still shows a negative ζ potential. Therefore, we encounter a completely unexpected situation where an interface (e.g., membrane-cytosol interface) that has a negative surface charge density demonstrates a positive ζ potential. We establish that the attainment of such a property by the membrane can be ascribed to an interplay of the nature of the membrane semi-permeability and the electrostatics of the electric double layer established on either side of the charged membrane. We anticipate that such a membrane property can lead to capabilities of the cell (in terms of accepting or releasing certain kinds of moieties as well as regulating cellular signaling) that were hitherto inconceivable.
Effect of semi-permeable cover system on the bacterial diversity during sewage sludge composting.
Robledo-Mahón, Tatiana; Aranda, Elisabet; Pesciaroli, Chiara; Rodríguez-Calvo, Alfonso; Silva-Castro, Gloria Andrea; González-López, Jesús; Calvo, Concepción
2018-06-01
Sewage sludge composting is a profitable, economically viable and environmentally friendly process. Although there are several types of composting, the combined system of a semipermeable cover film and floor aeration is widely used at industrial scale. However, the linkages between microbial community structure, enzyme activities and physico-chemical factors under these conditions have been poorly explored. Thus, the aim of this study was to investigate the bacterial dynamics and community structure using next-generation sequencing coupled to analyses of microbial enzymatic activity and culture-dependent techniques in a full-scale composting plant. The sewage sludge composting process was conducted using a semi-permeable Gore-tex cover in combination with an air-insufflation system. The highest values of enzymatic activities such as dehydrogenase, protease and arylsulphatase were detected in the first 5 days of composting, suggesting that greater degradation of organic matter took place during this period. The culturable bacteria identified were in agreement with the bacteria found by massive sequencing technologies. The greatest bacterial diversity was detected between days 15 and 30, with Actinomycetales and Bacillales being the predominant orders at the beginning and end of the process. Bacillus was the most representative genus throughout the process. A strong correlation was found between abiotic factors such as total organic content and organic matter and enzymatic activities such as dehydrogenase, alkaline phosphatase, and β-glucosidase activity. Bacterial diversity was strongly influenced by the stage of the process; community-structure change was concomitant with a temperature rise, rendering favorable conditions to stimulate microbial activity and facilitate the change in the microbial community linked to the degradation process. Moreover, results obtained confirmed that the use of semipermeable
Parallel Auxiliary Space AMG Solver for $H(div)$ Problems
Kolev, Tzanio V. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Vassilevski, Panayot S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
2012-12-18
We present a family of scalable preconditioners for matrices arising in the discretization of $H(div)$ problems using the lowest order Raviart-Thomas finite elements. Our approach belongs to the class of "auxiliary space"-based methods and requires only the finite element stiffness matrix plus some minimal additional discretization information about the topology and orientation of mesh entities. Also, we provide a detailed algebraic description of the theory, parallel implementation, and different variants of this parallel auxiliary space divergence solver (ADS) and discuss its relations to the Hiptmair-Xu (HX) auxiliary space decomposition of $H(div)$ [SIAM J. Numer. Anal., 45 (2007), pp. 2483-2509] and to the auxiliary space Maxwell solver AMS [J. Comput. Math., 27 (2009), pp. 604-623]. Finally, an extensive set of numerical experiments demonstrates the robustness and scalability of our implementation on large-scale $H(div)$ problems with large jumps in the material coefficients.
King, F.
1987-01-01
A technique to investigate the mechanism of uniform corrosion in the presence of a semi-permeable membrane is described. For both the anodic and cathodic half-reactions three possible rate-determining steps are considered: transport of species through the bulk solution diffusion layer, transport of species through the membrane and the electrochemical reaction itself. The technique is based on the measurement of the corrosion potential, E_CORR, of a rotating disc electrode under steady-state conditions. The variation of E_CORR with the oxidant concentration, the thickness of the diffusion layer and the membrane thickness is used to identify the rate-determining step for each half-reaction. This technique should be of use in the study of the corrosion behaviour of candidate materials for nuclear waste disposal containers. An understanding of the mechanism of uniform corrosion will enable confident predictions to be made concerning the long-term behaviour of such containers.
Adaptive integrand decomposition in parallel and orthogonal space
Mastrolia, Pierpaolo; Peraro, Tiziano; Primo, Amedeo
2016-01-01
We present the integrand decomposition of multiloop scattering amplitudes in parallel and orthogonal space-time dimensions, d = d_∥ + d_⊥, where d_∥ is the dimension of the parallel space spanned by the legs of the diagrams. When the number n of external legs is n≤4, the corresponding representation of multiloop integrals exposes a subset of integration variables which can be easily integrated away by means of the orthogonality condition of Gegenbauer polynomials. By decomposing the integration momenta along parallel and orthogonal directions, the polynomial division algorithm is drastically simplified. Moreover, the orthogonality conditions of Gegenbauer polynomials can be suitably applied to integrate the decomposed integrand, yielding the systematic annihilation of spurious terms. Consequently, multiloop amplitudes are expressed in terms of integrals corresponding to irreducible scalar products of loop momenta and external ones. We revisit the one-loop decomposition, which turns out to be controlled by the maximum-cut theorem in different dimensions, and we discuss the integrand reduction of two-loop planar and non-planar integrals up to n=8 legs, for arbitrary external and internal kinematics. The proposed algorithm extends to all orders in perturbation theory.
Adaptive integrand decomposition in parallel and orthogonal space
Mastrolia, Pierpaolo [Dipartimento di Fisica ed Astronomia, Università di Padova,Via Marzolo 8, 35131 Padova (Italy); INFN, Sezione di Padova,Via Marzolo 8, 35131 Padova (Italy); Peraro, Tiziano [Higgs Centre for Theoretical Physics, School of Physics and Astronomy,The University of Edinburgh,James Clerk Maxwell Building,Peter Guthrie Tait Road, Edinburgh EH9 3FD, Scotland (United Kingdom); Primo, Amedeo [Dipartimento di Fisica ed Astronomia, Università di Padova,Via Marzolo 8, 35131 Padova (Italy); INFN, Sezione di Padova,Via Marzolo 8, 35131 Padova (Italy)
2016-08-29
We present the integrand decomposition of multiloop scattering amplitudes in parallel and orthogonal space-time dimensions, d = d_∥ + d_⊥, where d_∥ is the dimension of the parallel space spanned by the legs of the diagrams. When the number n of external legs is n≤4, the corresponding representation of multiloop integrals exposes a subset of integration variables which can be easily integrated away by means of the orthogonality condition of Gegenbauer polynomials. By decomposing the integration momenta along parallel and orthogonal directions, the polynomial division algorithm is drastically simplified. Moreover, the orthogonality conditions of Gegenbauer polynomials can be suitably applied to integrate the decomposed integrand, yielding the systematic annihilation of spurious terms. Consequently, multiloop amplitudes are expressed in terms of integrals corresponding to irreducible scalar products of loop momenta and external ones. We revisit the one-loop decomposition, which turns out to be controlled by the maximum-cut theorem in different dimensions, and we discuss the integrand reduction of two-loop planar and non-planar integrals up to n=8 legs, for arbitrary external and internal kinematics. The proposed algorithm extends to all orders in perturbation theory.
Airborne Precision Spacing (APS) Dependent Parallel Arrivals (DPA)
Smith, Colin L.
2012-01-01
The Airborne Precision Spacing (APS) team at the NASA Langley Research Center (LaRC) has been developing a concept of operations to extend the current APS concept to support dependent approaches to parallel or converging runways along with the required pilot and controller procedures and pilot interfaces. A staggered operations capability for the Airborne Spacing for Terminal Arrival Routes (ASTAR) tool was developed and designated as ASTAR10. ASTAR10 has reached a sufficient level of maturity to be validated and tested through a fast-time simulation. The purpose of the experiment was to identify and resolve any remaining issues in the ASTAR10 algorithm, as well as put the concept of operations through a practical test.
Parallelization of the Physical-Space Statistical Analysis System (PSAS)
Larson, J. W.; Guo, J.; Lyster, P. M.
1999-01-01
Atmospheric data assimilation is a method of combining observations with model forecasts to produce a more accurate description of the atmosphere than the observations or forecast alone can provide. Data assimilation plays an increasingly important role in the study of climate and atmospheric chemistry. The NASA Data Assimilation Office (DAO) has developed the Goddard Earth Observing System Data Assimilation System (GEOS DAS) to create assimilated datasets. The core computational components of the GEOS DAS include the GEOS General Circulation Model (GCM) and the Physical-space Statistical Analysis System (PSAS). The need for timely validation of scientific enhancements to the data assimilation system poses computational demands that are best met by distributed parallel software. PSAS is implemented in Fortran 90 using object-based design principles. The analysis portions of the code solve two equations. The first of these is the "innovation" equation, which is solved on the unstructured observation grid using a preconditioned conjugate gradient (CG) method. The "analysis" equation is a transformation from the observation grid back to a structured grid, and is solved by a direct matrix-vector multiplication. Use of a factored-operator formulation reduces the computational complexity of both the CG solver and the matrix-vector multiplication, rendering the matrix-vector multiplications as a successive product of operators on a vector. Sparsity is introduced to these operators by partitioning the observations using an icosahedral decomposition scheme. PSAS builds a large (approx. 128MB) run-time database of parameters used in the calculation of these operators. Implementing a message passing parallel computing paradigm into an existing yet developing computational system as complex as PSAS is nontrivial. One of the technical challenges is balancing the requirements for computational reproducibility with the need for high performance. The problem of computational
An Implementation and Parallelization of the Scale Space Meshing Algorithm
Julie Digne
2015-11-01
Creating an interpolating mesh from an unorganized set of oriented points is a difficult problem which is often overlooked. Most methods focus indeed on building a watertight smoothed mesh by defining some function whose zero level set is the surface of the object. However in some cases it is crucial to build a mesh that interpolates the points and does not fill the acquisition holes: either because the data are sparse and trying to fill the holes would create spurious artifacts or because the goal is to explore visually the data exactly as they were acquired without any smoothing process. In this paper we detail a parallel implementation of the Scale-Space Meshing algorithm, which builds on the scale-space framework for reconstructing a high precision mesh from an input oriented point set. This algorithm first smoothes the point set, producing a singularity free shape. It then uses a standard mesh reconstruction technique, the Ball Pivoting Algorithm, to build a mesh from the smoothed point set. The final step consists in back-projecting the mesh built on the smoothed positions onto the original point set. The result of this process is an interpolating, hole-preserving surface mesh reconstruction.
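The three stages described above (smooth, mesh, back-project) can be sketched as a small pipeline in which the mesh is stored as index triples, so that back-projection amounts to reading the original coordinates for the same indices. This is an illustrative sketch only: the centroid smoothing below is a simple stand-in for the paper's scale-space operator, the mesher is a caller-supplied placeholder rather than a Ball Pivoting implementation, and all function names are hypothetical.

```python
def smooth_once(points, neighbors):
    """Move each point to the centroid of itself and its neighbors.

    A stand-in smoothing step (assumption), not the scale-space operator
    used in the paper.
    """
    smoothed = []
    for i, p in enumerate(points):
        group = [p] + [points[j] for j in neighbors[i]]
        smoothed.append(tuple(sum(c) / len(group) for c in zip(*group)))
    return smoothed


def scale_space_mesh(points, neighbors, mesher, iterations=2):
    """Smooth the points, mesh the smoothed set, then back-project.

    mesher(smoothed_points) must return triangles as triples of indices
    into the point list; it is a placeholder for a real reconstruction
    method such as ball pivoting.
    """
    smoothed = list(points)
    for _ in range(iterations):
        smoothed = smooth_once(smoothed, neighbors)
    triangles = mesher(smoothed)
    # Back-projection: smoothed vertex i corresponds to original point i,
    # so the final mesh interpolates the input points exactly.
    return [tuple(points[i] for i in tri) for tri in triangles]


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.3), (0.0, 1.0, -0.2)]
neighbors = [[1, 2], [0, 2], [0, 1]]
mesh = scale_space_mesh(points, neighbors, mesher=lambda pts: [(0, 1, 2)])
print(mesh)
```

Keeping the index correspondence between smoothed and original points is what lets the connectivity be computed on a well-behaved surface while the final vertices remain the acquired data.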
Ma, Shuangshuang; Fang, Chen; Sun, Xiaoxi; Han, Lujia; He, Xueqin; Huang, Guangqun
2018-07-01
Bacteria play an important role in organic matter degradation and maturity during aerobic composting. This study analyzed composting with or without a membrane cover in laboratory-scale aerobic composting reactor systems. 16S rRNA gene analysis was used to study the bacterial community succession during composting. The richness of the bacterial community decreased and the diversity increased after covering with a semi-permeable membrane and applying a slight positive pressure. Principal components analysis based on operational taxonomic units could distinguish the main composting phases. Linear Discriminant Analysis Effect Size analysis indicated that covering with a semi-permeable membrane reduced the relative abundance of anaerobic Clostridiales and pathogenic Pseudomonas and increased the abundance of Cellvibrionales. In membrane-covered aerobic composting systems, the relative abundance of some bacteria could be affected, especially anaerobic bacteria. Covering could effectively promote fermentation, reduce emissions and ensure organic fertilizer quality. Copyright © 2018 Elsevier Ltd. All rights reserved.
Esteve-Turrillas, Francesc A.; Pastor, Agustin; Guardia, Miguel de la
2006-01-01
A rapid and environmentally friendly methodology was developed for the extraction of pyrethroid insecticides from semi-permeable membrane devices (SPMDs), in which they were preconcentrated from the gas phase. The method was based on gas chromatography-tandem mass spectrometry determination after microwave-assisted extraction, as an alternative to the widely employed dialysis method. SPMDs were extracted twice with 30 mL hexane:acetone, irradiated at 250 W power output to reach 90 °C in 10 min, this temperature being held for another 10 min. Clean-up of the extracts was performed by acetonitrile-hexane partitioning and solid-phase extraction (SPE) with a combined cartridge of 2 g basic alumina, deactivated with 5% water, and 500 mg C18. Pyrethroids investigated were Allethrin, Prallethrin, Tetramethrin, Bifenthrin, Phenothrin, λ-Cyhalothrin, Permethrin, Cyfluthrin, Cypermethrin, Flucythrinate, Esfenvalerate, Fluvalinate and Deltamethrin. The main pyrethroid synergist compound, Piperonyl Butoxide, was also studied. Limit of detection values ranging from 0.3 to 0.9 ng/SPMD and repeatability data, as relative standard deviation, from 2.9 to 9.4%, were achieved. Pyrethroid recoveries, for SPMDs spiked with 100 ng of each of the pyrethroids evaluated, were from 61 ± 8 to 103 ± 7% for microwave-assisted extraction, versus 54 ± 4 to 104 ± 3% for the dialysis reference method. A substantial reduction in solvent consumption (from 400 to 60 mL) and analysis time (from 48 to 1 h) was achieved by using the developed procedure. High concentration levels of pyrethroid compounds, from 0.14 to 7.3 μg/SPMD, were found in indoor air after 2 h of a standard application.
Esteve-Turrillas, Francesc A. [Analytical Chemistry Department, University of Valencia, Edifici Jeroni Munoz, 50th Dr. Moliner, 46100 Burjassot, Valencia (Spain); Pastor, Agustin [Analytical Chemistry Department, University of Valencia, Edifici Jeroni Munoz, 50th Dr. Moliner, 46100 Burjassot, Valencia (Spain)]. E-mail: agustin.pastor@uv.es; Guardia, Miguel de la [Analytical Chemistry Department, University of Valencia, Edifici Jeroni Munoz, 50th Dr. Moliner, 46100 Burjassot, Valencia (Spain)
2006-02-23
A rapid and environmentally friendly methodology was developed for the extraction of pyrethroid insecticides from semi-permeable membrane devices (SPMDs), in which they were preconcentrated from the gas phase. The method was based on gas chromatography-tandem mass spectrometry determination after microwave-assisted extraction, as an alternative to the widely employed dialysis method. SPMDs were extracted twice with 30 mL hexane:acetone, irradiated at 250 W power output to reach 90 °C in 10 min, this temperature being held for another 10 min. Clean-up of the extracts was performed by acetonitrile-hexane partitioning and solid-phase extraction (SPE) with a combined cartridge of 2 g basic alumina, deactivated with 5% water, and 500 mg C18. Pyrethroids investigated were Allethrin, Prallethrin, Tetramethrin, Bifenthrin, Phenothrin, λ-Cyhalothrin, Permethrin, Cyfluthrin, Cypermethrin, Flucythrinate, Esfenvalerate, Fluvalinate and Deltamethrin. The main pyrethroid synergist compound, Piperonyl Butoxide, was also studied. Limit of detection values ranging from 0.3 to 0.9 ng/SPMD and repeatability data, as relative standard deviation, from 2.9 to 9.4%, were achieved. Pyrethroid recoveries, for SPMDs spiked with 100 ng of each of the pyrethroids evaluated, were from 61 ± 8 to 103 ± 7% for microwave-assisted extraction, versus 54 ± 4 to 104 ± 3% for the dialysis reference method. A substantial reduction in solvent consumption (from 400 to 60 mL) and analysis time (from 48 to 1 h) was achieved by using the developed procedure. High concentration levels of pyrethroid compounds, from 0.14 to 7.3 μg/SPMD, were found in indoor air after 2 h of a standard application.
Biharmonic Submanifolds with Parallel Mean Curvature Vector in Pseudo-Euclidean Spaces
Fu, Yu, E-mail: yufudufe@gmail.com [Dongbei University of Finance and Economics, School of Mathematics and Quantitative Economics (China)
2013-12-15
In this paper, we investigate biharmonic submanifolds in pseudo-Euclidean spaces with arbitrary index and dimension. We give a complete classification of biharmonic spacelike submanifolds with parallel mean curvature vector in pseudo-Euclidean spaces. We also determine all biharmonic Lorentzian surfaces with parallel mean curvature vector field in pseudo-Euclidean spaces.
Biharmonic Submanifolds with Parallel Mean Curvature Vector in Pseudo-Euclidean Spaces
Fu, Yu
2013-01-01
In this paper, we investigate biharmonic submanifolds in pseudo-Euclidean spaces with arbitrary index and dimension. We give a complete classification of biharmonic spacelike submanifolds with parallel mean curvature vector in pseudo-Euclidean spaces. We also determine all biharmonic Lorentzian surfaces with parallel mean curvature vector field in pseudo-Euclidean spaces
Breaux, A.; Kolker, A.; Telfeyan, K.; Kim, J.; Johannesson, K. H.; Cable, J. E.
2014-12-01
Many studies have focused on hydrological and geochemical fluxes from land to the ocean via submarine groundwater discharge (SGD); however, few have assessed the contributions of SGD in deltaic settings. The Mississippi River delta is the largest delta in North America, and the magnitude of groundwater that discharges from the river into its delta is relatively unknown. Hydrological budgets indicate that a large volume of surface water is lost in the Mississippi's delta as the river flows into the Gulf of Mexico. Recent evidence in our study indicates that paleochannels, or semi-permeable buried sandy bodies that were former distributaries of the river, allow water to discharge out of the Mississippi's main channel and into its delta, driven by a difference in hydraulic head between the river and the lower-lying coastal embayments. Our study uses geophysical data, including sonar and resistivity methods, to detect the location of these paleochannels in Barataria Bay, a coastal bay located in the Mississippi Delta. High-resolution CHIRP sonar data show that these paleochannel features are ubiquitous in the Mississippi Delta, whereas resistivity data indicate that lower-salinity water is found during high river flow in bays proximate to the river. Sediment core analysis is also used to characterize the area of study, as well as to further understand the regional geology of the Mississippi Delta and estimate values of permeability and hydraulic conductivity of sediments taken from two locations in Barataria Bay. The geophysical and sediment core data will likewise be used to contextualize geochemical data collected in the field, which include an assessment of major cations and anions, as well as in situ Rn-222 activities, a method that has been proven to be useful as a tracer of groundwater movement. The results may be useful in understanding the potential global magnitude of hydrological and geochemical fluxes of other large rivers with
Marco Villani
2013-09-01
In this work we introduce some preliminary analyses on the role of a semi-permeable membrane in the dynamics of a stochastic model of catalytic reaction sets (CRSs) of molecules. The results of the simulations performed on ensembles of randomly generated reaction schemes highlight remarkable differences between this very simple protocell description model and the classical case of the continuous stirred-tank reactor (CSTR). In particular, in the CSTR case, distinct simulations with the same reaction scheme reach the same dynamical equilibrium, whereas, in the protocell case, simulations with identical reaction schemes can reach very different dynamical states, despite starting from the same initial conditions.
Thermally optimum spacing of vertical, natural convection cooled, parallel plates
Bar-Cohen, A.; Rohsenow, W. M.
Vertical two-dimensional channels formed by parallel plates or fins are a frequently encountered configuration in natural convection cooling in air of electronic equipment. In connection with the complexity of heat dissipation in vertical parallel plate arrays, little theoretical effort is devoted to thermal optimization of the relevant packaging configurations. The present investigation is concerned with the establishment of an analytical structure for analyses of such arrays, giving attention to useful relations for heat distribution patterns. The limiting relations for fully-developed laminar flow, in a symmetric isothermal or isoflux channel as well as in a channel with an insulated wall, are derived by use of a straightforward integral formulation.
В.Б. Роганков
2015-02-01
A variety of test methods for estimating the water vapour transmission (WVT) rate of thin membranes unfortunately do not provide a reliable basis for comparing the permeability of different fabrics. Their results depend crucially on the details and construction of the experimental methodologies as well as on the measurement conditions adopted by different authors. In this work, we propose a universal approach and demonstrate its adequate realization for comparing the transport properties of any semi-permeable membranes measured by conventional test methods. The purpose is to avoid any confusion in such comparisons. We analyse the WVT rates measured by six alternative test methods, which were applied step by step to six different fabrics. In contrast to the widespread search for a pair correlation between the results obtained by any two methods, we treat them together, for each fabric, in terms of reduced variables. This approach is based on the novel concept of the moisture percolation (MP) rate, which combines the diffusive and convective contributions to a transport process. It leads to well-established general estimates of the normalized WVT rates measured by the standard test methods. Another advantage of the developed approach is its thermodynamic consistency, which offers an appropriate fluctuation model to take into account the porosity of any semi-permeable membrane.
State-space Generalized Predictive Control for redundant parallel robots
Belda, Květoslav; Böhm, Josef; Valášek, M.
2003-01-01
Roč. 31, č. 3 (2003), s. 413-432 ISSN 1539-7734 R&D Projects: GA ČR GA101/03/0620 Grant - others:CTU(CZ) 0204512 Institutional research plan: CEZ:AV0Z1075907 Keywords : parallel robot construction * generalized predictive control * drive redundancy Subject RIV: BC - Control Systems Theory http://library.utia.cas.cz/separaty/historie/belda-0411126.pdf
Domain Specific Language for Geant4 Parallelization for Space-based Applications, Phase I
National Aeronautics and Space Administration — A major limiting factor in HPC growth is the requirement to parallelize codes to leverage emerging architectures, especially as single core performance has plateaued...
Zheng, R.-F.; Wu, T.-H.; Li, X.-Y.; Chen, W.-Q.
2018-06-01
The problem of a penny-shaped crack embedded in an infinite space of transversely isotropic multi-ferroic composite medium is investigated. The crack is assumed to be subjected to uniformly distributed mechanical, electric and magnetic loads applied symmetrically on the upper and lower crack surfaces. The semi-permeable (limited-permeable) electro-magnetic boundary condition is adopted. By virtue of the generalized method of potential theory and the general solutions, the boundary integro-differential equations governing the mode I crack problem, which are of nonlinear nature, are established and solved analytically. Exact and complete coupling magneto-electro-elastic field is obtained in terms of elementary functions. Important parameters in fracture mechanics on the crack plane, e.g., the generalized crack surface displacements, the distributions of generalized stresses at the crack tip, the generalized stress intensity factors and the energy release rate, are explicitly presented. To validate the present solutions, a numerical code by virtue of finite element method is established for 3D crack problems in the framework of magneto-electro-elasticity. To evaluate conveniently the effect of the medium inside the crack, several empirical formulae are developed, based on the numerical results.
Minimal surfaces in symmetric spaces with parallel second ...
Xiaoxiang Jiao
2017-07-31
... space and its non-compact dual by totally real, totally complex, and invariant immersions. ... frame fields, let θ1, θ2 and ω1, ..., ωn be their dual frames. ... where $\tilde{\nabla}$ is the induced connection of the pull-back bundle $f^{-1}T(N)$, which is defined by $\tilde{\nabla}_X W = \bar{\nabla}_{f_* X} W$ for $W \in f^{-1}T(N)$ and $X \in T(M)$. Let $f_*(e_i)$ ...
Gianluca, Longoni; Alireza, Haghighat
2003-01-01
In recent years, the SP_L (simplified spherical harmonics) equations have received renewed interest for the simulation of nuclear systems. We have derived the SP_L equations starting from the even-parity form of the S_N equations. The SP_L equations form a system of (L+1)/2 second order partial differential equations that can be solved with standard iterative techniques such as the Conjugate Gradient (CG). We discretized the SP_L equations with the finite-volume approach in a 3-D Cartesian space. We developed a new 3-D general code, Pensp_L (Parallel Environment Neutral-particle SP_L). Pensp_L solves both fixed source and criticality eigenvalue problems. In order to optimize the memory management, we implemented a Compressed Diagonal Storage (CDS) to store the SP_L matrices. Pensp_L includes parallel algorithms for space and moment domain decomposition. The computational load is distributed on different processors, using a mapping function, which maps the 3-D Cartesian space and moments onto processors. The code is written in Fortran 90 using the Message Passing Interface (MPI) libraries for the parallel implementation of the algorithm. The code has been tested on the Pcpen cluster and the parallel performance has been assessed in terms of speed-up and parallel efficiency. (author)
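As a rough illustration of the Compressed Diagonal Storage idea mentioned in this abstract (not the authors' Fortran 90 implementation), the sketch below keeps only selected diagonals of a banded matrix and uses them for a matrix-vector product, the core operation of a Conjugate Gradient iteration. The function names `to_cds` and `cds_matvec` and the small tridiagonal test matrix are assumptions made for this example.

```python
def to_cds(a, offsets):
    """Pack the diagonals listed in `offsets` of square matrix `a`.

    Returns a list of rows; entry [i][k] holds a[i][i + offsets[k]],
    or 0.0 where that column index falls outside the matrix.
    """
    n = len(a)
    diags = [[0.0] * len(offsets) for _ in range(n)]
    for i in range(n):
        for k, off in enumerate(offsets):
            j = i + off
            if 0 <= j < n:
                diags[i][k] = a[i][j]
    return diags


def cds_matvec(diags, offsets, x):
    """Compute y = A x using only the stored diagonals."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        for k, off in enumerate(offsets):
            j = i + off
            if 0 <= j < n:
                y[i] += diags[i][k] * x[j]
    return y


# Hypothetical 4x4 tridiagonal matrix (1-D diffusion-like stencil).
dense = [
    [2.0, -1.0, 0.0, 0.0],
    [-1.0, 2.0, -1.0, 0.0],
    [0.0, -1.0, 2.0, -1.0],
    [0.0, 0.0, -1.0, 2.0],
]
offsets = [-1, 0, 1]          # sub-, main and super-diagonal
packed = to_cds(dense, offsets)
print(cds_matvec(packed, offsets, [1.0, 2.0, 3.0, 4.0]))
```

Storage here grows with the number of stored diagonals times the matrix dimension rather than with the dimension squared, which is the usual motivation for the format on finite-volume stencils.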
Loring, Burlen; Karimabadi, Homa; Roytershteyn, Vadim
2014-07-01
The surface line integral convolution (LIC) visualization technique produces dense visualization of vector fields on arbitrary surfaces. We present a screen space surface LIC algorithm for use in distributed memory data parallel sort last rendering infrastructures. The motivations for our work are to support analysis of datasets that are too large to fit in the main memory of a single computer and compatibility with prevalent parallel scientific visualization tools such as ParaView and VisIt. By working in screen space using OpenGL we can leverage the computational power of GPUs when they are available and run without them when they are not. We address efficiency and performance issues that arise from the transformation of data from physical to screen space by selecting an alternate screen space domain decomposition. We analyze the algorithm's scaling behavior with and without GPUs on two high performance computing systems using data from turbulent plasma simulations.
State-space-based harmonic stability analysis for paralleled grid-connected inverters
Wang, Yanbo; Wang, Xiongfei; Chen, Zhe
2016-01-01
This paper addresses a state-space-based harmonic stability analysis of a system of paralleled grid-connected inverters. A small signal model of an individual inverter is developed, where the LCL filter, the equivalent delay of the control system, and the current controller are modeled. Then, the overall small signal model of paralleled grid-connected inverters is built. Finally, the state-space-based stability analysis approach is developed to explain the harmonic resonance phenomenon. The eigenvalue traces associated with time delay and coupled grid impedance are obtained, which account for how the unstable inverter produces the harmonic resonance and leads to the instability of the whole paralleled system. The proposed approach reveals the contributions of the grid impedance as well as the coupled effect on other grid-connected inverters under different grid conditions. Simulation and experimental results...
Algorithms for a parallel implementation of Hidden Markov Models with a small state space
Nielsen, Jesper; Sand, Andreas
2011-01-01
Two of the most important algorithms for Hidden Markov Models are the forward and the Viterbi algorithms. We show how formulating these using linear algebra naturally lends itself to parallelization. Although the obtained algorithms are slow for Hidden Markov Models with large state spaces...
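A minimal sketch of the linear-algebra view of the forward algorithm may help here; it is an illustration under stated assumptions, not the authors' implementation. Each observation contributes a matrix C_t with entries C_t[i][j] = A[i][j] * B[j][o_t], and the likelihood is the initial row vector times the product of these matrices. Because matrix multiplication is associative, the product can be formed by pairwise (tree) reduction, where the products on each level are independent and could run on separate processors. All names (`forward_sequential`, `forward_tree`, the toy parameters) are hypothetical.

```python
def matmul(x, y):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def forward_sequential(init, trans, emit, obs):
    """Textbook forward recursion; returns the likelihood of `obs`."""
    n = len(init)
    alpha = [init[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    return sum(alpha)


def tree_product(mats):
    """Order-preserving pairwise reduction of a non-empty list of matrices.

    The products within one level do not depend on each other, so in a
    parallel setting each could be assigned to a different worker.
    """
    while len(mats) > 1:
        nxt = [matmul(mats[i], mats[i + 1]) for i in range(0, len(mats) - 1, 2)]
        if len(mats) % 2:
            nxt.append(mats[-1])
        mats = nxt
    return mats[0]


def forward_tree(init, trans, emit, obs):
    """Forward likelihood via a product of per-observation matrices."""
    n = len(init)
    row = [[init[i] * emit[i][obs[0]] for i in range(n)]]  # 1 x n row vector
    if len(obs) == 1:
        return sum(row[0])
    mats = [[[trans[i][j] * emit[j][o] for j in range(n)] for i in range(n)]
            for o in obs[1:]]
    return sum(matmul(row, tree_product(mats))[0])


# Toy two-state model with three output symbols (values are made up).
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2, 1, 0]
print(forward_sequential(init, trans, emit, obs), forward_tree(init, trans, emit, obs))
```

The tree version replaces matrix-vector steps with matrix-matrix products, whose cost grows cubically with the number of states; this is consistent with the abstract's remark that the approach is slow for large state spaces.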
Large parallel volumes of finite and compact sets in d-dimensional Euclidean space
Kampf, Jürgen; Kiderlen, Markus
The r-parallel volume V(C_r) of a compact subset C in d-dimensional Euclidean space is the volume of the set C_r of all points of Euclidean distance at most r > 0 from C. According to Steiner’s formula, V(C_r) is a polynomial in r when C is convex. For finite sets C satisfying a certain geometric...
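For reference, the standard form of Steiner's formula for a convex body C can be written as below. The notation is an assumption of this note rather than taken from the abstract: V_j(C) denotes the j-th intrinsic volume and κ_k the volume of the k-dimensional unit ball. The second line is the familiar planar case, with area A(C) and perimeter P(C).

```latex
\begin{align}
  V(C_r) &= \sum_{j=0}^{d} \kappa_{d-j}\, V_j(C)\, r^{\,d-j}, \\
  V(C_r) &= A(C) + P(C)\, r + \pi r^{2} \qquad (d = 2).
\end{align}
```

For example, a unit square in the plane has A = 1 and P = 4, so its r-parallel area is 1 + 4r + πr².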
Fast Time and Space Parallel Algorithms for Solution of Parabolic Partial Differential Equations
Fijany, Amir
1993-01-01
In this paper, fast time- and space-parallel algorithms for the solution of linear parabolic PDEs are developed. It is shown that the seemingly strictly serial iterations of the time-stepping procedure for solution of the problem can be completely decoupled.
Nearly auto-parallel maps and conservation laws on curved spaces
Vacaru, S.
1994-01-01
The theory of nearly auto-parallel maps (na-maps, a generalization of conformal transforms) of Einstein-Cartan spaces is formulated. The transformation laws of geometrical objects and of gravitational and matter field equations under superpositions of na-maps are considered. Special attention is paid to the very important problem of defining conservation laws for gravitational fields. (Author)
André Lourenço, Rafael; Francisco de Oliveira, Fábio; Haddad Nudi, Adriana; Rebello Wagener, Ângela de Luca; Guadalupe Meniconi, Maria de Fátima; Francioni, Eleine
2015-06-01
The Campos Basin is Brazil's main oil and gas production area. In 2013, more than 50 million cubic meters of produced water (PW) was discharged into these offshore waters. Despite the large volumes of PW that are discharged in the Campos Basin each day, the ecological risks posed by the chemicals in the PW are not completely understood. Polycyclic aromatic hydrocarbons (PAH) are the most important contributors to the ecological hazards that are posed by discharged PW. This study aimed to evaluate the potential bioaccumulation of PAH using transplanted bivalves (Nodipecten nodosus) and semi-permeable membrane devices (SPMD). The study was conducted at two platforms that discharge PW (P19 and P40). Another platform that does not discharge PW (P25) was investigated for comparison with the obtained results. Time-integrated hydrocarbon concentrations using SPMD and transplanted bivalves were estimated from the seawater near the three platforms. The bioaccumulation of the PAH in the transplanted bivalves at platforms P19 and P40 was up to fivefold greater than the bioaccumulation of the PAH at platform P25. The lowest PAH concentrations were estimated for platform P25 (4.3-6.2 ng L⁻¹), and the highest PAH concentrations were estimated for platform P19 (9.2-37.3 ng L⁻¹). Both techniques were effective for determining the bioavailability of the PAH and for providing time-integrated hydrocarbon concentrations regarding oil and gas production activities.
Zhou, Yifan; Lin, Tian Ran; Sun, Yong; Bian, Yangqing; Ma, Lin
2015-01-01
Maintenance optimisation of series–parallel systems is a research topic of practical significance. Nevertheless, a cost-effective maintenance strategy is difficult to obtain due to the large strategy space for maintenance optimisation of such systems. The heuristic algorithm is often employed to deal with this problem. However, the solution obtained by the heuristic algorithm is not always the global optimum and the algorithm itself can be very time consuming. An alternative method based on linear programming is thus developed in this paper to overcome such difficulties by reducing strategy space of maintenance optimisation. A theoretical proof is provided in the paper to verify that the proposed method is at least as effective as the existing methods for strategy space reduction. Numerical examples for maintenance optimisation of series–parallel systems having multistate components and considering both economic dependence among components and multiple-level imperfect maintenance are also presented. The simulation results confirm that the proposed method is more effective than the existing methods in removing inappropriate maintenance strategies of multistate series–parallel systems. - Highlights: • A new method using linear programming is developed to reduce the strategy space. • The effectiveness of the new method for strategy reduction is theoretically proved. • Imperfect maintenance and economic dependence are considered during optimisation
Scattering by multiple parallel radially stratified infinite cylinders buried in a lossy half space.
Lee, Siu-Chun
2013-07-01
The theoretical solution for scattering by an arbitrary configuration of closely spaced parallel infinite cylinders buried in a lossy half space is presented in this paper. The refractive index and permeability of the half space and cylinders are complex in general. Each cylinder is radially stratified with a distinct complex refractive index and permeability. The incident radiation is an arbitrarily polarized plane wave propagating in the plane normal to the axes of the cylinders. Analytic solutions are derived for the electric and magnetic fields and the Poynting vector of backscattered radiation emerging from the half space. Numerical examples are presented to illustrate the application of the scattering solution to calculate backscattering from a lossy half space containing multiple homogeneous and radially stratified cylinders at various depths and different angles of incidence.
A massively-parallel electronic-structure calculations based on real-space density functional theory
Iwata, Jun-Ichi; Takahashi, Daisuke; Oshiyama, Atsushi; Boku, Taisuke; Shiraishi, Kenji; Okada, Susumu; Yabana, Kazuhiro
2010-01-01
Based on the real-space finite-difference method, we have developed a first-principles density functional program that efficiently performs large-scale calculations on massively-parallel computers. In addition to efficient parallel implementation, we also implemented several computational improvements, substantially reducing the computational costs of O(N^3) operations such as the Gram-Schmidt procedure and subspace diagonalization. Using the program on a massively-parallel computer cluster with a theoretical peak performance of several TFLOPS, we perform electronic-structure calculations for a system consisting of over 10,000 Si atoms, and obtain a self-consistent electronic-structure in a few hundred hours. We analyze in detail the costs of the program in terms of computation and of inter-node communications to clarify the efficiency, the applicability, and the possibility for further improvements.
Parallel magnetic resonance imaging as approximation in a reproducing kernel Hilbert space
Athalye, Vivek; Lustig, Michael; Uecker, Martin
2015-01-01
In magnetic resonance imaging, data samples are collected in the spatial frequency domain (k-space), typically by time-consuming line-by-line scanning on a Cartesian grid. Scans can be accelerated by simultaneous acquisition of data using multiple receivers (parallel imaging), and by using more efficient non-Cartesian sampling schemes. To understand and design k-space sampling patterns, a theoretical framework is needed to analyze how well arbitrary sampling patterns reconstruct unsampled k-space using receive coil information. As shown here, reconstruction from samples at arbitrary locations can be understood as approximation of vector-valued functions from the acquired samples and formulated using a reproducing kernel Hilbert space with a matrix-valued kernel defined by the spatial sensitivities of the receive coils. This establishes a formal connection between approximation theory and parallel imaging. Theoretical tools from approximation theory can then be used to understand reconstruction in k-space and to extend the analysis of the effects of sample selection beyond the traditional image-domain g-factor noise analysis to both noise amplification and approximation errors in k-space. This is demonstrated with numerical examples. (paper)
Treinish, Lloyd A.; Gough, Michael L.; Wildenhain, W. David
1987-01-01
The capability of rapidly producing visual representations of large, complex, multi-dimensional space and earth sciences data sets was developed by implementing computer graphics modeling techniques on the Massively Parallel Processor (MPP), employing techniques recently developed for typically non-scientific applications. Such capabilities can provide a new and valuable tool for the understanding of complex scientific data, and a new application of parallel computing via the MPP. A prototype system with such capabilities was developed and integrated into the National Space Science Data Center's (NSSDC) Pilot Climate Data System (PCDS) data-independent environment for computer graphics data display to provide easy access to users. While developing these capabilities, several problems had to be solved independently of the actual use of the MPP, all of which are outlined.
Parallel symbolic state-space exploration is difficult, but what is the alternative?
Gianfranco Ciardo
2009-12-01
State-space exploration is an essential step in many modeling and analysis problems. Its goal is to find the states reachable from the initial state of a discrete-state model. The state space can be used to answer important questions, e.g., "Is there a dead state?" and "Can N become negative?", or as a starting point for sophisticated investigations expressed in temporal logic. Unfortunately, the state space is often so large that ordinary explicit data structures and sequential algorithms cannot cope, prompting the exploration of (1) parallel approaches using multiple processors, from simple workstation networks to shared-memory supercomputers, to satisfy large memory and runtime requirements and (2) symbolic approaches using decision diagrams to encode the large structured sets and relations manipulated during state-space generation. Both approaches have merits and limitations. Parallel explicit state-space generation is challenging, but almost linear speedup can be achieved; however, the analysis is ultimately limited by the memory and processors available. Symbolic methods are a heuristic that can efficiently encode many, but not all, functions over a structured and exponentially large domain; here the pitfalls are subtler: their performance varies widely depending on the class of decision diagram chosen, the state variable order, and obscure algorithmic parameters. As symbolic approaches are often much more efficient than explicit ones for many practical models, we argue for the need to parallelize symbolic state-space generation algorithms, so that we can realize the advantage of both approaches. This is a challenging endeavor, as the most efficient symbolic algorithm, Saturation, is inherently sequential. We conclude by discussing challenges, efforts, and promising directions toward this goal.
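To make the explicit (non-symbolic) baseline concrete, the following sketch enumerates the reachable states of a small discrete-state model by breadth-first search and then answers the two example questions from the abstract. It is a generic illustration, not the Saturation algorithm or any method from the paper; the toy model, in which tokens move from a counter `a` to a counter `b`, and all function names are assumptions.

```python
from collections import deque


def explore(initial, successors):
    """Breadth-first explicit state-space generation.

    Returns (reachable, dead): every state reachable from `initial`,
    and the subset of those states that have no successor.
    """
    reachable = {initial}
    dead = set()
    frontier = deque([initial])
    while frontier:
        state = frontier.popleft()
        nxt = list(successors(state))
        if not nxt:
            dead.add(state)
        for s in nxt:
            if s not in reachable:
                reachable.add(s)
                frontier.append(s)
    return reachable, dead


def toy_successors(state):
    """Hypothetical model: one event moves a token from a to b while a > 0."""
    a, b = state
    if a > 0:
        yield (a - 1, b + 1)


reachable, dead = explore((3, 0), toy_successors)
print(sorted(reachable))                       # all reachable (a, b) pairs
print("dead states:", sorted(dead))            # "Is there a dead state?"
print("a negative?", any(a < 0 for a, _ in reachable))  # "Can N become negative?"
```

The `reachable` set is exactly the explicit data structure that becomes the bottleneck for large models, which is what motivates the parallel and symbolic approaches discussed in the abstract.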
Exploiting Stabilizers and Parallelism in State Space Generation with the Symmetry Method
Lorentsen, Louise; Kristensen, Lars Michael
2001-01-01
The symmetry method is a main reduction paradigm for alleviating the state explosion problem. For large symmetry groups, deciding whether two states are symmetric becomes time-consuming due to the apparent high time complexity of the orbit problem. The contribution of this paper is to alleviate the negative impact of the orbit problem by the specification of canonical representatives for equivalence classes of states in Coloured Petri Nets, and by giving algorithms exploiting stabilizers and parallelism for computing the condensed state space.
Phase space simulation of collisionless stellar systems on the massively parallel processor
White, R.L.
1987-01-01
A numerical technique for solving the collisionless Boltzmann equation describing the time evolution of a self gravitating fluid in phase space was implemented on the Massively Parallel Processor (MPP). The code performs calculations for a two dimensional phase space grid (with one space and one velocity dimension). Some results from calculations are presented. The execution speed of the code is comparable to the speed of a single processor of a Cray-XMP. Advantages and disadvantages of the MPP architecture for this type of problem are discussed. The nearest neighbor connectivity of the MPP array does not pose a significant obstacle. Future MPP-like machines should have much more local memory and easier access to staging memory and disks in order to be effective for this type of problem
Enhanced 2D-DOA Estimation for Large Spacing Three-Parallel Uniform Linear Arrays
Dong Zhang
2018-01-01
An enhanced two-dimensional direction of arrival (2D-DOA) estimation algorithm for large spacing three-parallel uniform linear arrays (ULAs) is proposed in this paper. Firstly, we use the propagator method (PM) to get a highly accurate but ambiguous estimate of the directional cosines. Then, we use the relationship between the directional cosines to eliminate the ambiguity. This algorithm not only makes use of the elements of the three-parallel ULAs but also utilizes the connection between the directional cosines to improve the estimation accuracy. Besides, it has satisfactory estimation performance when the elevation angle is between 70° and 90°, and it can automatically pair the estimated azimuth and elevation angles. Furthermore, it has low complexity because it applies neither eigenvalue decomposition (EVD) nor singular value decomposition (SVD) to the covariance matrix. Simulation results demonstrate the effectiveness of our proposed algorithm.
Fast MR image reconstruction for partially parallel imaging with arbitrary k-space trajectories.
Ye, Xiaojing; Chen, Yunmei; Lin, Wei; Huang, Feng
2011-03-01
Both acquisition and reconstruction speed are crucial for magnetic resonance (MR) imaging in clinical applications. In this paper, we present a fast reconstruction algorithm for SENSE in partially parallel MR imaging with arbitrary k-space trajectories. The proposed method is a combination of variable splitting, the classical penalty technique and the optimal gradient method. Variable splitting and the penalty technique reformulate the SENSE model with sparsity regularization as an unconstrained minimization problem, which can be solved by alternating two simple minimizations: One is the total variation and wavelet based denoising that can be quickly solved by several recent numerical methods, whereas the other one involves a linear inversion which is solved by the optimal first order gradient method in our algorithm to significantly improve the performance. Comparisons with several recent parallel imaging algorithms indicate that the proposed method significantly improves the computation efficiency and achieves state-of-the-art reconstruction quality.
A Self Consistent Multiprocessor Space Charge Algorithm that is Almost Embarrassingly Parallel
Nissen, Edward; Erdelyi, B.; Manikonda, S.L.
2012-01-01
We present a space charge code that is self-consistent, massively parallelizable, and requires very little communication between computer nodes, making the calculation almost embarrassingly parallel. This method is implemented in the code COSY Infinity, where the differential algebras used in this code are important to the algorithm's proper functioning. The method works by calculating the self-consistent space charge distribution using the statistical moments of the test particles, and converting them into polynomial series coefficients. These coefficients are combined with differential algebraic integrals to form the potential and electric fields. The result is a map which contains the effects of space charge. This method allows for massive parallelization since its statistics-based solver doesn't require any binning of particles, and only requires a vector containing the partial sums of the statistical moments for the different nodes to be passed. All other calculations are done independently. The resulting maps can be used to analyze the system using normal form analysis, as well as advance particles in numbers and at speeds that were previously impossible.
STEP: Self-supporting tailored k-space estimation for parallel imaging reconstruction.
Zhou, Zechen; Wang, Jinnan; Balu, Niranjan; Li, Rui; Yuan, Chun
2016-02-01
A new subspace-based iterative reconstruction method, termed Self-supporting Tailored k-space Estimation for Parallel imaging reconstruction (STEP), is presented and evaluated in comparison to the existing autocalibrating method SPIRiT and calibrationless method SAKE. In STEP, two tailored schemes including k-space partition and basis selection are proposed to promote spatially variant signal subspace and incorporated into a self-supporting structured low rank model to enforce properties of locality, sparsity, and rank deficiency, which can be formulated into a constrained optimization problem and solved by an iterative algorithm. Simulated and in vivo datasets were used to investigate the performance of STEP in terms of overall image quality and detail structure preservation. The advantage of STEP on image quality is demonstrated by retrospectively undersampled multichannel Cartesian data with various patterns. Compared with SPIRiT and SAKE, STEP can provide more accurate reconstruction images with less residual aliasing artifacts and reduced noise amplification in simulation and in vivo experiments. In addition, STEP has the capability of combining compressed sensing with arbitrary sampling trajectory. Using k-space partition and basis selection can further improve the performance of parallel imaging reconstruction with or without calibration signals. © 2015 Wiley Periodicals, Inc.
Evaluation of the Intel iWarp parallel processor for space flight applications
Hine, Butler P., III; Fong, Terrence W.
1993-01-01
The potential of a DARPA-sponsored advanced processor, the Intel iWarp, for use in future SSF Data Management Systems (DMS) upgrades is evaluated through integration into the Ames DMS testbed and applications testing. The iWarp is a distributed, parallel computing system well suited for high performance computing applications such as matrix operations and image processing. The system architecture is modular, supports systolic and message-based computation, and is capable of providing massive computational power in a low-cost, low-power package. As a consequence, the iWarp offers significant potential for advanced space-based computing. This research seeks to determine the iWarp's suitability as a processing device for space missions. In particular, the project focuses on evaluating the ease of integrating the iWarp into the SSF DMS baseline architecture and the iWarp's ability to support computationally stressing applications representative of SSF tasks.
PAREMD: A parallel program for the evaluation of momentum space properties of atoms and molecules
Meena, Deep Raj; Gadre, Shridhar R.; Balanarayan, P.
2018-03-01
The present work describes a code for evaluating the electron momentum density (EMD), its moments and the associated Shannon information entropy for a multi-electron molecular system. The code works specifically for electronic wave functions obtained from traditional electronic structure packages such as GAMESS and GAUSSIAN. For the momentum space orbitals, the general expression for Gaussian basis sets in position space is analytically Fourier transformed to momentum space Gaussian basis functions. The molecular orbital coefficients of the wave function are taken as an input from the output file of the electronic structure calculation. The analytic expressions of EMD are evaluated over a fine grid and the accuracy of the code is verified by a normalization check and a numerical kinetic energy evaluation which is compared with the analytic kinetic energy given by the electronic structure package. Apart from electron momentum density, electron density in position space has also been integrated into this package. The program is written in C++ and is executed through a Shell script. It is also tuned for multicore machines with shared memory through OpenMP. The program has been tested for a variety of molecules and correlated methods such as CISD, Møller-Plesset second order (MP2) theory and density functional methods. For correlated methods, the PAREMD program uses natural spin orbitals as an input. The program has been benchmarked for a variety of Gaussian basis sets for different molecules showing a linear speedup on a parallel architecture.
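The relations underlying the description above can be summarized as follows. This is a sketch in atomic units using one common Fourier-transform convention; the symbols (α for the Gaussian exponent, n_i for natural-orbital occupations, N for the number of electrons, T for the kinetic energy) are assumptions of this note, and only the simplest s-type primitive is shown.

```latex
\begin{align}
  \tilde{\phi}(\mathbf{p}) &= (2\pi)^{-3/2} \int \phi(\mathbf{r})\,
      e^{-i\,\mathbf{p}\cdot\mathbf{r}}\, d^{3}r, \\
  \phi(\mathbf{r}) = e^{-\alpha r^{2}}
      \;&\Longrightarrow\;
      \tilde{\phi}(\mathbf{p}) = (2\alpha)^{-3/2}\, e^{-p^{2}/(4\alpha)}, \\
  \gamma(\mathbf{p}) &= \sum_{i} n_i\, \bigl|\tilde{\phi}_i(\mathbf{p})\bigr|^{2}, \\
  \int \gamma(\mathbf{p})\, d^{3}p &= N, \qquad
  T = \int \frac{p^{2}}{2}\, \gamma(\mathbf{p})\, d^{3}p .
\end{align}
```

The last line corresponds to the two accuracy checks named in the abstract: the normalization of the momentum density and the comparison of the numerically integrated kinetic energy with the analytic value from the electronic structure package.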
Analytical model for vibration prediction of two parallel tunnels in a full-space
He, Chao; Zhou, Shunhua; Guo, Peijun; Di, Honggui; Zhang, Xiaohui
2018-06-01
This paper presents a three-dimensional analytical model for the prediction of ground vibrations from two parallel tunnels embedded in a full-space. The two tunnels are modelled as cylindrical shells of infinite length, and the surrounding soil is modelled as a full-space with two cylindrical cavities. A virtual interface is introduced to divide the soil into the right layer and the left layer. By transforming the cylindrical waves into plane waves, the solution of wave propagation in the full-space with two cylindrical cavities is obtained. The transformations from plane waves to cylindrical waves are then used to satisfy the boundary conditions on the tunnel-soil interfaces. The proposed model provides a highly efficient tool to predict the ground vibration induced by underground railways, which accounts for the dynamic interaction between neighbouring tunnels. Analysis of the vibration fields produced over a range of frequencies and soil properties is conducted. When the distance between the two tunnels is smaller than three times the tunnel diameter, the interaction between neighbouring tunnels is highly significant, at times on the order of 20 dB. It is necessary to consider the interaction between neighbouring tunnels for the prediction of ground vibrations induced by underground railways.
A parallel implementation of particle tracking with space charge effects on an INTEL iPSC/860
Chang, L.; Bourianoff, G.; Cole, B.; Machida, S.
1993-05-01
Particle-tracking simulation is one of the scientific applications that is well-suited to parallel computations. At the Superconducting Super Collider, it has been theoretically and empirically demonstrated that particle tracking on a designed lattice can achieve very high parallel efficiency on a MIMD Intel iPSC/860 machine. The key to such success is the realization that the particles can be tracked independently without considering their interaction. The perfectly parallel nature of particle tracking is broken if the interaction effects between particles are included. The space charge introduces an electromagnetic force that will affect the motion of tracked particles in 3-D space. For accurate modeling of the beam dynamics with space charge effects, one needs to solve three-dimensional Maxwell field equations, usually by a particle-in-cell (PIC) algorithm. This will require each particle to communicate with its neighbor grids to compute the momentum changes at each time step. It is expected that the 3-D PIC method will degrade parallel efficiency of particle-tracking implementation on any parallel computer. In this paper, we describe an efficient scheme for implementing particle tracking with space charge effects on an INTEL iPSC/860 machine. Experimental results show that a parallel efficiency of 75% can be obtained
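The parallel efficiency quoted in this abstract follows the usual definitions: speedup is the serial run time divided by the parallel run time, and efficiency is speedup divided by the number of processors. The short sketch below encodes those definitions; the timings in the usage lines are made-up illustrative values, not measurements from the paper.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_1 / T_p for positive wall-clock times."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def parallel_efficiency(t_serial, t_parallel, n_procs):
    """Efficiency E = S / p, where p is the number of processors."""
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    return speedup(t_serial, t_parallel) / n_procs


# Hypothetical timings: 120 s on one node, 10 s on 16 nodes.
print(speedup(120.0, 10.0))                   # 12.0
print(parallel_efficiency(120.0, 10.0, 16))   # 0.75
```

An efficiency below 1 reflects time spent on communication and load imbalance, such as the exchange between particles and neighbouring grid cells that the abstract identifies as the cost of including space charge.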
Use of Parallel Micro-Platform for the Simulation the Space Exploration
Velasco Herrera, Victor Manuel; Velasco Herrera, Graciela; Rosano, Felipe Lara; Rodriguez Lozano, Salvador; Lucero Roldan Serrato, Karen
The purpose of this work is to create a parallel micro-platform that simulates the virtual movements of a space exploration in 3D. One of the innovations presented in this design is the application of a lever mechanism for the transmission of the movement. The development of such a robot is a challenging task, very different from that of industrial manipulators because of a totally different set of target requirements. This work presents the computer-aided study and simulation of the movement of this parallel manipulator. The model was developed using the computer-aided design platform Unigraphics, in which the geometric modelling of each component and the final assembly (CAD), the generation of files for the computer-aided manufacture (CAM) of each piece, and the kinematic simulation of the system under different driving schemes were carried out. We used the MATLAB aerospace toolbox and created an adaptive control module to simulate the system.
SiGN-SSM: open source parallel software for estimating gene networks with state space models.
Tamada, Yoshinori; Yamaguchi, Rui; Imoto, Seiya; Hirose, Osamu; Yoshida, Ryo; Nagasaki, Masao; Miyano, Satoru
2011-04-15
SiGN-SSM is an open-source gene network estimation software able to run in parallel on PCs and massively parallel supercomputers. The software estimates a state space model (SSM), which is a statistical dynamic model suitable for analyzing short time and/or replicated time series gene expression profiles. SiGN-SSM implements a novel parameter constraint that is effective in stabilizing the estimated models. Also, by using a supercomputer, it is able to determine the gene network structure by a statistical permutation test in a practical time. SiGN-SSM is applicable not only to analyzing temporal regulatory dependencies between genes, but also to extracting the differentially regulated genes from time series expression profiles. SiGN-SSM is distributed under GNU Affero General Public Licence (GNU AGPL) version 3 and can be downloaded at http://sign.hgc.jp/signssm/. The pre-compiled binaries for some architectures are available in addition to the source code. The pre-installed binaries are also available on the Human Genome Center supercomputer system. The online manual and the supplementary information of SiGN-SSM are available on our web site. Contact: tamada@ims.u-tokyo.ac.jp.
A Parallel Strategy for High-speed Interpolation of CNC Using Data Space Constraint Method
Shuan-qiang Yang
2013-12-01
A high-speed interpolation scheme using parallel computing is proposed in this paper. The interpolation method is divided into two tasks, namely, the rough task executing in the PC and the fine task in the I/O card. During the interpolation procedure, double buffers are constructed to exchange the interpolation data between the two tasks. Then, the data space constraint method is adopted to ensure reliable and continuous data communication between the two buffers. Therefore, the proposed scheme can be realized on common operating systems without real-time performance. High-speed and high-precision motion control can be achieved as well. Finally, an experiment is conducted on the self-developed CNC platform, and the test results are shown to verify the proposed method.
Computations on the massively parallel processor at the Goddard Space Flight Center
Strong, James P.
1991-01-01
Described are four significant algorithms implemented on the massively parallel processor (MPP) at the Goddard Space Flight Center. Two are in the area of image analysis. Of the other two, one is a mathematical simulation experiment and the other deals with the efficient transfer of data between distantly separated processors in the MPP array. The first algorithm presented is the automatic determination of elevations from stereo pairs. The second algorithm solves mathematical logistic equations capable of producing both ordered and chaotic (or random) solutions. This work can potentially lead to the simulation of artificial life processes. The third algorithm is the automatic segmentation of images into reasonable regions based on some similarity criterion, while the fourth is an implementation of a bitonic sort of data which significantly overcomes the nearest neighbor interconnection constraints on the MPP for transferring data between distant processors.
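As an illustration of the fourth algorithm, the following is a minimal sequential sketch of a bitonic sorting network in Python. It is not the MPP implementation described in the abstract; the function name `bitonic_sort` and the power-of-two length restriction are assumptions made for this example. On a processor array, every compare-exchange in one inner pass could run at the same time, and the partner of element `i` is always at a fixed power-of-two distance `j`.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network."""
    data = list(values)
    n = len(data)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between the compared elements
            for i in range(n):         # independent compare-exchanges
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data
```

Because the comparison pattern does not depend on the data, the same exchange schedule applies to every input, which is what makes this kind of sort attractive on a mesh of processors with fixed interconnections.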
Exploiting Stabilizers and Parallelism in State Space Generation with the Symmetry Method
Lorentsen, Louise; Kristensen, Lars Michael
2001-01-01
The symmetry method is a main reduction paradigm for alleviating the state explosion problem. For large symmetry groups deciding whether two states are symmetric becomes time expensive due to the apparent high time complexity of the orbit problem. The contribution of this paper is to alleviate the negative impact of the orbit problem by the specification of canonical representatives for equivalence classes of states in Coloured Petri Nets, and by giving algorithms exploiting stabilizers and parallelism for computing the condensed state space.
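The idea of canonical representatives can be sketched with a toy example in Python. This is not the Coloured Petri Net algorithm of the paper: the three-process system, the local state names, and the functions `canonical`, `successors` and `explore` are all hypothetical. The sketch assumes full permutation symmetry between identical processes, so sorting the local states picks one representative per equivalence class without comparing states pairwise.

```python
from collections import deque

def canonical(state):
    # Under full permutation symmetry, the sorted tuple is a unique
    # representative of the orbit of `state`.
    return tuple(sorted(state))

def successors(state):
    # Hypothetical toy system: each process cycles idle -> wait -> crit -> idle,
    # and at most one process may be in "crit" at a time.
    step = {"idle": "wait", "wait": "crit", "crit": "idle"}
    for i, local in enumerate(state):
        nxt = step[local]
        if nxt == "crit" and "crit" in state:
            continue
        yield state[:i] + (nxt,) + state[i + 1:]

def explore(initial, reduce_symmetry):
    """Breadth-first state space generation, optionally condensed."""
    key = canonical if reduce_symmetry else (lambda s: s)
    seen = {key(initial)}
    queue = deque([key(initial)])
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            rep = key(nxt)
            if rep not in seen:
                seen.add(rep)
                queue.append(rep)
    return seen
```

For three processes starting in `("idle", "idle", "idle")`, the full exploration visits 20 states while the condensed one stores 7 representatives.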
Unified Lambert Tool for Massively Parallel Applications in Space Situational Awareness
Woollands, Robyn M.; Read, Julie; Hernandez, Kevin; Probe, Austin; Junkins, John L.
2018-03-01
This paper introduces a parallel-compiled tool that combines several of our recently developed methods for solving the perturbed Lambert problem using modified Chebyshev-Picard iteration. This tool (unified Lambert tool) consists of four individual algorithms, each of which is unique and better suited for solving a particular type of orbit transfer. The first is a Keplerian Lambert solver, which is used to provide a good initial guess (warm start) for solving the perturbed problem. It is also used to determine the appropriate algorithm to call for solving the perturbed problem. The arc length or true anomaly angle spanned by the transfer trajectory is the parameter that governs the automated selection of the appropriate perturbed algorithm, and is based on the respective algorithm convergence characteristics. The second algorithm solves the perturbed Lambert problem using the modified Chebyshev-Picard iteration two-point boundary value solver. This algorithm does not require a Newton-like shooting method and is the most efficient of the perturbed solvers presented herein, however the domain of convergence is limited to about a third of an orbit and is dependent on eccentricity. The third algorithm extends the domain of convergence of the modified Chebyshev-Picard iteration two-point boundary value solver to about 90% of an orbit, through regularization with the Kustaanheimo-Stiefel transformation. This is the second most efficient of the perturbed set of algorithms. The fourth algorithm uses the method of particular solutions and the modified Chebyshev-Picard iteration initial value solver for solving multiple revolution perturbed transfers. This method does require "shooting" but differs from Newton-like shooting methods in that it does not require propagation of a state transition matrix. The unified Lambert tool makes use of the General Mission Analysis Tool and we use it to compute thousands of perturbed Lambert trajectories in parallel on the Space Situational
Gianluca, Longoni; Alireza, Haghighat [Florida University, Nuclear and Radiological Engineering Department, Gainesville, FL (United States)
2003-07-01
In recent years, the SP_L (simplified spherical harmonics) equations have received renewed interest for the simulation of nuclear systems. We have derived the SP_L equations starting from the even-parity form of the S_N equations. The SP_L equations form a system of (L+1)/2 second order partial differential equations that can be solved with standard iterative techniques such as the Conjugate Gradient (CG). We discretized the SP_L equations with the finite-volume approach in a 3-D Cartesian space. We developed a new 3-D general code, Pensp_L (Parallel Environment Neutral-particle SP_L). Pensp_L solves both fixed source and criticality eigenvalue problems. In order to optimize the memory management, we implemented a Compressed Diagonal Storage (CDS) to store the SP_L matrices. Pensp_L includes parallel algorithms for space and moment domain decomposition. The computational load is distributed on different processors, using a mapping function, which maps the 3-D Cartesian space and moments onto processors. The code is written in Fortran 90 using the Message Passing Interface (MPI) libraries for the parallel implementation of the algorithm. The code has been tested on the Pcpen cluster and the parallel performance has been assessed in terms of speed-up and parallel efficiency. (author)
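The Compressed Diagonal Storage idea can be illustrated with a short Python sketch. The abstract does not give the layout used in the Fortran 90 code, so the dictionary keyed by diagonal offset and the names `to_cds` and `cds_matvec` are assumptions for this example. Only the non-zero diagonals of a banded matrix are kept, and the matrix-vector product needed by an iterative solver such as CG is computed directly from them.

```python
def to_cds(dense):
    """Keep only the non-zero diagonals, keyed by offset (column - row)."""
    n = len(dense)
    diagonals = {}
    for offset in range(-(n - 1), n):
        diag = [dense[i][i + offset] if 0 <= i + offset < n else 0.0
                for i in range(n)]
        if any(v != 0.0 for v in diag):
            diagonals[offset] = diag
    return diagonals

def cds_matvec(diagonals, x):
    """Compute y = A x from the stored diagonals."""
    n = len(x)
    y = [0.0] * n
    for offset, diag in diagonals.items():
        for i in range(n):
            j = i + offset
            if 0 <= j < n:
                y[i] += diag[i] * x[j]
    return y
```

For a tridiagonal matrix this stores three arrays of length n instead of n*n entries, which is the kind of saving that matters for matrices arising from finite-volume discretizations on regular grids.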
Liu, H.
1996-01-01
Computer simulations using the multi-particle code PARMELA with a three-dimensional point-by-point space charge algorithm have turned out to be very helpful in supporting injector commissioning and operations at Thomas Jefferson National Accelerator Facility (Jefferson Lab, formerly called CEBAF). However, this algorithm, which defines a typical N^2 problem in CPU time scaling, is very time-consuming when N, the number of macro-particles, is large. Therefore, it is attractive to use massively parallel processors (MPPs) to speed up the simulations. Motivated by this, the authors modified the space charge subroutine for using the MPPs of the Cray T3D. The techniques used to parallelize and optimize the code on the T3D are discussed in this paper. The performance of the code on the T3D is examined in comparison with a Parallel Vector Processing supercomputer of the Cray C90 and an HP 735/15 high-end workstation.
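The N^2 scaling of a point-by-point algorithm can be seen in a minimal Python sketch. This is not the PARMELA subroutine: it assumes a purely electrostatic, non-relativistic interaction, and the function name `pairwise_fields` and the unit constant `k` are hypothetical. Every macro-particle accumulates the field of every other one, so the work grows as N(N-1).

```python
import math

def pairwise_fields(positions, charges, k=1.0):
    """Field at each particle due to all others (O(N^2) double loop)."""
    fields = []
    for i, (xi, yi, zi) in enumerate(positions):
        ex = ey = ez = 0.0
        for j, (xj, yj, zj) in enumerate(positions):
            if i == j:
                continue  # a particle does not act on itself
            dx, dy, dz = xi - xj, yi - yj, zi - zj
            r2 = dx * dx + dy * dy + dz * dz
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            ex += k * charges[j] * dx * inv_r3
            ey += k * charges[j] * dy * inv_r3
            ez += k * charges[j] * dz * inv_r3
        fields.append((ex, ey, ez))
    return fields
```

The outer loop iterations are independent of each other, so one plausible way to distribute the work is to give each processor a slice of the index `i` while every processor holds all positions.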
Flow and heat transfer in parallel channel attached with equally-spaced ribs, 2
Kunugi, Tomoaki; Takizuka, Takakazu
1980-09-01
Using a computer code for the analysis of the flow and heat transfer in a parallel channel attached with equally-spaced ribs, calculations are performed for a pitch to rib-width ratio of 7 : 1, a rib-width to rib-height ratio of 2 : 1 and a channel-height to rib-height ratio of 3 : 1. Assuming that the fluid properties and the heat-flux at the wall of this channel are constant, characteristics of the flow and heat transfer are analyzed in the range of Reynolds number from 10 to 250. The following results are obtained: (1) The separation region behind a rib grows downstream with the increase of Reynolds number. (2) The pressure drop of the ribbed channel is greater than that of the smooth channel, and increases as Reynolds number increases. (3) The mean Nusselt number of the ribbed channel is about 10 - 11 at the upper wall and about 7.5 at the lower wall in the range of Reynolds number from 10 to 250. (author)
Rainer, Löwen
2017-01-01
We prove that the automorphism group of a topological parallelism on real projective 3-space is compact. In a preceding article it was proved that at least the connected component of the identity is compact. The present proof does not depend on that earlier result.
Knecht, Stefan; Jensen, Hans Jørgen Aagaard; Fleig, Timo
2008-01-01
We present a parallel implementation of a string-driven general active space configuration interaction program for nonrelativistic and scalar-relativistic electronic-structure calculations. The code has been modularly incorporated in the DIRAC quantum chemistry program package. The implementation...
Parallel field line and stream line tracing algorithms for space physics applications
Toth, G.; de Zeeuw, D.; Monostori, G.
2004-05-01
Field line and stream line tracing is required in various space physics applications, such as the coupling of the global magnetosphere and inner magnetosphere models, the coupling of the solar energetic particle and heliosphere models, or the modeling of comets, where the multispecies chemical equations are solved along stream lines of a steady state solution obtained with single fluid MHD model. Tracing a vector field is an inherently serial process, which is difficult to parallelize. This is especially true when the data corresponding to the vector field is distributed over a large number of processors. We designed algorithms for the various applications, which scale well to a large number of processors. In the first algorithm the computational domain is divided into blocks. Each block is on a single processor. The algorithm follows the vector field inside the blocks, and calculates a mapping of the block surfaces. The blocks communicate the values at the coinciding surfaces, and the results are interpolated. Finally all block surfaces are defined and values inside the blocks are obtained. In the second algorithm all processors start integrating along the vector field inside the accessible volume. When the field line leaves the local subdomain, the position and other information is stored in a buffer. Periodically the processors exchange the buffers, and continue integration of the field lines until they reach a boundary. At that point the results are sent back to the originating processor. Efficiency is achieved by a careful phasing of computation and communication. In the third algorithm the results of a steady state simulation are stored on a hard drive. The vector field is contained in blocks. All processors read in all the grid and vector field data and the stream lines are integrated in parallel. If a stream line enters a block, which has already been integrated, the results can be interpolated. By a clever ordering of the blocks the execution speed can be
Candel, A.; Kabel, A.; Ko, K.; Lee, L.; Li, Z.; Limborg, C.; Ng, C.; Prudencio, E.; Schussman, G.; Uplenchwar, R.
2007-01-01
Over the past years, SLAC's Advanced Computations Department (ACD) has developed the parallel finite element (FE) particle-in-cell code Pic3P (Pic2P) for simulations of beam-cavity interactions dominated by space-charge effects. As opposed to standard space-charge dominated beam transport codes, which are based on the electrostatic approximation, Pic3P (Pic2P) includes space-charge, retardation and boundary effects as it self-consistently solves the complete set of Maxwell-Lorentz equations using higher-order FE methods on conformal meshes. Use of efficient, large-scale parallel processing allows for the modeling of photoinjectors with unprecedented accuracy, aiding the design and operation of the next-generation of accelerator facilities. Applications to the Linac Coherent Light Source (LCLS) RF gun are presented
Parallel translation in warped product spaces: application to the Reissner-Nordstroem spacetime
Raposo, A P; Del Riego, L
2005-01-01
A formal treatment of the parallel translation transformations in warped product manifolds is presented and related to those parallel translation transformations in each of the factor manifolds. A straightforward application to the Schwarzschild and Reissner-Nordstroem geometries, considered here as particular examples, explains some apparently surprising properties of the holonomy in these manifolds
Historical parallels of biological space experiments from Soyuz, Salyut and Mir to Shenzhou flights
Nechitailo, Galina S.; Kondyurin, Alexey
2016-07-01
Human exploitation of space is a great achievement of our civilization. After the first space flights, the development of an artificial biological environment in space systems is the second big step. The first successful biological experiments on board a space station were performed on the Salyut and Mir stations in the 1970s-1990s, such as: - first long-term cultivation of plants in space (wheat, linen, lettuce, crepis); - first flowers in space (Arabidopsis); - first harvesting of seeds in space (Arabidopsis); - first harvesting of roots (radish); - first full life cycle from seeds to seeds in space (wheat), Guinness recorded; - first tissue culture experiments (Panax ginseng L, Crocus sativus L, Stevia rebaudiana B); - first tree growing in space for 2 years (Limonia acidissima), Guinness recorded. As a new wave, modern experiments on board the Chinese Shenzhou spaceships are performed with plants and tissue culture. The space flight experiments are now focused on applications of the space biology results to Earth technologies. In particular, tomato seeds exposed for 6 years in space are used by the pharmaceutical industry in more than 10 pharmaceutical products. Tissue culture experiments are performed on board the Shenzhou spaceship for the creation of new bioproducts including Space Panax ginseng, Space Spirulina, Space Stetatin, Space Tomato and other products with unique properties. Space investments come back.
Yang, Chifu; Zhao, Jinsong; Li, Liyi; Agrawal, Sunil K
2018-01-01
A robotic spine brace based on a parallel-actuated robotic system is a new device for the treatment and sensing of scoliosis; however, the strong dynamic coupling and anisotropy of parallel manipulators result in accuracy loss in rehabilitation force control, including large errors in the direction and value of the force. A novel active force control strategy named modal space force control is proposed to solve these problems. Considering the electrically driven system and the contact environment, the mathematical model of the spatial parallel manipulator is built. The strong dynamic coupling problem in the force field is described via experiments, as is the anisotropy problem of the work space of parallel manipulators. The effects of dynamic coupling on control design and performance are discussed, and the influences of anisotropy on accuracy are also addressed. With the mass/inertia matrix and stiffness matrix of parallel manipulators, a modal matrix can be calculated by using eigenvalue decomposition. Making use of the orthogonality of the modal matrix with the mass matrix of parallel manipulators, the strongly coupled dynamic equations expressed in the work space or joint space of the parallel manipulator may be transformed into decoupled equations formulated in modal space. According to this property, each force control channel is independent of the others in the modal space; thus we propose the modal space force control concept, which means the force controller is designed in modal space. A modal space active force control is designed and implemented with only a simple PID controller employed as an example control method to show the differences, uniqueness, and benefits of modal space force control. Simulation and experimental results show that the proposed modal space force control concept can effectively overcome the effects of the strong dynamic coupling and anisotropy problem in the physical space, and modal space force control is thus a very useful control framework, which is better than the current joint
Anticonvection device for a narrow space comprised between two parallel walls
Costes, Didier.
1975-01-01
The invention relates to an anticonvection device providing strong limitations against the convection currents inside a space submitted to a vertical thermal gradient and more especially the space enclosed between the inner wall of a vessel generally cylindrical in shape and of vertical axis, intended for a nuclear reactor, and the outer wall of a plug fitted together with said vessel. To this effect, said device is characterized in that it comprises a packing of a material of open porosity and thickness-wise elasticity, in the form of threads, fibers, knitted-cloths or sheets separated by distances shorter than the thickness of stagnancy under the temperature conditions inside said space.
Alves Júnior, A. A.; Sokoloff, M. D.
2017-10-01
MCBooster is a header-only, C++11-compliant library that provides routines to generate and perform calculations on large samples of phase space Monte Carlo events. To achieve superior performance, MCBooster is capable of performing most of its calculations in parallel using CUDA- and OpenMP-enabled devices. MCBooster is built on top of the Thrust library and runs on Linux systems. This contribution summarizes the main features of MCBooster. A basic description of the user interface and some examples of applications are provided, along with measurements of performance in a variety of environments.
Development of parallel algorithms for electrical power management in space applications
Berry, Frederick C.
1989-01-01
The application of parallel techniques for electrical power system analysis is discussed. The Newton-Raphson method of load flow analysis was used along with the decomposition-coordination technique to perform load flow analysis. The decomposition-coordination technique enables tasks to be performed in parallel by partitioning the electrical power system into independent local problems. Each independent local problem represents a portion of the total electrical power system on which a load flow analysis can be performed. The load flow analysis is performed on these partitioned elements by using the Newton-Raphson load flow method. These independent local problems will produce results for voltage and power which can then be passed to the coordinator portion of the solution procedure. The coordinator problem uses the results of the local problems to determine if any correction is needed on the local problems. The coordinator problem is also solved by an iterative method much like the local problem. The iterative method for the coordination problem will also be the Newton-Raphson method. Therefore, each iteration at the coordination level will result in new values for the local problems. The local problems will have to be solved again along with the coordinator problem until some convergence conditions are met.
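The Newton-Raphson iteration used for each local problem can be sketched on the smallest possible case. The following Python example is an assumption-laden illustration, not the method of the report: it considers a single lossless line between two buses with per-unit voltages `v1`, `v2` and reactance `x` (all hypothetical values) and solves the power mismatch equation (v1*v2/x)*sin(delta) - P = 0 for the angle `delta`. The coordinator level is not shown.

```python
import math

def solve_angle(p_load, v1=1.0, v2=1.0, x=0.5, tol=1e-12, max_iter=20):
    """Newton-Raphson solve of the two-bus power mismatch equation."""
    delta = 0.0  # flat start
    for _ in range(max_iter):
        mismatch = (v1 * v2 / x) * math.sin(delta) - p_load
        if abs(mismatch) < tol:
            return delta
        jacobian = (v1 * v2 / x) * math.cos(delta)  # d(mismatch)/d(delta)
        delta -= mismatch / jacobian
    raise RuntimeError("Newton-Raphson did not converge")
```

With the default values and `p_load=1.0` the equation reduces to sin(delta) = 0.5, so the iteration converges to pi/6. In a decomposition-coordination scheme, one such solve per partition could run in parallel before the coordinator updates the shared boundary values.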
Xueli Chen
2010-01-01
During the past decade, the Monte Carlo method has found wide application in optical imaging to simulate the photon transport process inside tissues. However, this method has not been effectively extended to the simulation of free-space photon transport at present. In this paper, a uniform framework for noncontact optical imaging is proposed based on the Monte Carlo method, which consists of the simulation of photon transport both in tissues and in free space. Specifically, the simplification theory of lens systems is utilized to model the camera lens equipped in the optical imaging system, and the Monte Carlo method is employed to describe the energy transformation from the tissue surface to the CCD camera. Also, the focusing effect of the camera lens is considered to establish the relationship of corresponding points between the tissue surface and the CCD camera. Furthermore, a parallel version of the framework is realized, making the simulation much more convenient and effective. The feasibility of the uniform framework and the effectiveness of the parallel version are demonstrated with a cylindrical phantom based on real experimental results.
An image-space parallel convolution filtering algorithm based on shadow map
Li, Hua; Yang, Huamin; Zhao, Jianping
2017-07-01
Shadow mapping is commonly used in real-time rendering. In this paper, we present an accurate and efficient method of soft shadow generation from planar area lights. First, the method generates a depth map from the light's view and analyzes the depth-discontinuity areas as well as shadow boundaries. Then these areas are described as binary values in a texture map called the binary light-visibility map, and a GPU-based parallel convolution filtering algorithm is applied to smooth out the boundaries with a box filter. Experiments show that our algorithm is an effective shadow-map-based method that produces perceptually accurate soft shadows in real time with more details of shadow boundaries compared with previous works.
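The smoothing step can be sketched in plain Python. This is a CPU illustration under stated assumptions, not the GPU algorithm of the paper: the map is a list of rows of 0/1 values, the window is a square of side 2*radius+1 clipped at the borders, and the name `box_filter` is hypothetical. Each output texel depends only on the input map, which is why the operation parallelizes naturally in image space.

```python
def box_filter(visibility, radius=1):
    """Average a binary light-visibility map over a square window."""
    height, width = len(visibility), len(visibility[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):          # each texel is independent
            total = 0
            count = 0
            for yy in range(max(0, y - radius), min(height, y + radius + 1)):
                for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += visibility[yy][xx]
                    count += 1
            out[y][x] = total / count   # fraction of lit texels in the window
    return out
```

Texels far from a shadow boundary stay at 0 or 1, while texels near the boundary receive intermediate values that act as a soft penumbra.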
Yan, Haojing; Yan, Lin; Zamojski, Michel A.; Windhorst, Rogier A.; McCarthy, Patrick J.; Fan, Xiaohui; Röttgering, Huub J. A.; Koekemoer, Anton M.; Robertson, Brant E.; Davé, Romeel; Cai, Zheng
2011-02-01
We report the first results from the Hubble Infrared Pure Parallel Imaging Extragalactic Survey, which utilizes the pure parallel orbits of the Hubble Space Telescope to do deep imaging along a large number of random sightlines. To date, our analysis includes 26 widely separated fields observed by the Wide Field Camera 3, which amounts to 122.8 arcmin^2 in total area. We have found three bright Y098-dropouts, which are candidate galaxies at z ≳ 7.4. One of these objects shows an indication of peculiar variability and its nature is uncertain. The other two objects are among the brightest candidate galaxies at these redshifts known to date (L > 2L*). Such very luminous objects could be the progenitors of the high-mass Lyman break galaxies observed at lower redshifts (up to z ~ 5). While our sample is still limited in size, it is much less subject to the uncertainty caused by "cosmic variance" than other samples because it is derived using fields along many random sightlines. We find that the existence of the brightest candidate at z ≈ 7.4 is not well explained by the current luminosity function (LF) estimates at z ≈ 8. However, its inferred surface density could be explained by the prediction from the LFs at z ≈ 7 if it belongs to the high-redshift tail of the galaxy population at z ≈ 7. Based on observations made with the NASA/ESA Hubble Space Telescope, obtained at the Space Telescope Science Institute, which is operated by the Association of Universities for Research in Astronomy, Inc., under NASA contract NAS 5-26555. These observations are associated with programs 11700 and 11702.
Hogendoorn, E A; Westhuis, K; Dijkman, E; Heusinkveld, H A; den Boer, A C; Evers, E A; Baumann, R A
1999-10-08
The coupled-column (LC-LC) configuration consisting of a 3 µm C18 column (50 x 4.6 mm I.D.) as the first column and a 5 µm C18 semi-permeable-surface (SPS) column (150 x 4.6 mm I.D.) as the second column appeared to be successful for the screening of acidic pesticides in surface water samples. In comparison to LC-LC employing two C18 columns, the combination of C18/SPS-C18 significantly decreased the baseline deviation caused by the hump of the co-extracted humic substances when using UV detection (217 nm). The developed LC-LC procedure allowed the simultaneous determination of the target analytes bentazone and bromoxynil in uncleaned extracts of surface water samples to a level of 0.05 µg/l in less than 15 min. In combination with a simple solid-phase extraction step (200 ml of water on a 500 mg C18-bonded silica) the analytical procedure provides a high sample throughput. During a period of about five months more than 200 ditch-water samples originating from agricultural locations were analyzed with the developed procedure. Validation of the method was performed by randomly analyzing recoveries of water samples spiked at levels of 0.1 µg/l (n=10), 0.5 µg/l (n=7) and 2.5 µg/l (n=4). Weighted regression of the recovery data showed that the method provides overall recoveries of 95 and 100% for bentazone and bromoxynil, respectively, with corresponding intra-laboratory reproducibilities of 10 and 11%, respectively. Confirmation of the analytes in part of the sample extracts was carried out with GC-negative ion chemical ionization MS involving a derivatization step with bis(trifluoromethyl)benzyl bromide. No false negatives or positives were observed.
Yan Haojing; Yan Lin; Zamojski, Michel A.; Windhorst, Rogier A.; McCarthy, Patrick J.; Fan Xiaohui; Dave, Romeel; Roettgering, Huub J. A.; Koekemoer, Anton M.; Robertson, Brant E.; Cai Zheng
2011-01-01
We report the first results from the Hubble Infrared Pure Parallel Imaging Extragalactic Survey, which utilizes the pure parallel orbits of the Hubble Space Telescope to do deep imaging along a large number of random sightlines. To date, our analysis includes 26 widely separated fields observed by the Wide Field Camera 3, which amounts to 122.8 arcmin^2 in total area. We have found three bright Y098-dropouts, which are candidate galaxies at z ≳ 7.4. One of these objects shows an indication of peculiar variability and its nature is uncertain. The other two objects are among the brightest candidate galaxies at these redshifts known to date (L > 2L*). Such very luminous objects could be the progenitors of the high-mass Lyman break galaxies observed at lower redshifts (up to z ∼ 5). While our sample is still limited in size, it is much less subject to the uncertainty caused by 'cosmic variance' than other samples because it is derived using fields along many random sightlines. We find that the existence of the brightest candidate at z ∼ 7.4 is not well explained by the current luminosity function (LF) estimates at z ∼ 8. However, its inferred surface density could be explained by the prediction from the LFs at z ∼ 7 if it belongs to the high-redshift tail of the galaxy population at z ∼ 7.
Kumar, Sameer
2010-06-15
Disclosed is a mechanism on receiving processors in a parallel computing system for providing order to data packets received from a broadcast call and to distinguish data packets received at nodes from several incoming asynchronous broadcast messages where header space is limited. In the present invention, processors at lower leaves of a tree do not need to obtain a broadcast message by directly accessing the data in a root processor's buffer. Instead, each subsequent intermediate node's rank id information is squeezed into the software header of packet headers. In turn, the entire broadcast message is not transferred from the root processor to each processor in a communicator but instead is replicated on several intermediate nodes which then replicate the message to nodes in lower leaves. Hence, the intermediate compute nodes become "virtual root compute nodes" for the purpose of replicating the broadcast message to lower levels of a tree.
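The replication pattern can be sketched as a small Python simulation. It models only the tree-shaped copying, not the packet header encoding of the disclosure; the k-ary rank layout (children of rank r are r*fanout+1 ... r*fanout+fanout) and the name `tree_broadcast` are assumptions for this example. Each rank copies the message from its parent's buffer, so only the first level reads from the root.

```python
def tree_broadcast(message, num_ranks, fanout=2):
    """Replicate a message level by level down a k-ary tree rooted at rank 0."""
    buffers = {0: message}      # rank -> local copy of the message
    sender_of = {}              # rank -> rank it copied the message from
    frontier = [0]
    while frontier:
        next_frontier = []
        for parent in frontier:
            for k in range(1, fanout + 1):
                child = parent * fanout + k
                if child < num_ranks:
                    # the parent acts as a "virtual root" for this child
                    buffers[child] = buffers[parent]
                    sender_of[child] = parent
                    next_frontier.append(child)
        frontier = next_frontier
    return buffers, sender_of
```

With 7 ranks and a fanout of 2, ranks 1 and 2 copy from the root, ranks 3 and 4 copy from rank 1, and ranks 5 and 6 copy from rank 2, so the root's buffer is read only twice.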
Pillinger, C. T.; Pillinger, J. M.
2013-09-01
The European Space Agency (ESA)'s comet chaser mission, Rosetta, has been more than a quarter of a century in coming to fruition. Whilst it might sound a long time, humankind has been interested in comets for much longer. For over a thousand years depictions of comets have been appearing in Art [1], including many humorous cartoons [2]. There are numerous cometary metaphors throughout literature. With this in mind we have recognised that there is a tremendous opportunity with comets to introduce science to different non-scientific audiences who would not necessarily believe they were interested in science. A similar approach was adopted with great success for the Beagle 2 involvement in ESA's Mars Express [3,4]. By exploiting the perhaps sometimes less obvious connections to the Rosetta mission we hope to capture the attention of non-scientists and introduce them to science unawares - a case of a little sugar to help the medicine go down. It is our belief that the Rosetta mission has enormous potential for bringing science to the unconverted. We give here one example of a connection between Art and the Rosetta mission. By choosing the allegorical name Rosetta for its cometary mission, ESA have immediately invited comparison with the stone tablet which provided the key to translating the languages of ancient cultures, particularly Egyptian hieroglyphics. It is well known that a scientist, Thomas Young, foreign secretary of The Royal Society, made the breakthrough which recognised the name Ptolemy in a cartouche on the Rosetta stone which can be seen today at the British Museum. The events concerning the 'capture' of the Rosetta stone were witnessed by scientists Sir William Hamilton (a renowned geophysicist as well as husband of Horatio Nelson's notorious mistress Lady Hamilton) and Edward Daniel Clarke, a geologist who would become first Professor of Mineralogy at Cambridge and an early meteoricist. Young's inspiration allowed Jean-Francois Champollion to decipher the
Baxley, Brian T.; Murdoch, Jennifer L.; Swieringa, Kurt A.; Barmore, Bryan E.; Capron, William R.; Hubbs, Clay E.; Shay, Richard F.; Abbott, Terence S.
2013-01-01
The predicted increase in the number of commercial aircraft operations creates a need for improved operational efficiency. Two areas believed to offer increases in aircraft efficiency are optimized profile descents and dependent parallel runway operations. Using Flight deck Interval Management (FIM) software and procedures during these operations, flight crews can achieve by the runway threshold an interval assigned by air traffic control (ATC) behind the preceding aircraft that maximizes runway throughput while minimizing additional fuel consumption and pilot workload. This document describes an experiment in which 24 pilots flew arrivals into the Dallas-Fort Worth terminal environment using one of three simulators at NASA's Langley Research Center. Results indicate that pilots delivered their aircraft to the runway threshold within +/- 3.5 seconds of their assigned time interval, and reported low workload levels. In general, pilots found the FIM concept, procedures, speeds, and interface acceptable. Analysis of the time error and FIM speed changes as a function of arrival stream position suggests the spacing algorithm generates stable behavior in the presence of continuous (wind) or impulse (offset) error. Concerns reported included multiple speed changes within a short time period, and an airspeed increase followed shortly by an airspeed decrease.
Gore, Brian Francis; Hooey, Becky Lee; Haan, Nancy; Socash, Connie; Mahlstedt, Eric; Foyle, David C.
2013-01-01
The Closely Spaced Parallel Operations (CSPO) scenario is a complex human performance model scenario that tested alternate operator roles and responsibilities in a series of off-nominal operations on approach and landing (see Gore, Hooey, Mahlstedt, Foyle, 2013). The model links together the procedures, equipment, crewstation, and external environment to produce predictions of operator performance in response to Next Generation system designs, like those expected in the National Airspace's NextGen concepts. The task analysis contained in the present report comes from the task analysis window in the MIDAS software. These tasks link definitions and states for equipment components and environmental features as well as operational contexts. The current task analysis culminated in 3300 tasks that included over 1000 Subject Matter Expert (SME)-vetted, re-usable procedural sets for three critical phases of flight: the Descent, Approach, and Land procedural sets (see Gore et al., 2011 for a description of the development of the tasks included in the model; Gore, Hooey, Mahlstedt, Foyle, 2013 for a description of the model and its results; Hooey, Gore, Mahlstedt, Foyle, 2013 for a description of the guidelines that were generated from the model's results; Gore, Hooey, Foyle, 2012 for a description of the model's implementation and its settings). The rollout, after-landing checks, taxi-to-gate and arrive-at-gate networks illustrated in Figure 1 were not used in the approach and divert scenarios exercised. The other networks in Figure 1 set up appropriate context settings for the flight deck. The current report presents the model's task decomposition from the highest level and decomposes it to finer-grained levels. The first task that is completed by the model is to set all of the initial settings for the scenario runs included in the model (network 75 in Figure 1). This initialization process also resets the CAD graphic files contained within MIDAS, as well as the embedded
Lin, Mingpei; Xu, Ming; Fu, Xiaoyu
2017-05-01
Currently, a tremendous amount of space debris in Earth's orbit imperils operational spacecraft. It is essential to undertake risk assessments of collisions and predict dangerous encounters in space. However, collision predictions for an enormous amount of space debris give rise to large-scale computations. In this paper, a parallel algorithm is established on the Compute Unified Device Architecture (CUDA) platform of NVIDIA Corporation for collision prediction. According to the parallel structure of NVIDIA graphics processors, a block decomposition strategy is adopted in the algorithm. Space debris is divided into batches, and the computation and data transfer operations of adjacent batches overlap. As a consequence, the latency to access shared memory during the entire computing process is significantly reduced, and a higher computing speed is reached. Theoretically, a simulation of collision prediction for space debris of any amount and for any time span can be executed. To verify this algorithm, a simulation example including 1382 pieces of debris, whose operational time scales vary from 1 min to 3 days, is conducted on Tesla C2075 of NVIDIA. The simulation results demonstrate that with the same computational accuracy as that of a CPU, the computing speed of the parallel algorithm on a GPU is 30 times that on a CPU. Based on this algorithm, collision prediction of over 150 Chinese spacecraft for a time span of 3 days can be completed in less than 3 h on a single computer, which meets the timeliness requirement of the initial screening task. Furthermore, the algorithm can be adapted for multiple tasks, including particle filtration, constellation design, and Monte-Carlo simulation of an orbital computation.
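The batch decomposition described in this abstract can be illustrated with a small CPU-only sketch. It is not the authors' CUDA implementation: it uses NumPy, does not overlap computation with data transfer, and assumes positions have already been propagated and sampled at common epochs. The function name `screen_close_pairs` and its parameters are hypothetical.

```python
import numpy as np


def screen_close_pairs(positions, threshold, batch_size):
    """Return sorted (i, j) pairs, i < j, whose minimum sampled distance is below threshold.

    positions: array of shape (n_objects, n_steps, 3), assumed already propagated.
    Objects are processed in batches so that only a (batch, n_objects, n_steps)
    block of distances is held in memory at any one time.
    """
    n_objects = positions.shape[0]
    hits = []
    for start in range(0, n_objects, batch_size):
        block = positions[start:start + batch_size]             # (b, t, 3)
        diff = block[:, None, :, :] - positions[None, :, :, :]  # (b, n, t, 3)
        min_dist = np.linalg.norm(diff, axis=-1).min(axis=-1)   # (b, n)
        rows, cols = np.nonzero(min_dist < threshold)
        for r, c in zip(rows, cols):
            i = start + int(r)
            if i < int(c):  # skip self-pairs and mirrored duplicates
                hits.append((i, int(c)))
    return sorted(hits)
```

The batch size only bounds the working memory; the set of flagged pairs does not depend on it.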
Huysmans, M. C. D. N. J. M.; Klein, M. H. J.; Kok, G. F.; Whitworth, J. M.
2007-01-01
Aim To determine the deviation of parallel-sided twist-drills during post-channel preparation and relate this to tooth type and position. Methodology Human teeth with single root canals were selected: maxillary second premolars (group i); maxillary lateral incisors (group ii); mandibular canines
James G. Worner
2017-05-01
James Worner is an Australian-based writer and scholar currently pursuing a PhD at the University of Technology Sydney. His research seeks to expose masculinities lost in the shadow of Australia’s Anzac hegemony while exploring new opportunities for contemporary historiography. He is the recipient of the Doctoral Scholarship in Historical Consciousness at the university’s Australian Centre of Public History and will be hosted by the University of Bologna during 2017 on a doctoral research writing scholarship. ‘Parallel Lines’ is one of a collection of stories, The Shapes of Us, exploring liminal spaces of modern life: class, gender, sexuality, race, religion and education. It looks at lives, like lines, that do not meet but which travel in proximity, simultaneously attracted and repelled. James’ short stories have been published in various journals and anthologies.
Robson, Philip M; Grant, Aaron K; Madhuranthakam, Ananth J; Lattanzi, Riccardo; Sodickson, Daniel K; McKenzie, Charles A
2008-10-01
Parallel imaging reconstructions result in spatially varying noise amplification characterized by the g-factor, precluding conventional measurements of noise from the final image. A simple Monte Carlo based method is proposed for all linear image reconstruction algorithms, which allows measurement of signal-to-noise ratio and g-factor and is demonstrated for SENSE and GRAPPA reconstructions for accelerated acquisitions that have not previously been amenable to such assessment. Only a simple "prescan" measurement of noise amplitude and correlation in the phased-array receiver, and a single accelerated image acquisition are required, allowing robust assessment of signal-to-noise ratio and g-factor. The "pseudo multiple replica" method has been rigorously validated in phantoms and in vivo, showing excellent agreement with true multiple replica and analytical methods. This method is universally applicable to the parallel imaging reconstruction techniques used in clinical applications and will allow pixel-by-pixel image noise measurements for all parallel imaging strategies, allowing quantitative comparison between arbitrary k-space trajectories, image reconstruction, or noise conditioning techniques. (c) 2008 Wiley-Liss, Inc.
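The idea behind such a Monte Carlo noise measurement can be sketched as follows. This is a simplified illustration, not the published procedure: it pushes synthetic noise alone through a linear reconstruction (for a linear operator this gives the same noise statistics as adding the noise to acquired data), and the names `pseudo_replica_noise_std`, `recon`, and `noise_cov` are assumptions introduced here.

```python
import numpy as np


def pseudo_replica_noise_std(recon, noise_cov, n_replicas, rng):
    """Estimate per-pixel noise standard deviation of a linear reconstruction.

    recon:     callable mapping a complex measurement vector to a complex image vector
               (assumed linear).
    noise_cov: Hermitian positive-definite noise covariance of the measurements,
               as would be obtained from a noise prescan.
    """
    chol = np.linalg.cholesky(noise_cov)  # noise_cov = chol @ chol^H
    m = noise_cov.shape[0]
    replicas = []
    for _ in range(n_replicas):
        # Unit-variance complex white noise, then coloured to match noise_cov.
        white = (rng.standard_normal(m) + 1j * rng.standard_normal(m)) / np.sqrt(2)
        replicas.append(recon(chol @ white))
    # np.std on complex data uses |x - mean|, giving a real per-pixel value.
    return np.std(np.array(replicas), axis=0)
```

For a reconstruction given by a matrix `A`, the estimate should approach `sqrt(diag(A @ noise_cov @ A.conj().T))` as the number of replicas grows, which gives a simple consistency check.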
Mehdian, H.; Hajisharifi, K.; Hasanbeigi, A.
2014-01-01
In this paper, quantum fluid equations together with Maxwell's equations are used to study the stability problem of non-parallel and non-relativistic plasma shells colliding over a “background plasma” at arbitrary angle, as a first step towards a microscopic understanding of collision shocks. The calculations have been performed for all magnitudes and directions of the wave vectors. Colliding plasma shells in the vacuum region have been investigated in previous works as a counter-streaming model, whereas in the presence of a background plasma (a more realistic system) the colliding shells are mainly non-parallel. The obtained results show that the presence of background plasma often suppresses the maximum growth rate of instabilities (in particular cases, the behavior is the opposite). It is also found that the largest maximum growth rate occurs for the two-stream instability of the configuration consisting of counter-streaming currents in a very dilute plasma background. The results derived in this study can be used to analyze systems of three colliding plasma slabs, provided that the coordinate system used is stationary relative to one of the particle slabs. The present analytical investigations can be applied to describe violent quantum astrophysical phenomena such as collisions of white dwarf stars with other dense astrophysical bodies or supernova remnants. Moreover, in the limit ℏ→0, the obtained results describe the classical (sufficiently dilute) events of colliding plasma shells such as gamma-ray bursts and flares in the solar wind.
Kwon, Jun Bum; Wang, Xiongfei; Bak, Claus Leth
2015-01-01
As the number of power electronics based systems is increasing, studies about overall stability and harmonic problems are rising. In order to analyze harmonics and stability, most research uses an analysis method based on the Linear Time Invariant (LTI) approach. However, this can...... be difficult in terms of complex multi-parallel connected systems, especially in the case of renewable energy, where possibilities for intermittent operation due to the weather conditions exist. Hence, it can bring many different operating points to the power converter, and the impedance characteristics can...... can demonstrate other phenomena, which cannot be found in the conventional LTI approach. The theoretical modeling and analysis are verified by means of simulations and experiments....
Jonathan W Stone
We present new modifications to the Wuchty algorithm in order to better define and explore possible conformations for an RNA sequence. The new features, including parallelization, energy-independent lonely pair constraints, context-dependent chemical probing constraints, helix filters, and optional multibranch loops, provide useful tools for exploring the landscape of RNA folding. Chemical probing alone may not necessarily define a single unique structure. The helix filters and optional multibranch loops are global constraints on RNA structure that are an especially useful tool for generating models of encapsidated viral RNA for which cryoelectron microscopy or crystallography data may be available. The computations generate a combinatorially complete set of structures near a free energy minimum and thus provide data on the density and diversity of structures near the bottom of a folding funnel for an RNA sequence. The conformational landscapes for some RNA sequences may resemble a low, wide basin rather than a steep funnel that converges to a single structure.
Hennelly, B. M.; Javidi, B.; Sheridan, J. T.
2005-09-01
A number of methods have recently been proposed in the literature for the encryption of 2-D information using linear optical systems. In particular, the double random phase encoding system has received widespread attention. This system uses two Random Phase Keys (RPK) positioned in the input spatial domain and the spatial frequency domain, and if these random phases are described by statistically independent white noises then the encrypted image can be shown to be a white noise. Decryption only requires knowledge of the RPK in the frequency domain. The RPK may be implemented using a Spatial Light Modulator (SLM). In this paper we propose and investigate the use of SLMs for secure optical multiplexing. We show that in this case it is possible to encrypt multiple images in parallel and multiplex them for transmission or storage. The signal energy is effectively spread in the spatial frequency domain. As expected, the number of images that can be multiplexed together and recovered without loss is proportional to the ratio of the input image and the SLM resolution. Many more images may be multiplexed with some loss in recovery. Furthermore, each individual encryption is more robust than traditional double random phase encoding, since decryption requires knowledge of both the RPK and a lowpass filter in order to despread the spectrum and decrypt the image. Numerical simulations are presented and discussed.
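A minimal numerical sketch of classical double random phase encoding may help make the role of the two keys concrete. It uses discrete Fourier transforms in place of an optical system and does not model the SLM-based multiplexing proposed in the paper; the function names and the convention that keys are arrays of values in [0, 1) are assumptions made for this illustration.

```python
import numpy as np


def drpe_encrypt(image, key_input, key_freq):
    """Encrypt a real, non-negative image with two random phase keys in [0, 1)."""
    field = image * np.exp(2j * np.pi * key_input)     # phase mask in the input plane
    spectrum = np.fft.fft2(field) * np.exp(2j * np.pi * key_freq)  # mask in the Fourier plane
    return np.fft.ifft2(spectrum)


def drpe_decrypt(cipher, key_freq):
    """Recover the image using only the frequency-domain key.

    The input-plane phase is removed by taking the magnitude, which is why a real,
    non-negative image can be recovered without key_input.
    """
    spectrum = np.fft.fft2(cipher) * np.exp(-2j * np.pi * key_freq)
    return np.abs(np.fft.ifft2(spectrum))
```

Decrypting with a wrong frequency-domain key leaves the spectrum scrambled, so the output stays noise-like.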
Jerath, Ravinder; Cearley, Shannon M; Barnes, Vernon A; Jensen, Mike
2018-01-01
A fundamental function of the visual system is detecting motion, yet visual perception is poorly understood. Current research has determined that the retina and ganglion cells elicit responses for motion detection; however, the underlying mechanism for this is incompletely understood. Previously we proposed that retinogeniculo-cortical oscillations and photoreceptors work in parallel to process vision. Here we propose that motion could also be processed within the retina, and not in the brain as current theory suggests. In this paper, we discuss: 1) internal neural space formation; 2) primary, secondary, and tertiary roles of vision; 3) gamma as the secondary role; and 4) synchronization and coherence. Movement within the external field is instantly detected by primary processing within the space formed by the retina, providing a unified view of the world from an internal point of view. Our new theory begins to answer questions about: 1) perception of space, erect images, and motion, 2) purpose of lateral inhibition, 3) speed of visual perception, and 4) how peripheral color vision occurs without a large population of cones located peripherally in the retina. We explain that strong oscillatory activity influences brain activity and is necessary for: 1) visual processing, and 2) formation of the internal visuospatial area necessary for visual consciousness, which could allow rods to receive precise visual and visuospatial information, while retinal waves could link the lateral geniculate body with the cortex to form a neural space formed by membrane potential-based oscillations and photoreceptors. We propose that vision is tripartite, with three components that allow a person to make sense of the world, terming them "primary, secondary, and tertiary roles" of vision. Finally, we propose that Gamma waves that are higher in strength and volume allow communication among the retina, thalamus, and various areas of the cortex, and synchronization brings cortical
C. Nagarajan
2012-09-01
This paper presents a closed-loop CLL-T (capacitor-inductor-inductor) Series Parallel Resonant Converter (SPRC), which has been simulated and whose performance is analysed. A three-element CLL-T SPRC working under load-independent operation (voltage-type and current-type load) is presented. The steady-state stability analysis of the CLL-T SPRC has been developed using the state-space technique, and the output voltage is regulated using a fuzzy controller. The simulation study indicates the superiority of fuzzy control over conventional control methods. The proposed approach is expected to provide better voltage regulation for dynamic load conditions. A prototype 300 W, 100 kHz converter is designed and built to demonstrate experimentally the dynamic and steady-state performance of the CLL-T SPRC, which is compared with the simulation studies.
Deshmane, Anagha; Gulani, Vikas; Griswold, Mark A; Seiberlich, Nicole
2012-07-01
Parallel imaging is a robust method for accelerating the acquisition of magnetic resonance imaging (MRI) data, and has made possible many new applications of MR imaging. Parallel imaging works by acquiring a reduced amount of k-space data with an array of receiver coils. These undersampled data can be acquired more quickly, but the undersampling leads to aliased images. One of several parallel imaging algorithms can then be used to reconstruct artifact-free images from either the aliased images (SENSE-type reconstruction) or from the undersampled data (GRAPPA-type reconstruction). The advantages of parallel imaging in a clinical setting include faster image acquisition, which can be used, for instance, to shorten breath-hold times resulting in fewer motion-corrupted examinations. In this article the basic concepts behind parallel imaging are introduced. The relationship between undersampling and aliasing is discussed and two commonly used parallel imaging methods, SENSE and GRAPPA, are explained in detail. Examples of artifacts arising from parallel imaging are shown and ways to detect and mitigate these artifacts are described. Finally, several current applications of parallel imaging are presented and recent advancements and promising research in parallel imaging are briefly reviewed. Copyright © 2012 Wiley Periodicals, Inc.
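The relationship between undersampling and aliasing mentioned in this abstract can be shown with a one-dimensional toy example. It is only a sketch under simplifying assumptions (single coil, regular undersampling, discrete Fourier transform standing in for the MR acquisition); the function name `undersample_alias` is hypothetical, and no SENSE or GRAPPA unfolding is attempted.

```python
import numpy as np


def undersample_alias(signal, reduction):
    """Keep every `reduction`-th k-space sample of a 1-D signal and reconstruct.

    The result is the original signal folded onto itself: `reduction` copies,
    shifted by len(signal) / reduction and each scaled by 1 / reduction.
    """
    n = len(signal)
    if n % reduction != 0:
        raise ValueError("signal length must be divisible by the reduction factor")
    kspace = np.fft.fft(signal)
    mask = np.zeros(n)
    mask[::reduction] = 1.0          # regular undersampling pattern
    return np.fft.ifft(kspace * mask)
```

With a reduction factor of 2, a single bright pixel reappears half a field of view away at half the intensity, which is the fold-over artifact that coil sensitivity information is then used to undo.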
Crockett, Thomas W.
1995-01-01
This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.
1982-01-01
Parallel Computations focuses on parallel computation, with emphasis on algorithms used in a variety of numerical and physical applications and for many different types of parallel computers. Topics covered range from vectorization of fast Fourier transforms (FFTs) and of the incomplete Cholesky conjugate gradient (ICCG) algorithm on the Cray-1 to calculation of table lookups and piecewise functions. Single tridiagonal linear systems and vectorized computation of reactive flow are also discussed. Comprised of 13 chapters, this volume begins by classifying parallel computers and describing techn
Casanova, Henri; Robert, Yves
2008-01-01
""…The authors of the present book, who have extensive credentials in both research and instruction in the area of parallelism, present a sound, principled treatment of parallel algorithms. … This book is very well written and extremely well designed from an instructional point of view. … The authors have created an instructive and fascinating text. The book will serve researchers as well as instructors who need a solid, readable text for a course on parallelism in computing. Indeed, for anyone who wants an understandable text from which to acquire a current, rigorous, and broad vi
Algae Bioreactor Using Submerged Enclosures with Semi-Permeable Membranes
Trent, Jonathan D (Inventor); Gormly, Sherwin J (Inventor); Embaye, Tsegereda N (Inventor); Delzeit, Lance D (Inventor); Flynn, Michael T (Inventor); Liggett, Travis A (Inventor); Buckwalter, Patrick W (Inventor); Baertsch, Robert (Inventor)
2013-01-01
Methods for producing hydrocarbons, including oil, by processing algae and/or other micro-organisms in an aquatic environment. Flexible bags (e.g., plastic) with CO2/O2 exchange membranes, suspended at a controllable depth in a first liquid (e.g., seawater), receive a second liquid (e.g., liquid effluent from a "dead zone") containing seeds for algae growth. The algae are cultivated and harvested in the bags, after most of the second liquid is removed by forward osmosis through liquid exchange membranes. The algae are removed and processed, and the bags are cleaned and reused.
Knecht, Stefan; Jensen, Hans Jørgen Aagaard; Fleig, Timo
2010-01-01
We present a parallel implementation of a large-scale relativistic double-group configuration interaction (CI) program. It is applicable with a large variety of two- and four-component Hamiltonians. The parallel algorithm is based on a distributed data model in combination with a static load balanci...
Kordy, M.; Wannamaker, P.; Maris, V.; Cherkaev, E.; Hill, G.
2016-01-01
Following the creation described in Part I of a deformable edge finite-element simulator for 3-D magnetotelluric (MT) responses using direct solvers, in Part II we develop an algorithm named HexMT for 3-D regularized inversion of MT data including topography. Direct solvers parallelized on large-RAM, symmetric multiprocessor (SMP) workstations are used also for the Gauss-Newton model update. By exploiting the data-space approach, the computational cost of the model update becomes much less in both time and computer memory than the cost of the forward simulation. In order to regularize using the second norm of the gradient, we factor the matrix related to the regularization term and apply its inverse to the Jacobian, which is done using the MKL PARDISO library. For dense matrix multiplication and factorization related to the model update, we use the PLASMA library which shows very good scalability across processor cores. A synthetic test inversion using a simple hill model shows that including topography can be important; in this case depression of the electric field by the hill can cause false conductors at depth or mask the presence of resistive structure. With a simple model of two buried bricks, a uniform spatial weighting for the norm of model smoothing recovered more accurate locations for the tomographic images compared to weightings which were a function of parameter Jacobians. We implement joint inversion for static distortion matrices tested using the Dublin secret model 2, for which we are able to reduce nRMS to ~1.1 while avoiding oscillatory convergence. Finally we test the code on field data by inverting full impedance and tipper MT responses collected around Mount St Helens in the Cascade volcanic chain. Among several prominent structures, the north-south trending, eruption-controlling shear zone is clearly imaged in the inversion.
Kordy, M. A.; Wannamaker, P. E.; Maris, V.; Cherkaev, E.; Hill, G. J.
2014-12-01
We have developed an algorithm for 3D simulation and inversion of magnetotelluric (MT) responses using deformable hexahedral finite elements that permits incorporation of topography. Direct solvers parallelized on symmetric multiprocessor (SMP), single-chassis workstations with large RAM are used for the forward solution, parameter Jacobians, and model update. The forward simulator, Jacobian calculations, and synthetic and real data inversions are presented. We use first-order edge elements to represent the secondary electric field (E), yielding accuracy O(h) for E and its curl (magnetic field). For very low frequency or small material admittivity, the E-field requires divergence correction. Using Hodge decomposition, the correction may be applied after the forward solution is calculated. It allows accurate E-field solutions in dielectric air. The system matrix factorization is computed using the MUMPS library, which shows moderately good scalability through 12 processor cores but limited gains beyond that. The factored matrix is used to calculate the forward response as well as the Jacobians of field and MT responses using the reciprocity theorem. Comparison with other codes demonstrates the accuracy of our forward calculations. We consider a popular conductive/resistive double brick structure and several topographic models. In particular, the ability of finite elements to represent smooth topographic slopes permits accurate simulation of refraction of electromagnetic waves normal to the slopes at high frequencies. Run time tests indicate that for meshes as large as 150x150x60 elements, MT forward response and Jacobians can be calculated in ~2.5 hours per frequency. For inversion, we implemented a data-space Gauss-Newton method, which offers a reduction in memory requirement and a significant speedup of the parameter step versus the model-space approach. For dense matrix operations we use the tiling approach of the PLASMA library, which shows very good scalability. In synthetic
Jejcic, A.; Maillard, J.; Maurel, G.; Silva, J.; Wolff-Bacha, F.
1997-01-01
Work in the field of parallel processing has developed as a research activity built on several numerical Monte Carlo simulations related to basic or applied current problems of nuclear and particle physics. For the applications using the GEANT code, development or improvement work was done on the parts simulating low-energy physical phenomena such as radiation, transport and interaction. The problem of actinide burning by means of accelerators was approached using a simulation with the GEANT code. A program for neutron tracking in the low-energy range, up to the thermal region, has been developed. It is coupled to the GEANT code and permits, in a single pass, the simulation of a hybrid reactor core receiving a proton burst. Other work in this field concerns simulations for nuclear medicine applications, for instance the development of biological probes, the evaluation and characterization of gamma cameras (collimators, crystal thickness), and the method for dosimetric calculations. These calculations are particularly suited to a geometrical parallelization approach specially adapted to parallel machines of the TN310 type. Further work in the same field concerns simulation of electron channelling in crystals and of the beam-beam interaction effect in colliders. The GEANT code was also used to simulate the operation of germanium detectors designed for monitoring natural and artificial radioactivity in the environment.
McCallum, Ethan
2011-01-01
It's tough to argue with R as a high-quality, cross-platform, open source statistical software product, unless you're in the business of crunching Big Data. This concise book introduces you to several strategies for using R to analyze large datasets. You'll learn the basics of Snow, Multicore, Parallel, and some Hadoop-related tools, including how to find them, how to use them, when they work well, and when they don't. With these packages, you can overcome R's single-threaded nature by spreading work across multiple CPUs, or offloading work to multiple machines to address R's memory barrier.
Mizuno, T.; Kobayashi, T.; Takara, H.
2014-01-01
We demonstrate dense SDM transmission of 20-WDM multi-carrier PDM-32QAM signals over a 40-km 12-core x 3-mode fiber with 247.9-b/s/Hz spectral efficiency. Parallel MIMO equalization enables 21-ns DMD compensation with 61 TDE taps per subcarrier....
The parallel volume at large distances
Kampf, Jürgen
In this paper we examine the asymptotic behavior of the parallel volume of planar non-convex bodies as the distance tends to infinity. We show that the difference between the parallel volume of the convex hull of a body and the parallel volume of the body itself tends to 0. This yields a new proof...... for the fact that a planar body can only have polynomial parallel volume, if it is convex. Extensions to Minkowski spaces and random sets are also discussed....
Jelatis, G.J.
1983-01-01
Third sound in superfluid helium-4 films has been investigated using two parallel-plate waveguides. These investigations led to the observation of fifth sound, a new mode of sound propagation. Both waveguides consisted of two parallel pieces of vitreous quartz. The sound speed was obtained by measuring the time-of-flight of pulsed third sound over a known distance. Investigations from 1.0-1.7 K were possible with the use of superconducting bolometers, which measure the temperature component of the third sound wave. Observations were initially made with a waveguide having a plate separation fixed at five microns. Adiabatic third sound was measured in this geometry. Isothermal third sound was also observed, using the usual single-substrate technique. Fifth sound speeds, calculated from the two-fluid theory of helium and the speeds of the two forms of third sound, agreed in size and temperature dependence with theoretical predictions. Nevertheless, only equivocal observations of fifth sound were made. As a result, the film-substrate interaction was examined, and estimates of the Kapitza conductance were made. Assuming the dominance of the effects of this conductance over those due to the ECEs led to a new expression for fifth sound. A reanalysis of the initial data was made, which contained no adjustable parameters. The observation of fifth sound was seen to be consistent with the existence of an anomalously low boundary conductance.
Streaming for Functional Data-Parallel Languages
Madsen, Frederik Meisner
In this thesis, we investigate streaming as a general solution to the space inefficiency commonly found in functional data-parallel programming languages. The data-parallel paradigm maps well to parallel SIMD-style hardware. However, the traditional fully materializing execution strategy...... by extending two existing data-parallel languages: NESL and Accelerate. In the extensions we map bulk operations to data-parallel streams that can evaluate fully sequential, fully parallel or anything in between. By a dataflow, piecewise parallel execution strategy, the runtime system can adjust to any target...... flattening necessitates all sub-computations to materialize at the same time. For example, naive n by n matrix multiplication requires n^3 space in NESL because the algorithm contains n^3 independent scalar multiplications. For large values of n, this is completely unacceptable. We address the problem...
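The space problem described for matrix multiplication can be illustrated outside NESL with a short NumPy sketch. It contrasts materializing all n^3 scalar products at once with processing them in bounded chunks. This is only an analogy to the streaming strategy of the thesis, not its implementation; the function names and the `chunk` parameter are assumptions introduced here.

```python
import numpy as np


def matmul_materialized(a, b):
    """Form every scalar product a[i, k] * b[k, j] at once, then reduce over k.

    The intermediate array has shape (n, k, m), i.e. n^3 elements for square inputs.
    """
    products = a[:, :, None] * b[None, :, :]
    return products.sum(axis=1)


def matmul_streamed(a, b, chunk):
    """Same result, but only `chunk` slices along k are materialized at a time.

    Peak intermediate size is n * chunk * m instead of n * k * m.
    """
    out = np.zeros((a.shape[0], b.shape[1]))
    for start in range(0, a.shape[1], chunk):
        stop = start + chunk
        out += (a[:, start:stop, None] * b[None, start:stop, :]).sum(axis=1)
    return out
```

Choosing `chunk` trades parallel width against memory: `chunk = k` reproduces the fully materialized version, while `chunk = 1` is close to a sequential accumulation.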
Shipley, Heath V.; Lange-Vagle, Daniel; Marchesini, Danilo; Brammer, Gabriel B.; Ferrarese, Laura; Stefanon, Mauro; Kado-Fong, Erin; Whitaker, Katherine E.; Oesch, Pascal A.; Feinstein, Adina D.; Labbé, Ivo; Lundgren, Britt; Martis, Nicholas; Muzzin, Adam; Nedkova, Kalina; Skelton, Rosalind; van der Wel, Arjen
2018-03-01
We present Hubble multi-wavelength photometric catalogs, including (up to) 17 filters with the Advanced Camera for Surveys and Wide Field Camera 3 from the ultra-violet to near-infrared for the Hubble Frontier Fields and associated parallels. We have constructed homogeneous photometric catalogs for all six clusters and their parallels. To further expand these data catalogs, we have added ultra-deep Ks-band imaging at 2.2 μm from the Very Large Telescope HAWK-I and Keck-I MOSFIRE instruments. We also add post-cryogenic Spitzer imaging at 3.6 and 4.5 μm with the Infrared Array Camera (IRAC), as well as archival IRAC 5.8 and 8.0 μm imaging when available. We introduce the public release of the multi-wavelength (0.2–8 μm) photometric catalogs, and we describe the unique steps applied for the construction of these catalogs. Particular emphasis is given to the source detection band, the contamination of light from the bright cluster galaxies (bCGs), and intra-cluster light (ICL). In addition to the photometric catalogs, we provide catalogs of photometric redshifts and stellar population properties. Furthermore, this includes all the images used in the construction of the catalogs, including the combined models of bCGs and ICL, the residual images, segmentation maps, and more. These catalogs are a robust data set of the Hubble Frontier Fields and will be an important aid in designing future surveys, as well as planning follow-up programs with current and future observatories to answer key questions remaining about first light, reionization, the assembly of galaxies, and many more topics, most notably by identifying high-redshift sources to target.
Gevorkyan, A.S.; Abajyan, H.G.
2011-01-01
We have investigated the statistical properties of an ensemble of disordered 1D spatial spin chains (SSCs) of finite length, placed in an external field, with consideration of relaxation effects. The short-range interaction complex-classical Hamiltonian was first used for solving this problem. A system of recurrent equations is obtained on the nodes of the spin-chain lattice. An efficient mathematical algorithm is developed on the basis of these equations with consideration of the advanced Sylvester conditions, which allow a huge number of stable spin chains to be constructed step by step in parallel. The distribution functions of different parameters of the spin-glass system are constructed from the first principles of complex classical mechanics by analyzing the calculation results of the 1D SSCs ensemble. It is shown that the behavior of the parameter distributions is quite different depending on the external fields. The energy ensembles and constants of spin-spin interactions change smoothly depending on the external field in the limit of statistical equilibrium, while some of them, such as the mean value of polarizations of the ensemble and the parameters of its orderings, are frustrated. We have also studied some critical properties of the ensemble, such as catastrophes in the Clausius-Mossotti equation, depending on the value of the external field. We have shown that the generalized complex-classical approach excludes these catastrophes, allowing one to organize continuous parallel computing on the whole region of values of the external field, including critical points. A new representation of the partition function based on these investigations is suggested. As opposed to the usual definition, this function is a complex one and its derivatives are everywhere defined, including critical points.
Soltz, R; Vranas, P; Blumrich, M; Chen, D; Gara, A; Giampap, M; Heidelberger, P; Salapura, V; Sexton, J; Bhanot, G
2007-01-01
The theory of the strong nuclear force, Quantum Chromodynamics (QCD), can be numerically simulated from first principles on massively-parallel supercomputers using the method of Lattice Gauge Theory. We describe the special programming requirements of lattice QCD (LQCD) as well as the optimal supercomputer hardware architectures that it suggests. We demonstrate these methods on the BlueGene massively-parallel supercomputer and argue that LQCD and the BlueGene architecture are a natural match. This can be traced to the simple fact that LQCD is a regular lattice discretization of space into lattice sites while the BlueGene supercomputer is a discretization of space into compute nodes, and that both are constrained by requirements of locality. This simple relation is both technologically important and theoretically intriguing. The main result of this paper is the speedup of LQCD using up to 131,072 CPUs on the largest BlueGene/L supercomputer. The speedup is perfect with sustained performance of about 20% of peak. This corresponds to a maximum of 70.5 sustained TFlop/s. At these speeds LQCD and BlueGene are poised to produce the next generation of strong interaction physics theoretical results
PDDP, A Data Parallel Programming Model
Karen H. Warren
1996-01-01
PDDP, the parallel data distribution preprocessor, is a data parallel programming model for distributed memory parallel computers. PDDP implements high-performance Fortran-compatible data distribution directives and parallelism expressed by the use of Fortran 90 array syntax, the FORALL statement, and the WHERE construct. Distributed data objects belong to a global name space; other data objects are treated as local and replicated on each processor. PDDP allows the user to program in a shared memory style and generates codes that are portable to a variety of parallel machines. For interprocessor communication, PDDP uses the fastest communication primitives on each platform.
Nakamura, M; Kitayama, K
1998-05-10
Optical space code-division multiple access is a scheme to multiplex and link data between two-dimensional processors such as smart pixels and spatial light modulators or arrays of optical sources like vertical-cavity surface-emitting lasers. We examine the multiplexing characteristics of optical space code-division multiple access by using optical orthogonal signature patterns. The probability density function of interference noise in interfering optical orthogonal signature patterns is calculated. The bit-error rate is derived from the result and plotted as a function of receiver threshold, code length, code weight, and number of users. Furthermore, we propose a prethresholding method to suppress the interference noise, and we experimentally verify that the method works effectively in improving system performance.
Fringe Capacitance of a Parallel-Plate Capacitor.
Hale, D. P.
1978-01-01
Describes an experiment designed to measure the forces between charged parallel plates, and determines the relationship among the effective electrode area, the measured capacitance values, and the electrode spacing of a parallel plate capacitor. (GA)
Streaming nested data parallelism on multicores
Madsen, Frederik Meisner; Filinski, Andrzej
2016-01-01
The paradigm of nested data parallelism (NDP) allows a variety of semi-regular computation tasks to be mapped onto SIMD-style hardware, including GPUs and vector units. However, some care is needed to keep down space consumption in situations where the available parallelism may vastly exceed...
Parallel Polarization State Generation.
She, Alan; Capasso, Federico
2016-05-17
The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially-separated polarization components of a laser using a digital micromirror device that are subsequently beam combined. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security.
Parallel Programming with Intel Parallel Studio XE
Blair-Chappell, Stephen
2012-01-01
Optimize code for multi-core processors with Intel's Parallel Studio. Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write programs for multicore and leverage the power of multicore in your programs. Sharing hands-on case studies and real-world examples, the
Parallel algorithms for mapping pipelined and parallel computations
Nicol, David M.
1988-01-01
Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm^3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm^2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.
Second order parallel tensors on some paracontact manifolds | Liu ...
The object of the present paper is to study the symmetric and skew-symmetric properties of a second order parallel tensor on paracontact metric (k;μ)-spaces and almost β-para-Kenmotsu (k;μ)-spaces. In this paper, we prove that if there exists a second order symmetric parallel tensor on a paracontact metric (k;μ)-space M, ...
On synchronous parallel computations with independent probabilistic choice
Reif, J.H.
1984-01-01
This paper introduces probabilistic choice to synchronous parallel machine models; in particular parallel RAMs. The power of probabilistic choice in parallel computations is illustrated by parallelizing some known probabilistic sequential algorithms. The authors characterize the computational complexity of time, space, and processor bounded probabilistic parallel RAMs in terms of the computational complexity of probabilistic sequential RAMs. They show that parallelism uniformly speeds up time bounded probabilistic sequential RAM computations by nearly a quadratic factor. They also show that probabilistic choice can be eliminated from parallel computations by introducing nonuniformity.
Parallel imaging with phase scrambling.
Zaitsev, Maxim; Schultz, Gerrit; Hennig, Juergen; Gruetter, Rolf; Gallichan, Daniel
2015-04-01
Most existing methods for accelerated parallel imaging in MRI require additional data, which are used to derive information about the sensitivity profile of each radiofrequency (RF) channel. In this work, a method is presented to avoid the acquisition of separate coil calibration data for accelerated Cartesian trajectories. Quadratic phase is imparted to the image to spread the signals in k-space (aka phase scrambling). By rewriting the Fourier transform as a convolution operation, a window can be introduced to the convolved chirp function, allowing a low-resolution image to be reconstructed from phase-scrambled data without prominent aliasing. This image (for each RF channel) can be used to derive coil sensitivities to drive existing parallel imaging techniques. As a proof of concept, the quadratic phase was applied by introducing an offset to the x^2 - y^2 shim and the data were reconstructed using adapted versions of the image space-based sensitivity encoding and GeneRalized Autocalibrating Partially Parallel Acquisitions algorithms. The method is demonstrated in a phantom (1 × 2, 1 × 3, and 2 × 2 acceleration) and in vivo (2 × 2 acceleration) using a 3D gradient echo acquisition. Phase scrambling can be used to perform parallel imaging acceleration without acquisition of separate coil calibration data, demonstrated here for a 3D-Cartesian trajectory. Further research is required to prove the applicability to other 2D and 3D sampling schemes. © 2014 Wiley Periodicals, Inc.
Morse, H Stephen
1994-01-01
Practical Parallel Computing provides information pertinent to the fundamental aspects of high-performance parallel processing. This book discusses the development of parallel applications on a variety of equipment. Organized into three parts encompassing 12 chapters, this book begins with an overview of the technology trends that converge to favor massively parallel hardware over traditional mainframes and vector machines. This text then gives a tutorial introduction to parallel hardware architectures. Other chapters provide worked-out examples of programs using several parallel languages. Thi
Akl, Selim G
1985-01-01
Parallel Sorting Algorithms explains how to use parallel algorithms to sort a sequence of items on a variety of parallel computers. The book reviews the sorting problem, the parallel models of computation, parallel algorithms, and the lower bounds on the parallel sorting problems. The text also presents twenty different algorithms for architectures such as linear arrays, mesh-connected computers, and cube-connected computers. Another example where the algorithms can be applied is on shared-memory SIMD (single instruction stream, multiple data stream) computers in which the whole sequence to be sorted can fit in the
Combinatorics of spreads and parallelisms
Johnson, Norman
2010-01-01
Partitions of Vector Spaces; Quasi-Subgeometry Partitions; Finite Focal-Spreads; Generalizing André Spreads; The Going Up Construction for Focal-Spreads; Subgeometry Partitions; Subgeometry and Quasi-Subgeometry Partitions; Subgeometries from Focal-Spreads; Extended André Subgeometries; Kantor's Flag-Transitive Designs; Maximal Additive Partial Spreads; Subplane Covered Nets and Baer Groups; Partial Desarguesian t-Parallelisms; Direct Products of Affine Planes; Jha-Johnson SL(2,
New algorithms for parallel MRI
Anzengruber, S; Ramlau, R; Bauer, F; Leitao, A
2008-01-01
Magnetic Resonance Imaging with parallel data acquisition requires algorithms for reconstructing the patient's image from a small number of measured lines of the Fourier domain (k-space). In contrast to well-known algorithms like SENSE and GRAPPA and its flavors we consider the problem as a non-linear inverse problem. However, in order to avoid cost intensive derivatives we will use Landweber-Kaczmarz iteration and in order to improve the overall results some additional sparsity constraints.
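The Landweber-Kaczmarz idea mentioned above can be sketched on a toy problem. The abstract treats a non-linear MRI reconstruction with sparsity constraints; the sketch below is only a linear, unconstrained analogue, and the names `landweber_kaczmarz`, `blocks`, `omega`, and `sweeps` are hypothetical. Each sweep cycles through the data blocks (think of one block per coil) and applies one gradient-type Landweber step per block, x <- x + omega * A_i^T (y_i - A_i x).

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def landweber_kaczmarz(blocks, x0, omega, sweeps):
    """Cyclic Landweber iteration over blocks [(A_i, y_i), ...].

    omega is a fixed step size; for this linear sketch it is assumed to
    satisfy omega * ||A_i||^2 < 2 for every block.
    """
    x = list(x0)
    for _ in range(sweeps):
        for a_i, y_i in blocks:
            residual = [yi - axi for yi, axi in zip(y_i, matvec(a_i, x))]
            update = matvec(transpose(a_i), residual)
            x = [xj + omega * uj for xj, uj in zip(x, update)]
    return x

# Toy system with exact solution x = (1, 3): 2*1 + 3 = 5 and 1 + 3*3 = 10.
toy_blocks = [([[2.0, 1.0]], [5.0]), ([[1.0, 3.0]], [10.0])]
toy_solution = landweber_kaczmarz(toy_blocks, [0.0, 0.0], omega=0.1, sweeps=100)
```

Only matrix-vector products with A_i and its transpose are needed in this linear case, which loosely reflects the abstract's stated motivation of avoiding expensive derivative computations.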
NonLinear Parallel OPtimization Tool, Phase II
National Aeronautics and Space Administration - The technological advancement proposed is a novel large-scale Nonlinear Parallel OPtimization Tool (NLPAROPT). This software package will eliminate the computational...
Introduction to parallel programming
Brawer, Steven
1989-01-01
Introduction to Parallel Programming focuses on the techniques, processes, methodologies, and approaches involved in parallel programming. The book first offers information on Fortran, hardware and operating system models, and processes, shared memory, and simple parallel programs. Discussions focus on processes and processors, joining processes, shared memory, time-sharing with multiple processors, hardware, loops, passing arguments in function/subroutine calls, program structure, and arithmetic expressions. The text then elaborates on basic parallel programming techniques, barriers and race
Fox, Geoffrey C; Messina, Guiseppe C
2014-01-01
A clear illustration of how parallel computers can be successfully applied to large-scale scientific computations. This book demonstrates how a variety of applications in physics, biology, mathematics and other sciences were implemented on real parallel computers to produce new scientific results. It investigates issues of fine-grained parallelism relevant for future supercomputers with particular emphasis on hypercube architecture. The authors describe how they used an experimental approach to configure different massively parallel machines, design and implement basic system software, and develop
Reconfigurable Parallel Computer Architectures for Space Applications
2012-08-07
B-1. Dependency diagram of the hardware blocks implemented with VHDL ... The CU has been fully implemented in an FPGA using VHDL. The CU hardware design is depicted in Figure 12. It consists of a main ... the hardware design implemented in the FPGA using VHDL. The block diagram shows the dependency of all the VHDL blocks included in the design. Each
Non-Cartesian parallel imaging reconstruction.
Wright, Katherine L; Hamilton, Jesse I; Griswold, Mark A; Gulani, Vikas; Seiberlich, Nicole
2014-11-01
Non-Cartesian parallel imaging has played an important role in reducing data acquisition time in MRI. The use of non-Cartesian trajectories can enable more efficient coverage of k-space, which can be leveraged to reduce scan times. These trajectories can be undersampled to achieve even faster scan times, but the resulting images may contain aliasing artifacts. Just as Cartesian parallel imaging can be used to reconstruct images from undersampled Cartesian data, non-Cartesian parallel imaging methods can mitigate aliasing artifacts by using additional spatial encoding information in the form of the nonhomogeneous sensitivities of multi-coil phased arrays. This review will begin with an overview of non-Cartesian k-space trajectories and their sampling properties, followed by an in-depth discussion of several selected non-Cartesian parallel imaging algorithms. Three representative non-Cartesian parallel imaging methods will be described, including Conjugate Gradient SENSE (CG SENSE), non-Cartesian generalized autocalibrating partially parallel acquisition (GRAPPA), and Iterative Self-Consistent Parallel Imaging Reconstruction (SPIRiT). After a discussion of these three techniques, several potential promising clinical applications of non-Cartesian parallel imaging will be covered. © 2014 Wiley Periodicals, Inc.
Parallel Atomistic Simulations
Heffelfinger, Grant S.
2000-01-18
Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed, the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories, those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains are discussed.
Parallel-In-Time For Moving Meshes
Falgout, R. D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Manteuffel, T. A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Southworth, B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Schroder, J. B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
2016-02-04
With steadily growing computational resources available, scientists must develop effective ways to utilize the increased resources. High performance, highly parallel software has become a standard. However, until recent years parallelism has focused primarily on the spatial domain. When solving a space-time partial differential equation (PDE), this leads to a sequential bottleneck in the temporal dimension, particularly when taking a large number of time steps. The XBraid parallel-in-time library was developed as a practical way to add temporal parallelism to existing sequential codes with only minor modifications. In this work, a rezoning-type moving mesh is applied to a diffusion problem and formulated in a parallel-in-time framework. Tests and scaling studies are run using XBraid and demonstrate excellent results for the simple model problem considered herein.
CERN. Geneva
2016-01-01
The traditionally used and well established parallel programming models OpenMP and MPI are both targeting lower level parallelism and are meant to be as language agnostic as possible. For a long time, those models were the only widely available portable options for developing parallel C++ applications beyond using plain threads. This has strongly limited the optimization capabilities of compilers, has inhibited extensibility and genericity, and has restricted the use of those models together with other, modern higher level abstractions introduced by the C++11 and C++14 standards. The recent revival of interest in the industry and wider community for the C++ language has also spurred a remarkable amount of standardization proposals and technical specifications being developed. Those efforts however have so far failed to build a vision on how to seamlessly integrate various types of parallelism, such as iterative parallel execution, task-based parallelism, asynchronous many-task execution flows, continuation s...
Parallelism in matrix computations
Gallopoulos, Efstratios; Sameh, Ahmed H
2016-01-01
This book is primarily intended as a research monograph that could also be used in graduate courses for the design of parallel algorithms in matrix computations. It assumes general but not extensive knowledge of numerical linear algebra, parallel architectures, and parallel programming paradigms. The book consists of four parts: (I) Basics; (II) Dense and Special Matrix Computations; (III) Sparse Matrix Computations; and (IV) Matrix functions and characteristics. Part I deals with parallel programming paradigms and fundamental kernels, including reordering schemes for sparse matrices. Part II is devoted to dense matrix computations such as parallel algorithms for solving linear systems, linear least squares, the symmetric algebraic eigenvalue problem, and the singular-value decomposition. It also deals with the development of parallel algorithms for special linear systems such as banded, Vandermonde, Toeplitz, and block Toeplitz systems. Part III addresses sparse matrix computations: (a) the development of pa...
Sitchinava, Nodar; Zeh, Norbert
2012-01-01
We present the parallel buffer tree, a parallel external memory (PEM) data structure for batched search problems. This data structure is a non-trivial extension of Arge's sequential buffer tree to a private-cache multiprocessor environment and reduces the number of I/O operations by the number of...... in the optimal O(psort(N) + K/PB) parallel I/O complexity, where K is the size of the output reported in the process and psort(N) is the parallel I/O complexity of sorting N elements using P processors....
Parallel Algorithms and Patterns
Robey, Robert W. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
2016-06-16
This is a PowerPoint presentation on parallel algorithms and patterns. A parallel algorithm is a well-defined, step-by-step computational procedure that emphasizes concurrency to solve a problem. Examples of problems include: Sorting, searching, optimization, matrix operations. A parallel pattern is a computational step in a sequence of independent, potentially concurrent operations that occurs in diverse scenarios with some frequency. Examples are: Reductions, prefix scans, ghost cell updates. We only touch on parallel patterns in this presentation. It really deserves its own detailed discussion which Gabe Rockefeller would like to develop.
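Two of the patterns named above, reductions and prefix scans, can be sketched in a few lines. This is an illustrative Python sketch, not material from the presentation; the function names `tree_reduce` and `inclusive_scan` are hypothetical, and the code runs serially while marking which steps are independent and could run concurrently. The scan uses the Hillis-Steele formulation, chosen here only because it is short.

```python
def tree_reduce(values, op):
    """Pairwise (tree) reduction with an associative operator `op`.

    Within each pass, every pair is combined independently of the others,
    so a pass could be executed concurrently.
    """
    level = list(values)
    if not level:
        raise ValueError("cannot reduce an empty sequence")
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element carries over to the next pass
        level = nxt
    return level[0]

def inclusive_scan(values, op):
    """Hillis-Steele inclusive prefix scan with an associative operator `op`.

    Takes about log2(n) passes; each output element of a pass depends only
    on the previous pass, so a pass is elementwise independent.
    """
    out = list(values)
    step = 1
    while step < len(out):
        out = [out[i] if i < step else op(out[i - step], out[i])
               for i in range(len(out))]
        step *= 2
    return out
```

Both functions keep operands in their original left-to-right order, so they also work for associative operators that are not commutative, such as string concatenation.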
Application Portable Parallel Library
Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott
1995-01-01
Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also includes heterogeneous collection of networked computers.) Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.
Dominguez, A.; Siana, B.; Masters, D. [Department of Physics and Astronomy, University of California Riverside, Riverside, CA 92521 (United States); Henry, A. L.; Martin, C. L. [Department of Physics, University of California, Santa Barbara, CA 93106 (United States); Scarlata, C.; Bedregal, A. G. [Minnesota Institute for Astrophysics, University of Minnesota, Minneapolis, MN 55455 (United States); Malkan, M.; Ross, N. R. [Department of Physics and Astronomy, University of California Los Angeles, Los Angeles, CA 90095 (United States); Atek, H.; Colbert, J. W. [Spitzer Science Center, Caltech, Pasadena, CA 91125 (United States); Teplitz, H. I.; Rafelski, M. [Infrared Processing and Analysis Center, Caltech, Pasadena, CA 91125 (United States); McCarthy, P.; Hathi, N. P.; Dressler, A. [Observatories of the Carnegie Institution for Science, Pasadena, CA 91101 (United States); Bunker, A., E-mail: albertod@ucr.edu [Department of Physics, Oxford University, Denys Wilkinson Building, Keble Road, Oxford, OX1 3RH (United Kingdom)
2013-02-15
Spectroscopic observations of Hα and Hβ emission lines of 128 star-forming galaxies in the redshift range 0.75 ≤ z ≤ 1.5 are presented. These data were taken with slitless spectroscopy using the G102 and G141 grisms of the Wide-Field-Camera 3 (WFC3) on board the Hubble Space Telescope as part of the WFC3 Infrared Spectroscopic Parallel survey. Interstellar dust extinction is measured from stacked spectra that cover the Balmer decrement (Hα/Hβ). We present dust extinction as a function of Hα luminosity (down to 3 × 10^41 erg s^-1), galaxy stellar mass (reaching 4 × 10^8 M_⊙), and rest-frame Hα equivalent width. The faintest galaxies are two times fainter in Hα luminosity than galaxies previously studied at z ≈ 1.5. An evolution is observed where galaxies of the same Hα luminosity have lower extinction at higher redshifts, whereas no evolution is found within our error bars with stellar mass. The lower Hα luminosity galaxies in our sample are found to be consistent with no dust extinction. We find an anti-correlation of the [O III] λ5007/Hα flux ratio as a function of luminosity where galaxies with L_Hα < 5 × 10^41 erg s^-1 are brighter in [O III] λ5007 than Hα. This trend is evident even after extinction correction, suggesting that the increased [O III] λ5007/Hα ratio in low-luminosity galaxies is likely due to lower metallicity and/or higher ionization parameters.
Building a parallel file system simulator
Molina-Estolano, E; Maltzahn, C; Brandt, S A; Bent, J
2009-01-01
Parallel file systems are gaining in popularity in high-end computing centers as well as commercial data centers. High-end computing systems are expected to scale exponentially and to pose new challenges to their storage scalability in terms of cost and power. To address these challenges scientists and file system designers will need a thorough understanding of the design space of parallel file systems. Yet there exist few systematic studies of parallel file system behavior at petabyte and exabyte scale. An important reason is the significant cost of getting access to large-scale hardware to test parallel file systems. To contribute to this understanding we are building a parallel file system simulator that can simulate parallel file systems at very large scale. Our goal is to simulate petabyte-scale parallel file systems on a small cluster or even a single machine in reasonable time and fidelity. With this simulator, file system experts will be able to tune existing file systems for specific workloads, scientists and file system deployment engineers will be able to better communicate workload requirements, file system designers and researchers will be able to try out design alternatives and innovations at scale, and instructors will be able to study very large-scale parallel file system behavior in the classroom. In this paper we describe our approach and provide preliminary results that are encouraging both in terms of fidelity and simulation scalability.
Parallel discrete event simulation
Overeinder, B.J.; Hertzberger, L.O.; Sloot, P.M.A.; Withagen, W.J.
1991-01-01
In simulating applications for execution on specific computing systems, the simulation performance figures must be known in a short period of time. One basic approach to the problem of reducing the required simulation time is the exploitation of parallelism. However, in parallelizing the simulation
Parallel reservoir simulator computations
Hemanth-Kumar, K.; Young, L.C.
1995-01-01
The adaptation of a reservoir simulator for parallel computations is described. The simulator was originally designed for vector processors. It performs approximately 99% of its calculations in vector/parallel mode and, relative to scalar calculations, it achieves speedups of 65 and 81 for black oil and EOS simulations, respectively, on the CRAY C-90.
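The figures quoted above can be sanity-checked against Amdahl's law, which the abstract itself does not mention. The sketch below assumes, as a simplification, that "approximately 99% of its calculations" can be read as the fraction of scalar run time that is accelerated; the function names `amdahl_speedup` and `amdahl_limit` are hypothetical.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the work is accelerated by `factor`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)

def amdahl_limit(fraction):
    """Upper bound on the speedup as the acceleration factor grows without bound."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must lie in [0, 1)")
    return 1.0 / (1.0 - fraction)

# Under the stated assumption, a 99% accelerated fraction caps the speedup near 100,
# so the reported speedups of 65 and 81 both fall below that bound.
limit_99 = amdahl_limit(0.99)
```

Because the bound is very sensitive to the accelerated fraction (98% gives a cap near 50, 99.5% a cap near 200), the rounded "approximately 99%" only supports a rough consistency check, not an estimate of the per-operation vector speedup.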
Totally parallel multilevel algorithms
Frederickson, Paul O.
1988-01-01
Four totally parallel algorithms for the solution of a sparse linear system have common characteristics which become quite apparent when they are implemented on a highly parallel hypercube such as the CM2. These four algorithms are Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, Robust Multigrid (RMG) of Hackbusch, the FFT based Spectral Algorithm, and Parallel Cyclic Reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, which is referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is recognized to be a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance.
1991-10-23
An account of the Caltech Concurrent Computation Program (C^3P), a five year project that focused on answering the question: "Can parallel computers be used to do large-scale scientific computations?" As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C^3P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C^3P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.
Massively parallel mathematical sieves
Montry, G.R.
1989-01-01
The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
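How a sieve can be split across processors is sketched below. For simplicity the sketch uses a contiguous block decomposition, not the scattered decomposition the abstract investigates, and it runs the blocks one after another in plain Python rather than on a hypercube. The names `base_primes`, `sieve_block`, `blocked_sieve`, and `nblocks` are hypothetical. Each block needs only the base primes up to sqrt(n), so the blocks are independent of one another.

```python
from math import isqrt

def base_primes(limit):
    """Ordinary serial Sieve of Eratosthenes for primes <= limit."""
    if limit < 2:
        return []
    flags = [True] * (limit + 1)
    flags[0] = flags[1] = False
    for p in range(2, isqrt(limit) + 1):
        if flags[p]:
            for m in range(p * p, limit + 1, p):
                flags[m] = False
    return [i for i, is_prime in enumerate(flags) if is_prime]

def sieve_block(lo, hi, primes):
    """Return the primes in [lo, hi) by striking multiples of each base prime."""
    flags = [True] * (hi - lo)
    for p in primes:
        if p * p >= hi:
            break  # no unstruck composite in this block has p as smallest factor
        start = max(p * p, ((lo + p - 1) // p) * p)  # first multiple to strike
        for m in range(start, hi, p):
            flags[m - lo] = False
    return [lo + i for i, is_prime in enumerate(flags) if is_prime and lo + i >= 2]

def blocked_sieve(n, nblocks):
    """Primes <= n, computed block by block (n >= 0, nblocks >= 1)."""
    primes = base_primes(isqrt(n))
    size = -(-(n + 1) // nblocks)  # ceiling division
    result = []
    for b in range(nblocks):  # iterations are independent: one per processor
        lo, hi = b * size, min((b + 1) * size, n + 1)
        if lo < hi:
            result.extend(sieve_block(lo, hi, primes))
    return result
```

With contiguous blocks, low-numbered blocks see more strikes per base prime range than others; interleaving the ownership of numbers across processors is one way to even out that load, which is the kind of trade-off the abstract compares.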
Trembach, Vera
2014-01-01
Space is an introduction to the mysteries of the Universe. Included are Task Cards for independent learning, Journal Word Cards for creative writing, and Hands-On Activities for reinforcing skills in Math and Language Arts. Space is a perfect introduction to further research of the Solar System.
Miquel, J. (Editor); Economos, A. C. (Editor)
1982-01-01
Presentations are given which address the effects of space flight on the older person, the parallels between the physiological responses to weightlessness and the aging process, and experimental possibilities afforded by the weightless environment to fundamental research in gerontology and geriatrics.
Algorithms for parallel computers
Churchhouse, R.F.
1985-01-01
Until relatively recently almost all the algorithms for use on computers had been designed on the (usually unstated) assumption that they were to be run on single processor, serial machines. With the introduction of vector processors, array processors and interconnected systems of mainframes, minis and micros, however, various forms of parallelism have become available. The advantage of parallelism is that it offers increased overall processing speed but it also raises some fundamental questions, including: (i) which, if any, of the existing 'serial' algorithms can be adapted for use in the parallel mode. (ii) How close to optimal can such adapted algorithms be and, where relevant, what are the convergence criteria. (iii) How can we design new algorithms specifically for parallel systems. (iv) For multi-processor systems how can we handle the software aspects of the interprocessor communications. Aspects of these questions illustrated by examples are considered in these lectures. (orig.)
Parallelism and array processing
Zacharov, V.
1983-01-01
Modern computing, as well as the historical development of computing, has been dominated by sequential monoprocessing. Yet there is the alternative of parallelism, where several processes may be in concurrent execution. This alternative is discussed in a series of lectures, in which the main developments involving parallelism are considered, both from the standpoint of computing systems and that of applications that can exploit such systems. The lectures seek to discuss parallelism in a historical context, and to identify all the main aspects of concurrency in computation right up to the present time. Included will be consideration of the important question as to what use parallelism might be in the field of data processing. (orig.)
Parallel magnetic resonance imaging
Larkman, David J; Nunes, Rita G
2007-01-01
Parallel imaging has been the single biggest innovation in magnetic resonance imaging in the last decade. The use of multiple receiver coils to augment the time consuming Fourier encoding has reduced acquisition times significantly. This increase in speed comes at a time when other approaches to acquisition time reduction were reaching engineering and human limits. A brief summary of spatial encoding in MRI is followed by an introduction to the problem parallel imaging is designed to solve. There are a large number of parallel reconstruction algorithms; this article reviews a cross-section, SENSE, SMASH, g-SMASH and GRAPPA, selected to demonstrate the different approaches. Theoretical (the g-factor) and practical (coil design) limits to acquisition speed are reviewed. The practical implementation of parallel imaging is also discussed, in particular coil calibration. How to recognize potential failure modes and their associated artefacts are shown. Well-established applications including angiography, cardiac imaging and applications using echo planar imaging are reviewed and we discuss what makes a good application for parallel imaging. Finally, active research areas where parallel imaging is being used to improve data quality by repairing artefacted images are also reviewed. (invited topical review)
Simulation Exploration through Immersive Parallel Planes: Preprint
Brunhart-Lupo, Nicholas; Bush, Brian W.; Gruchalla, Kenny; Smith, Steve
2016-03-01
We present a visualization-driven simulation system that tightly couples systems dynamics simulations with an immersive virtual environment to allow analysts to rapidly develop and test hypotheses in a high-dimensional parameter space. To accomplish this, we generalize the two-dimensional parallel-coordinates statistical graphic as an immersive 'parallel-planes' visualization for multivariate time series emitted by simulations running in parallel with the visualization. In contrast to traditional parallel coordinates, which map the multivariate dimensions onto coordinate axes represented by a series of parallel lines, we map pairs of the multivariate dimensions onto a series of parallel rectangles. As in the case of parallel coordinates, each individual observation in the dataset is mapped to a polyline whose vertices coincide with its coordinate values. Regions of the rectangles can be 'brushed' to highlight and select observations of interest: a 'slider' control allows the user to filter the observations by their time coordinate. In an immersive virtual environment, users interact with the parallel planes using a joystick that can select regions on the planes, manipulate selection, and filter time. The brushing and selection actions are used both to explore existing data and to launch additional simulations corresponding to the visually selected portions of the input parameter space. As soon as the new simulations complete, their resulting observations are displayed in the virtual environment. This tight feedback loop between simulation and immersive analytics accelerates users' realization of insights about the simulation and its output.
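The mapping of observations to polylines described in the abstract above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function name, the min-max normalization to the unit square, and the (plane index, u, v) vertex layout are illustrative choices and are not taken from the authors' system.

```python
def parallel_planes_polylines(observations):
    """Map each observation to one vertex per plane.

    Dimensions are paired as (0, 1), (2, 3), ...; each pair is placed on its
    own rectangle and scaled to the unit square (an assumed normalization).
    """
    ndim = len(observations[0])
    if ndim % 2:
        raise ValueError("this sketch expects an even number of dimensions")
    lo = [min(obs[d] for obs in observations) for d in range(ndim)]
    hi = [max(obs[d] for obs in observations) for d in range(ndim)]

    def norm(value, d):
        span = hi[d] - lo[d]
        # A constant dimension is drawn at the middle of its axis.
        return 0.5 if span == 0 else (value - lo[d]) / span

    return [
        [(k, norm(obs[2 * k], 2 * k), norm(obs[2 * k + 1], 2 * k + 1))
         for k in range(ndim // 2)]
        for obs in observations
    ]
```

Each returned polyline has one vertex per plane, so brushing a region of plane k amounts to filtering polylines by their k-th vertex.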
Simulation Exploration through Immersive Parallel Planes
Brunhart-Lupo, Nicholas J [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Bush, Brian W [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Gruchalla, Kenny M [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Smith, Steve [Los Alamos Visualization Associates
2017-05-25
We present a visualization-driven simulation system that tightly couples systems dynamics simulations with an immersive virtual environment to allow analysts to rapidly develop and test hypotheses in a high-dimensional parameter space. To accomplish this, we generalize the two-dimensional parallel-coordinates statistical graphic as an immersive 'parallel-planes' visualization for multivariate time series emitted by simulations running in parallel with the visualization. In contrast to traditional parallel coordinates, which map the multivariate dimensions onto coordinate axes represented by a series of parallel lines, we map pairs of the multivariate dimensions onto a series of parallel rectangles. As in the case of parallel coordinates, each individual observation in the dataset is mapped to a polyline whose vertices coincide with its coordinate values. Regions of the rectangles can be 'brushed' to highlight and select observations of interest: a 'slider' control allows the user to filter the observations by their time coordinate. In an immersive virtual environment, users interact with the parallel planes using a joystick that can select regions on the planes, manipulate selection, and filter time. The brushing and selection actions are used both to explore existing data and to launch additional simulations corresponding to the visually selected portions of the input parameter space. As soon as the new simulations complete, their resulting observations are displayed in the virtual environment. This tight feedback loop between simulation and immersive analytics accelerates users' realization of insights about the simulation and its output.
The STAPL Parallel Graph Library
Harshvardhan,; Fidel, Adam; Amato, Nancy M.; Rauchwerger, Lawrence
2013-01-01
This paper describes the stapl Parallel Graph Library, a high-level framework that abstracts the user from data-distribution and parallelism details and allows them to concentrate on parallel graph algorithm development. It includes a customizable
Massively parallel multicanonical simulations
Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard
2018-03-01
Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with of the order of 10⁴ parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as starting point and reference for practitioners in the field.
A parallel nearly implicit time-stepping scheme
Botchev, Mike A.; van der Vorst, Henk A.
2001-01-01
Across-the-space parallelism still remains the most mature, convenient and natural way to parallelize large scale problems. One of the major problems here is that implicit time stepping is often difficult to parallelize due to the structure of the system. Approximate implicit schemes have been suggested to circumvent the problem. These schemes have attractive stability properties and they are also very well parallelizable. The purpose of this article is to give an overall assessment of the pa...
K.I.S.S. Parallel Coding (lecture 2)
CERN. Geneva
2018-01-01
K.I.S.S.ing parallel computing means, finally, loving it. Parallel computing will be approached in a theoretical and experimental way, using the most advanced and used C API: OpenMP. OpenMP is an open source project constantly developed and updated to hide the awful complexity of parallel coding in an awesome interface. The result is a tool which leaves plenty of space for clever solutions and terrific results in terms of efficiency and performance maximisation.
SPINning parallel systems software
Matlin, O.S.; Lusk, E.; McCune, W.
2002-01-01
We describe our experiences in using Spin to verify parts of the Multi Purpose Daemon (MPD) parallel process management system. MPD is a distributed collection of processes connected by Unix network sockets. MPD is dynamic: processes and connections among them are created and destroyed as MPD is initialized, runs user processes, recovers from faults, and terminates. This dynamic nature is easily expressible in the Spin/Promela framework but poses performance and scalability challenges. We present here the results of expressing some of the parallel algorithms of MPD and executing both simulation and verification runs with Spin.
Parallel programming with Python
Palach, Jan
2014-01-01
A fast, easy-to-follow and clear tutorial to help you develop Parallel computing systems using Python. Along with explaining the fundamentals, the book will also introduce you to slightly advanced concepts and will help you in implementing these techniques in the real world. If you are an experienced Python programmer and are willing to utilize the available computing resources by parallelizing applications in a simple way, then this book is for you. You are required to have a basic knowledge of Python development to get the most of this book.
PKA and Epac1 regulate endothelial integrity and migration through parallel and independent pathways
Lorenowicz, Magdalena J.; Fernandez-Borja, Mar; Kooistra, Matthijs R. H.; Bos, Johannes L.; Hordijk, Peter L.
2008-01-01
The vascular endothelium provides a semi-permeable barrier, which restricts the passage of fluid, macromolecules and cells to the surrounding tissues. Cyclic AMP promotes endothelial barrier function and protects the endothelium against pro-inflammatory mediators. This study analyzed the relative
Towards a streaming model for nested data parallelism
Madsen, Frederik Meisner; Filinski, Andrzej
2013-01-01
The language-integrated cost semantics for nested data parallelism pioneered by NESL provides an intuitive, high-level model for predicting performance and scalability of parallel algorithms with reasonable accuracy. However, this predictability, obtained through a uniform, parallelism-flattening ... -processable in a streaming fashion. This semantics is directly compatible with previously proposed piecewise execution models for nested data parallelism, but allows the expected space usage to be reasoned about directly at the source-language level. The language definition and implementation are still very much work...
A PARALLEL EXTENSION OF THE UAL ENVIRONMENT
MALITSKY, N.; SHISHLO, A.
2001-01-01
The deployment of the Unified Accelerator Library (UAL) environment on the parallel cluster is presented. The approach is based on the Message-Passing Interface (MPI) library and the Perl adapter that allows one to control and mix together the existing conventional UAL components with the new MPI-based parallel extensions. In the paper, we provide timing results and describe the application of the new environment to the SNS Ring complex beam dynamics studies, particularly, simulations of several physical effects, such as space charge, field errors, fringe fields, and others
Advances in randomized parallel computing
Rajasekaran, Sanguthevar
1999-01-01
The technique of randomization has been employed to solve numerous problems of computing both sequentially and in parallel. Examples of randomized algorithms that are asymptotically better than their deterministic counterparts in solving various fundamental problems abound. Randomized algorithms have the advantages of simplicity and better performance both in theory and often in practice. This book is a collection of articles written by renowned experts in the area of randomized parallel computing. A brief introduction to randomized algorithms: In the analysis of algorithms, at least three different measures of performance can be used: the best case, the worst case, and the average case. Often, the average case run time of an algorithm is much smaller than the worst case. For instance, the worst case run time of Hoare's quicksort is O(n²), whereas its average case run time is only O(n log n). The average case analysis is conducted with an assumption on the input space. The assumption made to arrive at t...
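The gap between the worst case and the average case of quicksort noted above can be observed with a small Python sketch. The code below is an illustrative list-based quicksort, not Hoare's in-place partition scheme, and it charges len(xs) - 1 comparisons per partition step as an assumed cost model; the helper names are hypothetical.

```python
import random


def quicksort_comparisons(items, pick_pivot):
    """Sort a copy of `items`; return (sorted_list, comparison_count).

    `pick_pivot(n)` returns the index of the pivot in a sublist of length n.
    """
    comparisons = 0

    def sort(xs):
        nonlocal comparisons
        if len(xs) <= 1:
            return xs
        pivot = xs[pick_pivot(len(xs))]
        comparisons += len(xs) - 1  # each other element is compared to the pivot
        less = [x for x in xs if x < pivot]
        equal = [x for x in xs if x == pivot]
        greater = [x for x in xs if x > pivot]
        return sort(less) + equal + sort(greater)

    return sort(list(items)), comparisons


data = list(range(200))  # already sorted: the bad case for a first-element pivot
_, worst = quicksort_comparisons(data, lambda n: 0)
rng = random.Random(0)
_, randomized = quicksort_comparisons(data, lambda n: rng.randrange(n))
```

With the first element as pivot, the sorted input costs n(n - 1)/2 comparisons, while a random pivot makes the expected cost O(n log n) for every input, which is the sense in which randomization removes the assumption on the input space.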
Expressing Parallelism with ROOT
Piparo, D. [CERN; Tejedor, E. [CERN; Guiraud, E. [CERN; Ganis, G. [CERN; Mato, P. [CERN; Moneta, L. [CERN; Valls Pla, X. [CERN; Canal, P. [Fermilab
2017-11-22
The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
Expressing Parallelism with ROOT
Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.
2017-10-01
The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
Parallel Fast Legendre Transform
Alves de Inda, M.; Bisseling, R.H.; Maslen, D.K.
1998-01-01
We discuss a parallel implementation of a fast algorithm for the discrete polynomial Legendre transform. We give an introduction to the Driscoll-Healy algorithm using polynomial arithmetic and present experimental results on the efficiency and accuracy of our implementation. The algorithms were
Practical parallel programming
Bauer, Barr E
2014-01-01
This is the book that will teach programmers to write faster, more efficient code for parallel processors. The reader is introduced to a vast array of procedures and paradigms on which actual coding may be based. Examples and real-life simulations using these devices are presented in C and FORTRAN.
Parallel hierarchical radiosity rendering
Carter, Michael [Iowa State Univ., Ames, IA (United States)
1993-07-01
In this dissertation, the step-by-step development of a scalable parallel hierarchical radiosity renderer is documented. First, a new look is taken at the traditional radiosity equation, and a new form is presented in which the matrix of linear system coefficients is transformed into a symmetric matrix, thereby simplifying the problem and enabling a new solution technique to be applied. Next, the state-of-the-art hierarchical radiosity methods are examined for their suitability to parallel implementation, and scalability. Significant enhancements are also discovered which both improve their theoretical foundations and improve the images they generate. The resultant hierarchical radiosity algorithm is then examined for sources of parallelism, and for an architectural mapping. Several architectural mappings are discussed. A few key algorithmic changes are suggested during the process of making the algorithm parallel. Next, the performance, efficiency, and scalability of the algorithm are analyzed. The dissertation closes with a discussion of several ideas which have the potential to further enhance the hierarchical radiosity method, or provide an entirely new forum for the application of hierarchical methods.
Parallel universes beguile science
2007-01-01
A staple of mind-bending science fiction, the possibility of multiple universes has long intrigued hard-nosed physicists, mathematicians and cosmologists too. We may not be able -- at least not yet -- to prove they exist, many serious scientists say, but there are plenty of reasons to think that parallel dimensions are more than figments of eggheaded imagination.
2017-04-04
A parallelization of the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms allow people to cluster multidimensional data, by attempting to minimize the mean distance of data points within a cluster. K-means++ improved upon traditional k-means by using a more intelligent approach to selecting the initial seeds for the clustering process. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique. We have developed original C++ code for parallelizing the algorithm on three unique hardware architectures: GPU using NVidia's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. By parallelizing the process for these platforms, we are able to perform k-means++ clustering much more quickly than it could be done before.
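For readers unfamiliar with the seeding step that the abstract above parallelizes, here is a minimal serial Python sketch of k-means++ seed selection. It is not the authors' C++ code; the function names are hypothetical, and the standard-library random module stands in for whatever sampling the original uses.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points, k, rng=None):
    """Pick k seeds: the first uniformly, the rest with probability
    proportional to the squared distance to the nearest seed so far."""
    rng = rng or random.Random()
    seeds = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest seed.
    d2 = [sq_dist(p, seeds[0]) for p in points]
    while len(seeds) < k:
        if sum(d2) == 0:
            # Every point coincides with a seed; fall back to a uniform pick.
            seeds.append(rng.choice(points))
        else:
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
            seeds.append(points[idx])
        # Independent per-point update: the natural data-parallel step.
        d2 = [min(d, sq_dist(p, seeds[-1])) for d, p in zip(d2, points)]
    return seeds
```

The distance update touches each point independently, and the weighted draw reduces to a prefix sum over d2, so both steps map naturally onto GPU or multicore primitives.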
Gardes, D.; Volkov, P.
1981-01-01
A 5x3 cm² (timing only) and a 15x5 cm² (timing and position) parallel plate avalanche counter (PPAC) are considered. The theory of operation and timing resolution is given. The measurement set-up and the curves of experimental results illustrate the possibilities of the two counters.
Parallel hierarchical global illumination
Snell, Quinn O. [Iowa State Univ., Ames, IA (United States)
1997-10-08
Solving the global illumination problem is equivalent to determining the intensity of every wavelength of light in all directions at every point in a given scene. The complexity of the problem has led researchers to use approximation methods for solving the problem on serial computers. Rather than using an approximation method, such as backward ray tracing or radiosity, the authors have chosen to solve the Rendering Equation by direct simulation of light transport from the light sources. This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed memory or shared memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recent published methods involving bidirectional ray tracing or hybrid treatments of diffuse and specular surfaces. Like progressive radiosity methods, it dynamically refines the geometry decomposition where required, but does so without the excessive storage requirements for ray histories. The algorithm, called Photon, produces a scene which converges to the global illumination solution. This amounts to a huge task for a 1997-vintage serial computer, but using the power of a parallel supercomputer significantly reduces the time required to generate a solution. Currently, Photon can be run on most parallel environments from a shared memory multiprocessor to a parallel supercomputer, as well as on clusters of heterogeneous workstations.
Parallel Nonlinear Optimization for Astrodynamic Navigation, Phase I
National Aeronautics and Space Administration — CU Aerospace proposes the development of a new parallel nonlinear program (NLP) solver software package. NLPs allow the solution of complex optimization problems,...
Visual Interfaces for Parallel Simulations (VIPS), Phase I
National Aeronautics and Space Administration — Configuring the 3D geometry and physics of large scale parallel physics simulations is increasingly complex. Given the investment in time and effort to run these...
Plyukhin, A.V.
2013-01-01
A model of an autonomous isothermal Brownian motor with an internal propulsion mechanism is considered. The motor is a Brownian particle which is semi-transparent for molecules of surrounding ideal gas. Molecular passage through the particle is controlled by a potential similar to that in the transition rate theory, i.e. characterized by two stationary states with a finite energy difference separated by a potential barrier. The internal potential drop maintains the diode-like asymmetry of molecular fluxes through the particle, which results in the particle's stationary drift.
Frade, P R; Reyes-Nivia, M C; Faria, J; Kaandorp, J A; Luttikhuizen, P C; Bak, R P M
2010-12-01
Introgressive hybridization is described in several phylogenetic studies of mass-spawning corals. However, the prevalence of this process among brooding coral species is unclear. We used a mitochondrial (mtDNA: nad5) and two nuclear (nDNA: ATPSα and SRP54) intron markers to explore species barriers in the coral genus Madracis and address the role of hybridization in brooding systems. Specimens of six Caribbean Madracis morphospecies were collected from 5 to 60 m depth at Buoy One, Curaçao, supplemented by samples from Aruba, Trinidad & Tobago and Bermuda. Polymerase chain reaction and denaturing gradient gel electrophoresis were coupled to detect distinct alleles within single colonies. The recurrent nDNA phylogenetic non-monophyly among taxa is only challenged by Madracis senaria, the single monophyletic species within the genus. nDNA AMOVAs indicated overall statistical divergence (0.1% significance level) among species but pairwise comparisons of genetic differentiation revealed some gene exchange between Madracis taxa. mtDNA sequences clustered in two main groups representing typical shallow and deep water Madracis species. Madracis pharensis shallow and deep colonies (with threshold at about 23-24 m) clustered in different mtDNA branches, together with their depth-sympatric congenerics. This divergence was repeated for the nDNA (ATPSα) suggestive of distinct M. pharensis depth populations. These matched the vertical distribution of the dinoflagellate symbionts hosted by M. pharensis, with Symbiodinium ITS2 type B7 in the shallows but type B15 in the deep habitats, suggesting symbiont-related disruptive selection. Recurrent non-monophyly of Madracis taxa and high levels of shared polymorphism reflected in ambiguous phylogenetic networks indicate that hybridization is likely to have played a role in the evolution of the genus. 
Using coalescent forward-in-time simulations, lineage sorting alone was rejected as an explanation to the SRP54 genetic variation contained in Madracis mirabilis and Madracis decactis (species with an old fossil record), showing that introgressive hybridization has taken place between these species, either directly or through the gene pool of other Madracis taxa. Madracis widespread non-monophyly and the absence of statistical divergence between some species suggest that introgressive hybridization plays an important role in the evolution of the genus. Different reproductive traits and symbiont signatures of taxa forming distinct genetic clusters also point to the same conclusion. We suggest that Madracis morphospecies remain recognizable because introgressive hybridization is non-pervasive and/or because disruptive selection is in action. Copyright © 2010 Elsevier Inc. All rights reserved.
Growth of single crystals from solutions using semi-permeable membranes
Varkey, A. J.; Okeke, C. E.
1983-05-01
A technique suitable for growth of single crystals from solutions using semi-permeable membranes is described. Using this technique single crystals of copper sulphate, potassium bromide and ammonium dihydrogen phosphate have been successfully grown. Advantages of this technique over other methods are discussed.
Bader, S.; Kooi, H.
2005-01-01
Theories of osmosis in groundwater flow are increasingly used to explain anomalies of salinity in clayey environments. However, predictive modelling through mathematical analysis can hardly be found in literature. In this paper, a model of chemical osmosis based on non-equilibrium thermodynamics, is
Littoral Hydrodynamics and Sediment Transport Around a Semi-Permeable Breakwater
2015-09-18
Australasian Coasts & Ports Conference 2015, 15-18 September 2015, Auckland, New Zealand. Li, H. et al., Littoral Hydrodynamics and Sediment Transport, 7 pp. ... The bathymetric and side
Parallel optoelectronic trinary signed-digit division
Alam, Mohammad S.
1999-03-01
The trinary signed-digit (TSD) number system has been found to be very useful for parallel addition and subtraction of any arbitrary length operands in constant time. Using the TSD addition and multiplication modules as the basic building blocks, we develop an efficient algorithm for performing parallel TSD division in constant time. The proposed division technique uses one TSD subtraction and two TSD multiplication steps. An optoelectronic correlator based architecture is suggested for implementation of the proposed TSD division algorithm, which fully exploits the parallelism and high processing speed of optics. An efficient spatial encoding scheme is used to ensure better utilization of space bandwidth product of the spatial light modulators used in the optoelectronic implementation.
Wald, Ingo; Ize, Santiago
2015-07-28
Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.
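The two acts described in the abstract above can be sketched as follows. This is a sequential Python simulation under stated assumptions: objects are axis-aligned boxes given in inclusive cell coordinates, grid portions are horizontal bands of rows, and objects are dealt to processors round-robin. None of these choices is taken from the patent itself.

```python
def populate_grid(objects, grid_w, grid_h, n):
    """Populate a grid_w x grid_h grid using n simulated processors.

    objects: list of (xmin, ymin, xmax, ymax) boxes in inclusive cell units.
    Returns a dict mapping (x, y) cells to lists of object indices.
    """
    # Grid portion p covers the rows bounds[p] .. bounds[p + 1] - 1.
    bounds = [p * grid_h // n for p in range(n + 1)]

    # Act 1: each processor takes a distinct set of objects and records
    # which grid portions each of its objects overlaps.
    per_proc = []
    for proc in range(n):
        found = {p: [] for p in range(n)}
        for idx in range(proc, len(objects), n):
            _, ymin, _, ymax = objects[idx]
            for p in range(n):
                if ymin < bounds[p + 1] and ymax >= bounds[p]:
                    found[p].append(idx)
        per_proc.append(found)

    # Act 2: each processor owns one grid portion and writes only to cells
    # inside it, so no two processors touch the same cell.
    grid = {}
    for p in range(n):
        candidates = sorted(i for found in per_proc for i in found[p])
        for idx in candidates:
            xmin, ymin, xmax, ymax = objects[idx]
            for y in range(max(ymin, bounds[p]), min(ymax, bounds[p + 1] - 1) + 1):
                for x in range(max(xmin, 0), min(xmax, grid_w - 1) + 1):
                    grid.setdefault((x, y), []).append(idx)
    return grid
```

Splitting the work by objects first and by grid portions second means the second act needs no locking, since each cell has exactly one writer.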
Ultrascalable petaflop parallel supercomputer
Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Chiu, George [Cross River, NY; Cipolla, Thomas M [Katonah, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Hall, Shawn [Pleasantville, NY; Haring, Rudolf A [Cortlandt Manor, NY; Heidelberger, Philip [Cortlandt Manor, NY; Kopcsay, Gerard V [Yorktown Heights, NY; Ohmacht, Martin [Yorktown Heights, NY; Salapura, Valentina [Chappaqua, NY; Sugavanam, Krishnan [Mahopac, NY; Takken, Todd [Brewster, NY
2010-07-20
A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.
Gregersen, Frans; Josephson, Olle; Kristoffersen, Gjert
More parallel, please is the result of the work of an Inter-Nordic group of experts on language policy financed by the Nordic Council of Ministers 2014-17. The book presents all that is needed to plan, practice and revise a university language policy which takes as its point of departure that English may be used in parallel with the various local, in this case Nordic, languages. As such, the book integrates the challenge of internationalization faced by any university with the wish to improve quality in research, education and administration based on the local language(s). There are three layers in the text: First, you may read the extremely brief version of the in total 11 recommendations for best practice. Second, you may acquaint yourself with the extended version of the recommendations and finally, you may study the reasoning behind each of them. At the end of the text, we give...
PARALLEL MOVING MECHANICAL SYSTEMS
Florian Ion Tiberius Petrescu
2014-09-01
Full Text Available Parallel moving mechanical systems are solid, fast, and accurate. Among parallel systems, Stewart platforms stand out as the oldest: fast, solid, and precise. The work outlines a few main elements of Stewart platforms. It begins with the geometry of the platform and its kinematic elements, and then presents a few items of dynamics. The primary dynamic element is the determination of the kinetic energy of the entire Stewart platform mechanism. The kinematics of the mobile platform is then written using a rotation matrix method. If a structural motor element (motoelement) consists of two moving elements that translate relative to each other, it is more convenient, for the drive train and especially for the dynamics, to represent the motor element as a single moving component. There are thus seven moving parts (the six motor elements, or feet, plus the mobile platform as the seventh) and one fixed part.
Xyce parallel electronic simulator.
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.
2010-05-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is to list, as exhaustively as possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.
Betchov, R
2012-01-01
Stability of Parallel Flows provides information pertinent to hydrodynamical stability. This book explores the stability problems that occur in various fields, including electronics, mechanics, oceanography, administration, economics, as well as naval and aeronautical engineering. Organized into two parts encompassing 10 chapters, this book starts with an overview of the general equations of a two-dimensional incompressible flow. This text then explores the stability of a laminar boundary layer and presents the equation of the inviscid approximation. Other chapters present the general equation
Algorithmically specialized parallel computers
Snyder, Lawrence; Gannon, Dennis B
1985-01-01
Algorithmically Specialized Parallel Computers focuses on the concept and characteristics of an algorithmically specialized computer. This book discusses the algorithmically specialized computers, algorithmic specialization using VLSI, and innovative architectures. The architectures and algorithms for digital signal, speech, and image processing and specialized architectures for numerical computations are also elaborated. Other topics include the model for analyzing generalized inter-processor, pipelined architecture for search tree maintenance, and specialized computer organization for raster
Resistor Combinations for Parallel Circuits.
McTernan, James P.
1978-01-01
To help simplify both teaching and learning of parallel circuits, a high school electricity/electronics teacher presents and illustrates the use of tables of values for parallel resistive circuits in which total resistances are whole numbers. (MF)
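As an illustration of the kind of table this record describes, the short sketch below lists pairs of whole-number resistors whose parallel combination is also a whole number. It uses the standard relation 1/R = 1/R1 + 1/R2 and exact fractions; the function names and the 6-ohm limit are assumptions made for the example and are not taken from the article.

```python
from fractions import Fraction


def parallel_resistance(*resistors):
    """Total resistance of resistors in parallel: 1/R = sum(1/R_i)."""
    return 1 / sum(Fraction(1, r) for r in resistors)


def whole_number_pairs(max_ohms):
    """Pairs (r1, r2) with r1 <= r2 <= max_ohms whose parallel total is an integer."""
    table = []
    for r1 in range(1, max_ohms + 1):
        for r2 in range(r1, max_ohms + 1):
            total = parallel_resistance(r1, r2)
            if total.denominator == 1:  # exact whole number of ohms
                table.append((r1, r2, int(total)))
    return table


pairs = whole_number_pairs(6)
for r1, r2, total in pairs:
    print(f"{r1} ohm || {r2} ohm = {total} ohm")
```

Working with `Fraction` avoids floating-point rounding, so a pair is listed only when the total is exactly an integer.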
SOFTWARE FOR DESIGNING PARALLEL APPLICATIONS
M. K. Bouza
2017-01-01
Full Text Available The object of research is the tools that support the development of parallel programs in C/C++. Methods and software that automate the process of designing parallel applications are proposed.
Parallel External Memory Graph Algorithms
Arge, Lars Allan; Goodrich, Michael T.; Sitchinava, Nodari
2010-01-01
In this paper, we study parallel I/O efficient graph algorithms in the Parallel External Memory (PEM) model, one of the private-cache chip multiprocessor (CMP) models. We study the fundamental problem of list ranking which leads to efficient solutions to problems on trees, such as computing lowest...... an optimal speedup of Θ(P) in parallel I/O complexity and parallel computation time, compared to the single-processor external memory counterparts....
Parallel inter channel interaction mechanisms
Jovic, V.; Afgan, N.; Jovic, L.
1995-01-01
Parallel channel interactions are examined. For experimental investigations of nonstationary flow regimes in three parallel vertical channels, results of the phenomenon analysis and the mechanisms of parallel channel interaction are presented for adiabatic conditions of single-phase fluid and two-phase mixture flow. (author)
A Parallel Butterfly Algorithm
Poulson, Jack; Demanet, Laurent; Maxwell, Nicholas; Ying, Lexing
2014-01-01
The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform (Equation Presented.) at large numbers of target points when the kernel, K(x, y), is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In d dimensions with O(N^d) quasi-uniformly distributed source and target points, when each appropriate submatrix of K is approximately rank-r, the running time of the algorithm is at most O(r^2 N^d log N). A parallelization of the butterfly algorithm is introduced which, assuming a message latency of α and per-process inverse bandwidth of β, executes in at most (Equation Presented.) time using p processes. This parallel algorithm was then instantiated in the form of the open-source DistButterfly library for the special case where K(x, y) = exp(iΦ(x, y)), where Φ(x, y) is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms and an analogue of a three-dimensional generalized Radon transform were observed to strong-scale from 1-node/16-cores up to 1024-nodes/16,384-cores with greater than 90% and 82% efficiency, respectively. © 2014 Society for Industrial and Applied Mathematics.
A Parallel Butterfly Algorithm
Poulson, Jack
2014-02-04
The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform (Equation Presented.) at large numbers of target points when the kernel, K(x, y), is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In d dimensions with O(N^d) quasi-uniformly distributed source and target points, when each appropriate submatrix of K is approximately rank-r, the running time of the algorithm is at most O(r^2 N^d log N). A parallelization of the butterfly algorithm is introduced which, assuming a message latency of α and per-process inverse bandwidth of β, executes in at most (Equation Presented.) time using p processes. This parallel algorithm was then instantiated in the form of the open-source DistButterfly library for the special case where K(x, y) = exp(iΦ(x, y)), where Φ(x, y) is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms and an analogue of a three-dimensional generalized Radon transform were observed to strong-scale from 1-node/16-cores up to 1024-nodes/16,384-cores with greater than 90% and 82% efficiency, respectively. © 2014 Society for Industrial and Applied Mathematics.
Fast parallel event reconstruction
CERN. Geneva
2010-01-01
On-line processing of large data volumes produced in modern HEP experiments requires using the maximum capabilities of modern and future many-core CPU and GPU architectures. One such powerful feature is the SIMD instruction set, which allows packing several data items into one register and operating on all of them at once, thus achieving more operations per clock cycle. Motivated by the idea of using the SIMD unit of modern processors, the KF based track fit has been adapted for parallelism, including memory optimization, numerical analysis, vectorization with inline operator overloading, and optimization using SDKs. The speed of the algorithm has been increased by a factor of 120000, to 0.1 ms/track, running in parallel on 16 SPEs of a Cell Blade computer. Running on a Nehalem CPU with 8 cores, it shows a processing speed of 52 ns/track using the Intel Threading Building Blocks. The same KF algorithm running on an Nvidia GTX 280 in the CUDA framework provi...
DeHart, Mark D.; Williams, Mark L.; Bowman, Stephen M.
2010-01-01
The SCALE computational architecture has remained basically the same since its inception 30 years ago, although constituent modules and capabilities have changed significantly. This SCALE concept was intended to provide a framework whereby independent codes can be linked to provide a more comprehensive capability than possible with the individual programs - allowing flexibility to address a wide variety of applications. However, the current system was designed originally for mainframe computers with a single CPU and with significantly less memory than today's personal computers. It has been recognized that the present SCALE computation system could be restructured to take advantage of modern hardware and software capabilities, while retaining many of the modular features of the present system. Preliminary work is being done to define specifications and capabilities for a more advanced computational architecture. This paper describes the state of current SCALE development activities and plans for future development. With the release of SCALE 6.1 in 2010, a new phase of evolutionary development will be available to SCALE users within the TRITON and NEWT modules. The SCALE (Standardized Computer Analyses for Licensing Evaluation) code system developed by Oak Ridge National Laboratory (ORNL) provides a comprehensive and integrated package of codes and nuclear data for a wide range of applications in criticality safety, reactor physics, shielding, isotopic depletion and decay, and sensitivity/uncertainty (S/U) analysis. Over the last three years, since the release of version 5.1 in 2006, several important new codes have been introduced within SCALE, and significant advances applied to existing codes. Many of these new features became available with the release of SCALE 6.0 in early 2009. However, beginning with SCALE 6.1, a first generation of parallel computing is being introduced. In addition to near-term improvements, a plan for longer term SCALE enhancement
Parallel imaging microfluidic cytometer.
Ehrlich, Daniel J; McKenna, Brian K; Evans, James G; Belkina, Anna C; Denis, Gerald V; Sherr, David H; Cheung, Man Ching
2011-01-01
By adding an additional degree of freedom from multichannel flow, the parallel microfluidic cytometer (PMC) combines some of the best features of fluorescence-activated flow cytometry (FCM) and microscope-based high-content screening (HCS). The PMC (i) lends itself to fast processing of large numbers of samples, (ii) adds a 1D imaging capability for intracellular localization assays (HCS), (iii) has a high rare-cell sensitivity, and (iv) has an unusual capability for time-synchronized sampling. An inability to practically handle large sample numbers has restricted applications of conventional flow cytometers and microscopes in combinatorial cell assays, network biology, and drug discovery. The PMC promises to relieve a bottleneck in these previously constrained applications. The PMC may also be a powerful tool for finding rare primary cells in the clinic. The multichannel architecture of current PMC prototypes allows 384 unique samples for a cell-based screen to be read out in ∼6-10 min, about 30 times the speed of most current FCM systems. In 1D intracellular imaging, the PMC can obtain protein localization using HCS marker strategies at many times the sample throughput of charge-coupled device (CCD)-based microscopes or CCD-based single-channel flow cytometers. The PMC also permits the signal integration time to be varied over a larger range than is practical in conventional flow cytometers. The signal-to-noise advantages are useful, for example, in counting rare positive cells in the most difficult early stages of genome-wide screening. We review the status of parallel microfluidic cytometry and discuss some of the directions the new technology may take. Copyright © 2011 Elsevier Inc. All rights reserved.
Design strategies for irregularly adapting parallel applications
Oliker, Leonid; Biswas, Rupak; Shan, Hongzhang; Singh, Jaswinder Pal
2000-01-01
Achieving scalable performance for dynamic irregular applications is eminently challenging. Traditional message-passing approaches have been making steady progress towards this goal; however, they suffer from complex implementation requirements. The use of a global address space greatly simplifies the programming task, but can degrade the performance of dynamically adapting computations. In this work, we examine two major classes of adaptive applications, under five competing programming methodologies and four leading parallel architectures. Results indicate that it is possible to achieve message-passing performance using shared-memory programming techniques by carefully following the same high level strategies. Adaptive applications have computational work loads and communication patterns which change unpredictably at runtime, requiring dynamic load balancing to achieve scalable performance on parallel machines. Efficient parallel implementations of such adaptive applications are therefore a challenging task. This work examines the implementation of two typical adaptive applications, Dynamic Remeshing and N-Body, across various programming paradigms and architectural platforms. We compare several critical factors of the parallel code development, including performance, programmability, scalability, algorithmic development, and portability
About Parallel Programming: Paradigms, Parallel Execution and Collaborative Systems
Loredana MOCEAN
2009-01-01
Full Text Available In recent years, efforts have been made to delineate a stable and unified framework in which the problems of logical parallel processing can find solutions, at least at the level of imperative languages. The results obtained so far do not match the effort invested. This paper aims to make a small contribution to these efforts. We present an overview of parallel programming, parallel execution, and collaborative systems.
Parallel Monte Carlo Search for Hough Transform
Lopes, Raul H. C.; Franqueira, Virginia N. L.; Reid, Ivan D.; Hobson, Peter R.
2017-10-01
We investigate the problem of line detection in digital image processing and, in particular, how state-of-the-art algorithms behave in the presence of noise and whether CPU efficiency can be improved by the combination of a Monte Carlo Tree Search, hierarchical space decomposition, and parallel computing. The starting point of the investigation is the method introduced in 1962 by Paul Hough for detecting lines in binary images. Extended in the 1970s to the detection of space forms, what came to be known as the Hough Transform (HT) has been proposed, for example, in the context of track fitting in the LHC ATLAS and CMS projects. The Hough Transform transfers the problem of line detection, for example, into one of optimization of the peak in a vote counting process for cells which contain the possible points of candidate lines. The detection algorithm can be computationally expensive both in the demands made upon the processor and on memory. Additionally, it can have a reduced effectiveness in detection in the presence of noise. Our first contribution consists in an evaluation of the use of a variation of the Radon Transform as a form of improving the effectiveness of line detection in the presence of noise. Then, parallel algorithms for variations of the Hough Transform and the Radon Transform for line detection are introduced. An algorithm for Parallel Monte Carlo Search applied to line detection is also introduced. Their algorithmic complexities are discussed. Finally, implementations on multi-GPU and multicore architectures are discussed.
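The vote counting process mentioned in this abstract can be illustrated with the classic serial Hough Transform in the normal form rho = x cos(theta) + y sin(theta). The sketch below is only the textbook baseline, not the Monte Carlo, Radon, or GPU variants the paper introduces; the function name `hough_votes`, the one-degree angular resolution, and the rounding of rho to whole units are assumptions chosen for the example.

```python
import math
from collections import Counter


def hough_votes(points, n_theta=180):
    """Accumulate votes in (theta index, rounded rho) cells for each point."""
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta  # with n_theta=180, t is in degrees
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(t, round(rho))] += 1
    return votes


# One hundred points on the vertical line x = 5.
points = [(5, y) for y in range(100)]
votes = hough_votes(points)

# The peak of the accumulator identifies the most strongly supported line.
(theta_deg, rho), count = votes.most_common(1)[0]
print(theta_deg, rho, count)
```

Every point votes once per angle, so the cost grows with the number of points times the angular resolution, which is the processor and memory demand the abstract refers to.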
A Parallel Saturation Algorithm on Shared Memory Architectures
Ezekiel, Jonathan; Siminiceanu
2007-01-01
Symbolic state-space generators are notoriously hard to parallelize. However, the Saturation algorithm implemented in the SMART verification tool differs from other sequential symbolic state-space generators in that it exploits the locality of firing events in asynchronous system models. This paper explores whether event locality can be utilized to efficiently parallelize Saturation on shared-memory architectures. Conceptually, we propose to parallelize the firing of events within a decision diagram node, which is technically realized via a thread pool. We discuss the challenges involved in our parallel design and conduct experimental studies on its prototypical implementation. On a dual-processor dual core PC, our studies show speed-ups for several example models, e.g., of up to 50% for a Kanban model, when compared to running our algorithm only on a single core.
Parallel Framework for Cooperative Processes
Mitică Craus
2005-01-01
Full Text Available This paper describes an object-oriented framework designed for the parallelization of a set of related algorithms. The idea behind the system is to have a reusable framework for running several sequential algorithms in a parallel environment. The algorithms that the framework can be used with have several things in common: they have to run in cycles, and it should be possible to split the work between several "processing units". The parallel framework uses the message-passing communication paradigm and is organized as a master-slave system. Two applications are presented: an Ant Colony Optimization (ACO) parallel algorithm for the Travelling Salesman Problem (TSP) and an Image Processing (IP) parallel algorithm for the Symmetrical Neighborhood Filter (SNF). The implementations of these applications by means of the parallel framework show good performance: approximately linear speedup and low communication cost.
Parallel Monte Carlo reactor neutronics
Blomquist, R.N.; Brown, F.B.
1994-01-01
The issues affecting implementation of parallel algorithms for large-scale engineering Monte Carlo neutron transport simulations are discussed. For nuclear reactor calculations, these include load balancing, recoding effort, reproducibility, domain decomposition techniques, I/O minimization, and strategies for different parallel architectures. Two codes were parallelized and tested for performance. The architectures employed include SIMD, MIMD-distributed memory, and workstation network with uneven interactive load. Speedups linear with the number of nodes were achieved
Circuit and bond polytopes on series–parallel graphs
Borne, Sylvie; Fouilhoux, Pierre; Grappe, Roland; Lacroix, Mathieu; Pesneau, Pierre
2015-01-01
In this paper, we describe the circuit polytope on series–parallel graphs. We first show the existence of a compact extended formulation. Though not being explicit, its construction process helps us to inductively provide the description in the original space. As a consequence, using the link between bonds and circuits in planar graphs, we also describe the bond polytope on series–parallel graphs.
Large amplitude parallel propagating electromagnetic oscillitons
Cattaert, Tom; Verheest, Frank
2005-01-01
Earlier systematic nonlinear treatments of parallel propagating electromagnetic waves have been given within a fluid dynamic approach, in a frame where the nonlinear structures are stationary and various constraining first integrals can be obtained. This has led to the concept of oscillitons that has found application in various space plasmas. The present paper differs in three main aspects from the previous studies: first, the invariants are derived in the plasma frame, as customary in the Sagdeev method, thus retaining in Maxwell's equations all possible effects. Second, a single differential equation is obtained for the parallel fluid velocity, in a form reminiscent of the Sagdeev integrals, hence allowing a fully nonlinear discussion of the oscilliton properties, at such amplitudes as the underlying Mach number restrictions allow. Third, the transition to weakly nonlinear whistler oscillitons is done in an analytical rather than a numerical fashion.
Computation and parallel implementation for early vision
Gualtieri, J. Anthony
1990-01-01
The problem of early vision is to transform one or more retinal illuminance images (pixel arrays) into image representations built out of primitive visual features such as edges, regions, disparities, and clusters. These transformed representations form the input to later vision stages that perform higher level vision tasks including matching and recognition. Researchers developed algorithms for: (1) edge finding in the scale space formulation; (2) correlation methods for computing matches between pairs of images; and (3) clustering of data by neural networks. These algorithms are formulated for parallel implementation on SIMD machines, such as the Massively Parallel Processor, a 128 x 128 array processor with 1024 bits of local memory per processor. For some cases, researchers can show speedups of three orders of magnitude over serial implementations.
Kosbar, Tamer R.; Sofan, Mamdouh A.; Waly, Mohamed A.
2015-01-01
The phosphoramidites of DNA monomers of 7-(3-aminopropyn-1-yl)-8-aza-7-deazaadenine (Y) and 7-(3-aminopropyn-1-yl)-8-aza-7-deazaadenine LNA (Z) are synthesized, and the thermal stability at pH 7.2 and 8.2 of anti-parallel triplexes modified with these two monomers is determined. When, the anti...... about 6.1 °C when the TFO strand was modified with Z and the Watson-Crick strand with adenine-LNA (AL). The molecular modeling results showed that, in the case of nucleobases Y and Z, a hydrogen bond (1.69 and 1.72 Å, respectively) was formed between the protonated 3-aminopropyn-1-yl chain and one of the phosphate groups in the Watson-Crick strand. Also, it was shown that the nucleobase Y made a good stacking and binding with the other nucleobases in the TFO and Watson-Crick duplex, respectively. In contrast, the nucleobase Z with LNA moiety was forced to twist out of plane of the Watson-Crick base pair which......
Parallel consensual neural networks.
Benediktsson, J A; Sveinsson, J R; Ersoy, O K; Swain, P H
1997-01-01
A new type of a neural-network architecture, the parallel consensual neural network (PCNN), is introduced and applied in classification/data fusion of multisource remote sensing and geographic data. The PCNN architecture is based on statistical consensus theory and involves using stage neural networks with transformed input data. The input data are transformed several times and the different transformed data are used as if they were independent inputs. The independent inputs are first classified using the stage neural networks. The output responses from the stage networks are then weighted and combined to make a consensual decision. In this paper, optimization methods are used in order to weight the outputs from the stage networks. Two approaches are proposed to compute the data transforms for the PCNN, one for binary data and another for analog data. The analog approach uses wavelet packets. The experimental results obtained with the proposed approach show that the PCNN outperforms both a conjugate-gradient backpropagation neural network and conventional statistical methods in terms of overall classification accuracy of test data.
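The final combination step described here, where stage-network outputs are weighted and merged into one consensual decision, might look like the sketch below. It assumes each stage network has already produced per-class scores and uses fixed, hand-picked weights; the paper instead obtains the weights by optimization, and the function name and numbers are purely illustrative.

```python
def consensual_decision(stage_outputs, weights):
    """Weighted sum of per-class scores from several stage networks.

    stage_outputs: one list of class scores per stage network.
    weights: one (assumed, pre-computed) weight per stage network.
    Returns the winning class index and the combined scores.
    """
    n_classes = len(stage_outputs[0])
    combined = [
        sum(w * scores[c] for w, scores in zip(weights, stage_outputs))
        for c in range(n_classes)
    ]
    return combined.index(max(combined)), combined


# Three hypothetical stage networks scoring two classes.
stage_outputs = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
weights = [0.5, 0.25, 0.25]
winner, combined = consensual_decision(stage_outputs, weights)
print(winner, combined)
```

Although the most heavily weighted stage prefers class 0, the other two stages together outweigh it, so the consensus picks class 1.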
A Parallel Particle Swarm Optimizer
Schutte, J. F; Fregly, B. J; Haftka, R. T; George, A. D
2003-01-01
.... Motivated by a computationally demanding biomechanical system identification problem, we introduce a parallel implementation of a stochastic population based global optimizer, the Particle Swarm...
Patterns for Parallel Software Design
Ortega-Arjona, Jorge Luis
2010-01-01
Essential reading to understand patterns for parallel programming. Software patterns have revolutionized the way we think about how software is designed, built, and documented, and the design of parallel software requires you to consider other particular design aspects and special skills. From clusters to supercomputers, success heavily depends on the design skills of software developers. Patterns for Parallel Software Design presents a pattern-oriented software architecture approach to parallel software design. This approach is not a design method in the classic sense, but a new way of managin
Christensen, Mark Schram; Ehrsson, H Henrik; Nielsen, Jens Bo
2013-01-01
...... adduction-abduction movements symmetrically or in parallel with real-time congruent or incongruent visual feedback of the movements. One network, consisting of bilateral superior and middle frontal gyrus and supplementary motor area (SMA), was more active when subjects performed parallel movements, whereas a different network, involving bilateral dorsal premotor cortex (PMd), primary motor cortex, and SMA, was more active when subjects viewed parallel movements while performing either symmetrical or parallel movements. Correlations between behavioral instability and brain activity were present in right lateral......
Parallel processing of two-dimensional Sn transport calculations
Uematsu, M.
1997-01-01
A parallel processing method for the two-dimensional Sn transport code DOT3.5 has been developed to achieve a drastic reduction in computation time. In the proposed method, parallelization is achieved with angular domain decomposition and/or space domain decomposition. The calculational speed of parallel processing by angular domain decomposition is largely influenced by frequent communications between processing elements. To assess parallelization efficiency, sample problems with up to 32 x 32 spatial meshes were solved with a Sun workstation using the PVM message-passing library. As a result, parallel calculation using 16 processing elements, for example, was found to be nine times as fast as that with one processing element. As for parallel processing by geometry segmentation, the influence of processing element communications on computation time is small; however, discontinuity at the segment boundary degrades convergence speed. To accelerate the convergence, an alternate sweep of angular flux in conjunction with space domain decomposition and a two-step rescaling method consisting of segmentwise rescaling and ordinary pointwise rescaling have been developed. By applying the developed method, the number of iterations needed to obtain a converged flux solution was reduced by a factor of 2. As a result, parallel calculation using 16 processing elements was found to be 5.98 times as fast as the original DOT3.5 calculation.
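The two speedups quoted in this abstract can be turned into parallel efficiencies with a few lines of arithmetic. The sketch below uses the standard definitions (efficiency = speedup / processors, and the Karp-Flatt experimentally determined serial fraction); neither metric is reported in the abstract itself, and the function names are chosen for the example.

```python
def parallel_efficiency(speedup, n_processors):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / n_processors


def karp_flatt(speedup, n_processors):
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    return (1 / speedup - 1 / n_processors) / (1 - 1 / n_processors)


# Speedups reported in the abstract for 16 processing elements.
angular = parallel_efficiency(9.0, 16)    # angular domain decomposition
spatial = parallel_efficiency(5.98, 16)   # space domain decomposition vs. original DOT3.5
serial_fraction = karp_flatt(9.0, 16)

print(f"angular efficiency: {angular:.1%}")
print(f"spatial efficiency: {spatial:.1%}")
print(f"Karp-Flatt serial fraction (angular): {serial_fraction:.3f}")
```

Note that the 5.98 figure is measured against the original DOT3.5 run rather than a one-element run of the parallel code, so the two efficiencies are not strictly comparable.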
Kinematics analysis and simulation of a new underactuated parallel robot
Wenxu YAN
2017-04-01
Full Text Available In a traditional robot, the number of degrees of freedom equals the number of driving motors, which causes drawbacks such as low efficiency. To overcome this problem, a new underactuated parallel robot based on the traditional parallel robot is presented. The structural characteristics and working principles of the underactuated parallel robot are analyzed. The forward and inverse solutions are derived by means of spatial analytic geometry and vector algebra. The kinematics model is established, and MATLAB is employed to verify the accuracy of the forward and inverse solutions and to identify the optimal workspace. The simulation results show that the robot can switch between three and four degrees of freedom with only three driving motors, improving the efficiency of robot grasping. It features a large workspace, high-speed operation, high positioning accuracy, and low manufacturing cost, and it should have a wide range of industrial applications.
Parallel 3-D method of characteristics in MPACT
Kochunas, B.; Downar, T. J.; Liu, Z.
2013-01-01
A new parallel 3-D MOC kernel has been developed and implemented in MPACT which makes use of the modular ray tracing technique to reduce computational requirements and to facilitate parallel decomposition. The parallel model makes use of both distributed and shared memory parallelism which are implemented with the MPI and OpenMP standards, respectively. The kernel is capable of parallel decomposition of problems in space, angle, and by characteristic rays up to O(10^4) processors. Initial verification of the parallel 3-D MOC kernel was performed using the Takeda 3-D transport benchmark problems. The eigenvalues computed by MPACT are within the statistical uncertainty of the benchmark reference and agree well with the averages of other participants. The MPACT k_eff differs from the benchmark results for rodded and un-rodded cases by 11 and -40 pcm, respectively. The calculations were performed for various numbers of processors and parallel decompositions up to 15625 processors, all producing the same result at convergence. The parallel efficiency of the worst case was 60%, while very good efficiency (>95%) was observed for cases using 500 processors. The overall run time for the 500 processor case was 231 seconds and 19 seconds for the case with 15625 processors. Ongoing work is focused on developing theoretical performance models and the implementation of acceleration techniques to minimize the number of iterations to converge. (authors)
PARALLEL IMPORT: REALITY FOR RUSSIA
Т. А. Сухопарова
2014-01-01
Full Text Available Parallel import is a pressing issue at present. Legalizing parallel import in Russia is expedient; this conclusion is based on an analysis of opposing expert opinions. At the same time, it is necessary to consider the negative consequences of this decision and to apply measures to minimize them.
The Galley Parallel File System
Nieuwejaar, Nils; Kotz, David
1996-01-01
Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/O requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.
Parallelization of the FLAPW method
Canning, A.; Mannstadt, W.; Freeman, A.J.
1999-01-01
The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about one hundred atoms due to a lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel computer
Parallelization of the FLAPW method
Canning, A.; Mannstadt, W.; Freeman, A. J.
2000-08-01
The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining structural, electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work, we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel supercomputer.
Compressing Data Cube in Parallel OLAP Systems
Frank Dehne
2007-03-01
This paper proposes an efficient algorithm to compress the cubes in the course of parallel data cube generation. This low-overhead compression mechanism provides block-by-block and record-by-record compression by using tuple difference coding techniques, thereby maximizing the compression ratio and minimizing the decompression penalty at run-time. The experimental results demonstrate that the typical compression ratio is about 30:1 without sacrificing running time. This paper also demonstrates that the compression method is suitable for the Hilbert Space Filling Curve, a mechanism widely used in multi-dimensional indexing.
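The idea behind tuple difference coding can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: each tuple is mapped to a single ordinal with a mixed-radix numbering, the ordinals are sorted, and only the gaps between consecutive ordinals are stored. The dimension sizes `dims` and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def to_ordinal(tup: Sequence[int], dims: Sequence[int]) -> int:
    """Map a tuple to one integer using mixed-radix (row-major) numbering."""
    ordinal = 0
    for value, size in zip(tup, dims):
        assert 0 <= value < size, "value outside the assumed dimension size"
        ordinal = ordinal * size + value
    return ordinal


def from_ordinal(ordinal: int, dims: Sequence[int]) -> Tuple[int, ...]:
    """Invert to_ordinal for the same (assumed) dimension sizes."""
    values = []
    for size in reversed(dims):
        values.append(ordinal % size)
        ordinal //= size
    return tuple(reversed(values))


def encode(tuples: Sequence[Sequence[int]], dims: Sequence[int]) -> List[int]:
    """Store the first ordinal, then only the differences between neighbours."""
    ordinals = sorted(to_ordinal(t, dims) for t in tuples)
    if not ordinals:
        return []
    return [ordinals[0]] + [b - a for a, b in zip(ordinals, ordinals[1:])]


def decode(deltas: Sequence[int], dims: Sequence[int]) -> List[Tuple[int, ...]]:
    """Rebuild the sorted tuples by accumulating the stored differences."""
    out, running = [], 0
    for delta in deltas:
        running += delta
        out.append(from_ordinal(running, dims))
    return out
```

Because neighbouring tuples in sorted order tend to have nearby ordinals, the stored differences are usually small numbers, which is what makes them cheap to store; a real implementation would additionally pack them into a variable-length byte format.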
Automatic parallelization of while-Loops using speculative execution
Collard, J.F.
1995-01-01
Automatic parallelization of imperative sequential programs has focused on nests of for-loops. The most recent techniques consist of finding an affine mapping with respect to the loop indices to simultaneously capture the temporal and spatial properties of the parallelized program. Such a mapping is usually called a "space-time transformation." This work describes an extension of these techniques to while-loops using speculative execution. We show that space-time transformations are a good framework for summing up previous restructuring techniques for while-loops, such as pipelining. Moreover, we show that these transformations can be derived and applied automatically.
Is Monte Carlo embarrassingly parallel?
Hoogenboom, J. E. [Delft Univ. of Technology, Mekelweg 15, 2629 JB Delft (Netherlands); Delft Nuclear Consultancy, IJsselzoom 2, 2902 LB Capelle aan den IJssel (Netherlands)
2012-07-01
Monte Carlo is often stated as being embarrassingly parallel. However, running a Monte Carlo calculation, especially a reactor criticality calculation, in parallel using tens of processors shows a serious limitation in speedup and the execution time may even increase beyond a certain number of processors. In this paper the main causes of the loss of efficiency when using many processors are analyzed using a simple Monte Carlo program for criticality. The basic mechanism for parallel execution is MPI. One of the bottlenecks turns out to be the rendez-vous points in the parallel calculation used for synchronization and exchange of data between processors. This happens at least at the end of each cycle for fission source generation in order to collect the full fission source distribution for the next cycle and to estimate the effective multiplication factor, which is not only part of the requested results, but also input to the next cycle for population control. Basic improvements to overcome this limitation are suggested and tested. Other time losses in the parallel calculation are also identified. Moreover, the threading mechanism, which allows the parallel execution of tasks based on shared memory using OpenMP, is analyzed in detail. Recommendations are given to get the maximum efficiency out of a parallel Monte Carlo calculation. (authors)
Is Monte Carlo embarrassingly parallel?
Hoogenboom, J. E.
2012-01-01
Monte Carlo is often stated as being embarrassingly parallel. However, running a Monte Carlo calculation, especially a reactor criticality calculation, in parallel using tens of processors shows a serious limitation in speedup and the execution time may even increase beyond a certain number of processors. In this paper the main causes of the loss of efficiency when using many processors are analyzed using a simple Monte Carlo program for criticality. The basic mechanism for parallel execution is MPI. One of the bottlenecks turns out to be the rendez-vous points in the parallel calculation used for synchronization and exchange of data between processors. This happens at least at the end of each cycle for fission source generation in order to collect the full fission source distribution for the next cycle and to estimate the effective multiplication factor, which is not only part of the requested results, but also input to the next cycle for population control. Basic improvements to overcome this limitation are suggested and tested. Other time losses in the parallel calculation are also identified. Moreover, the threading mechanism, which allows the parallel execution of tasks based on shared memory using OpenMP, is analyzed in detail. Recommendations are given to get the maximum efficiency out of a parallel Monte Carlo calculation. (authors)
Parallel integer sorting with medium and fine-scale parallelism
Dagum, Leonardo
1993-01-01
Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort is designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128 processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.
Template based parallel checkpointing in a massively parallel computer system
Archer, Charles Jens [Rochester, MN; Inglett, Todd Alan [Rochester, MN
2009-01-13
A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
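The comparison of a node's checkpoint data against a template can be sketched as below. This is a simplified illustration, not the patented method: it assumes fixed-size blocks and SHA-256 digests in place of the rsync-style checksums, and it leaves out the network broadcast and compression steps. `BLOCK_SIZE` and all function names are hypothetical.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4  # hypothetical block size, kept tiny so the example is easy to follow


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut the checkpoint data into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def checksums(data: bytes) -> List[str]:
    """Checksum of every block of the template checkpoint."""
    return [hashlib.sha256(block).hexdigest() for block in split_blocks(data)]


def diff_against_template(node_data: bytes, template_sums: List[str]) -> Dict[int, bytes]:
    """Return only the blocks whose checksum differs from the template's."""
    changed = {}
    for index, block in enumerate(split_blocks(node_data)):
        digest = hashlib.sha256(block).hexdigest()
        if index >= len(template_sums) or digest != template_sums[index]:
            changed[index] = block
    return changed


def restore(template: bytes, changed: Dict[int, bytes], n_blocks: int) -> bytes:
    """Rebuild a node's checkpoint from the template plus its changed blocks."""
    blocks = split_blocks(template)
    return b"".join(changed[i] if i in changed else blocks[i] for i in range(n_blocks))
```

Only the entries of `changed` would need to be transmitted and stored for each node, which is where the reduction in checkpoint size comes from when most nodes hold data close to the template.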
Parallel education: what is it?
Amos, Michelle Peta
2017-01-01
In the history of education it has long been discussed that single-sex and coeducation are the two models of education present in schools. With the introduction of parallel schools over the last 15 years, there has been very little research into this 'new model'. Many people do not understand what it means for a school to be parallel or they confuse a parallel model with co-education, due to the presence of both boys and girls within the one institution. Therefore, the main obj...
Balanced, parallel operation of flashlamps
Carder, B.M.; Merritt, B.T.
1979-01-01
A new energy store, the Compensated Pulsed Alternator (CPA), promises to be a cost-effective substitute for capacitors to drive flashlamps that pump large Nd:glass lasers. Because the CPA is large and discrete, it will be necessary that it drive many parallel flashlamp circuits, presenting a problem in equal current distribution. Current division to ±20% between parallel flashlamps has been achieved, but this is marginal for laser pumping. A method is presented here that provides equal current sharing to about 1%, and it includes fused protection against short circuit faults. The method was tested with eight parallel circuits, including both open-circuit and short-circuit fault tests.
Parallel computing in plasma physics: Nonlinear instabilities
Pohn, E.; Kamelander, G.; Shoucri, M.
2000-01-01
A Vlasov-Poisson system is used for studying the time evolution of the charge separation at a spatially one-dimensional as well as a two-dimensional plasma edge. Ions are advanced in time using the Vlasov equation. The whole three-dimensional velocity space is considered, leading to very time-consuming four- and five-dimensional fully kinetic simulations, respectively. In the 1D simulations electrons are assumed to behave adiabatically, i.e. they are Boltzmann-distributed, leading to a nonlinear Poisson equation. In the 2D simulations a gyro-kinetic approximation is used for the electrons. The plasma is assumed to be initially neutral. The simulations are performed on an equidistant grid. A constant time-step is used for advancing the density-distribution function in time. The time evolution of the distribution function is performed using a splitting scheme. Each dimension (x, y, v_x, v_y, v_z) of the phase space is advanced in time separately. The value of the distribution function for the next time is calculated from the value of an (in general) interstitial point at the present time (fractional shift). One-dimensional cubic-spline interpolation is used for calculating the interstitial function values. After the fractional shifts are performed for each dimension of the phase space, a whole time-step for advancing the distribution function is finished. Afterwards the charge density is calculated, the Poisson equation is solved and the electric field is calculated before the next time-step is performed. The fractional shift method sketched above was parallelized for p processors as follows. Considering first the shifts in y-direction, a proper parallelization strategy is to split the grid into p disjoint v_z-slices, which are sub-grids, each containing a different 1/p-th part of the v_z range but the whole range of all other dimensions. Each processor is responsible for performing the y-shifts on a different slice, which can be done in parallel without any communication between
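The slice decomposition described in this abstract can be illustrated with a reduced sketch. The assumptions are loud ones: the grid is cut down to two dimensions (y and v_z), the boundary in y is taken as periodic, and linear interpolation stands in for the cubic-spline interpolation of the original work. The slices are processed in a plain loop here; each iteration touches only its own v_z columns, so it could be handed to a separate processor. All names are hypothetical.

```python
import math
from typing import Dict, List

Grid = List[List[float]]  # f[iy][ivz], a reduced (y, v_z) stand-in for the full grid


def slices(n: int, p: int) -> List[range]:
    """Split indices 0..n-1 into p disjoint, nearly equal contiguous ranges."""
    base, extra = divmod(n, p)
    out, start = [], 0
    for rank in range(p):
        stop = start + base + (1 if rank < extra else 0)
        out.append(range(start, stop))
        start = stop
    return out


def shift_y(f: Grid, vz_indices: range, shift: float) -> Dict[int, List[float]]:
    """Fractional shift in y for the given v_z columns (periodic, linear interpolation)."""
    ny = len(f)
    whole = math.floor(shift)
    frac = shift - whole
    result = {}
    for ivz in vz_indices:
        column = []
        for iy in range(ny):
            # The departure point iy - shift lies between grid points i0 and i1.
            i0 = (iy - whole - 1) % ny
            i1 = (iy - whole) % ny
            column.append(frac * f[i0][ivz] + (1.0 - frac) * f[i1][ivz])
        result[ivz] = column
    return result


def shift_all(f: Grid, p: int, shift: float) -> Grid:
    """Apply the y-shift slice by slice; no slice reads another slice's columns."""
    ny, nvz = len(f), len(f[0])
    new = [[0.0] * nvz for _ in range(ny)]
    for part in slices(nvz, p):  # one iteration per (hypothetical) processor
        for ivz, column in shift_y(f, part, shift).items():
            for iy in range(ny):
                new[iy][ivz] = column[iy]
    return new
```

The result is the same for any number of slices, which reflects the point made in the abstract: the y-shift never mixes values from different v_z indices, so the slices need no communication during this step.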
Workspace Analysis for Parallel Robot
Ying Sun
2013-05-01
As a new type of robot, the parallel robot possesses many advantages that the serial robot does not, such as high rigidity, great load-carrying capacity, small error, high precision, a small self-weight/load ratio, good dynamic behavior, and easy control; hence its range of application is expanding. To find the workspace of a parallel mechanism, a numerical boundary-searching algorithm based on the inverse kinematics solution and the link-length limits is introduced. This paper analyses the position workspace and orientation workspace of a six-degree-of-freedom parallel robot. The results show that changing the branch length of the parallel mechanism is the main means of increasing or decreasing its workspace; the radius of the moving platform has no effect on the size of the workspace but changes its position.
"Feeling" Series and Parallel Resistances.
Morse, Robert A.
1993-01-01
Equipped with drinking straws and stirring straws, a teacher can help students understand how resistances in electric circuits combine in series and in parallel. Follow-up suggestions are provided. (ZWH)
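For reference, the standard combination rules that such a demonstration illustrates can be written as a short sketch (the function names are illustrative only): resistances in series add, while for resistances in parallel the reciprocals add.

```python
def series(*resistances: float) -> float:
    """Equivalent resistance of resistors in series: R = R1 + R2 + ..."""
    return sum(resistances)


def parallel(*resistances: float) -> float:
    """Equivalent resistance of resistors in parallel: 1/R = 1/R1 + 1/R2 + ..."""
    return 1.0 / sum(1.0 / r for r in resistances)
```

Two equal resistors in series double the resistance, while the same two in parallel halve it, much as two straws end to end are harder to blow through than two side by side.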
Parallel encoders for pixel detectors
Nikityuk, N.M.
1991-01-01
A new method of fast encoding and determining the multiplicity and coordinates of fired pixels is described. A specific example of the construction of parallel encoders and MCC for n=49 and t=2 is given. 16 refs.; 6 figs.; 2 tabs
Massively Parallel Finite Element Programming
Heister, Timo
2010-01-01
Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.
Event monitoring of parallel computations
Gruzlikov Alexander M.
2015-06-01
The paper considers the monitoring of parallel computations for detection of abnormal events. It is assumed that computations are organized according to an event model, and monitoring is based on specific test sequences.
Massively Parallel Finite Element Programming
Heister, Timo; Kronbichler, Martin; Bangerth, Wolfgang
2010-01-01
Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.
The STAPL Parallel Graph Library
Harshvardhan,
2013-01-01
This paper describes the stapl Parallel Graph Library, a high-level framework that abstracts the user from data-distribution and parallelism details and allows them to concentrate on parallel graph algorithm development. It includes a customizable distributed graph container and a collection of commonly used parallel graph algorithms. The library introduces pGraph pViews that separate algorithm design from the container implementation. It supports three graph processing algorithmic paradigms, level-synchronous, asynchronous and coarse-grained, and provides common graph algorithms based on them. Experimental results demonstrate improved scalability in performance and data size over existing graph libraries on more than 16,000 cores and on internet-scale graphs containing over 16 billion vertices and 250 billion edges. © Springer-Verlag Berlin Heidelberg 2013.
Writing parallel programs that work
CERN. Geneva
2012-01-01
Serial algorithms typically run inefficiently on parallel machines. This may sound like an obvious statement, but it is the root cause of why parallel programming is considered to be difficult. The current state of the computer industry is still that almost all programs in existence are serial. This talk will describe the techniques used in the Intel Parallel Studio to provide a developer with the tools necessary to understand the behaviors and limitations of the existing serial programs. Once the limitations are known the developer can refactor the algorithms and reanalyze the resulting programs with the tools in the Intel Parallel Studio to create parallel programs that work. About the speaker Paul Petersen is a Sr. Principal Engineer in the Software and Solutions Group (SSG) at Intel. He received a Ph.D. degree in Computer Science from the University of Illinois in 1993. After UIUC, he was employed at Kuck and Associates, Inc. (KAI) working on auto-parallelizing compiler (KAP), and was involved in th...
Exploiting Symmetry on Parallel Architectures.
Stiller, Lewis Benjamin
1995-01-01
This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry-exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs, and it discovered a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.
Parallel algorithms for continuum dynamics
Hicks, D.L.; Liebrock, L.M.
1987-01-01
Simply porting existing parallel programs to a new parallel processor may not achieve the full speedup possible; to achieve the maximum efficiency may require redesigning the parallel algorithms for the specific architecture. The authors discuss here parallel algorithms that were developed first for the HEP processor and then ported to the CRAY X-MP/4, the ELXSI/10, and the Intel iPSC/32. Focus is mainly on the most recent parallel processing results produced, i.e., those on the Intel Hypercube. The applications are simulations of continuum dynamics in which the momentum and stress gradients are important. Examples of these are inertial confinement fusion experiments, severe breaks in the coolant system of a reactor, weapons physics, shock-wave physics. Speedup efficiencies on the Intel iPSC Hypercube are very sensitive to the ratio of communication to computation. Great care must be taken in designing algorithms for this machine to avoid global communication. This is much more critical on the iPSC than it was on the three previous parallel processors
Comparative eye-tracking evaluation of scatterplots and parallel coordinates
Rudolf Netzel
2017-06-01
We investigate task performance and reading characteristics for scatterplots (Cartesian coordinates) and parallel coordinates. In a controlled eye-tracking study, we asked 24 participants to assess the relative distance of points in multidimensional space, depending on the diagram type (parallel coordinates or a horizontal collection of scatterplots), the number of data dimensions (2, 4, 6, or 8), and the relative distance between points (15%, 20%, or 25%). For a given reference point and two target points, we instructed participants to choose the target point that was closer to the reference point in multidimensional space. We present a visual scanning model that describes different strategies to solve this retrieval task for both diagram types, and propose corresponding hypotheses that we test using task completion time, accuracy, and gaze positions as dependent variables. Our results show that scatterplots outperform parallel coordinates significantly in 2 dimensions; however, the task was solved more quickly and more accurately with parallel coordinates in 8 dimensions. The eye-tracking data further show significant differences between Cartesian and parallel coordinates, as well as between different numbers of dimensions. For parallel coordinates, there is a clear trend toward shorter fixations and longer saccades with increasing number of dimensions. Using an area-of-interest (AOI) based approach, we identify different reading strategies for each diagram type: for parallel coordinates, the participants' gaze frequently jumped back and forth between pairs of axes, while axes were rarely focused on when viewing Cartesian coordinates. We further found that participants' attention is biased: toward the center of the whole plot for parallel coordinates and skewed to the center/left side for Cartesian coordinates. We anticipate that these results may support the design of more effective visualizations for multidimensional data.
Modeling and Control of Primary Parallel Isolated Boost Converter
Mira Albert, Maria del Carmen; Hernandez Botella, Juan Carlos; Sen, Gökhan
2012-01-01
In this paper state space modeling and closed loop controlled operation have been presented for primary parallel isolated boost converter (PPIBC) topology as a battery charging unit. Parasitic resistances have been included to have an accurate dynamic model. The accuracy of the model has been...
Some aspects of radial flow between parallel disks
Tabatabai, M.; Pollard, A.
1985-01-01
Radial flow of air between two closely spaced parallel disks is examined experimentally. A comprehensive review of the previous work performed on similar flow situations is given by Tabatabai and Pollard. The present paper is a discussion of some of the results obtained so far and offers some observations on the decay of turbulence in this flow. (author)
Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.
2014-08-12
Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Parallel Implicit Algorithms for CFD
Keyes, David E.
1998-01-01
The main goal of this project was efficient distributed parallel and workstation cluster implementations of Newton-Krylov-Schwarz (NKS) solvers for implicit Computational Fluid Dynamics (CFD). "Newton" refers to a quadratically convergent nonlinear iteration using gradient information based on the true residual, "Krylov" to an inner linear iteration that accesses the Jacobian matrix only through highly parallelizable sparse matrix-vector products, and "Schwarz" to a domain decomposition form of preconditioning the inner Krylov iterations with primarily neighbor-only exchange of data between the processors. Prior experience has established that Newton-Krylov methods are competitive solvers in the CFD context and that Krylov-Schwarz methods port well to distributed memory computers. The combination of the techniques into Newton-Krylov-Schwarz was implemented on 2D and 3D unstructured Euler codes on the parallel testbeds that used to be at LaRC and on several other parallel computers operated by other agencies or made available by the vendors. Early implementations were made directly in the Message Passing Interface (MPI) with parallel solvers we adapted from legacy NASA codes and enhanced for full NKS functionality. Later implementations were made in the framework of the PETSC library from Argonne National Laboratory, which now includes pseudo-transient continuation Newton-Krylov-Schwarz solver capability (as a result of demands we made upon PETSC during our early porting experiences). A secondary project pursued with funding from this contract was parallel implicit solvers in acoustics, specifically in the Helmholtz formulation. A 2D acoustic inverse problem has been solved in parallel within the PETSC framework.
Second derivative parallel block backward differentiation type ...
Second derivative parallel block backward differentiation type formulas for Stiff ODEs. ... and the methods are inherently parallel and can be distributed over parallel processors. They are ...
A Parallel Approach to Fractal Image Compression
Lubomir Dedera
2004-01-01
The paper deals with a parallel approach to coding and decoding algorithms in fractal image compression and presents experimental results comparing sequential and parallel algorithms in terms of both the coding and decoding times achieved and the effectiveness of parallelization.
Parallelization Issues and Particle-In-Cell Codes.
Elster, Anne Cathrine
1994-01-01
"Everything should be made as simple as possible, but not simpler." Albert Einstein. The field of parallel scientific computing has concentrated on parallelization of individual modules such as matrix solvers and factorizers. However, many applications involve several interacting modules. Our analyses of a particle-in-cell code modeling charged particles in an electric field show that these accompanying dependencies affect data partitioning and lead to new parallelization strategies concerning processor, memory and cache utilization. Our test-bed, a KSR1, is a distributed memory machine with a globally shared addressing space. However, most of the new methods presented hold generally for hierarchical and/or distributed memory systems. We introduce a novel approach that uses dual pointers on the local particle arrays to keep the particle locations automatically partially sorted. Complexity and performance analyses with accompanying KSR benchmarks have been included for both this scheme and for the traditional replicated grids approach. The latter approach maintains load-balance with respect to particles. However, our results demonstrate it fails to scale properly for problems with large grids (say, greater than 128-by-128) running on as few as 15 KSR nodes, since the extra storage and computation time associated with adding the grid copies becomes significant. Our grid partitioning scheme, although harder to implement, does not need to replicate the whole grid. Consequently, it scales well for large problems on highly parallel systems. It may, however, require load balancing schemes for non-uniform particle distributions. Our dual pointer approach may facilitate this through dynamically partitioned grids. We also introduce hierarchical data structures that store neighboring grid-points within the same cache-line by reordering the grid indexing. This alignment produces a 25% savings in cache-hits for a 4-by-4 cache. A consideration of the input data's effect on
Parallel fabrication of macroporous scaffolds.
Dobos, Andrew; Grandhi, Taraka Sai Pavan; Godeshala, Sudhakar; Meldrum, Deirdre R; Rege, Kaushal
2018-07-01
Scaffolds generated from naturally occurring and synthetic polymers have been investigated in several applications because of their biocompatibility and tunable chemo-mechanical properties. Existing methods for generation of 3D polymeric scaffolds typically cannot be parallelized, suffer from low throughputs, and do not allow for quick and easy removal of the fragile structures that are formed. Current molds used in hydrogel and scaffold fabrication using solvent casting and porogen leaching are often single-use and do not facilitate 3D scaffold formation in parallel. Here, we describe a simple device and related approaches for the parallel fabrication of macroporous scaffolds. This approach was employed for the generation of macroporous and non-macroporous materials in parallel, in higher throughput and allowed for easy retrieval of these 3D scaffolds once formed. In addition, macroporous scaffolds with interconnected as well as non-interconnected pores were generated, and the versatility of this approach was employed for the generation of 3D scaffolds from diverse materials including an aminoglycoside-derived cationic hydrogel ("Amikagel"), poly(lactic-co-glycolic acid) or PLGA, and collagen. Macroporous scaffolds generated using the device were investigated for plasmid DNA binding and cell loading, indicating the use of this approach for developing materials for different applications in biotechnology. Our results demonstrate that the device-based approach is a simple technology for generating scaffolds in parallel, which can enhance the toolbox of current fabrication techniques. © 2018 Wiley Periodicals, Inc.
Parallel plasma fluid turbulence calculations
Leboeuf, J.N.; Carreras, B.A.; Charlton, L.A.; Drake, J.B.; Lynch, V.E.; Newman, D.E.; Sidikman, K.L.; Spong, D.A.
1994-01-01
The study of plasma turbulence and transport is a complex problem of critical importance for fusion-relevant plasmas. To this day, the fluid treatment of plasma dynamics is the best approach to realistic physics at the high resolution required for certain experimentally relevant calculations. Core and edge turbulence in a magnetic fusion device have been modeled using state-of-the-art, nonlinear, three-dimensional, initial-value fluid and gyrofluid codes. Parallel implementation of these models on diverse platforms--vector parallel (National Energy Research Supercomputer Center's CRAY Y-MP C90), massively parallel (Intel Paragon XP/S 35), and serial parallel (clusters of high-performance workstations using the Parallel Virtual Machine protocol)--offers a variety of paths to high resolution and significant improvements in real-time efficiency, each with its own advantages. The largest and most efficient calculations have been performed at the 200 Mword memory limit on the C90 in dedicated mode, where an overlap of 12 to 13 out of a maximum of 16 processors has been achieved with a gyrofluid model of core fluctuations. The richness of the physics captured by these calculations is commensurate with the increased resolution and efficiency and is limited only by the ingenuity brought to the analysis of the massive amounts of data generated
Evaluating parallel optimization on transputers
A.G. Chalmers
2003-12-01
The faster processing power of modern computers and the development of efficient algorithms have made it possible for operations researchers to tackle a much wider range of problems than ever before. Further improvements in processing speed can be achieved utilising relatively inexpensive transputers to process components of an algorithm in parallel. The Davidon-Fletcher-Powell method is one of the most successful and widely used optimisation algorithms for unconstrained problems. This paper examines the algorithm and identifies the components that can be processed in parallel. The results of some experiments with these components are presented, which indicate under what conditions parallel processing with an inexpensive configuration is likely to be faster than the traditional sequential implementations. The performance of the whole algorithm with its parallel components is then compared with the original sequential algorithm. The implementation serves to illustrate the practicalities of speeding up typical OR algorithms in terms of difficulty, effort and cost. The results give an indication of the savings in time a given parallel implementation can be expected to yield.
Pattern-Driven Automatic Parallelization
Christoph W. Kessler
1996-01-01
This article describes a knowledge-based system for automatic parallelization of a wide class of sequential numerical codes operating on vectors and dense matrices, and for execution on distributed memory message-passing multiprocessors. Its main feature is a fast and powerful pattern recognition tool that locally identifies frequently occurring computations and programming concepts in the source code. This tool also works for dusty deck codes that have been "encrypted" by former machine-specific code transformations. Successful pattern recognition guides sophisticated code transformations including local algorithm replacement such that the parallelized code need not emerge from the sequential program structure by just parallelizing the loops. It allows access to an expert's knowledge on useful parallel algorithms, available machine-specific library routines, and powerful program transformations. The partially restored program semantics also supports local array alignment, distribution, and redistribution, and allows for faster and more exact prediction of the performance of the parallelized target code than is usually possible.
Parallel artificial liquid membrane extraction
Gjelstad, Astrid; Rasmussen, Knut Einar; Parmer, Marthe Petrine
2013-01-01
This paper reports development of a new approach towards analytical liquid-liquid-liquid membrane extraction termed parallel artificial liquid membrane extraction. A donor plate and acceptor plate create a sandwich, in which each sample (human plasma) and acceptor solution is separated by an artificial liquid membrane. Parallel artificial liquid membrane extraction is a modification of hollow-fiber liquid-phase microextraction, where the hollow fibers are replaced by flat membranes in a 96-well plate format....
Cellular automata a parallel model
Mazoyer, J
1999-01-01
Cellular automata can be viewed both as computational models and modelling systems of real processes. This volume emphasises the first aspect. In articles written by leading researchers, sophisticated massive parallel algorithms (firing squad, life, Fischer's primes recognition) are treated. Their computational power and the specific complexity classes they determine are surveyed, while some recent results in relation to chaos from a new dynamic systems point of view are also presented. Audience: This book will be of interest to specialists of theoretical computer science and the parallelism challenge.
Options for Parallelizing a Planning and Scheduling Algorithm
Clement, Bradley J.; Estlin, Tara A.; Bornstein, Benjamin D.
2011-01-01
Space missions have a growing interest in putting multi-core processors onboard spacecraft. For many missions processing power significantly slows operations. We investigate how continual planning and scheduling algorithms can exploit multi-core processing and outline different potential design decisions for a parallelized planning architecture. This organization of choices and challenges helps us with an initial design for parallelizing the CASPER planning system for a mesh multi-core processor. This work extends that presented at another workshop with some preliminary results.
Badlands: A parallel basin and landscape dynamics model
T. Salles
2016-01-01
Over more than three decades, a number of numerical landscape evolution models (LEMs) have been developed to study the combined effects of climate, sea-level, tectonics and sediments on Earth surface dynamics. Most of them are written in efficient programming languages, but often cannot be used on parallel architectures. Here, I present a LEM which ports a common core of accepted physical principles governing landscape evolution into a distributed memory parallel environment. Badlands (acronym for BAsin anD LANdscape DynamicS) is an open-source, flexible, TIN-based landscape evolution model, built to simulate topography development at various space and time scales.
Numerical simulation of Vlasov equation with parallel tools
Peyroux, J.
2005-11-01
This project aims to make Vlasov solvers even more powerful by using various parallelization tools (MPI, OpenMP...). A simplified test case served as a basis for constructing the parallel codes and for obtaining a data-processing skeleton which could thereafter be re-used for increasingly complex models (more than four variables of phase space). This will make it possible to treat more realistic situations linked, for example, to the injection of ultra short and ultra intense impulses in inertial fusion plasmas, or the study of the instability of trapped ions now taken as being responsible for the generation of turbulence in tokamak plasmas. (author)
Many-Body Mean-Field Equations: Parallel implementation
Vallieres, M.; Umar, S.; Chinn, C.; Strayer, M.
1993-01-01
We describe the implementation of Hartree-Fock Many-Body Mean-Field Equations on a Parallel Intel iPSC/860 hypercube. We first discuss the Nuclear Mean-Field approach in physical terms. Then we describe our parallel implementation of this approach on the Intel iPSC/860 hypercube. We discuss and compare the advantages and disadvantages of the domain partition versus the Hilbert space partition for this problem. We conclude by discussing some timing experiments on various computing platforms
Parallel Sparse Matrix - Vector Product
Alexandersen, Joe; Lazarov, Boyan Stefanov; Dammann, Bernd
This technical report contains a case study of a sparse matrix-vector product routine, implemented for parallel execution on a compute cluster with both pure MPI and hybrid MPI-OpenMP solutions. C++ classes for sparse data types were developed and the report shows how these classes can be used...
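As a rough illustration of the kind of routine this report studies, the sketch below computes a sparse matrix-vector product in compressed sparse row (CSR) form and splits the rows into contiguous blocks, one per worker. The report itself uses C++ with MPI and OpenMP; the Python thread pool, the function names and the CSR layout used here are assumptions chosen only to show the row-wise decomposition.

```python
from concurrent.futures import ThreadPoolExecutor

def csr_rows_times_vector(indptr, indices, data, x, row_start, row_end):
    """Multiply rows [row_start, row_end) of a CSR matrix by the vector x."""
    out = []
    for row in range(row_start, row_end):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        out.append(acc)
    return out

def parallel_spmv(indptr, indices, data, x, workers=2):
    """Hypothetical helper: split rows into contiguous blocks, one per worker."""
    n_rows = len(indptr) - 1
    block = max(1, -(-n_rows // workers))  # ceiling division
    bounds = [(s, min(s + block, n_rows)) for s in range(0, n_rows, block)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda b: csr_rows_times_vector(indptr, indices, data, x, b[0], b[1]),
            bounds,
        )
        # Blocks come back in row order, so concatenation gives the full result.
        return [value for part in parts for value in part]

# 3x3 example matrix [[2, 0, 1], [0, 3, 0], [4, 0, 5]] in CSR form.
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [2.0, 1.0, 3.0, 4.0, 5.0]
y = parallel_spmv(indptr, indices, data, [1.0, 2.0, 3.0])
```

Each row block reads the shared input vector and writes only its own slice of the output, so no synchronisation is needed beyond collecting the blocks. Pure-Python threads will not show a real speed-up here; the point is only the data decomposition.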
[Falsified medicines in parallel trade].
Muckenfuß, Heide
2017-11-01
The number of falsified medicines on the German market has distinctly increased over the past few years. In particular, stolen pharmaceutical products, a form of falsified medicines, have increasingly been introduced into the legal supply chain via parallel trading. The reasons why parallel trading serves as a gateway for falsified medicines are most likely the complex supply chains and routes of transport. It is hardly possible for national authorities to trace the history of a medicinal product that was bought and sold by several intermediaries in different EU member states. In addition, the heterogeneous outward appearance of imported and relabelled pharmaceutical products facilitates the introduction of illegal products onto the market. Official batch release at the Paul-Ehrlich-Institut offers the possibility of checking some aspects that might provide an indication of a falsified medicine. In some circumstances, this may allow the identification of falsified medicines before they come onto the German market. However, this control is only possible for biomedicinal products that have not received a waiver regarding official batch release. For improved control of parallel trade, better networking among the EU member states would be beneficial. European-wide regulations, e. g., for disclosure of the complete supply chain, would help to minimise the risks of parallel trading and hinder the marketing of falsified medicines.
The parallel adult education system
Wahlgren, Bjarne
2015-01-01
for competence development. The Danish university educational system includes two parallel programs: a traditional academic track (candidatus) and an alternative practice-based track (master). The practice-based program was established in 2001 and organized as part time. The total program takes half the time...
Where are the parallel algorithms?
Voigt, R. G.
1985-01-01
Four paradigms that can be useful in developing parallel algorithms are discussed. These include computational complexity analysis, changing the order of computation, asynchronous computation, and divide and conquer. Each is illustrated with an example from scientific computation, and it is shown that computational complexity must be used with great care or an inefficient algorithm may be selected.
Parallel plate transmission line transformer
Voeten, S.J.; Brussaard, G.J.H.; Pemen, A.J.M.
2011-01-01
A Transmission Line Transformer (TLT) can be used to transform high-voltage nanosecond pulses. These transformers rely on the fact that the length of the pulse is shorter than the transmission lines used. This allows connecting the transmission lines in parallel at the input and in series at the
Matpar: Parallel Extensions for MATLAB
Springer, P. L.
1998-01-01
Matpar is a set of client/server software that allows a MATLAB user to take advantage of a parallel computer for very large problems. The user can replace calls to certain built-in MATLAB functions with calls to Matpar functions.
Massively parallel quantum computer simulator
De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.
2007-01-01
We describe portable software to simulate universal quantum computers on massively parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, a Cray
Massively parallel Fokker-Planck calculations
Mirin, A.A.
1990-01-01
This paper reports that the Fokker-Planck package FPPAC, which solves the complete nonlinear multispecies Fokker-Planck collision operator for a plasma in two-dimensional velocity space, has been rewritten for the Connection Machine 2. This has involved allocation of variables either to the front end or the CM2, minimization of data flow, and replacement of Cray-optimized algorithms with ones suitable for a massively parallel architecture. Calculations have been carried out on various Connection Machines throughout the country. Results and timings on these machines have been compared to each other and to those on the static memory Cray-2. For large problem size, the Connection Machine 2 is found to be cost-efficient
Parallel computing: numerics, applications, and trends
Trobec, Roman; Vajteršic, Marián; Zinterhof, Peter
2009-01-01
... and/or distributed systems. The contributions to this book are focused on topics most concerned in the trends of today's parallel computing. These range from parallel algorithmics, programming, tools, network computing to future parallel computing. Particular attention is paid to parallel numerics: linear algebra, differential equations, numerica...
Experiments with parallel algorithms for combinatorial problems
G.A.P. Kindervater (Gerard); H.W.J.M. Trienekens
1985-01-01
In the last decade many models for parallel computation have been proposed and many parallel algorithms have been developed. However, few of these models have been realized and most of these algorithms are supposed to run on idealized, unrealistic parallel machines. The parallel machines
Heggarty, J.W.
1999-06-01
For almost thirty years, sequential R-matrix computation has been used by atomic physics research groups, from around the world, to model collision phenomena involving the scattering of electrons or positrons with atomic or molecular targets. As considerable progress has been made in the understanding of fundamental scattering processes, new data, obtained from more complex calculations, is of current interest to experimentalists. Performing such calculations, however, places considerable demands on the computational resources to be provided by the target machine, in terms of both processor speed and memory requirement. Indeed, in some instances the computational requirements are so great that the proposed R-matrix calculations are intractable, even when utilising contemporary classic supercomputers. Historically, increases in the computational requirements of R-matrix computation were accommodated by porting the problem codes to a more powerful classic supercomputer. Although this approach has been successful in the past, it is no longer considered to be a satisfactory solution due to the limitations of current (and future) Von Neumann machines. As a consequence, there has been considerable interest in the high performance multicomputers, that have emerged over the last decade which appear to offer the computational resources required by contemporary R-matrix research. Unfortunately, developing codes for these machines is not as simple a task as it was to develop codes for successive classic supercomputers. The difficulty arises from the considerable differences in the computing models that exist between the two types of machine and results in the programming of multicomputers to be widely acknowledged as a difficult, time consuming and error-prone task. Nevertheless, unless parallel R-matrix computation is realised, important theoretical and experimental atomic physics research will continue to be hindered. This thesis describes work that was undertaken in
Nam, Sung Sik; Alouini, Mohamed-Slim; Ko, Young-Chai
2018-01-01
In this paper, we statistically analyze the performance of a threshold-based parallel multiple beam selection scheme for a free-space optical (FSO) based system with wavelength division multiplexing (WDM) in cases where a pointing error has occurred
Building Blocks for the Rapid Development of Parallel Simulations, Phase I
National Aeronautics and Space Administration — Scientists need to be able to quickly develop and run parallel simulations without paying the high price of writing low-level message passing codes using compiled...
Parallel family trees for transfer matrices in the Potts model
Navarro, Cristobal A.; Canfora, Fabrizio; Hitschfeld, Nancy; Navarro, Gonzalo
2015-02-01
The computational cost of transfer matrix methods for the Potts model is related to the question: in how many ways can two layers of a lattice be connected? Answering the question leads to the generation of a combinatorial set of lattice configurations. This set defines the configuration space of the problem, and the smaller it is, the faster the transfer matrix can be computed. The configuration space of generic (q, v) transfer matrix methods for strips is in the order of the Catalan numbers, which grow asymptotically as O(4^m), where m is the width of the strip. Other transfer matrix methods with a smaller configuration space indeed exist, but they make assumptions on the temperature or the number of spin states, or restrict the structure of the lattice. In this paper we propose a parallel algorithm that uses a sub-Catalan configuration space of O(3^m) to build the generic (q, v) transfer matrix in a compressed form. The improvement is achieved by grouping the original set of Catalan configurations into a forest of family trees, in such a way that the solution to the problem is now computed by solving the root node of each family. As a result, the algorithm becomes exponentially faster than the Catalan approach while still being highly parallel. The resulting matrix is stored in a compressed form using O(3^m × 4^m) of space, making numerical evaluation and decompression faster than evaluating the matrix in its O(4^m × 4^m) uncompressed form. Experimental results for different sizes of strip lattices show that the parallel family trees (PFT) strategy indeed runs exponentially faster than the Catalan Parallel Method (CPM), especially when dealing with dense transfer matrices. In terms of parallel performance, we report strong-scaling speedups of up to 5.7× when running on an 8-core shared memory machine and 28× for a 32-core cluster. The best balance of speedup and efficiency for the multi-core machine was achieved when using p = 4 processors, while for the cluster
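The growth rates quoted in this abstract can be checked numerically. The sketch below assumes the standard closed form for the Catalan numbers and simply tabulates them next to powers of 3 and 4. How the strip width m maps onto a Catalan index in the paper is not stated in the abstract, so treating them as the same index is an assumption made for illustration.

```python
from math import comb

def catalan(m):
    """m-th Catalan number, C_m = binom(2m, m) / (m + 1)."""
    return comb(2 * m, m) // (m + 1)

def growth_ratio(m):
    """Ratio C_(m+1) / C_m, which equals 2(2m + 1) / (m + 2)."""
    return catalan(m + 1) / catalan(m)

# Tabulate the Catalan numbers against the two exponential bounds.
for m in (5, 10, 20, 40):
    print(m, catalan(m), 3 ** m, 4 ** m, round(growth_ratio(m), 3))
```

The ratio of successive Catalan numbers approaches 4 from below, which is the sense in which the Catalan configuration space grows like 4^m, whereas a space of order 3^m grows by a factor of only 3 per unit of width.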
Parallel trajectory similarity joins in spatial networks
Shang, Shuo; Chen, Lisi; Wei, Zhewei; Jensen, Christian S.; Zheng, Kai; Kalnis, Panos
2018-01-01
The matching of similar pairs of objects, called similarity join, is fundamental functionality in data management. We consider two cases of trajectory similarity joins (TS-Joins), including a threshold-based join (Tb-TS-Join) and a top-k TS-Join (k-TS-Join), where the objects are trajectories of vehicles moving in road networks. Given two sets of trajectories and a threshold θ, the Tb-TS-Join returns all pairs of trajectories from the two sets with similarity above θ. In contrast, the k-TS-Join does not take a threshold as a parameter, and it returns the top-k most similar trajectory pairs from the two sets. The TS-Joins target diverse applications such as trajectory near-duplicate detection, data cleaning, ridesharing recommendation, and traffic congestion prediction. With these applications in mind, we provide purposeful definitions of similarity. To enable efficient processing of the TS-Joins on large sets of trajectories, we develop search space pruning techniques and enable use of the parallel processing capabilities of modern processors. Specifically, we present a two-phase divide-and-conquer search framework that lays the foundation for the algorithms for the Tb-TS-Join and the k-TS-Join that rely on different pruning techniques to achieve efficiency. For each trajectory, the algorithms first find similar trajectories. Then they merge the results to obtain the final result. The algorithms for the two joins exploit different upper and lower bounds on the spatiotemporal trajectory similarity and different heuristic scheduling strategies for search space pruning. Their per-trajectory searches are independent of each other and can be performed in parallel, and the mergings have constant cost. An empirical study with real data offers insight in the performance of the algorithms and demonstrates that they are capable of outperforming well-designed baseline algorithms by an order of magnitude.
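The two join types described in this abstract can be sketched in a few lines. The example below is a brute-force illustration only: it uses Jaccard overlap of road-segment ids as a stand-in similarity (the paper defines its own spatiotemporal measure, which the abstract does not give), it applies none of the pruning bounds, and all function names are hypothetical. What it does show is the structure the abstract describes: independent per-trajectory searches run in parallel, followed by a merge.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def jaccard(a, b):
    """Stand-in similarity on sets of road-segment ids (not the paper's measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def search_one(item, others, sim):
    """Independent per-trajectory search: score one trajectory against a set."""
    i, traj = item
    return [(sim(traj, other), i, j) for j, other in enumerate(others)]

def all_scores(left, right, sim=jaccard, workers=4):
    """Run the per-trajectory searches in parallel, then merge the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda item: search_one(item, right, sim), enumerate(left))
        return [triple for part in parts for triple in part]

def threshold_join(left, right, theta):
    """Tb-TS-Join analogue: all pairs with similarity above theta."""
    return sorted((i, j) for s, i, j in all_scores(left, right) if s > theta)

def top_k_join(left, right, k):
    """k-TS-Join analogue: the k most similar pairs, no threshold needed."""
    best = heapq.nlargest(k, all_scores(left, right))
    return [(i, j) for s, i, j in best]

P = [{1, 2, 3}, {7, 8}]
Q = [{1, 2, 3, 4}, {8, 9}, {5}]
pairs_above = threshold_join(P, Q, 0.5)
top_two = top_k_join(P, Q, 2)
```

Because each call to `search_one` touches only its own trajectory and the read-only second set, the searches need no coordination, which is the property the abstract relies on for parallel execution.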
The numerical parallel computing of photon transport
Huang Qingnan; Liang Xiaoguang; Zhang Lifa
1998-12-01
The parallel computing of photon transport is investigated; the parallel algorithm and the parallelization of programs on parallel computers with both shared memory and distributed memory are discussed. By analyzing the inherent law of the mathematical and physical model of photon transport according to the structural features of parallel computers, using the 'divide and conquer' strategy, adjusting the algorithm structure of the program, resolving data dependencies, finding parallelizable components and creating large-grain parallel subtasks, the sequential computing of photon transport is efficiently transformed into parallel and vector computing. The program was run on various HP parallel computers such as the HY-1 (PVP), the Challenge (SMP) and the YH-3 (MPP), and very good parallel speedup was obtained
Automatic Parallelization Tool: Classification of Program Code for Parallel Computing
Mustafa Basthikodi
2016-04-01
Performance growth of single-core processors has come to a halt in the past decade, but was re-enabled by the introduction of parallelism in processors. Multicore frameworks along with Graphical Processing Units have broadly enhanced parallelism. Several compilers have been updated to address the emerging challenges of synchronization and threading. Appropriate program and algorithm classifications will greatly help software engineers find opportunities for effective parallelization. In the present work we investigate current species for classification of algorithms; related work on classification is discussed along with a comparison of the issues that challenge the classification. A set of algorithms is chosen whose structure matches different issues and which perform a given task. We have tested these algorithms utilizing existing automatic species extraction tools along with the Bones compiler. We have added functionalities to the existing tool, providing a more detailed characterization. The contributions of our work include support for pointer arithmetic, conditional and incremental statements, user defined types, constants and mathematical functions. With this, we can retain significant data which is not captured by the original species of algorithms. We implemented the new theories in the tool, enabling automatic characterization of program code.
Fundamental Parallel Algorithms for Private-Cache Chip Multiprocessors
Arge, Lars Allan; Goodrich, Michael T.; Nelson, Michael
2008-01-01
about the way cores are interconnected, for we assume that all inter-processor communication occurs through the memory hierarchy. We study several fundamental problems, including prefix sums, selection, and sorting, which often form the building blocks of other parallel algorithms. Indeed, we present...... two sorting algorithms, a distribution sort and a mergesort. Our algorithms are asymptotically optimal in terms of parallel cache accesses and space complexity under reasonable assumptions about the relationships between the number of processors, the size of memory, and the size of cache blocks....... In addition, we study sorting lower bounds in a computational model, which we call the parallel external-memory (PEM) model, that formalizes the essential properties of our algorithms for private-cache CMPs....
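Prefix sums, one of the building blocks named in this abstract, parallelise well with a blocked two-pass scan. The sketch below is a generic textbook-style version and not the paper's PEM-model algorithm: the block count, the thread pool and the function name are assumptions, and no attempt is made to model cache blocks or memory transfers.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate

def blocked_prefix_sums(values, blocks=4):
    """Inclusive prefix sums via a two-pass blocked scan (illustrative only)."""
    n = len(values)
    if n == 0:
        return []
    size = -(-n // blocks)  # ceiling division
    chunks = [values[s:s + size] for s in range(0, n, size)]
    with ThreadPoolExecutor(max_workers=blocks) as pool:
        # Pass 1: each worker scans its own block independently.
        local = list(pool.map(lambda chunk: list(accumulate(chunk)), chunks))
        # Short sequential step: exclusive scan of the block totals.
        offsets = [0] + list(accumulate(block[-1] for block in local))[:-1]
        # Pass 2: each worker shifts its block by the sum of all earlier blocks.
        shifted = pool.map(
            lambda pair: [v + pair[1] for v in pair[0]], zip(local, offsets)
        )
        return [v for block in shifted for v in block]

result = blocked_prefix_sums([3, 1, 4, 1, 5, 9, 2, 6], blocks=3)
```

Only the scan over block totals is sequential, and its length equals the number of blocks rather than the number of elements, which is why this pattern is a common starting point for parallel sorting and selection routines.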
Parallel grid generation algorithm for distributed memory computers
Moitra, Stuti; Moitra, Anutosh
1994-01-01
A parallel grid-generation algorithm and its implementation on the Intel iPSC/860 computer are described. The grid-generation scheme is based on an algebraic formulation of homotopic relations. Methods for utilizing the inherent parallelism of the grid-generation scheme are described, and implementation of multiple levels of parallelism on multiple instruction multiple data machines are indicated. The algorithm is capable of providing near orthogonality and spacing control at solid boundaries while requiring minimal interprocessor communications. Results obtained on the Intel hypercube for a blended wing-body configuration are used to demonstrate the effectiveness of the algorithm. Fortran implementations based on the native programming model of the iPSC/860 computer and the Express system of software tools are reported. Computational gains in execution time speed-up ratios are given.
A novel two-level dynamic parallel data scheme for large 3-D SN calculations
Sjoden, G.E.; Shedlock, D.; Haghighat, A.; Yi, C.
2005-01-01
We introduce a new dynamic parallel memory optimization scheme for executing large scale 3-D discrete ordinates (Sn) simulations on distributed memory parallel computers. In order for parallel transport codes to be truly scalable, they must use parallel data storage, where only the variables that are locally computed are locally stored. Even with parallel data storage for the angular variables, cumulative storage requirements for large discrete ordinates calculations can be prohibitive. To address this problem, Memory Tuning has been implemented into the PENTRAN 3-D parallel discrete ordinates code as an optimized, two-level ('large' array, 'small' array) parallel data storage scheme. Memory Tuning can be described as the process of parallel data memory optimization. Memory Tuning dynamically minimizes the amount of required parallel data in allocated memory on each processor using a statistical sampling algorithm. This algorithm is based on the integral average and standard deviation of the number of fine meshes contained in each coarse mesh in the global problem. Because PENTRAN only stores the locally computed problem phase space, optimal two-level memory assignments can be unique on each node, depending upon the parallel decomposition used (hybrid combinations of angular, energy, or spatial). As demonstrated in the two large discrete ordinates models presented (a storage cask and an OECD MOX Benchmark), Memory Tuning can save a substantial amount of memory per parallel processor, allowing one to accomplish very large scale Sn computations. (authors)
Structural synthesis of parallel robots
Gogu, Grigore
This book represents the fifth part of a larger work dedicated to the structural synthesis of parallel robots. The originality of this work resides in the fact that it combines new formulae for mobility, connectivity, redundancy and overconstraints with evolutionary morphology in a unified structural synthesis approach that yields interesting and innovative solutions for parallel robotic manipulators. This is the first book on robotics that presents solutions for coupled, decoupled, uncoupled, fully-isotropic and maximally regular robotic manipulators with Schönflies motions systematically generated by using the structural synthesis approach proposed in Part 1. Overconstrained non-redundant/overactuated/redundantly actuated solutions with simple/complex limbs are proposed. Many solutions are presented here for the first time in the literature. The author had to make a difficult and challenging choice between protecting these solutions through patents and releasing them directly into the public domain. T...
GPU Parallel Bundle Block Adjustment
ZHENG Maoteng
2017-09-01
To deal with massive data in photogrammetry, we introduce the GPU parallel computing technology. The preconditioned conjugate gradient and inexact Newton method are also applied to decrease the iteration times while solving the normal equation. A brand new workflow of bundle adjustment is developed to utilize GPU parallel computing technology. Our method can avoid the storage and inversion of the big normal matrix, and compute the normal matrix in real time. The proposed method can not only largely decrease the memory requirement of normal matrix, but also largely improve the efficiency of bundle adjustment. It also achieves the same accuracy as the conventional method. Preliminary experiment results show that the bundle adjustment of a dataset with about 4500 images and 9 million image points can be done in only 1.5 minutes while achieving sub-pixel accuracy.
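The idea of solving the normal equation without storing the normal matrix can be illustrated with a matrix-free preconditioned conjugate gradient. The sketch below is a small CPU-only example under stated assumptions: it uses a Jacobi (diagonal) preconditioner, which the abstract does not specify, it omits the inexact Newton outer loop, and the names `J` (Jacobian) and `b` (residual vector) are placeholders rather than the paper's notation.

```python
def mat_vec(J, v):
    """Compute J v for a dense row-major matrix J."""
    return [sum(a * b for a, b in zip(row, v)) for row in J]

def mat_t_vec(J, u):
    """Compute J^T u without building the transpose."""
    return [sum(J[r][c] * u[r] for r in range(len(J))) for c in range(len(J[0]))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pcg_normal_equations(J, b, iterations=50, tol=1e-12):
    """Solve (J^T J) x = J^T b by Jacobi-preconditioned CG without forming J^T J."""
    n = len(J[0])
    # The diagonal of J^T J is the squared norm of each column of J.
    diag = [sum(row[c] ** 2 for row in J) for c in range(n)]
    x = [0.0] * n
    r = mat_t_vec(J, b)  # residual of the normal equation at x = 0
    z = [ri / di for ri, di in zip(r, diag)]
    p = z[:]
    rz = dot(r, z)
    for _ in range(iterations):
        if dot(r, r) < tol:
            break
        Ap = mat_t_vec(J, mat_vec(J, p))  # (J^T J) p as two thin products
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

# Least-squares line fit through (0, 1), (1, 3), (2, 5): intercept 1, slope 2.
J = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
b = [1.0, 3.0, 5.0]
solution = pcg_normal_equations(J, b)
```

The normal matrix appears only through products with a vector, so memory use scales with the Jacobian rather than with its square, and each product is a natural candidate for data-parallel execution.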
A tandem parallel plate analyzer
Hamada, Y.; Fujisawa, A.; Iguchi, H.; Nishizawa, A.; Kawasumi, Y.
1996-11-01
By a new modification of a parallel plate analyzer the second-order focus is obtained in an arbitrary injection angle. This kind of an analyzer with a small injection angle will have an advantage of small operational voltage, compared to the Proca and Green analyzer where the injection angle is 30 degrees. Thus, the newly proposed analyzer will be very useful for the precise energy measurement of high energy particles in MeV range. (author)
Gus'kov, B.N.; Kalinnikov, V.A.; Krastev, V.R.; Maksimov, A.N.; Nikityuk, N.M.
1985-01-01
This paper describes a high-speed parallel counter that contains 31 inputs and 15 outputs and is implemented by integrated circuits of series 500. The counter is designed for fast sampling of events according to the number of particles that pass simultaneously through the hodoscopic plane of the detector. The minimum delay of the output signals relative to the input is 43 nsec. The duration of the output signals can be varied from 75 to 120 nsec
An anthropologist in parallel structure
Noelle Molé Liston
2016-08-01
The essay examines the parallels between Molé Liston’s studies on labor and precarity in Italy and the United States’ anthropology job market. Probing the way economic shift reshaped the field of anthropology of Europe in the late 2000s, the piece explores how the neoliberalization of the American academy increased the value in studying the hardships and daily lives of non-western populations in Europe.
Wakefield calculations on parallel computers
Schoessow, P.
1990-01-01
The use of parallelism in the solution of wakefield problems is illustrated for two different computer architectures (SIMD and MIMD). Results are given for finite difference codes which have been implemented on a Connection Machine and an Alliant FX/8 and which are used to compute wakefields in dielectric loaded structures. Benchmarks on code performance are presented for both cases. 4 refs., 3 figs., 2 tabs
Aspects of computation on asynchronous parallel processors
Wright, M.
1989-01-01
The increasing availability of asynchronous parallel processors has provided opportunities for original and useful work in scientific computing. However, the field of parallel computing is still in a highly volatile state, and researchers display a wide range of opinion about many fundamental questions such as models of parallelism, approaches for detecting and analyzing parallelism of algorithms, and tools that allow software developers and users to make effective use of diverse forms of complex hardware. This volume collects the work of researchers specializing in different aspects of parallel computing, who met to discuss the framework and the mechanics of numerical computing. The far-reaching impact of high-performance asynchronous systems is reflected in the wide variety of topics, which include scientific applications (e.g. linear algebra, lattice gauge simulation, ordinary and partial differential equations), models of parallelism, parallel language features, task scheduling, automatic parallelization techniques, tools for algorithm development in parallel environments, and system design issues
Parallel processing of genomics data
Agapito, Giuseppe; Guzzi, Pietro Hiram; Cannataro, Mario
2016-10-01
The availability of high-throughput experimental platforms for the analysis of biological samples, such as mass spectrometry, microarrays and Next Generation Sequencing, has made it possible to analyze a whole genome in a single experiment. Such platforms produce an enormous volume of data per experiment, so the analysis of this flow of data poses several challenges in terms of data storage, preprocessing, and analysis. To face those issues, efficient, possibly parallel, bioinformatics software needs to be used to preprocess and analyze data, for instance to highlight genetic variation associated with complex diseases. In this paper we present a parallel algorithm for the preprocessing and statistical analysis of genomics data, able to handle high-dimensional data with good response times. The proposed system is able to find statistically significant biological markers that discriminate classes of patients who respond to drugs in different ways. Experiments performed on real and synthetic genomic datasets show good speed-up and scalability.
Sun, Degui; Wang, Na-Xin; He, Li-Ming; Weng, Zhao-Heng; Wang, Daheng; Chen, Ray T.
1996-06-01
A space-position-logic-encoding scheme is proposed and demonstrated. This encoding scheme not only makes the best use of the convenience of binary logic operation, but is also suitable for the trinary property of modified signed-digit (MSD) numbers. Based on the space-position-logic-encoding scheme, a fully parallel modified signed-digit adder and subtractor is built using optoelectronic switch technologies in conjunction with fiber-multistage 3D optoelectronic interconnects. Thus an effective combination of a parallel algorithm and a parallel architecture is implemented. In addition, the performance of the optoelectronic switches used in this system is experimentally studied and verified. Both the 3-bit experimental model and the experimental results of a parallel addition and a parallel subtraction are provided and discussed. Finally, the speed ratio between the MSD adder and binary adders is discussed and the advantage of the MSD in operating speed is demonstrated.
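The carry-free property that makes MSD addition fully parallel can be illustrated with a small software sketch. This is the textbook two-step transfer/weight rule for digits in {-1, 0, 1}, not the optoelectronic encoding of the paper; the function names `msd_value` and `msd_add` are hypothetical and chosen only for this illustration.

```python
def msd_value(digits):
    """Integer value of an MSD number (most-significant digit first, digits in {-1, 0, 1})."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value


def msd_add(x, y):
    """Add two MSD numbers; every output digit depends on at most three neighbouring positions."""
    n = max(len(x), len(y))
    xs = [0] * (n - len(x)) + list(x)
    ys = [0] * (n - len(y)) + list(y)
    # Position-wise sums, least-significant position first; each s[i] is in {-2, ..., 2}.
    s = [a + b for a, b in zip(reversed(xs), reversed(ys))]
    t = [0] * (n + 1)  # t[i] is the transfer digit arriving at position i
    w = [0] * (n + 1)  # w[i] is the interim (weight) digit left at position i
    for i, si in enumerate(s):
        lower = s[i - 1] if i > 0 else 0
        # Choose (transfer, weight) with si == 2 * transfer + weight, looking at the
        # next-lower sum so that w[i] + t[i] can never leave {-1, 0, 1}.
        if si == 2:
            t[i + 1], w[i] = 1, 0
        elif si == -2:
            t[i + 1], w[i] = -1, 0
        elif si == 1:
            t[i + 1], w[i] = (1, -1) if lower >= 1 else (0, 1)
        elif si == -1:
            t[i + 1], w[i] = (-1, 1) if lower <= -1 else (0, -1)
    # Second step: a purely local addition with no carry propagation.
    result = [w[i] + t[i] for i in range(n + 1)]
    return result[::-1]
```

For example, `msd_add([1, 0, 1], [0, 1, 1])` (5 + 3) returns `[1, 0, 0, 0]`. Because no carry chain exists, all digit positions could in principle be evaluated simultaneously, which is the property a parallel adder exploits.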
Arrigoni, Roberto; Benzoni, Francesca; Terraneo, Tullia I; Caragnano, Annalisa; Berumen, Michael L
2016-10-07
Reticulate evolution, introgressive hybridisation, and phenotypic plasticity have been documented in scleractinian corals and have challenged our ability to interpret speciation processes. Stylophora is a key model system in coral biology and physiology, but genetic analyses have revealed that cryptic lineages concealed by morphological stasis exist in the Stylophora pistillata species complex. The Red Sea represents a hotspot for Stylophora biodiversity with six morphospecies described, two of which are regionally endemic. We investigated Stylophora species boundaries from the Red Sea and the associated Symbiodinium by sequencing seven DNA loci. Stylophora morphospecies from the Red Sea were not resolved based on mitochondrial phylogenies and showed nuclear allele sharing. Low genetic differentiation, weak isolation, and strong gene flow were found among morphospecies although no signals of genetic recombination were evident among them. Stylophora mamillata harboured Symbiodinium clade C whereas the other two Stylophora morphospecies hosted either Symbiodinium clade A or C. These evolutionary patterns suggest that either gene exchange occurs through reticulate evolution or that multiple ecomorphs of a phenotypically plastic species occur in the Red Sea. The recent origin of the lineage leading to the Red Sea Stylophora may indicate an ongoing speciation driven by environmental changes and incomplete lineage sorting.
Arrigoni, Roberto; Benzoni, Francesca; Terraneo, Tullia Isotta; Caragnano, Annalisa; Berumen, Michael L.
2016-01-01
Reticulate evolution, introgressive hybridisation, and phenotypic plasticity have been documented in scleractinian corals and have challenged our ability to interpret speciation processes. Stylophora is a key model system in coral biology and physiology, but genetic analyses have revealed that cryptic lineages concealed by morphological stasis exist in the Stylophora pistillata species complex. The Red Sea represents a hotspot for Stylophora biodiversity with six morphospecies described, two of which are regionally endemic. We investigated Stylophora species boundaries from the Red Sea and the associated Symbiodinium by sequencing seven DNA loci. Stylophora morphospecies from the Red Sea were not resolved based on mitochondrial phylogenies and showed nuclear allele sharing. Low genetic differentiation, weak isolation, and strong gene flow were found among morphospecies although no signals of genetic recombination were evident among them. Stylophora mamillata harboured Symbiodinium clade C whereas the other two Stylophora morphospecies hosted either Symbiodinium clade A or C. These evolutionary patterns suggest that either gene exchange occurs through reticulate evolution or that multiple ecomorphs of a phenotypically plastic species occur in the Red Sea. The recent origin of the lineage leading to the Red Sea Stylophora may indicate an ongoing speciation driven by environmental changes and incomplete lineage sorting.
Parallel heat transport in integrable and chaotic magnetic fields
Castillo-Negrete, D. del; Chacon, L. [Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-8071 (United States)
2012-05-15
The study of transport in magnetized plasmas is a problem of fundamental interest in controlled fusion, space plasmas, and astrophysics research. Three issues make this problem particularly challenging: (i) The extreme anisotropy between the parallel (i.e., along the magnetic field), $\chi_{\parallel}$, and the perpendicular, $\chi_{\perp}$, conductivities ($\chi_{\parallel}/\chi_{\perp}$ may exceed $10^{10}$ in fusion plasmas); (ii) Nonlocal parallel transport in the limit of small collisionality; and (iii) Magnetic field line chaos, which in general complicates (and may preclude) the construction of magnetic field line coordinates. Motivated by these issues, we present a Lagrangian Green's function method to solve the local and non-local parallel transport equation applicable to integrable and chaotic magnetic fields in arbitrary geometry. The method avoids by construction the numerical pollution issues of grid-based algorithms. The potential of the approach is demonstrated with nontrivial applications to integrable (magnetic island), weakly chaotic (Devil's staircase), and fully chaotic magnetic field configurations. For the latter, numerical solutions of the parallel heat transport equation show that the effective radial transport, with local and non-local parallel closures, is non-diffusive, thus casting doubts on the applicability of quasilinear diffusion descriptions. General conditions for the existence of non-diffusive, multivalued flux-gradient relations in the temperature evolution are derived.
Parallel Breadth-First Search on Distributed Memory Systems
Computational Research Division; Buluc, Aydin; Madduri, Kamesh
2011-04-15
Data-intensive, graph-based computations are pervasive in several scientific applications, and are known to be quite challenging to implement on distributed memory systems. In this work, we explore the design space of parallel algorithms for Breadth-First Search (BFS), a key subroutine in several graph algorithms. We present two highly-tuned parallel approaches for BFS on large parallel systems: a level-synchronous strategy that relies on a simple vertex-based partitioning of the graph, and a two-dimensional sparse matrix-partitioning-based approach that mitigates parallel communication overhead. For both approaches, we also present hybrid versions with intra-node multithreading. Our novel hybrid two-dimensional algorithm reduces communication times by up to a factor of 3.5, relative to a common vertex based approach. Our experimental study identifies execution regimes in which these approaches will be competitive, and we demonstrate extremely high performance on leading distributed-memory parallel systems. For instance, for a 40,000-core parallel execution on Hopper, an AMD Magny-Cours based system, we achieve a BFS performance rate of 17.8 billion edge visits per second on an undirected graph of 4.3 billion vertices and 68.7 billion edges with skewed degree distribution.
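The level-synchronous strategy with vertex-based partitioning can be sketched in a few lines. This is a single-process simulation under stated assumptions, not the authors' tuned implementation: the ownership rule `v % num_parts`, the function name `level_synchronous_bfs`, and the dictionary-based adjacency format are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def level_synchronous_bfs(adjacency, source, num_parts=4):
    """Return {vertex: BFS level} for all vertices reachable from source.

    Vertices are integers; vertex v is "owned" by partition v % num_parts
    (an assumed 1D layout). Each loop iteration processes one BFS level.
    """
    def owner(v):
        return v % num_parts

    level = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        # Expansion phase: each partition scans only the frontier vertices it owns
        # and buckets the discovered neighbours by destination owner. In a real
        # distributed run these buckets would be exchanged as messages.
        outbox = defaultdict(set)
        for part in range(num_parts):
            for u in (v for v in frontier if owner(v) == part):
                for w in adjacency.get(u, ()):
                    outbox[owner(w)].add(w)
        # Synchronisation point: every owner filters out already-visited vertices
        # and the survivors form the next frontier.
        depth += 1
        next_frontier = set()
        for part in range(num_parts):
            for w in outbox[part]:
                if w not in level:
                    level[w] = depth
                    next_frontier.add(w)
        frontier = next_frontier
    return level
```

The all-to-all exchange between the two phases is the communication step whose cost grows with the number of partitions; the two-dimensional partitioning described in the abstract is aimed at reducing that overhead.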
Fast robot kinematics modeling by using a parallel simulator (PSIM)
El-Gazzar, H.M.; Ayad, N.M.A.
2002-01-01
High-speed computers are strongly needed not only for solving scientific and engineering problems, but also for numerous industrial applications. Such applications include computer-aided design, oil exploration, weather prediction, space applications and safety of nuclear reactors. The rapid development in VLSI technology makes it possible to implement time-consuming algorithms in real-time situations. Parallel processing approaches can now be used to reduce the processing time for models of very high mathematical structure such as the kinematics modeling of a robot manipulator. This system is used to construct and evaluate the performance and cost effectiveness of several proposed methods to solve the Jacobian algorithm. Parallelism is introduced to the algorithms by using different task allocations and dividing the whole job into subtasks. Detailed analysis is performed and results are obtained for the case of a six-DOF (degree of freedom) robot arm (Stanford Arm). Execution time comparisons between Von Neumann (uniprocessor) and parallel processor architectures using the parallel simulator package (PSIM) are presented. The results favour the parallel techniques, with improvements of at least fifty percent. Further studies are needed to determine the optimum number of processors.
Fast robot kinematics modeling by using a parallel simulator (PSIM)
El-Gazzar, H M; Ayad, N M.A. [Atomic Energy Authority, Reactor Dept., Computer and Control Lab., P.O. Box no 13759 (Egypt)
2002-09-15
High-speed computers are strongly needed not only for solving scientific and engineering problems, but also for numerous industrial applications. Such applications include computer-aided design, oil exploration, weather prediction, space applications and safety of nuclear reactors. The rapid development in VLSI technology makes it possible to implement time-consuming algorithms in real-time situations. Parallel processing approaches can now be used to reduce the processing time for models of very high mathematical structure such as the kinematics modeling of a robot manipulator. This system is used to construct and evaluate the performance and cost effectiveness of several proposed methods to solve the Jacobian algorithm. Parallelism is introduced to the algorithms by using different task allocations and dividing the whole job into subtasks. Detailed analysis is performed and results are obtained for the case of a six-DOF (degree of freedom) robot arm (Stanford Arm). Execution time comparisons between Von Neumann (uniprocessor) and parallel processor architectures using the parallel simulator package (PSIM) are presented. The results favour the parallel techniques, with improvements of at least fifty percent. Further studies are needed to determine the optimum number of processors.
Modelling and parallel calculation of a kinetic boundary layer
Perlat, Jean Philippe
1998-01-01
This research thesis aims at addressing reliability and cost issues in the calculation by numeric simulation of flows in transition regime. The first step has been to reduce calculation cost and memory space for the Monte Carlo method which is known to provide performance and reliability for rarefied regimes. Vector and parallel computers allow this objective to be reached. Here, a MIMD (multiple instructions, multiple data) machine has been used which implements parallel calculation at different levels of parallelization. Parallelization procedures have been adapted, and results showed that parallelization by calculation domain decomposition was far more efficient. Due to reliability issues related to the statistical nature of Monte Carlo methods, a new deterministic model was necessary to simulate gas molecules in transition regime. New models and hyperbolic systems have therefore been studied. One is chosen which allows thermodynamic values (density, average velocity, temperature, deformation tensor, heat flow) present in Navier-Stokes equations to be determined, and the equations of evolution of thermodynamic values are described for the mono-atomic case. Numerical resolution of is reported. A kinetic scheme is developed which complies with the structure of all systems, and which naturally expresses boundary conditions. The validation of the obtained 14 moment-based model is performed on shock problems and on Couette flows
Development of parallel Fokker-Planck code ALLAp
Batishcheva, A.A.; Sigmar, D.J.; Koniges, A.E.
1996-01-01
We report on our ongoing development of the 3D Fokker-Planck code ALLA for a highly collisional scrape-off-layer (SOL) plasma. A SOL with strong gradients of density and temperature in the spatial dimension is modeled. Our method is based on a 3-D adaptive grid (in space, magnitude of the velocity, and cosine of the pitch angle) and a second order conservative scheme. Note that the grid size is typically 100 x 257 x 65 nodes. It was shown in our previous work that only these capabilities make it possible to benchmark a 3D code against a spatially-dependent self-similar solution of a kinetic equation with the Landau collision term. In the present work we show results of a more precise benchmarking against the exact solutions of the kinetic equation using a new parallel code ALLAp with an improved method of parallelization and a modified boundary condition at the plasma edge. We also report first results from the code parallelization using Message Passing Interface for a Massively Parallel CRI T3D platform. We evaluate the ALLAp code performance versus the number of T3D processors used and compare its efficiency against a Work/Data Sharing parallelization scheme and a workstation version
A parallel implementation of 3D Zernike moment analysis
Berjón Díez, Daniel; Arnaldo Duart, Sergio; Morán Burgos, Francisco
2011-01-01
Zernike polynomials are a well known set of functions that find many applications in image or pattern characterization because they allow the construction of shape descriptors that are invariant against translations, rotations or scale changes. The concepts behind them can be extended to higher dimension spaces, making them also fit to describe volumetric data. They have been less used than their properties might suggest due to their high computational cost. We present a parallel implementation of 3...
Behaviour of parallel girders stabilised with U-frames
Virdi, Kuldeep; Azzi, Walid
2010-01-01
Lateral torsional buckling is a key factor in the design of steel girders. Stability can be enhanced by cross-bracing, reducing the effective length and thus increasing the ultimate capacity. U-frames are an option often used to brace the girders when designing through type of bridges and where...... overhead bracing is not practical. This paper investigates the effect of the U-frame spacing on the stability of the parallel girders. Eigenvalue buckling analysis was undertaken with four different spacings of the U-frames. Results were extracted from finite element analysis, interpreted and conclusions...
Marginal Assessment of Crowns by the Aid of Parallel Radiography
Farnaz Fattahi
2015-03-01
Introduction: Marginal adaptation is the most critical item in the long-term prognosis of single crowns. This study aimed to assess the marginal quality as well as the discrepancies in marginal integrity of some PFM single crowns of posterior teeth by employing parallel radiography in Shiraz Dental School, Shiraz, Iran. Methods: In this descriptive study, parallel radiographs were taken of 200 fabricated PFM single crowns of posterior teeth after cementation and before discharging the patient. To calculate the magnification of the images, a metallic sphere with the thickness of 4 mm was placed in the direction of the crown margin on the occlusal surface. Thereafter, the horizontal and vertical space between the crown margins, the margin of preparations and also the vertical space between the crown margin and the bone crest were measured by using digital radiological software. Results: Analysis of data by descriptive statistics revealed that 75.5% and 60% of the cases had more than the acceptable space (50 µm) in the vertical (130±20 µm) and horizontal (90±15 µm) dimensions, respectively. Moreover, 85% of patients were found to have either a horizontal or a vertical gap. In 77% of cases, the margins of crowns invaded the biologic width in the mesial and 70% in distal surfaces. Conclusion: Parallel radiography can be expedient in the stage of framework try-in to yield some important information that cannot be obtained by routine clinical evaluations and may improve the treatment prognosis
Overview of the Force Scientific Parallel Language
Gita Alaghband
1994-01-01
The Force parallel programming language, designed for large-scale shared-memory multiprocessors, is presented. The language provides a number of parallel constructs as extensions to the ordinary Fortran language and is implemented as a two-level macro preprocessor to support portability across shared memory multiprocessors. The global parallelism model on which the Force is based provides a powerful parallel language. The parallel constructs, generic synchronization, and freedom from process management supported by the Force have resulted in structured parallel programs that are ported to the many multiprocessors on which the Force is implemented. Two new parallel constructs for looping and functional decomposition are discussed. Several programming examples to illustrate some parallel programming approaches using the Force are also presented.
Automatic Loop Parallelization via Compiler Guided Refactoring
Larsen, Per; Ladelsky, Razya; Lidman, Jacob
For many parallel applications, performance relies not on instruction-level parallelism, but on loop-level parallelism. Unfortunately, many modern applications are written in ways that obstruct automatic loop parallelization. Since we cannot identify sufficient parallelization opportunities...... for these codes in a static, off-line compiler, we developed an interactive compilation feedback system that guides the programmer in iteratively modifying application source, thereby improving the compiler’s ability to generate loop-parallel code. We use this compilation system to modify two sequential...... benchmarks, finding that the code parallelized in this way runs up to 8.3 times faster on an octo-core Intel Xeon 5570 system and up to 12.5 times faster on a quad-core IBM POWER6 system. Benchmark performance varies significantly between the systems. This suggests that semi-automatic parallelization should...
Parallel kinematics type, kinematics, and optimal design
Liu, Xin-Jun
2014-01-01
Parallel Kinematics: Type, Kinematics, and Optimal Design presents the results of 15 years' research on parallel mechanisms and parallel kinematics machines. This book covers the systematic classification of parallel mechanisms (PMs) and provides a large number of mechanical architectures of PMs available for use in practical applications. It focuses on the kinematic design of parallel robots. One successful application of parallel mechanisms is in the field of machine tools, where parallel kinematics machines have been an emerging trend in advanced machine tools. The book describes not only the main aspects and important topics in parallel kinematics, but also references novel concepts and approaches, i.e. type synthesis based on evolution, performance evaluation and optimization based on screw theory, singularity model taking into account motion and force transmissibility, and others. This book is intended for researchers, scientists, engineers and postgraduates or above with interes...
Applied Parallel Computing Industrial Computation and Optimization
Madsen, Kaj; Olesen, Dorte
Proceedings and the Third International Workshop on Applied Parallel Computing in Industrial Problems and Optimization (PARA96)...
GPGPU Parallel SPIN Model Checker
National Aeronautics and Space Administration — Model Checking is a powerful technique used to verify that a system does not violate its intended behavior. While this is very useful in proving the robustness of a...
Parallel algorithms and cluster computing
Hoffmann, Karl Heinz
2007-01-01
This book presents major advances in high performance computing as well as major advances due to high performance computing. It contains a collection of papers in which results achieved in the collaboration of scientists from computer science, mathematics, physics, and mechanical engineering are presented. From the science problems to the mathematical algorithms and on to the effective implementation of these algorithms on massively parallel and cluster computers we present state-of-the-art methods and technology as well as exemplary results in these fields. This book shows that problems which seem superficially distinct become intimately connected on a computational level.
Parallel computation of rotating flows
Lundin, Lars Kristian; Barker, Vincent A.; Sørensen, Jens Nørkær
1999-01-01
This paper deals with the simulation of 3‐D rotating flows based on the velocity‐vorticity formulation of the Navier‐Stokes equations in cylindrical coordinates. The governing equations are discretized by a finite difference method. The solution is advanced to a new time level by a two‐step process...... is that of solving a singular, large, sparse, over‐determined linear system of equations, and the iterative method CGLS is applied for this purpose. We discuss some of the mathematical and numerical aspects of this procedure and report on the performance of our software on a wide range of parallel computers. Darbe...
A Parallel Approach to Fractal Image Compression
Lubomir Dedera
2004-01-01
The paper deals with a parallel approach to coding and decoding algorithms in fractal image compression and presents experimental results comparing sequential and parallel algorithms from the point of view of both coding and decoding time and the effectiveness of parallelization.
Parallel Computing Using Web Servers and "Servlets".
Lo, Alfred; Bloor, Chris; Choi, Y. K.
2000-01-01
Describes parallel computing and presents inexpensive ways to implement a virtual parallel computer with multiple Web servers. Highlights include performance measurement of parallel systems; models for using Java and intranet technology including single server, multiple clients and multiple servers, single client; and a comparison of CGI (common…
An Introduction to Parallel Computation R
How are they programmed? This article provides an introduction. A parallel computer is a network of processors built for ... and have been used to solve problems much faster than a single ... in parallel computer design is to select an organization which ..... The most ambitious approach to parallel computing is to develop.
Comparison of parallel viscosity with neoclassical theory
Ida, K.; Nakajima, N.
1996-04-01
Toroidal rotation profiles are measured with charge exchange spectroscopy for the plasma heated with tangential NBI in the CHS heliotron/torsatron device to estimate parallel viscosity. The parallel viscosity derived from the toroidal rotation velocity shows good agreement with the neoclassical parallel viscosity plus the perpendicular viscosity ($\mu_{\perp} = 2\ \mathrm{m^2/s}$). (author)
Xyce parallel electronic simulator design.
Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.
2010-09-01
This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator. As such, having software quality engineering (SQE) procedures in place to ensure a high level of code quality and robustness is essential. Version control, issue tracking, customer support, C++ style guidelines and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, and the original focus of Xyce development has primarily been related to circuits for nuclear weapons. However, this has not been the only focus and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort, which involves a number of researchers, engineers, scientists, mathematicians and computer scientists. In addition to diversity of background, a certain amount of staff turnover is to be expected on long-term projects, as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document a number of the software quality practices followed by the Xyce team in one place. It is also hoped that this document will be a good source of information for new developers.
Improving parallel imaging by jointly reconstructing multi-contrast data.
Bilgic, Berkin; Kim, Tae Hyung; Liao, Congyu; Manhard, Mary Kate; Wald, Lawrence L; Haldar, Justin P; Setsompop, Kawin
2018-08-01
To develop parallel imaging techniques that simultaneously exploit coil sensitivity encoding, image phase prior information, similarities across multiple images, and complementary k-space sampling for highly accelerated data acquisition. We introduce joint virtual coil (JVC)-generalized autocalibrating partially parallel acquisitions (GRAPPA) to jointly reconstruct data acquired with different contrast preparations, and show its application in 2D, 3D, and simultaneous multi-slice (SMS) acquisitions. We extend the joint parallel imaging concept to exploit limited support and smooth phase constraints through Joint (J-) LORAKS formulation. J-LORAKS allows joint parallel imaging from limited autocalibration signal region, as well as permitting partial Fourier sampling and calibrationless reconstruction. We demonstrate highly accelerated 2D balanced steady-state free precession with phase cycling, SMS multi-echo spin echo, 3D multi-echo magnetization-prepared rapid gradient echo, and multi-echo gradient recalled echo acquisitions in vivo. Compared to conventional GRAPPA, proposed joint acquisition/reconstruction techniques provide more than 2-fold reduction in reconstruction error. JVC-GRAPPA takes advantage of additional spatial encoding from phase information and image similarity, and employs different sampling patterns across acquisitions. J-LORAKS achieves a more parsimonious low-rank representation of local k-space by considering multiple images as additional coils. Both approaches provide dramatic improvement in artifact and noise mitigation over conventional single-contrast parallel imaging reconstruction. Magn Reson Med 80:619-632, 2018. © 2018 International Society for Magnetic Resonance in Medicine.
Analysis of single blow effectiveness in non-uniform parallel plate regenerators
Jensen, Jesper Buch; Bahl, Christian Robert Haffenden; Engelbrecht, Kurt
2011-01-01
Non-uniform distributions of plate spacings in parallel plate regenerators have been found to induce loss of performance. In this paper, it has been investigated how variations of three geometric parameters (the aspect ratio, the porosity, and the standard deviation of the plate spacing) affects...
Parallelization of quantum molecular dynamics simulation code
Kato, Kaori; Kunugi, Tomoaki; Shibahara, Masahiko; Kotake, Susumu
1998-02-01
A quantum molecular dynamics simulation code has been developed at the Kansai Research Establishment for the analysis of the thermalization of photon energies in molecules or materials. The simulation code is parallelized for both a scalar massively parallel computer (Intel Paragon XP/S75) and a vector parallel computer (Fujitsu VPP300/12). Scalable speed-up has been obtained on both parallel computers by distributing particle groups across processor units. By distributing not only particle groups but also the fine-grained per-particle calculations across processor units, high parallelization performance is achieved on the Intel Paragon XP/S75. (author)
Implementation and performance of parallelized elegant
Wang, Y.; Borland, M.
2008-01-01
The program elegant is widely used for design and modeling of linacs for free-electron lasers and energy recovery linacs, as well as storage rings and other applications. As part of a multi-year effort, we have parallelized many aspects of the code, including single-particle dynamics, wakefields, and coherent synchrotron radiation. We report on the approach used for gradual parallelization, which proved very beneficial in getting parallel features into the hands of users quickly. We also report details of parallelization of collective effects. Finally, we discuss performance of the parallelized code in various applications.
Research on parallel algorithm for sequential pattern mining
Zhou, Lijuan; Qin, Bai; Wang, Yu; Hao, Zhongxiao
2008-03-01
Sequential pattern mining is the mining of frequent sequences related to time or other orders from a sequence database. Its initial motivation was to discover the laws of customer purchasing within a time period by finding the frequent sequences. In recent years, sequential pattern mining has become an important direction of data mining, and its application field is no longer confined to business databases but has extended to new data sources such as the Web and to advanced science fields such as DNA analysis. The data used in sequential pattern mining has two characteristics: massive volume and distributed storage. Most existing sequential pattern mining algorithms have not considered these characteristics together. Based on these traits and on parallel theory, this paper puts forward a new distributed parallel algorithm, SPP (Sequential Pattern Parallel). The algorithm abides by the principle of pattern reduction and utilizes the divide-and-conquer strategy for parallelization. The first parallel task is to construct frequent item sets by applying the frequent concept and search space partition theory, and the second task is to build frequent sequences using the depth-first search method at each processor. The algorithm only needs to access the database twice and does not generate candidate sequences, which reduces the access time and improves the mining efficiency. Based on a random data generation procedure and the different information structures designed, this paper simulated the SPP algorithm in a concrete parallel environment and implemented the AprioriAll algorithm. The experiments demonstrate that, compared with AprioriAll, the SPP algorithm had an excellent speedup factor and efficiency.
Parallelization of 2-D lattice Boltzmann codes
Suzuki, Soichiro; Kaburaki, Hideo; Yokokawa, Mitsuo.
1996-03-01
Lattice Boltzmann (LB) codes to simulate two-dimensional fluid flow are developed on the vector parallel computer Fujitsu VPP500 and the scalar parallel computer Intel Paragon XP/S. While a 2-D domain decomposition method is used for the scalar parallel LB code, a 1-D domain decomposition method is used for the vector parallel LB code so that it can be vectorized along the axis perpendicular to the direction of the decomposition. High parallel efficiencies are obtained: 95.1% for the vector parallel calculation on 16 processors with a 1152x1152 grid and 88.6% for the scalar parallel calculation on 100 processors with an 800x800 grid. Performance models are developed to analyze the performance of the LB codes. Our performance models show that the execution speed of the vector parallel code is about one hundred times faster than that of the scalar parallel code with the same number of processors, up to 100 processors. We also analyze the scalability when the available memory size of one processor element is kept at its maximum. Our performance model predicts that the execution time of the vector parallel code increases by about 3% on 500 processors. Although the 1-D domain decomposition method in general has a drawback in interprocessor communication, the vector parallel LB code is still suitable for large-scale and/or high-resolution simulations. (author)
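As a rough illustration of the efficiency figures quoted in this abstract, parallel efficiency is conventionally defined as speedup divided by processor count. The sketch below assumes that conventional definition (the abstract does not state which one the authors used), and the function names are illustrative only. It converts the reported efficiencies back into the speedups they would imply.

```python
def parallel_efficiency(t_serial, t_parallel, n_procs):
    """Efficiency = speedup / processors, with speedup = T_serial / T_parallel."""
    return (t_serial / t_parallel) / n_procs


def implied_speedup(efficiency, n_procs):
    """Invert the definition: speedup = efficiency * processors."""
    return efficiency * n_procs


# Figures taken from the abstract; the definition above is an assumption.
vector_speedup = implied_speedup(0.951, 16)    # vector code, 16 processors
scalar_speedup = implied_speedup(0.886, 100)   # scalar code, 100 processors
```

Under this definition, 95.1% efficiency on 16 processors corresponds to a speedup of roughly 15.2, and 88.6% on 100 processors to a speedup of roughly 88.6.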
Arkin, Ethem; Tekinerdogan, Bedir; Imre, Kayhan M.
2017-01-01
The need for high-performance computing together with the increasing trend from single processor to parallel computer architectures has leveraged the adoption of parallel computing. To benefit from parallel computing power, usually parallel algorithms are defined that can be mapped and executed
Experiences in Data-Parallel Programming
Terry W. Clark
1997-01-01
To efficiently parallelize a scientific application with a data-parallel compiler requires certain structural properties in the source program, and conversely, the absence of others. A recent parallelization effort of ours reinforced this observation and motivated this correspondence. Specifically, we have transformed a Fortran 77 version of GROMOS, a popular dusty-deck program for molecular dynamics, into Fortran D, a data-parallel dialect of Fortran. During this transformation we have encountered a number of difficulties that probably are neither limited to this particular application nor do they seem likely to be addressed by improved compiler technology in the near future. Our experience with GROMOS suggests a number of points to keep in mind when developing software that may at some time in its life cycle be parallelized with a data-parallel compiler. This note presents some guidelines for engineering data-parallel applications that are compatible with Fortran D or High Performance Fortran compilers.
Massively parallel diffuse optical tomography
Sandusky, John V.; Pitts, Todd A.
2017-09-05
Diffuse optical tomography systems and methods are described herein. In a general embodiment, the diffuse optical tomography system comprises a plurality of sensor heads, the plurality of sensor heads comprising respective optical emitter systems and respective sensor systems. A sensor head in the plurality of sensor heads is caused to act as an illuminator, such that its optical emitter system transmits a transillumination beam towards a portion of a sample. Other sensor heads in the plurality of sensor heads act as observers, detecting portions of the transillumination beam that radiate from the sample in the fields of view of the respective sensor systems of the other sensor heads. Thus, sensor heads in the plurality of sensor heads generate sensor data in parallel.
Embodied and Distributed Parallel DJing.
Cappelen, Birgitta; Andersson, Anders-Petter
2016-01-01
Everyone has a right to take part in cultural events and activities, such as music performances and music making. Enforcing that right, within Universal Design, is often limited to a focus on physical access to public areas, hearing aids etc., or groups of persons with special needs performing in traditional ways. The latter might be people with disabilities, being musicians playing traditional instruments, or actors playing theatre. In this paper we focus on the innovative potential of including people with special needs, when creating new cultural activities. In our project RHYME our goal was to create health promoting activities for children with severe disabilities, by developing new musical and multimedia technologies. Because of the users' extreme demands and rich contribution, we ended up creating both a new genre of musical instruments and a new art form. We call this new art form Embodied and Distributed Parallel DJing, and the new genre of instruments Empowering Multi-Sensorial Things.
Device for balancing parallel strings
Mashikian, Matthew S.
1985-01-01
A battery plant is described which features magnetic circuit means in association with each of the battery strings in the battery plant for balancing the electrical current flow through the battery strings by equalizing the voltage across each of the battery strings. Each of the magnetic circuit means generally comprises means for sensing the electrical current flow through one of the battery strings, and a saturable reactor having a main winding connected electrically in series with the battery string, a bias winding connected to a source of alternating current and a control winding connected to a variable source of direct current controlled by the sensing means. Each of the battery strings is formed by a plurality of batteries connected electrically in series, and these battery strings are connected electrically in parallel across common bus conductors.
Linear parallel processing machines I
Von Kunze, M
1984-01-01
As is well-known, non-context-free grammars for generating formal languages happen to be of a certain intrinsic computational power that presents serious difficulties to efficient parsing algorithms as well as for the development of an algebraic theory of context-sensitive languages. In this paper a framework is given for the investigation of the computational power of formal grammars, in order to start a thorough analysis of grammars consisting of derivation rules of the form $aB \to A_1 \cdots A_n b_1 \cdots b_m$. These grammars may be thought of as automata by means of parallel processing, if one considers the variables as operators acting on the terminals while reading them right-to-left. This kind of automata and their 2-dimensional programming language prove to be useful by allowing a concise linear-time algorithm for integer multiplication. Linear parallel processing machines (LP-machines) which are, in their general form, equivalent to Turing machines, include finite automata and pushdown automata (with states encoded) as special cases. Bounded LP-machines yield deterministic accepting automata for nondeterministic context-free languages, and they define an interesting class of context-sensitive languages. A characterization of this class in terms of generating grammars is established by using derivation trees with crossings as a helpful tool. From the algebraic point of view, deterministic LP-machines are effectively represented semigroups with distinguished subsets. Concerning the dualism between generating and accepting devices of formal languages within the algebraic setting, the concept of accepting automata turns out to reduce essentially to embeddability in an effectively represented extension monoid, even in the classical cases.
Parallel computing in enterprise modeling.
Goldsby, Michael E.; Armstrong, Robert C.; Shneider, Max S.; Vanderveen, Keith; Ray, Jaideep; Heath, Zach; Allan, Benjamin A.
2008-08-01
This report presents the results of our efforts to apply high-performance computing to entity-based simulations with a multi-use plugin for parallel computing. We use the term 'Entity-based simulation' to describe a class of simulation which includes both discrete event simulation and agent based simulation. What simulations of this class share, and what differs from more traditional models, is that the result sought is emergent from a large number of contributing entities. Logistic, economic and social simulations are members of this class where things or people are organized or self-organize to produce a solution. Entity-based problems never have an a priori ergodic principle that will greatly simplify calculations. Because the results of entity-based simulations can only be realized at scale, scalable computing is de rigueur for large problems. Having said that, the absence of a spatial organizing principle makes the decomposition of the problem onto processors problematic. In addition, practitioners in this domain commonly use the Java programming language, which presents its own problems in a high-performance setting. The plugin we have developed, called the Parallel Particle Data Model, overcomes both of these obstacles and is now being used by two Sandia frameworks: the Decision Analysis Center, and the Seldon social simulation facility. While the ability to engage U.S.-sized problems is now available to the Decision Analysis Center, this plugin is central to the success of Seldon. Because Seldon relies on computationally intensive cognitive sub-models, this work is necessary to achieve the scale necessary for realistic results. With the recent upheavals in the financial markets, and the inscrutability of terrorist activity, this simulation domain will likely need a capability with ever greater fidelity. High-performance computing will play an important part in enabling that greater fidelity.
Parallel-Sequential Texture Analysis
van den Broek, Egon; Singh, Sameer; Singh, Maneesha; van Rikxoort, Eva M.; Apte, Chid; Perner, Petra
2005-01-01
Color induced texture analysis is explored, using two texture analysis techniques: the co-occurrence matrix and the color correlogram as well as color histograms. Several quantization schemes for six color spaces and the human-based 11 color quantization scheme have been applied. The VisTex texture
Codimension two Kaehler submanifolds of space forms
Ferreira, M.J.; Tribuzy, R.
2001-03-01
In this article we study isometric immersions from Kaehler manifolds whose (1,1) part of the second fundamental form is parallel, the ppmc isometric immersions. When the domain is a Riemann surface these immersions are precisely those with parallel mean curvature. P. J. Ryan has classified the Kaehler manifolds that admit isometric immersions, as real hypersurfaces, in space forms. We classify the codimension two ppmc isometric immersions into space forms. (author)
Massively Parallel Dimension Independent Adaptive Metropolis
Chen, Yuxin
2015-05-14
This work considers black-box Bayesian inference over high-dimensional parameter spaces. The well-known and widely respected adaptive Metropolis (AM) algorithm is extended herein to asymptotically scale uniformly with respect to the underlying parameter dimension, by respecting the variance, for Gaussian targets. The resulting algorithm, referred to as the dimension-independent adaptive Metropolis (DIAM) algorithm, also shows improved performance with respect to adaptive Metropolis on non-Gaussian targets. This algorithm is further improved, and the possibility of probing high-dimensional targets is enabled, via GPU-accelerated numerical libraries and periodically synchronized concurrent chains (justified a posteriori). Asymptotically in dimension, this massively parallel dimension-independent adaptive Metropolis (MPDIAM) GPU implementation exhibits a factor of four improvement versus the CPU-based Intel MKL version alone, which is itself already a factor of three improvement versus the serial version. The scaling to multiple CPUs and GPUs exhibits a form of strong scaling in terms of the time necessary to reach a certain convergence criterion, through a combination of longer time per sample batch (weak scaling) and yet fewer necessary samples to convergence. This is illustrated by efficiently sampling from several Gaussian and non-Gaussian targets for dimension d ≥ 1000.
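For readers unfamiliar with the baseline that this work extends, the following is a minimal one-dimensional sketch of the classic adaptive Metropolis idea: a random-walk proposal whose variance tracks the empirical variance of the chain so far. It is not the DIAM or MPDIAM algorithm from the abstract; the function name, the `adapt_start` and `eps` parameters, and the fixed initial proposal variance are assumptions made for illustration.

```python
import math
import random


def adaptive_metropolis_1d(log_target, x0, n_samples, rng,
                           adapt_start=100, eps=1e-6):
    """Random-walk Metropolis whose proposal variance follows the chain variance."""
    scale = 2.4 ** 2                  # classic AM scaling 2.4^2 / d, here d = 1
    x, logp = x0, log_target(x0)
    count, mean, m2 = 1, x0, 0.0      # Welford running statistics of the chain
    samples = []
    for _ in range(n_samples):
        if count > adapt_start:
            prop_var = scale * (m2 / (count - 1) + eps)
        else:
            prop_var = 1.0            # fixed proposal until some history exists
        cand = x + rng.gauss(0.0, math.sqrt(prop_var))
        cand_logp = log_target(cand)
        # Metropolis accept/reject step for a symmetric proposal
        if math.log(1.0 - rng.random()) < cand_logp - logp:
            x, logp = cand, cand_logp
        samples.append(x)
        # Update the running mean and sum of squared deviations
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return samples


# Usage: sample a standard normal target (log density up to a constant).
chain = adaptive_metropolis_1d(lambda x: -0.5 * x * x, 0.0, 20000,
                               random.Random(1))
```

The small `eps` term keeps the proposal variance strictly positive if the chain has not moved yet.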
Multipactor saturation in parallel-plate waveguides
Sorolla, E.; Mattes, M.
2012-01-01
The saturation stage of a multipactor discharge is considered of interest, since it can guide towards a criterion to assess the multipactor onset. The electron cloud under multipactor regime within a parallel-plate waveguide is modeled by a thin continuous distribution of charge and the equations of motion are calculated taking into account the space charge effects. The saturation is identified by the interaction of the electron cloud with its image charge. The stability of the electron population growth is analyzed and two mechanisms of saturation to explain the steady-state multipactor for voltages slightly above the threshold onset are identified. The impact energy in the collision against the metal plates decreases during the electron population growth due to the attraction of the electron sheet on the image through the initial plate. When this growth remains stable until the impact energy reaches the first cross-over point, the electron surface density tends to a constant value. When the stability is broken before reaching the first cross-over point, the surface charge density oscillates chaotically, bounded within a certain range. In this case, an expression to calculate the maximum electron surface charge density is found whose predictions agree with the simulations when the voltage is not too high.
A parallel algorithm for 3D dislocation dynamics
Wang Zhiqiang; Ghoniem, Nasr; Swaminarayan, Sriram; LeSar, Richard
2006-01-01
Dislocation dynamics (DD), a discrete dynamic simulation method in which dislocations are the fundamental entities, is a powerful tool for investigation of plasticity, deformation and fracture of materials at the micron length scale. However, severe computational difficulties arising from complex, long-range interactions between these curvilinear line defects limit the application of DD in the study of large-scale plastic deformation. We present here the development of a parallel algorithm for accelerated computer simulations of DD. By representing dislocations as a 3D set of dislocation particles, we show here that the problem of an interacting ensemble of dislocations can be converted to a problem of a particle ensemble, interacting with a long-range force field. A grid using binary space partitioning is constructed to keep track of node connectivity across domains. We demonstrate the computational efficiency of the parallel micro-plasticity code and discuss how O(N) methods map naturally onto the parallel data structure. Finally, we present results from applications of the parallel code to deformation in single crystal fcc metals
A Programming Model for Massive Data Parallelism with Data Dependencies
Cui, Xiaohui; Mueller, Frank; Potok, Thomas E.; Zhang, Yongpeng
2009-01-01
Accelerating processors can often be more cost and energy effective for a wide range of data-parallel computing problems than general-purpose processors. For graphics processor units (GPUs), this is particularly the case when program development is aided by environments such as NVIDIA's Compute Unified Device Architecture (CUDA), which dramatically reduces the gap between domain-specific architectures and general purpose programming. Nonetheless, general-purpose GPU (GPGPU) programming remains subject to several restrictions. Most significantly, the separation of host (CPU) and accelerator (GPU) address spaces requires explicit management of GPU memory resources, especially for massive data parallelism that well exceeds the memory capacity of GPUs. One solution to this problem is to transfer data between the GPU and host memories frequently. In this work, we investigate another approach. We run massively data-parallel applications on GPU clusters. We further propose a programming model for massive data parallelism with data dependencies for this scenario. Experience from micro benchmarks and real-world applications shows that our model provides not only ease of programming but also significant performance gains.
Enhancing sedimentation by improving flow conditions using parallel retrofit baffles.
He, Cheng; Scott, Eric; Rochfort, Quintin
2015-09-01
In this study, placing parallel-connected baffles in the vicinity of the inlet was proposed to improve hydraulic conditions for enhancing TSS (total suspended solids) removal. The purpose of the retrofit baffle design is to divide the large and fast inflow into smaller and slower flows to increase flow uniformity. This avoids short-circuiting and increases residence time in the sedimentation basin. The newly proposed parallel-connected baffle configuration was assessed in the laboratory by comparing its TSS removal performance and the optimal flow residence time with those from the widely used series-connected baffles. The experimental results showed that the parallel-connected baffles outperformed the series-connected baffles because it could disperse flow faster and in less space by splitting the large inflow into many small branches instead of solely depending on flow internal friction over a longer flow path, as was the case under the series-connected baffles. Being able to dampen faster flow before entering the sedimentation basin is critical to reducing the possibility of disturbing any settled particles, especially under high inflow conditions. Also, for a large sedimentation basin, it may be more economically feasible to deploy the proposed parallel retrofit baffle in the vicinity of the inlet than series-connected baffles throughout the entire settling basin. Crown Copyright © 2015. Published by Elsevier Ltd. All rights reserved.
Acceleration and parallelization calculation of EFEN-SP_3 method
Yang Wen; Zheng Youqi; Wu Hongchun; Cao Liangzhi; Li Yunzhao
2013-01-01
Because the exponential function expansion nodal-SP_3 (EFEN-SP_3) method needs further improvement in computational efficiency to routinely carry out PWR whole core pin-by-pin calculation, coarse mesh acceleration and spatial parallelization were investigated in this paper. The coarse mesh acceleration was built by considering a discontinuity factor on each coarse mesh interface and preserving neutron balance within each coarse mesh in space, angle and energy. The spatial parallelization based on MPI was implemented by guaranteeing load balancing and minimizing communication cost to take full advantage of modern computing and storage abilities. Numerical results based on a commercial nuclear power reactor demonstrate a speedup ratio of about 40 for the coarse mesh acceleration and a parallel efficiency of higher than 60% with 40 CPUs for the spatial parallelization. With these two improvements, the EFEN code can complete a PWR whole core pin-by-pin calculation with 289 × 289 × 218 meshes and 4 energy groups within 100 s by using 48 CPUs (2.40 GHz frequency). (authors)
A new parallelization algorithm of ocean model with explicit scheme
Fu, X. D.
2017-08-01
This paper focuses on the parallelization of ocean models that use an explicit scheme, one of the most commonly used schemes for discretizing the governing equations of an ocean model. The explicit scheme is simple to compute: the value at a given grid point depends only on grid-point values at the previous time step, so no sparse linear system has to be solved when integrating the governing equations. Exploiting these characteristics, this paper designs a parallel algorithm named halo cells update, which parallelizes an ocean model by adding a transmission module between sub-domains, with only tiny modifications to the original model and little change to its space step and time step. The Global Reduced Gravity Ocean model (GRGO) is taken as an example and parallelized with halo update. The results demonstrate that higher speedup can be achieved at different problem sizes.
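The halo-update idea described in this abstract can be sketched on a toy problem. The example below is an assumption-laden illustration, not the GRGO code: it uses a 1-D explicit diffusion stencil, two sub-domains, and runs both sub-domains sequentially in one process, whereas a real implementation would exchange halo values between processes by message passing. All function names are hypothetical.

```python
def step(u, alpha):
    """One explicit update of the interior points; both end points are held fixed."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


def run_serial(u, alpha, n_steps):
    """Reference solution on the undivided grid."""
    for _ in range(n_steps):
        u = step(u, alpha)
    return u


def run_with_halos(u, alpha, n_steps):
    """Same computation split into two sub-domains with one halo cell each."""
    mid = len(u) // 2
    left = u[:mid] + [u[mid]]         # last entry is a halo copy of the right neighbour
    right = [u[mid - 1]] + u[mid:]    # first entry is a halo copy of the left neighbour
    for _ in range(n_steps):
        left = step(left, alpha)
        right = step(right, alpha)
        # Halo update: each side receives the neighbour's freshly computed edge value
        left[-1] = right[1]
        right[0] = left[-2]
    return left[:-1] + right[1:]      # drop halos and reassemble the global grid


initial = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
serial_result = run_serial(initial, 0.25, 10)
halo_result = run_with_halos(initial, 0.25, 10)
```

Because every grid point depends only on values from the previous time step, exchanging one layer of halo cells per step is enough for the decomposed run to reproduce the serial result.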
Compiler Technology for Parallel Scientific Computation
Can Özturan
1994-01-01
There is a need for compiler technology that, given the source program, will generate efficient parallel codes for different architectures with minimal user involvement. Parallel computation is becoming indispensable in solving large-scale problems in science and engineering. Yet, the use of parallel computation is limited by the high costs of developing the needed software. To overcome this difficulty we advocate a comprehensive approach to the development of scalable architecture-independent software for scientific computation based on our experience with equational programming language (EPL). Our approach is based on a program decomposition, parallel code synthesis, and run-time support for parallel scientific computation. The program decomposition is guided by the source program annotations provided by the user. The synthesis of parallel code is based on configurations that describe the overall computation as a set of interacting components. Run-time support is provided by the compiler-generated code that redistributes computation and data during object program execution. The generated parallel code is optimized using techniques of data alignment, operator placement, wavefront determination, and memory optimization. In this article we discuss annotations, configurations, parallel code generation, and run-time support suitable for parallel programs written in the functional parallel programming language EPL and in Fortran.
Computer-Aided Parallelizer and Optimizer
Jin, Haoqiang
2011-01-01
The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.
Smoldyn on graphics processing units: massively parallel Brownian dynamics simulations.
Dematté, Lorenzo
2012-01-01
Space is a very important aspect in the simulation of biochemical systems; recently, the need for simulation algorithms able to cope with space has become more and more compelling. Complex and detailed models of biochemical systems need to deal with the movement of single molecules and particles, taking into consideration localized fluctuations, transportation phenomena, and diffusion. A common drawback of spatial models lies in their complexity: models can become very large, and their simulation can be time consuming, especially if we want to capture the system's behavior in a reliable way using stochastic methods in conjunction with a high spatial resolution. In order to deliver on the promise made by systems biology of understanding a system as a whole, we need to scale up the size of the models we are able to simulate, moving from sequential to parallel simulation algorithms. In this paper, we analyze Smoldyn, a widely used algorithm for stochastic simulation of chemical reactions with spatial resolution and single molecule detail, and we propose an alternative, innovative implementation that exploits the parallelism of Graphics Processing Units (GPUs). The implementation executes the most computationally demanding steps (computation of diffusion, unimolecular and bimolecular reactions, as well as the most common cases of molecule-surface interaction) on the GPU, computing them in parallel on each molecule of the system. The implementation offers good speed-ups and real-time, high-quality graphics output.
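The diffusion step mentioned in this abstract is a natural fit for per-molecule parallelism because each molecule moves independently. As a hedged sketch, assuming the standard Brownian dynamics update in which each coordinate receives a Gaussian displacement with standard deviation sqrt(2 D dt), a serial version might look like the code below. This is not Smoldyn's actual code or API; `diffuse` and its parameters are hypothetical names.

```python
import math
import random


def diffuse(positions, diff_coeff, dt, rng):
    """Move every molecule by an independent Gaussian step in each coordinate."""
    sigma = math.sqrt(2.0 * diff_coeff * dt)   # per-axis displacement std deviation
    # Each molecule is updated independently, so this loop could run in parallel
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in positions]


# Usage: three molecules in 3-D, one time step.
rng = random.Random(0)
molecules = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0), (-1.0, 0.5, 0.0)]
moved = diffuse(molecules, diff_coeff=1.0, dt=0.5, rng=rng)
```

On a GPU the list comprehension would correspond to one thread per molecule; reactions and surface interactions need additional handling that this sketch leaves out.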
PUMA: An Operating System for Massively Parallel Systems
Stephen R. Wheat
1994-01-01
This article presents an overview of PUMA (Performance-oriented, User-managed Messaging Architecture), a message-passing kernel for massively parallel systems. Message passing in PUMA is based on portals – an opening in the address space of an application process. Once an application process has established a portal, other processes can write values into the portal using a simple send operation. Because messages are written directly into the address space of the receiving process, there is no need to buffer messages in the PUMA kernel and later copy them into the application's address space. PUMA consists of two components: the quintessential kernel (Q-Kernel) and the process control thread (PCT). Although the PCT provides management decisions, the Q-Kernel controls access and implements the policies specified by the PCT.
Parallel processing for fluid dynamics applications
Johnson, G.M.
1989-01-01
The impact of parallel processing on computational science and, in particular, on computational fluid dynamics is growing rapidly. In this paper, particular emphasis is given to developments which have occurred within the past two years. Parallel processing is defined and the reasons for its importance in high-performance computing are reviewed. Parallel computer architectures are classified according to the number and power of their processing units, their memory, and the nature of their connection scheme. Architectures which show promise for fluid dynamics applications are emphasized. Fluid dynamics problems are examined for parallelism inherent at the physical level. CFD algorithms and their mappings onto parallel architectures are discussed. Several examples are presented to document the performance of fluid dynamics applications on present-generation parallel processing devices.
Design considerations for parallel graphics libraries
Crockett, Thomas W.
1994-01-01
Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.
Synchronization Techniques in Parallel Discrete Event Simulation
Lindén, Jonatan
2018-01-01
Discrete event simulation is an important tool for evaluating system models in many fields of science and engineering. To improve the performance of large-scale discrete event simulations, several techniques to parallelize discrete event simulation have been developed. In parallel discrete event simulation, the work of a single discrete event simulation is distributed over multiple processing elements. A key challenge in parallel discrete event simulation is to ensure that causally dependent ...
Parallel processing from applications to systems
Moldovan, Dan I
1993-01-01
This text provides one of the broadest presentations of parallel processing available, including the structure of parallel processors and parallel algorithms. The emphasis is on mapping algorithms to highly parallel computers, with extensive coverage of array and multiprocessor architectures. Early chapters provide insightful coverage on the analysis of parallel algorithms and program transformations, effectively integrating a variety of material previously scattered throughout the literature. Theory and practice are well balanced across diverse topics in this concise presentation. For exceptional cla
Parallel processing for artificial intelligence 1
Kanal, LN; Kumar, V; Suttner, CB
1994-01-01
Parallel processing for AI problems is of great current interest because of its potential for alleviating the computational demands of AI procedures. The articles in this book consider parallel processing for problems in several areas of artificial intelligence: image processing, knowledge representation in semantic networks, production rules, mechanization of logic, constraint satisfaction, parsing of natural language, data filtering and data mining. The publication is divided into six sections. The first addresses parallel computing for processing and understanding images. The second discus
A survey of parallel multigrid algorithms
Chan, Tony F.; Tuminaro, Ray S.
1987-01-01
A typical multigrid algorithm applied to well-behaved linear-elliptic partial-differential equations (PDEs) is described. Criteria for designing and evaluating parallel algorithms are presented. Before evaluating the performance of some parallel multigrid algorithms, consideration is given to some theoretical complexity results for solving PDEs in parallel and for executing the multigrid algorithm. The effect of mapping and load imbalance on the partial efficiency of the algorithm is studied.
Refinement of Parallel and Reactive Programs
Back, R. J. R.
1992-01-01
We show how to apply the refinement calculus to stepwise refinement of parallel and reactive programs. We use action systems as our basic program model. Action systems are sequential programs which can be implemented in a parallel fashion. Hence refinement calculus methods, originally developed for sequential programs, carry over to the derivation of parallel programs. Refinement of reactive programs is handled by data refinement techniques originally developed for the sequential refinement c...
The suppression of destructive sparks in parallel plate proportional counters
Cockshott, R.A.; Mason, I.M.
1984-02-01
The authors find that high energy background events produce localised sparks in parallel plate counters when operated in the proportional mode. These sparks increase dead-time and lead to degradation ranging from electrode damage to spurious pulsing and continuous breakdown. The problem is particularly serious in low energy photon detectors for X-ray astronomy which are required to have lifetimes of several years in the high radiation environment of space. For the parallel plate imaging detector developed for the European X-ray Observatory Satellite (EXOSAT) they investigate quantitatively the spark thresholds, spark rates and degradation processes. They discuss the spark mechanism, pointing out differences from the situation in spark chambers and counters. They show that the time profile of the sparks allows them to devise a spark suppression system which reduces the degradation rate by a factor of about 200.
Decentralized Interleaving of Paralleled Dc-Dc Buck Converters: Preprint
Johnson, Brian B [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Rodriguez, Miguel [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Sinha, Mohit [University of Minnesota; Dhople, Sairaj [University of Minnesota; Poon, Jason [University of California at Berkeley
2017-09-01
We present a decentralized control strategy that yields switch interleaving among parallel connected dc-dc buck converters without communication. The proposed method is based on the digital implementation of the dynamics of a nonlinear oscillator circuit as the controller. Each controller is fully decentralized, i.e., it only requires the locally measured output current to synthesize the pulse width modulation (PWM) carrier waveform. By virtue of the intrinsic electrical coupling between converters, the nonlinear oscillator-based controllers converge to an interleaved state with uniform phase-spacing across PWM carriers. To the knowledge of the authors, this work represents the first fully decentralized strategy for switch interleaving of paralleled dc-dc buck converters.
Parallel Geometries in Geant4: foundation and recent enhancements
Apostolakis, J; Cosmo, G; Howard, A; Ivanchenko, V; Verderi, M
2009-01-01
The Geant4 software toolkit simulates the passage of particles through matter. It is utilized in high energy and nuclear physics experiments, in medical physics and space applications. For many applications it is necessary to measure particle fluxes and radiation doses in parts of the setup where there are complex structures. To undertake this in a flexible way, Geant4 has tools to create and use additional, parallel, geometrical hierarchies within a single application. A separate, parallel geometry can be used for each one amongst shower parameterization, event biasing, scoring of radiation, and/or the creation of hits in detailed readout structures. We describe the existing basic capabilities of the Geant4 toolkit to create multiple geometries and the recent major enhancements undertaken to streamline, enhance and extend these. New functionality enables Geant4 developers to offer new embedded schemes for scoring (requiring no user C++ code); has simplified the implementation of processes or capabilities usi...
Particle simulation on a distributed memory highly parallel processor
Sato, Hiroyuki; Ikesaka, Morio
1990-01-01
This paper describes parallel molecular dynamics simulation of atoms governed by local force interaction. The space in the model is divided into cubic subspaces and mapped to the processor array of the CAP-256, a distributed memory, highly parallel processor developed at Fujitsu Labs. We developed a new technique to avoid redundant calculation of forces between atoms in different processors. Experiments showed the communication overhead was less than 5%, and the idle time due to load imbalance was less than 11% for two model problems which contain 11,532 and 46,128 argon atoms. From the software simulation, the CAP-II which is under development is estimated to be about 45 times faster than CAP-256 and will be able to run the same problem about 40 times faster than Fujitsu's M-380 mainframe when 256 processors are used. (author)
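The spatial decomposition described above can be sketched roughly as follows. This is only an illustration in Python, not the CAP-256 implementation: the function names, the one-cell-per-processor mapping, and the row-major rank numbering are all assumptions, and the paper's technique for avoiding redundant force calculations across processors is not reproduced here.

```python
def cell_index(pos, box_length, cells_per_side):
    """Return the (ix, iy, iz) index of the cubic subspace containing pos."""
    cell_size = box_length / cells_per_side
    # Clamp so that a coordinate exactly on the upper wall stays in the last cell.
    return tuple(min(int(c // cell_size), cells_per_side - 1) for c in pos)


def owner_rank(cell, cells_per_side):
    """Map a cell index to a processor rank (assumed: one cell per processor, row-major)."""
    ix, iy, iz = cell
    return (ix * cells_per_side + iy) * cells_per_side + iz


def decompose(positions, box_length, cells_per_side):
    """Group atom ids by the rank of the processor that owns their cell."""
    owners = {}
    for atom_id, pos in enumerate(positions):
        rank = owner_rank(cell_index(pos, box_length, cells_per_side), cells_per_side)
        owners.setdefault(rank, []).append(atom_id)
    return owners


if __name__ == "__main__":
    atoms = [(0.5, 0.5, 0.5), (9.5, 9.5, 9.5), (0.1, 0.2, 0.3)]
    print(decompose(atoms, box_length=10.0, cells_per_side=2))
```

With local force interactions, each processor then only needs atom data from its own cell and the neighbouring cells, which is what keeps communication overhead low in schemes of this kind.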
Parallel Prediction of Stock Volatility
Priscilla Jenq
2017-10-01
Volatility is a measurement of the risk of financial products. A stock will hit new highs and lows over time and if these highs and lows fluctuate wildly, then it is considered a highly volatile stock. Such a stock is considered riskier than a stock whose volatility is low. Although highly volatile stocks are riskier, the returns that they generate for investors can be quite high. Of course, with a riskier stock also comes the chance of losing money and yielding negative returns. In this project, we will use historic stock data to help us forecast volatility. Since the financial industry usually uses S&P 500 as the indicator of the market, we will use S&P 500 as a benchmark to compute the risk. We will also use artificial neural networks as a tool to predict volatilities for a specific time frame that will be set when we configure this neural network. There have been reports that neural networks with different numbers of layers and different numbers of hidden nodes may generate varying results. In fact, we may be able to find the best configuration of a neural network to compute volatilities. We will implement this system using the parallel approach. The system can be used as a tool for investors to allocate and hedge assets.
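The abstract does not state which volatility measure the project uses. A common choice, shown here only as an assumed illustration, is historical volatility: the sample standard deviation of log returns, scaled to a year. The function names and the 252-trading-day convention are assumptions, not taken from the article.

```python
import math


def log_returns(prices):
    """Log return between each pair of consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def historical_volatility(prices, periods_per_year=252):
    """Annualized sample standard deviation of log returns (assumed definition)."""
    r = log_returns(prices)
    if len(r) < 2:
        raise ValueError("need at least three prices")
    mean = sum(r) / len(r)
    variance = sum((x - mean) ** 2 for x in r) / (len(r) - 1)
    return math.sqrt(variance) * math.sqrt(periods_per_year)


if __name__ == "__main__":
    closes = [100.0, 102.0, 99.0, 105.0, 101.0]  # made-up closing prices
    print(historical_volatility(closes))
```

A series that grows by the same percentage every day has essentially zero volatility under this definition, while a series that swings between highs and lows has a large one, which matches the informal description above.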
Vectoring of parallel synthetic jets
Berk, Tim; Ganapathisubramani, Bharathram; Gomit, Guillaume
2015-11-01
A pair of parallel synthetic jets can be vectored by applying a phase difference between the two driving signals. The resulting jet can be merged or bifurcated and either vectored towards the actuator leading in phase or the actuator lagging in phase. In the present study, the influence of phase difference and Strouhal number on the vectoring behaviour is examined experimentally. Phase-locked vorticity fields, measured using Particle Image Velocimetry (PIV), are used to track vortex pairs. The physical mechanisms that explain the diversity in vectoring behaviour are observed based on the vortex trajectories. For a fixed phase difference, the vectoring behaviour is shown to be primarily influenced by pinch-off time of vortex rings generated by the synthetic jets. Beyond a certain formation number, the pinch-off timescale becomes invariant. In this region, the vectoring behaviour is determined by the distance between subsequent vortex rings. We acknowledge the financial support from the European Research Council (ERC grant agreement no. 277472).
A Soft Parallel Kinematic Mechanism.
White, Edward L; Case, Jennifer C; Kramer-Bottiglio, Rebecca
2018-02-01
In this article, we describe a novel holonomic soft robotic structure based on a parallel kinematic mechanism. The design is based on the Stewart platform, which uses six sensors and actuators to achieve full six-degree-of-freedom motion. Our design is much less complex than a traditional platform, since it replaces the 12 spherical and universal joints found in a traditional Stewart platform with a single highly deformable elastomer body and flexible actuators. This reduces the total number of parts in the system and simplifies the assembly process. Actuation is achieved through coiled-shape memory alloy actuators. State observation and feedback is accomplished through the use of capacitive elastomer strain gauges. The main structural element is an elastomer joint that provides antagonistic force. We report the response of the actuators and sensors individually, then report the response of the complete assembly. We show that the completed robotic system is able to achieve full position control, and we discuss the limitations associated with using responsive material actuators. We believe that control demonstrated on a single body in this work could be extended to chains of such bodies to create complex soft robots.
Productive Parallel Programming: The PCN Approach
Ian Foster
1992-01-01
We describe the PCN programming system, focusing on those features designed to improve the productivity of scientists and engineers using parallel supercomputers. These features include a simple notation for the concise specification of concurrent algorithms, the ability to incorporate existing Fortran and C code into parallel applications, facilities for reusing parallel program components, a portable toolkit that allows applications to be developed on a workstation or small parallel computer and run unchanged on supercomputers, and integrated debugging and performance analysis tools. We survey representative scientific applications and identify problem classes for which PCN has proved particularly useful.
Prabhat
2014-01-01
Gain Critical Insight into the Parallel I/O Ecosystem. Parallel I/O is an integral component of modern high performance computing (HPC), especially in storing and processing very large datasets to facilitate scientific discovery. Revealing the state of the art in this field, High Performance Parallel I/O draws on insights from leading practitioners, researchers, software architects, developers, and scientists who shed light on the parallel I/O ecosystem. The first part of the book explains how large-scale HPC facilities scope, configure, and operate systems, with an emphasis on choices of I/O har
Parallel, Rapid Diffuse Optical Tomography of Breast
Yodh, Arjun
2001-01-01
During the last year we have experimentally and computationally investigated rapid acquisition and analysis of informationally dense diffuse optical data sets in the parallel plate compressed breast geometry...
Parallel, Rapid Diffuse Optical Tomography of Breast
Yodh, Arjun
2002-01-01
During the last year we have experimentally and computationally investigated rapid acquisition and analysis of informationally dense diffuse optical data sets in the parallel plate compressed breast geometry...
Parallel auto-correlative statistics with VTK.
Pebay, Philippe Pierre; Bennett, Janine Camille
2013-08-01
This report summarizes existing statistical engines in VTK and presents both the serial and parallel auto-correlative statistics engines. It is a sequel to [PT08, BPRT09b, PT09, BPT09, PT10] which studied the parallel descriptive, correlative, multi-correlative, principal component analysis, contingency, k-means, and order statistics engines. The ease of use of the new parallel auto-correlative statistics engine is illustrated by the means of C++ code snippets and algorithm verification is provided. This report justifies the design of the statistics engines with parallel scalability in mind, and provides scalability and speed-up analysis results for the autocorrelative statistics engine.
Conformal pure radiation with parallel rays
Leistner, Thomas; Nurowski, Paweł
2012-01-01
We define pure radiation metrics with parallel rays to be n-dimensional pseudo-Riemannian metrics that admit a parallel null line bundle K and whose Ricci tensor vanishes on vectors that are orthogonal to K. We give necessary conditions in terms of the Weyl, Cotton and Bach tensors for a pseudo-Riemannian metric to be conformal to a pure radiation metric with parallel rays. Then, we derive conditions in terms of the tractor calculus that are equivalent to the existence of a pure radiation metric with parallel rays in a conformal class. We also give analogous results for n-dimensional pseudo-Riemannian pp-waves. (paper)
Compiling Scientific Programs for Scalable Parallel Systems
Kennedy, Ken
2001-01-01
...). The research performed in this project included new techniques for recognizing implicit parallelism in sequential programs, a powerful and precise set-based framework for analysis and transformation...
Parallel thermal radiation transport in two dimensions
Smedley-Stevenson, R.P.; Ball, S.R.
2003-01-01
This paper describes the distributed memory parallel implementation of a deterministic thermal radiation transport algorithm in a 2-dimensional ALE hydrodynamics code. The parallel algorithm consists of a variety of components which are combined in order to produce a state of the art computational capability, capable of solving large thermal radiation transport problems using Blue-Oak, the 3 Tera-Flop MPP (massive parallel processors) computing facility at AWE (United Kingdom). Particular aspects of the parallel algorithm are described together with examples of the performance on some challenging applications. (author)
Parallel Algorithms for the Exascale Era
Robey, Robert W. [Los Alamos National Laboratory
2016-10-19
New parallel algorithms are needed to reach the Exascale level of parallelism with millions of cores. We look at some of the research developed by students in projects at LANL. The research blends ideas from the early days of computing while weaving in the fresh approach brought by students new to the field of high performance computing. We look at reproducibility of global sums and why it is important to parallel computing. Next we look at how the concept of hashing has led to the development of more scalable algorithms suitable for next-generation parallel computers. Nearly all of this work has been done by undergraduates and published in leading scientific journals.
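To illustrate why the reproducibility of global sums is an issue, here is a small Python sketch. It is not the method developed in the LANL projects; the function names and the chunking scheme are hypothetical. It simulates a parallel reduction by summing chunks separately, and contrasts ordinary floating-point accumulation with exact rational accumulation, which gives the same answer for any partition.

```python
import math
from fractions import Fraction


def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunks(values, n_chunks):
    """Split values into n_chunks contiguous pieces, one per simulated worker."""
    size = math.ceil(len(values) / n_chunks)
    return [values[i:i + size] for i in range(0, len(values), size)]


def chunked_naive(values, n_chunks):
    """Each worker sums its chunk in floating point, then partials are combined."""
    return naive_sum([naive_sum(c) for c in chunks(values, n_chunks)])


def chunked_exact(values, n_chunks):
    """Same reduction with exact rational partial sums, rounded once at the end."""
    partials = [sum((Fraction(v) for v in c), Fraction(0)) for c in chunks(values, n_chunks)]
    return float(sum(partials, Fraction(0)))


if __name__ == "__main__":
    data = [1e16, 1.0, -1e16, 1.0]
    for workers in (1, 2):
        print(workers, chunked_naive(data, workers), chunked_exact(data, workers))
```

For this input the floating-point result changes with the number of simulated workers (1.0 with one chunk, 0.0 with two), while the exact version returns 2.0 in both cases. Exact rational arithmetic is far too slow for production use; it only serves here to make the order dependence visible.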
Parallel thermal radiation transport in two dimensions
Smedley-Stevenson, R.P.; Ball, S.R. [AWE Aldermaston (United Kingdom)
2003-07-01
This paper describes the distributed memory parallel implementation of a deterministic thermal radiation transport algorithm in a 2-dimensional ALE hydrodynamics code. The parallel algorithm consists of a variety of components which are combined in order to produce a state of the art computational capability, capable of solving large thermal radiation transport problems using Blue-Oak, the 3 Tera-Flop MPP (massive parallel processors) computing facility at AWE (United Kingdom). Particular aspects of the parallel algorithm are described together with examples of the performance on some challenging applications. (author)
Structured Parallel Programming Patterns for Efficient Computation
McCool, Michael; Robison, Arch
2012-01-01
Programming is now parallel programming. Much as structured programming revolutionized traditional serial programming decades ago, a new kind of structured programming, based on patterns, is relevant to parallel programming today. Parallel computing experts and industry insiders Michael McCool, Arch Robison, and James Reinders describe how to design and implement maintainable and efficient parallel algorithms using a pattern-based approach. They present both theory and practice, and give detailed concrete examples using multiple programming models. Examples are primarily given using two of th
The Perspective Structure of Visual Space
2015-01-01
Luneburg’s model has been the reference for experimental studies of visual space for almost seventy years. His claim for a curved visual space has been a source of inspiration for visual scientists as well as philosophers. The conclusion of many experimental studies has been that Luneburg’s model does not describe visual space in various tasks and conditions. Remarkably, no alternative model has been suggested. The current study explores perspective transformations of Euclidean space as a model for visual space. Computations show that the geometry of perspective spaces is considerably different from that of Euclidean space. Collinearity but not parallelism is preserved in perspective space and angles are not invariant under translation and rotation. Similar relationships have been shown to be properties of visual space. Alley experiments performed early in the nineteenth century have been instrumental in hypothesizing curved visual spaces. Alleys were computed in perspective space and compared with reconstructed alleys of Blumenfeld. Parallel alleys were accurately described by perspective geometry. Accurate distance alleys were derived from parallel alleys by adjusting the interstimulus distances according to the size-distance invariance hypothesis. Agreement between computed and experimental alleys and accommodation of experimental results that rejected Luneburg’s model show that perspective space is an appropriate model for how we perceive orientations and angles. The model is also appropriate for perceived distance ratios between stimuli but fails to predict perceived distances. PMID:27648222
Parallel Construction of Irreducible Polynomials
Frandsen, Gudmund Skovbjerg
Let arithmetic pseudo-NC^k denote the problems that can be solved by log space uniform arithmetic circuits over the finite prime field GF(p) of depth O(log^k (n + p)) and size polynomial in (n + p). We show that the problem of constructing an irreducible polynomial of specified degree over GF(p) ...... of polynomials is in arithmetic NC^3. Our algorithm works over any field and compared to other known algorithms it does not assume the ability to take p'th roots when the field has characteristic p....
Parallel Computing for Brain Simulation.
Pastur-Romay, L A; Porto-Pazos, A B; Cedron, F; Pazos, A
2017-01-01
The human brain is the most complex system in the known universe and is therefore one of the greatest mysteries. It provides human beings with extraordinary abilities. However, it is not yet understood how and why most of these abilities are produced. For decades, researchers have been trying to make computers reproduce these abilities, focusing both on understanding the nervous system and on processing data in a more efficient way than before. Their aim is to make computers process information similarly to the brain. Important technological developments and vast multidisciplinary projects have made it possible to create the first simulation with a number of neurons similar to that of a human brain. This paper presents an up-to-date review of the main research projects that are trying to simulate and/or emulate the human brain. They employ different types of computational models using parallel computing: digital models, analog models and hybrid models. This review includes the current applications of these works, as well as future trends. It covers works that seek progress in Neuroscience as well as others that seek new discoveries in Computer Science (neuromorphic hardware, machine learning techniques). Their most outstanding characteristics are summarized and the latest advances and future plans are presented. In addition, this review points out the importance of considering not only neurons: computational models of the brain should also include glial cells, given the proven importance of astrocytes in information processing.
von Davier, Matthias
2016-01-01
This report presents results on a parallel implementation of the expectation-maximization (EM) algorithm for multidimensional latent variable models. The developments presented here are based on code that parallelizes both the E step and the M step of the parallel-E parallel-M algorithm. Examples presented in this report include item response…
The language parallel Pascal and other aspects of the massively parallel processor
Reeves, A. P.; Bruner, J. D.
1982-01-01
A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.
A parallel orbital-updating based plane-wave basis method for electronic structure calculations
Pan, Yan; Dai, Xiaoying; Gironcoli, Stefano de; Gong, Xin-Gao; Rignanese, Gian-Marco; Zhou, Aihui
2017-01-01
Highlights: • Propose three parallel orbital-updating based plane-wave basis methods for electronic structure calculations. • These new methods can avoid the generating of large scale eigenvalue problems and then reduce the computational cost. • These new methods allow for two-level parallelization which is particularly interesting for large scale parallelization. • Numerical experiments show that these new methods are reliable and efficient for large scale calculations on modern supercomputers. Abstract: Motivated by the recently proposed parallel orbital-updating approach in the real space method, we propose a parallel orbital-updating based plane-wave basis method for electronic structure calculations, for solving the corresponding eigenvalue problems. In addition, we propose two new modified parallel orbital-updating methods. Compared to the traditional plane-wave methods, our methods allow for two-level parallelization, which is particularly interesting for large scale parallelization. Numerical experiments show that these new methods are more reliable and efficient for large scale calculations on modern supercomputers.
Parallel Boltzmann machines : a mathematical model
Zwietering, P.J.; Aarts, E.H.L.
1991-01-01
A mathematical model is presented for the description of parallel Boltzmann machines. The framework is based on the theory of Markov chains and combines a number of previously known results into one generic model. It is argued that parallel Boltzmann machines maximize a function consisting of a
The convergence of parallel Boltzmann machines
Zwietering, P.J.; Aarts, E.H.L.; Eckmiller, R.; Hartmann, G.; Hauske, G.
1990-01-01
We discuss the main results obtained in a study of a mathematical model of synchronously parallel Boltzmann machines. We present supporting evidence for the conjecture that a synchronously parallel Boltzmann machine maximizes a consensus function that consists of a weighted sum of the regular
Customizable Memory Schemes for Data Parallel Architectures
Gou, C.
2011-01-01
Memory system efficiency is crucial for any processor to achieve high performance, especially in the case of data parallel machines. Processing capabilities of parallel lanes will be wasted, when data requests are not accomplished in a sustainable and timely manner. Irregular vector memory accesses
Parallel Narrative Structure in Paul Harding's "Tinkers"
Çirakli, Mustafa Zeki
2014-01-01
The present paper explores the implications of parallel narrative structure in Paul Harding's "Tinkers" (2009). Besides primarily recounting the two sets of parallel narratives, "Tinkers" also comprises seemingly unrelated fragments such as excerpts from clock repair manuals and diaries. The main stories, however, told…
Bayer image parallel decoding based on GPU
Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua
2012-11-01
In the photoelectrical tracking system, Bayer images are traditionally decompressed on the CPU. However, this is too slow when the images become large, for example, 2K×2K×16bit. In order to accelerate Bayer image decoding, this paper introduces a parallel speedup method for NVIDIA's Graphics Processing Units (GPUs) that support the CUDA architecture. The decoding procedure can be divided into three parts: the first is the serial part, the second is the task-parallelism part, and the last is the data-parallelism part, which includes inverse quantization, the inverse discrete wavelet transform (IDWT) and image post-processing. To reduce the execution time, the task-parallelism part is optimized with OpenMP techniques. The data-parallelism part gains efficiency by executing on the GPU as a CUDA parallel program. The optimization techniques include instruction optimization, shared memory access optimization, coalesced memory access optimization and texture memory optimization. In particular, the IDWT can be sped up significantly by rewriting the 2D (two-dimensional) serial IDWT as a 1D parallel IDWT. In experiments with a 1K×1K×16bit Bayer image, the data-parallelism part is more than 10 times faster than the CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental results show that it achieves a 3 to 5 times speed increase compared to the CPU serial method.
Parallelization of TMVA Machine Learning Algorithms
Hajili, Mammad
2017-01-01
This report reflects my work on the parallelization of TMVA machine learning algorithms integrated into the ROOT data analysis framework during a summer internship at CERN. The report consists of 4 important parts: the data set used in training and validation, the algorithms to which multiprocessing was applied, the parallelization techniques, and the results for how execution time changes with the number of workers.
17 CFR 12.24 - Parallel proceedings.
2010-04-01
...) Definition. For purposes of this section, a parallel proceeding shall include: (1) An arbitration proceeding... the receivership includes the resolution of claims made by customers; or (3) A petition filed under... any of the foregoing with knowledge of a parallel proceeding shall promptly notify the Commission, by...
Parallel S_n iteration schemes
Wienke, B.R.; Hiromoto, R.E.
1986-01-01
The iterative, multigroup, discrete ordinates (S_n) technique for solving the linear transport equation enjoys widespread usage and appeal. Serial iteration schemes and numerical algorithms developed over the years provide a timely framework for parallel extension. On the Denelcor HEP, the authors investigate three parallel iteration schemes for solving the one-dimensional S_n transport equation. The multigroup representation and serial iteration methods are also reviewed. This analysis represents a first attempt to extend serial S_n algorithms to parallel environments and provides good baseline estimates on ease of parallel implementation, relative algorithm efficiency, comparative speedup, and some future directions. The authors examine ordered and chaotic versions of these strategies, with and without concurrent rebalance and diffusion acceleration. Two strategies efficiently support high degrees of parallelization and appear to be robust parallel iteration techniques. The third strategy is a weaker parallel algorithm. Chaotic iteration, difficult to simulate on serial machines, holds promise and converges faster than ordered versions of the schemes. Actual parallel speedup and efficiency are high and payoff appears substantial.
Parallel Computing Strategies for Irregular Algorithms
Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)
2002-01-01
Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
Parallel fuzzy connected image segmentation on GPU
Zhuge, Ying; Cao, Yong; Udupa, Jayaram K.; Miller, Robert W.
2011-01-01
Purpose: Image segmentation techniques using fuzzy connectedness (FC) principles have shown their effectiveness in segmenting a variety of objects in several large applications. However, one challenge in these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays, commodity graphics hardware provides a highly parallel computing environment. In this paper, the authors present a parallel fuzzy connected image segmentation algorithm impleme...
Parallel Algorithms for Groebner-Basis Reduction
1987-09-25
Technical Report, Productivity Engineering in the UNIX Environment: Parallel Algorithms for Groebner-Basis Reduction.
Parallel knock-out schemes in networks
Broersma, H.J.; Fomin, F.V.; Woeginger, G.J.
2004-01-01
We consider parallel knock-out schemes, a procedure on graphs introduced by Lampert and Slater in 1997 in which each vertex eliminates exactly one of its neighbors in each round. We are considering cases in which after a finite number of rounds, where the minimum number is called the parallel
Parallel Architectures for Planetary Exploration Requirements (PAPER)
Cezzar, Ruknet
1993-01-01
The project's main contributions have been in the area of student support. Throughout the project, at least one, in some cases two, undergraduate students have been supported. By working with the project, these students gained valuable knowledge involving the scientific research project, including the not-so-pleasant reporting requirements to the funding agencies. The other important contribution was towards the establishment of a graduate program in computer science at Hampton University. Primarily, the PAPER project has served as the main research basis in seeking funds from other agencies, such as the National Science Foundation, for establishing a research infrastructure in the department. In technical areas, especially in the first phase, we believe the trip to the Jet Propulsion Laboratory, and the gathering together of all the pertinent information involving experimental computer architectures aimed at planetary exploration, was very helpful. Indeed, if this effort is to be revived in the future due to congressional funding for planetary exploration, say an unmanned mission to Mars, our interim report will be an important starting point. In other technical areas, our simulator has pinpointed and highlighted several important performance issues related to the design of operating system kernels for MIMD machines. In particular, the critical issue of how the kernel itself will run in parallel on a multiple-processor system has been addressed through the various ready list organization and access policies. In the area of neural computing, our main contribution was an introductory tutorial package to familiarize the researchers at NASA with this new and promising field. Finally, we have introduced the notion of reversibility in programming systems, which may find applications in various areas of space research.
Broadcasting a message in a parallel computer
Berg, Jeremy E [Rochester, MN]; Faraj, Ahmad A [Rochester, MN]
2011-08-02
Methods, systems, and products are disclosed for broadcasting a message in a parallel computer. The parallel computer includes a plurality of compute nodes connected together using a data communications network. The data communications network is optimized for point-to-point data communications and is characterized by at least two dimensions. The compute nodes are organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer. One compute node of the operational group is assigned to be a logical root. Broadcasting a message in a parallel computer includes: establishing a Hamiltonian path along all of the compute nodes in at least one plane of the data communications network and in the operational group; and broadcasting, by the logical root to the remaining compute nodes, the logical root's message along the established Hamiltonian path.
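The broadcast described in this abstract can be illustrated with a minimal sketch. It assumes one plane of the network is a plain rows x cols mesh, builds a Hamiltonian path in "snake" (boustrophedon) order, and forwards the message hop by hop from the root towards both ends of the path. The function names, the snake ordering, and the two-direction forwarding are illustrative assumptions, not details taken from the patent.

```python
def snake_path(rows, cols):
    """Return a Hamiltonian path over a rows x cols mesh.

    Assumption: the plane is a simple 2-D mesh; rows are walked
    alternately left-to-right and right-to-left so that consecutive
    nodes on the path are always direct mesh neighbours.
    """
    path = []
    for r in range(rows):
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        path.extend((r, c) for c in columns)
    return path


def broadcast_along_path(path, root, message):
    """Forward `message` from `root` to every node on `path`.

    Each node receives the message from its predecessor on the path
    (point-to-point hops only). Returns the per-node received values
    and the total number of hops used.
    """
    start = path.index(root)
    received = {root: message}
    hops = 0
    # Forward towards the end of the path.
    for j in range(start + 1, len(path)):
        received[path[j]] = received[path[j - 1]]
        hops += 1
    # Forward towards the beginning of the path.
    for j in range(start - 1, -1, -1):
        received[path[j]] = received[path[j + 1]]
        hops += 1
    return received, hops


if __name__ == "__main__":
    mesh_path = snake_path(3, 4)
    delivered, hop_count = broadcast_along_path(mesh_path, (1, 2), "hello")
    print(len(delivered), "nodes reached in", hop_count, "hops")
```

Because every node appears exactly once on the path, the message reaches all nodes with one point-to-point transfer per non-root node.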
Advanced parallel processing with supercomputer architectures
Hwang, K.
1987-01-01
This paper investigates advanced parallel processing techniques and innovative hardware/software architectures that can be applied to boost the performance of supercomputers. Critical issues on architectural choices, parallel languages, compiling techniques, resource management, concurrency control, programming environment, parallel algorithms, and performance enhancement methods are examined and the best answers are presented. The authors cover advanced processing techniques suitable for supercomputers, high-end mainframes, minisupers, and array processors. The coverage emphasizes vectorization, multitasking, multiprocessing, and distributed computing. In order to achieve these operation modes, parallel languages, smart compilers, synchronization mechanisms, load balancing methods, mapping parallel algorithms, operating system functions, application library, and multidiscipline interactions are investigated to ensure high performance. At the end, they assess the potentials of optical and neural technologies for developing future supercomputers
Differences Between Distributed and Parallel Systems
Brightwell, R.; Maccabe, A.B.; Rissen, R.
1998-10-01
Distributed systems have been studied for twenty years and are now coming into wider use as fast networks and powerful workstations become more readily available. In many respects a massively parallel computer resembles a network of workstations and it is tempting to port a distributed operating system to such a machine. However, there are significant differences between these two environments and a parallel operating system is needed to get the best performance out of a massively parallel system. This report characterizes the differences between distributed systems, networks of workstations, and massively parallel systems and analyzes the impact of these differences on operating system design. In the second part of the report, we introduce Puma, an operating system specifically developed for massively parallel systems. We describe Puma portals, the basic building blocks for message passing paradigms implemented on top of Puma, and show how the differences observed in the first part of the report have influenced the design and implementation of Puma.
Parallel programming with Easy Java Simulations
Esquembre, F.; Christian, W.; Belloni, M.
2018-01-01
Nearly all of today's processors are multicore, and ideally programming and algorithm development utilizing the entire processor should be introduced early in the computational physics curriculum. Parallel programming is often not introduced because it requires a new programming environment and uses constructs that are unfamiliar to many teachers. We describe how we lower the barrier to parallel programming by using a Java-based programming environment to treat problems in the usual undergraduate curriculum. We use the Easy Java Simulations programming and authoring tool to create the program's graphical user interface together with objects based on those developed by Kaminsky [Building Parallel Programs (Course Technology, Boston, 2010)] to handle common parallel programming tasks. Shared-memory parallel implementations of physics problems, such as time evolution of the Schrödinger equation, are available as source code and as ready-to-run programs from the AAPT-ComPADRE digital library.
A parallel coordinates style interface for exploratory volume visualization.
Tory, Melanie; Potts, Simeon; Möller, Torsten
2005-01-01
We present a user interface, based on parallel coordinates, that facilitates exploration of volume data. By explicitly representing the visualization parameter space, the interface provides an overview of rendering options and enables users to easily explore different parameters. Rendered images are stored in an integrated history bar that facilitates backtracking to previous visualization options. Initial usability testing showed clear agreement between users and experts of various backgrounds (usability, graphic design, volume visualization, and medical physics) that the proposed user interface is a valuable data exploration tool.
Sn transport calculations on vector and parallel processors
Rhoades, W.A.; Childs, R.L.
1987-01-01
The transport of radiation from the source to the location of people or equipment gives rise to some of the most challenging of calculations. A problem may involve as many as a billion unknowns, each evaluated several times to resolve interdependence. Such calculations run many hours on a Cray computer, and a typical study involves many such calculations. This paper will discuss the steps taken to vectorize the DOT code, which solves transport problems in two space dimensions (2-D); the extension of this code to 3-D; and the plans for extension to parallel processors
Weighted semiconvex spaces of measurable functions
Olaleru, J.O.
2001-12-01
Semiconvex spaces are intermediate between locally convex spaces and non-locally convex topological vector spaces. They include all locally convex spaces and hence generalize them. In this article, we study weighted semiconvex spaces in parallel with weighted locally convex spaces, where continuous functions are replaced with measurable functions and the N p family replaces the Nachbin family on a locally compact space X. Among other things, we examine the Hausdorffness, completeness, inductive limits, barrelledness and countable barrelledness of weighted semiconvex spaces. New results are obtained, and we give more elegant proofs of old results. Furthermore, we obtain extensions of some of the old results. It is observed that the technique of proving theorems in weighted locally convex spaces can be adapted to weighted semiconvex spaces of measurable functions in most cases. (author)
Arkin, Ethem; Tekinerdogan, Bedir
2016-01-01
Mapping parallel algorithms to parallel computing platforms requires several activities such as the analysis of the parallel algorithm, the definition of the logical configuration of the platform, the mapping of the algorithm to the logical configuration platform and the implementation of the
Harmonic resonance assessment of multiple paralleled grid-connected inverters system
Wang, Yanbo; Wang, Xiongfei; Blaabjerg, Frede
2017-01-01
This paper presents an eigenvalue-based impedance stability analytical method of multiple paralleled grid-connected inverter system. Different from the conventional impedance-based stability criterion, this work first built the state-space model of paralleled grid-connected inverters. On the basis of this, a bridge between the state-space-based modelling and impedance-based stability criterion is presented. The proposed method is able to perform stability assessment locally at the connection points of the component. Meanwhile, the eigenvalue-based sensitivity analysis is adopted to identify...
Graber, H [Commissariat a l' Energie Atomique, 91 - Saclay (France). Centre d' Etudes Nucleaires
1969-04-01
By introducing an additional parameter $F_0$, the processes known hitherto for calculating heat transfer are extended to heat flux distributions following an exponential law $q_w = \exp(mx)$, which give a heat transfer coefficient independent of position for laminar and turbulent flow with a linear pressure drop. For laminar flow along a semi-infinite plate, the heat flux distribution in accordance with the law $q_w = x^m$ leads to a Nusselt number that is independent of position. Nu is then determined by the thickness of the thermal boundary layer. For the annular space, the equations for explicit calculation of the temperature field will be given, as well as the Nusselt number in laminar flow and constant heat flux. In turbulent flow, the laws of distribution of eddy diffusivity for momentum in a tube, established by H. Reichardt, adapted for the annular space and the tube bundle, give the velocity field and the coefficient of friction and thus permit solution of the heat transfer equations. The results of the numerical calculation are given in the tables and diagrams for an extended range of the various parameters and compared with the experimental results. A simple process to determine the lower limit of the thermal entry length will be described. (author)
Karen Kershaw
an enquiry into women's participation ... Sharma and Arun Kumar. UNNATI Organisation for Development Education. India. Parallel Sessions II - Session B ... Poor social, economic .... based inequalities ... Feminine assertion of power leads to.
HVI Ballistic Performance Characterization of Non-Parallel Walls
Bohl, William; Miller, Joshua; Christiansen, Eric
2012-01-01
The Double-Wall, "Whipple" Shield [1] has been the subject of many hypervelocity impact studies and has proven to be an effective shield system for Micro-Meteoroid and Orbital Debris (MMOD) impacts for spacecraft. The US modules of the International Space Station (ISS), with their "bumper shields" offset from their pressure holding rear walls provide good examples of effective on-orbit use of the double wall shield. The concentric cylinder shield configuration with its large radius of curvature relative to separation distance is easily and effectively represented for testing and analysis as a system of two parallel plates. The parallel plate double wall configuration has been heavily tested and characterized for shield performance for normal and oblique impacts for the ISS and other programs. The double wall shield and principally similar Stuffed Whipple Shield are very common shield types for MMOD protection. However, in some locations with many spacecraft designs, the rear wall cannot be modeled as being parallel or concentric with the outer bumper wall. As represented in Figure 1, there is an included angle between the two walls. And, with a cylindrical outer wall, the effective included angle constantly changes. This complicates assessment of critical spacecraft components located within outer spacecraft walls when using software tools such as NASA's BumperII. In addition, the validity of the risk assessment comes into question when using the standard double wall shield equations, especially since verification testing of every set of double wall included angles is impossible.
Fast electrostatic force calculation on parallel computer clusters
Kia, Amirali; Kim, Daejoong; Darve, Eric
2008-01-01
The fast multipole method (FMM) and smooth particle mesh Ewald (SPME) are well-known fast algorithms for evaluating long-range electrostatic interactions in molecular dynamics and other fields. FMM is a multi-scale method which reduces the computation cost by approximating the potential due to a group of particles at a large distance using a few multipole functions. This algorithm scales like O(N) for N particles. The SPME algorithm is an O(N ln N) method which is based on an interpolation of the Fourier-space part of the Ewald sum and evaluation of the resulting convolutions using the fast Fourier transform (FFT). These algorithms suffer from relatively poor efficiency on large parallel machines, especially for mid-size problems of around hundreds of thousands of atoms. A variation of the FMM, called PWA, based on plane wave expansions is presented in this paper. A new parallelization strategy for PWA, which takes advantage of the specific form of this expansion, is described. Its parallel efficiency is compared with SPME through detailed timing measurements on two different computer clusters.
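The idea behind the far-field approximation in FMM can be shown with a small sketch: the potential of a cluster of charges, evaluated far away, is replaced by the first two multipole terms (monopole and dipole) about the cluster centre. This is only the basic multipole idea in Gaussian-style units with made-up charges; it is not the plane-wave (PWA) variant or the parallel strategy of the paper, and all function names are hypothetical.

```python
import math


def direct_potential(charges, positions, target):
    """Exact Coulomb potential at `target`: sum of q_i / |target - r_i|."""
    return sum(q / math.dist(target, p) for q, p in zip(charges, positions))


def multipole_potential(charges, positions, center, target):
    """Monopole + dipole approximation of the potential about `center`.

    Valid when `target` is far from the cluster compared with the
    cluster size; the neglected terms start at the quadrupole order.
    """
    total_charge = sum(charges)
    # Dipole moment of the cluster relative to the expansion centre.
    dipole = [
        sum(q * (p[k] - center[k]) for q, p in zip(charges, positions))
        for k in range(3)
    ]
    d = [target[k] - center[k] for k in range(3)]
    r = math.sqrt(sum(x * x for x in d))
    dipole_term = sum(dipole[k] * d[k] for k in range(3)) / r ** 3
    return total_charge / r + dipole_term


if __name__ == "__main__":
    # Illustrative values only (not data from the paper).
    qs = [1.0, -0.5, 2.0]
    rs = [(0.1, 0.0, 0.0), (0.0, 0.2, 0.0), (-0.1, -0.1, 0.1)]
    far_point = (10.0, 0.0, 0.0)
    exact = direct_potential(qs, rs, far_point)
    approx = multipole_potential(qs, rs, (0.0, 0.0, 0.0), far_point)
    print("relative error:", abs(exact - approx) / abs(exact))
```

The cost saving comes from the fact that the two moments are computed once per cluster and then reused for every distant evaluation point.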
Parallel clustering algorithm for large-scale biological data sets.
Wang, Minchao; Zhang, Wu; Ding, Wang; Dai, Dongbo; Zhang, Huiran; Xie, Hao; Chen, Luonan; Guo, Yike; Xie, Jiang
2014-01-01
The recent explosion of biological data poses a great challenge for traditional clustering algorithms. With the increasing scale of data sets, much larger memory and longer runtime are required for cluster identification problems. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied in biological research. However, its time and space complexity become a major bottleneck when handling large-scale data sets. Moreover, the similarity matrix, whose construction takes a long runtime, is required before running the affinity propagation algorithm, since the algorithm clusters data sets based on the similarities between data pairs. Two types of parallel architectures are proposed in this paper to accelerate the similarity matrix construction and the affinity propagation algorithm. The shared-memory architecture is used to construct the similarity matrix, and the distributed system is used for the affinity propagation algorithm, because of its large memory size and great computing capacity. An appropriate scheme of data partition and reduction is designed in our method in order to minimize the global communication cost among processes. A speedup of 100 is gained with 128 cores. The runtime is reduced from several hours to a few seconds, which indicates that the parallel algorithm is capable of handling large-scale data sets effectively. The parallel affinity propagation also achieves good performance when clustering large-scale gene data (microarray) and detecting families in large protein superfamilies.
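The shared-memory construction of the similarity matrix can be sketched as a row-wise partition among workers. The sketch below assumes the similarity commonly used with affinity propagation, the negative squared Euclidean distance; the abstract does not state which similarity measure the authors use. Python threads are used only to show the partitioning (they share memory but, because of the interpreter lock, do not give a real speedup for this pure-Python arithmetic), and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def similarity_rows(points, row_range):
    """Compute the similarity rows for the indices in `row_range`.

    Assumed similarity: s(i, j) = -||x_i - x_j||^2.
    """
    rows = []
    for i in row_range:
        pi = points[i]
        rows.append(
            [-sum((a - b) ** 2 for a, b in zip(pi, pj)) for pj in points]
        )
    return rows


def similarity_matrix(points, workers=4):
    """Build the full n x n similarity matrix from row blocks.

    Each worker owns a contiguous block of rows, so no two workers
    ever write to the same part of the result.
    """
    n = len(points)
    chunk = max(1, (n + workers - 1) // workers)
    ranges = [range(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        blocks = pool.map(lambda rr: similarity_rows(points, rr), ranges)
        # pool.map preserves the order of `ranges`, so rows stay sorted.
        return [row for block in blocks for row in block]


if __name__ == "__main__":
    sample = [(0, 0), (1, 0), (0, 2)]
    for row in similarity_matrix(sample, workers=2):
        print(row)
```

With the reported figures, a speedup of 100 on 128 cores corresponds to a parallel efficiency of 100/128, about 78%.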
QR-decomposition based SENSE reconstruction using parallel architecture.
Ullah, Irfan; Nisar, Habab; Raza, Haseeb; Qasim, Malik; Inam, Omair; Omer, Hammad
2018-04-01
Magnetic Resonance Imaging (MRI) is a powerful medical imaging technique that provides essential clinical information about the human body. One major limitation of MRI is its long scan time. Implementation of advanced MRI algorithms on a parallel architecture (to exploit inherent parallelism) has great potential to reduce the scan time. Sensitivity Encoding (SENSE) is a Parallel Magnetic Resonance Imaging (pMRI) algorithm that utilizes receiver coil sensitivities to reconstruct MR images from the acquired under-sampled k-space data. At the heart of SENSE lies the inversion of a rectangular encoding matrix. This work presents a novel implementation of a GPU-based SENSE algorithm, which employs QR decomposition for the inversion of the rectangular encoding matrix. For a fair comparison, the performance of the proposed GPU-based SENSE reconstruction is evaluated against single-core and multi-core CPU implementations using OpenMP. Several experiments with various acceleration factors (AFs) are performed using multichannel (8, 12 and 30) phantom and in-vivo human head and cardiac datasets. Experimental results show that the GPU significantly reduces the computation time of SENSE reconstruction as compared to a multi-core CPU (approximately 12x speedup) and a single-core CPU (approximately 53x speedup) without any degradation in the quality of the reconstructed images.
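The core linear-algebra step, inverting a rectangular encoding matrix through QR decomposition, can be sketched on the CPU as follows. For a tall matrix E (coils x acceleration factor) and aliased measurements y = E x, the thin factorization E = QR gives x from the triangular system R x = Q^H y. The sketch uses modified Gram-Schmidt in plain Python and made-up matrix values; it is not the authors' GPU implementation, and the function names are hypothetical.

```python
def qr_decompose(E):
    """Thin QR of an m x n matrix (list of rows), m >= n, full column rank.

    Uses modified Gram-Schmidt. Returns Q as a list of n orthonormal
    columns and R as an n x n upper-triangular matrix.
    """
    m, n = len(E), len(E[0])
    cols = [[E[i][j] for i in range(m)] for j in range(n)]
    Q = []
    R = [[0j] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for k in range(j):
            # Projection of the current residual onto q_k (conjugated).
            R[k][j] = sum(qc.conjugate() * vc for qc, vc in zip(Q[k], v))
            v = [vc - R[k][j] * qc for vc, qc in zip(v, Q[k])]
        norm = sum(abs(vc) ** 2 for vc in v) ** 0.5
        R[j][j] = norm
        Q.append([vc / norm for vc in v])
    return Q, R


def solve_least_squares(E, y):
    """Solve min ||E x - y|| via R x = Q^H y and back substitution."""
    Q, R = qr_decompose(E)
    n = len(R)
    z = [sum(qc.conjugate() * yc for qc, yc in zip(Q[k], y)) for k in range(n)]
    x = [0j] * n
    for i in range(n - 1, -1, -1):
        s = z[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / R[i][i]
    return x


if __name__ == "__main__":
    # Illustrative 4-coil, acceleration-factor-2 example (invented values).
    E = [[1, 0.5j], [0.8, -0.3], [0.2j, 1], [0.5, 0.5]]
    pixels = [1 + 2j, -0.5 + 0.25j]
    y = [sum(e * p for e, p in zip(row, pixels)) for row in E]
    print(solve_least_squares(E, y))
```

In SENSE this small solve is repeated independently for every group of aliased pixels, which is the kind of independent work a GPU can process in parallel.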
Portable parallel programming in a Fortran environment
May, E.N.
1989-01-01
Experience using the Argonne-developed PARMACs macro package to implement a portable parallel programming environment is described. Fortran programs with intrinsic parallelism of coarse and medium granularity are easily converted to parallel programs which are portable among a number of commercially available parallel processors in the class of shared-memory bus-based and local-memory network-based MIMD processors. The parallelism is implemented using standard UNIX (tm) tools and a small number of easily understood synchronization concepts (monitors and message-passing techniques) to construct and coordinate multiple cooperating processes on one or many processors. Benchmark results are presented for parallel computers such as the Alliant FX/8, the Encore MultiMax, the Sequent Balance, the Intel iPSC/2 Hypercube and a network of Sun 3 workstations. These parallel machines are typical MIMD types with from 8 to 30 processors, each rated at from 1 to 10 MIPS processing power. The demonstration code used for this work is a Monte Carlo simulation of the response to photons of a "nearly realistic" lead, iron and plastic electromagnetic and hadronic calorimeter, using the EGS4 code system. 6 refs., 2 figs., 2 tabs
Performance of the Galley Parallel File System
Nieuwejaar, Nils; Kotz, David
1996-01-01
As the input/output (I/O) needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. This interface conceals the parallelism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. Initial experiments, reported in this paper, indicate that Galley is capable of providing high-performance I/O to applications that access data in patterns that have been observed to be common.
Automated Parallel Capillary Electrophoretic System
Li, Qingbo; Kane, Thomas E.; Liu, Changsheng; Sonnenschein, Bernard; Sharer, Michael V.; Kernan, John R.
2000-02-22
An automated electrophoretic system is disclosed. The system employs a capillary cartridge having a plurality of capillary tubes. The cartridge has a first array of capillary ends projecting from one side of a plate. The first array of capillary ends are spaced apart in substantially the same manner as the wells of a microtitre tray of standard size. This allows one to simultaneously perform capillary electrophoresis on samples present in each of the wells of the tray. The system includes a stacked, dual carousel arrangement to eliminate cross-contamination resulting from reuse of the same buffer tray on consecutive executions from electrophoresis. The system also has a gel delivery module containing a gel syringe/a stepper motor or a high pressure chamber with a pump to quickly and uniformly deliver gel through the capillary tubes. The system further includes a multi-wavelength beam generator to generate a laser beam which produces a beam with a wide range of wavelengths. An off-line capillary reconditioner thoroughly cleans a capillary cartridge to enable simultaneous execution of electrophoresis with another capillary cartridge. The streamlined nature of the off-line capillary reconditioner offers the advantage of increased system throughput with a minimal increase in system cost.
A proposed experimental search for chameleons using asymmetric parallel plates
Burrage, Clare; Copeland, Edmund J.; Stevenson, James A.
2016-01-01
Light scalar fields coupled to matter are a common consequence of theories of dark energy and attempts to solve the cosmological constant problem. The chameleon screening mechanism is commonly invoked in order to suppress the fifth forces mediated by these scalars sufficiently to avoid current experimental constraints, without fine tuning. The force is suppressed dynamically by allowing the mass of the scalar to vary with the local density. Recently it has been shown that near-future cold-atom experiments using atom interferometry have the ability to access a large proportion of the chameleon parameter space. In this work we demonstrate how experiments utilising asymmetric parallel plates can push deeper into the remaining parameter space available to the chameleon.
Vectoring of parallel synthetic jets: A parametric study
Berk, Tim; Gomit, Guillaume; Ganapathisubramani, Bharathram
2016-11-01
The vectoring of a pair of parallel synthetic jets can be described using five dimensionless parameters: the aspect ratio of the slots, the Strouhal number, the Reynolds number, the phase difference between the jets and the spacing between the slots. In the present study, the influence of the latter four on the vectoring behaviour of the jets is examined experimentally using particle image velocimetry. Time-averaged velocity maps are used to study the variations in vectoring behaviour for a parametric sweep of each of the four parameters independently. A topological map is constructed for the full four-dimensional parameter space. The vectoring behaviour is described both qualitatively and quantitatively. A vectoring mechanism is proposed, based on measured vortex positions. We acknowledge the financial support from the European Research Council (ERC Grant Agreement No. 277472).
PVFS 2000: An operational parallel file system for Beowulf
Ligon, Walt
2004-01-01
The approach has been to develop Parallel Virtual File System version 2 (PVFS2), retaining the basic philosophy of the original file system but completely rewriting the code. It shows the architecture of the server and client components. BMI is the network abstraction layer. It is designed with a common driver and modules for each protocol supported. The interface is non-blocking, and provides mechanisms for optimizations including pinning user buffers. Currently TCP/IP and GM (Myrinet) modules have been implemented. Trove is the storage abstraction layer. It provides for storing both data spaces and name/value pairs. Trove can also be implemented using different underlying storage mechanisms including native files, raw disk partitions, SQL and other databases. The current implementation uses native files for data spaces and Berkeley DB for name/value pairs.
Non-Almost Periodicity of Parallel Transports for Homogeneous Connections
Brunnemann, Johannes; Fleischhack, Christian
2012-01-01
Let A be the affine space of all connections in an SU(2) principal fibre bundle over ℝ³. The set of homogeneous isotropic connections forms a line l in A. We prove that the parallel transports for general, non-straight paths in the base manifold do not depend almost periodically on l. Consequently, the embedding l ↪ A does not continuously extend to an embedding l-bar ↪ A-bar of the respective compactifications. Here, the Bohr compactification l-bar corresponds to the configuration space of homogeneous isotropic loop quantum cosmology and A-bar to that of loop quantum gravity. Analogous results are given for the anisotropic case.
The kpx, a program analyzer for parallelization
Matsuyama, Yuji; Orii, Shigeo; Ota, Toshiro; Kume, Etsuo; Aikawa, Hiroshi.
1997-03-01
The kpx is a program analyzer, developed as a common technological basis for promoting parallel processing. The kpx consists of three tools. The first is ktool, which shows how much execution time is spent in program segments. The second is ptool, which shows parallelization overhead on the Paragon system. The last is xtool, which shows parallelization overhead on the VPP system. The kpx, designed to work for any FORTRAN code on any UNIX computer, is confirmed to work well after testing on Paragon, SP2, SR2201, VPP500, VPP300, Monte-4, SX-4 and T90. (author)
Synchronization Of Parallel Discrete Event Simulations
Steinman, Jeffrey S.
1992-01-01
Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.
Multistage parallel-serial time averaging filters
Theodosiou, G.E.
1980-01-01
Here, a new time averaging circuit design, the 'parallel filter', is presented, which can reduce the time jitter introduced in time measurements using counters of large dimensions. This parallel filter can be considered as a single-stage unit circuit which can be repeated an arbitrary number of times in series, thus providing a parallel-serial filter type as a result. The main advantages of such a filter over a serial one are much less electronic gate jitter and time delay for the same amount of total time uncertainty reduction. (orig.)
Implementations of BLAST for parallel computers.
Jülich, A
1995-02-01
The BLAST sequence comparison programs have been ported to a variety of parallel computers: the shared memory machine Cray Y-MP 8/864 and the distributed memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799 residue protein query sequence and the protein database PIR were used.
Speedup predictions on large scientific parallel programs
Williams, E.; Bobrowicz, F.
1985-01-01
How much speedup can we expect for large scientific parallel programs running on supercomputers? For insight into this problem we extend the parallel processing environment currently existing on the Cray X-MP (a shared memory multiprocessor with at most four processors) to a simulated N-processor environment, where N is greater than or equal to 1. Several large scientific parallel programs from Los Alamos National Laboratory were run in this simulated environment, and speedups were predicted. A speedup of 14.4 on 16 processors was measured for one of the three most used codes at the Laboratory.
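As a small worked illustration of the 14.4-on-16 figure quoted above, parallel efficiency is simply speedup divided by processor count. The sketch below also inverts Amdahl's law to get the serial fraction such a speedup would imply. Using Amdahl's model is an assumption made here for illustration; the abstract does not say which model the authors used, and the function names are hypothetical.

```python
def efficiency(speedup, n_procs):
    """Parallel efficiency: fraction of ideal linear speedup achieved."""
    return speedup / n_procs


def amdahl_serial_fraction(speedup, n_procs):
    """Serial fraction f implied by Amdahl's law, S = 1 / (f + (1 - f) / N).

    Assumption: the program follows Amdahl's simple model, which the
    abstract does not claim.
    """
    if n_procs < 2:
        raise ValueError("need at least 2 processors to infer a serial fraction")
    return (n_procs / speedup - 1.0) / (n_procs - 1.0)


def amdahl_speedup(serial_fraction, n_procs):
    """Speedup predicted by Amdahl's law for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)


if __name__ == "__main__":
    s, n = 14.4, 16  # figures quoted in the abstract
    f = amdahl_serial_fraction(s, n)
    print(f"efficiency      = {efficiency(s, n):.3f}")
    print(f"serial fraction = {f:.4f}")
    print(f"check speedup   = {amdahl_speedup(f, n):.1f}")
```

Under this assumed model, a speedup of 14.4 on 16 processors corresponds to 90% efficiency and a serial fraction below 1%.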
Language constructs for modular parallel programs
Foster, I.
1996-03-01
We describe programming language constructs that facilitate the application of modular design techniques in parallel programming. These constructs allow us to isolate resource management and processor scheduling decisions from the specification of individual modules, which can themselves encapsulate design decisions concerned with concurrency, communication, process mapping, and data distribution. This approach permits development of libraries of reusable parallel program components and the reuse of these components in different contexts. In particular, alternative mapping strategies can be explored without modifying other aspects of program logic. We describe how these constructs are incorporated in two practical parallel programming languages, PCN and Fortran M. Compilers have been developed for both languages, allowing experimentation in substantial applications.
Distributed parallel messaging for multiprocessor systems
Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burrow, Burhard; Sugawara, Yutaka
2013-06-04
A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit includes a switch interface for reading from the memory system when injecting packets into the network.
Massively parallel Fokker-Planck code ALLAp
Batishcheva, A.A.; Krasheninnikov, S.I.; Craddock, G.G.; Djordjevic, V.
1996-01-01
The Fokker-Planck code ALLA, recently developed for workstations, simulates the temporal evolution of 1V, 2V and 1D2V collisional edge plasmas. In this work we present the results of code parallelization on the CRI T3D massively parallel platform (ALLAp version). Simultaneously we benchmark the 1D2V parallel version against an analytic self-similar solution of the collisional kinetic equation. This test is not trivial as it demands a very strong spatial temperature and density variation within the simulation domain. (orig.)
D. C. Kent; Won Keun Min
2002-01-01
Neighborhood spaces, pretopological spaces, and closure spaces are topological space generalizations which can be characterized by means of their associated interior (or closure) operators. The category NBD of neighborhood spaces and continuous maps contains PRTOP as a bicoreflective subcategory and CLS as a bireflective subcategory, whereas TOP is bireflectively embedded in PRTOP and bicoreflectively embedded in CLS. Initial and final structures are described in these categories, and it is s...
A scalable method for parallelizing sampling-based motion planning algorithms
Jacobs, Sam Ade; Manavi, Kasra; Burgos, Juan; Denny, Jory; Thomas, Shawna; Amato, Nancy M.
2012-01-01
This paper describes a scalable method for parallelizing sampling-based motion planning algorithms. It subdivides configuration space (C-space) into (possibly overlapping) regions and independently, in parallel, uses standard (sequential) sampling-based planners to construct roadmaps in each region. Next, in parallel, regional roadmaps in adjacent regions are connected to form a global roadmap. By subdividing the space and restricting the locality of connection attempts, we reduce the work and inter-processor communication associated with nearest neighbor calculation, a critical bottleneck for scalability in existing parallel motion planning methods. We show that our method is general enough to handle a variety of planning schemes, including the widely used Probabilistic Roadmap (PRM) and Rapidly-exploring Random Trees (RRT) algorithms. We compare our approach to two other existing parallel algorithms and demonstrate that our approach achieves better and more scalable performance. Our approach achieves almost linear scalability on a 2400 core LINUX cluster and on a 153,216 core Cray XE6 petascale machine. © 2012 IEEE.
Parallelization of Rocket Engine Simulator Software (PRESS)
Cezzar, Ruknet
1998-01-01
We have outlined our work in the last half of the funding period. We have shown how a demo package for RESSAP using MPI can be done. However, we also mentioned the difficulties with the UNIX platform. We have reiterated some of the suggestions made during the presentation of the progress at the Fourth Annual HBCU Conference. Although we have discussed, in some detail, how TURBDES/PUMPDES software can be run in parallel using MPI, at present, we are unable to experiment any further with either MPI or PVM. Due to X windows not being implemented, we are also not able to experiment further with XPVM, which, it will be recalled, has a nice GUI interface. There are also some concerns, on our part, about MPI being an appropriate tool. The best thing about MPI is that it is public domain. Although plenty of documentation exists for the intricacies of using MPI, little information is available on its actual implementations. Other than very typical, somewhat contrived examples, such as the Jacobi algorithm for solving Laplace's equation, there are few examples which can readily be applied to real situations, such as in our case. In effect, the review of literature on both MPI and PVM, and there is a lot, indicates something similar to the enormous effort which was spent on LISP and LISP-like languages as tools for artificial intelligence research. During the development of a book on programming languages [12], when we searched the literature for very simple examples like taking averages, reading and writing records, multiplying matrices, etc., we could hardly find any! Yet, so much was said and done on that topic in academic circles. It appears that we faced the same problem with MPI, where despite significant documentation, we could not find even a simple example which supports coarse-grain parallelism involving only a few processes. From the foregoing, it appears that a new direction may be required for more productive research during the extension period (10/19/98 - 10
MCBooster: a tool for MC generation for massively parallel platforms
Alves Junior, Antonio Augusto
2016-01-01
MCBooster is a header-only, C++11-compliant library for the generation of large samples of phase-space Monte Carlo events on massively parallel platforms. It was released on GitHub in the spring of 2016. The library core algorithms implement the Raubold-Lynch method; they are able to generate the full kinematics of decays with up to nine particles in the final state. The library supports the generation of sequential decays as well as the parallel evaluation of arbitrary functions over the generated events. The output of MCBooster completely accords with popular and well-tested software packages such as GENBOD (W515 from CERNLIB) and TGenPhaseSpace from the ROOT framework. MCBooster is developed on top of the Thrust library and runs on Linux systems. It deploys transparently on NVidia CUDA-enabled GPUs as well as multicore CPUs. This contribution summarizes the main features of MCBooster. A basic description of the user interface and some examples of applications are provided, along with measurements of perfor...
Peyroux, J
2005-11-15
This project aims to make the solution of Vlasov codes even more powerful through various parallelization tools (MPI, OpenMP...). A simplified test case served as a basis for constructing the parallel codes and obtaining a data-processing skeleton which could thereafter be re-used for increasingly complex models (more than four phase-space variables). This will make it possible to treat more realistic situations linked, for example, to the injection of ultra-short and ultra-intense pulses in inertial fusion plasmas, or to the study of the trapped-ion instability now considered responsible for the generation of turbulence in tokamak plasmas. (author)
Parallel Density-Based Clustering for Discovery of Ionospheric Phenomena
Pankratius, V.; Gowanlock, M.; Blair, D. M.
2015-12-01
Ionospheric total electron content maps derived from global networks of dual-frequency GPS receivers can reveal a plethora of ionospheric features in real-time and are key to space weather studies and natural hazard monitoring. However, growing data volumes from expanding sensor networks are making manual exploratory studies challenging. As the community is heading towards Big Data ionospheric science, automation and Computer-Aided Discovery become indispensable tools for scientists. One problem of machine learning methods is that they require domain-specific adaptations in order to be effective and useful for scientists. Addressing this problem, our Computer-Aided Discovery approach allows scientists to express various physical models as well as perturbation ranges for parameters. The search space is explored through an automated system and parallel processing of batched workloads, which finds corresponding matches and similarities in empirical data. We discuss density-based clustering as a particular method we employ in this process. Specifically, we adapt Density-Based Spatial Clustering of Applications with Noise (DBSCAN). This algorithm groups geospatial data points based on density. Clusters of points can be of arbitrary shape, and the number of clusters is not predetermined by the algorithm; only two input parameters need to be specified: (1) a distance threshold, (2) a minimum number of points within that threshold. We discuss an implementation of DBSCAN for batched workloads that is amenable to parallelization on manycore architectures such as Intel's Xeon Phi accelerator with 60+ general-purpose cores. This manycore parallelization can cluster large volumes of ionospheric total electron content data quickly. Potential applications for cluster detection include the visualization, tracing, and examination of traveling ionospheric disturbances or other propagating phenomena. Acknowledgments. We acknowledge support from NSF ACI-1442997 (PI V. Pankratius).
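The two DBSCAN parameters mentioned above (a distance threshold and a minimum point count) can be seen at work in a minimal sequential sketch. This is the textbook algorithm with a brute-force neighbour search, not the authors' batched manycore implementation; the names `dbscan`, `region_query`, `eps` and `min_pts` are chosen here for illustration.

```python
import math

NOISE = -1
UNVISITED = None


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE          # may later be relabelled as a border point
            continue
        cluster += 1                   # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:   # j is also core: keep expanding
                queue.extend(j_neighbours)
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))
```

The brute-force neighbour search makes this O(N^2); each `region_query` call is independent of the others, which is one place where a parallel version could distribute work.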
Massively Parallel Computing: A Sandia Perspective
Dosanjh, Sudip S.; Greenberg, David S.; Hendrickson, Bruce; Heroux, Michael A.; Plimpton, Steve J.; Tomkins, James L.; Womble, David E.
1999-05-06
The computing power available to scientists and engineers has increased dramatically in the past decade, due in part to progress in making massively parallel computing practical and available. The expectation for these machines has been great. The reality is that progress has been slower than expected. Nevertheless, massively parallel computing is beginning to realize its potential for enabling significant break-throughs in science and engineering. This paper provides a perspective on the state of the field, colored by the authors' experiences using large scale parallel machines at Sandia National Laboratories. We address trends in hardware, system software and algorithms, and we also offer our view of the forces shaping the parallel computing industry.
Parallel generation of architecture on the GPU
Steinberger, Markus; Kenzel, Michael; Kainz, Bernhard K.; Müller, Jörg; Wonka, Peter; Schmalstieg, Dieter
2014-01-01
they can take advantage of, or both, our method supports state of the art procedural modeling including stochasticity and context-sensitivity. To increase parallelism, we explicitly express independence in the grammar, reduce inter-rule dependencies
New high voltage parallel plate analyzer
Hamada, Y.; Kawasumi, Y.; Masai, K.; Iguchi, H.; Fujisawa, A.; Abe, Y.
1992-01-01
A new modification of the parallel plate analyzer for 500 keV heavy ions, intended to eliminate the effect of intense UV and visible radiation, has been carried out successfully. Its principle and results are discussed. (author)
Parallel data encryption with RSA algorithm
Неретин, А. А.
2016-01-01
In this paper, a parallel RSA algorithm with preliminary shuffling of the source text is presented. The dependence of encryption speed on the number of encryption nodes is analysed. The proposed algorithm was implemented in C#.
Data parallel sorting for particle simulation
Dagum, Leonardo
1992-01-01
Sorting on a parallel architecture is a communications intensive event which can incur a high penalty in applications where it is required. In the case of particle simulation, only integer sorting is necessary, and sequential implementations easily attain the minimum performance bound of O(N) for N particles. Parallel implementations, however, have to cope with the parallel sorting problem which, in addition to incurring a heavy communications cost, can make the minimum performance bound difficult to attain. This paper demonstrates how the sorting problem in a particle simulation can be reduced to a merging problem, and describes an efficient data parallel algorithm to solve this merging problem in a particle simulation. The new algorithm is shown to be optimal under conditions usual for particle simulation, and its fieldwise implementation on the Connection Machine is analyzed in detail. The new algorithm is about four times faster than a fieldwise implementation of radix sort on the Connection Machine.
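The sequential O(N) integer sort that the abstract takes as its baseline can be sketched as a counting sort of particles by cell index. This illustrates only that sequential baseline, under the assumption that each particle carries an integer cell index; it is not the paper's data-parallel merging algorithm, and `sort_particles_by_cell` is a hypothetical name.

```python
def sort_particles_by_cell(cell_of, n_cells):
    """Return particle indices ordered by cell, via a stable counting sort.

    cell_of[p] is the integer cell index of particle p (0 <= index < n_cells).
    Runs in O(N + n_cells) time for N particles.
    """
    counts = [0] * (n_cells + 1)
    for c in cell_of:                 # histogram of particles per cell
        counts[c + 1] += 1
    for c in range(n_cells):          # prefix sum: first output slot of each cell
        counts[c + 1] += counts[c]
    order = [0] * len(cell_of)
    next_slot = counts[:-1]           # copy: running write position per cell
    for p, c in enumerate(cell_of):
        order[next_slot[c]] = p
        next_slot[c] += 1
    return order


if __name__ == "__main__":
    cells = [2, 0, 1, 0, 2, 1]
    print(sort_particles_by_cell(cells, n_cells=3))
```

Because the sort is stable, particles within the same cell keep their original relative order.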
Parallel debt in the Serbian finance law
Kuzman Miloš
2014-01-01
The purpose of this paper is to present the mechanism of parallel debt in the Serbian financial law. While considering whether the mechanism of parallel debt exists under the Serbian law, the Anglo-Saxon mechanism of trust is presented. Hence it is explained why the mechanism of trust is not allowed under the Serbian law. Further on, the mechanism of parallel debt is introduced as well as a debate on permissibility of its cause in the Serbian law. Comparative legal arguments about this issue are also presented in this paper. In conclusion, the author suggests that on the basis of the conclusions drawn in this paper, the parallel debt mechanism is to be declared admissible if it is ever taken into consideration by the Serbian courts.
Parallel Monte Carlo simulation of aerosol dynamics
Zhou, K.; He, Z.; Xiao, M.; Zhang, Z.
2014-01-01
is simulated with a stochastic method (Marcus-Lushnikov stochastic process). Operator splitting techniques are used to synthesize the deterministic and stochastic parts in the algorithm. The algorithm is parallelized using the Message Passing Interface (MPI
Stranger than fiction: parallel universes beguile science
2007-01-01
We may not be able - at least not yet - to prove they exist, many serious scientists say, but there are plenty of reasons to think that parallel dimensions are more than figments of effeaded imagination. (1/2 page)
Parallel computation of nondeterministic algorithms in VLSI
Hortensius, P D
1987-01-01
This work examines parallel VLSI implementations of nondeterministic algorithms. It is demonstrated that conventional pseudorandom number generators are unsuitable for highly parallel applications. Efficient parallel pseudorandom sequence generation can be accomplished using certain classes of elementary one-dimensional cellular automata. The pseudorandom numbers appear in parallel on each clock cycle. Extensive study of the properties of these new pseudorandom number generators is made using standard empirical random number tests, cycle length tests, and implementation considerations. Furthermore, it is shown these particular cellular automata can form the basis of efficient VLSI architectures for computations involved in the Monte Carlo simulation of both the percolation and Ising models from statistical mechanics. Finally, a variation on a Built-In Self-Test technique based upon cellular automata is presented. These Cellular Automata-Logic-Block-Observation (CALBO) circuits improve upon conventional design for testability circuitry.
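To make the idea of a cellular-automaton pseudorandom generator concrete, here is a minimal sketch in which every cell updates simultaneously from its neighbours, so one full word appears per clock cycle. The abstract does not name the rules used; the hybrid of rules 90 and 150 with null (zero) boundaries shown here is an assumption for illustration, and no claim is made about cycle length or statistical quality for this particular rule assignment.

```python
def ca_step(state, rules):
    """Advance a 1-D two-state cellular automaton by one clock cycle.

    state: list of 0/1 cell values; rules: per-cell rule number, 90 or 150.
    Cells beyond either end are treated as constant 0 (null boundary).
    """
    n = len(state)
    nxt = []
    for i in range(n):
        left = state[i - 1] if i > 0 else 0
        right = state[i + 1] if i < n - 1 else 0
        if rules[i] == 90:
            nxt.append(left ^ right)                 # rule 90: XOR of neighbours
        elif rules[i] == 150:
            nxt.append(left ^ state[i] ^ right)      # rule 150: XOR incl. self
        else:
            raise ValueError("only rules 90 and 150 are supported here")
    return nxt


def ca_words(seed, rules, count):
    """Yield `count` successive states packed into integers, one per cycle."""
    state = list(seed)
    for _ in range(count):
        state = ca_step(state, rules)
        yield int("".join(map(str, state)), 2)


if __name__ == "__main__":
    rules = [150, 90, 150, 90]   # hypothetical rule assignment
    print(list(ca_words([1, 0, 0, 0], rules, 5)))
```

Each cell depends only on its immediate neighbours, which is what makes the update local and suitable for a per-cell hardware realisation.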
Adapting algorithms to massively parallel hardware
Sioulas, Panagiotis
2016-01-01
In recent years, the trend in computing has shifted from delivering processors with faster clock speeds to increasing the number of cores per processor. This marks a paradigm shift towards parallel programming in which applications are programmed to exploit the power provided by multi-cores. Usually there is gain in terms of the time-to-solution and the memory footprint. Specifically, this trend has sparked an interest in massively parallel systems that can provide a large number of processors, and possibly computing nodes, as in the GPUs and MPPAs (Massively Parallel Processor Arrays). In this project, the focus was on two distinct computing problems: k-d tree searches and track seeding cellular automata. The goal was to adapt the algorithms to parallel systems and evaluate their performance in different cases.
Implementing Shared Memory Parallelism in MCBEND
Bird Adam
2017-01-01
MCBEND is a general purpose radiation transport Monte Carlo code from AMEC Foster Wheeler's ANSWERS® Software Service. MCBEND is well established in the UK shielding community for radiation shielding and dosimetry assessments. The existing MCBEND parallel capability effectively involves running the same calculation on many processors. This works very well except when the memory requirements of a model restrict the number of instances of a calculation that will fit on a machine. To more effectively utilise parallel hardware, OpenMP has been used to implement shared memory parallelism in MCBEND. This paper describes the reasoning behind the choice of OpenMP, notes some of the challenges of multi-threading an established code such as MCBEND and assesses the performance of the parallel method implemented in MCBEND.
Domain decomposition methods and parallel computing
Meurant, G.
1991-01-01
In this paper, we show how to efficiently solve large linear systems on parallel computers. These linear systems arise from discretization of scientific computing problems described by systems of partial differential equations. We show how to get a discrete finite dimensional system from the continuous problem and the chosen conjugate gradient iterative algorithm is briefly described. Then, the different kinds of parallel architectures are reviewed and their advantages and deficiencies are emphasized. We sketch the problems found in programming the conjugate gradient method on parallel computers. For this algorithm to be efficient on parallel machines, domain decomposition techniques are introduced. We give results of numerical experiments showing that these techniques allow a good rate of convergence for the conjugate gradient algorithm as well as computational speeds in excess of a billion floating point operations per second. (author). 5 refs., 11 figs., 2 tabs., 1 inset
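For readers unfamiliar with the conjugate gradient iteration mentioned above, a minimal serial sketch follows. It is the plain, unpreconditioned method for a symmetric positive definite matrix stored as a list of rows; the domain decomposition techniques the paper introduces are not reproduced, and the helper names are chosen here for illustration.

```python
def matvec(a, x):
    """Dense matrix-vector product; a is a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_gradient(a, b, tol=1e-10, max_iter=1000):
    """Solve a x = b for symmetric positive definite a, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)                # residual b - a x, with x = 0
    p = list(r)                # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = matvec(a, p)
        alpha = rs_old / dot(p, ap)                     # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                          # keeps directions conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    a = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(conjugate_gradient(a, b))
```

The matrix-vector product and the two inner products per iteration dominate the cost; the inner products require a global reduction, which is typically where communication enters on a parallel machine.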
6th International Parallel Tools Workshop
Brinkmann, Steffen; Gracia, José; Resch, Michael; Nagel, Wolfgang
2013-01-01
The latest advances in the High Performance Computing hardware have significantly raised the level of available compute performance. At the same time, the growing hardware capabilities of modern supercomputing architectures have caused an increasing complexity of the parallel application development. Despite numerous efforts to improve and simplify parallel programming, there is still a lot of manual debugging and tuning work required. This process is supported by special software tools, facilitating debugging, performance analysis, and optimization and thus making a major contribution to the development of robust and efficient parallel software. This book introduces a selection of the tools, which were presented and discussed at the 6th International Parallel Tools Workshop, held in Stuttgart, Germany, 25-26 September 2012.
Parallel processor programs in the Federal Government
Schneck, P. B.; Austin, D.; Squires, S. L.; Lehmann, J.; Mizell, D.; Wallgren, K.
1985-01-01
In 1982, a report dealing with the nation's research needs in high-speed computing called for increased access to supercomputing resources for the research community, research in computational mathematics, and increased research in the technology base needed for the next generation of supercomputers. Since that time a number of programs addressing future generations of computers, particularly parallel processors, have been started by U.S. government agencies. The present paper provides a description of the largest government programs in parallel processing. Established in fiscal year 1985 by the Institute for Defense Analyses for the National Security Agency, the Supercomputing Research Center will pursue research to advance the state of the art in supercomputing. Attention is also given to the DOE applied mathematical sciences research program, the NYU Ultracomputer project, the DARPA multiprocessor system architectures program, NSF research on multiprocessor systems, ONR activities in parallel computing, and NASA parallel processor projects.
Density functional theory and parallel processing
Ward, R.C.; Geist, G.A.; Butler, W.H.
1987-01-01
The authors demonstrate a method for obtaining the ground state energies and charge densities of a system of atoms described within density functional theory using simulated annealing on a parallel computer
High performance parallel computers for science
Nash, T.; Areti, H.; Atac, R.; Biel, J.; Cook, A.; Deppe, J.; Edel, M.; Fischler, M.; Gaines, I.; Hance, R.
1989-01-01
This paper reports that Fermilab's Advanced Computer Program (ACP) has been developing cost effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor has been developed for theoretical physics. Each $4000 node is a FORTRAN or C programmable pipelined 20 Mflops (peak), 10 MByte single board computer. These are plugged into a 16 port crossbar switch crate which handles both inter- and intra-crate communication. The crates are connected in a hypercube. Site oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256 node, 5 GFlop, system is under construction.
Limpanuparb, Taweetham; Milthorpe, Josh; Rendell, Alistair P
2014-10-30
Use of the modern parallel programming language X10 for computing long-range Coulomb and exchange interactions is presented. By using X10, a partitioned global address space language with support for task parallelism and the explicit representation of data locality, the resolution of the Ewald operator can be parallelized in a straightforward manner including use of both intranode and internode parallelism. We evaluate four different schemes for dynamic load balancing of integral calculation using X10's work stealing runtime, and report performance results for long-range HF energy calculation of large molecule/high quality basis running on up to 1024 cores of a high performance cluster machine. Copyright © 2014 Wiley Periodicals, Inc.
A parallel graded-mesh FDTD algorithm for human-antenna interaction problems.
Catarinucci, Luca; Tarricone, Luciano
2009-01-01
The finite difference time domain method (FDTD) is frequently used for the numerical solution of a wide variety of electromagnetic (EM) problems and, among them, those concerning human exposure to EM fields. In many practical cases related to the assessment of occupational EM exposure, large simulation domains are modeled and high space resolution adopted, so that strong memory and central processing unit power requirements have to be satisfied. To better afford the computational effort, the use of parallel computing is a winning approach; alternatively, subgridding techniques are often implemented. However, the simultaneous use of subgridding schemes and parallel algorithms is very new. In this paper, an easy-to-implement and highly-efficient parallel graded-mesh (GM) FDTD scheme is proposed and applied to human-antenna interaction problems, demonstrating its appropriateness in dealing with complex occupational tasks and showing its capability to guarantee the advantages of a traditional subgridding technique without affecting the parallel FDTD performance.
Unpacking the cognitive map: the parallel map theory of hippocampal function.
Jacobs, Lucia F; Schenk, Françoise
2003-04-01
In the parallel map theory, the hippocampus encodes space with 2 mapping systems. The bearing map is constructed primarily in the dentate gyrus from directional cues such as stimulus gradients. The sketch map is constructed within the hippocampus proper from positional cues. The integrated map emerges when data from the bearing and sketch maps are combined. Because the component maps work in parallel, the impairment of one can reveal residual learning by the other. Such parallel function may explain paradoxes of spatial learning, such as learning after partial hippocampal lesions, taxonomic and sex differences in spatial learning, and the function of hippocampal neurogenesis. By integrating evidence from physiology to phylogeny, the parallel map theory offers a unified explanation for hippocampal function.
Parallel electric fields in a simulation of magnetotail reconnection and plasmoid evolution
Hesse, M.; Birn, J.
1990-01-01
Properties of the electric field component parallel to the magnetic field are investigated in a 3D MHD simulation of plasmoid formation and evolution in the magnetotail, in the presence of a net dawn-dusk magnetic field component. The spatial localization of E-parallel, and the concept of a diffusion zone and the role of E-parallel in accelerating electrons are discussed. A localization of the region of enhanced E-parallel in all space directions is found, with a strong concentration in the z direction. This region is identified as the diffusion zone, which plays a crucial role in reconnection theory through the local break-down of magnetic flux conservation. 12 refs
Massively parallel evolutionary computation on GPGPUs
Tsutsui, Shigeyoshi
2013-01-01
Evolutionary algorithms (EAs) are metaheuristics that learn from natural collective behavior and are applied to solve optimization problems in domains such as scheduling, engineering, bioinformatics, and finance. Such applications demand acceptable solutions with high-speed execution using finite computational resources. Therefore, there have been many attempts to develop platforms for running parallel EAs using multicore machines, massively parallel cluster machines, or grid computing environments. Recent advances in general-purpose computing on graphics processing units (GPGPU) have opened u
Freeman, Bryan
2013-01-01
This book contains practical recipes on everything you will need to create task-based parallel programs using C#, .NET 4.5, and Visual Studio. The book is packed with illustrated code examples to create scalable programs. This book is intended to help experienced C# developers write applications that leverage the power of modern multicore processors. It provides the necessary knowledge for an experienced C# developer to work with .NET parallelism APIs. Previous experience of writing multithreaded applications is not necessary.
Alternative derivation of the parallel ion viscosity
Bravenec, R.V.; Berk, H.L.; Hammer, J.H.
1982-01-01
A set of double-adiabatic fluid equations with additional collisional relaxation between the ion temperatures parallel and perpendicular to a magnetic field is shown to reduce to a set involving a single temperature and a parallel viscosity. This result is applied to a recently published paper [R. V. Bravenec, A. J. Lichtenberg, M. A. Lieberman, and H. L. Berk, Phys. Fluids 24, 1320 (1981)] on viscous flow in a multiple-mirror configuration.
Acoustic simulation in architecture with parallel algorithm
Li, Xiaohong; Zhang, Xinrong; Li, Dan
2004-03-01
To handle the complexity of architectural environments and the demands of real-time simulation of architectural acoustics, a parallel radiosity algorithm was developed. The distribution of sound energy in a scene is solved with this method. The impulse responses between sources and receivers in each frequency segment, calculated by multiple processes, are then combined into the whole frequency response. The numerical experiment shows that the parallel algorithm can improve the efficiency of acoustic simulation for complex scenes.
PARALLEL SOLUTION METHODS OF PARTIAL DIFFERENTIAL EQUATIONS
Korhan KARABULUT
1998-03-01
Partial differential equations arise in almost all fields of science and engineering. More computer time is spent solving partial differential equations than on any other problem class. For this reason, partial differential equations are well suited to parallel computers, which offer great computational power. In this study, the parallel solution of partial differential equations with the Jacobi, Gauss-Seidel, SOR (Successive Over-Relaxation), and SSOR (Symmetric SOR) algorithms is studied.
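As a minimal sketch of why the Jacobi method in particular suits parallel computers, the example below applies Jacobi sweeps to Laplace's equation on a small square grid. Every interior update reads only values from the previous sweep, so all points could be updated concurrently. The grid size, boundary values, and the name `jacobi_laplace` are assumptions chosen for illustration and are not taken from the study.

```python
def jacobi_laplace(n=8, sweeps=200, top=1.0):
    """Run Jacobi sweeps for Laplace's equation on an n x n grid."""
    # Boundary condition (assumed): top edge held at `top`, other edges at 0.
    u = [[0.0] * n for _ in range(n)]
    u[0] = [top] * n
    for _ in range(sweeps):
        # Write into a copy so every update uses only the previous sweep.
        new = [row[:] for row in u]
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                    + u[i][j - 1] + u[i][j + 1])
        u = new
    return u
```

Gauss-Seidel and SOR reuse freshly updated neighbours within a sweep, which is why they usually need a reordering such as red-black colouring before they can be parallelized in the same way.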
Current distribution characteristics of superconducting parallel circuits
Mori, K.; Suzuki, Y.; Hara, N.; Kitamura, M.; Tominaka, T.
1994-01-01
To increase the current-carrying capacity of the current path in a superconducting magnet system, parallel circuits such as insulated multi-strand cables or parallel persistent current switches (PCS) are used. In superconducting parallel circuits of an insulated multi-strand cable or a parallel PCS, the current distribution during the current sweep, the persistent mode, and the quench process was investigated. Two methods were used to measure the current distribution. (1) Each strand was surrounded with a pure iron core with an air gap, in which a Hall probe was located. The accuracy of this method was degraded by the magnetic hysteresis of the iron. (2) A Rogowski coil without iron was used to measure the current of each path in a 4-parallel PCS. As a result, it was shown that the current distribution characteristics of a parallel PCS are very similar to those of an insulated multi-strand cable for the quench process.
Parallel processing of structural integrity analysis codes
Swami Prasad, P.; Dutta, B.K.; Kushwaha, H.S.
1996-01-01
Structural integrity analysis plays an important role in assessing and demonstrating the safety of nuclear reactor components. This analysis is performed using analytical tools such as the Finite Element Method (FEM) with the help of digital computers. The complexity of the problems involved in nuclear engineering demands high-speed computation facilities to obtain solutions in a reasonable amount of time. Parallel processing systems such as ANUPAM provide an efficient platform for realising high-speed computation. The development and implementation of software on parallel processing systems is an interesting and challenging task. The data and algorithm structure of the codes plays an important role in exploiting the capabilities of the parallel processing system. Structural analysis codes based on FEM can be divided into two categories with respect to their implementation on parallel processing systems. Codes in the first category, such as those used for harmonic analysis and mechanistic fuel performance codes, do not require parallelisation of their individual modules. Codes in the second category, such as conventional FEM codes, do require parallelisation of individual modules. In this category, parallelisation of the equation solution module poses major difficulties. Different solution schemes such as the domain decomposition method (DDM), the parallel active column solver, and the substructuring method are currently used on parallel processing systems. Two codes, FAIR and TABS, one from each of these categories, have been implemented on ANUPAM. The implementation details of these codes and the performance of different equation solvers are highlighted. (author). 5 refs., 12 figs., 1 tab
Adelstein, Pamela
2018-01-01
A space can be sacred, providing those who inhabit it with a sense of transcendence: being connected to something greater than oneself. The sacredness may be inherent in the space, as for a religious institution or a serene place outdoors. Alternatively, a space may be made sacred by the people within it and the events that occur there. As medical providers, we have the opportunity to create sacred space in our examination rooms and with our patient interactions. This sacred space can be healing to our patients and can bring us providers opportunities for increased connection, joy, and gratitude in our daily work.
Adams, Robert A
2003-01-01
Sobolev Spaces presents an introduction to the theory of Sobolev spaces and other related spaces of functions, and to the imbedding characteristics of these spaces. This theory is widely used in pure and applied mathematics and in the physical sciences. This second edition of Adams' "classic" reference text contains many additions and much modernizing and refining of material. The basic premise of the book remains unchanged: Sobolev Spaces is intended to provide a solid foundation in these spaces for graduate students and researchers alike.
* Self-contained and accessible for readers in other disciplines.
* Written at elementary level, making it accessible to graduate students.
Sorenson, Jonathan P.
2010-01-01
We extend the known tables of pseudosquares and pseudocubes, discuss the implications of these new data on the conjectured distribution of pseudosquares and pseudocubes, and present the details of the algorithm used to do this work. Our algorithm is based on the space-saving wheel data structure combined with doubly-focused enumeration, run in parallel on a cluster supercomputer.
Massively-parallel best subset selection for ordinary least-squares regression
Gieseke, Fabian; Polsterer, Kai Lars; Mahabal, Ashish
2017-01-01
Selecting an optimal subset of k out of d features for linear regression models given n training instances is often considered intractable for feature spaces with hundreds or thousands of dimensions. We propose an efficient massively-parallel implementation for selecting such optimal feature...
Diffraction of love waves by two parallel perfectly weak half planes
Asghar, S.; Zaman, F.D.; Ayub, M.
1986-04-01
We consider the diffraction of Love waves by two parallel perfectly weak half planes in a layer overlying a half space. The problem is formulated in terms of the Wiener-Hopf equations in the transformed plane. The transmitted waves are then calculated using the Wiener-Hopf procedure and inverse transforms. (author)
A parallel FE-FV scheme to solve fluid flow in complex geologic media
Coumou, Dim; Matthäi, Stephan; Geiger, Sebastian; Driesner, Thomas
2008-01-01
Field data-based simulations of geologic systems require much computational time because of their mathematical complexity and the often desired large scales in space and time. To conduct accurate simulations in an acceptable time period, methods to reduce runtime are required. A parallelization
Sparse Probabilistic Parallel Factor Analysis for the Modeling of PET and Task-fMRI Data
Beliveau, Vincent; Papoutsakis, Georgios; Hinrich, Jesper Løve
2017-01-01
Modern datasets are often multiway in nature and can contain patterns common to a mode of the data (e.g. space, time, and subjects). Multiway decompositions such as parallel factor analysis (PARAFAC) take into account the intrinsic structure of the data, and sparse versions of these methods improv...
Modelling and simulation of multiple single-phase induction motors in parallel connection
Sujitjorn, S.
2006-11-01
A mathematical model for parallel connected n-multiple single-phase induction motors in generalized state-space form is proposed in this paper. The motor group draws electric power from one inverter. The model is developed by the dq-frame theory and was tested against four loading scenarios in which satisfactory results were obtained.
Harmonic analysis on symmetric spaces
Terras, Audrey
This text explores the geometry and analysis of higher rank analogues of the symmetric spaces introduced in volume one. To illuminate both the parallels and differences of the higher rank theory, the space of positive matrices is treated in a manner mirroring that of the upper-half space in volume one. This concrete example furnishes motivation for the general theory of noncompact symmetric spaces, which is outlined in the final chapter. The book emphasizes motivation and comprehensibility, concrete examples and explicit computations (by pen and paper, and by computer), history, and, above all, applications in mathematics, statistics, physics, and engineering. The second edition includes new sections on Donald St. P. Richards’s central limit theorem for O(n)-invariant random variables on the symmetric space of GL(n, R), on random matrix theory, and on advances in the theory of automorphic forms on arithmetic groups.
Parallel Architectures and Parallel Algorithms for Integrated Vision Systems. Ph.D. Thesis
Choudhary, Alok Nidhi
1989-01-01
Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to perform for a high level application (e.g., object recognition). An IVS normally involves algorithms from low level, intermediate level, and high level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. Several issues are addressed in parallel architectures and parallel algorithms for integrated vision systems.
Through the lens of a space tourist
Julia Tcharfas
2015-11-01
This essay attempts to contextualise the experience and documentation of the world’s first space tourist, a multi-millionaire American businessman Dennis Tito, who vacationed on the International Space Station in 2001. The essay brings together two parallel elements of this historical event: the political transformation of the Russian space programme which made the private flight possible and the cultural significance and impact of the event. The first space tourist is both a direct product of the newly commercialised space programme and a reflection of a new worldview, with new values and expectations.
Concurrent computation of attribute filters on shared memory parallel machines
Wilkinson, Michael H.F.; Gao, Hui; Hesselink, Wim H.; Jonker, Jan-Eppo; Meijster, Arnold
2008-01-01
Morphological attribute filters have not previously been parallelized mainly because they are both global and nonseparable. We propose a parallel algorithm that achieves efficient parallelism for a large class of attribute filters, including attribute openings, closings, thinnings, and thickenings,
A task parallel implementation of fast multipole methods
Taura, Kenjiro; Nakashima, Jun; Yokota, Rio; Maruyama, Naoya
2012-01-01
This paper describes a task parallel implementation of ExaFMM, an open source implementation of fast multipole methods (FMM), using a lightweight task parallel library MassiveThreads. Although there have been many attempts on parallelizing FMM
Parallel phase model : a programming model for high-end parallel machines with manycores.
Wu, Junfeng (Syracuse University, Syracuse, NY); Wen, Zhaofang; Heroux, Michael Allen; Brightwell, Ronald Brian
2009-04-01
This paper presents a parallel programming model, Parallel Phase Model (PPM), for next-generation high-end parallel machines based on a distributed memory architecture consisting of a networked cluster of nodes with a large number of cores on each node. PPM has a unified high-level programming abstraction that facilitates the design and implementation of parallel algorithms to exploit both the parallelism of the many cores and the parallelism at the cluster level. The programming abstraction will be suitable for expressing both fine-grained and coarse-grained parallelism. It includes a few high-level parallel programming language constructs that can be added as an extension to an existing (sequential or parallel) programming language such as C; and the implementation of PPM also includes a light-weight runtime library that runs on top of an existing network communication software layer (e.g. MPI). Design philosophy of PPM and details of the programming abstraction are also presented. Several unstructured applications that inherently require high-volume random fine-grained data accesses have been implemented in PPM with very promising results.
He, H.-Q.; Wan, W.
2012-01-01
The parallel mean free path of solar energetic particles (SEPs), which is determined by physical properties of SEPs as well as those of solar wind, is a very important parameter in space physics to study the transport of charged energetic particles in the heliosphere, especially for space weather forecasting. In space weather practice, it is necessary to find a quick approach to obtain the parallel mean free path of SEPs for a solar event. In addition, the adiabatic focusing effect caused by a spatially varying mean magnetic field in the solar system is important to the transport processes of SEPs. Recently, Shalchi presented an analytical description of the parallel diffusion coefficient with adiabatic focusing. Based on Shalchi's results, in this paper we provide a direct analytical formula as a function of parameters concerning the physical properties of SEPs and solar wind to directly and quickly determine the parallel mean free path of SEPs with adiabatic focusing. Since all of the quantities in the analytical formula can be directly observed by spacecraft, this direct method would be a very useful tool in space weather research. As applications of the direct method, we investigate the inherent relations between the parallel mean free path and various parameters concerning physical properties of SEPs and solar wind. Comparisons of parallel mean free paths with and without adiabatic focusing are also presented.
Parallel evolutionary computation in bioinformatics applications.
Pinho, Jorge; Sobral, João Luis; Rocha, Miguel
2013-05-01
A large number of optimization problems within the field of Bioinformatics require methods able to handle their inherent complexity (e.g. NP-hard problems) and also demand increased computational effort. In this context, the use of parallel architectures is a necessity. In this work, we propose ParJECoLi, a Java based library that offers a large set of metaheuristic methods (such as Evolutionary Algorithms) and also addresses the issue of their efficient execution on a wide range of parallel architectures. The proposed approach focuses on ease of use, making the adaptation to distinct parallel environments (multicore, cluster, grid) transparent to the user. Indeed, this work shows how the development of the optimization library can proceed independently of its adaptation for several architectures, making use of Aspect-Oriented Programming. The pluggable nature of parallelism related modules allows the user to easily configure its environment, adding parallelism modules to the base source code when needed. The performance of the platform is validated with two case studies within biological model optimization.
Parallelization of Subchannel Analysis Code MATRA
Kim, Seongjin; Hwang, Daehyun; Kwon, Hyouk
2014-01-01
A stand-alone MATRA calculation takes an acceptable computing time for thermal margin calculations, but considerable time is needed to solve whole-core pin-by-pin problems. In addition, the computation speed of the MATRA code must be improved to meet the overall performance requirements of multi-physics coupling calculations. Therefore, a parallel approach to improve and optimize the computability of the MATRA code is proposed and verified in this study. The parallel algorithm is embodied in the MATRA code using the MPI communication method, and modification of the previous code structure was minimized. The improvement is confirmed by comparing results between the single- and multiple-processor algorithms. The speedup and efficiency are also evaluated as the number of processors increases. The parallel algorithm was implemented in the subchannel code MATRA using MPI. The performance of the parallel algorithm was verified by comparing the results with those from MATRA on a single processor. The performance of the MATRA code was greatly improved by the parallel algorithm for the 1/8-core and whole-core problems.
Improvement of Parallel Algorithm for MATRA Code
Kim, Seong-Jin; Seo, Kyong-Won; Kwon, Hyouk; Hwang, Dae-Hyun
2014-01-01
A feasibility study on parallelizing the MATRA code was conducted at KAERI early this year. As a result, a parallel algorithm for the MATRA code was developed to reduce the considerable computing time required to solve a large problem, such as a whole-core pin-by-pin problem of a general PWR reactor, and to improve the overall performance of multi-physics coupling calculations. It was shown that the performance of the MATRA code was greatly improved by implementing the parallel algorithm using MPI communication. For 1/8-core and whole-core problems for the SMART reactor, a speedup of about 10 was obtained when 25 processors were used. However, it was also shown that the performance deteriorated as the number of axial nodes increased. In this paper, the communication procedure between processors is optimized to improve the previous parallel algorithm. To address the performance deterioration of the parallelized MATRA code, a new communication algorithm between processors is presented. It was shown that the speedup improved and remained stable regardless of the number of axial nodes.
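To put the reported figures in context, the sketch below computes the parallel efficiency implied by a speedup of about 10 on 25 processors, along with the Karp-Flatt estimate of the serial fraction. Only the two input numbers come from the abstract; the derived values and the function names are our own illustration, not results reported by the authors.

```python
def parallel_efficiency(speedup, processors):
    """Efficiency = speedup divided by the number of processors."""
    return speedup / processors


def karp_flatt(speedup, processors):
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


# Figures quoted in the abstract: speedup of about 10 on 25 processors.
efficiency = parallel_efficiency(10.0, 25)   # 0.4
serial_fraction = karp_flatt(10.0, 25)       # 0.0625
```

An efficiency of roughly 40% is consistent with the abstract's remark that communication between processors was the part left to optimize.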
Iteration schemes for parallelizing models of superconductivity
Gray, P.A. [Michigan State Univ., East Lansing, MI (United States)
1996-12-31
The time dependent Lawrence-Doniach model, valid for high fields and high values of the Ginzburg-Landau parameter, is often used for studying vortex dynamics in layered high-T_c superconductors. When solving these equations numerically, the added degrees of complexity due to the coupling and nonlinearity of the model often warrant the use of high-performance computers for their solution. However, the interdependence between the layers can be manipulated so as to allow parallelization of the computations at an individual layer level. The reduced parallel tasks may then be solved independently using a heterogeneous cluster of networked workstations connected together with Parallel Virtual Machine (PVM) software. Here, this parallelization of the model is discussed and several computational implementations of varying degrees of parallelism are presented. Computational results are also given which contrast properties of convergence speed, stability, and consistency of these implementations. Included in these results are models involving the motion of vortices due to an applied current and pinning effects due to various material properties.
2005-01-01
Digital technologies and media are becoming increasingly embodied and entangled in the spaces and places at work and at home. However, our material environment is more than a geometric abstraction of space: it contains familiar places, social arenas for human action. For designers, the integration of digital technology with space poses new challenges that call for new approaches. Creative alternatives to traditional systems methodologies are called for when designers use digital media to create new possibilities for action in space. Design Spaces explores how design and media art can provide creative alternatives for integrating digital technology with space. Connecting practical design work with conceptual development and theorizing, art with technology, and user-centered methods with social sciences, Design Spaces provides a useful research paradigm for designing ubiquitous computing. This book...
Parallel visualization on leadership computing resources
Peterka, T; Ross, R B [Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439 (United States); Shen, H-W [Department of Computer Science and Engineering, Ohio State University, Columbus, OH 43210 (United States); Ma, K-L [Department of Computer Science, University of California at Davis, Davis, CA 95616 (United States); Kendall, W [Department of Electrical Engineering and Computer Science, University of Tennessee at Knoxville, Knoxville, TN 37996 (United States); Yu, H, E-mail: tpeterka@mcs.anl.go [Sandia National Laboratories, California, Livermore, CA 94551 (United States)
2009-07-01
Changes are needed in the way that visualization is performed, if we expect the analysis of scientific data to be effective at the petascale and beyond. By using similar techniques as those used to parallelize simulations, such as parallel I/O, load balancing, and effective use of interprocess communication, the supercomputers that compute these datasets can also serve as analysis and visualization engines for them. Our team is assessing the feasibility of performing parallel scientific visualization on some of the most powerful computational resources of the U.S. Department of Energy's National Laboratories in order to pave the way for analyzing the next generation of computational results. This paper highlights some of the conclusions of that research.
Parallelization of ITOUGH2 using PVM
Finsterle, Stefan
1998-01-01
ITOUGH2 inversions are computationally intensive because the forward problem must be solved many times to evaluate the objective function for different parameter combinations or to numerically calculate sensitivity coefficients. Most of these forward runs are independent of each other and can therefore be performed in parallel. Message passing based on the Parallel Virtual Machine (PVM) system has been implemented in ITOUGH2 to enable parallel processing of ITOUGH2 jobs on a heterogeneous network of Unix workstations. This report describes the PVM system and its implementation in ITOUGH2. Instructions are given for installing PVM, compiling ITOUGH2-PVM for use on a workstation cluster, preparing an ITOUGH2 input file under PVM, and executing an ITOUGH2-PVM application. Examples are discussed, demonstrating the use of ITOUGH2-PVM.
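The independence of the forward runs can be sketched as follows. This is only an illustration of the structure: `forward_model` is a hypothetical stand-in for one forward simulation, and a Python thread pool replaces the PVM message passing described in the report. Neither reflects the actual ITOUGH2 interface.

```python
from concurrent.futures import ThreadPoolExecutor


def forward_model(params):
    # Hypothetical stand-in for one forward simulation run.
    a, b = params
    return a * a + 3.0 * b


def sensitivities(params, step=1e-6, workers=4):
    """Forward-difference sensitivity coefficients from independent runs."""
    # Build one perturbed parameter set per parameter.
    perturbed = []
    for k in range(len(params)):
        p = list(params)
        p[k] += step
        perturbed.append(p)
    # The base run and all perturbed runs do not depend on each other,
    # so they can be dispatched to workers at the same time.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        base, *runs = pool.map(forward_model, [list(params)] + perturbed)
    return [(r - base) / step for r in runs]
```

For the toy model above, `sensitivities([2.0, 1.0])` returns values close to 4 and 3, the analytical derivatives at that point.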
Distributed Parallel Architecture for "Big Data"
Catalin BOJA
2012-01-01
This paper is an extension of the "Distributed Parallel Architecture for Storing and Processing Large Datasets" paper presented at the WSEAS SEPADS’12 conference in Cambridge. In its original version the paper went over the benefits of using a distributed parallel architecture to store and process large datasets. This paper analyzes the problem of storing, processing and retrieving meaningful insight from petabytes of data. It provides a survey of current distributed and parallel data processing technologies and, based on them, proposes an architecture that can be used to solve the analyzed problem. In this version there is more emphasis on distributed file systems and the ETL processes involved in a distributed environment.
Java parallel secure stream for grid computing
Chen, J.; Akers, W.; Chen, Y.; Watson, W.
2001-01-01
The emergence of high-speed wide area networks makes grid computing a reality. However, grid applications that need reliable data transfer still have difficulty achieving optimal TCP performance, because the TCP window size must be tuned to improve bandwidth and reduce latency on a high-speed wide area network. The authors present a pure Java package called JPARSS (Java Parallel Secure Stream) that divides data into partitions that are sent over several parallel Java streams simultaneously and allows Java or Web applications to achieve optimal TCP performance in a grid environment without the necessity of tuning the TCP window size. Several experimental results are provided to show that using parallel streams is more effective than tuning the TCP window size. In addition, an X.509 certificate based single sign-on mechanism and SSL based connection establishment are integrated into this package. Finally, a few applications using this package are discussed.
Applications of Parallel Processing in Mobile Banking
2007-01-01
The future of mobile banking will be represented by applications that support mobile banking, Internet banking and EFT (Electronic Funds Transfer) transactions in a single user interface. In this way, mobile banking will be able to cover all the types of applications demanded at the market level. The parallel processing of credit card bank transactions could be performed with the help of a grid network. Apart from some limitations, grid processing offers huge opportunities to exploit parallelism. For this reason, many applications of waiting queues in grid processing have been developed in recent years. Grid networks represent a distinctive and very modern field of parallel and distributed processing.
Parallel computational in nuclear group constant calculation
Su'ud, Zaki; Rustandi, Yaddi K.; Kurniadi, Rizal
2002-01-01
In this paper, a parallel computational method for nuclear group constant calculation using the collision probability method is discussed. The main focus is on the calculation of the collision matrix, which needs a large amount of computational time. The geometry treated here is a concentric cylinder. The collision probability matrix is calculated with a semi-analytic method using the Bickley-Naylor function. To accelerate the computation, several computers are used in parallel to solve the problem. On Linux, we used parallelization with PVM software and C or Fortran. On Windows, we used socket programming with Delphi or C Builder. The calculation results show the importance of an optimal weight for each processor when the processors have different speeds.
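The remark about per-processor weights can be illustrated with a small sketch that divides a number of work items (for example, rows of the collision matrix) among processors in proportion to their relative speeds. The function name `split_by_weight`, the speed values, and the largest-remainder rounding are assumptions made for illustration; the abstract does not say how the authors chose their weights.

```python
def split_by_weight(n_items, speeds):
    """Split n_items among processors in proportion to their speeds."""
    total = sum(speeds)
    exact = [n_items * s / total for s in speeds]
    counts = [int(x) for x in exact]
    # Hand leftover items to the processors with the largest remainders.
    leftover = n_items - sum(counts)
    order = sorted(range(len(speeds)),
                   key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts
```

With relative speeds of 1, 1, and 2, `split_by_weight(100, [1, 1, 2])` assigns 25, 25, and 50 items, so the faster machine finishes at about the same time as the slower ones.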
Abstract Level Parallelization of Finite Difference Methods
Edwin Vollebregt
1997-01-01
A formalism is proposed for describing finite difference calculations in an abstract way. The formalism consists of index sets and stencils, for characterizing the structure of sets of data items and interactions between data items (“neighbouring relations”). The formalism provides a means for lifting programming to a more abstract level. This simplifies the tasks of performance analysis and verification of correctness, and opens the way for automatic code generation. The notation is particularly useful in parallelization, for the systematic construction of parallel programs in a process/channel programming paradigm (e.g., message passing). This is important because message passing, unfortunately, still is the only approach that leads to acceptable performance for many more unstructured or irregular problems on parallel computers that have non-uniform memory access times. It will be shown that the use of index sets and stencils greatly simplifies the determination of which data must be exchanged between different computing processes.
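A minimal sketch of the idea, assuming a one-dimensional index set and a stencil given as a tuple of integer offsets: the indices a process must receive from its neighbours are those reached by the stencil from its owned indices but not owned by it. The function `halo_indices` and this representation are our own simplification and are not the notation of the paper.

```python
def halo_indices(owned, stencil, domain):
    """Return the indices a process needs but does not own."""
    # Apply every stencil offset to every owned index, staying in the domain.
    needed = {i + off for i in owned for off in stencil if (i + off) in domain}
    return sorted(needed - set(owned))


# A process owning indices 4..7 of a 12-point domain, three-point stencil.
halo = halo_indices(range(4, 8), (-1, 0, 1), range(12))   # [3, 8]
```

Widening the stencil to offsets -2 through 2 enlarges the result to indices 2, 3, 8, and 9, which shows how the exchange pattern follows mechanically from the stencil.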
Parallel visualization on leadership computing resources
Peterka, T; Ross, R B; Shen, H-W; Ma, K-L; Kendall, W; Yu, H
2009-01-01
Changes are needed in the way that visualization is performed, if we expect the analysis of scientific data to be effective at the petascale and beyond. By using similar techniques as those used to parallelize simulations, such as parallel I/O, load balancing, and effective use of interprocess communication, the supercomputers that compute these datasets can also serve as analysis and visualization engines for them. Our team is assessing the feasibility of performing parallel scientific visualization on some of the most powerful computational resources of the U.S. Department of Energy's National Laboratories in order to pave the way for analyzing the next generation of computational results. This paper highlights some of the conclusions of that research.
Analysis of gamma irradiator dose rate using spent fuel elements with parallel configuration
Setiyanto; Pudjijanto MS; Ardani
2006-01-01
To enhance the utilization of spent fuel from the RSG-GAS reactor, a gamma irradiator using spent fuel elements as the gamma source is a suitable choice. This irradiator can be used for food sterilization and preservation. As a first step before realization, the gamma dose rate must be determined theoretically. The assessment was carried out for a parallel configuration of fuel elements, in which the irradiation space is placed between rows of fuel elements. The parallel model was chosen for comparison with the circle model, to obtain as much space as possible for irradiation and for manipulating the irradiation target. Dose rate calculations were done with MCNP, while the gamma activities of the fuel elements were estimated with the ORIGEN code for an average delay time of 1 year. The calculation results show that the gamma dose rate of the parallel model decreased by up to 50% relative to the circle model, but the value is still sufficient for sterilization and preservation. Especially for food preservation, the parallel model is more flexible, since the gamma dose rate can be adjusted to the irradiation needed. The assessment concludes that using reactor spent fuel for a gamma irradiator with the parallel model offers more advantages than the circle model. (author)
A possibility of parallel and anti-parallel diffraction measurements on ...
However, a bent perfect crystal (BPC) monochromator at monochromatic focusing condition can provide a quite flat and equal resolution property at both parallel and anti-parallel positions and thus one can have a chance to use both sides for the diffraction experiment. From the data of the FWHM and the / measured ...
Ishizuki, Shigeru; Kawai, Wataru; Nemoto, Toshiyuki; Ogasawara, Shinobu; Kume, Etsuo; Adachi, Masaaki; Kawasaki, Nobuo; Yatake, Yo-ichi
2000-03-01
Several computer codes in the nuclear field have been vectorized, parallelized and transported on the FUJITSU VPP500 system, the AP3000 system and the Paragon system at Center for Promotion of Computational Science and Engineering in Japan Atomic Energy Research Institute. We dealt with 12 codes in fiscal 1998. These results are reported in 3 parts, i.e., the vectorization and parallelization on vector processors part, the parallelization on scalar processors part and the porting part. In this report, we describe the vectorization and parallelization on vector processors. In this vectorization and parallelization on vector processors part, the vectorization of General Tokamak Circuit Simulation Program code GTCSP, the vectorization and parallelization of Molecular Dynamics NTV (n-particle, Temperature and Velocity) Simulation code MSP2, Eddy Current Analysis code EDDYCAL, Thermal Analysis Code for Test of Passive Cooling System by HENDEL T2 code THANPACST2 and MHD Equilibrium code SELENEJ on the VPP500 are described. In the parallelization on scalar processors part, the parallelization of Monte Carlo N-Particle Transport code MCNP4B2, Plasma Hydrodynamics code using Cubic Interpolated Propagation Method PHCIP and Vectorized Monte Carlo code (continuous energy model / multi-group model) MVP/GMVP on the Paragon are described. In the porting part, the porting of Monte Carlo N-Particle Transport code MCNP4B2 and Reactor Safety Analysis code RELAP5 on the AP3000 are described. (author)
A SPECT reconstruction method for extending parallel to non-parallel geometries
Wen Junhai; Liang Zhengrong
2010-01-01
Due to its simplicity, parallel-beam geometry is usually assumed for the development of image reconstruction algorithms. The established reconstruction methodologies are then extended to fan-beam, cone-beam and other non-parallel geometries for practical application. This situation occurs for quantitative SPECT (single photon emission computed tomography) imaging in inverting the attenuated Radon transform. Novikov reported an explicit parallel-beam formula for the inversion of the attenuated Radon transform in 2000. Thereafter, a formula for fan-beam geometry was reported by Bukhgeim and Kazantsev (2002 Preprint N. 99 Sobolev Institute of Mathematics). At the same time, we presented a formula for varying focal-length fan-beam geometry. Sometimes, the reconstruction formula is so implicit that we cannot obtain the explicit reconstruction formula in the non-parallel geometries. In this work, we propose a unified reconstruction framework for extending parallel-beam geometry to any non-parallel geometry using ray-driven techniques. Studies by computer simulations demonstrated the accuracy of the presented unified reconstruction framework for extending parallel-beam to non-parallel geometries in inverting the attenuated Radon transform.
Programming massively parallel processors a hands-on approach
Kirk, David B
2010-01-01
Programming Massively Parallel Processors discusses basic concepts about parallel programming and GPU architecture. "Massively parallel" refers to the use of a large number of processors to perform a set of computations in a coordinated parallel way. The book details various techniques for constructing parallel programs. It also discusses the development process, performance level, floating-point format, parallel patterns, and dynamic parallelism. The book serves as a teaching guide where parallel programming is the main topic of the course. It builds on the basics of C programming for CUDA, a parallel programming environment that is supported on NVIDIA GPUs. Composed of 12 chapters, the book begins with basic information about the GPU as a parallel computer source. It also explains the main concepts of CUDA, data parallelism, and the importance of memory access efficiency using CUDA. The target audience of the book is graduate and undergraduate students from all science and engineering disciplines who ...
Parallelization of Reversible Ripple-carry Adders
Thomsen, Michael Kirkedal; Axelsen, Holger Bock
2009-01-01
The design of fast arithmetic logic circuits is an important research topic for reversible and quantum computing. A special challenge in this setting is the computation of standard arithmetical functions without the generation of \emph{garbage}. Here, we present a novel parallelization scheme...... wherein $m$ parallel $k$-bit reversible ripple-carry adders are combined to form a reversible $mk$-bit \emph{ripple-block carry adder} with logic depth $\mathcal{O}(m+k)$ for a \emph{minimal} logic depth $\mathcal{O}(\sqrt{mk})$, thus improving on the $mk$-bit ripple-carry adder logic depth $\mathcal...
Parallel algorithms for numerical linear algebra
van der Vorst, H
1990-01-01
This is the first in a new series of books presenting research results and developments concerning the theory and applications of parallel computers, including vector, pipeline, array, fifth/future generation computers, and neural computers. All aspects of high-speed computing fall within the scope of the series, e.g. algorithm design, applications, software engineering, networking, taxonomy, models and architectural trends, performance, peripheral devices. Papers in Volume One cover the main streams of parallel linear algebra: systolic array algorithms, message-passing systems, algorithms for p
Keldysh formalism for multiple parallel worlds
Ansari, M.; Nazarov, Y. V.
2016-01-01
We present a compact and self-contained review of the recently developed Keldysh formalism for multiple parallel worlds. The formalism has been applied to consistent quantum evaluation of the flows of informational quantities, in particular, to the evaluation of Renyi and Shannon entropy flows. We start with the formulation of the standard and extended Keldysh techniques in a single world in a form convenient for our presentation. We explain the use of Keldysh contours encompassing multiple parallel worlds. In the end, we briefly summarize the concrete results obtained with the method.
A Massively Parallel Face Recognition System
Lahdenoja Olli
2007-01-01
We present methods for processing LBPs (local binary patterns) with massively parallel hardware, especially with the CNN-UM (cellular nonlinear network-universal machine). In particular, we present a framework for implementing a massively parallel face recognition system, including a dedicated highly accurate algorithm suitable for various types of platforms (e.g., CNN-UM and digital FPGA). We study in detail a dedicated mixed-mode implementation of the algorithm and estimate its implementation cost in view of its performance and accuracy restrictions.
Xyce parallel electronic simulator release notes.
Keiter, Eric R; Hoekstra, Robert John; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Rankin, Eric Lamont; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.
2010-05-01
The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. Specific requirements include, among others, the ability to solve extremely large circuit problems by supporting large-scale parallel computing platforms, improved numerical performance and object-oriented code design and implementation. The Xyce release notes describe: hardware and software requirements; new features and enhancements; any defects fixed since the last release; and current known defects and defect workarounds. For up-to-date information not available at the time these notes were produced, please visit the Xyce web page at http://www.cs.sandia.gov/xyce.
Parallel transposition of sparse data structures
Wang, Hao; Liu, Weifeng; Hou, Kaixi
2016-01-01
Many applications in computational sciences and social sciences exploit sparsity and connectivity of acquired data. Even though many parallel sparse primitives such as sparse matrix-vector (SpMV) multiplication have been extensively studied, some other important building blocks, e.g., parallel tr...... transposition in the latest vendor-supplied library on an Intel multicore CPU platform, and the MergeTrans approach achieves an average 3.4-fold (up to 11.7-fold) speedup on an Intel Xeon Phi many-core processor....
Temporal fringe pattern analysis with parallel computing
Tuck Wah Ng; Kar Tien Ang; Argentini, Gianluca
2005-01-01
Temporal fringe pattern analysis is invaluable in transient phenomena studies but necessitates long processing times. Here we describe a parallel computing strategy based on the single-program multiple-data model and hyperthreading processor technology to reduce the execution time. In a two-node cluster workstation configuration we found that execution periods were reduced by 1.6 times when four virtual processors were used. To allow even lower execution times with an increasing number of processors, the time allocated for data transfer, data read, and waiting should be minimized. Parallel computing is found here to present a feasible approach to reduce execution times in temporal fringe pattern analysis
On radial flow between parallel disks
Wee, A Y L; Gorin, A
2015-01-01
Approximate analytical solutions are presented for converging flow between two parallel non-rotating disks. The static pressure distribution and radial component of the velocity are developed by averaging the inertial term across the gap between the parallel disks. The predicted results from the first approximation agree well with experimental results as well as with results presented by other authors. The second approximation shows that as the fluid approaches the center, the velocity at mid-channel slows down, which is due to the competition between the inertial term and the flowrate. (paper)
Logical inference techniques for loop parallelization
Oancea, Cosmin Eugen; Rauchwerger, Lawrence
2012-01-01
the parallelization transformation by verifying the independence of the loop's memory references. To this end it represents array references using the USR (uniform set representation) language and expresses the independence condition as an equation, S={}, where S is a set expression representing array indexes. Using...... of their estimated complexities. We evaluate our automated solution on 26 benchmarks from PERFECT-CLUB and SPEC suites and show that our approach is effective in parallelizing large, complex loops and obtains much better full program speedups than the Intel and IBM Fortran compilers....
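The independence condition S={} mentioned above can be illustrated with a brute-force sketch. This is not the symbolic USR representation of the paper: it simply enumerates concrete index sets, and the names `conflict_set`, `is_parallelizable`, `writes` and `reads` are hypothetical helpers introduced here for illustration only.

```python
# Brute-force illustration of the independence condition S = {}.
# It enumerates concrete index sets per iteration; it is NOT the symbolic
# USR language used in the paper, and all names here are illustrative.

from itertools import combinations


def conflict_set(n_iters, writes, reads):
    """Return S, the array indexes involved in a cross-iteration dependence.

    writes(i) and reads(i) return the sets of array indexes that
    iteration i writes and reads, respectively.
    """
    S = set()
    for i, j in combinations(range(n_iters), 2):
        wi, wj = writes(i), writes(j)
        S |= wi & wj          # both iterations write the same element
        S |= wi & reads(j)    # iteration i writes what iteration j reads
        S |= wj & reads(i)    # iteration j writes what iteration i reads
    return S


def is_parallelizable(n_iters, writes, reads):
    """The loop iterations are independent exactly when S is empty."""
    return not conflict_set(n_iters, writes, reads)


# Loop 1: a[i] = a[i] + 1, each iteration touches only its own element.
loop1 = is_parallelizable(8, lambda i: {i}, lambda i: {i})
# Loop 2: a[i + 1] = a[i] * 2, iteration i writes what iteration i + 1 reads.
loop2 = is_parallelizable(8, lambda i: {i + 1}, lambda i: {i})
print(loop1, loop2)
```

The pairwise enumeration costs quadratic time in the iteration count, which is why a compiler would reason about S symbolically rather than enumerate it as this sketch does.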
Analysis of a parallel multigrid algorithm
Chan, Tony F.; Tuminaro, Ray S.
1989-01-01
The parallel multigrid algorithm of Frederickson and McBryan (1987) is considered. This algorithm uses multiple coarse-grid problems (instead of one problem) in the hope of accelerating convergence and is found to have a close relationship to traditional multigrid methods. Specifically, the parallel coarse-grid correction operator is identical to a traditional multigrid coarse-grid correction operator, except that the mixing of high and low frequencies caused by aliasing error is removed. Appropriate relaxation operators can be chosen to take advantage of this property. Comparisons between the standard multigrid and the new method are made.
Parallel processing for artificial intelligence 2
Kumar, V; Suttner, CB
1994-01-01
With the increasing availability of parallel machines and the rising interest in large scale and real world applications, research on parallel processing for Artificial Intelligence (AI) is gaining greater importance in the computer science environment. Many applications have been implemented and delivered but the field is still considered to be in its infancy. This book assembles diverse aspects of research in the area, providing an overview of the current state of technology. It also aims to promote further growth across the discipline. Contributions have been grouped according to their
Configuration affects parallel stent grafting results.
Tanious, Adam; Wooster, Mathew; Armstrong, Paul A; Zwiebel, Bruce; Grundy, Shane; Back, Martin R; Shames, Murray L
2018-05-01
A number of adjunctive "off-the-shelf" procedures have been described to treat complex aortic diseases. Our goal was to evaluate parallel stent graft configurations and to determine an optimal formula for these procedures. This is a retrospective review of all patients at a single medical center treated with parallel stent grafts from January 2010 to September 2015. Outcomes were evaluated on the basis of parallel graft orientation, type, and main body device. Primary end points included parallel stent graft compromise and overall endovascular aneurysm repair (EVAR) compromise. There were 78 patients treated with a total of 144 parallel stents for a variety of pathologic processes. There was a significant correlation between main body oversizing and snorkel compromise (P = .0195) and overall procedural complication (P = .0019) but not with endoleak rates. Patients were organized into the following oversizing groups for further analysis: 0% to 10%, 10% to 20%, and >20%. Those oversized into the 0% to 10% group had the highest rate of overall EVAR complication (73%; P = .0003). There were no significant correlations between any one particular configuration and overall procedural complication. There was also no significant correlation between total number of parallel stents employed and overall complication. Composite EVAR configuration had no significant correlation with individual snorkel compromise, endoleak, or overall EVAR or procedural complication. The configuration most prone to individual snorkel compromise and overall EVAR complication was a four-stent configuration with two stents in an antegrade position and two stents in a retrograde position (60% complication rate). The configuration most prone to endoleak was one or two stents in retrograde position (33% endoleak rate), followed by three stents in an all-antegrade position (25%). There was a significant correlation between individual stent configuration and stent compromise (P = .0385), with 31
Keldysh formalism for multiple parallel worlds
Ansari, M.; Nazarov, Y. V., E-mail: y.v.nazarov@tudelft.nl [Delft University of Technology, Kavli Institute of Nanoscience (Netherlands)
2016-03-15
We present a compact and self-contained review of the recently developed Keldysh formalism for multiple parallel worlds. The formalism has been applied to consistent quantum evaluation of the flows of informational quantities, in particular, to the evaluation of Renyi and Shannon entropy flows. We start with the formulation of the standard and extended Keldysh techniques in a single world in a form convenient for our presentation. We explain the use of Keldysh contours encompassing multiple parallel worlds. In the end, we briefly summarize the concrete results obtained with the method.
Use of parallel counters for triggering
Nikityuk, N.M.
1991-01-01
Results of investigation of using parallel counters, majority coincidence schemes, parallel compressors for triggering in multichannel high energy spectrometers are described. Concrete examples of methods of constructing fast and economic new devices used to determine multiplicity hits t>900 registered in a hodoscopic plane and a pixel detector are given. For this purpose the author uses the syndrome coding method and cellular arrays. In addition, an effective coding matrix has been created which can be used for light signal coding. For example, such signals are supplied from scintillators to photomultipliers. 23 refs.; 21 figs
A Massively Parallel Face Recognition System
Ari Paasio
2006-12-01
We present methods for processing LBPs (local binary patterns) with massively parallel hardware, especially with the CNN-UM (cellular nonlinear network-universal machine). In particular, we present a framework for implementing a massively parallel face recognition system, including a dedicated highly accurate algorithm suitable for various types of platforms (e.g., CNN-UM and digital FPGA). We study in detail a dedicated mixed-mode implementation of the algorithm and estimate its implementation cost in view of its performance and accuracy restrictions.
Parallel processor for fast event analysis
Hensley, D.C.
1983-01-01
Current maximum data rates from the Spin Spectrometer of approx. 5000 events/s (up to 1.3 MBytes/s) and minimum analysis requiring at least 3000 operations/event require a CPU cycle time near 70 ns. In order to achieve an effective cycle time of 70 ns, a parallel processing device is proposed where up to 4 independent processors will be implemented in parallel. The individual processors are designed around the Am2910 Microsequencer, the Am29116 μP, and the Am29517 Multiplier. Satellite histogramming in a mass memory system will be managed by a commercial 16-bit μP system.
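The cycle-time requirement follows from the quoted rates. The rough check below assumes one operation per CPU cycle and ideal scaling across the processors; neither assumption is stated in the abstract, and the function names are illustrative.

```python
# Back-of-the-envelope check of the cycle-time requirement quoted above.
# Assumptions (not stated in the abstract): one operation per CPU cycle,
# and ideal scaling over independent processors.

EVENTS_PER_SECOND = 5000   # approximate maximum event rate
OPS_PER_EVENT = 3000       # minimum analysis cost per event
NUM_PROCESSORS = 4         # upper limit of parallel processors proposed


def required_cycle_time_ns(events_per_s, ops_per_event):
    """Cycle time (ns) one processor would need to keep up with the rate."""
    ops_per_second = events_per_s * ops_per_event
    return 1e9 / ops_per_second


def per_processor_cycle_time_ns(effective_ns, n_processors):
    """Cycle time each of n ideal processors may have while the group
    still delivers the given effective cycle time."""
    return effective_ns * n_processors


effective = required_cycle_time_ns(EVENTS_PER_SECOND, OPS_PER_EVENT)
per_cpu = per_processor_cycle_time_ns(effective, NUM_PROCESSORS)
print(f"effective cycle time needed: {effective:.1f} ns")
print(f"per-processor cycle time with {NUM_PROCESSORS} CPUs: {per_cpu:.1f} ns")
```

Under these assumptions the effective requirement is about 67 ns, consistent with the "near 70 ns" in the abstract, and each of four processors could run roughly four times slower.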
Parallel adaptive simulations on unstructured meshes
Shephard, M S; Jansen, K E; Sahni, O; Diachin, L A
2007-01-01
This paper discusses methods being developed by the ITAPS center to support the execution of parallel adaptive simulations on unstructured meshes. The paper first outlines the ITAPS approach to the development of interoperable mesh, geometry and field services to support the needs of SciDAC applications in these areas. The paper then demonstrates the ability of unstructured adaptive meshing methods built on such interoperable services to effectively solve important physics problems. Attention is then focused on ITAPS' developing ability to solve adaptive unstructured mesh problems on massively parallel computers.
Structured building model reduction toward parallel simulation
Dobbs, Justin R. [Cornell University; Hencey, Brondon M. [Cornell University
2013-08-26
Building energy model reduction exchanges accuracy for improved simulation speed by reducing the number of dynamical equations. Parallel computing aims to improve simulation times without loss of accuracy but is poorly utilized by contemporary simulators and is inherently limited by inter-processor communication. This paper bridges these disparate techniques to implement efficient parallel building thermal simulation. We begin with a survey of three structured reduction approaches that compares their performance to a leading unstructured method. We then use structured model reduction to find thermal clusters in the building energy model and allocate processing resources. Experimental results demonstrate faster simulation and low error without any interprocessor communication.
Parallel preconditioning techniques for sparse CG solvers
Basermann, A.; Reichel, B.; Schelthoff, C. [Central Institute for Applied Mathematics, Juelich (Germany)
1996-12-31
Conjugate gradient (CG) methods to solve sparse systems of linear equations play an important role in numerical methods for solving discretized partial differential equations. The large size and the condition of many technical or physical applications in this area result in the need for efficient parallelization and preconditioning techniques of the CG method. In particular for very ill-conditioned matrices, sophisticated preconditioners are necessary to obtain both acceptable convergence and accuracy of CG. Here, we investigate variants of polynomial and incomplete Cholesky preconditioners that markedly reduce the iterations of the simply diagonally scaled CG and are shown to be well suited for massively parallel machines.
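For orientation, the baseline that the abstract compares against, diagonally scaled CG, can be sketched as follows. This is a minimal serial illustration using dense Python lists in place of a sparse matrix; it is not the authors' implementation, it does not cover the polynomial or incomplete Cholesky variants, and the function names are assumptions.

```python
# Minimal sketch of conjugate gradients with diagonal (Jacobi) scaling.
# Dense lists stand in for a sparse matrix; names are illustrative.

def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def jacobi_pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.

    Returns (x, number of iterations performed).
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]   # preconditioner M^-1
    x = [0.0] * n
    r = list(b)                                    # residual for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]     # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for k in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:       # relative residual test
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, iters = jacobi_pcg(A, b)
residual = [bi - axi for bi, axi in zip(b, matvec(A, x))]
print(iters, max(abs(ri) for ri in residual))
```

The diagonal preconditioner needs no communication beyond what the matrix-vector product already requires, which is one reason it is a common baseline on parallel machines.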
Data communications in a parallel active messaging interface of a parallel computer
Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E
2013-11-12
Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying in dependence upon the call site statistics a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application.
Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.
2017-07-01
Matrix-vector multiplication in real-world problems often involves large matrices of arbitrary size. Parallelization is therefore needed to speed up a calculation that usually takes a long time. Graph partitioning techniques discussed in previous studies cannot be used to parallelize matrix-vector multiplication for matrices of arbitrary size, because they assume a square and symmetric matrix. Hypergraph partitioning techniques overcome this shortcoming. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (compute unified device architecture) is a parallel computing platform and programming model that was created by NVIDIA and implemented by the GPU (graphics processing unit).
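A simplified sketch of a partitioned matrix-vector product for a rectangular matrix is given below. It makes two substitutions that should be read as assumptions: a greedy nonzero-balancing row partition stands in for hypergraph partitioning, and the parts are processed sequentially on the CPU rather than by CUDA kernels. All function names are hypothetical.

```python
# Sketch: row-partitioned sparse matrix-vector product for a rectangular
# matrix. A greedy nonzero-balancing partition stands in for hypergraph
# partitioning, and the parts run sequentially here rather than on a GPU.

def to_csr(dense):
    """Convert a dense list-of-lists matrix to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def partition_rows(row_ptr, n_parts):
    """Assign each row to the currently lightest part (by nonzero count)."""
    loads = [0] * n_parts
    parts = [[] for _ in range(n_parts)]
    n_rows = len(row_ptr) - 1
    nnz = [row_ptr[r + 1] - row_ptr[r] for r in range(n_rows)]
    # Placing the heaviest rows first gives a better greedy balance.
    for r in sorted(range(n_rows), key=lambda r: nnz[r], reverse=True):
        k = loads.index(min(loads))
        parts[k].append(r)
        loads[k] += nnz[r]
    return parts, loads


def spmv_part(values, col_idx, row_ptr, x, rows, y):
    """Compute y[r] for the rows owned by one part; parts never share a row."""
    for r in rows:
        y[r] = sum(values[k] * x[col_idx[k]]
                   for k in range(row_ptr[r], row_ptr[r + 1]))


dense = [[1, 0, 2, 0, 0],
         [0, 3, 0, 0, 0],
         [4, 0, 5, 6, 7]]      # 3 x 5: neither square nor symmetric
x = [1, 1, 1, 1, 1]
values, col_idx, row_ptr = to_csr(dense)
parts, loads = partition_rows(row_ptr, 2)
y = [0] * len(dense)
for rows in parts:             # each part could be handled by its own worker
    spmv_part(values, col_idx, row_ptr, x, rows, y)
print(y, loads)
```

Because every row belongs to exactly one part, the workers write disjoint entries of `y`. A hypergraph model would additionally account for which entries of `x` each part needs, which this greedy balance ignores.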
Martin, Gary L.
2011-01-01
A robust and competitive commercial space sector is vital to continued progress in space. The United States is committed to encouraging and facilitating the growth of a U.S. commercial space sector that supports U.S. needs, is globally competitive, and advances U.S. leadership in the generation of new markets and innovation-driven entrepreneurship. Energize competitive domestic industries to participate in global markets and advance the development of: satellite manufacturing; satellite-based services; space launch; terrestrial applications; and increased entrepreneurship. Purchase and use commercial space capabilities and services to the maximum practical extent. Actively explore the use of inventive, nontraditional arrangements for acquiring commercial space goods and services to meet United States Government requirements, including measures such as public-private partnerships. Refrain from conducting United States Government space activities that preclude, discourage, or compete with U.S. commercial space activities. Pursue potential opportunities for transferring routine, operational space functions to the commercial space sector where beneficial and cost-effective.
Anghaie, S.
2007-01-01
The development of space nuclear power and propulsion in the United States started in 1955 with the initiation of the ROVER project. The first step in the ROVER program was the KIWI project that included the development and testing of 8 non-flyable ultrahigh temperature nuclear test reactors during 1955-1964. The KIWI project was the precursor to the PHOEBUS carbon-based fuel reactor project that resulted in ground testing of three high power reactors during 1965-1968, with the last reactor operated at 4,100 MW. During the same time period a parallel program was pursued to develop a nuclear thermal rocket based on cermet fuel technology. The third component of the ROVER program was the Nuclear Engine for Rocket Vehicle Applications (NERVA) that was initiated in 1961 with the primary goal of designing the first generation of nuclear rocket engine based on the KIWI project experience. The fourth component of the ROVER program was the Reactor In-Flight Test (RIFT) project that was intended to design, fabricate, and flight test a NERVA powered upper stage engine for the Saturn-class launch vehicle. During the ROVER program era, the United States ventured into a comprehensive space nuclear program that included design and testing of several compact reactors and space suitable power conversion systems, and the development of a few light weight heat rejection systems. Contrary to its sister ROVER program, the space nuclear power program resulted in the first ever deployment and in-space operation of the nuclear powered SNAP-10A in 1965. The USSR space nuclear program started in the early 1970s and resulted in deployment of two 6 kWe TOPAZ reactors into space and ground testing of the prototype of a relatively small nuclear rocket engine in 1984. The US ambition for the development and deployment of space nuclear powered systems was resurrected in the mid-1980s and intermittently continued to date with the initiation of several research programs that included the SP-100, Space Exploration