WorldWideScience

Sample records for core parallels climate

  1. The Glasgow Parallel Reduction Machine: Programming Shared-memory Many-core Systems using Parallel Task Composition

    Directory of Open Access Journals (Sweden)

    Ashkan Tousimojarad

    2013-12-01

    Full Text Available We present the Glasgow Parallel Reduction Machine (GPRM), a novel, flexible framework for parallel task-composition-based many-core programming. We allow the programmer to structure programs into task code, written as C++ classes, and communication code, written in a restricted subset of C++ with functional semantics and parallel evaluation. In this paper we discuss the GPRM, the virtual machine framework that enables the parallel task composition approach. We focus the discussion on GPIR, the functional language used as the intermediate representation of the bytecode running on the GPRM. Using examples in this language we show the flexibility and power of our task composition framework. We demonstrate the potential using an implementation of a merge sort algorithm on a 64-core Tilera processor, as well as on a conventional Intel quad-core processor and an AMD 48-core processor system. We also compare our framework with OpenMP tasks in a parallel pointer chasing algorithm running on the Tilera processor. Our results show that the GPRM programs outperform the corresponding OpenMP codes on all test platforms, and can greatly facilitate the writing of parallel programs, in particular non-data-parallel algorithms such as reductions.
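
    The record does not reproduce any GPIR code, but the OpenMP task baseline that GPRM is compared against can be illustrated with a minimal task-based merge sort in C++. This is a generic sketch under an assumed cut-off threshold and array size, not the paper's benchmark code.

        // merge_sort_tasks.cpp -- build with: g++ -fopenmp -O2 merge_sort_tasks.cpp
        #include <algorithm>
        #include <cstdio>
        #include <vector>

        // Sort v[lo, hi) with OpenMP tasks; small ranges fall back to std::sort.
        static void merge_sort(std::vector<int>& v, std::size_t lo, std::size_t hi) {
            if (hi - lo < 4096) {                // cut-off: avoid spawning tiny tasks
                std::sort(v.begin() + lo, v.begin() + hi);
                return;
            }
            std::size_t mid = lo + (hi - lo) / 2;
            #pragma omp task shared(v)
            merge_sort(v, lo, mid);
            #pragma omp task shared(v)
            merge_sort(v, mid, hi);
            #pragma omp taskwait                 // both halves must be sorted before merging
            std::inplace_merge(v.begin() + lo, v.begin() + mid, v.begin() + hi);
        }

        int main() {
            std::vector<int> v(1 << 20);
            for (std::size_t i = 0; i < v.size(); ++i) v[i] = static_cast<int>(v.size() - i);
            #pragma omp parallel
            #pragma omp single                   // one thread spawns the task tree
            merge_sort(v, 0, v.size());
            std::printf("sorted: %s\n", std::is_sorted(v.begin(), v.end()) ? "yes" : "no");
            return 0;
        }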

  2. Adaptive query parallelization in multi-core column stores

    NARCIS (Netherlands)

    M.M. Gawade (Mrunal); M.L. Kersten (Martin); M.M. Gawade (Mrunal); M.L. Kersten (Martin)

    2016-01-01

    With the rise of multi-core CPU platforms, their optimal utilization for in-memory OLAP workloads using column store databases has become one of the biggest challenges. Some of the inherent limitations in the achievable query parallelism are due to the degree of parallelism

  3. On the effective parallel programming of multi-core processors

    NARCIS (Netherlands)

    Varbanescu, A.L.

    2010-01-01

    Multi-core processors are now considered the only feasible alternative to the large single-core processors, which have become limited by technological aspects such as power consumption and heat dissipation. However, due to their inherent parallel structure and their diversity, multi-cores are

  4. Hydraulic Profiling of a Parallel Channel Type Reactor Core

    International Nuclear Information System (INIS)

    Seo, Kyong-Won; Hwang, Dae-Hyun; Lee, Chung-Chan

    2006-01-01

    An advanced reactor core consisting of closed multiple parallel channels was optimized to maximize the thermal margin of the core. The closed multiple parallel channel configuration has different characteristics from the open channels of conventional PWRs. The channels, usually assemblies, are isolated hydraulically from each other and there is no cross flow between channels. The distribution of inlet flow rate between channels is a very important design parameter in this core because the inlet flow distribution directly determines the margin for a given thermal-hydraulic parameter. The thermal-hydraulic parameter may be the boiling margin, the maximum fuel temperature, or the critical heat flux. The inlet flow distribution of the core was optimized for the boiling margins by grouping the inlet orifices into several hydraulic regions. This procedure is called hydraulic profiling.

  5. Performance Tuning and Evaluation of a Parallel Community Climate Model

    Energy Technology Data Exchange (ETDEWEB)

    Drake, J.B.; Worley, P.H.; Hammond, S.

    1999-11-13

    The Parallel Community Climate Model (PCCM) is a message-passing parallelization of version 2.1 of the Community Climate Model (CCM) developed by researchers at Argonne and Oak Ridge National Laboratories and at the National Center for Atmospheric Research in the early to mid 1990s. In preparation for use in the Department of Energy's Parallel Climate Model (PCM), PCCM has recently been updated with new physics routines from version 3.2 of the CCM, improvements to the parallel implementation, and ports to the SGI/Cray Research T3E and Origin 2000. We describe our experience in porting and tuning PCCM on these new platforms, evaluating the performance of different parallel algorithm options and comparing performance between the T3E and Origin 2000.

  6. Implementation of a parallel version of a regional climate model

    Energy Technology Data Exchange (ETDEWEB)

    Gerstengarbe, F.W. [ed.]; Kuecken, M. [Potsdam-Institut fuer Klimafolgenforschung (PIK), Potsdam (Germany)]; Schaettler, U. [Deutscher Wetterdienst, Offenbach am Main (Germany). Geschaeftsbereich Forschung und Entwicklung]

    1997-10-01

    A regional climate model developed by the Max Planck Institute for Meteorology and the German Climate Computing Centre in Hamburg, based on the 'Europa' and 'Deutschland' models of the German Weather Service, has been parallelized and implemented on the IBM RS/6000 SP computer system of the Potsdam Institute for Climate Impact Research, including parallel input/output processing, the explicit Eulerian time-step, the semi-implicit corrections, the normal-mode initialization and the physical parameterizations of the German Weather Service. The implementation utilizes Fortran 90 and the Message Passing Interface. The parallelization strategy used is a 2D domain decomposition. This report describes the parallelization strategy, the parallel I/O organization, the influence of different domain decomposition approaches on static and dynamic load imbalances, and first numerical results. (orig.)
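
    The 2D domain decomposition described above maps naturally onto MPI's Cartesian topology routines. The sketch below is a generic illustration of that pattern (the process-grid shape, periodicity choices and neighbour naming are assumptions), not code from the parallelized model.

        // cart2d.cpp -- build with: mpicxx cart2d.cpp && mpirun -np 8 ./a.out
        #include <mpi.h>
        #include <cstdio>

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int size, rank;
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            // Let MPI choose a 2D process grid (e.g. 4x2 for 8 ranks).
            int dims[2] = {0, 0}, periods[2] = {0, 1};   // periodic in one direction only (assumed)
            MPI_Dims_create(size, 2, dims);
            MPI_Comm cart;
            MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart);
            MPI_Comm_rank(cart, &rank);

            int coords[2], north, south, west, east;
            MPI_Cart_coords(cart, rank, 2, coords);
            MPI_Cart_shift(cart, 0, 1, &north, &south);  // neighbours for halo exchange
            MPI_Cart_shift(cart, 1, 1, &west, &east);

            std::printf("rank %d owns sub-domain (%d,%d) of %dx%d; N=%d S=%d W=%d E=%d\n",
                        rank, coords[0], coords[1], dims[0], dims[1], north, south, west, east);

            MPI_Finalize();
            return 0;
        }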

  7. A global database with parallel measurements to study non-climatic changes

    Science.gov (United States)

    Venema, Victor; Auchmann, Renate; Aguilar, Enric; Auer, Ingeborg; Azorin-Molina, Cesar; Brandsma, Theo; Brunetti, Michele; Dienst, Manuel; Domonkos, Peter; Gilabert, Alba; Lindén, Jenny; Milewska, Ewa; Nordli, Øyvind; Prohom, Marc; Rennie, Jared; Stepanek, Petr; Trewin, Blair; Vincent, Lucie; Willett, Kate; Wolff, Mareile

    2016-04-01

    In this work we introduce the rationale behind the ongoing compilation of a parallel measurements database, in the framework of the International Surface Temperatures Initiative (ISTI) and with the support of the World Meteorological Organization. We intend this database to become instrumental for a better understanding of inhomogeneities affecting the evaluation of long-term changes in daily climate data. Long instrumental climate records are usually affected by non-climatic changes, due to, e.g., (i) station relocations, (ii) instrument height changes, (iii) instrumentation changes, (iv) observing environment changes, (v) different sampling intervals or data collection procedures, among others. These so-called inhomogeneities distort the climate signal and can hamper the assessment of long-term trends and variability of climate. Thus to study climatic changes we need to accurately distinguish non-climatic and climatic signals. The most direct way to study the influence of non-climatic changes on the distribution and to understand the reasons for these biases is the analysis of parallel measurements representing the old and new situation (in terms of e.g. instruments, location, different radiation shields, etc.). According to the limited number of available studies and our understanding of the causes of inhomogeneity, we expect that they will have a strong impact on the tails of the distribution of air temperatures and most likely of other climate elements. Our abilities to statistically homogenize daily data will be increased by systematically studying different causes of inhomogeneity replicated through parallel measurements. Current studies of non-climatic changes using parallel data are limited to local and regional case studies. However, the effect of specific transitions depends on the local climate and the most interesting climatic questions are about the systematic large-scale biases produced by transitions that occurred in many regions. Important

  8. Multi-core parallelism in a column-store

    NARCIS (Netherlands)

    Gawade, M.M.

    2017-01-01

    The research reported in this thesis addresses several challenges of improving the efficiency and effectiveness of parallel processing of analytical database queries on modern multi- and many-core systems, using an open-source column-oriented analytical database management system, MonetDB, for

  9. Parallelization of a three-dimensional whole core transport code DeCART

    Energy Technology Data Exchange (ETDEWEB)

    Jin Young, Cho; Han Gyu, Joo; Ha Yong, Kim; Moon-Hee, Chang [Korea Atomic Energy Research Institute, Yuseong-gu, Daejon (Korea, Republic of)

    2003-07-01

    Parallelization of the DeCART (deterministic core analysis based on ray tracing) code is presented that reduces the tremendous computing time and memory required in three-dimensional whole core transport calculations. The parallelization employs the concept of MPI grouping as well as a mixed MPI/OpenMP scheme. Since most of the computing time and memory are used in the MOC (method of characteristics) and multi-group CMFD (coarse mesh finite difference) calculations in DeCART, variables and subroutines related to these two modules are the primary targets for parallelization. Specifically, the ray tracing module was parallelized using a planar domain decomposition scheme and an angular domain decomposition scheme. The parallel performance of the DeCART code is evaluated by solving a rodded variation of the C5G7MOX three-dimensional benchmark problem and a simplified three-dimensional SMART PWR core problem. In the C5G7MOX problem with 24 CPUs, a maximum speedup of 21 is obtained on an IBM Regatta machine and 22 on a LINUX cluster in the MOC kernel, which indicates good parallel performance of the DeCART code. In the simplified SMART problem, the memory requirement of about 11 GBytes in the single-processor case reduces to 940 MBytes with 24 processors, which means that the DeCART code can now solve large core problems with affordable LINUX clusters. (authors)
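
    A generic skeleton of the mixed MPI/OpenMP scheme mentioned above, with planes distributed over MPI ranks and angles threaded with OpenMP, might look as follows; the loop bounds and the per-plane work function are placeholders rather than DeCART internals.

        // hybrid.cpp -- build with: mpicxx -fopenmp hybrid.cpp
        #include <mpi.h>
        #include <cstdio>

        // Placeholder for the per-plane, per-angle ray-tracing work.
        static double sweep_plane_angle(int plane, int angle) {
            return 0.001 * plane + 0.0001 * angle;   // dummy result
        }

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            const int n_planes = 64, n_angles = 32;   // illustrative sizes
            double local = 0.0;

            // Planar decomposition: each MPI rank owns a subset of axial planes.
            for (int p = rank; p < n_planes; p += size) {
                // Angular decomposition: OpenMP threads share the angles of a plane.
                #pragma omp parallel for reduction(+:local)
                for (int a = 0; a < n_angles; ++a)
                    local += sweep_plane_angle(p, a);
            }

            double global = 0.0;
            MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
            if (rank == 0) std::printf("global sum = %f\n", global);
            MPI_Finalize();
            return 0;
        }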

  10. Highly parallel line-based image coding for many cores.

    Science.gov (United States)

    Peng, Xiulian; Xu, Jizheng; Zhou, You; Wu, Feng

    2012-01-01

    Computers are developing along a new trend from dual-core and quad-core processors to ones with tens or even hundreds of cores. Multimedia, as one of the most important applications on computers, has an urgent need for parallel coding algorithms for compression. Taking intraframe/image coding as a starting point, this paper proposes a pure line-by-line coding scheme (LBLC) to meet that need. In LBLC, an input image is processed line by line sequentially, and each line is divided into small fixed-length segments. The compression of all segments, from prediction to entropy coding, is completely independent and concurrent across many cores. Results on a general-purpose computer show that our scheme can achieve a 13.9 times speedup with 15 cores at the encoder and a 10.3 times speedup at the decoder. Ideally, such a near-linear speedup relation with the number of cores can be maintained for more than 100 cores. In addition to the high parallelism, the proposed scheme performs comparably to or even better than the H.264 high profile above middle bit rates. At near-lossless coding, it outperforms H.264 by more than 10 dB. At lossless coding, up to 14% bit-rate reduction is observed compared with H.264 lossless coding at the high 4:4:4 profile.
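
    The property LBLC exploits is that the fixed-length segments of a line carry no mutual dependencies, so they can be coded concurrently. The sketch below illustrates only that structure, with a trivial left-neighbour predictor standing in for the real prediction and entropy-coding stages; all sizes are assumptions.

        // lblc_sketch.cpp -- build with: g++ -fopenmp -O2 lblc_sketch.cpp
        #include <cstdint>
        #include <cstdio>
        #include <vector>

        // Stand-in for per-segment prediction + entropy coding (not the real codec).
        static std::vector<uint8_t> code_segment(const uint8_t* pix, int n) {
            std::vector<uint8_t> out;
            out.reserve(n);
            uint8_t prev = 0;
            for (int i = 0; i < n; ++i) {        // simple left-neighbour prediction residual
                out.push_back(static_cast<uint8_t>(pix[i] - prev));
                prev = pix[i];
            }
            return out;
        }

        int main() {
            const int width = 1920, seg_len = 64;          // illustrative sizes
            std::vector<uint8_t> line(width, 128);
            const int n_seg = width / seg_len;
            std::vector<std::vector<uint8_t>> coded(n_seg);

            // Segments of one line are independent, so they can go to different cores.
            #pragma omp parallel for
            for (int s = 0; s < n_seg; ++s)
                coded[s] = code_segment(&line[s * seg_len], seg_len);

            std::printf("coded %d segments of %d pixels each\n", n_seg, seg_len);
            return 0;
        }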

  11. Parallelization characteristics of a three-dimensional whole-core code DeCART

    International Nuclear Information System (INIS)

    Cho, J. Y.; Joo, H.K.; Kim, H. Y.; Lee, J. C.; Jang, M. H.

    2003-01-01

    Neutron transport calculation for a three-dimensional whole core requires not only a huge amount of computing time but also huge memory. Therefore, whole-core codes such as DeCART need both parallel computation and distributed memory capabilities. The purpose of this paper is to implement such parallel capabilities, based on MPI grouping and memory distribution, in the DeCART code, and then to evaluate the performance by solving the C5G7 three-dimensional benchmark and a simplified three-dimensional SMART core problem. In the C5G7 problem with 24 CPUs, a maximum speedup of 22 is obtained on an IBM Regatta machine and 21 on a LINUX cluster for the MOC kernel, which indicates good parallel performance of the DeCART code. The simplified SMART problem, which needs about 11 GBytes of memory with one processor, requires only about 940 MBytes with 24 processors, which means that the DeCART code can now solve large core problems on affordable LINUX clusters.

  12. Integral manifolding structure for fuel cell core having parallel gas flow

    Science.gov (United States)

    Herceg, Joseph E.

    1984-01-01

    Disclosed herein are manifolding means for directing the fuel and oxidant gases to parallel flow passageways in a fuel cell core. Each core passageway is defined by electrolyte and interconnect walls. Each electrolyte and interconnect wall consists respectively of anode and cathode materials layered on the opposite sides of electrolyte material, or on the opposite sides of interconnect material. A core wall projects beyond the open ends of the defined core passageways and is disposed approximately midway between and parallel to the adjacent overlaying and underlying interconnect walls to define manifold chambers therebetween on opposite sides of the wall. Each electrolyte wall defining the flow passageways is shaped to blend into and be connected to this wall in order to redirect the corresponding fuel and oxidant passageways to the respective manifold chambers either above or below this intermediate wall. Inlet and outlet connections are made to these separate manifold chambers respectively, for carrying the fuel and oxidant gases to the core, and for carrying their reaction products away from the core.

  13. First results from core-edge parallel composition in the FACETS project.

    Energy Technology Data Exchange (ETDEWEB)

    Cary, J. R.; Candy, J.; Cohen, R. H.; Krasheninnikov, S.; McCune, D. C.; Estep, D. J.; Larson, J.; Malony, A. D.; Pankin, A.; Worley, P. H.; Carlsson, J. A.; Hakim, A. H.; Hamill, P.; Kruger, S.; Miah, M.; Muzsala, S.; Pletzer, A.; Shasharina, S.; Wade-Stein, D.; Wang, N.; Balay, S.; McInnes, L.; Zhang, H.; Casper, T.; Diachin, L. (Mathematics and Computer Science); (Tech-X Corp.); (General Atomics); (LLNL); (Univ. of California at San Diego); (Princeton Plasma Physics Lab.); (Colorado State Univ.); (ParaTools Inc.); (Lehigh Univ.); (ORNL)

    2008-01-01

    FACETS (Framework Application for Core-Edge Transport Simulations), now in its second year, has achieved its first coupled core-edge transport simulations. In the process, a number of accompanying accomplishments were achieved. These include a new parallel core component, a new wall component, improvements in edge and source components, and the framework for coupling all of this together. These accomplishments were a result of an interdisciplinary collaboration among computational physics, computer scientists, and applied mathematicians on the team.

  14. First results from core-edge parallel composition in the FACETS project

    Energy Technology Data Exchange (ETDEWEB)

    Cary, J R; Carlsson, J A; Hakim, A H; Hamill, P; Kruger, S; Miah, M; Muzsala, S; Pletzer, A; Shasharina, S; Wade-Stein, D; Wang, N [Tech-X Corporation, Boulder, CO 80303 (United States); Candy, J [General Atomics, San Diego, CA 92186 (United States); Cohen, R H [Lawrence Livermore National Laboratory, Livermore, CA 94550 (United States); Krasheninnikov, S [University of California at San Diego, San Diego, CA 92093 (United States); McCune, D C [Princeton Plasma Physics Laboratory, Princeton, NJ 08543 (United States); Estep, D J [Colorado State University, Fort Collins, CO 80523 (United States); Larson, J [Argonne National Laboratory, Argonne, IL 60439 (United States); Malony, A D [ParaTools, Inc., Eugene, OR 97405 (United States); Pankin, A [Lehigh University, Bethlehem, PA 18015 (United States); Worley, P H [Oak Ridge National Laboratory, Oak Ridge, TN 37831 (United States)], E-mail: cary@txcorp.com (and others)

    2008-07-15

    FACETS (Framework Application for Core-Edge Transport Simulations), now in its second year, has achieved its first coupled core-edge transport simulations. In the process, a number of accompanying accomplishments were achieved. These include a new parallel core component, a new wall component, improvements in edge and source components, and the framework for coupling all of this together. These accomplishments were a result of an interdisciplinary collaboration among computational physics, computer scientists, and applied mathematicians on the team.

  15. First results from core-edge parallel composition in the FACETS project

    Energy Technology Data Exchange (ETDEWEB)

    Cary, John R. [Tech-X Corporation; Candy, Jeff [General Atomics; Cohen, Ronald H. [Lawrence Livermore National Laboratory (LLNL); Krasheninnikov, Sergei [University of California, San Diego; McCune, Douglas [Princeton Plasma Physics Laboratory (PPPL); Estep, Donald J [Colorado State University, Fort Collins; Larson, Jay [Argonne National Laboratory (ANL); Malony, Allen [University of Oregon; Pankin, A. [Lehigh University, Bethlehem, PA; Worley, Patrick H [ORNL; Carlsson, Johann [Tech-X Corporation; Hakim, A H [Tech-X Corporation; Hamill, P [Tech-X Corporation; Kruger, Scott [Tech-X Corporation; Miah, Mahmood [Tech-X Corporation; Muzsala, S [Tech-X Corporation; Pletzer, Alexander [Tech-X Corporation; Shasharina, Svetlana [Tech-X Corporation; Wade-Stein, D [Tech-X Corporation; Wang, N [Tech-X Corporation; Balay, Satish [Argonne National Laboratory (ANL); McInnes, Lois [Argonne National Laboratory (ANL); Zhang, Hong [Argonne National Laboratory (ANL); Casper, T. A. [Lawrence Livermore National Laboratory (LLNL); Diachin, Lori [Lawrence Livermore National Laboratory (LLNL); Epperly, Thomas [Lawrence Livermore National Laboratory (LLNL); Rognlien, T. D. [Lawrence Livermore National Laboratory (LLNL); Fahey, Mark R [ORNL; Cobb, John W [ORNL; Morris, A [University of Oregon; Shende, Sameer [University of Oregon; Hammett, Greg [Princeton Plasma Physics Laboratory (PPPL); Indireshkumar, K [Tech-X Corporation; Stotler, D. [Princeton Plasma Physics Laboratory (PPPL); Pigarov, A [University of California, San Diego

    2008-01-01

    FACETS (Framework Application for Core-Edge Transport Simulations), now in its second year, has achieved its first coupled core-edge transport simulations. In the process, a number of accompanying accomplishments were achieved. These include a new parallel core component, a new wall component, improvements in edge and source components, and the framework for coupling all of this together. These accomplishments were a result of an interdisciplinary collaboration among computational physics, computer scientists, and applied mathematicians on the team.

  16. Multilevel parallel strategy on Monte Carlo particle transport for the large-scale full-core pin-by-pin simulations

    International Nuclear Information System (INIS)

    Zhang, B.; Li, G.; Wang, W.; Shangguan, D.; Deng, L.

    2015-01-01

    This paper introduces the strategy of multilevel hybrid parallelism of the JCOGIN infrastructure for Monte Carlo particle transport in large-scale full-core pin-by-pin simulations. Particle parallelism, domain decomposition parallelism and MPI/OpenMP parallelism are designed and implemented. In testing, JMCT demonstrates the parallel scalability of JCOGIN, reaching a parallel efficiency of 80% on 120,000 cores for the pin-by-pin computation of the BEAVRS benchmark. (author)
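
    For reference, the speedup and parallel efficiency used in such scaling statements are conventionally defined as follows (a standard definition, not taken from the paper), with T(p) the wall-clock time on p cores:

        S(p) = \frac{T(1)}{T(p)}, \qquad
        E(p) = \frac{S(p)}{p} = \frac{T(1)}{p \, T(p)}

    Under this definition, if the 80% figure is measured against a (possibly extrapolated) single-core baseline, it corresponds to a speedup of roughly 0.8 x 120,000 = 96,000 on 120,000 cores.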

  17. Climatic Changes on Tibetan Plateau Based on Ice Core Records

    Science.gov (United States)

    Yao, T.

    2008-12-01

    Climatic changes have been reconstructed for the Tibetan Plateau based on ice core records. The Guliya ice core from the Tibetan Plateau records climatic changes over the past 100,000 years and is thus comparable with the Vostok ice core record from Antarctica and the GISP2 record from the Arctic. These three records share an important common feature, i.e., our climate is not stable. It is also evident that the major patterns of climatic changes are similar over the earth. Why does climatic change over the earth follow the same pattern? It might be attributed to solar radiation. We found that the cold periods correspond to low insolation periods, and warm periods to high insolation periods. We also found abrupt climatic changes in the ice core records, which show dramatic temperature variations of as much as 10 °C within 50 or 60 years. Our major challenge in the study of both climate and environment is that greenhouse gases such as CO2 and CH4 are possibly amplifying global warming, though to what degree remains unclear. One way to understand the role of greenhouse gases is to reconstruct the past greenhouse gases recorded in ice. In 1997, we drilled an ice core from 7100 m a.s.l. in the Himalayas to reconstruct the methane record. Based on this record, we found seasonal cycles in methane variation. In particular, the methane concentration is high in summer, suggesting active methane emission from wetlands in summer. Based on the seasonal cycle, we can reconstruct the history of methane fluctuations over the past 500 years. The most prominent feature of the methane record in the Himalayan ice core is the abrupt increase since 1850 A.D. This is closely related to the worldwide industrial revolution. We can also observe sudden decreases in methane concentration during World War I and World War II. This implies that the industrial revolution has dominated atmospheric greenhouse gas emission for about 100 years. Besides, the average methane concentration in the Himalayan ice core is

  18. OS and Runtime Support for Efficiently Managing Cores in Parallel Applications

    OpenAIRE

    Klues, Kevin Alan

    2015-01-01

    Parallel applications can benefit from the ability to explicitly control their thread scheduling policies in user-space. However, modern operating systems lack the interfaces necessary to make this type of “user-level” scheduling efficient. The key component missing is the ability for applications to gain direct access to cores and keep control of those cores even when making I/O operations that traditionally block in the kernel. A number of former systems provided limited support for these c...

  19. Parallel processing architecture for H.264 deblocking filter on multi-core platforms

    Science.gov (United States)

    Prasad, Durga P.; Sonachalam, Sekar; Kunchamwar, Mangesh K.; Gunupudi, Nageswara Rao

    2012-03-01

    Massively parallel computing (multi-core) chips offer outstanding new solutions that satisfy the increasing demand for high resolution and high quality video compression technologies such as H.264. Such solutions not only provide exceptional quality but also efficiency, low power, and low latency, previously unattainable in software-based designs. While custom hardware and Application Specific Integrated Circuit (ASIC) technologies may achieve low-latency, low-power, and real-time performance in some consumer devices, many applications require a flexible and scalable software-defined solution. The deblocking filter in an H.264 encoder/decoder poses difficult implementation challenges because of heavy data dependencies and the conditional nature of the computations. Deblocking filter implementations tend to be fixed and difficult to reconfigure for different needs. The ability to scale up for higher quality requirements such as 10-bit pixel depth or a 4:2:2 chroma format often reduces the throughput of a parallel architecture designed for a lower feature set. A scalable architecture for deblocking filtering, created with a massively parallel processor based solution, means that the same encoder or decoder can be deployed in a variety of applications, at different video resolutions, for different power requirements, and at higher bit depths and richer color subsampling patterns such as YUV 4:2:2 or 4:4:4 formats. Low-power, software-defined encoders/decoders may be implemented using a massively parallel processor array, like that found in HyperX technology, with 100 or more cores and distributed memory. The large number of processor elements allows the silicon device to operate more efficiently than conventional DSP or CPU technology. This software programming model for massively parallel processors offers a flexible implementation and a power efficiency close to that of ASIC solutions. This work describes a scalable parallel architecture for an H.264 compliant deblocking
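
    The abstract does not detail the scheduling used on the HyperX array, but the usual left/top-neighbour dependency of H.264 deblocking is commonly exposed with anti-diagonal wavefront scheduling. The OpenMP sketch below illustrates only that generic idea, with a stub in place of the actual filter and assumed frame dimensions.

        // wavefront.cpp -- build with: g++ -fopenmp -O2 wavefront.cpp
        #include <cstdio>
        #include <vector>

        // Stand-in for filtering one macroblock; a real filter would touch pixels of
        // the current block plus its already-filtered left and top neighbours.
        static void filter_mb(int x, int y, std::vector<int>& done, int w) {
            done[y * w + x] = 1;
        }

        int main() {
            const int mb_w = 120, mb_h = 68;              // e.g. a 1920x1088 frame in 16x16 blocks
            std::vector<int> done(mb_w * mb_h, 0);

            // Anti-diagonal wavefront: all blocks with x + y == d are mutually
            // independent once diagonal d-1 is finished, so each diagonal is a
            // parallel loop.
            for (int d = 0; d <= mb_w + mb_h - 2; ++d) {
                #pragma omp parallel for
                for (int y = 0; y < mb_h; ++y) {
                    int x = d - y;
                    if (x >= 0 && x < mb_w) filter_mb(x, y, done, mb_w);
                }
            }
            std::printf("filtered %d macroblocks\n", mb_w * mb_h);
            return 0;
        }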

  20. GTfold: Enabling parallel RNA secondary structure prediction on multi-core desktops

    DEFF Research Database (Denmark)

    Swenson, M Shel; Anderson, Joshua; Ash, Andrew

    2012-01-01

    achieved significant improvements in runtime, but their implementations were not portable from niche high-performance computers or easily accessible to most RNA researchers. With the increasing prevalence of multi-core desktop machines, a new parallel prediction program is needed to take full advantage...

  1. Parallel structures for disaster risk reduction and climate change adaptation in Southern Africa

    Directory of Open Access Journals (Sweden)

    Per Becker

    2013-01-01

    Full Text Available During the last decade, the interest of the international community in the concepts of disaster risk reduction and climate change adaptation has been growing immensely. Even though an increasing number of scholars seem to view these concepts as two sides of the same coin (at least when not considering the potentially positive effects of climate change), in practice the two concepts have developed in parallel rather than in an integrated manner when it comes to policy, rhetoric and funding opportunities amongst international organisations and donors. This study investigates the extent of the creation of parallel structures for disaster risk reduction and climate change adaptation in the Southern African Development Community (SADC) region. The chosen methodology for the study is a comparative case study and the data are collected through focus groups and content analysis of documentary sources, as well as interviews with key informants. The results indicate that parallel structures for disaster risk reduction and climate change adaptation have been established in all but one of the studied countries. The qualitative interviews performed in some of the countries indicate that stakeholders in disaster risk reduction view this duplication of structures as unfortunate, inefficient and a fertile setup for conflict over resources for the implementation of similar activities. Additional research is called for in order to study the concrete effects of having these parallel structures as a foundation for advocacy for more efficient future disaster risk reduction and climate change adaptation.

  2. Parallel analysis tools and new visualization techniques for ultra-large climate data set

    Energy Technology Data Exchange (ETDEWEB)

    Middleton, Don [National Center for Atmospheric Research, Boulder, CO (United States); Haley, Mary [National Center for Atmospheric Research, Boulder, CO (United States)

    2014-12-10

    ParVis was a project funded under LAB 10-05: “Earth System Modeling: Advanced Scientific Visualization of Ultra-Large Climate Data Sets”. Argonne was the lead lab with partners at PNNL, SNL, NCAR and UC-Davis. This report covers progress from January 1st, 2013 through Dec 1st, 2014. Two previous reports covered the period from Summer, 2010, through September 2011 and October 2011 through December 2012, respectively. While the project was originally planned to end on April 30, 2013, personnel and priority changes allowed many of the institutions to continue work through FY14 using existing funds. A primary focus of ParVis was introducing parallelism to climate model analysis to greatly reduce the time-to-visualization for ultra-large climate data sets. Work in the first two years was conducted on two tracks with different time horizons: one track to provide immediate help to climate scientists already struggling to apply their analysis to existing large data sets and another focused on building a new data-parallel library and tool for climate analysis and visualization that will give the field a platform for performing analysis and visualization on ultra-large datasets for the foreseeable future. In the final 2 years of the project, we focused mostly on the new data-parallel library and associated tools for climate analysis and visualization.

  3. Development of whole core thermal-hydraulic analysis program ACT. 4. Simplified fuel assembly model and parallelization by MPI

    International Nuclear Information System (INIS)

    Ohshima, Hiroyuki

    2001-10-01

    A whole core thermal-hydraulic analysis program, ACT, is being developed for the purpose of evaluating detailed in-core thermal-hydraulic phenomena of fast reactors, including the effect of the flow between wrapper-tube walls (inter-wrapper flow), under various reactor operating conditions. As appropriate boundary conditions, in addition to detailed modeling of the core, are essential for accurate simulation of in-core thermal hydraulics, ACT consists not only of fuel assembly and inter-wrapper flow analysis modules but also of a heat transport system analysis module that gives the response of the plant dynamics to the core model. This report describes the incorporation of a simplified model into the fuel assembly analysis module and the parallelization of the program by a message passing method, toward large-scale simulations. ACT has a fuel assembly analysis module that can simulate the whole fuel pin bundle in each fuel assembly of the core; however, this may take much CPU time for a large-scale core simulation. Therefore, a simplified fuel assembly model that is thermal-hydraulically equivalent to the detailed one has been incorporated in order to save simulation time and resources. This simplified model is applied to several parts of the fuel assemblies in a core where detailed simulation results are not required. With regard to the program parallelization, the calculation load and the data flow of ACT were analyzed and the optimum parallelization was implemented, including improvement of the numerical simulation algorithm of ACT. The Message Passing Interface (MPI) is applied to data communication between processes and to synchronization in parallel calculations. The parallelized ACT was verified through a comparison with the original one. In addition to the above work, input manuals for the core analysis module and the heat transport system analysis module have been prepared. (author)

  4. Paleoclimate from ice cores : abrupt climate change and the prolonged Holocene

    International Nuclear Information System (INIS)

    White, J.W.C.

    2001-01-01

    Ice cores provide valuable information about the Earth's past climates and past environments. They can also help in predicting future climates and the nature of climate change. Recent findings in ice cores have shown large and abrupt climate changes in the past. This paper addressed abrupt climate changes and the peculiar nature of the Holocene. An abrupt climate change is a shift of 5 degrees C in mean annual temperature in less than 50 years. This is considered to be the most threatening aspect of potential future climate change since it leaves very little time for adaptation by humans or any other part of the Earth's ecosystem. This paper also discussed the arrival of the next glacial period. In the past 50 years, scientists have recognized the importance of the Earth's orbit around the sun in pacing the occurrence of large ice sheets. The timing of orbital forcing suggests that the Earth is overdue for the next major glaciation. The reason for this anomaly was discussed. Abrupt climate shifts seem to be caused by mode changes in sensitive points in the climate system, such as the North Atlantic Deep Water Formation and its impact on sea ice cover in the North Atlantic. These changes have been observed in ice cores in Greenland but they are not restricted to Greenland. Evidence from Antarctic ice cores suggest that abrupt climate change may also occur in the Southern Hemisphere. The Vostok ice core in Antarctica indicates that the 11,000 year long interglacial period that we are in right now is longer than the previous four interglacial periods. The Holocene epoch is unique because both methane and carbon dioxide rise in the last 6,000 years, an atypical response from these greenhouse gases during an interglacial period. It was suggested that the rise in methane can be attributed to human activities. 13 refs., 2 figs

  5. Parallelized computation for computer simulation of electrocardiograms using personal computers with multi-core CPU and general-purpose GPU.

    Science.gov (United States)

    Shen, Wenfeng; Wei, Daming; Xu, Weimin; Zhu, Xin; Yuan, Shizhong

    2010-10-01

    Biological computations like electrocardiological modelling and simulation usually require high-performance computing environments. This paper introduces an implementation of parallel computation for computer simulation of electrocardiograms (ECGs) in a personal computer environment with an Intel CPU of Core (TM) 2 Quad Q6600 and a GPU of Geforce 8800GT, with software support by OpenMP and CUDA. It was tested in three parallelization device setups: (a) a four-core CPU without a general-purpose GPU, (b) a general-purpose GPU plus 1 core of CPU, and (c) a four-core CPU plus a general-purpose GPU. To effectively take advantage of a multi-core CPU and a general-purpose GPU, an algorithm based on load-prediction dynamic scheduling was developed and applied to setting (c). In the simulation with 1600 time steps, the speedup of the parallel computation as compared to the serial computation was 3.9 in setting (a), 16.8 in setting (b), and 20.0 in setting (c). This study demonstrates that a current PC with a multi-core CPU and a general-purpose GPU provides a good environment for parallel computations in biological modelling and simulation studies. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.
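
    The load-prediction dynamic scheduling algorithm itself is not spelled out in this abstract; the sketch below only illustrates the general idea of predicting the next CPU/GPU work split from measured per-step throughput. The work functions are dummies (a real implementation would run the OpenMP and CUDA parts concurrently), and all sizes are assumptions.

        // load_split.cpp -- build with: g++ -O2 load_split.cpp
        #include <chrono>
        #include <cstdio>

        static volatile double g_sink;   // prevents the dummy loops from being optimized away

        // Dummy stand-ins for the CPU (OpenMP) and GPU (CUDA) portions of one step.
        static void run_cpu_part(int n) { double s = 0; for (int i = 0; i < n * 2000; ++i) s += i; g_sink = s; }
        static void run_gpu_part(int n) { double s = 0; for (int i = 0; i < n * 500;  ++i) s += i; g_sink = s; }

        static double seconds(void (*f)(int), int n) {
            auto t0 = std::chrono::steady_clock::now();
            f(n);
            return std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
        }

        int main() {
            const int total = 100000;        // work units per time step (illustrative)
            double frac_cpu = 0.5;           // initial guess for the CPU share

            for (int step = 0; step < 5; ++step) {
                int n_cpu = static_cast<int>(total * frac_cpu);
                int n_gpu = total - n_cpu;
                double t_cpu = seconds(run_cpu_part, n_cpu);
                double t_gpu = seconds(run_gpu_part, n_gpu);

                // Predict the next split from measured throughputs so that both
                // devices are expected to finish at roughly the same time.
                double r_cpu = n_cpu / t_cpu, r_gpu = n_gpu / t_gpu;
                frac_cpu = r_cpu / (r_cpu + r_gpu);
                std::printf("step %d: cpu %.3fs gpu %.3fs -> next cpu share %.2f\n",
                            step, t_cpu, t_gpu, frac_cpu);
            }
            return 0;
        }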

  6. Earth's Climate History from Glaciers and Ice Cores

    Science.gov (United States)

    Thompson, Lonnie

    2013-03-01

    Glaciers serve both as recorders and early indicators of climate change. Over the past 35 years our research team has recovered climatic and environmental histories from ice cores drilled in both Polar Regions and from low- to mid-latitude, high-elevation ice fields. Those ice core-derived proxy records, extending back 25,000 years, have made it possible to compare glacial stage conditions in the Tropics with those in the Polar Regions. High-resolution records of δ18O (in part a temperature proxy) demonstrate that the current warming at high elevations in the mid- to lower latitudes is unprecedented for the last two millennia, although at many sites the early Holocene was warmer than today. Remarkable similarities between changes in the highland and coastal cultures of Peru and regional climate variability, especially precipitation, imply a strong connection between prehistoric human activities and regional climate. Ice cores retrieved from shrinking glaciers around the world confirm their continuous existence for periods ranging from hundreds to thousands of years, suggesting that current climatological conditions in those regions are different from those under which these ice fields originated and have been sustained. The ongoing widespread melting of high-elevation glaciers and ice caps, particularly in low to middle latitudes, provides strong evidence that a large-scale, pervasive and, in some cases, rapid change in Earth's climate system is underway. Observations of glacier shrinkage during the 20th and 21st centuries girdle the globe, from the South American Andes and the Himalayas to Kilimanjaro (Tanzania, Africa) and the glaciers near Puncak Jaya, Indonesia (New Guinea). The history and fate of these ice caps, told through the adventure, beauty and scientific evidence from some of the world's most remote mountain tops, provide a global perspective on contemporary climate. (NSF Paleoclimate Program)

  7. Development of a parallel genetic algorithm using MPI and its application in a nuclear reactor core. Design optimization

    International Nuclear Information System (INIS)

    Waintraub, Marcel; Pereira, Claudio M.N.A.; Baptista, Rafael P.

    2005-01-01

    This work presents the development of a distributed parallel genetic algorithm applied to a nuclear reactor core design optimization. In the implementation of the parallelism, the Message Passing Interface (MPI) library, the standard for parallel computation on distributed-memory platforms, has been used. Another important characteristic of MPI is its portability across various architectures. The main objectives of this paper are: validation of the results obtained by the application of this algorithm to a nuclear reactor core optimization problem, through comparisons with previous results presented by Pereira et al.; and a performance test of the Brazilian Nuclear Engineering Institute (IEN) cluster on reactor physics optimization problems. The experiments demonstrated that the developed parallel genetic algorithm using the MPI library produced significant gains in the obtained results and a marked reduction of the processing time. Such results support the use of parallel genetic algorithms for the solution of nuclear reactor core optimization problems. (author)
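
    The abstract gives no detail of the algorithm's structure beyond its use of MPI, so the sketch below shows only a generic pattern for distributing fitness evaluation of a population across ranks; the toy fitness function, population size and the divisibility assumption are illustrative, not the authors' design.

        // ga_mpi.cpp -- build with: mpicxx -O2 ga_mpi.cpp && mpirun -np 4 ./a.out
        #include <mpi.h>
        #include <cstdio>
        #include <vector>

        // Toy fitness: in a real core-design problem this would call the reactor
        // physics evaluation for one candidate loading pattern.
        static double fitness(double x) { return -(x - 3.0) * (x - 3.0); }

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            const int pop = 64;                  // population size; assumed divisible by the rank count
            std::vector<double> genomes(pop);
            if (rank == 0)
                for (int i = 0; i < pop; ++i) genomes[i] = 0.1 * i;
            MPI_Bcast(genomes.data(), pop, MPI_DOUBLE, 0, MPI_COMM_WORLD);

            // Each rank evaluates an equal slice of the population.
            const int chunk = pop / size;
            std::vector<double> local(chunk), all(pop);
            for (int i = 0; i < chunk; ++i)
                local[i] = fitness(genomes[rank * chunk + i]);

            // Gather all fitness values so every rank could perform selection.
            MPI_Allgather(local.data(), chunk, MPI_DOUBLE,
                          all.data(), chunk, MPI_DOUBLE, MPI_COMM_WORLD);

            if (rank == 0) {
                int best = 0;
                for (int i = 1; i < pop; ++i) if (all[i] > all[best]) best = i;
                std::printf("best genome %.2f fitness %.4f\n", genomes[best], all[best]);
            }
            MPI_Finalize();
            return 0;
        }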

  8. Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems

    OpenAIRE

    Albutiu, Martina-Cezara; Kemper, Alfons; Neumann, Thomas

    2012-01-01

    Two emerging hardware trends will dominate the database system technology in the near future: increasing main memory capacities of several TB per server and massively parallel multi-core processing. Many algorithmic and control techniques in current database technology were devised for disk-based systems where I/O dominated the performance. In this work we take a new look at the well-known sort-merge join which, so far, has not been in the focus of research in scalable massively parallel mult...
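
    As a point of reference, a sort-merge join can be parallelized by range-partitioning the join key so that each partition is sorted and merge-joined independently; the C++/OpenMP sketch below shows that simplified variant with toy data and is not the MPSM algorithm of the paper.

        // smj.cpp -- build with: g++ -fopenmp -O2 smj.cpp
        #include <algorithm>
        #include <cstdio>
        #include <vector>

        int main() {
            // Toy relations: join on equal integer keys in [0, 1000).
            std::vector<int> R, S;
            for (int i = 0; i < 100000; ++i) { R.push_back(i * 3 % 1000); S.push_back(i * 7 % 1000); }

            const int parts = 8, span = 1000 / parts;    // disjoint key ranges
            long long matches = 0;

            // Each key range is joined independently: partition, sort, merge.
            #pragma omp parallel for reduction(+:matches)
            for (int p = 0; p < parts; ++p) {
                int lo = p * span, hi = (p == parts - 1) ? 1000 : lo + span;
                std::vector<int> r, s;
                for (int k : R) if (k >= lo && k < hi) r.push_back(k);
                for (int k : S) if (k >= lo && k < hi) s.push_back(k);
                std::sort(r.begin(), r.end());
                std::sort(s.begin(), s.end());

                // Classic merge join counting matching pairs.
                std::size_t i = 0, j = 0;
                while (i < r.size() && j < s.size()) {
                    if (r[i] < s[j]) ++i;
                    else if (s[j] < r[i]) ++j;
                    else {
                        std::size_t i2 = i, j2 = j;
                        while (i2 < r.size() && r[i2] == r[i]) ++i2;
                        while (j2 < s.size() && s[j2] == s[j]) ++j2;
                        matches += static_cast<long long>(i2 - i) * (j2 - j);
                        i = i2; j = j2;
                    }
                }
            }
            std::printf("matching pairs: %lld\n", matches);
            return 0;
        }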

  9. Parallel Access of Out-Of-Core Dense Extendible Arrays

    Energy Technology Data Exchange (ETDEWEB)

    Otoo, Ekow J; Rotem, Doron

    2007-07-26

    Datasets used in scientific and engineering applications are often modeled as dense multi-dimensional arrays. For very large datasets, the corresponding array models are typically stored out-of-core as array files. The array elements are mapped onto linear consecutive locations that correspond to the linear ordering of the multi-dimensional indices. Two conventional mappings used are the row-major order and the column-major order of multi-dimensional arrays. Such conventional mappings of dense array files highly limit the performance of applications and the extendibility of the dataset. Firstly, an array file that is organized in, say, row-major order causes applications that subsequently access the data in column-major order to have abysmal performance. Secondly, any subsequent expansion of the array file is limited to only one dimension. Expansions of such out-of-core conventional arrays along arbitrary dimensions require storage reorganization that can be very expensive. We present a solution for storing out-of-core dense extendible arrays that resolves these two limitations. The method uses a mapping function F*(), together with information maintained in axial vectors, to compute the linear address of an extendible array element when passed its k-dimensional index. We also give the inverse function, F^-1*(), for deriving the k-dimensional index when given the linear address. We show how the mapping function, in combination with MPI-IO and a parallel file system, allows for the growth of the extendible array without reorganization and with no significant performance degradation for applications accessing elements in any desired order. We give methods for reading and writing sub-arrays into and out of parallel applications that run on a cluster of workstations. The axial vectors are replicated and maintained in each node that accesses sub-array elements.
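
    For contrast with the extendible scheme, the conventional fixed-shape row-major mapping and its inverse (the baseline whose limitations the paper addresses) can be written as follows; this sketch does not implement F*() or the axial-vector bookkeeping.

        // rowmajor.cpp -- build with: g++ -O2 rowmajor.cpp
        #include <cstddef>
        #include <cstdio>
        #include <vector>

        // Conventional row-major mapping: index (i0, ..., i_{k-1}) -> linear offset.
        static std::size_t linearize(const std::vector<std::size_t>& idx,
                                     const std::vector<std::size_t>& dims) {
            std::size_t addr = 0;
            for (std::size_t d = 0; d < dims.size(); ++d)
                addr = addr * dims[d] + idx[d];
            return addr;
        }

        // Inverse mapping: linear offset -> k-dimensional index.
        static std::vector<std::size_t> delinearize(std::size_t addr,
                                                    const std::vector<std::size_t>& dims) {
            std::vector<std::size_t> idx(dims.size());
            for (std::size_t d = dims.size(); d-- > 0;) {
                idx[d] = addr % dims[d];
                addr /= dims[d];
            }
            return idx;
        }

        int main() {
            std::vector<std::size_t> dims = {4, 5, 6};        // a 4x5x6 array
            std::size_t a = linearize({2, 3, 1}, dims);       // 2*30 + 3*6 + 1 = 79
            std::vector<std::size_t> back = delinearize(a, dims);
            std::printf("offset %zu -> (%zu,%zu,%zu)\n", a, back[0], back[1], back[2]);
            return 0;
        }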

  10. Mathematical Methods and Algorithms of Mobile Parallel Computing on the Base of Multi-core Processors

    Directory of Open Access Journals (Sweden)

    Alexander B. Bakulev

    2012-11-01

    Full Text Available This article deals with mathematical models and algorithms that provide mobility for the parallel representation of sequential programs written in a high-level language. It presents a formal model of operating-environment process management, based on the proposed model of parallel program representation, which describes the computation process on multi-core processors.

  11. Parallel community climate model: Description and user's guide

    Energy Technology Data Exchange (ETDEWEB)

    Drake, J.B.; Flanery, R.E.; Semeraro, B.D.; Worley, P.H. [and others]

    1996-07-15

    This report gives an overview of a parallel version of the NCAR Community Climate Model, CCM2, implemented for MIMD massively parallel computers using a message-passing programming paradigm. The parallel implementation was developed on an Intel iPSC/860 with 128 processors and on the Intel Delta with 512 processors, and the initial target platform for the production version of the code is the Intel Paragon with 2048 processors. Because the implementation uses standard, portable message-passing libraries, the code has been easily ported to other multiprocessors supporting a message-passing programming paradigm. The parallelization strategy used is to decompose the problem domain into geographical patches and assign each processor the computation associated with a distinct subset of the patches. With this decomposition, the physics calculations involve only grid points and data local to a processor and are performed in parallel. Using parallel algorithms developed for the semi-Lagrangian transport, the fast Fourier transform and the Legendre transform, both physics and dynamics are computed in parallel with minimal data movement and modest changes to the original CCM2 source code. Sequential or parallel history tapes are written, and input files (in history tape format) are read sequentially by the parallel code to promote compatibility with production use of the model on other computer systems. A validation exercise has been performed with the parallel code and is detailed along with some performance numbers on the Intel Paragon and the IBM SP2. A discussion of reproducibility of results is included. A user's guide for PCCM2 version 2.1 on the various parallel machines completes the report. Procedures for compilation, setup and execution are given. A discussion of code internals is included for those who may wish to modify and use the program in their own research.

  12. Climate change from air in ice cores

    International Nuclear Information System (INIS)

    Riedel, K.

    2013-01-01

    How sensitive is our climate to greenhouse gas concentrations? What feedbacks will trigger further emissions in a warming world and at which thresholds? Over the last 200 years human activity has increased greenhouse gases to well beyond the natural range for the last 800,000 years. In order to mitigate changes - or adapt to them - we need a better understanding of greenhouse gas sources and sinks in the recent past. Ice cores with occluded ancient air hold the key to understanding the linkages between climate change and greenhouse gas variations. (author). 22 refs., 1 tab.

  13. Ice core melt features in relation to Antarctic coastal climate

    NARCIS (Netherlands)

    Kaczmarska, M.; Isaksson, E.; Karlöf, L.; Brandt, O.; Winther, J.G.; van de Wal, R.S.W.; van den Broeke, M.R.; Johnsen, S.J.

    2006-01-01

    Measurement of light intensity transmission was carried out on an ice core S100 from coastal Dronning Maud Land (DML). Ice lenses were observed in digital pictures of the core and recorded as peaks in the light transmittance record. The frequency of ice layer occurrence was compared with climate

  14. Performance modeling and analysis of parallel Gaussian elimination on multi-core computers

    Directory of Open Access Journals (Sweden)

    Fadi N. Sibai

    2014-01-01

    Full Text Available Gaussian elimination is used in many applications and in particular in the solution of systems of linear equations. This paper presents mathematical performance models and analysis of four parallel Gaussian Elimination methods (precisely the Original method and the new Meet in the Middle –MiM– algorithms and their variants with SIMD vectorization) on multi-core systems. Analytical performance models of the four methods are formulated and presented, followed by evaluations of these models with modern multi-core systems' operation latencies. Our results reveal that the four methods generally exhibit good performance scaling with increasing matrix size and number of cores. SIMD vectorization only makes a large difference in performance for low numbers of cores. For a large matrix size (n ⩾ 16 K), the performance difference between the MiM and Original methods falls from 16× with four cores to 4× with 16 K cores. The efficiencies of all four methods are low with 1 K cores or more, stressing a major problem of multi-core systems where the network-on-chip and memory latencies are too high in relation to basic arithmetic operations. Thus Gaussian Elimination can greatly benefit from the resources of multi-core systems, but higher performance gains can be achieved if multi-core systems can be designed with lower memory operation, synchronization, and interconnect communication latencies, requirements of utmost importance and challenge in the exascale computing age.
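
    A plain row-parallel forward elimination, roughly in the spirit of the baseline "Original" method analyzed above (not the MiM variant, and without SIMD vectorization or the paper's performance models), can be sketched as follows; the matrix size and the diagonally dominant test system are assumptions.

        // ge_omp.cpp -- build with: g++ -fopenmp -O2 ge_omp.cpp
        #include <cstdio>
        #include <vector>

        int main() {
            const int n = 512;                               // illustrative size
            std::vector<std::vector<double>> A(n, std::vector<double>(n + 1, 1.0));
            for (int i = 0; i < n; ++i) A[i][i] = n + 1.0;   // diagonally dominant, no pivoting needed

            // Forward elimination: rows below the pivot are updated independently,
            // so the inner row loop parallelizes across cores.
            for (int k = 0; k < n; ++k) {
                #pragma omp parallel for
                for (int i = k + 1; i < n; ++i) {
                    double f = A[i][k] / A[k][k];
                    for (int j = k; j <= n; ++j) A[i][j] -= f * A[k][j];
                }
            }

            // Back substitution (sequential; O(n^2), cheap relative to elimination).
            std::vector<double> x(n);
            for (int i = n - 1; i >= 0; --i) {
                double s = A[i][n];
                for (int j = i + 1; j < n; ++j) s -= A[i][j] * x[j];
                x[i] = s / A[i][i];
            }
            std::printf("x[0] = %.6f\n", x[0]);
            return 0;
        }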

  15. Cache Locality-Centric Parallel String Matching on Many-Core Accelerator Chips

    OpenAIRE

    Tran, Nhat-Phuong; Lee, Myungho; Choi, Dong Hoon

    2015-01-01

    Aho-Corasick (AC) algorithm is a multiple patterns string matching algorithm commonly used in computer and network security and bioinformatics, among many others. In order to meet the highly demanding computational requirements imposed on these applications, achieving high performance for the AC algorithm is crucial. In this paper, we present a high performance parallelization of the AC on the many-core accelerator chips such as the Graphic Processing Unit (GPU) from Nvidia and...

  16. Evidence for general instability of past climate from a 250-KYR ice-core record

    DEFF Research Database (Denmark)

    Johnsen, Sigfus Johann; Clausen, Henrik Brink; Dahl-Jensen, Dorthe

    1993-01-01

    Recent results [1,2] from two ice cores drilled in central Greenland have revealed large, abrupt climate changes of at least regional extent during the late stages of the last glaciation, suggesting that climate in the North Atlantic region is able to reorganize itself rapidly, perhaps even within a few decades. Here we present a detailed stable-isotope record for the full length of the Greenland Ice-core Project Summit ice core, extending over the past 250 kyr according to a calculated timescale. We find that climate instability was not confined to the last glaciation, but appears also to have been...

  17. Climate models on massively parallel computers

    International Nuclear Information System (INIS)

    Vitart, F.; Rouvillois, P.

    1993-01-01

    First results obtained on massively parallel computers (Multiple Instruction Multiple Data and Single Instruction Multiple Data) make it possible to consider building coupled models with high resolutions. This would make possible the simulation of thermohaline circulation and other interaction phenomena between atmosphere and ocean. The increase in computer power, and the resulting improvement in resolution, will lead us to revise our approximations. For instance, the hydrostatic approximation (in ocean circulation) will no longer be valid when the grid mesh reaches a dimension lower than a few kilometers: we shall have to find other models. The expertise gained in numerical analysis at the Centre of Limeil-Valenton (CEL-V) will be used again to devise global models taking into account atmosphere, ocean, sea ice and biosphere, allowing climate simulation down to a regional scale.

  18. Operating system design of parallel computer for on-line management of nuclear pressurised water reactor cores

    International Nuclear Information System (INIS)

    Gougam, F.

    1991-04-01

    This study is part of the PHAETON project, which aims at increasing the knowledge of the safety parameters of the PWR core and at reducing operating margins during the reactor cycle. The on-line system associates a simulator process, which computes the three-dimensional flux distribution, with a process that acquires the reactor core parameters from the central instrumentation. The 3D flux calculation is the most time consuming. So, for cost and safety reasons, the PHAETON project proposes an approach that parallelizes the 3D diffusion calculation and uses a computer based on a parallel processor architecture. This paper presents the design of the operating system on which the application is executed. The proposed routine interface includes the main operations necessary for programming a real-time, parallel application. The primitives include task management, data transfer, synchronisation by event signalling, and use of the rendez-vous mechanisms. The proposed primitives rely on standard software such as a real-time kernel and the UNIX operating system. [fr]

  19. Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs

    Directory of Open Access Journals (Sweden)

    Vaughn Matthew

    2010-11-01

    Full Text Available Abstract Background Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories - based on the data structures which they employ. The first class uses an overlap/string graph and the second type uses a de Bruijn graph. However with the recent advances in short read sequencing technology, de Bruijn graph based algorithms seem to play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are very essential in large sequencing projects based on short reads. In an earlier work, an O(n/p) time parallel algorithm has been given for this problem. Here n is the size of the input and p is the number of processors. This algorithm enumerates all possible bi-directed edges which can overlap with a node and ends up generating Θ(nΣ) messages (Σ being the size of the alphabet). Results In this paper we present a Θ(n/p) time parallel algorithm with a communication complexity that is equal to that of parallel sorting and is not sensitive to Σ. The generality of our algorithm makes it very easy to extend it even to the out-of-core model and in this case it has an optimal I/O complexity of Θ((n log(n/B))/(B log(M/B))) (M being the main memory size and B being the size of the disk block). We demonstrate the scalability of our parallel algorithm on a SGI/Altix computer. A comparison of our algorithm with the previous approaches reveals that our algorithm is faster - both asymptotically and practically. We demonstrate the scalability of our sequential out-of-core algorithm by comparing it with the algorithm used by VELVET to build the bi-directed de Bruijn graph. Our experiments reveal that our algorithm can build the graph with a constant amount of memory, which clearly outperforms VELVET. We also provide efficient algorithms for the bi-directed chain compaction problem. Conclusions The bi

  20. Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs.

    Science.gov (United States)

    Kundeti, Vamsi K; Rajasekaran, Sanguthevar; Dinh, Hieu; Vaughn, Matthew; Thapar, Vishal

    2010-11-15

    Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories - based on the data structures which they employ. The first class uses an overlap/string graph and the second type uses a de Bruijn graph. However with the recent advances in short read sequencing technology, de Bruijn graph based algorithms seem to play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are very essential in large sequencing projects based on short reads. In an earlier work, an O(n/p) time parallel algorithm has been given for this problem. Here n is the size of the input and p is the number of processors. This algorithm enumerates all possible bi-directed edges which can overlap with a node and ends up generating Θ(nΣ) messages (Σ being the size of the alphabet). In this paper we present a Θ(n/p) time parallel algorithm with a communication complexity that is equal to that of parallel sorting and is not sensitive to Σ. The generality of our algorithm makes it very easy to extend it even to the out-of-core model and in this case it has an optimal I/O complexity of Θ((n log(n/B))/(B log(M/B))) (M being the main memory size and B being the size of the disk block). We demonstrate the scalability of our parallel algorithm on a SGI/Altix computer. A comparison of our algorithm with the previous approaches reveals that our algorithm is faster, both asymptotically and practically. We demonstrate the scalability of our sequential out-of-core algorithm by comparing it with the algorithm used by VELVET to build the bi-directed de Bruijn graph. Our experiments reveal that our algorithm can build the graph with a constant amount of memory, which clearly outperforms VELVET. We also provide efficient algorithms for the bi-directed chain compaction problem. The bi-directed de Bruijn graph is a fundamental data structure for
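
    A toy illustration of the underlying data structure: every k-mer of a read contributes an edge from its (k-1)-prefix to its (k-1)-suffix. The sketch below builds only a small sequential, uni-directed de Bruijn graph from made-up reads; the paper's bi-directed, parallel and out-of-core construction is far more involved.

        // debruijn.cpp -- build with: g++ -O2 debruijn.cpp
        #include <cstdio>
        #include <map>
        #include <string>
        #include <vector>

        int main() {
            const std::size_t k = 4;                                // illustrative k-mer size
            std::vector<std::string> reads = {"ACGTACGT", "CGTACGTT"};

            // Every k-mer is an edge from its (k-1)-prefix to its (k-1)-suffix.
            std::map<std::string, std::vector<std::string>> adj;
            for (const std::string& r : reads)
                for (std::size_t i = 0; i + k <= r.size(); ++i) {
                    std::string kmer = r.substr(i, k);
                    adj[kmer.substr(0, k - 1)].push_back(kmer.substr(1));
                }

            for (const auto& e : adj)
                for (const std::string& to : e.second)
                    std::printf("%s -> %s\n", e.first.c_str(), to.c_str());
            return 0;
        }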

  1. Optimization and Openmp Parallelization of a Discrete Element Code for Convex Polyhedra on Multi-Core Machines

    Science.gov (United States)

    Chen, Jian; Matuttis, Hans-Georg

    2013-02-01

    We report our experiences with the optimization and parallelization of a discrete element code for convex polyhedra on multi-core machines and introduce a novel variant of the sort-and-sweep neighborhood algorithm. While in theory the code parallelizes ideally, in practice the results on different architectures with different compilers and performance measurement tools depend very much on the particle number and on the optimization of the code. After difficulties with the interpretation of the speedup and efficiency data were overcome, respectable parallelization speedups could be obtained.
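
    The paper's variant of sort-and-sweep is not described in this record, but the standard algorithm sorts interval end points along one axis and sweeps them to collect candidate contact pairs. A minimal single-axis sketch with a handful of made-up particle extents:

        // sweep.cpp -- build with: g++ -O2 sweep.cpp
        #include <algorithm>
        #include <cstdio>
        #include <set>
        #include <vector>

        struct Interval { double lo, hi; int id; };   // particle extent along the sweep axis

        int main() {
            std::vector<Interval> box = {{0.0, 1.0, 0}, {0.8, 1.6, 1}, {2.0, 3.0, 2}, {2.5, 3.5, 3}};

            // Event list: (coordinate, is_start, id), sorted along the sweep axis.
            struct Event { double x; bool start; int id; };
            std::vector<Event> ev;
            for (const Interval& b : box) { ev.push_back({b.lo, true, b.id}); ev.push_back({b.hi, false, b.id}); }
            std::sort(ev.begin(), ev.end(), [](const Event& a, const Event& b) { return a.x < b.x; });

            // Sweep: every interval that is still "open" when a new one starts
            // overlaps it along this axis and becomes a candidate contact pair.
            std::set<int> open;
            for (const Event& e : ev) {
                if (e.start) {
                    for (int other : open) std::printf("candidate pair (%d, %d)\n", other, e.id);
                    open.insert(e.id);
                } else {
                    open.erase(e.id);
                }
            }
            return 0;
        }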

  2. Optimization of multi-phase compressible lattice Boltzmann codes on massively parallel multi-core systems

    NARCIS (Netherlands)

    Biferale, L.; Mantovani, F.; Pivanti, M.; Pozzati, F.; Sbragaglia, M.; Schifano, S.F.; Toschi, F.; Tripiccione, R.

    2011-01-01

    We develop a Lattice Boltzmann code for computational fluid-dynamics and optimize it for massively parallel systems based on multi-core processors. Our code describes 2D multi-phase compressible flows. We analyze the performance bottlenecks that we find as we gradually expose a larger fraction of

  3. Badlands: A parallel basin and landscape dynamics model

    Directory of Open Access Journals (Sweden)

    T. Salles

    2016-01-01

    Full Text Available Over more than three decades, a number of numerical landscape evolution models (LEMs) have been developed to study the combined effects of climate, sea-level, tectonics and sediments on Earth surface dynamics. Most of them are written in efficient programming languages, but often cannot be used on parallel architectures. Here, I present a LEM which ports a common core of accepted physical principles governing landscape evolution into a distributed memory parallel environment. Badlands (an acronym for BAsin anD LANdscape DynamicS) is an open-source, flexible, TIN-based landscape evolution model, built to simulate topography development at various space and time scales.

  4. The design of multi-core DSP parallel model based on message passing and multi-level pipeline

    Science.gov (United States)

    Niu, Jingyu; Hu, Jian; He, Wenjing; Meng, Fanrong; Li, Chuanrong

    2017-10-01

    Currently, the design of embedded signal processing systems is often based on a specific application, but this approach is not conducive to the rapid development of signal processing technology. In this paper, a parallel processing model architecture based on a multi-core DSP platform is designed; it is mainly suited to complex algorithms composed of different modules. This model combines the ideas of multi-level pipeline parallelism and message passing, and incorporates the advantages of the mainstream multi-core DSP models (the Master-Slave model and the Data Flow model), so that it achieves better performance. A three-dimensional image generation algorithm is used to validate the efficiency of the proposed model by comparing it with the Master-Slave and Data Flow models.
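
    The general pattern described above can be sketched independently of any particular DSP: pipeline stages run on separate workers and communicate only through message queues. The Python sketch below uses threads and queue.Queue as stand-ins for DSP cores and their message-passing channels; it is a schematic of the idea, not the authors' implementation, and the stage names are invented.

        import queue
        import threading

        def stage(name, work, q_in, q_out):
            """Generic pipeline stage: receive a message, process it, pass it on.
            A None message is the shutdown signal and is forwarded downstream."""
            while True:
                item = q_in.get()
                if item is None:
                    if q_out is not None:
                        q_out.put(None)
                    break
                result = work(item)
                if q_out is not None:
                    q_out.put(result)
                else:
                    print(f"{name}: {result}")

        if __name__ == "__main__":
            q01, q12 = queue.Queue(), queue.Queue()
            workers = [
                threading.Thread(target=stage, args=("filter", lambda x: x * 2, q01, q12)),
                threading.Thread(target=stage, args=("render", lambda x: x + 1, q12, None)),
            ]
            for t in workers:
                t.start()
            for frame in range(5):      # the "master" core feeds the pipeline
                q01.put(frame)
            q01.put(None)               # shut the pipeline down
            for t in workers:
                t.join()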

  5. A parallel solution-adaptive scheme for predicting multi-phase core flows in solid propellant rocket motors

    International Nuclear Information System (INIS)

    Sachdev, J.S.; Groth, C.P.T.; Gottlieb, J.J.

    2003-01-01

    The development of a parallel adaptive mesh refinement (AMR) scheme is described for solving the governing equations for multi-phase (gas-particle) core flows in solid propellant rocket motors (SRM). An Eulerian formulation is used to describe the coupled motion between the gas and particle phases. A cell-centred upwind finite-volume discretization and the use of limited solution reconstruction, Riemann solver based flux functions for the gas and particle phases, and explicit multi-stage time-stepping allows for high solution accuracy and computational robustness. A Riemann problem is formulated for prescribing boundary data at the burning surface. Efficient and scalable parallel implementations are achieved with domain decomposition on distributed memory multiprocessor architectures. Numerical results are described to demonstrate the capabilities of the approach for predicting SRM core flows. (author)

  6. Climatic changes on orbital and sub-orbital time scale recorded by the Guliya ice core in Tibetan Plateau

    Institute of Scientific and Technical Information of China (English)

    姚檀栋; 徐柏青; 蒲健辰

    2001-01-01

    Based on ice core records from the Tibetan Plateau and Greenland, the features and possible causes of climatic changes on orbital and sub-orbital time scales are discussed. Orbital time scale climatic change recorded in ice cores from the Tibetan Plateau typically leads that from polar regions, which indicates that climatic change in the Tibetan Plateau might occur earlier than in polar regions. The solar radiation change is a major factor that dominates climatic change on the orbital time scale. However, climatic events on the sub-orbital time scale occurred later in the Tibetan Plateau than in the Arctic Region, indicating a different mechanism. For example, the Younger Dryas and Heinrich events took place earlier in the Greenland ice core record than in the Guliya ice core record. It is reasonable to propose the hypothesis that these climatic events were possibly affected by the Laurentide Ice Sheet. Therefore, ice sheets are critically important to climatic change on the sub-orbital time scale in some ice ages.

  7. Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems

    OpenAIRE

    Martina-Cezara Albutiu, Alfons Kemper, Thomas Neumann

    2012-01-01

    Two emerging hardware trends will dominate the database system technology in the near future: increasing main memory capacities of several TB per server and massively parallel multi-core processing. Many algorithmic and control techniques in current database technology were devised for disk-based systems where I/O dominated the performance. In this work we take a new look at the well-known sort-merge join which, so far, has not been in the focus of research ...
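
    A minimal Python sketch of the underlying idea is shown below (it is not the specific massively parallel sort-merge variant proposed in the paper): both relations are range-partitioned on the join key, and each partition is sorted and merge-joined independently, so in a multi-core system every partition could be handled by a different core or NUMA region. The partition boundaries passed to partitioned_sort_merge_join are assumed to be chosen in advance.

        from typing import List, Tuple

        def merge_join(r: List[Tuple[int, str]], s: List[Tuple[int, str]]):
            """Merge-join two relations already sorted on their first attribute."""
            out, i, j = [], 0, 0
            while i < len(r) and j < len(s):
                if r[i][0] < s[j][0]:
                    i += 1
                elif r[i][0] > s[j][0]:
                    j += 1
                else:
                    key = r[i][0]
                    i_end = i
                    while i_end < len(r) and r[i_end][0] == key:
                        i_end += 1
                    j_end = j
                    while j_end < len(s) and s[j_end][0] == key:
                        j_end += 1
                    out.extend((key, a[1], b[1]) for a in r[i:i_end] for b in s[j:j_end])
                    i, j = i_end, j_end
            return out

        def partitioned_sort_merge_join(r, s, boundaries):
            """Range-partition both inputs on the join key, then sort-merge join
            each partition independently (one partition per core in a real system)."""
            results = []
            edges = [float("-inf")] + boundaries + [float("inf")]
            for lo, hi in zip(edges[:-1], edges[1:]):
                r_part = sorted(t for t in r if lo <= t[0] < hi)
                s_part = sorted(t for t in s if lo <= t[0] < hi)
                results.extend(merge_join(r_part, s_part))
            return results

        if __name__ == "__main__":
            R = [(3, "r3"), (1, "r1"), (7, "r7")]
            S = [(7, "s7"), (3, "s3a"), (3, "s3b")]
            print(partitioned_sort_merge_join(R, S, boundaries=[5]))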

  8. Long memory effect of past climate change in Vostok ice core records

    International Nuclear Information System (INIS)

    Yamamoto, Yuuki; Kitahara, Naoki; Kano, Makoto

    2012-01-01

    Time series analysis of Vostok ice core data has been carried out to understand palaeoclimate change from a stochastic perspective. The Vostok ice core is one of the proxy records in which local temperature and precipitation rate, moisture source conditions, wind strength and aerosol fluxes of marine, volcanic, terrestrial, cosmogenic and anthropogenic origin are indirectly stored. Palaeoclimate data have both a periodic feature and a stochastic feature. For the proxy data, spectrum analysis and detrended fluctuation analysis (DFA) were conducted to characterize the periodicity and the scaling property (long memory effect) of the climate change. The result of the spectrum analysis indicates that periodicities corresponding to the Milankovitch cycles exist in past climate change. DFA clarified that the time variability of the scaling exponent (Hurst exponent) is associated with abrupt warming events in the past climate.
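
    As a reference for readers unfamiliar with the method, here is a compact NumPy sketch of standard first-order DFA (a generic textbook version, not necessarily the exact variant used by the authors): the series is integrated, divided into windows, locally detrended, and the slope of log F(n) versus log n gives the scaling exponent. The toy input series below is a placeholder.

        import numpy as np

        def dfa(x, window_sizes, order=1):
            """Detrended fluctuation analysis: integrate the series, split it into
            windows, remove a polynomial trend in each window, and return the RMS
            fluctuation F(n) for every window size n."""
            y = np.cumsum(x - np.mean(x))            # integrated (profile) series
            fluctuations = []
            for n in window_sizes:
                n_windows = len(y) // n
                rms = []
                for w in range(n_windows):
                    seg = y[w * n:(w + 1) * n]
                    t = np.arange(n)
                    coeffs = np.polyfit(t, seg, order)   # local polynomial trend
                    trend = np.polyval(coeffs, t)
                    rms.append(np.sqrt(np.mean((seg - trend) ** 2)))
                fluctuations.append(np.mean(rms))
            return np.array(fluctuations)

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            series = np.cumsum(rng.standard_normal(4096))   # toy long-memory proxy
            sizes = np.array([16, 32, 64, 128, 256])
            F = dfa(series, sizes)
            alpha = np.polyfit(np.log(sizes), np.log(F), 1)[0]  # scaling exponent
            print(f"estimated scaling exponent alpha ~ {alpha:.2f}")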

  9. Past temperature reconstructions from deep ice cores: relevance for future climate change

    Directory of Open Access Journals (Sweden)

    V. Masson-Delmotte

    2006-01-01

    Full Text Available Ice cores provide unique archives of past climate and environmental changes based only on physical processes. Quantitative temperature reconstructions are essential for the comparison between ice core records and climate models. We give an overview of the methods that have been developed to reconstruct past local temperatures from deep ice cores and highlight several points that are relevant for future climate change. We first analyse the long term fluctuations of temperature as depicted in the long Antarctic record from EPICA Dome C. The long term imprint of obliquity changes in the EPICA Dome C record is highlighted and compared to simulations conducted with the ECBILT-CLIO intermediate complexity climate model. We discuss the comparison between the current interglacial period and the long interglacial corresponding to marine isotopic stage 11, ~400 kyr BP. Previous studies had focused on the role of precession and the thresholds required to induce glacial inceptions. We suggest that, due to the low eccentricity configuration of MIS 11 and the Holocene, the effect of precession on the incoming solar radiation is damped and that changes in obliquity must be taken into account. The EPICA Dome C alignment of terminations I and VI published in 2004 corresponds to a phasing of the obliquity signals. A conjunction of low obliquity and minimum northern hemisphere summer insolation is not found in the next tens of thousand years, supporting the idea of an unusually long interglacial ahead. As a second point relevant for future climate change, we discuss the magnitude and rate of change of past temperatures reconstructed from Greenland (NorthGRIP) and Antarctic (Dome C) ice cores. Past episodes of temperatures above the present-day values by up to 5°C are recorded at both locations during the penultimate interglacial period. The rate of polar warming simulated by coupled climate models forced by a CO2 increase of 1% per year is compared to ice-core

  10. Cpl6: The New Extensible, High-Performance Parallel Coupler forthe Community Climate System Model

    Energy Technology Data Exchange (ETDEWEB)

    Craig, Anthony P.; Jacob, Robert L.; Kauffman, Brain; Bettge,Tom; Larson, Jay; Ong, Everest; Ding, Chris; He, Yun

    2005-03-24

    Coupled climate models are large, multiphysics applications designed to simulate the Earth's climate and predict the response of the climate to any changes in the forcing or boundary conditions. The Community Climate System Model (CCSM) is a widely used state-of-the-art climate model that has released several versions to the climate community over the past ten years. Like many climate models, CCSM employs a coupler, a functional unit that coordinates the exchange of data between parts of the climate system such as the atmosphere and ocean. This paper describes the new coupler, cpl6, contained in the latest version of CCSM, CCSM3. Cpl6 introduces distributed-memory parallelism to the coupler, a class library for important coupler functions, and a standardized interface for component models. Cpl6 is implemented entirely in Fortran90 and uses the Model Coupling Toolkit as the base for most of its classes. Cpl6 gives improved performance over previous versions and scales well on multiple platforms.

  11. Possible origin and significance of extension-parallel drainages in Arizona's metamorphic core complexes

    Science.gov (United States)

    Spencer, J.E.

    2000-01-01

    The corrugated form of the Harcuvar, South Mountains, and Catalina metamorphic core complexes in Arizona reflects the shape of the middle Tertiary extensional detachment fault that projects over each complex. Corrugation axes are approximately parallel to the fault-displacement direction and to the footwall mylonitic lineation. The core complexes are locally incised by enigmatic, linear drainages that parallel corrugation axes and the inferred extension direction and are especially conspicuous on the crests of antiformal corrugations. These drainages have been attributed to erosional incision on a freshly denuded, planar, inclined fault ramp followed by folding that elevated and preserved some drainages on the crests of rising antiforms. According to this hypothesis, corrugations were produced by folding after subaerial exposure of detachment-fault footwalls. An alternative hypothesis, proposed here, is as follows. In a setting where preexisting drainages cross an active normal fault, each fault-slip event will cut each drainage into two segments separated by a freshly denuded fault ramp. The upper and lower drainage segments will remain hydraulically linked after each fault-slip event if the drainage in the hanging-wall block is incised, even if the stream is on the flank of an antiformal corrugation and there is a large component of strike-slip fault movement. Maintenance of hydraulic linkage during sequential fault-slip events will guide the lengthening stream down the fault ramp as the ramp is uncovered, and stream incision will form a progressively lengthening, extension-parallel, linear drainage segment. This mechanism for linear drainage genesis is compatible with corrugations as original irregularities of the detachment fault, and does not require folding after early to middle Miocene footwall exhumation. This is desirable because many drainages are incised into nonmylonitic crystalline footwall rocks that were probably not folded under low

  12. Scalable High-Performance Parallel Design for Network Intrusion Detection Systems on Many-Core Processors

    OpenAIRE

    Jiang, Hayang; Xie, Gaogang; Salamatian, Kavé; Mathy, Laurent

    2013-01-01

    Network Intrusion Detection Systems (NIDSes) face significant challenges coming from the relentless network link speed growth and increasing complexity of threats. Both hardware accelerated and parallel software-based NIDS solutions, based on commodity multi-core and GPU processors, have been proposed to overcome these challenges. ...

  13. Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Architectures

    Energy Technology Data Exchange (ETDEWEB)

    Cerati, Giuseppe [Fermilab; Elmer, Peter [Princeton U.; Krutelyov, Slava [UC, San Diego; Lantz, Steven [Cornell U., Phys. Dept.; Lefebvre, Matthieu [Princeton U.; Masciovecchio, Mario [UC, San Diego; McDermott, Kevin [Cornell U., Phys. Dept.; Riley, Daniel [Cornell U., Phys. Dept.; Tadel, Matevž [UC, San Diego; Wittich, Peter [Cornell U., Phys. Dept.; Würthwein, Frank [UC, San Diego; Yagil, Avi [UC, San Diego

    2017-11-16

    Faced with physical and energy density limitations on clock speed, contemporary microprocessor designers have increasingly turned to on-chip parallelism for performance gains. Examples include the Intel Xeon Phi, GPGPUs, and similar technologies. Algorithms should accordingly be designed with ample amounts of fine-grained parallelism if they are to realize the full performance of the hardware. This requirement can be challenging for algorithms that are naturally expressed as a sequence of small-matrix operations, such as the Kalman filter methods widely in use in high-energy physics experiments. In the High-Luminosity Large Hadron Collider (HL-LHC), for example, one of the dominant computational problems is expected to be finding and fitting charged-particle tracks during event reconstruction; today, the most common track-finding methods are those based on the Kalman filter. Experience at the LHC, both in the trigger and offline, has shown that these methods are robust and provide high physics performance. Previously we reported the significant parallel speedups that resulted from our efforts to adapt Kalman-filter-based tracking to many-core architectures such as Intel Xeon Phi. Here we report on how effectively those techniques can be applied to more realistic detector configurations and event complexity.
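
    The computational pattern at stake, many small and identical matrix operations executed in lockstep, can be illustrated with a batched Kalman measurement update in NumPy. This is only a schematic stand-in for the SIMD/many-core implementations discussed in the paper; the state and measurement dimensions, matrices, and function name batched_kalman_update below are arbitrary examples.

        import numpy as np

        def batched_kalman_update(x, P, z, H, R):
            """Measurement update for a batch of independent tracks.
            x: (N, n) states, P: (N, n, n) covariances, z: (N, m) measurements,
            H: (m, n) measurement matrix, R: (m, m) measurement noise.
            All small-matrix products are expressed as batched einsums, the same
            fine-grained data parallelism a SIMD/many-core code exploits."""
            y = z - np.einsum("ij,nj->ni", H, x)                       # innovation
            S = np.einsum("ij,njk,lk->nil", H, P, H) + R               # innovation covariance
            K = np.einsum("nij,kj,nkl->nil", P, H, np.linalg.inv(S))   # Kalman gain
            x_new = x + np.einsum("nij,nj->ni", K, y)
            P_new = P - np.einsum("nij,jk,nkl->nil", K, H, P)
            return x_new, P_new

        if __name__ == "__main__":
            N, n, m = 10000, 4, 2
            rng = np.random.default_rng(1)
            x = rng.standard_normal((N, n))
            P = np.broadcast_to(np.eye(n), (N, n, n)).copy()
            H = np.array([[1., 0., 0., 0.], [0., 1., 0., 0.]])
            R = 0.1 * np.eye(m)
            z = x[:, :2] + 0.1 * rng.standard_normal((N, m))
            x_new, P_new = batched_kalman_update(x, P, z, H, R)
            print(x_new.shape, P_new.shape)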

  14. The ice-core record - Climate sensitivity and future greenhouse warming

    Science.gov (United States)

    Lorius, C.; Raynaud, D.; Jouzel, J.; Hansen, J.; Le Treut, H.

    1990-01-01

    The prediction of future greenhouse-gas warming depends critically on the sensitivity of the Earth's climate to increasing atmospheric concentrations of these gases. Data from cores drilled in polar ice sheets show a remarkable correlation between past glacial-interglacial temperature changes and the inferred atmospheric concentration of gases such as carbon dioxide and methane. These and other palaeoclimate data are used to assess the role of greenhouse gases in explaining past global climate change, and the validity of models predicting the effect of increasing concentrations of such gases in the atmosphere.

  15. Direct north-south synchronization of abrupt climate change record in ice cores using Beryllium 10

    Directory of Open Access Journals (Sweden)

    G. M. Raisbeck

    2007-09-01

    Full Text Available A new, decadally resolved record of the 10Be peak at 41 kyr from the EPICA Dome C ice core (Antarctica) is used to match it with the same peak in the GRIP ice core (Greenland). This permits a direct synchronisation of the climatic variations around this time period, independent of uncertainties related to the ice age-gas age difference in ice cores. Dansgaard-Oeschger event 10 is in the period of best synchronisation and is found to be coeval with an Antarctic temperature maximum. Simulations using a thermal bipolar seesaw model agree reasonably well with the observed relative climate chronology in these two cores. They also reproduce three Antarctic warming events observed between A1 and A2.

  16. Parallel Programming with Intel Parallel Studio XE

    CERN Document Server

    Blair-Chappell , Stephen

    2012-01-01

    Optimize code for multi-core processors with Intel's Parallel Studio. Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write programs for multicore and leverage the power of multicore in your programs. Sharing hands-on case studies and real-world examples, the

  17. Benchmarking NWP Kernels on Multi- and Many-core Processors

    Science.gov (United States)

    Michalakes, J.; Vachharajani, M.

    2008-12-01

    Increased computing power for weather, climate, and atmospheric science has provided direct benefits for defense, agriculture, the economy, the environment, and public welfare and convenience. Today, very large clusters with many thousands of processors are allowing scientists to move forward with simulations of unprecedented size. But time-critical applications such as real-time forecasting or climate prediction need strong scaling: faster nodes and processors, not more of them. Moreover, the need for good cost-performance has never been greater, both in terms of performance per watt and per dollar. For these reasons, the new generations of multi- and many-core processors being mass produced for commercial IT and "graphical computing" (video games) are being scrutinized for their ability to exploit the abundant fine-grain parallelism in atmospheric models. We present results of our work to date identifying key computational kernels within the dynamics and physics of a large community NWP model, the Weather Research and Forecast (WRF) model. We benchmark and optimize these kernels on several different multi- and many-core processors. The goals are to (1) characterize and model performance of the kernels in terms of computational intensity, data parallelism, memory bandwidth pressure, memory footprint, etc. (2) enumerate and classify effective strategies for coding and optimizing for these new processors, (3) assess difficulties and opportunities for tool or higher-level language support, and (4) establish a continuing set of kernel benchmarks that can be used to measure and compare effectiveness of current and future designs of multi- and many-core processors for weather and climate applications.

  18. Hominin Sites and Paleolakes Drilling Project. Chew Bahir, southern Ethiopia: How to get from three tonnes of sediment core to > 500 ka of continuous climate history?

    Science.gov (United States)

    Foerster, Verena; Asrat, Asfawossen; Cohen, Andrew S.; Gromig, Raphael; Günter, Christina; Junginger, Annett; Lamb, Henry F.; Schaebitz, Frank; Trauth, Martin H.

    2016-04-01

    In search of the environmental context of the evolution and dispersal of Homo sapiens and our close relatives within and beyond the African continent, the ICDP-funded Hominin Sites and Paleolakes Drilling Project (HSPDP) has recently cored five fluvio-lacustrine archives of climate change in East Africa. The sediment cores collected in Ethiopia and Kenya are expected to provide valuable insights into East African environmental variability during the last ~3.5 Ma. The tectonically-bound Chew Bahir basin in the southern Ethiopian rift is one of the five sites within HSPDP, located in close proximity to the Lower Omo River valley, the site of the oldest known fossils of anatomically modern humans. In late 2014, the two cores (279 and 266 m long respectively, HSPDP-CHB14-2A and 2B) were recovered, amounting to nearly three tonnes of mostly calcareous clays and silts. Deciphering an environmental record from multiple archives in the source region of modern humans could eventually allow us to reconstruct the pronounced variations in moisture availability during the transition into the Middle Stone Age, and their implications for the origin and dispersal of Homo sapiens. Here we present the first results of our analysis of the Chew Bahir cores. Following the HSPDP protocols, the two parallel Chew Bahir sediment cores have been merged into one single, 280 m long and nearly continuous (>90%) composite core on the basis of a high resolution MSCL data set (e.g., magnetic susceptibility, gamma ray density, color intensity transects, core photographs). Based on the obvious cyclicities in the MSCL data, correlated with orbital cycles, the time interval covered by our sediment archive of climate change is inferred to span the last 500-600 kyrs. Combining our first results from the long cores with the results from the completed pre-study of short cores taken in 2009/10 along a NW-SE transect across the basin (Foerster et al., 2012, Trauth et al., 2015), we have developed a hypothesis

  19. Coarse-grained parallel genetic algorithm applied to a nuclear reactor core design optimization problem

    International Nuclear Information System (INIS)

    Pereira, Claudio M.N.A.; Lapa, Celso M.F.

    2003-01-01

    This work extends the research related to genetic algorithms (GA) in core design optimization problems, whose basic investigations were presented in previous work. Here we explore the use of the Island Genetic Algorithm (IGA), a coarse-grained parallel GA model, comparing its performance to that obtained by the application of a traditional non-parallel GA. The optimization problem consists of adjusting several reactor cell parameters, such as dimensions, enrichment and materials, in order to minimize the average peak-factor in a 3-enrichment-zone reactor, considering restrictions on the average thermal flux, criticality and sub-moderation. Our IGA implementation runs as a distributed application on a conventional local area network (LAN), avoiding the use of expensive parallel computers or architectures. After exhaustive experiments, taking more than 1500 h on 550 MHz personal computers, we have observed that the IGA provided gains not only in terms of computational time, but also in the optimization outcome. Besides, we have also realized that, for this kind of problem, whose fitness evaluation is itself time consuming, the time overhead in the IGA due to the communication in LANs is practically imperceptible, leading to the conclusion that the use of expensive parallel computers or architectures can be avoided
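
    A schematic island-model GA in Python is sketched below to illustrate the coarse-grained structure: each island evolves its own population independently (the step that would run on a separate LAN node) and periodically sends its best individuals to a neighbour on a ring. The toy fitness function and GA operators are placeholders, not the reactor-cell objective of the paper.

        import random

        def evolve_island(pop, fitness, n_gen=20, mut=0.1):
            """A few generations of a simple (tournament + mutation) GA on one island."""
            for _ in range(n_gen):
                new_pop = []
                for _ in range(len(pop)):
                    a, b = random.sample(pop, 2)               # tournament of size 2
                    parent = a if fitness(a) >= fitness(b) else b
                    child = [g + random.gauss(0.0, mut) for g in parent]
                    new_pop.append(child)
                pop = new_pop
            return pop

        def island_ga(n_islands=4, pop_size=20, n_epochs=10, n_migrants=2):
            """Coarse-grained island GA: islands evolve independently (one per
            processor in a parallel setting) and periodically exchange their best
            individuals along a ring topology."""
            fitness = lambda ind: -sum(g * g for g in ind)      # toy objective
            islands = [[[random.uniform(-5, 5) for _ in range(3)] for _ in range(pop_size)]
                       for _ in range(n_islands)]
            for _ in range(n_epochs):
                islands = [evolve_island(pop, fitness) for pop in islands]  # parallelisable
                for i, pop in enumerate(islands):                           # ring migration
                    migrants = sorted(pop, key=fitness, reverse=True)[:n_migrants]
                    target = islands[(i + 1) % n_islands]
                    target.sort(key=fitness)                    # worst individuals first
                    target[:n_migrants] = [list(m) for m in migrants]
            best = max((ind for pop in islands for ind in pop), key=fitness)
            return best, fitness(best)

        if __name__ == "__main__":
            random.seed(0)
            best, score = island_ga()
            print(best, score)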

  20. ISP: an optimal out-of-core image-set processing streaming architecture for parallel heterogeneous systems.

    Science.gov (United States)

    Ha, Linh Khanh; Krüger, Jens; Dihl Comba, João Luiz; Silva, Cláudio T; Joshi, Sarang

    2012-06-01

    Image population analysis is the class of statistical methods that plays a central role in understanding the development, evolution, and disease of a population. However, these techniques often require excessive computational power and memory that are compounded with a large number of volumetric inputs. Restricted access to supercomputing power limits their influence in general research and practical applications. In this paper we introduce ISP, an Image-Set Processing streaming framework that harnesses the processing power of commodity heterogeneous CPU/GPU systems and attempts to solve this computational problem. In ISP, we introduce specially designed streaming algorithms and data structures that provide an optimal solution for out-of-core multi-image processing problems both in terms of memory usage and computational efficiency. ISP makes use of the asynchronous execution mechanism supported by parallel heterogeneous systems to efficiently hide the inherent latency of the processing pipeline of out-of-core approaches. Consequently, with computationally intensive problems, the ISP out-of-core solution can achieve the same performance as the in-core solution. We demonstrate the efficiency of the ISP framework on synthetic and real datasets.
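
    The latency-hiding principle can be sketched independently of GPUs: a background thread prefetches the next out-of-core chunk while the current chunk is being processed, so that I/O and computation overlap. The Python sketch below uses a bounded queue as the streaming buffer and a hypothetical load_chunk callable in place of real volume I/O; it illustrates the idea only and is not the ISP framework.

        import queue
        import threading
        import numpy as np

        def prefetcher(load_chunk, n_chunks, buf: queue.Queue):
            """Producer thread: loads chunks from 'disk' and hands them to the
            consumer. The bounded queue acts as the double/triple buffer."""
            for i in range(n_chunks):
                buf.put(load_chunk(i))
            buf.put(None)                     # end-of-stream marker

        def streamed_mean(load_chunk, n_chunks, buf_size=2):
            """Consumer: processes chunk i while chunk i+1 is being loaded."""
            buf = queue.Queue(maxsize=buf_size)
            threading.Thread(target=prefetcher, args=(load_chunk, n_chunks, buf),
                             daemon=True).start()
            total, count = 0.0, 0
            while True:
                chunk = buf.get()
                if chunk is None:
                    break
                total += float(chunk.sum())   # compute overlaps with the next load
                count += chunk.size
            return total / count

        if __name__ == "__main__":
            # Hypothetical loader standing in for reading one volume from disk.
            fake_load = lambda i: np.full((256, 256), float(i))
            print(streamed_mean(fake_load, n_chunks=8))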

  1. Impact of climate fluctuations on deposition of DDT and hexachlorocyclohexane in mountain glaciers: Evidence from ice core records

    International Nuclear Information System (INIS)

    Wang Xiaoping; Gong Ping; Zhang, Qianggong; Yao Tandong

    2010-01-01

    How do climate fluctuations affect DDT and hexachlorocyclohexane (HCH) distribution on the global scale? In this study, the interactions between climate variations and depositions of DDT and HCH in ice cores from Mt. Everest (the Tibetan Plateau), Mt. Muztagata (the eastern Pamirs) and the Rocky Mountains were investigated. All data regarding DDT/HCH deposition were obtained from previously published results. Concentrations of DDT and HCH in an ice core from Mt. Everest were associated with the El Nino-Southern Oscillation. Concentrations of DDT in an ice core from Mt. Muztagata were significantly correlated with the Siberia High pattern. Concentrations of HCH in an ice core from Snow Dome of the Rocky Mountains responded to the North Atlantic Oscillation. These associations suggest that there are linkages between climate variations and the global distribution of persistent organic pollutants. - Our study supports the potential contribution of ice core records of POPs to understanding the transport mechanisms of POPs.

  2. Reactor core T-H characteristics determination in case of parallel operation of different fuel assembly types

    International Nuclear Information System (INIS)

    Hermansky, J.; Petenyi, V.; Zavodsky, M.

    2009-01-01

    The WWER-440 nuclear fuel vendors continually improve the assortment of manufactured nuclear fuel assemblies in order to achieve better fuel cycle economy and reactor operation safety. It is therefore necessary to have a suitable methodology and computing code for analysing the factors that affect the accuracy of determining how the total reactor flow is redistributed among the separate parts of the reactor core when different assembly types operate in parallel. Since the geometric parameters of newly manufactured assemblies have changed recently, the calculated flows through the fuel parts of the different assembly types also depend on their actual position in the reactor core. The computing code CORFLO was therefore developed at VUJE Trnava for carrying out stationary analyses of the T-H characteristics of the reactor core within 60 deg symmetry. The CORFLO code treats the area of the active core, which consists of 312 fuel assemblies and 37 control assemblies. Owing to the rotational 60 deg symmetry of the reactor core, only 1/6 of the core with 59 fuel assemblies is calculated. The computing code is currently being verified and validated. The paper presents a short description of the computing code CORFLO together with selected calculated results. (Authors)

  3. Parallel phase model : a programming model for high-end parallel machines with manycores.

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Junfeng (Syracuse University, Syracuse, NY); Wen, Zhaofang; Heroux, Michael Allen; Brightwell, Ronald Brian

    2009-04-01

    This paper presents a parallel programming model, Parallel Phase Model (PPM), for next-generation high-end parallel machines based on a distributed memory architecture consisting of a networked cluster of nodes with a large number of cores on each node. PPM has a unified high-level programming abstraction that facilitates the design and implementation of parallel algorithms to exploit both the parallelism of the many cores and the parallelism at the cluster level. The programming abstraction will be suitable for expressing both fine-grained and coarse-grained parallelism. It includes a few high-level parallel programming language constructs that can be added as an extension to an existing (sequential or parallel) programming language such as C; and the implementation of PPM also includes a light-weight runtime library that runs on top of an existing network communication software layer (e.g. MPI). Design philosophy of PPM and details of the programming abstraction are also presented. Several unstructured applications that inherently require high-volume random fine-grained data accesses have been implemented in PPM with very promising results.

  4. Accelerating Climate Simulations Through Hybrid Computing

    Science.gov (United States)

    Zhou, Shujia; Sinno, Scott; Cruz, Carlos; Purcell, Mark

    2009-01-01

    Unconventional multi-core processors (e.g., IBM Cell B/E and NVIDIA GPU) have emerged as accelerators in climate simulation. However, climate models typically run on parallel computers with conventional processors (e.g., Intel and AMD) using MPI. Connecting accelerators to this architecture efficiently and easily becomes a critical issue. When using MPI for connection, we identified two challenges: (1) identical MPI implementation is required in both systems, and (2) existing MPI code must be modified to accommodate the accelerators. In response, we have extended and deployed IBM Dynamic Application Virtualization (DAV) in a hybrid computing prototype system (one blade with two Intel quad-core processors, two IBM QS22 Cell blades, connected with Infiniband), allowing for seamlessly offloading compute-intensive functions to remote, heterogeneous accelerators in a scalable, load-balanced manner. Currently, a climate solar radiation model running with multiple MPI processes has been offloaded to multiple Cell blades with approx. 10% network overhead.

  5. RICE ice core: Black Carbon reflects climate variability at Roosevelt Island, West Antarctica

    Science.gov (United States)

    Ellis, Aja; Edwards, Ross; Bertler, Nancy; Winton, Holly; Goodwin, Ian; Neff, Peter; Tuohy, Andrea; Proemse, Bernadette; Hogan, Chad; Feiteng, Wang

    2015-04-01

    The Roosevelt Island Climate Evolution (RICE) project successfully drilled a deep ice core from Roosevelt Island during the 2011/2012 and 2012/2013 seasons. Located in the Ross Ice Shelf in West Antarctica, the site is an ideal location for investigating climate variability and the past stability of the Ross Ice Shelf. Black carbon (BC) aerosols are emitted by both biomass burning and fossil fuel combustion, and BC particles emitted in the southern hemisphere are transported in the atmosphere and preserved in Antarctic ice. The past record of BC is expected to be sensitive to climate variability, as it is modulated by both emissions and transport. To investigate BC variability over the past 200 years, we developed a BC record from two overlapping ice cores (~1850-2012) and a high-resolution snow pit spanning 2010-2012 (cal. yr). Consistent results are found between the snow pit profiles and ice core records. Distinct decadal trends are found with respect to BC particle size, and the record indicates a steady rise in BC particle size over the last 100 years. Differences in emission sources and conditions may be a possible explanation for the changes in BC size. These records also show a significant increase in BC concentration over the past decade, with concentrations rising above 1.5 ppb (1.5 x 10^-9 g/g), suggesting a fundamental shift in BC deposition to the site.

  6. Par@Graph - a parallel toolbox for the construction and analysis of large complex climate networks

    NARCIS (Netherlands)

    Tantet, A.J.J.

    2015-01-01

    In this paper, we present Par@Graph, a software toolbox to reconstruct and analyze complex climate networks having a large number of nodes (up to at least 10^6) and edges (up to at least 10^12). The key innovation is an efficient set of parallel software tools designed to leverage the inherited hybrid

  7. Reassessment of the Upper Fremont Glacier ice-core chronologies by synchronizing of ice-core-water isotopes to a nearby tree-ring chronology

    Science.gov (United States)

    Chellman, Nathan J.; McConnell, Joseph R.; Arienzo, Monica; Pederson, Gregory T.; Aarons, Sarah; Csank, Adam

    2017-01-01

    The Upper Fremont Glacier (UFG), Wyoming, is one of the few continental glaciers in the contiguous United States known to preserve environmental and climate records spanning recent centuries. A pair of ice cores taken from UFG have been studied extensively to document changes in climate and industrial pollution (most notably, mid-19th century increases in mercury pollution). Fundamental to these studies is the chronology used to map ice-core depth to age. Here, we present a revised chronology for the UFG ice cores based on new measurements and using a novel dating approach of synchronizing continuous water isotope measurements to a nearby tree-ring chronology. While consistent with the few unambiguous age controls underpinning the previous UFG chronologies, the new interpretation suggests a very different time scale for the UFG cores with changes of up to 80 years. Mercury increases previously associated with the mid-19th century Gold Rush now coincide with early-20th century industrial emissions, aligning the UFG record with other North American mercury records from ice and lake sediment cores. Additionally, new UFG records of industrial pollutants parallel changes documented in ice cores from southern Greenland, further validating the new UFG chronologies while documenting the extent of late 19th and early 20th century pollution in remote North America.

  8. New perspectives for European climate services: HORIZON2020

    Science.gov (United States)

    Bruning, Claus; Tilche, Andrea

    2014-05-01

    The development of new end-to-end climate services was one of the core priorities of the 7th Framework Programme for Research and Technological Development of the European Commission and will become one of the key strategic priorities of Societal Challenge 5 of HORIZON2020 (the new EU Framework Programme for Research and Innovation 2014-2020). Results should increase the competitiveness of European businesses and the ability of regional and national authorities to make effective decisions in climate-sensitive sectors. In parallel, the production of new tailored climate information should strengthen the resilience of European society to climate change. From this perspective, the strategy to support and foster the underpinning science for climate services in HORIZON2020 will be presented.

  9. A solution for automatic parallelization of sequential assembly code

    Directory of Open Access Journals (Sweden)

    Kovačević Đorđe

    2013-01-01

    Full Text Available Since modern multicore processors can execute existing sequential programs only on a single core, there is a strong need for automatic parallelization of program code. Relying on existing algorithms, this paper describes a new software tool for the parallelization of sequential assembly code. The main goal of this paper is to develop a parallelizator which reads sequential assembler code and at its output provides parallelized code for a MIPS processor with multiple cores. The idea is the following: the parser translates the assembler input file into program objects suitable for further processing. After that, the static single assignment form is constructed. Based on the data flow graph, the parallelization algorithm distributes instructions onto different cores. Once the sequential code has been parallelized by the parallelization algorithm, registers are allocated with the linear allocation algorithm, and the result at the end is assembler code distributed across the cores. In the paper we evaluate the speedup of a matrix multiplication example processed by the assembly-code parallelizator. The result is an almost linear speedup of code execution, which increases with the number of cores. The speedup on two cores is 1.99, while on 16 cores it is 13.88.
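
    The central step, distributing instructions across cores according to the data-flow graph, can be illustrated with a toy Python sketch: a read-after-write dependence graph is built over a straight-line pseudo-assembly sequence and independent instructions are greedily list-scheduled onto two cores. The instruction encoding and scheduler below are illustrative simplifications, not the tool described in the paper.

        from typing import List, Tuple

        # Each pseudo-instruction: (display name, destination register, source registers)
        Instr = Tuple[str, str, Tuple[str, ...]]

        def build_deps(prog: List[Instr]):
            """RAW dependences: instruction j depends on the latest earlier
            instruction that wrote one of j's source registers."""
            last_writer, deps = {}, {i: set() for i in range(len(prog))}
            for j, (_, dst, srcs) in enumerate(prog):
                for s in srcs:
                    if s in last_writer:
                        deps[j].add(last_writer[s])
                last_writer[dst] = j
            return deps

        def schedule(prog: List[Instr], n_cores=2):
            """Greedy list scheduling: each cycle, issue up to n_cores ready
            instructions (all of whose dependences have already completed)."""
            deps = build_deps(prog)
            done, cycles = set(), []
            while len(done) < len(prog):
                ready = [i for i in range(len(prog))
                         if i not in done and deps[i] <= done]
                issue = ready[:n_cores]
                cycles.append([prog[i][0] for i in issue])
                done.update(issue)
            return cycles

        if __name__ == "__main__":
            program = [("add r1", "r1", ("r2", "r3")),
                       ("mul r4", "r4", ("r5", "r6")),   # independent of the add
                       ("sub r7", "r7", ("r1", "r4")),   # needs both results
                       ("add r8", "r8", ("r2", "r5"))]
            for c, instrs in enumerate(schedule(program)):
                print(f"cycle {c}: cores execute {instrs}")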

  10. Waves in the core and mechanical core-mantle interactions

    DEFF Research Database (Denmark)

    Jault, D.; Finlay, Chris

    2015-01-01

    This Chapter focuses on time-dependent fluid motions in the core interior, which can be constrained by observations of the Earth's magnetic field, on timescales which are short compared to the magnetic diffusion time. This dynamics is strongly influenced by the Earth's rapid rotation, which rigidifies the motions in the direction parallel to the Earth's rotation axis. This property accounts for the significance of the core-mantle topography. In addition, the stiffening of the fluid in the direction parallel to the rotation axis gives rise to a magnetic diffusion layer attached to the core-mantle boundary, which would otherwise be dispersed by Alfven waves. This Chapter complements the descriptions of large-scale flow in the core (8.04), of turbulence in the core (8.06) and of core-mantle interactions (8.12), which can all be found in this volume. We rely on basic magnetohydrodynamic theory, including the derivation...

  11. Parallelizing the spectral transform method: A comparison of alternative parallel algorithms

    International Nuclear Information System (INIS)

    Foster, I.; Worley, P.H.

    1993-01-01

    The spectral transform method is a standard numerical technique for solving partial differential equations on the sphere and is widely used in global climate modeling. In this paper, we outline different approaches to parallelizing the method and describe experiments that we are conducting to evaluate the efficiency of these approaches on parallel computers. The experiments are conducted using a testbed code that solves the nonlinear shallow water equations on a sphere, but are designed to permit evaluation in the context of a global model. They allow us to evaluate the relative merits of the approaches as a function of problem size and number of processors. The results of this study are guiding ongoing work on PCCM2, a parallel implementation of the Community Climate Model developed at the National Center for Atmospheric Research
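
    One approach commonly compared in such studies is transpose-based parallelization: latitudes are distributed for the longitude FFTs, and the data are then redistributed (transposed) so that each task owns all latitudes for a subset of zonal wavenumbers before the latitude-direction transform. The Python sketch below simulates the tasks with simple array slicing and uses placeholder quadrature weights in place of a real Legendre transform; it is a schematic of the data layout only, not PCCM2 or the testbed code.

        import numpy as np

        def parallel_spectral_transform(field, n_tasks=4):
            """Schematic transpose-based parallelisation of a spectral transform.
            'Tasks' are simulated by slicing; a real code would use MPI.
            1) each task owns a band of latitudes and FFTs along longitude;
            2) a transpose regroups data so each task owns all latitudes for a
               subset of zonal wavenumbers;
            3) each task reduces over latitude (stand-in for the Legendre step)."""
            nlat, nlon = field.shape
            weights = np.sin(np.linspace(0, np.pi, nlat))     # placeholder weights

            lat_bands = np.array_split(np.arange(nlat), n_tasks)
            fourier = np.empty((nlat, nlon), dtype=complex)
            for band in lat_bands:                            # each iteration = one task
                fourier[band, :] = np.fft.fft(field[band, :], axis=1)

            wave_bands = np.array_split(np.arange(nlon), n_tasks)  # "transpose"
            spectral = np.empty(nlon, dtype=complex)
            for band in wave_bands:                           # each iteration = one task
                spectral[band] = weights @ fourier[:, band]
            return spectral

        if __name__ == "__main__":
            rng = np.random.default_rng(2)
            print(parallel_spectral_transform(rng.standard_normal((64, 128)))[:4])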

  12. Automatic Loop Parallelization via Compiler Guided Refactoring

    DEFF Research Database (Denmark)

    Larsen, Per; Ladelsky, Razya; Lidman, Jacob

    For many parallel applications, performance relies not on instruction-level parallelism, but on loop-level parallelism. Unfortunately, many modern applications are written in ways that obstruct automatic loop parallelization. Since we cannot identify sufficient parallelization opportunities for these codes in a static, off-line compiler, we developed an interactive compilation feedback system that guides the programmer in iteratively modifying application source, thereby improving the compiler’s ability to generate loop-parallel code. We use this compilation system to modify two sequential benchmarks, finding that the code parallelized in this way runs up to 8.3 times faster on an octo-core Intel Xeon 5570 system and up to 12.5 times faster on a quad-core IBM POWER6 system. Benchmark performance varies significantly between the systems. This suggests that semi-automatic parallelization should...

  13. Annually resolved ice core records of tropical climate variability over the past ~1800 years.

    Science.gov (United States)

    Thompson, L G; Mosley-Thompson, E; Davis, M E; Zagorodnov, V S; Howat, I M; Mikhalenko, V N; Lin, P-N

    2013-05-24

    Ice cores from low latitudes can provide a wealth of unique information about past climate in the tropics, but they are difficult to recover and few exist. Here, we report annually resolved ice core records from the Quelccaya ice cap (5670 meters above sea level) in Peru that extend back ~1800 years and provide a high-resolution record of climate variability there. Oxygen isotopic ratios (δ(18)O) are linked to sea surface temperatures in the tropical eastern Pacific, whereas concentrations of ammonium and nitrate document the dominant role played by the migration of the Intertropical Convergence Zone in the region of the tropical Andes. Quelccaya continues to retreat and thin. Radiocarbon dates on wetland plants exposed along its retreating margins indicate that it has not been smaller for at least six millennia.

  14. A hybrid algorithm for parallel molecular dynamics simulations

    Science.gov (United States)

    Mangiardi, Chris M.; Meyer, R.

    2017-10-01

    This article describes algorithms for the hybrid parallelization and SIMD vectorization of molecular dynamics simulations with short-range forces. The parallelization method combines domain decomposition with a thread-based parallelization approach. The goal of the work is to enable efficient simulations of very large (tens of millions of atoms) and inhomogeneous systems on many-core processors with hundreds or thousands of cores and SIMD units with large vector sizes. In order to test the efficiency of the method, simulations of a variety of configurations with up to 74 million atoms have been performed. Results are shown that were obtained on multi-core systems with Sandy Bridge and Haswell processors as well as systems with Xeon Phi many-core processors.

  15. Rubus: A compiler for seamless and extensible parallelism

    Science.gov (United States)

    Adnan, Muhammad; Aslam, Faisal; Sarwar, Syed Mansoor

    2017-01-01

    Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called the Graphics Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages, which were designed to work with machines having single-core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that the programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, parallelizing legacy code can require rewriting a significant portion of it in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent on code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer’s expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84 times has been

  16. Rubus: A compiler for seamless and extensible parallelism.

    Directory of Open Access Journals (Sweden)

    Muhammad Adnan

    Full Text Available Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called the Graphics Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages, which were designed to work with machines having single-core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that the programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, parallelizing legacy code can require rewriting a significant portion of it in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent on code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer's expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84

  17. Synthesis of parallel and antiparallel core-shell triangular nanoparticles

    Science.gov (United States)

    Bhattacharjee, Gourab; Satpati, Biswarup

    2018-04-01

    Core-shell triangular nanoparticles were synthesized by seed-mediated growth. Using a triangular gold (Au) nanoparticle as the template, we have grown a silver (Ag) shell to obtain a core-shell nanoparticle. By changing the chemistry, we have grown two types of core-shell structures, in which the core and shell have either the same symmetry or opposite symmetries. Both core and core-shell nanoparticles were characterized using transmission electron microscopy (TEM) and energy dispersive X-ray spectroscopy (EDX) to determine the crystal structure and composition of the synthesized core-shell nanoparticles. From diffraction pattern analysis and energy-filtered TEM (EFTEM) we have confirmed that the crystal facets in the core are responsible for the two-dimensional growth of the core-shell nanostructures.

  18. A theoretical concept for a thermal-hydraulic 3D parallel channel core model

    International Nuclear Information System (INIS)

    Hoeld, A.

    2004-01-01

    A detailed description of the theoretical concept of the 3D thermal-hydraulic single- and two-phase flow phenomena is presented. The theoretical concept is based on important development lines, such as the separate treatment of the mass and energy balances from the momentum balance equations. The other line is the establishment of a procedure for calculating the mass flow distribution among the different parallel channels, based on the fact that the sum of the pressure decrease terms over a closed loop must remain zero despite asymmetric perturbations. The concept is realized in the experimental code HERO-X3D, concentrating in a first step on an artificial BWR or PWR core which may consist of a central channel, four quadrants, and a bypass channel. (authors)
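
    The closed-loop condition quoted above is, in its simplest form, the requirement that all parallel channels see the same pressure drop while their flows add up to the inlet flow. A minimal Python sketch of that balance is given below, assuming a purely quadratic loss law dp_i = k_i * w_i**2 per channel and solving for the common pressure drop by bisection; it illustrates the principle only and is not the HERO-X3D procedure. The loss coefficients are invented.

        def distribute_flow(k, w_total, tol=1e-10):
            """Split a total mass flow among parallel channels so every channel has
            the same pressure drop, assuming dp_i = k_i * w_i**2. The common dp is
            found by bisection; the channel flows are w_i = sqrt(dp / k_i)."""
            def total_flow(dp):
                return sum((dp / ki) ** 0.5 for ki in k)

            dp_lo, dp_hi = 0.0, 1.0
            while total_flow(dp_hi) < w_total:        # bracket the solution
                dp_hi *= 2.0
            while dp_hi - dp_lo > tol * dp_hi:
                dp_mid = 0.5 * (dp_lo + dp_hi)
                if total_flow(dp_mid) < w_total:
                    dp_lo = dp_mid
                else:
                    dp_hi = dp_mid
            dp = 0.5 * (dp_lo + dp_hi)
            return dp, [(dp / ki) ** 0.5 for ki in k]

        if __name__ == "__main__":
            loss_coeffs = [1.0, 1.2, 0.8, 2.5, 1.0, 3.0]   # central channel, quadrants, bypass (illustrative)
            dp, flows = distribute_flow(loss_coeffs, w_total=100.0)
            print(f"common pressure drop: {dp:.3f}")
            print("channel flows:", [round(w, 2) for w in flows])
            print("sum of flows:", round(sum(flows), 2))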

  19. Parallelization of Subchannel Analysis Code MATRA

    International Nuclear Information System (INIS)

    Kim, Seongjin; Hwang, Daehyun; Kwon, Hyouk

    2014-01-01

    A stand-alone calculation of the MATRA code takes a pertinent computing time for thermal margin calculations, while a considerably longer time is needed to solve whole-core pin-by-pin problems. In addition, it is strongly required to improve the computation speed of the MATRA code to satisfy the overall performance of multi-physics coupling calculations. Therefore, a parallel approach to improve and optimize the computational capability of the MATRA code is proposed and verified in this study. The parallel algorithm is embodied in the MATRA code using the MPI communication method, and modification of the previous code structure was minimized. The improvement is confirmed by comparing the results between the single- and multiple-processor algorithms. The speedup and efficiency are also evaluated when increasing the number of processors. The parallel algorithm was implemented in the subchannel code MATRA using MPI. The performance of the parallel algorithm was verified by comparing the results with those from MATRA with a single processor. It is also noticed that the performance of the MATRA code was greatly improved by implementing the parallel algorithm for the 1/8-core and whole-core problems

  20. Massively Parallel Finite Element Programming

    KAUST Repository

    Heister, Timo; Kronbichler, Martin; Bangerth, Wolfgang

    2010-01-01

    Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.

  1. Massively Parallel Finite Element Programming

    KAUST Repository

    Heister, Timo

    2010-01-01

    Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.

  2. How to use MPI communication in highly parallel climate simulations more easily and more efficiently.

    Science.gov (United States)

    Behrens, Jörg; Hanke, Moritz; Jahns, Thomas

    2014-05-01

    In this talk we present a way to facilitate efficient use of MPI communication for developers of climate models. Exploitation of the performance potential of today's highly parallel supercomputers with real world simulations is a complex task. This is partly caused by the low level nature of the MPI communication library which is the dominant communication tool at least for inter-node communication. In order to manage the complexity of the task, climate simulations with non-trivial communication patterns often use an internal abstraction layer above MPI without exploiting the benefits of communication aggregation or MPI-datatypes. The solution for the complexity and performance problem we propose is the communication library YAXT. This library is built on top of MPI and takes high level descriptions of arbitrary domain decompositions and automatically derives an efficient collective data exchange. Several exchanges can be aggregated in order to reduce latency costs. Examples are given which demonstrate the simplicity and the performance gains for selected climate applications.
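
    The essential service such a library provides can be sketched without MPI: given, for every rank, the set of global indices it owns and the set it needs, the send and receive lists of the collective exchange can be derived automatically. The Python sketch below simulates the ranks with plain dictionaries; it illustrates the idea behind this style of exchange generation and is not the YAXT API.

        def derive_exchange(owned, needed):
            """Given, per rank, the globally indexed points it owns and the points
            it needs, derive the send and receive lists that a communication
            library would turn into aggregated messages."""
            owner = {}
            for rank, idxs in owned.items():
                for g in idxs:
                    owner[g] = rank
            send = {r: {} for r in owned}     # send[src][dst] = indices to send
            recv = {r: {} for r in owned}     # recv[dst][src] = indices to receive
            for dst, idxs in needed.items():
                for g in sorted(idxs):
                    src = owner[g]
                    if src == dst:
                        continue              # already local, nothing to exchange
                    send[src].setdefault(dst, []).append(g)
                    recv[dst].setdefault(src, []).append(g)
            return send, recv

        if __name__ == "__main__":
            owned  = {0: {0, 1, 2, 3}, 1: {4, 5, 6, 7}}
            needed = {0: {2, 3, 4},    1: {3, 4, 5}}      # halo of width 1
            send, recv = derive_exchange(owned, needed)
            print("send:", send)   # {0: {1: [3]}, 1: {0: [4]}}
            print("recv:", recv)   # {0: {1: [4]}, 1: {0: [3]}}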

  3. Improvement of Parallel Algorithm for MATRA Code

    International Nuclear Information System (INIS)

    Kim, Seong-Jin; Seo, Kyong-Won; Kwon, Hyouk; Hwang, Dae-Hyun

    2014-01-01

    The feasibility study to parallelize the MATRA code was conducted at KAERI early this year. As a result, a parallel algorithm for the MATRA code has been developed to decrease the considerable computing time required to solve a big-size problem, such as a whole-core pin-by-pin problem of a general PWR reactor, and to improve the overall performance of multi-physics coupling calculations. It was shown that the performance of the MATRA code was greatly improved by implementing the parallel algorithm using MPI communication. For 1/8-core and whole-core problems of the SMART reactor, a speedup of about 10 was evaluated when the number of processors used was 25. However, it was also shown that the performance deteriorated as the axial node number increased. In this paper, the procedure for communication between processors is optimized to improve the previous parallel algorithm. To remedy the performance deterioration of the parallelized MATRA code, a new communication algorithm between processors is presented. It is shown that the speedup is improved and remains stable regardless of the axial node number

  4. 1500 Years of Annual Climate and Environmental Variability as Recorded in Bona-Churchill (Alaska) Ice Cores

    Science.gov (United States)

    Thompson, L. G.; Mosley-Thompson, E. S.; Zagorodnov, V.; Davis, M. E.; Mashiotta, T. A.; Lin, P.

    2004-12-01

    In 2003, six ice cores measuring 10.5, 11.5, 11.8, 12.4, 114 and 460 meters were recovered from the col between Mount Bona and Mount Churchill (61° 24'N; 141° 42'W; 4420 m asl). These cores have been analyzed for stable isotopic ratios, insoluble dust content and concentrations of major chemical species. Total Beta radioactivity was measured in the upper sections. The 460-meter core, extending to bedrock, captured the entire depositional record at this site where ice temperatures ranged from -24°C at 10 meters to -19.8°C at the ice/bedrock contact. The shallow cores allow assessment of surface processes under modern meteorological conditions while the deep core offers a ~1500-year climate and environmental perspective. The average annual net balance is ~1000 mm of water equivalent, and distinct annual signals in dust and calcium concentrations along with δ18O allow annual resolution over most of the core. The excess sulfate record reflects many known large volcanic eruptions such as Katmai, Krakatau, Tambora, and Laki, which allow validation of the time scale in the upper part of the core. The lower part of the core yields a history of earlier volcanic events. The 460-m Bona-Churchill ice core provides a detailed history of the 'Little Ice Age' and medieval warm periods for southeastern Alaska. The source of the White River Ash will be discussed in light of the evidence from this core. The 460-m core also provides a long-term history of the dust fall that originates in north-central China. The annual ice core-derived climate records from southeastern Alaska will facilitate an investigation of the likelihood that the high resolution 1500-year record from the tropical Quelccaya Ice Cap (Peru) preserves a history of the variability of both the PDO and the Aleutian Low.

  5. 12-core x 3-mode Dense Space Division Multiplexed Transmission over 40 km Employing Multi-carrier Signals with Parallel MIMO Equalization

    DEFF Research Database (Denmark)

    Mizuno, T.; Kobayashi, T.; Takara, H.

    2014-01-01

    We demonstrate dense SDM transmission of 20-WDM multi-carrier PDM-32QAM signals over a 40-km 12-core x 3-mode fiber with 247.9-b/s/Hz spectral efficiency. Parallel MIMO equalization enables 21-ns DMD compensation with 61 TDE taps per subcarrier....

  6. Parallelization of the model-based iterative reconstruction algorithm DIRA

    International Nuclear Information System (INIS)

    Oertenberg, A.; Sandborg, M.; Alm Carlsson, G.; Malusek, A.; Magnusson, M.

    2016-01-01

    New paradigms for parallel programming have been devised to simplify software development on multi-core processors and many-core graphical processing units (GPUs). Despite their obvious benefits, the parallelization of existing computer programs is not an easy task. In this work, the use of the Open Multiprocessing (OpenMP) and Open Computing Language (OpenCL) frameworks is considered for the parallelization of the model-based iterative reconstruction algorithm DIRA, with the aim to significantly shorten the code's execution time. Selected routines were parallelized using the OpenMP and OpenCL libraries; some routines were converted from MATLAB to C and optimised. Parallelization of the code with OpenMP was easy and resulted in an overall speedup of 15 on a 16-core computer. Parallelization with OpenCL was more difficult owing to differences between the central processing unit and GPU architectures. The resulting speedup was substantially lower than the theoretical peak performance of the GPU; the cause was explained. (authors)

  7. The Principalship: Essential Core Competencies for Instructional Leadership and Its Impact on School Climate

    Science.gov (United States)

    Ross, Dorrell J.; Cozzens, Jeffry A.

    2016-01-01

    The purpose of this quantitative study was to investigate teachers' perceptions of principals' leadership behaviors influencing the schools' climate according to Green's (2010) ideologies of the 13 core competencies within the four dimensions of principal leadership. Data from the "Leadership Behavior Inventory" (Green, 2014) suggest 314…

  8. Climate Ocean Modeling on Parallel Computers

    Science.gov (United States)

    Wang, P.; Cheng, B. N.; Chao, Y.

    1998-01-01

    Ocean modeling plays an important role in both understanding the current climatic conditions and predicting future climate change. However, modeling the ocean circulation at various spatial and temporal scales is a very challenging computational task.

  9. Adapting algorithms to massively parallel hardware

    CERN Document Server

    Sioulas, Panagiotis

    2016-01-01

    In recent years, the trend in computing has shifted from delivering processors with faster clock speeds to increasing the number of cores per processor. This marks a paradigm shift towards parallel programming, in which applications are programmed to exploit the power provided by multi-cores. Usually there is a gain in terms of time-to-solution and memory footprint. Specifically, this trend has sparked an interest towards massively parallel systems that can provide a large number of processors, and possibly computing nodes, as in GPUs and MPPAs (Massively Parallel Processor Arrays). In this project, the focus was on two distinct computing problems: k-d tree searches and track seeding cellular automata. The goal was to adapt the algorithms to parallel systems and evaluate their performance in different cases.

  10. Climate prosperity, parallel paths: Canada-U.S. climate policy choices

    International Nuclear Information System (INIS)

    2011-01-01

    The National Round Table on the Environment and the Economy (NRTEE) has conducted a study on the economic risks and opportunities of climate policies for Canada against the background of the Canada-United States relationship. This research aims to inform future policy decisions and provide ideas on how to serve Canadian interests in the context of a changing climate. The NRTEE's research presented in this report focuses on the economic and environmental implications for Canada of harmonizing with the U.S. on climate policy. The document has two main objectives: to evaluate the consequences of U.S. climate policy choices for Canada and to identify the policy options that would lead to a long-term reduction of emissions, taking into account the economic risks associated with policy choices in the U.S. and in Canada. According to the NRTEE, the government of Canada should consider the benefits of implementing its own harmonization strategy. This document provides a path towards achieving climate policy harmonization with the United States. 52 refs., 18 tabs., 24 figs.

  11. Performing a local reduction operation on a parallel computer

    Science.gov (United States)

    Blocksome, Michael A.; Faraj, Daniel A.

    2012-12-11

    A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
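
    The claim language above is dense, so here is a loose, non-authoritative sketch of the general idea it describes (copying per-core input buffers into shared memory in interleaved chunks, then reducing alternate chunks in parallel), written with plain C++ threads rather than the compute-node hardware of the patent; the two-thread setup, buffer contents and chunk size are arbitrary choices for the example.

        #include <cstdio>
        #include <thread>
        #include <vector>

        int main() {
            const std::size_t chunk = 4, nchunks = 8, n = chunk * nchunks;
            std::vector<int> in0(n, 1), in1(n, 2);      // two per-core input buffers
            std::vector<int> shared(2 * n);             // interleaved buffer in "shared memory"
            long long partial[2] = {0, 0};              // per-thread partial sums

            // Step 1: each thread copies its buffer into the shared buffer in
            // interleaved chunks (thread 0 fills even chunks, thread 1 odd chunks).
            auto copy = [&](int id) {
                const std::vector<int>& in = (id == 0) ? in0 : in1;
                for (std::size_t c = 0; c < nchunks; ++c)
                    for (std::size_t i = 0; i < chunk; ++i)
                        shared[(2 * c + id) * chunk + i] = in[c * chunk + i];
            };
            std::thread c0(copy, 0), c1(copy, 1);
            c0.join(); c1.join();

            // Step 2: each thread locally reduces every other interleaved chunk.
            auto reduce = [&](int id) {
                for (std::size_t c = id; c < 2 * nchunks; c += 2)
                    for (std::size_t i = 0; i < chunk; ++i)
                        partial[id] += shared[c * chunk + i];
            };
            std::thread r0(reduce, 0), r1(reduce, 1);
            r0.join(); r1.join();

            std::printf("local reduction result: %lld\n", partial[0] + partial[1]);
            return 0;
        }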

  12. Options for Parallelizing a Planning and Scheduling Algorithm

    Science.gov (United States)

    Clement, Bradley J.; Estlin, Tara A.; Bornstein, Benjamin D.

    2011-01-01

    Space missions have a growing interest in putting multi-core processors onboard spacecraft. For many missions, limited processing power significantly slows operations. We investigate how continual planning and scheduling algorithms can exploit multi-core processing and outline different potential design decisions for a parallelized planning architecture. This organization of choices and challenges helps us with an initial design for parallelizing the CASPER planning system for a mesh multi-core processor. This work extends that presented at another workshop with some preliminary results.

  13. Characterization of rapid climate changes through isotope analyses of ice and entrapped air in the NEEM ice core

    DEFF Research Database (Denmark)

    Guillevic, Myriam

    Greenland ice cores have revealed the occurrence of rapid climatic instabilities during the last glacial period, known as Dansgaard-Oeschger (DO) events, while marine cores from the North Atlantic have evidenced layers of ice rafted debris deposited by melting icebergs, caused by the collapse...... mechanisms at play. Recent analytical developments have made it possible to measure new paleoclimate proxies in Greenland ice cores. In this thesis we first contribute to these analytical developments by measuring the new innovative parameter 17O-excess at LSCE (Laboratoire des Sciences du Climat et de l'Environnement, France). At the Centre for Ice and Climate (CIC, Denmark) we contribute to the development of a protocol for absolute referencing of methane gas isotopes, and to making a full air standard with known concentration and isotopic composition of methane. Then, air (δ15N) and water stable isotope measurements from...

  14. Accelerating Climate and Weather Simulations through Hybrid Computing

    Science.gov (United States)

    Zhou, Shujia; Cruz, Carlos; Duffy, Daniel; Tucker, Robert; Purcell, Mark

    2011-01-01

    Unconventional multi- and many-core processors (e.g. IBM (R) Cell B.E.(TM) and NVIDIA (R) GPU) have emerged as effective accelerators in trial climate and weather simulations. Yet these climate and weather models typically run on parallel computers with conventional processors (e.g. Intel, AMD, and IBM) using Message Passing Interface. To address challenges involved in efficiently and easily connecting accelerators to parallel computers, we investigated using IBM's Dynamic Application Virtualization (TM) (IBM DAV) software in a prototype hybrid computing system with representative climate and weather model components. The hybrid system comprises two Intel blades and two IBM QS22 Cell B.E. blades, connected with both InfiniBand(R) (IB) and 1-Gigabit Ethernet. The system significantly accelerates a solar radiation model component by offloading compute-intensive calculations to the Cell blades. Systematic tests show that IBM DAV can seamlessly offload compute-intensive calculations from Intel blades to Cell B.E. blades in a scalable, load-balanced manner. However, noticeable communication overhead was observed, mainly due to IP over the IB protocol. Full utilization of IB Sockets Direct Protocol and the lower latency production version of IBM DAV will reduce this overhead.

  15. Provably optimal parallel transport sweeps on regular grids

    International Nuclear Information System (INIS)

    Adams, M. P.; Adams, M. L.; Hawkins, W. D.; Smith, T.; Rauchwerger, L.; Amato, N. M.; Bailey, T. S.; Falgout, R. D.

    2013-01-01

    We have found provably optimal algorithms for full-domain discrete-ordinate transport sweeps on regular grids in 3D Cartesian geometry. We describe these algorithms and sketch a proof that they always execute the full eight-octant sweep in the minimum possible number of stages for a given Px x Py x Pz partitioning. Computational results demonstrate that our optimal scheduling algorithms execute sweeps in the minimum possible stage count. Observed parallel efficiencies agree well with our performance model. An older version of our PDT transport code achieves almost 80% parallel efficiency on 131,072 cores, on a weak-scaling problem with only one energy group, 80 directions, and 4096 cells/core. A newer version is less efficient at present (we are still improving its implementation) but achieves almost 60% parallel efficiency on 393,216 cores. These results conclusively demonstrate that sweeps can perform with high efficiency on core counts approaching 10^6. (authors)

  16. Provably optimal parallel transport sweeps on regular grids

    Energy Technology Data Exchange (ETDEWEB)

    Adams, M. P.; Adams, M. L.; Hawkins, W. D. [Dept. of Nuclear Engineering, Texas A and M University, 3133 TAMU, College Station, TX 77843-3133 (United States); Smith, T.; Rauchwerger, L.; Amato, N. M. [Dept. of Computer Science and Engineering, Texas A and M University, 3133 TAMU, College Station, TX 77843-3133 (United States); Bailey, T. S.; Falgout, R. D. [Lawrence Livermore National Laboratory (United States)

    2013-07-01

    We have found provably optimal algorithms for full-domain discrete-ordinate transport sweeps on regular grids in 3D Cartesian geometry. We describe these algorithms and sketch a proof that they always execute the full eight-octant sweep in the minimum possible number of stages for a given Px x Py x Pz partitioning. Computational results demonstrate that our optimal scheduling algorithms execute sweeps in the minimum possible stage count. Observed parallel efficiencies agree well with our performance model. An older version of our PDT transport code achieves almost 80% parallel efficiency on 131,072 cores, on a weak-scaling problem with only one energy group, 80 directions, and 4096 cells/core. A newer version is less efficient at present (we are still improving its implementation) but achieves almost 60% parallel efficiency on 393,216 cores. These results conclusively demonstrate that sweeps can perform with high efficiency on core counts approaching 10^6. (authors)

  17. Efficient Out of Core Sorting Algorithms for the Parallel Disks Model.

    Science.gov (United States)

    Kundeti, Vamsi; Rajasekaran, Sanguthevar

    2011-11-01

    In this paper we present efficient algorithms for sorting on the Parallel Disks Model (PDM). Numerous asymptotically optimal algorithms have been proposed in the literature. However, many of these merge-based algorithms have large underlying constants in the time bounds, because they suffer from the lack of read parallelism on PDM. The irregular consumption of the runs during the merge affects the read parallelism and contributes to the increased sorting time. In this paper we first introduce a novel idea called dirty sequence accumulation that improves the read parallelism. Secondly, we show analytically that this idea can reduce the number of parallel I/Os required to sort the input close to the lower bound of [Formula: see text]. We experimentally verify our dirty sequence idea with the standard R-way merge and show that our idea can reduce the number of parallel I/Os needed to sort on PDM significantly.

  18. Diatom Stratigraphy of FA-1 Core, Qarun Lake, Records of Holocene Environmental and Climatic Change in Faiyum Oasis, Egypt

    Directory of Open Access Journals (Sweden)

    Zalat Abdelfattah A.

    2017-06-01

    Full Text Available This study evaluates changes in the environmental and climatic conditions in the Faiyum Oasis during the Holocene based on diatom analyses of the sediment FA-1 core from the southern seashore of the Qarun Lake. The studied FA-1 core was 26 m long and covered the time span of ca. 9,000 cal. yrs BP. Diatom taxa were abundant and moderately to well-preserved throughout the core sediments. Planktonic taxa were more abundant than the benthic and epiphytic forms, which were very rare and sparsely distributed. The most dominant planktonic genera were Aulacoseira and Stephanodiscus, followed by frequent occurrences of Cyclostephanos and Cyclotella species. The stratigraphic distribution patterns of the recorded diatoms through the Holocene sediments revealed five ecological diatom groups. These groups represent distinctive environmental conditions, which were mainly related to climatic changes through the early and middle Holocene, in addition to anthropogenic activity during the late Holocene. Comparison of diatom assemblages in the studied sediment core suggests that considerable changes occurred in water level as well as salinity. There were several high stands of the freshwater lake level during humid, warmer-wet climatic phases, marked by dominance of planktonic, oligohalobous and alkaliphilous diatoms, alternating with lowering of the lake level and slight increases in salinity and alkalinity during warm arid conditions, evidenced by the prevalence of brackish water diatoms.

  19. Trends in historical mercury deposition inferred from lake sediment cores across a climate gradient in the Canadian High Arctic.

    Science.gov (United States)

    Korosi, Jennifer B; Griffiths, Katherine; Smol, John P; Blais, Jules M

    2018-06-02

    Recent climate change may be enhancing mercury fluxes to Arctic lake sediments, confounding the use of sediment cores to reconstruct histories of atmospheric deposition. Assessing the independent effects of climate warming on mercury sequestration is challenging due to temporal overlap between warming temperatures and increased long-range transport of atmospheric mercury following the Industrial Revolution. We address this challenge by examining mercury trends in short cores (the last several hundred years) from eight lakes centered on Cape Herschel (Canadian High Arctic) that span a gradient in microclimates, including two lakes that have not yet been significantly altered by climate warming due to continued ice cover. Previous research on subfossil diatoms and inferred primary production indicated the timing of limnological responses to climate warming, which, due to prevailing ice cover conditions, varied from ∼1850 to ∼1990 for lakes that have undergone changes. We show that climate warming may have enhanced mercury deposition to lake sediments in one lake (Moraine Pond), while another (West Lake) showed a strong signal of post-industrial mercury enrichment without any corresponding limnological changes associated with warming. Our results provide insights into the role of climate warming and organic carbon cycling as drivers of mercury deposition to Arctic lake sediments. Copyright © 2018 Elsevier Ltd. All rights reserved.

  20. Parallel plasma fluid turbulence calculations

    International Nuclear Information System (INIS)

    Leboeuf, J.N.; Carreras, B.A.; Charlton, L.A.; Drake, J.B.; Lynch, V.E.; Newman, D.E.; Sidikman, K.L.; Spong, D.A.

    1994-01-01

    The study of plasma turbulence and transport is a complex problem of critical importance for fusion-relevant plasmas. To this day, the fluid treatment of plasma dynamics is the best approach to realistic physics at the high resolution required for certain experimentally relevant calculations. Core and edge turbulence in a magnetic fusion device have been modeled using state-of-the-art, nonlinear, three-dimensional, initial-value fluid and gyrofluid codes. Parallel implementation of these models on diverse platforms--vector parallel (National Energy Research Supercomputer Center's CRAY Y-MP C90), massively parallel (Intel Paragon XP/S 35), and serial parallel (clusters of high-performance workstations using the Parallel Virtual Machine protocol)--offers a variety of paths to high resolution and significant improvements in real-time efficiency, each with its own advantages. The largest and most efficient calculations have been performed at the 200 Mword memory limit on the C90 in dedicated mode, where an overlap of 12 to 13 out of a maximum of 16 processors has been achieved with a gyrofluid model of core fluctuations. The richness of the physics captured by these calculations is commensurate with the increased resolution and efficiency and is limited only by the ingenuity brought to the analysis of the massive amounts of data generated.

  1. The Chew Bahir Drilling Project (HSPDP). Deciphering climate information from the Chew Bahir sediment cores: Towards a continuous half-million year climate record near the Omo - Turkana key palaeonanthropological Site

    Science.gov (United States)

    Foerster, Verena E.; Asrat, Asfawossen; Chapot, Melissa S.; Cohen, Andrew S.; Dean, Jonathan R.; Deino, Alan; Günter, Christina; Junginger, Annett; Lamb, Henry F.; Leng, Melanie J.; Roberts, Helen M.; Schaebitz, Frank; Trauth, Martin H.

    2017-04-01

    As a contribution towards an enhanced understanding of human-climate interactions, the Hominin Sites and Paleolakes Drilling Project (HSPDP) has successfully completed coring five dominantly lacustrine archives of climate change during the last 3.5 Ma in East Africa. All five sites in Ethiopia and Kenya are adjacent to key paleoanthropological research areas encompassing diverse milestones in human evolution, dispersal episodes, and technological innovation. The 280 m-long Chew Bahir sediment records, recovered from a tectonically-bound basin in the southern Ethiopian rift in late 2014, cover the past 550 ka of environmental history, a time period that includes the transition to the Middle Stone Age, and the origin and dispersal of modern Homo sapiens. Deciphering climate information from lake sediments is challenging, due to the complex relationship between climate parameters and sediment composition. We will present the first results in our efforts to develop a reliable climate-proxy tool box for Chew Bahir by deconvolving the relationship between sedimentological and geochemical sediment composition and strongly climate-controlled processes in the basin, such as incongruent weathering, transportation and authigenic mineral alteration. Combining our first results from the long cores with those from a pilot study of short cores taken in 2009/10 along a NW-SE transect of the basin, we have developed a hypothesis linking climate forcing and paleoenvironmental signal-formation processes in the basin. X-ray diffraction analysis of the first sample sets from the long Chew Bahir record reveals processes similar to those recognized in the uppermost 20 m during the project's pilot study: the diagenetic illitization of smectites during episodes of higher alkalinity and salinity in the closed-basin lake, induced by a drier climate. The precise time resolution, largely continuous record and (eventually) a detailed understanding of site specific proxy formation

  2. McCall Glacier record of Arctic climate change: Interpreting a northern Alaska ice core with regional water isotopes

    Science.gov (United States)

    Klein, E. S.; Nolan, M.; McConnell, J.; Sigl, M.; Cherry, J.; Young, J.; Welker, J. M.

    2016-01-01

    We explored modern precipitation and ice core isotope ratios to better understand both modern and paleo climate in the Arctic. Paleoclimate reconstructions require an understanding of how modern synoptic climate influences proxies used in those reconstructions, such as water isotopes. Therefore we measured periodic precipitation samples at Toolik Lake Field Station (Toolik) in the northern foothills of the Brooks Range in the Alaskan Arctic to determine δ18O and δ2H. We applied this multi-decadal local precipitation δ18O/temperature regression to ∼65 years of McCall Glacier (also in the Brooks Range) ice core isotope measurements and found an increase in reconstructed temperatures over the late-20th and early-21st centuries. We also show that the McCall Glacier δ18O isotope record is negatively correlated with the winter bidecadal North Pacific Index (NPI) climate oscillation. McCall Glacier deuterium excess (d-excess, δ2H - 8*δ18O) values display a bidecadal periodicity coherent with the NPI and suggest shifts from more southwestern Bering Sea moisture sources with less sea ice (lower d-excess values) to more northern Arctic Ocean moisture sources with more sea ice (higher d-excess values). Northern ice covered Arctic Ocean McCall Glacier moisture sources are associated with weak Aleutian Low (AL) circulation patterns and the southern moisture sources with strong AL patterns. Ice core d-excess values significantly decrease over the record, coincident with warmer temperatures and a significant reduction in Alaska sea ice concentration, which suggests that ice free northern ocean waters are increasingly serving as terrestrial precipitation moisture sources; a concept recently proposed by modeling studies and also present in Greenland ice core d-excess values during previous transitions to warm periods. This study also shows the efficacy and importance of using ice cores from Arctic valley glaciers in paleoclimate reconstructions.

  3. HPC parallel programming model for gyrokinetic MHD simulation

    International Nuclear Information System (INIS)

    Naitou, Hiroshi; Yamada, Yusuke; Tokuda, Shinji; Ishii, Yasutomo; Yagi, Masatoshi

    2011-01-01

    The 3-dimensional gyrokinetic PIC (particle-in-cell) code for MHD simulation, Gpic-MHD, was installed on SR16000 (“Plasma Simulator”), which is a scalar cluster system consisting of 8,192 logical cores. The Gpic-MHD code advances particle and field quantities in time. In order to distribute calculations over a large number of logical cores, the total simulation domain in cylindrical geometry was broken up into N_DD-r × N_DD-z (number of radial decompositions times number of axial decompositions) small domains including approximately the same number of particles. The axial direction was uniformly decomposed, while the radial direction was non-uniformly decomposed. N_RP replicas (copies) of each decomposed domain were used (“particle decomposition”). The hybrid parallelization model of multi-threads and multi-processes was employed: threads were parallelized by auto-parallelization, and N_DD-r × N_DD-z × N_RP processes were parallelized by MPI (message-passing interface). The parallelization performance of Gpic-MHD was investigated for the medium size system of N_r × N_θ × N_z = 1025 × 128 × 128 mesh with 4.196 or 8.192 billion particles. The highest speed for the fixed number of logical cores was obtained for two threads, the maximum number of N_DD-z, and the optimum combination of N_DD-r and N_RP. The observed optimum speeds demonstrated good scaling up to 8,192 logical cores. (author)
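
    For readers unfamiliar with the hybrid model mentioned above, the following is a minimal generic sketch of the multi-process/multi-thread pattern (MPI ranks for domain or replica decomposition, threads inside each rank), not Gpic-MHD itself; the particle arrays, the trivial "push" step and the reduced field moment are placeholders invented for the example.

        #include <mpi.h>
        #include <omp.h>
        #include <cstdio>
        #include <vector>

        // Illustrative hybrid MPI + OpenMP skeleton: each MPI rank owns one subdomain
        // (or replica) of particles and advances them with a thread-parallel loop;
        // replicated domains then combine their contributions with MPI_Allreduce.
        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank, nprocs;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

            std::vector<double> z(100000, 0.0), vz(100000, 1.0e-3);  // toy particle data

            // Thread-level parallelism inside the process (the "multi-thread" level).
            #pragma omp parallel for
            for (long long i = 0; i < static_cast<long long>(z.size()); ++i)
                z[i] += vz[i];                       // placeholder particle push

            // Process-level combination across domains/replicas (the "multi-process" level).
            double local_sum = 0.0, global_sum = 0.0;
            for (double v : z) local_sum += v;       // stand-in for a field moment
            MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

            if (rank == 0) std::printf("global moment: %g (from %d ranks)\n", global_sum, nprocs);
            MPI_Finalize();
            return 0;
        }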

  4. Roosevelt Island Climate Evolution Project (RICE): A 65 Kyr ice core record of black carbon aerosol deposition to the Ross Ice Shelf, West Antarctica.

    Science.gov (United States)

    Edwards, Ross; Bertler, Nancy; Tuohy, Andrea; Neff, Peter; Proemse, Bernedette; Feiteng, Wang; Goodwin, Ian; Hogan, Chad

    2015-04-01

    Emitted by fires, black carbon aerosols (rBC) perturb the atmosphere's physical and chemical properties and are climatically active. Sedimentary charcoal and other paleo-fire records suggest that rBC emissions have varied significantly in the past due to human activity and climate variability. However, few paleo rBC records exist to constrain reconstructions of the past rBC atmospheric distribution and its climate interaction. As part of the international Roosevelt Island Climate Evolution (RICE) project, we have developed an Antarctic rBC ice core record spanning the past ~65 Kyr. The RICE deep ice core was drilled from the Roosevelt Island ice dome in West Antarctica from 2011 to 2013. The high depth resolution (~1 cm) record was developed using a single particle intracavity laser-induced incandescence soot photometer (SP2) coupled to an ice core melter system. The rBC record displays sub-annual variability consistent with both austral dry-season and summer biomass burning. The record exhibits significant decadal to millennial-scale variability consistent with known changes in climate. Glacial rBC concentrations were much lower than Holocene concentrations, with the exception of several periods of abrupt increases in rBC. The transition from glacial to interglacial rBC concentrations occurred over a much longer time than for other ice core climate proxies such as water isotopes. The protracted increase in rBC during the transition may reflect Southern Hemisphere ecosystem/fire regime changes in response to hydroclimate and human activity.

  5. Professional Parallel Programming with C# Master Parallel Extensions with NET 4

    CERN Document Server

    Hillar, Gastón

    2010-01-01

    Expert guidance for those programming today's dual-core processor PCs. As PC processors expand from one or two to eight or more cores, there is an urgent need for programmers to master concurrent programming. This book dives deep into the latest technologies available to programmers for creating professional parallel applications using C#, .NET 4, and Visual Studio 2010. The book covers task-based programming, coordination data structures, PLINQ, thread pools, the asynchronous programming model, and more. It also teaches other parallel programming techniques, such as SIMD and vectorization.Teach

  6. Multiproxy records of Holocene climate and glacier variability from sediment cores in the Cordillera Vilcabamba of southern Peru

    Science.gov (United States)

    Schweinsberg, A. D.; Licciardi, J. M.; Rodbell, D. T.; Stansell, N.; Tapia, P. M.

    2012-12-01

    Sediments contained in glacier-fed lakes and bogs provide continuous high-resolution records of glacial activity, and preserve multiproxy evidence of Holocene climate change. Tropical glacier fluctuations offer critical insight into regional paleoclimatic trends and controls; however, continuous sediment records of past tropical climates are limited. Recent cosmogenic 10Be surface exposure ages of moraine sequences in the Cordillera Vilcabamba of southern Peru (13°20'S latitude) reveal a glacial culmination during the early Holocene and a less extensive glaciation coincident with the Little Ice Age of the Northern Hemisphere. Here we supplement the existing 10Be moraine chronology with the first continuous records of multiproxy climate data in this mountain range from sediment cores recovered from bogs in direct stratigraphic contact with 10Be-dated moraines. Radiocarbon-dated sedimentological changes in a 2-meter long bog core reveal that the Holocene is characterized by alternating inorganic and organic-rich laminae, suggesting high-frequency climatic variability. Carbon measurements, bulk density, and bulk sedimentation rates are used to derive a record of clastic sediment flux that serves as a proxy indicator of former glacier activity. Preliminary analyses of the bog core reveal approximately 70 diatom taxa that indicate both rheophilic and lentic environments. Initial results show a general decrease in magnetic susceptibility and clastic flux throughout the early to mid-Holocene, which suggests an interval of deglaciation. An episode of high clastic flux from 3.8 to 2.0 ka may reflect a late Holocene glacial readvance. Volcanic glass fragments and an anomalous peak in magnetic susceptibility may correspond to the historical 1600 AD eruption of Huaynaputina. Ten new bog and lake sediment cores were collected during the 2012 field expedition and analytical measurements are underway. Ongoing efforts are focused on analyzing diatom assemblage data, developing

  7. Marine sediment cores database for the Mediterranean Basin: a tool for past climatic and environmental studies

    Science.gov (United States)

    Alberico, I.; Giliberti, I.; Insinga, D. D.; Petrosino, P.; Vallefuoco, M.; Lirer, F.; Bonomo, S.; Cascella, A.; Anzalone, E.; Barra, R.; Marsella, E.; Ferraro, L.

    2017-06-01

    Paleoclimatic data are essential for fingerprinting the climate of the earth before the advent of modern recording instruments. They enable us to recognize past climatic events and predict future trends. Within this framework, a conceptual and logical model was drawn to physically implement a paleoclimatic database named WDB-Paleo that includes the paleoclimatic proxy data of marine sediment cores of the Mediterranean Basin. Twenty entities were defined to record four main categories of data: a) the features of oceanographic cruises and cores (metadata); b) the presence/absence of paleoclimatic proxies pulled from about 200 scientific papers; c) quantitative analyses of planktonic and benthonic foraminifera, pollen, calcareous nannoplankton, magnetic susceptibility, stable isotope and radionuclide values for about 14 cores recovered by the Institute for Coastal Marine Environment (IAMC) of the Italian National Research Council (CNR) in the framework of several past research projects; d) specific entities recording quantitative data on δ18O, AMS 14C (Accelerator Mass Spectrometry) and tephra layers available in scientific papers. Published data concerning paleoclimatic proxies in the Mediterranean Basin are recorded only for 400 out of 6000 cores retrieved in the area, and they show a very irregular geographical distribution. Moreover, the data availability decreases when a constrained time interval is investigated or more than one proxy is required. We present three applications of WDB-Paleo for the Younger Dryas (YD) paleoclimatic event at the Mediterranean scale and point out the potential of this tool for integrated stratigraphy studies.

  8. Accelerating the SCE-UA Global Optimization Method Based on Multi-Core CPU and Many-Core GPU

    Directory of Open Access Journals (Sweden)

    Guangyuan Kan

    2016-01-01

    Full Text Available The famous global optimization SCE-UA method, which has been widely used in the field of environmental model parameter calibration, is an effective and robust method. However, the SCE-UA method has a high computational load, which prohibits the application of SCE-UA to high-dimensional and complex problems. In recent years, computer hardware, such as multi-core CPUs and many-core GPUs, has improved significantly. This much more powerful new hardware and its software ecosystems provide an opportunity to accelerate the SCE-UA method. In this paper, we proposed two parallel SCE-UA methods and implemented them on an Intel multi-core CPU and an NVIDIA many-core GPU by OpenMP and CUDA Fortran, respectively. The Griewank benchmark function was adopted in this paper to test and compare the performances of the serial and parallel SCE-UA methods. Based on the results of the comparison, some practical advice is given on how to properly use the parallel SCE-UA methods.
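
    The Griewank benchmark used in the record above is a standard test function; a minimal sketch of how its evaluation over a population of candidate parameter vectors could be parallelized with OpenMP is shown below. The population size, dimensionality and candidate values are arbitrary, and this is not the authors' OpenMP or CUDA Fortran implementation.

        #include <omp.h>
        #include <cmath>
        #include <cstdio>
        #include <vector>

        // Griewank benchmark function (global minimum 0 at x = 0).
        double griewank(const std::vector<double>& x) {
            double sum = 0.0, prod = 1.0;
            for (std::size_t i = 0; i < x.size(); ++i) {
                sum  += x[i] * x[i] / 4000.0;
                prod *= std::cos(x[i] / std::sqrt(static_cast<double>(i + 1)));
            }
            return 1.0 + sum - prod;
        }

        int main() {
            const int pop = 1024, dim = 30;   // hypothetical population size and dimension
            std::vector<std::vector<double>> candidates(pop, std::vector<double>(dim, 1.0));
            std::vector<double> fitness(pop);

            // Population-level parallelism: each candidate parameter set is evaluated
            // independently, which is where a multi-core CPU (or a GPU kernel) pays off.
            #pragma omp parallel for schedule(static)
            for (int i = 0; i < pop; ++i)
                fitness[i] = griewank(candidates[i]);

            std::printf("f(first candidate) = %f\n", fitness[0]);
            return 0;
        }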

  9. A Parallel Saturation Algorithm on Shared Memory Architectures

    Science.gov (United States)

    Ezekiel, Jonathan; Siminiceanu

    2007-01-01

    Symbolic state-space generators are notoriously hard to parallelize. However, the Saturation algorithm implemented in the SMART verification tool differs from other sequential symbolic state-space generators in that it exploits the locality of firing events in asynchronous system models. This paper explores whether event locality can be utilized to efficiently parallelize Saturation on shared-memory architectures. Conceptually, we propose to parallelize the firing of events within a decision diagram node, which is technically realized via a thread pool. We discuss the challenges involved in our parallel design and conduct experimental studies on its prototypical implementation. On a dual-processor dual core PC, our studies show speed-ups for several example models, e.g., of up to 50% for a Kanban model, when compared to running our algorithm only on a single core.
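
    The abstract states that the parallel firing of events is realized via a thread pool; as a generic illustration of that building block (not the SMART implementation), a minimal C++ thread pool could look like the following sketch.

        #include <condition_variable>
        #include <functional>
        #include <mutex>
        #include <queue>
        #include <thread>
        #include <vector>

        // Minimal fixed-size thread pool: worker threads pull tasks (e.g. individual
        // event firings within a decision-diagram node) from a shared queue.
        class ThreadPool {
        public:
            explicit ThreadPool(unsigned n) {
                for (unsigned i = 0; i < n; ++i)
                    workers_.emplace_back([this] { run(); });
            }
            ~ThreadPool() {
                {
                    std::lock_guard<std::mutex> lock(m_);
                    done_ = true;
                }
                cv_.notify_all();
                for (auto& w : workers_) w.join();
            }
            void submit(std::function<void()> task) {
                {
                    std::lock_guard<std::mutex> lock(m_);
                    tasks_.push(std::move(task));
                }
                cv_.notify_one();
            }
        private:
            void run() {
                for (;;) {
                    std::function<void()> task;
                    {
                        std::unique_lock<std::mutex> lock(m_);
                        cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
                        if (done_ && tasks_.empty()) return;
                        task = std::move(tasks_.front());
                        tasks_.pop();
                    }
                    task();   // execute one unit of work (e.g. fire one event)
                }
            }
            std::vector<std::thread> workers_;
            std::queue<std::function<void()>> tasks_;
            std::mutex m_;
            std::condition_variable cv_;
            bool done_ = false;
        };

    Event-firing work items would then be submitted as tasks, e.g. pool.submit([&]{ fire_event(node, e); }); here fire_event, node and e are hypothetical names, and any shared decision-diagram state would need its own synchronization.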

  10. Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms

    KAUST Repository

    Hasanov, Khalid; Quintin, Jean-Noë l; Lastovetsky, Alexey

    2014-01-01

    -scale parallelism in mind. Indeed, while in the 1990s a system with a few hundred cores was considered a powerful supercomputer, modern top supercomputers have millions of cores. In this paper, we present a hierarchical approach to optimization of message-passing parallel

  11. Paleo-Climate and Glaciological Reconstruction in Central Asia through the Collection and Analysis of Ice Cores and Instrumental Data from the Tien Shan

    International Nuclear Information System (INIS)

    Vladimir Aizen; Donald Bren; Karl Kreutz; Cameron Wake

    2001-01-01

    While the majority of ice core investigations have been undertaken in the polar regions, a few ice cores recovered from carefully selected high altitude/mid-to-low latitude glaciers have also provided valuable records of climate variability in these regions. A regional array of high resolution, multi-parameter ice core records developed from temperate and tropical regions of the globe can be used to document regional climate and environmental change in the latitudes which are home to the vast majority of the Earth's human population. In addition, these records can be directly compared with ice core records available from the polar regions and can therefore expand our understanding of inter-hemispheric dynamics of past climate changes. The main objectives of our paleoclimate research in the Tien Shan mountains of middle Asia combine the development of detailed paleoenvironmental records via the physical and chemical analysis of ice cores with the analysis of modern meteorological and hydrological data. The first step in this research was the collection of ice cores from the accumulation zone of the Inylchek Glacier and the collection of meteorological data from a variety of stations throughout the Tien Shan. The research effort described in this report was part of a collaborative effort with the United States Geological Survey's (USGS) Global Environmental Research Program, which began studying radionuclide deposition in mid-latitude glaciers in 1995

  12. Paleo-Climate and Glaciological Reconstruction in Central Asia through the Collection and Analysis of Ice Cores and Instrumental Data from the Tien Shan

    Energy Technology Data Exchange (ETDEWEB)

    Vladimir Aizen; Donald Bren; Karl Kreutz; Cameron Wake

    2001-05-30

    While the majority of ice core investigations have been undertaken in the polar regions, a few ice cores recovered from carefully selected high altitude/mid-to-low latitude glaciers have also provided valuable records of climate variability in these regions. A regional array of high resolution, multi-parameter ice core records developed from temperate and tropical regions of the globe can be used to document regional climate and environmental change in the latitudes which are home to the vast majority of the Earth's human population. In addition, these records can be directly compared with ice core records available from the polar regions and can therefore expand our understanding of inter-hemispheric dynamics of past climate changes. The main objectives of our paleoclimate research in the Tien Shan mountains of middle Asia combine the development of detailed paleoenvironmental records via the physical and chemical analysis of ice cores with the analysis of modern meteorological and hydrological data. The first step in this research was the collection of ice cores from the accumulation zone of the Inylchek Glacier and the collection of meteorological data from a variety of stations throughout the Tien Shan. The research effort described in this report was part of a collaborative effort with the United States Geological Survey's (USGS) Global Environmental Research Program, which began studying radionuclide deposition in mid-latitude glaciers in 1995.

  13. Peeking Below the Snow Surface to Explore Amundsen Sea Climate Variability and Locate Optimal Ice-Core Sites

    Science.gov (United States)

    Neff, P. D.; Fudge, T. J.; Medley, B.

    2016-12-01

    Observations over recent decades reveal rapid changes in ice shelves and fast-flowing grounded ice along the Amundsen Sea coast of the West Antarctic Ice Sheet (WAIS). Long-term perspectives on this ongoing ice loss are needed to address a central question of Antarctic research: how much and how fast will Antarctic ice-loss raise sea level? Ice cores can provide insight into past variability of the atmospheric (wind) forcing of regional ocean dynamics affecting ice loss. Interannual variability of snow accumulation on coastal ice domes grounded near or within ice shelves reflects local to regional atmospheric circulation near the ice-ocean interface. Records of snow accumulation inferred from shallow ice cores strongly correlate with reanalysis precipitation and pressure fields, but ice cores have not yet been retrieved along the Amundsen Sea coast. High-frequency airborne radar data (NASA Operation IceBridge), however, have been collected over this region and we demonstrate that these data accurately reflect annual stratigraphy in shallow snow and firn (1 to 2 decades of accumulation). This further validates the agreement between radar snow accumulation records and climate reanalysis products. We then explore regional climate controls on local snow accumulation through comparison with gridded reanalysis products, providing a preview of what information longer coastal ice core records may provide with respect to past atmospheric forcing of ocean circulation and WAIS ice loss.

  14. Distributed Memory Parallel Computing with SEAWAT

    Science.gov (United States)

    Verkaik, J.; Huizer, S.; van Engelen, J.; Oude Essink, G.; Ram, R.; Vuik, K.

    2017-12-01

    Fresh groundwater reserves in coastal aquifers are threatened by sea-level rise, extreme weather conditions, increasing urbanization and associated groundwater extraction rates. To counteract these threats, accurate high-resolution numerical models are required to optimize the management of these precious reserves. The major model drawbacks are long run times and large memory requirements, limiting the predictive power of these models. Distributed memory parallel computing is an efficient technique for reducing run times and memory requirements, where the problem is divided over multiple processor cores. A new Parallel Krylov Solver (PKS) for SEAWAT is presented. PKS has recently been applied to MODFLOW and includes Conjugate Gradient (CG) and Biconjugate Gradient Stabilized (BiCGSTAB) linear accelerators. Both accelerators are preconditioned by an overlapping additive Schwarz preconditioner in such a way that: a) subdomains are partitioned using Recursive Coordinate Bisection (RCB) load balancing, b) each subdomain uses local memory only and communicates with other subdomains by Message Passing Interface (MPI) within the linear accelerator, and c) it is fully integrated in SEAWAT. Within SEAWAT, the PKS-CG solver replaces the Preconditioned Conjugate Gradient (PCG) solver for solving the variable-density groundwater flow equation and the PKS-BiCGSTAB solver replaces the Generalized Conjugate Gradient (GCG) solver for solving the advection-diffusion equation. PKS supports the third-order Total Variation Diminishing (TVD) scheme for computing advection. Benchmarks were performed on the Dutch national supercomputer (https://userinfo.surfsara.nl/systems/cartesius) using up to 128 cores, for a synthetic 3D Henry model (100 million cells) and the real-life Sand Engine model (~10 million cells). The Sand Engine model was used to investigate the potential effect of the long-term morphological evolution of a large sand replenishment and climate change on fresh groundwater resources
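
    As a generic illustration of the communication pattern a distributed-memory Krylov solver relies on (not the PKS source code), the fragment below computes a global dot product from subdomain-local data with a single MPI_Allreduce, the kind of operation that appears in every CG or BiCGSTAB iteration; the vector length and values are placeholders.

        #include <mpi.h>
        #include <cstdio>
        #include <vector>

        // Each subdomain computes a local dot product over the unknowns it owns,
        // and one MPI_Allreduce produces the global value every solver iteration.
        double global_dot(const std::vector<double>& a, const std::vector<double>& b) {
            double local = 0.0, global = 0.0;
            for (std::size_t i = 0; i < a.size(); ++i) local += a[i] * b[i];
            MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
            return global;
        }

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank; MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            std::vector<double> r(1000, 1.0);   // this rank's part of the residual vector
            double rr = global_dot(r, r);       // e.g. used for step lengths and convergence tests
            if (rank == 0) std::printf("||r||^2 = %g\n", rr);
            MPI_Finalize();
            return 0;
        }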

  15. Parallelism and Scalability in an Image Processing Application

    DEFF Research Database (Denmark)

    Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

    2008-01-01

    parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately......The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chip. This means that parallel processing is required in application areas that traditionally have not used...

  16. Parallelism and Scalability in an Image Processing Application

    DEFF Research Database (Denmark)

    Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

    2009-01-01

    parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately......The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chips. This means that parallel processing is required in application areas that traditionally have not used...

  17. PERFORMANCE EVALUATION OF OR1200 PROCESSOR WITH EVOLUTIONARY PARALLEL HPRC USING GEP

    Directory of Open Access Journals (Sweden)

    R. Maheswari

    2012-04-01

    Full Text Available In this fast computing era, most embedded systems require more computing power to complete complex functions/tasks in less time. One way to achieve this is by boosting up the processor performance, which allows the processor core to run faster. This paper presents a novel technique for increasing the performance by parallel HPRC (High Performance Reconfigurable Computing) in the CPU/DSP (Digital Signal Processor) unit of OR1200 (Open Reduced Instruction Set Computer (RISC) 1200) using Gene Expression Programming (GEP), an evolutionary programming model. OR1200 is a soft-core RISC processor, one of the Intellectual Property cores, that can efficiently run any modern operating system. In the manufacturing process of OR1200, a parallel HPRC is placed internally in the Integer Execution Pipeline unit of the CPU/DSP core to increase the performance. The GEP parallel HPRC is activated/deactivated by triggering the signals (i) HPRC_Gene_Start and (ii) HPRC_Gene_End. A Verilog HDL (Hardware Description Language) functional code for the Gene Expression Programming parallel HPRC is developed and synthesised using XILINX ISE in the former part of the work, and a CoreMark processor core benchmark is used to test the performance of the OR1200 soft core in the latter part of the work. The result of the implementation shows that the overall speed-up increased to 20.59% with the GEP-based parallel HPRC in the execution unit of OR1200.

  18. Parallel execution of chemical software on EGEE Grid

    CERN Document Server

    Sterzel, Mariusz

    2008-01-01

    Constant interest among the chemical community in studying larger and larger molecules forces the parallelization of existing computational methods in chemistry and the development of new ones. These are the main reasons for frequent port updates and requests from the community for Grid ports of new packages to satisfy their computational demands. Unfortunately, some parallelization schemes used by chemical packages cannot be directly used in the Grid environment. Here we present a solution for the Gaussian package. The current state of development of Grid middleware allows easy parallel execution in the case of software using any MPI flavour. Unfortunately, many chemical packages do not use MPI for parallelization; therefore special treatment is needed. Gaussian can be executed in parallel on SMP architectures or via Linda. These require reservation of a certain number of processors/cores on a given WN and an equal number of processors/cores on each WN, respectively. The current implementation of EGEE middleware does not offer such f...

  19. Examination of Speed Contribution of Parallelization for Several Fingerprint Pre-Processing Algorithms

    Directory of Open Access Journals (Sweden)

    GORGUNOGLU, S.

    2014-05-01

    Full Text Available In the analysis of minutiae-based fingerprint systems, fingerprints need to be pre-processed. The pre-processing is carried out to enhance the quality of the fingerprint and to obtain more accurate minutiae points. Reducing the pre-processing time is important for identification and verification in real-time systems, and especially for databases holding large amounts of fingerprint information. Parallel processing and parallel CPU computing can be considered as the distribution of processes over multi-core processors. This is done by using parallel programming techniques. Reducing the execution time is the main objective in parallel processing. In this study, the pre-processing of a minutiae-based fingerprint system is implemented by parallel processing on multi-core computers using OpenMP, and on a graphics processor using CUDA, to improve the execution time. The execution times and speedup ratios are compared with those of a single-core processor. The results show that by using parallel processing, execution time is substantially improved. The improvement ratios obtained for different pre-processing algorithms allowed us to make suggestions on the more suitable approaches for parallelization.
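
    As an illustration of the kind of pre-processing step being parallelized (not the authors' code), the sketch below applies the common mean/variance normalization used in fingerprint enhancement across all pixels with OpenMP; the image layout and the target statistics are assumptions made for the example.

        #include <omp.h>
        #include <cmath>
        #include <vector>

        // Illustrative sketch: normalize a grayscale fingerprint image to a desired
        // mean and variance, with both the statistics and the remapping done in parallel.
        void normalize(std::vector<float>& img, int width, int height,
                       float target_mean, float target_var) {
            const long long n = static_cast<long long>(width) * height;

            // Pass 1: global mean and variance via parallel reductions.
            double mean = 0.0, var = 0.0;
            #pragma omp parallel for reduction(+ : mean)
            for (long long i = 0; i < n; ++i) mean += img[i];
            mean /= n;
            #pragma omp parallel for reduction(+ : var)
            for (long long i = 0; i < n; ++i) var += (img[i] - mean) * (img[i] - mean);
            var /= n;
            if (var <= 0.0) return;   // flat image: nothing to normalize

            // Pass 2: map every pixel to the target statistics (embarrassingly parallel).
            #pragma omp parallel for
            for (long long i = 0; i < n; ++i) {
                const float dev = std::sqrt(target_var * (img[i] - mean) * (img[i] - mean) / var);
                img[i] = (img[i] > mean) ? target_mean + dev : target_mean - dev;
            }
        }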

  20. GPU based numerical simulation of core shooting process

    Directory of Open Access Journals (Sweden)

    Yi-zhong Zhang

    2017-11-01

    Full Text Available The core shooting process is the most widely used technique for making sand cores, and it plays an important role in the quality of sand cores. Although numerical simulation can hopefully optimize the core shooting process, research on numerical simulation of the core shooting process is very limited. Based on a two-fluid model (TFM) and a kinetic-friction constitutive correlation, a program for 3D numerical simulation of the core shooting process has been developed and achieved good agreement with in-situ experiments. To match the needs of engineering applications, a graphics processing unit (GPU) has also been used to improve the calculation efficiency. The parallel algorithm based on the Compute Unified Device Architecture (CUDA) platform can significantly decrease computing time through the multi-threaded GPU. In this work, the program accelerated by the CUDA parallelization method was developed, and the accuracy of the calculations was ensured by comparison with in-situ experimental results photographed by a high-speed camera. The design and optimization of the parallel algorithm were discussed. The simulation result of a sand core test-piece indicated the improvement of the calculation efficiency by the GPU. The developed program has also been validated by in-situ experiments with a transparent core-box, a high-speed camera, and a pressure measuring system. The computing time of the parallel program was reduced by nearly 95% while the simulation result was still quite consistent with experimental data. The GPU parallelization method successfully solves the problem of low computational efficiency of the 3D sand shooting simulation program, and thus the developed GPU program is appropriate for engineering applications.

  1. Spatial data analytics on heterogeneous multi- and many-core parallel architectures using python

    Science.gov (United States)

    Laura, Jason R.; Rey, Sergio J.

    2017-01-01

    Parallel vector spatial analysis concerns the application of parallel computational methods to facilitate vector-based spatial analysis. The history of parallel computation in spatial analysis is reviewed, and this work is placed into the broader context of high-performance computing (HPC) and parallelization research. The rise of cyber infrastructure and its manifestation in spatial analysis as CyberGIScience is seen as a main driver of renewed interest in parallel computation in the spatial sciences. Key problems in spatial analysis that have been the focus of parallel computing are covered. Chief among these are spatial optimization problems, computational geometric problems including polygonization and spatial contiguity detection, the use of Monte Carlo Markov chain simulation in spatial statistics, and parallel implementations of spatial econometric methods. Future directions for research on parallelization in computational spatial analysis are outlined.

  2. Unveiling exceptional Baltic bog ecohydrology, autogenic succession and climate change during the last 2000 years in CE Europe using replicate cores, multi-proxy data and functional traits of testate amoebae

    Science.gov (United States)

    Gałka, Mariusz; Tobolski, Kazimierz; Lamentowicz, Łukasz; Ersek, Vasile; Jassey, Vincent E. J.; van der Knaap, Willem O.; Lamentowicz, Mariusz

    2017-01-01

    We present the results of high-resolution, multi-proxy palaeoecological investigations of two parallel peat cores from the Baltic raised bog Mechacz Wielki in NE Poland. We aim to evaluate the role of regional climate and autogenic processes of the raised bog itself in driving the vegetation and hydrology dynamics. Based on partly synchronous changes in Sphagnum communities in the two study cores we suggest that extrinsic factors (climate) played an important role as a driver in mire development during the bog stage (500-2012 CE). Using a testate amoebae transfer function, we found exceptionally stable hydrological conditions during the last 2000 years with a relatively high water table and lack of local fire events that allowed for rapid peat accumulation (2.75 mm/year) in the bog. Further, the strong correlation between pH and community-weighted mean of testate amoeba traits suggests that other variables than water-table depth play a role in driving microbial properties under stable hydrological conditions. There is a difference in hydrological dynamics in bogs between NW and NE Poland until ca 1500 CE, after which the water table reconstructions show more similarities. Our results illustrate how various functional traits relate to different environmental variables in a range of trophic and hydrological scenarios on long time scales. Moreover, our data suggest a common regional climatic forcing in Mechacz Wielki, Gązwa and Kontolanrahka. Though it may still be too early to attempt a regional summary of wetness change in the southern Baltic region, this study is a next step to better understand the long-term peatland palaeohydrology in NE Europe.

  3. Millennial and sub-millennial scale climatic variations recorded in polar ice cores over the last glacial period

    DEFF Research Database (Denmark)

    Capron, E.; Landais, A.; Chappellaz, J.

    2010-01-01

    Since its discovery in Greenland ice cores, the millennial scale climatic variability of the last glacial period has been increasingly documented at all latitudes, with studies focusing mainly on Marine Isotopic Stage 3 (MIS 3; 28–60 thousand years before present, hereafter ka) and characterized...... that when ice sheets are extensive, Antarctica does not necessarily warm during the whole GS as the thermal bipolar seesaw model would predict, questioning the Greenland ice core temperature records as a proxy for AMOC changes throughout the glacial period.

  4. Ice cores and palaeoclimate

    International Nuclear Information System (INIS)

    Krogh Andersen, K.; Ditlevsen, P.; Steffensen, J.P.

    2001-01-01

    Ice cores from Greenland give testimony of a highly variable climate during the last glacial period. Dramatic climate warmings of 15 to 25 deg. C for the annual average temperature in less than a human lifetime have been documented. Several questions arise: Why is the Holocene so stable? Is climatic instability only a property of glacial periods? What is the mechanism behind the sudden climate changes? Are the increased temperatures in the past century man-made? And what happens in the future? The ice core community tries to attack some of these problems. The NGRIP ice core currently being drilled is analysed in very high detail, allowing for a very precise dating of climate events. It will be possible to study some of the fast changes on a year by year basis and from this we expect to find clues to the sequence of events during rapid changes. New techniques are hoped to allow for detection of annual layers as far back as 100,000 years and thus a much improved time scale over past climate changes. It is also hoped to find ice from the Eemian period. If the Eemian layers confirm the GRIP sequence, the Eemian was actually climatically unstable, just like the glacial period. This would mean that the stability of the Holocene is unique. It would also mean that, if human-made global warming indeed occurs, we could jeopardize the Holocene stability and create an unstable 'Eemian situation' which ultimately could start an ice age. Currently mankind is changing the composition of the atmosphere. Ice cores document significant increases in greenhouse gases, and due to increased emissions of sulfuric and nitric acid from fossil fuel burning, combustion engines and agriculture, modern Greenland snow is 3 - 5 times more acidic than pre-industrial snow (Mayewski et al., 1986). However, the magnitude and abruptness of the temperature changes of the past century do not exceed the magnitude of natural variability. From the ice core perspective it is thus not possible to attribute the

  5. Exact parallel maximum clique algorithm for general and protein graphs.

    Science.gov (United States)

    Depolli, Matjaž; Konc, Janez; Rozman, Kati; Trobec, Roman; Janežič, Dušanka

    2013-09-23

    A new exact parallel maximum clique algorithm MaxCliquePara, which finds the maximum clique (the fully connected subgraph) in undirected general and protein graphs, is presented. First, a new branch-and-bound algorithm for finding a maximum clique on a single computer core, which builds on ideas presented in two published state-of-the-art sequential algorithms, is implemented. The new sequential MaxCliqueSeq algorithm is faster than the reference algorithms on both DIMACS benchmark graphs as well as on protein-derived product graphs used for protein structural comparisons. Next, the MaxCliqueSeq algorithm is parallelized by splitting the branch-and-bound search tree across multiple cores, resulting in the MaxCliquePara algorithm. The ability to exploit all cores efficiently makes the new parallel MaxCliquePara algorithm markedly superior to other tested algorithms. On a 12-core computer, the parallelization provides up to 2 orders of magnitude faster execution on the large DIMACS benchmark graphs and up to an order of magnitude faster execution on protein product graphs. The algorithms are freely accessible on http://commsys.ijs.si/~matjaz/maxclique.
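
    For orientation, a generic branch-and-bound maximum-clique search parallelized over its top-level branches with OpenMP tasks is sketched below; it is a simplified stand-in, not MaxCliquePara, restricted to graphs of at most 64 vertices and using GCC/Clang bit-count builtins and a deliberately weak bound for brevity.

        #include <omp.h>
        #include <atomic>
        #include <cstdint>
        #include <cstdio>
        #include <vector>

        using Mask = std::uint64_t;                 // adjacency row as a bitmask
        static std::atomic<int> best_size{0};       // shared incumbent clique size

        static void record(int size) {
            int prev = best_size.load();
            while (size > prev && !best_size.compare_exchange_weak(prev, size)) {}
        }

        // Grow the current clique of size `cur` using the candidate set `cand`.
        static void expand(const std::vector<Mask>& adj, Mask cand, int cur) {
            record(cur);                                 // current clique has size `cur`
            while (cand) {
                // Bound: even taking every remaining candidate cannot beat the incumbent.
                if (cur + __builtin_popcountll(cand) <= best_size.load()) return;
                const int v = __builtin_ctzll(cand);     // lowest-numbered candidate vertex
                cand &= cand - 1;                        // remove v from the candidate set
                expand(adj, cand & adj[v], cur + 1);     // branch: v joins the clique
            }
        }

        int max_clique(const std::vector<Mask>& adj) {
            const int n = static_cast<int>(adj.size());
            best_size = 0;
            #pragma omp parallel
            #pragma omp single
            for (int v = 0; v < n; ++v) {
                // One task per top-level branch: cliques whose lowest vertex is v.
                Mask later = (v + 1 < 64) ? (~Mask(0) << (v + 1)) : Mask(0);
                Mask cand = adj[v] & later;
                #pragma omp task firstprivate(cand)
                expand(adj, cand, 1);
            }
            return best_size.load();
        }

        int main() {
            // Toy 5-vertex graph: a triangle {0,1,2} plus the edge {3,4}; answer is 3.
            std::vector<Mask> adj(5, 0);
            auto add_edge = [&](int a, int b) { adj[a] |= Mask(1) << b; adj[b] |= Mask(1) << a; };
            add_edge(0, 1); add_edge(0, 2); add_edge(1, 2); add_edge(3, 4);
            std::printf("maximum clique size: %d\n", max_clique(adj));
            return 0;
        }

    Real solvers such as the one described above additionally use greedy coloring bounds, vertex ordering and careful load balancing across cores, none of which is shown in this sketch.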

  6. Shared Variable Oriented Parallel Precompiler for SPMD Model

    Institute of Scientific and Technical Information of China (English)

    1995-01-01

    For the moment, commercial parallel computer systems with distributed memory architecture are usually provided with parallel FORTRAN or parallel C compilers, which are just traditional sequential FORTRAN or C compilers expanded with communication statements. Programmers suffer from writing parallel programs with communication statements. The Shared Variable Oriented Parallel Precompiler (SVOPP) proposed in this paper can automatically generate appropriate communication statements based on shared variables for the SPMD (Single Program Multiple Data) computation model and greatly ease parallel programming with high communication efficiency. The core function of the parallel C precompiler has been successfully verified on a transputer-based parallel computer. Its prominent performance shows that SVOPP is probably a break-through in parallel programming technique.

  7. Coring of Karakel’ Lake sediments (Teberda River valley and prospects for reconstruction of glaciation and Holocene climate history in the Caucasus

    Directory of Open Access Journals (Sweden)

    O. N. Solomina

    2013-01-01

    Full Text Available Lacustrine sediments represent an important data source for glacial and palaeoclimatic reconstructions. Having a number of certain advantages, they can be successfully used as a means of specifying the glacier situation and the age of moraine deposits, as well as a basis for detailed climatic models of the Holocene. The article focuses on the coring of the sediments of Lake Karakel' (Western Caucasus), with the goal of clarifying the Holocene climatic history of the region; it provides the sampling methods, a lithologic description of the sediment core, the radiocarbon dates obtained and the element composition of the sediments. A preliminary overview of the results of coring the sediments of Lake Karakel' has prompted a reconsideration of the conventional view of glacial fluctuations in the Teberda valley and suggests the future possibility of high-resolution palaeoclimatic reconstruction for the Western Caucasus.

  8. Introducing 'bones': a parallelizing source-to-source compiler based on algorithmic skeletons.

    NARCIS (Netherlands)

    Nugteren, C.; Corporaal, H.

    2012-01-01

    Recent advances in multi-core and many-core processors require programmers to exploit an increasing amount of parallelism from their applications. Data-parallel languages such as CUDA and OpenCL make it possible to take advantage of such processors, but still require a large amount of effort from

  9. Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs

    Science.gov (United States)

    Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava; Lantz, Steven; Lefebvre, Matthieu; Masciovecchio, Mario; McDermott, Kevin; Riley, Daniel; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

    2017-08-01

    For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPUs), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offline. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progress toward understanding these processors and the new developments to port the Kalman filter to NVIDIA GPUs.
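
    The heart of such speedups is laying many track candidates out in structure-of-arrays form so that SIMD units work across tracks. The deliberately over-simplified, one-dimensional sketch below (not the actual track-fitting code) only shows the shape of such a vectorized Kalman update; the state, covariance, and measurement layout are assumptions made for illustration.

```cpp
// Much-simplified illustration of a SIMD-friendly Kalman update across many tracks:
// one scalar state per track, stored in structure-of-arrays form so that the update
// loop vectorizes across tracks (a real track fit updates small matrices per track).
#include <cstddef>
#include <vector>

struct TrackSoA {
    std::vector<float> x;   // state estimate, one entry per track
    std::vector<float> P;   // state variance, one entry per track
};

void kalman_update(TrackSoA& t, const std::vector<float>& m, float R) {
    const std::size_t n = t.x.size();
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        float K = t.P[i] / (t.P[i] + R);    // Kalman gain
        t.x[i] += K * (m[i] - t.x[i]);      // state update with the measurement residual
        t.P[i] *= (1.0f - K);               // covariance update
    }
}
```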

  10. Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs

    Directory of Open Access Journals (Sweden)

    Cerati Giuseppe

    2017-01-01

    For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPUs), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offline. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progress toward understanding these processors and the new developments to port the Kalman filter to NVIDIA GPUs.

  11. Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs

    Energy Technology Data Exchange (ETDEWEB)

    Cerati, Giuseppe [Fermilab; Elmer, Peter [Princeton U.; Krutelyov, Slava [UC, San Diego; Lantz, Steven [Cornell U.; Lefebvre, Matthieu [Princeton U.; Masciovecchio, Mario [UC, San Diego; McDermott, Kevin [Cornell U.; Riley, Daniel [Cornell U., LNS; Tadel, Matevž [UC, San Diego; Wittich, Peter [Cornell U.; Würthwein, Frank [UC, San Diego; Yagil, Avi [UC, San Diego

    2017-01-01

    For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPUs), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offline. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progress toward understanding these processors and the new developments to port the Kalman filter to NVIDIA GPUs.

  12. An Automatic Instruction-Level Parallelization of Machine Code

    Directory of Open Access Journals (Sweden)

    MARINKOVIC, V.

    2018-02-01

    Prevailing multicore and emerging manycore processors pose a major modern-day challenge: parallelizing embedded software that is still written as sequential code. In this paper, automatic code parallelization is considered, focusing on developing a parallelization tool at the binary level and on validating this approach. A novel instruction-level parallelization algorithm for assembly code is developed; it uses the register names after SSA conversion to find independent blocks of code and then schedules those blocks with METIS to achieve good load balance. Sequential consistency is verified, and validation is done by measuring program execution time on the target architecture. Large speedups, taken as the performance measure in the validation process, and optimal load balancing are achieved for multicore RISC processors with 2 to 16 cores (e.g. MIPS, MicroBlaze, etc.). In particular, for 16 cores the average speedup is 7.92x, and in some cases it reaches 14x. The approach to automatic parallelization provided by this paper is useful to researchers and developers in the area of parallelization as a basis for further optimizations, as the back end of a compiler, or as a code parallelization tool for an embedded system.

  13. The STAPL Parallel Graph Library

    KAUST Repository

    Harshvardhan,

    2013-01-01

    This paper describes the stapl Parallel Graph Library, a high-level framework that abstracts the user from data-distribution and parallelism details and allows them to concentrate on parallel graph algorithm development. It includes a customizable distributed graph container and a collection of commonly used parallel graph algorithms. The library introduces pGraph pViews that separate algorithm design from the container implementation. It supports three graph processing algorithmic paradigms, level-synchronous, asynchronous and coarse-grained, and provides common graph algorithms based on them. Experimental results demonstrate improved scalability in performance and data size over existing graph libraries on more than 16,000 cores and on internet-scale graphs containing over 16 billion vertices and 250 billion edges. © Springer-Verlag Berlin Heidelberg 2013.
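
    stapl's own pGraph/pView API is not reproduced here; purely as an illustration of the level-synchronous paradigm the library supports, the following shared-memory sketch processes one BFS frontier per superstep (a distributed version would instead exchange remote frontier vertices between graph partitions at the end of each level).

```cpp
// Rough shared-memory sketch of the level-synchronous paradigm: one BFS level per
// superstep, with each thread expanding part of the current frontier.
#include <vector>

struct CSRGraph {
    std::vector<int> row_ptr;   // size n+1
    std::vector<int> col_idx;   // concatenated neighbor lists
};

std::vector<int> bfs_levels(const CSRGraph& g, int source) {
    const int n = (int)g.row_ptr.size() - 1;
    std::vector<int> level(n, -1);
    std::vector<int> frontier{source};
    level[source] = 0;

    for (int depth = 0; !frontier.empty(); ++depth) {
        std::vector<int> next;
        #pragma omp parallel
        {
            std::vector<int> local;                      // per-thread output buffer
            #pragma omp for nowait
            for (int i = 0; i < (int)frontier.size(); ++i) {
                int u = frontier[i];
                for (int e = g.row_ptr[u]; e < g.row_ptr[u + 1]; ++e) {
                    int v = g.col_idx[e];
                    // Benign race: two threads may claim v in the same superstep,
                    // but both write the same depth; v may then appear twice in the
                    // next frontier, which only costs redundant work.
                    if (level[v] == -1) {
                        level[v] = depth + 1;
                        local.push_back(v);
                    }
                }
            }
            #pragma omp critical
            next.insert(next.end(), local.begin(), local.end());
        }
        frontier.swap(next);    // barrier between levels: the synchronous superstep
    }
    return level;
}
```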

  14. PARALLEL IMPLEMENTATION OF MORPHOLOGICAL PROFILE BASED SPECTRAL-SPATIAL CLASSIFICATION SCHEME FOR HYPERSPECTRAL IMAGERY

    Directory of Open Access Journals (Sweden)

    B. Kumar

    2016-06-01

    The extended morphological profile (EMP) is a good technique for extracting spectral-spatial information from images, but the large size of hyperspectral images is an important concern when creating EMPs. With the availability of modern multi-core processors and commodity parallel processing systems such as graphics processing units (GPUs) at the desktop level, parallel computing provides a viable option to significantly accelerate such computations. In this paper, a parallel implementation of an EMP-based spectral-spatial classification method for hyperspectral imagery is presented. The parallel implementation is done both on a multi-core CPU and on a GPU, and the impact of parallelization on speedup and classification accuracy is analyzed. For the GPU, the implementation is written in compute unified device architecture (CUDA) C. The experiments are carried out on two well-known hyperspectral images. The experimental results show that the GPU implementation provides a speedup of about 7 times, while the parallel implementation on the multi-core CPU results in a speedup of about 3 times. It is also observed that parallelization has no adverse impact on classification accuracy.
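
    As a hedged illustration of the data-parallel pattern involved (not the authors' implementation), the sketch below parallelizes one building block of a morphological profile, a windowed grayscale erosion over a single image band, across output pixels with OpenMP; a full EMP stacks openings and closings at several scales, and a CUDA port would map the same per-pixel loop onto one GPU thread per pixel.

```cpp
// One building block of a morphological profile: grayscale erosion of a band with a
// (2r+1)x(2r+1) window, parallelized over output pixels. Border handling clamps to
// the image edge; the band is stored row-major.
#include <algorithm>
#include <vector>

std::vector<float> erode(const std::vector<float>& band, int width, int height, int r) {
    std::vector<float> out(band.size());
    #pragma omp parallel for collapse(2) schedule(static)
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float v = band[y * width + x];
            for (int dy = -r; dy <= r; ++dy) {
                for (int dx = -r; dx <= r; ++dx) {
                    int yy = std::clamp(y + dy, 0, height - 1);
                    int xx = std::clamp(x + dx, 0, width - 1);
                    v = std::min(v, band[yy * width + xx]);   // local minimum = erosion
                }
            }
            out[y * width + x] = v;
        }
    }
    return out;
}
```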

  15. A 21 000-year record of fluorescent organic matter markers in the WAIS Divide ice core

    Science.gov (United States)

    D'Andrilli, Juliana; Foreman, Christine M.; Sigl, Michael; Priscu, John C.; McConnell, Joseph R.

    2017-05-01

    Englacial ice contains a significant reservoir of organic material (OM), preserving a chronological record of materials from Earth's past. Here, we investigate whether OM composition surveys in ice core research can provide paleoecological information on the dynamic nature of our Earth through time. Temporal trends in OM composition from the early Holocene extending back to the Last Glacial Maximum (LGM) of the West Antarctic Ice Sheet Divide (WD) ice core were measured by fluorescence spectroscopy. Multivariate parallel factor (PARAFAC) analysis is widely used to isolate the chemical components that best describe the observed variation across three-dimensional fluorescence spectroscopy (excitation-emission matrices; EEMs) assays. Fluorescent OM markers identified by PARAFAC modeling of the EEMs from the LGM (27.0-18.0 kyr BP; before present, 1950) through the last deglaciation (LD; 18.0-11.5 kyr BP) to the mid-Holocene (11.5-6.0 kyr BP) provided evidence of different types of fluorescent OM composition and origin in the WD ice core over 21.0 kyr. The low excitation-emission wavelength fluorescent PARAFAC component one (C1), associated with chemical species similar to simple lignin phenols, was the greatest contributor throughout the ice core, suggesting a strong signature of terrestrial OM in all climate periods. The component two (C2) OM marker encompassed distinct variability in the ice core, describing chemical species similar to tannin- and phenylalanine-like material. Component three (C3), associated with humic-like terrestrial material further resistant to biodegradation, was characteristic only of the Holocene, suggesting that more complex organic polymers such as lignins or tannins may be an ecological marker of warmer climates. We suggest that the fluorescent OM markers observed during the LGM were the result of greater continental dust loading of lignin precursor (monolignol) material in a drier climate, with lower marine influences when sea ice extent was higher and

  16. Parallelization characteristics of the DeCART code

    International Nuclear Information System (INIS)

    Cho, J. Y.; Joo, H. G.; Kim, H. Y.; Lee, C. C.; Chang, M. H.; Zee, S. Q.

    2003-12-01

    This report describes the parallelization characteristics of the DeCART code and examines its parallel performance. Parallel computing algorithms are implemented in DeCART to reduce the tremendous computational burden and memory requirement involved in three-dimensional whole-core transport calculations. In the parallelization of the DeCART code, the axial domain decomposition is first realized by using MPI (Message Passing Interface), and then the azimuthal angle domain decomposition by using either MPI or OpenMP. When MPI is used for both the axial and the angle domain decomposition, the concept of MPI grouping is employed for convenient communication within each communicator. Most of the computing modules, with the exception of the thermal-hydraulic module, are parallelized; these include the MOC ray tracing, CMFD, NEM, region-wise cross-section preparation, and cell homogenization modules. For the distributed allocation, most of the MOC and CMFD/NEM variables are allocated only for the assigned planes, which reduces the required memory by the ratio of the number of assigned planes to the total number of planes. The parallel performance of the DeCART code is evaluated by solving two problems, a rodded variation of the C5G7 MOX three-dimensional benchmark problem and a simplified three-dimensional SMART PWR core problem. In terms of parallel performance, the DeCART code shows good speedups of about 40.1 and 22.4 in the ray tracing module and about 37.3 and 20.2 in the total computing time when using 48 CPUs on the IBM Regatta and 24 CPUs on the LINUX cluster, respectively. In the comparison between MPI and OpenMP, OpenMP shows somewhat better performance than MPI. Therefore, it is concluded that the first priority in the parallel computation of the DeCART code is the axial domain decomposition using MPI, then the angular domain using OpenMP, and finally the angular
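
    The decomposition described above can be pictured with the following skeletal hybrid sketch (not DeCART source code): axial planes are distributed over MPI ranks, and the azimuthal-angle loop within each assigned plane is threaded with OpenMP. The sweep kernel and the final reduction are placeholders standing in for the real ray tracing and inter-plane coupling.

```cpp
// Skeletal hybrid MPI+OpenMP decomposition in the spirit described above:
// planes -> MPI ranks, angles within a plane -> OpenMP threads.
#include <mpi.h>
#include <cstdio>

// Hypothetical stand-in for a per-plane, per-angle MOC sweep.
double sweep_angle(int plane, int angle) {
    double s = 0.0;
    for (int k = 0; k < 1000; ++k) s += 1e-6 * (plane + 1) * (angle + 1);
    return s;
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, nranks = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const int n_planes = 24, n_angles = 16;
    double local_tally = 0.0;

    // Block-cyclic distribution of axial planes over MPI ranks.
    for (int plane = rank; plane < n_planes; plane += nranks) {
        #pragma omp parallel for reduction(+ : local_tally) schedule(dynamic)
        for (int angle = 0; angle < n_angles; ++angle)
            local_tally += sweep_angle(plane, angle);
    }

    double global_tally = 0.0;
    MPI_Allreduce(&local_tally, &global_tally, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0) std::printf("global tally = %f\n", global_tally);

    MPI_Finalize();
    return 0;
}
```

    Threading the inner angle loop (rather than adding a second level of MPI) mirrors the report's conclusion that OpenMP is the more convenient choice for the angular decomposition once the planes are already split with MPI.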

  17. Past climate changes derived from isotope measurements in polar ice cores

    International Nuclear Information System (INIS)

    Beer, J.; Muscheler, R.; Wagner, G.; Kubik, P.K.

    2002-01-01

    Measurements of stable and radioactive isotopes in polar ice cores provide a wealth of information on the climate conditions of the past. Stable isotopes (δ18O, δD) reflect mainly the temperature, whereas the δ18O of oxygen in air bubbles reveals predominantly the global ice volume and biospheric activity. Cosmic-ray-produced radioisotopes (cosmogenic nuclides) such as 10Be and 36Cl record information on solar variability and possibly also on solar irradiance. If the flux of a cosmogenic nuclide into the ice is known, the accumulation rate can be derived from the measured concentration. The comparison of 10Be from ice with 14C from tree rings allows one to decide whether observed 14C variations are caused by production or system effects. Finally, isotope measurements are very useful for establishing and improving time scales. The 10Be/36Cl ratio changes with an apparent half-life of 376,000 years and is therefore well suited to dating old ice. Significant abrupt changes in the records of 10Be and 36Cl from ice and of δ18O from atmospheric oxygen, representing global signals, can be used to synchronize ice and sediment cores. (author)
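
    As a worked illustration of the dating principle mentioned above (assuming the initial 10Be/36Cl production ratio R_0 is known and effectively constant), the age of the ice follows from elementary decay algebra for the two independently decaying nuclides:

```latex
% Two independently decaying nuclides with decay constants \lambda_{10} and \lambda_{36}:
N_{10}(t) = N_{10}(0)\,e^{-\lambda_{10} t}, \qquad
N_{36}(t) = N_{36}(0)\,e^{-\lambda_{36} t}
% so their ratio evolves exponentially with a single "apparent" rate,
\frac{N_{10}(t)}{N_{36}(t)} \equiv R(t) = R_0\, e^{(\lambda_{36}-\lambda_{10})\,t},
\qquad
t = \frac{1}{\lambda_{36}-\lambda_{10}}\,\ln\!\left(\frac{R(t)}{R_0}\right),
\qquad
T_{\mathrm{app}} = \frac{\ln 2}{\lambda_{36}-\lambda_{10}} \approx 376~\mathrm{kyr}.
```

    Because 36Cl is the shorter-lived of the two nuclides, the ratio grows with time, and measuring R(t) in old ice against the assumed initial ratio yields the age directly.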

  18. Large-Scale, Parallel, Multi-Sensor Atmospheric Data Fusion Using Cloud Computing

    Science.gov (United States)

    Wilson, B. D.; Manipon, G.; Hua, H.; Fetzer, E. J.

    2013-12-01

    NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the 'A-Train' platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over decades. Moving to multi-sensor, long-duration analyses of important climate variables presents serious challenges for large-scale data mining and fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another (MODIS), and to a model (MERRA), stratify the comparisons using a classification of the 'cloud scenes' from CloudSat, and repeat the entire analysis over 10 years of data. To efficiently assemble such datasets, we are utilizing Elastic Computing in the Cloud and parallel map/reduce-based algorithms. However, these problems involve data-intensive computing, so the data transfer times and storage costs (for caching) are key issues. SciReduce is a Hadoop-like parallel analysis system, programmed in parallel Python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in the Cloud. Unlike Hadoop, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Figure 1 shows the architecture of the full computational system, with SciReduce at the core. Multi-year datasets are automatically 'sharded' by time and space across a cluster of nodes so that years of data (millions of files) can be processed in a massively parallel way. Input variables (arrays) are pulled on-demand into the Cloud using OPeNDAP URLs or other subsetting services, thereby minimizing the size of the cached input and intermediate datasets. We are using SciReduce to automate the production of multiple versions of a ten-year A-Train water vapor climatology under a NASA MEASURES grant. We will

  19. The Crystal Structures of the N-terminal Photosensory Core Module of Agrobacterium Phytochrome Agp1 as Parallel and Anti-parallel Dimers*

    Science.gov (United States)

    Nagano, Soshichiro; Scheerer, Patrick; Zubow, Kristina; Michael, Norbert; Inomata, Katsuhiko; Lamparter, Tilman; Krauß, Norbert

    2016-01-01

    Agp1 is a canonical biliverdin-binding bacteriophytochrome from the soil bacterium Agrobacterium fabrum that acts as a light-regulated histidine kinase. Crystal structures of the photosensory core modules (PCMs) of homologous phytochromes have provided a consistent picture of the structural changes that these proteins undergo during photoconversion between the parent red light-absorbing state (Pr) and the far-red light-absorbing state (Pfr). These changes include secondary structure rearrangements in the so-called tongue of the phytochrome-specific (PHY) domain and structural rearrangements within the long α-helix that connects the cGMP-specific phosphodiesterase, adenylyl cyclase, and FhlA (GAF) and the PHY domains. We present the crystal structures of the PCM of Agp1 at 2.70 Å resolution and of a surface-engineered mutant of this PCM at 1.85 Å resolution in the dark-adapted Pr states. Whereas in the mutant structure the dimer subunits are in anti-parallel orientation, the wild-type structure contains parallel subunits. The relative orientations between the PAS-GAF bidomain and the PHY domain are different in the two structures, due to movement involving two hinge regions in the GAF-PHY connecting α-helix and the tongue, indicating pronounced structural flexibility that may give rise to a dynamic Pr state. The resolution of the mutant structure enabled us to detect a sterically strained conformation of the chromophore at ring A that we attribute to the tight interaction with Pro-461 of the conserved PRXSF motif in the tongue. Based on this observation and on data from mutants where residues in the tongue region were replaced by alanine, we discuss the crucial roles of those residues in Pr-to-Pfr photoconversion. PMID:27466363

  20. The Crystal Structures of the N-terminal Photosensory Core Module of Agrobacterium Phytochrome Agp1 as Parallel and Anti-parallel Dimers.

    Science.gov (United States)

    Nagano, Soshichiro; Scheerer, Patrick; Zubow, Kristina; Michael, Norbert; Inomata, Katsuhiko; Lamparter, Tilman; Krauß, Norbert

    2016-09-23

    Agp1 is a canonical biliverdin-binding bacteriophytochrome from the soil bacterium Agrobacterium fabrum that acts as a light-regulated histidine kinase. Crystal structures of the photosensory core modules (PCMs) of homologous phytochromes have provided a consistent picture of the structural changes that these proteins undergo during photoconversion between the parent red light-absorbing state (Pr) and the far-red light-absorbing state (Pfr). These changes include secondary structure rearrangements in the so-called tongue of the phytochrome-specific (PHY) domain and structural rearrangements within the long α-helix that connects the cGMP-specific phosphodiesterase, adenylyl cyclase, and FhlA (GAF) and the PHY domains. We present the crystal structures of the PCM of Agp1 at 2.70 Å resolution and of a surface-engineered mutant of this PCM at 1.85 Å resolution in the dark-adapted Pr states. Whereas in the mutant structure the dimer subunits are in anti-parallel orientation, the wild-type structure contains parallel subunits. The relative orientations between the PAS-GAF bidomain and the PHY domain are different in the two structures, due to movement involving two hinge regions in the GAF-PHY connecting α-helix and the tongue, indicating pronounced structural flexibility that may give rise to a dynamic Pr state. The resolution of the mutant structure enabled us to detect a sterically strained conformation of the chromophore at ring A that we attribute to the tight interaction with Pro-461 of the conserved PRXSF motif in the tongue. Based on this observation and on data from mutants where residues in the tongue region were replaced by alanine, we discuss the crucial roles of those residues in Pr-to-Pfr photoconversion. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.

  1. Separated core turbofan engine; Core bunrigata turbofan engine

    Energy Technology Data Exchange (ETDEWEB)

    Saito, Y; Endo, M; Matsuda, Y; Sugiyama, N; Sugahara, N; Yamamoto, K [National Aerospace Laboratory, Tokyo (Japan)

    1996-04-01

    This report outlines the separated-core turbofan engine, which is characterized by a parallel, separated arrangement of the fan and the core engine, components that are integrated into one unit in a conventional turbofan. In general, improved cruising efficiency and reduced noise are achieved through a low fan pressure ratio and low exhaust speed resulting from a high bypass ratio; however, this causes various problems, such as large fan and nacelle weight due to the fan's large air flow rate, and a shift of the operating point with flight speed. The parallel separated arrangement is adopted to address these issues. Stable operation of the fan and core engine is easily maintained because the independently operating air inlet is unaffected by the fan, and an independent combustor provides a large degree of freedom in combustion control. Fast response, simple structure, and optimal aerodynamic design are readily achieved. The arrangement also offers flexibility in development, easy maintenance, and various other advantages over conventional turbofan engines. It presents no technological problems that would be difficult to overcome and is also suitable for high-speed VTOL transport aircraft. 4 refs., 5 figs.

  2. Fast parallel event reconstruction

    CERN Multimedia

    CERN. Geneva

    2010-01-01

    On-line processing of the large data volumes produced in modern HEP experiments requires using the maximum capabilities of modern and future many-core CPU and GPU architectures. One such powerful feature is the SIMD instruction set, which allows several data items to be packed into one register and operated on together, thus achieving more operations per clock cycle. Motivated by the idea of using the SIMD units of modern processors, the Kalman-filter (KF) based track fit has been adapted for parallelism, including memory optimization, numerical analysis, vectorization with inline operator overloading, and optimization using SDKs. The speed of the algorithm has been increased by a factor of 120,000, to 0.1 ms/track, running in parallel on 16 SPEs of a Cell Blade computer. Running on an 8-core Nehalem CPU it shows a processing speed of 52 ns/track using Intel Threading Building Blocks. The same KF algorithm running on an Nvidia GTX 280 in the CUDA framework provi...

  3. Toward an ultra-high resolution community climate system model for the BlueGene platform

    International Nuclear Information System (INIS)

    Dennis, John M; Jacob, Robert; Vertenstein, Mariana; Craig, Tony; Loy, Raymond

    2007-01-01

    Global climate models need to simulate several small, regional-scale processes which affect the global circulation in order to accurately simulate the climate. This is particularly important in the ocean where small-scale features such as oceanic eddies are currently represented with ad hoc parameterizations. There is also a need for higher resolution to provide climate predictions at small, regional scales. New high-performance computing platforms such as the IBM BlueGene can provide the necessary computational power to perform ultra-high resolution climate model integrations. We have begun to investigate the scaling of the individual components of the Community Climate System Model to prepare it for integrations on BlueGene and similar platforms. Our investigations show that it is possible to successfully utilize O(32K) processors. We describe the scalability of five models: the Parallel Ocean Program (POP), the Community Ice CodE (CICE), the Community Land Model (CLM), and the new CCSM sequential coupler (CPL7), which are components of the next generation Community Climate System Model (CCSM); as well as the High-Order Method Modeling Environment (HOMME), which is a dynamical core currently being evaluated within the Community Atmospheric Model. For our studies we concentrate on 1/10° resolution for the CICE, POP, and CLM models and 1/4° resolution for HOMME. The ability to simulate high resolutions on the massively parallel petascale systems that will dominate high-performance computing for the foreseeable future is essential to the advancement of climate science.

  4. Parallel Algorithms for the Exascale Era

    Energy Technology Data Exchange (ETDEWEB)

    Robey, Robert W. [Los Alamos National Laboratory

    2016-10-19

    New parallel algorithms are needed to reach the Exascale level of parallelism with millions of cores. We look at some of the research developed by students in projects at LANL. The research blends ideas from the early days of computing while weaving in the fresh approach brought by students new to the field of high performance computing. We look at reproducibility of global sums and why it is important to parallel computing. Next we look at how the concept of hashing has led to the development of more scalable algorithms suitable for next-generation parallel computers. Nearly all of this work has been done by undergraduates and published in leading scientific journals.
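
    One standard ingredient of reproducible global sums is compensated (Kahan) summation of each node's partial sum, combined with summing the partial results in a fixed order rather than in whatever order a nondeterministic reduction tree delivers them. The sketch below shows the compensation step only and is not necessarily the method developed in the LANL student projects.

```cpp
// Compensated (Kahan) summation: a running error term recovers the low-order bits
// lost at each addition, so the local partial sum is far less sensitive to the
// order in which values arrive. (Combining per-rank partials in a fixed order is
// the other half of a reproducible global sum.)
#include <vector>

double kahan_sum(const std::vector<double>& x) {
    double sum = 0.0, c = 0.0;      // c accumulates the rounding error
    for (double v : x) {
        double y = v - c;           // apply the correction from the previous step
        double t = sum + y;
        c = (t - sum) - y;          // what was rounded away in this addition
        sum = t;
    }
    return sum;
}
```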

  5. Dead Sea deep cores: A window into past climate and seismicity

    Science.gov (United States)

    Stein, Mordechai; Ben-Avraham, Zvi; Goldstein, Steven L.

    2011-12-01

    The area surrounding the Dead Sea was the locus of humankind's migration out of Africa and thus has been the home of peoples since the Stone Age. For this reason, understanding the climate and tectonic history of the region provides valuable insight into archaeology and studies of human history and helps to gain a better picture of future climate and tectonic scenarios. The deposits at the bottom of the Dead Sea are a geological archive of the environmental conditions (e.g., rains, floods, dust storms, droughts) during ice ages and warm ages, as well as of seismic activity in this key region. An International Continental Scientific Drilling Program (ICDP) deep drilling project was performed in the Dead Sea between November 2010 and March 2011. The project was funded by the ICDP and agencies in Israel, Germany, Japan, Norway, Switzerland, and the United States. Drilling was conducted using the new Large Lake Drilling Facility (Figure 1), a barge with a drilling rig run by DOSECC, Inc. (Drilling, Observation and Sampling of the Earth's Continental Crust), a nonprofit corporation dedicated to advancing scientific drilling worldwide. The main purpose of the project was to recover a long, continuous core to provide a high resolution record of the paleoclimate, paleoenvironment, paleoseismicity, and paleomagnetism of the Dead Sea Basin. With this, scientists are beginning to piece together a record of the climate and seismic history of the Middle East during the past several hundred thousand years in millennial to decadal to annual time resolution.

  6. Millennial and sub-millennial scale climatic variations recorded in polar ice cores over the last glacial period

    Directory of Open Access Journals (Sweden)

    E. Capron

    2010-06-01

    Since its discovery in Greenland ice cores, the millennial-scale climatic variability of the last glacial period has been increasingly documented at all latitudes, with studies focusing mainly on Marine Isotopic Stage 3 (MIS 3; 28–60 thousand years before present, hereafter ka), characterized by short Dansgaard-Oeschger (DO) events. Recent and new results obtained on the EPICA and NorthGRIP ice cores now precisely describe the rapid variations of Antarctic and Greenland temperature during MIS 5 (73.5–123 ka), a time period corresponding to relatively high sea level. The results display a succession of abrupt events associated with long Greenland InterStadial phases (GIS), enabling us to highlight a sub-millennial scale climatic variability depicted by (i) short-lived and abrupt warming events preceding some GIS (precursor-type events) and (ii) abrupt warming events at the end of some GIS (rebound-type events). The occurrence of these sub-millennial scale events is suggested to be driven by the insolation at high northern latitudes together with the internal forcing of ice sheets. Thanks to a recent NorthGRIP-EPICA Dronning Maud Land (EDML) common timescale over MIS 5, the bipolar sequence of climatic events can be established at millennial to sub-millennial timescales. This shows that for extraordinarily long stadial durations the accompanying Antarctic warming amplitude cannot be described by a simple linear relationship between the two, as expected from the bipolar seesaw concept. We also show that when ice sheets are extensive, Antarctica does not necessarily warm during the whole GS as the thermal bipolar seesaw model would predict, questioning the Greenland ice core temperature records as a proxy for AMOC changes throughout the glacial period.

  7. The core of Ure2p prion fibrils is formed by the N-terminal segment in a parallel cross-β structure: evidence from solid-state NMR.

    Science.gov (United States)

    Kryndushkin, Dmitry S; Wickner, Reed B; Tycko, Robert

    2011-06-03

    Intracellular fibril formation by Ure2p produces the non-Mendelian genetic element [URE3] in Saccharomyces cerevisiae, making Ure2p a prion protein. We show that solid-state NMR spectra of full-length Ure2p fibrils, seeded with infectious prions from a specific [URE3] strain and labeled with uniformly (15)N-(13)C-enriched Ile, include strong, sharp signals from Ile residues in the globular C-terminal domain (CTD) with both helical and nonhelical (13)C chemical shifts. Treatment with proteinase K eliminates these CTD signals, leaving only nonhelical signals from the Gln-rich and Asn-rich N-terminal segment, which are also observed in the solid-state NMR spectra of Ile-labeled fibrils formed by residues 1-89 of Ure2p. Thus, the N-terminal segment, or "prion domain" (PD), forms the fibril core, while CTD units are located outside the core. We additionally show that, after proteinase K treatment, Ile-labeled Ure2p fibrils formed without prion seeding exhibit a broader set of solid-state NMR signals than do prion-seeded fibrils, consistent with the idea that structural variations within the PD core account for prion strains. Measurements of (13)C-(13)C magnetic dipole-dipole couplings among (13)C-labeled Ile carbonyl sites in full-length Ure2p fibrils support an in-register parallel β-sheet structure for the PD core of Ure2p fibrils. Finally, we show that a model in which CTD units are attached rigidly to the parallel β-sheet core is consistent with steric constraints. Published by Elsevier Ltd.

  8. A highly efficient multi-core algorithm for clustering extremely large datasets

    Directory of Open Access Journals (Sweden)

    Kraus Johann M

    2010-04-01

    Background: In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities of current multi-core hardware to distribute the tasks among the different cores of one computer. Results: We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms, based on the design principles of transactional memory, for clustering gene expression microarray-type data and categorical SNP data. Our new shared-memory parallel algorithms prove to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. The computation speed of our Java-based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy, compared to single-core implementations and a recently published network-based parallelization. Conclusions: Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that, using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer.
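
    The authors' implementation is in Java and uses transactional-memory-inspired accumulation; as a rough shared-memory analogue in C++/OpenMP, the sketch below parallelizes the assignment step of one k-means iteration, with per-thread partial sums merged under a critical section instead of transactional updates.

```cpp
// Rough C++/OpenMP analogue of one k-means iteration (not the paper's Java code):
// points are assigned to their nearest center in parallel, and per-thread partial
// sums are merged to produce the updated centers.
#include <limits>
#include <vector>

// points: n x d row-major; centers: k x d row-major. Returns updated centers.
std::vector<double> kmeans_step(const std::vector<double>& points, int n, int d,
                                std::vector<double> centers, int k) {
    std::vector<double> new_centers(k * d, 0.0);
    std::vector<int> counts(k, 0);

    #pragma omp parallel
    {
        std::vector<double> local_sum(k * d, 0.0);
        std::vector<int> local_cnt(k, 0);

        #pragma omp for schedule(static)
        for (int i = 0; i < n; ++i) {
            int best = 0;
            double best_dist = std::numeric_limits<double>::max();
            for (int c = 0; c < k; ++c) {                 // nearest center (squared distance)
                double dist = 0.0;
                for (int j = 0; j < d; ++j) {
                    double diff = points[i * d + j] - centers[c * d + j];
                    dist += diff * diff;
                }
                if (dist < best_dist) { best_dist = dist; best = c; }
            }
            local_cnt[best]++;
            for (int j = 0; j < d; ++j) local_sum[best * d + j] += points[i * d + j];
        }

        #pragma omp critical                               // merge per-thread partials
        {
            for (int c = 0; c < k; ++c) counts[c] += local_cnt[c];
            for (int t = 0; t < k * d; ++t) new_centers[t] += local_sum[t];
        }
    }

    for (int c = 0; c < k; ++c)
        for (int j = 0; j < d; ++j)
            if (counts[c] > 0) new_centers[c * d + j] /= counts[c];
            else new_centers[c * d + j] = centers[c * d + j];   // keep empty clusters in place
    return new_centers;
}
```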

  9. A scalable method for parallelizing sampling-based motion planning algorithms

    KAUST Repository

    Jacobs, Sam Ade; Manavi, Kasra; Burgos, Juan; Denny, Jory; Thomas, Shawna; Amato, Nancy M.

    2012-01-01

    This paper describes a scalable method for parallelizing sampling-based motion planning algorithms. It subdivides configuration space (C-space) into (possibly overlapping) regions and independently, in parallel, uses standard (sequential) sampling-based planners to construct roadmaps in each region. Next, in parallel, regional roadmaps in adjacent regions are connected to form a global roadmap. By subdividing the space and restricting the locality of connection attempts, we reduce the work and inter-processor communication associated with nearest neighbor calculation, a critical bottleneck for scalability in existing parallel motion planning methods. We show that our method is general enough to handle a variety of planning schemes, including the widely used Probabilistic Roadmap (PRM) and Rapidly-exploring Random Trees (RRT) algorithms. We compare our approach to two other existing parallel algorithms and demonstrate that our approach achieves better and more scalable performance. Our approach achieves almost linear scalability on a 2400 core LINUX cluster and on a 153,216 core Cray XE6 petascale machine. © 2012 IEEE.
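
    The following toy sketch (with placeholder obstacle, sampling, and connection rules, not the paper's planner) illustrates the two-phase structure described above: regional roadmaps are built independently in parallel, and only adjacent regions are then considered for connection, which is what bounds the nearest-neighbor work.

```cpp
// Toy two-phase region-subdivision sketch: a 2-D configuration space is cut into
// vertical strips, a tiny PRM-style roadmap is sampled independently in each strip,
// and adjacent strips are then stitched together.
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

struct Node { double x, y; int region; };
struct Edge { int a, b; };                       // indices local to the strips involved

bool collision_free(double x, double y) {        // placeholder obstacle: a disc
    return std::hypot(x - 0.5, y - 0.5) > 0.2;
}

int main() {
    const int regions = 8, samples_per_region = 50;
    const double connect_radius = 0.15;
    std::vector<std::vector<Node>> regional(regions);
    std::vector<std::vector<Edge>> regional_edges(regions);

    // Phase 1: build regional roadmaps independently, one strip per iteration.
    #pragma omp parallel for schedule(dynamic)
    for (int r = 0; r < regions; ++r) {
        std::mt19937 rng(1234 + r);              // per-region RNG stream
        std::uniform_real_distribution<double> u(0.0, 1.0);
        double x_lo = (double)r / regions, x_hi = (double)(r + 1) / regions;
        auto& nodes = regional[r];
        while ((int)nodes.size() < samples_per_region) {
            double x = x_lo + (x_hi - x_lo) * u(rng), y = u(rng);
            if (collision_free(x, y)) nodes.push_back({x, y, r});
        }
        for (int i = 0; i < (int)nodes.size(); ++i)          // connect within the strip
            for (int j = i + 1; j < (int)nodes.size(); ++j)
                if (std::hypot(nodes[i].x - nodes[j].x, nodes[i].y - nodes[j].y) < connect_radius)
                    regional_edges[r].push_back({i, j});
    }

    // Phase 2: only adjacent strips are candidates for connection.
    std::vector<Edge> bridges;
    #pragma omp parallel for schedule(dynamic)
    for (int r = 0; r + 1 < regions; ++r) {
        std::vector<Edge> local;
        for (int i = 0; i < (int)regional[r].size(); ++i)
            for (int j = 0; j < (int)regional[r + 1].size(); ++j)
                if (std::hypot(regional[r][i].x - regional[r + 1][j].x,
                               regional[r][i].y - regional[r + 1][j].y) < connect_radius)
                    local.push_back({i, j});
        #pragma omp critical
        bridges.insert(bridges.end(), local.begin(), local.end());
    }

    std::printf("edges in strip 0: %zu, bridge edges: %zu\n",
                regional_edges[0].size(), bridges.size());
    return 0;
}
```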

  10. A scalable method for parallelizing sampling-based motion planning algorithms

    KAUST Repository

    Jacobs, Sam Ade

    2012-05-01

    This paper describes a scalable method for parallelizing sampling-based motion planning algorithms. It subdivides configuration space (C-space) into (possibly overlapping) regions and independently, in parallel, uses standard (sequential) sampling-based planners to construct roadmaps in each region. Next, in parallel, regional roadmaps in adjacent regions are connected to form a global roadmap. By subdividing the space and restricting the locality of connection attempts, we reduce the work and inter-processor communication associated with nearest neighbor calculation, a critical bottleneck for scalability in existing parallel motion planning methods. We show that our method is general enough to handle a variety of planning schemes, including the widely used Probabilistic Roadmap (PRM) and Rapidly-exploring Random Trees (RRT) algorithms. We compare our approach to two other existing parallel algorithms and demonstrate that our approach achieves better and more scalable performance. Our approach achieves almost linear scalability on a 2400 core LINUX cluster and on a 153,216 core Cray XE6 petascale machine. © 2012 IEEE.

  11. Block-Parallel Data Analysis with DIY2

    Energy Technology Data Exchange (ETDEWEB)

    Morozov, Dmitriy [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Peterka, Tom [Argonne National Lab. (ANL), Argonne, IL (United States)

    2017-08-30

    DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial, parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.
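
    DIY2's actual C++ API is not reproduced here; the plain C++ sketch below only illustrates the block-structured pattern the abstract describes: the data are split into more blocks than processing elements, and computation is expressed as a callback applied to each block, leaving the runtime free to schedule blocks onto threads or stage them in and out of memory.

```cpp
// Plain C++ sketch of the block-structured data-parallel pattern (not the DIY2 API):
// decompose into blocks, then express computation as a callback over blocks.
#include <functional>
#include <vector>

struct Block {
    int gid;                        // global block id
    std::vector<float> values;      // this block's share of the data
};

void foreach_block(std::vector<Block>& blocks,
                   const std::function<void(Block&)>& f) {
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < (int)blocks.size(); ++i)
        f(blocks[i]);               // blocks are independent between exchanges
}

int main() {
    const int nblocks = 64, block_size = 1024;
    std::vector<Block> blocks(nblocks);
    for (int b = 0; b < nblocks; ++b)
        blocks[b] = {b, std::vector<float>(block_size, float(b))};

    std::vector<float> block_sums(nblocks, 0.0f);
    foreach_block(blocks, [&](Block& blk) {
        float s = 0.0f;
        for (float v : blk.values) s += v;
        block_sums[blk.gid] = s;    // per-block result; a later "exchange" step would
                                    // combine results between neighboring blocks
    });
    return 0;
}
```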

  12. Three-dimensional gyrokinetic particle-in-cell simulation of plasmas on a massively parallel computer: Final report on LDRD Core Competency Project, FY 1991--FY 1993

    International Nuclear Information System (INIS)

    Byers, J.A.; Williams, T.J.; Cohen, B.I.; Dimits, A.M.

    1994-01-01

    One of the programs of the Magnetic Fusion Energy (MFE) Theory and Computations Program is studying the anomalous transport of thermal energy across the field lines in the core of a tokamak. We use the method of gyrokinetic particle-in-cell simulation in this study. For this LDRD project we employed massively parallel processing, new algorithms, and new formal techniques to improve this research. Specifically, we sought to take steps toward: researching experimentally relevant parameters in our simulations, learning parallel computing to have as a resource for our group, and achieving a 100x speedup over our starting-point Cray-2 simulation code's performance

  13. ONE SEGMENT OF THE BULGARIAN-ENGLISH PAREMIOLOGICAL CORE

    Directory of Open Access Journals (Sweden)

    KOTOVA M.Y.

    2015-12-01

    The English proverbial parallels of the Russian-Bulgarian paremiological core are analysed in this article. The comparison of current Bulgarian proverbs with their English proverbial parallels is based on the material of the author's multilingual dictionary and on her collection of Bulgarian-Russian proverbial parallels, published as a result of her sociolinguistic paremiological experiment of 2003 (on the basis of 100 questionnaires filled in by 100 Bulgarian respondents) and supported in 2013 with current Bulgarian contexts from the Bulgarian Internet. The number of 'alive' Bulgarian-English proverbial parallels constructed from the paremiological questionnaires (pointed out by 70%-100% of respondents) is 62, the largest part of which are proverbial parallels with a similar inner form (35); that is, the largest part of this segment of the current Bulgarian-English paremiological core (reflecting the Russian paremiological minimum) consists of proverbial parallels with a similar inner form.

  14. A Tutorial on Parallel and Concurrent Programming in Haskell

    Science.gov (United States)

    Peyton Jones, Simon; Singh, Satnam

    This practical tutorial introduces the features available in Haskell for writing parallel and concurrent programs. We first describe how to write semi-explicit parallel programs by using annotations to express opportunities for parallelism and to help control the granularity of parallelism for effective execution on modern operating systems and processors. We then describe the mechanisms provided by Haskell for writing explicitly parallel programs with a focus on the use of software transactional memory to help share information between threads. Finally, we show how nested data parallelism can be used to write deterministically parallel programs which allows programmers to use rich data types in data parallel programs which are automatically transformed into flat data parallel versions for efficient execution on multi-core processors.

  15. Toward an ultra-high resolution community climate system model for the BlueGene platform

    Energy Technology Data Exchange (ETDEWEB)

    Dennis, John M [Computer Science Section, National Center for Atmospheric Research, Boulder, CO (United States); Jacob, Robert [Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL (United States); Vertenstein, Mariana [Climate and Global Dynamics Division, National Center for Atmospheric Research, Boulder, CO (United States); Craig, Tony [Climate and Global Dynamics Division, National Center for Atmospheric Research, Boulder, CO (United States); Loy, Raymond [Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL (United States)

    2007-07-15

    Global climate models need to simulate several small, regional-scale processes which affect the global circulation in order to accurately simulate the climate. This is particularly important in the ocean where small-scale features such as oceanic eddies are currently represented with ad hoc parameterizations. There is also a need for higher resolution to provide climate predictions at small, regional scales. New high-performance computing platforms such as the IBM BlueGene can provide the necessary computational power to perform ultra-high resolution climate model integrations. We have begun to investigate the scaling of the individual components of the Community Climate System Model to prepare it for integrations on BlueGene and similar platforms. Our investigations show that it is possible to successfully utilize O(32K) processors. We describe the scalability of five models: the Parallel Ocean Program (POP), the Community Ice CodE (CICE), the Community Land Model (CLM), and the new CCSM sequential coupler (CPL7), which are components of the next generation Community Climate System Model (CCSM); as well as the High-Order Method Modeling Environment (HOMME), which is a dynamical core currently being evaluated within the Community Atmospheric Model. For our studies we concentrate on 1/10° resolution for the CICE, POP, and CLM models and 1/4° resolution for HOMME. The ability to simulate high resolutions on the massively parallel petascale systems that will dominate high-performance computing for the foreseeable future is essential to the advancement of climate science.

  16. The parallel algorithm for the 2D discrete wavelet transform

    Science.gov (United States)

    Barina, David; Najman, Pavel; Kleparnik, Petr; Kula, Michal; Zemcik, Pavel

    2018-04-01

    The discrete wavelet transform can be found at the heart of many image-processing algorithms. Until now, the transform on general-purpose processors (CPUs) was mostly computed using a separable lifting scheme. As the lifting scheme consists of a small number of operations, it is preferred for processing on single-core CPUs. However, for parallel processing on multi-core processors, this scheme is inappropriate due to its large number of steps. On such architectures, the number of steps corresponds to the number of points that represent the exchange of data. Consequently, these points often form a performance bottleneck. Our approach appropriately rearranges calculations inside the transform and thereby reduces the number of steps. In other words, we propose a new scheme that is friendly to parallel environments. When evaluated on multi-core CPUs, our approach consistently outperforms the original lifting scheme. The evaluation was performed on 61-core Intel Xeon Phi and 8-core Intel Xeon processors.
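
    For reference, and to make concrete the synchronization points that the rearranged scheme reduces, here is the conventional separable approach to one 2-D DWT level: the horizontal pass is parallel over rows, the vertical pass over columns, and an implicit barrier separates the two. Haar filters are used instead of a longer lifting scheme purely to keep the sketch short; even width and height are assumed.

```cpp
// Conventional separable single-level 2-D Haar DWT (assumes even width and height):
// row pass, barrier, column pass -- the barrier is the kind of data-exchange point
// that the rearranged scheme in the paper reduces.
#include <vector>

void haar2d_level(std::vector<float>& img, int width, int height) {
    const float s = 0.70710678f;                       // 1/sqrt(2)
    // Horizontal pass: each row is independent.
    #pragma omp parallel for schedule(static)
    for (int y = 0; y < height; ++y) {
        std::vector<float> row(width);
        for (int x = 0; x < width / 2; ++x) {
            float a = img[y * width + 2 * x], b = img[y * width + 2 * x + 1];
            row[x] = (a + b) * s;                      // approximation coefficients
            row[width / 2 + x] = (a - b) * s;          // detail coefficients
        }
        for (int x = 0; x < width; ++x) img[y * width + x] = row[x];
    }
    // Implicit barrier here; then the vertical pass: each column is independent.
    #pragma omp parallel for schedule(static)
    for (int x = 0; x < width; ++x) {
        std::vector<float> col(height);
        for (int y = 0; y < height / 2; ++y) {
            float a = img[2 * y * width + x], b = img[(2 * y + 1) * width + x];
            col[y] = (a + b) * s;
            col[height / 2 + y] = (a - b) * s;
        }
        for (int y = 0; y < height; ++y) img[y * width + x] = col[y];
    }
}
```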

  17. Scalable Parallelization of Skyline Computation for Multi-core Processors

    DEFF Research Database (Denmark)

    Chester, Sean; Sidlauskas, Darius; Assent, Ira

    2015-01-01

    The skyline is an important query operator for multi-criteria decision making. It reduces a dataset to only those points that offer optimal trade-offs of dimensions. In general, it is very expensive to compute. Recently, multi-core CPU algorithms have been proposed to accelerate the computation of the skyline. However, they do not sufficiently minimize dominance tests and so are not competitive with state-of-the-art sequential algorithms. In this paper, we introduce a novel multi-core skyline algorithm, Hybrid, which processes points in blocks. It maintains a shared, global skyline among all threads...
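
    Hybrid's block layout and shared-skyline maintenance involve more machinery than fits here; the sketch below shows only the dominance test at the core of any skyline computation, plus a trivially parallel pre-filtering pass against an already-known partial skyline (the survivors still need a final pairwise resolution pass).

```cpp
// Dominance test plus an embarrassingly parallel pre-filter against a known partial
// skyline; smaller values are assumed better in every criterion.
#include <cstddef>
#include <vector>

using Point = std::vector<double>;        // one value per criterion

// p dominates q if p is no worse in every dimension and strictly better in at least one.
bool dominates(const Point& p, const Point& q) {
    bool strictly_better = false;
    for (std::size_t d = 0; d < p.size(); ++d) {
        if (p[d] > q[d]) return false;
        if (p[d] < q[d]) strictly_better = true;
    }
    return strictly_better;
}

// Drop points dominated by any member of `skyline`; the survivors are candidates
// that a full algorithm must still resolve against each other.
std::vector<Point> prefilter(const std::vector<Point>& data, const std::vector<Point>& skyline) {
    std::vector<char> keep(data.size(), 1);
    #pragma omp parallel for schedule(dynamic, 64)
    for (long i = 0; i < (long)data.size(); ++i)
        for (const Point& s : skyline)
            if (dominates(s, data[i])) { keep[i] = 0; break; }

    std::vector<Point> out;
    for (std::size_t i = 0; i < data.size(); ++i)
        if (keep[i]) out.push_back(data[i]);
    return out;
}
```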

  18. 1500-year Record of trans-Pacific Dust Flux collected from the Denali Ice Core, Mt. Hunter, Alaska

    Science.gov (United States)

    Saylor, P. L.; Osterberg, E. C.; Koffman, B. G.; Winski, D.; Ferris, D. G.; Kreutz, K. J.; Wake, C. P.; Handley, M.; Campbell, S. W.

    2016-12-01

    Mineral dust aerosols are a critical component of the climate system through their influence on atmospheric radiative forcing, ocean productivity, and surface albedo. Dust aerosols derived from Asian deserts are known to reach as far as Europe through efficient transport in the upper tropospheric westerlies. While centennially-to-millennially resolved Asian dust records exist over the late Holocene from North Pacific marine sediment cores and Asian loess deposits, a high-resolution (sub-annual to decadal) record of trans-Pacific dust flux will significantly improve our understanding of North Pacific dust-climate interactions and provide paleoclimatological context for 20th century dust activity. Here we present an annually resolved 1500-year record of trans-Pacific dust transport based on chemical and physical dust measurements in parallel Alaskan ice cores (208 m to bedrock) collected from the summit plateau of Mt. Hunter in Denali National Park. The cores were sampled at high resolution using a continuous melter system with discrete analyses for major ions (Dionex ion chromatograph), trace elements (Element2 inductively coupled plasma mass spectrometer), and stable water isotope ratios (Picarro laser ringdown spectroscopy), and continuous flow analysis for dust concentration and size distribution (Klotz Abakus). We compare the ice core dust record to instrumental aerosol stations, satellite observations, and dust model data from the instrumental period, and evaluate climatic controls on dust emission and trans-Pacific transport using climate reanalysis data, to inform dust-climate relationships over the past 1500 years. Physical particulate and chemical data demonstrate remarkable fidelity at sub-annual resolution, with both displaying a strong springtime peak consistent with periods of high dust activity over Asian desert source regions. Preliminary results suggest volumetric mode typically ranges from 4.5 - 6.5 um, with a mean value of 5.5 um. Preliminary

  19. A Parallel Algebraic Multigrid Solver on Graphics Processing Units

    KAUST Repository

    Haase, Gundolf

    2010-01-01

    The paper presents a multi-GPU implementation of the preconditioned conjugate gradient algorithm with an algebraic multigrid preconditioner (PCG-AMG) for an elliptic model problem on a 3D unstructured grid. An efficient parallel sparse matrix-vector multiplication scheme underlying the PCG-AMG algorithm is presented for the many-core GPU architecture. A performance comparison of the parallel solver shows that a single Nvidia Tesla C1060 GPU board delivers the performance of a sixteen-node Infiniband cluster, and a multi-GPU configuration with eight GPUs is about 100 times faster than a typical server CPU core. © 2010 Springer-Verlag.
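
    The dominant kernel inside PCG-AMG is sparse matrix-vector multiplication. For reference, this is the standard CSR formulation parallelized over rows; a CUDA version assigns rows (or groups of rows) to threads or warps in the same spirit. This is generic background, not the paper's multi-GPU code.

```cpp
// Standard CSR sparse matrix-vector product y = A*x, parallelized over rows.
#include <vector>

struct CSRMatrix {
    int nrows;
    std::vector<int> row_ptr;      // size nrows+1
    std::vector<int> col_idx;      // column index of each stored entry
    std::vector<double> val;       // value of each stored entry
};

void spmv(const CSRMatrix& A, const std::vector<double>& x, std::vector<double>& y) {
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < A.nrows; ++i) {
        double sum = 0.0;
        for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            sum += A.val[k] * x[A.col_idx[k]];   // gather from x via column indices
        y[i] = sum;
    }
}
```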

  20. Kalman Filter Tracking on Parallel Architectures

    International Nuclear Information System (INIS)

    Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava; Lantz, Steven; Lefebvre, Matthieu; McDermott, Kevin; Riley, Daniel; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

    2016-01-01

    Power density constraints are limiting the performance improvements of modern CPUs. To address this we have seen the introduction of lower-power, multi-core processors such as GPGPU, ARM and Intel MIC. In order to achieve the theoretical performance gains of these processors, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specialized functions like large vector units. Track finding and fitting is one of the most computationally challenging problems for event reconstruction in particle physics. At the High-Luminosity Large Hadron Collider (HL-LHC), for example, this will be by far the dominant problem. The need for greater parallelism has driven investigations of very different track finding techniques such as Cellular Automata or Hough Transforms. The most common track finding techniques in use today, however, are those based on a Kalman filter approach. Significant experience has been accumulated with these techniques on real tracking detector systems, both in the trigger and offline. They are known to provide high physics performance, are robust, and are in use today at the LHC. Given the utility of the Kalman filter in track finding, we have begun to port these algorithms to parallel architectures, namely Intel Xeon and Xeon Phi. We report here on our progress towards an end-to-end track reconstruction algorithm fully exploiting vectorization and parallelization techniques in a simplified experimental environment

  1. Past climate change on Sky Islands drives novelty in a core developmental gene network and its phenotype.

    Science.gov (United States)

    Favé, Marie-Julie; Johnson, Robert A; Cover, Stefan; Handschuh, Stephan; Metscher, Brian D; Müller, Gerd B; Gopalan, Shyamalika; Abouheif, Ehab

    2015-09-04

    A fundamental and enduring problem in evolutionary biology is to understand how populations differentiate in the wild, yet little is known about what role organismal development plays in this process. Organismal development integrates environmental inputs with the action of gene regulatory networks to generate the phenotype. Core developmental gene networks have been highly conserved for millions of years across all animals, and therefore, organismal development may bias variation available for selection to work on. Biased variation may facilitate repeatable phenotypic responses when exposed to similar environmental inputs and ecological changes. To gain a more complete understanding of population differentiation in the wild, we integrated evolutionary developmental biology with population genetics, morphology, paleoecology and ecology. This integration was made possible by studying how populations of the ant species Monomorium emersoni respond to climatic and ecological changes across five 'Sky Islands' in Arizona, which are mountain ranges separated by vast 'seas' of desert. Sky Islands represent a replicated natural experiment allowing us to determine how repeatable the response of M. emersoni populations to climate and ecological changes is at the phenotypic, developmental, and gene network levels. We show that a core developmental gene network and its phenotype have kept pace with ecological and climate change on each Sky Island over the last ~90,000 years before present (BP). This response has produced two types of evolutionary change within an ant species: one type is unpredictable and contingent on the pattern of isolation of Sky Island populations by climate warming, resulting in slight changes in gene expression, organ growth, and morphology. The other type is predictable and deterministic, resulting in the repeated evolution of a novel wingless queen phenotype and its underlying gene network in response to habitat changes induced by climate warming. Our

  2. Abrupt climate change: Debate or action

    Institute of Scientific and Technical Information of China (English)

    CHENG Hai

    2004-01-01

    Global abrupt climate changes have been documented by various climate records, including ice cores, ocean sediment cores, lake sediment cores, cave deposits, loess deposits and pollen records. The climate system prefers to be in one of two stable states, i.e. interstadial or stadial conditions, but not in between. The transition between the two states has an abrupt character. Abrupt climate changes are, in general, synchronous in the northern hemisphere and tropical regions. The timescale for abrupt climate changes can be as short as a decade. As the impacts may be potentially serious, we need to take actions such as reducing CO2 emissions to the atmosphere.

  3. Parallel transposition of sparse data structures

    DEFF Research Database (Denmark)

    Wang, Hao; Liu, Weifeng; Hou, Kaixi

    2016-01-01

    Many applications in computational sciences and social sciences exploit the sparsity and connectivity of acquired data. Even though many parallel sparse primitives such as sparse matrix-vector (SpMV) multiplication have been extensively studied, some other important building blocks, e.g., parallel transposition, ... transposition in the latest vendor-supplied library on an Intel multicore CPU platform, and the MergeTrans approach achieves an average 3.4-fold (up to 11.7-fold) speedup on an Intel Xeon Phi many-core processor.
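
    ScanTrans and MergeTrans themselves are not shown here; as a baseline for the kind of operation they improve on, the sketch below is a common atomic-based parallel CSR-to-CSC transpose: histogram the entries per column in parallel, prefix-sum the counts into column pointers, then scatter the entries.

```cpp
// Baseline atomic-based parallel sparse transpose (CSR -> CSC). Note that the row
// indices within each output column end up unsorted, a known drawback of the
// atomic-scatter approach.
#include <vector>

struct CSR {
    int nrows = 0, ncols = 0;
    std::vector<int> ptr, idx;
    std::vector<double> val;
};

CSR transpose(const CSR& a) {
    CSR t;
    t.nrows = a.ncols; t.ncols = a.nrows;
    t.ptr.assign(a.ncols + 1, 0);
    t.idx.resize(a.idx.size());
    t.val.resize(a.val.size());

    // 1) Histogram of nonzeros per column.
    #pragma omp parallel for
    for (long k = 0; k < (long)a.idx.size(); ++k) {
        #pragma omp atomic
        t.ptr[a.idx[k] + 1]++;
    }
    // 2) Exclusive prefix sum into column pointers (serial; cheap next to the scatter).
    for (int c = 0; c < a.ncols; ++c) t.ptr[c + 1] += t.ptr[c];

    // 3) Scatter: each entry atomically claims the next free slot of its column.
    std::vector<int> cursor(t.ptr.begin(), t.ptr.end() - 1);
    #pragma omp parallel for
    for (int i = 0; i < a.nrows; ++i) {
        for (int k = a.ptr[i]; k < a.ptr[i + 1]; ++k) {
            int c = a.idx[k], slot;
            #pragma omp atomic capture
            slot = cursor[c]++;
            t.idx[slot] = i;
            t.val[slot] = a.val[k];
        }
    }
    return t;
}
```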

  4. Parallel computing by Monte Carlo codes MVP/GMVP

    International Nuclear Information System (INIS)

    Nagaya, Yasunobu; Nakagawa, Masayuki; Mori, Takamasa

    2001-01-01

    General-purpose Monte Carlo codes MVP/GMVP are well vectorized and thus enable us to perform high-speed Monte Carlo calculations. In order to achieve further speedups, we parallelized the codes on different types of parallel computing platforms, or by using the standard parallelization library MPI. The platforms used for the benchmark calculations are a distributed-memory vector-parallel computer Fujitsu VPP500, a distributed-memory massively parallel computer Intel Paragon, and distributed-memory scalar-parallel computers Hitachi SR2201 and IBM SP2. As is generally the case, linear speedup could be obtained for large-scale problems, but parallelization efficiency decreased as the batch size per processing element (PE) became smaller. It was also found that the statistical uncertainty of assembly powers was less than 0.1% for the PWR full-core calculation with more than 10 million histories, which took about 1.5 hours with massively parallel computing. (author)
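
    MVP/GMVP's physics and vectorization are not reproduced here; the skeleton below only shows the parallel pattern the abstract describes, with each processing element running its share of the particle histories on an independent RNG stream and the tallies combined by an MPI reduction. The per-history "score" is a trivial placeholder for the real tallies.

```cpp
// Skeleton of history-parallel Monte Carlo: split the histories over MPI ranks,
// run each share with an independent RNG stream, reduce the tallies at the end.
#include <mpi.h>
#include <cstdio>
#include <random>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, nranks = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const long total_histories = 10000000;
    long local_histories = total_histories / nranks +
                           (rank < total_histories % nranks ? 1 : 0);

    std::mt19937_64 rng(987654321ULL + rank);         // independent stream per PE
    std::uniform_real_distribution<double> u(0.0, 1.0);

    double local_tally = 0.0;                         // placeholder score per history
    for (long h = 0; h < local_histories; ++h)
        local_tally += u(rng);

    double global_tally = 0.0;
    MPI_Reduce(&local_tally, &global_tally, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        std::printf("mean score = %.6f over %ld histories\n",
                    global_tally / total_histories, total_histories);

    MPI_Finalize();
    return 0;
}
```

    The batch-size effect mentioned in the abstract shows up directly in this pattern: as the number of histories per PE shrinks, the fixed costs of startup and of the reduction come to dominate the runtime.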

  5. Archive of Geosample Data and Information from the Ohio State University Byrd Polar and Climate Research Center (BPCRC) Sediment Core Repository

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Byrd Polar and Climate Research Center (BPCRC) Sediment Core Repository operated by the Ohio State University is a partner in the Index to Marine and Lacustrine...

  6. Out of core, out of mind: Practical parallel I/O

    Energy Technology Data Exchange (ETDEWEB)

    Womble, D.E.; Greenberg, D.S.; Riesen, R.E.; Wheat, S.R.

    1993-11-01

    Parallel computers are becoming more powerful and more complex in response to the demand for computing power by scientists and engineers. Inevitably, new and more complex I/O systems will be developed for these systems. In particular, we believe that the I/O system must provide the programmer with the ability to explicitly manage storage (despite the trend toward complex parallel file systems and caching schemes). One method of doing so is to have partitioned secondary storage in which each processor owns a logical disk. Along with operating system enhancements which allow overheads such as buffer copying to be avoided and libraries to support optimal remapping of data, this sort of I/O system meets the needs of high performance computing.
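
    A minimal sketch of the "each processor owns a logical disk" idea, with a per-rank file standing in for the logical disk: the program itself, not a shared file-system cache, decides when an out-of-core block is staged out and brought back in. The file naming is an assumption made here for illustration.

```cpp
// Partitioned secondary storage sketch: every MPI rank explicitly stages its own
// block to a private file ("logical disk") and reads it back when needed again.
#include <mpi.h>
#include <cstdio>
#include <string>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    std::vector<double> block(1 << 20, rank * 1.0);   // this rank's out-of-core block
    std::string path = "logical_disk_rank" + std::to_string(rank) + ".bin";

    // Stage the block out to this rank's private storage ...
    if (std::FILE* f = std::fopen(path.c_str(), "wb")) {
        std::fwrite(block.data(), sizeof(double), block.size(), f);
        std::fclose(f);
    }
    // ... and later stage it back in when the algorithm needs it again.
    if (std::FILE* f = std::fopen(path.c_str(), "rb")) {
        std::size_t got = std::fread(block.data(), sizeof(double), block.size(), f);
        (void)got;                                    // sketch: no error handling
        std::fclose(f);
    }

    MPI_Finalize();
    return 0;
}
```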

  7. High performance parallelism pearls 2 multicore and many-core programming approaches

    CERN Document Server

    Jeffers, Jim

    2015-01-01

    High Performance Parallelism Pearls Volume 2 offers another set of examples that demonstrate how to leverage parallelism. Similar to Volume 1, the techniques included here explain how to use processors and coprocessors with the same programming - illustrating the most effective ways to combine Xeon Phi coprocessors with Xeon and other multicore processors. The book includes examples of successful programming efforts, drawn from across industries and domains such as biomed, genetics, finance, manufacturing, imaging, and more. Each chapter in this edited work includes detailed explanations of t

  8. Experiences Using Hybrid MPI/OpenMP in the Real World: Parallelization of a 3D CFD Solver for Multi-Core Node Clusters

    Directory of Open Access Journals (Sweden)

    Gabriele Jost

    2010-01-01

    Full Text Available Today most systems in high-performance computing (HPC) feature a hierarchical hardware design: shared-memory nodes with several multi-core CPUs are connected via a network infrastructure. When parallelizing an application for these architectures it seems natural to employ a hierarchical programming model such as combining MPI and OpenMP. Nevertheless, there is the general lore that pure MPI outperforms the hybrid MPI/OpenMP approach. In this paper, we describe the hybrid MPI/OpenMP parallelization of IR3D (Incompressible Realistic 3-D) code, a full-scale real-world application, which simulates the environmental effects on the evolution of vortices trailing behind control surfaces of underwater vehicles. We discuss performance, scalability and limitations of the pure MPI version of the code on a variety of hardware platforms and show how the hybrid approach can help to overcome certain limitations.
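
    The hybrid pattern the paper evaluates can be boiled down to a few lines: one MPI rank per node (or socket) with OpenMP threads inside it. The sketch below is illustrative only and is not taken from IR3D; the local array size and the reduction it performs are assumptions.

    ```cpp
    // Minimal hybrid MPI/OpenMP pattern: MPI between nodes, OpenMP threads within a node.
    #include <mpi.h>
    #include <omp.h>
    #include <vector>
    #include <cstdio>

    int main(int argc, char** argv) {
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);  // threads allowed, only master calls MPI
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int n_local = 1 << 20;                      // assumed local slab size per rank
        std::vector<double> u(n_local, 1.0);

        double local_sum = 0.0;
        #pragma omp parallel for reduction(+:local_sum)   // intra-node (shared-memory) parallelism
        for (int i = 0; i < n_local; ++i)
            local_sum += u[i] * u[i];

        double global_sum = 0.0;                          // inter-node (distributed-memory) parallelism
        MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            std::printf("global norm^2 = %.1f using %d ranks x %d threads\n",
                        global_sum, size, omp_get_max_threads());
        MPI_Finalize();
    }
    ```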

  9. Ice Cores

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — Records of past temperature, precipitation, atmospheric trace gases, and other aspects of climate and environment derived from ice cores drilled on glaciers and ice...

  10. User-friendly parallelization of GAUDI applications with Python

    International Nuclear Information System (INIS)

    Mato, Pere; Smith, Eoin

    2010-01-01

    GAUDI is a software framework in C++ used to build event data processing applications using a set of standard components with well-defined interfaces. Simulation, high-level trigger, reconstruction, and analysis programs used by several experiments are developed using GAUDI. These applications can be configured and driven by simple Python scripts. Given the fact that a considerable amount of existing software has been developed using serial methodology, and has existed in some cases for many years, implementation of parallelisation techniques at the framework level may offer a way of exploiting current multi-core technologies to maximize performance and reduce latencies without re-writing thousands/millions of lines of code. In the solution we have developed, the parallelization techniques are introduced to the high level Python scripts which configure and drive the applications, such that the core C++ application code requires no modification, and that end users need make only minimal changes to their scripts. The developed solution leverages existing generic Python modules that support parallel processing. Naturally, the parallel version of a given program should produce results consistent with its serial execution. The evaluation of several prototypes incorporating various parallelization techniques is presented and discussed.

  11. User-friendly parallelization of GAUDI applications with Python

    Energy Technology Data Exchange (ETDEWEB)

    Mato, Pere; Smith, Eoin, E-mail: pere.mato@cern.c [PH Department, CERN, 1211 Geneva 23 (Switzerland)

    2010-04-01

    GAUDI is a software framework in C++ used to build event data processing applications using a set of standard components with well-defined interfaces. Simulation, high-level trigger, reconstruction, and analysis programs used by several experiments are developed using GAUDI. These applications can be configured and driven by simple Python scripts. Given the fact that a considerable amount of existing software has been developed using serial methodology, and has existed in some cases for many years, implementation of parallelisation techniques at the framework level may offer a way of exploiting current multi-core technologies to maximize performance and reduce latencies without re-writing thousands/millions of lines of code. In the solution we have developed, the parallelization techniques are introduced to the high level Python scripts which configure and drive the applications, such that the core C++ application code requires no modification, and that end users need make only minimal changes to their scripts. The developed solution leverages existing generic Python modules that support parallel processing. Naturally, the parallel version of a given program should produce results consistent with its serial execution. The evaluation of several prototypes incorporating various parallelization techniques is presented and discussed.

  12. GRAPES: a software for parallel searching on biological graphs targeting multi-core architectures.

    Directory of Open Access Journals (Sweden)

    Rosalba Giugno

    Full Text Available Biological applications, from genomics to ecology, deal with graphs that represent the structure of interactions. Analyzing such data requires searching for subgraphs in collections of graphs. This task is computationally expensive. Even though multicore architectures, from commodity computers to more advanced symmetric multiprocessing (SMP), offer scalable computing power, currently published software implementations for indexing and graph matching are fundamentally sequential. As a consequence, such software implementations (i) do not fully exploit available parallel computing power and (ii) do not scale with respect to the size of graphs in the database. We present GRAPES, software for parallel searching on databases of large biological graphs. GRAPES implements a parallel version of well-established graph searching algorithms, and introduces new strategies which naturally lead to a faster parallel searching system especially for large graphs. GRAPES decomposes graphs into subcomponents that can be efficiently searched in parallel. We show the performance of GRAPES on representative biological datasets containing antiviral chemical compounds, DNA, RNA, proteins, protein contact maps and protein interaction networks.

  13. Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms

    KAUST Repository

    Hasanov, Khalid

    2014-03-04

    © 2014, Springer Science+Business Media New York. Many state-of-the-art parallel algorithms, which are widely used in scientific applications executed on high-end computing systems, were designed in the twentieth century with relatively small-scale parallelism in mind. Indeed, while in the 1990s a system with a few hundred cores was considered a powerful supercomputer, modern top supercomputers have millions of cores. In this paper, we present a hierarchical approach to optimization of message-passing parallel algorithms for execution on large-scale distributed-memory systems. The idea is to reduce the communication cost by introducing hierarchy and hence more parallelism in the communication scheme. We apply this approach to SUMMA, the state-of-the-art parallel algorithm for matrix–matrix multiplication, and demonstrate both theoretically and experimentally that the modified Hierarchical SUMMA significantly improves the communication cost and the overall performance on large-scale platforms.

  14. Acceleration and parallelization calculation of EFEN-SP_3 method

    International Nuclear Information System (INIS)

    Yang Wen; Zheng Youqi; Wu Hongchun; Cao Liangzhi; Li Yunzhao

    2013-01-01

    Due to the fact that the exponential function expansion nodal-SP_3 (EFEN-SP_3) method needs further improvement in computational efficiency to routinely carry out PWR whole core pin-by-pin calculation, the coarse mesh acceleration and spatial parallelization were investigated in this paper. The coarse mesh acceleration was built by considering a discontinuity factor on each coarse mesh interface and preserving neutron balance within each coarse mesh in space, angle and energy. The spatial parallelization based on MPI was implemented by guaranteeing load balancing and minimizing communication costs to fully take advantage of modern computing and storage abilities. Numerical results based on a commercial nuclear power reactor demonstrate a speedup ratio of about 40 for the coarse mesh acceleration and a parallel efficiency of higher than 60% with 40 CPUs for the spatial parallelization. With these two improvements, the EFEN code can complete a PWR whole core pin-by-pin calculation with 289 × 289 × 218 meshes and 4 energy groups within 100 s by using 48 CPUs (2.40 GHz frequency). (authors)
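
    To relate the two quoted figures, parallel efficiency $E$ on $p$ processors is defined as $E = S/p$ for speedup $S$, so an efficiency above 60% with 40 CPUs corresponds to a speedup of at least

    $$S = E \cdot p > 0.60 \times 40 = 24.$$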

  15. Shared memory parallelism for 3D cartesian discrete ordinates solver

    International Nuclear Information System (INIS)

    Moustafa, S.; Dutka-Malen, I.; Plagne, L.; Poncot, A.; Ramet, P.

    2013-01-01

    This paper describes the design and the performance of DOMINO, a 3D Cartesian SN solver that implements two nested levels of parallelism (multi-core + SIMD, Single Instruction on Multiple Data) on shared memory computation nodes. DOMINO is written in C++, a multi-paradigm programming language that enables the use of powerful and generic parallel programming tools such as Intel TBB and Eigen. These two libraries allow us to combine multi-thread parallelism with vector operations in an efficient and yet portable way. As a result, DOMINO can exploit the full power of modern multi-core processors and is able to tackle very large simulations, that usually require large HPC clusters, using a single computing node. For example, DOMINO solves a 3D full core PWR eigenvalue problem involving 26 energy groups, 288 angular directions (S16), 46×10^6 spatial cells and 1×10^12 DoFs within 11 hours on a single 32-core SMP node. This represents a sustained performance of 235 GFlops and 40.74% of the SMP node peak performance for the DOMINO sweep implementation. The very high Flops/Watt ratio of DOMINO makes it a very interesting building block for a future many-nodes nuclear simulation tool. (authors)
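
    As a quick consistency check (assuming the two figures refer to the same run), a sustained 235 GFlops at 40.74% of peak implies a node peak of roughly

    $$P_{\mathrm{peak}} \approx \frac{235\ \text{GFlops}}{0.4074} \approx 577\ \text{GFlops}.$$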

  16. Parallel Execution of Multi Set Constraint Rewrite Rules

    DEFF Research Database (Denmark)

    Sulzmann, Martin; Lam, Edmund Soon Lee

    2008-01-01

    that the underlying constraint rewrite implementation executes rewrite steps in parallel on increasingly popular multi-core architectures. We design and implement efficient algorithms which allow for the parallel execution of multi-set constraint rewrite rules. Our experiments show that we obtain some......Multi-set constraint rewriting allows for a highly parallel computational model and has been used in a multitude of application domains such as constraint solving, agent specification etc. Rewriting steps can be applied simultaneously as long as they do not interfere with each other. We wish

  17. Scalable Parallel Distributed Coprocessor System for Graph Searching Problems with Massive Data

    Directory of Open Access Journals (Sweden)

    Wanrong Huang

    2017-01-01

    Full Text Available Internet applications, such as network searching, electronic commerce, and modern medical applications, produce and process massive data. Considerable data parallelism exists in the computation processes of data-intensive applications. A traversal algorithm, breadth-first search (BFS), is fundamental in many graph processing applications and metrics when a graph grows in scale. A variety of scientific programming methods have been proposed for accelerating and parallelizing BFS because of the poor temporal and spatial locality caused by inherent irregular memory access patterns. However, new parallel hardware could provide better improvement for scientific methods. To address small-world graph problems, we propose a scalable and novel field-programmable gate array-based heterogeneous multicore system for scientific programming. The core is multithreaded for streaming processing, and the InfiniBand communication network is adopted for scalability. We design a binary search algorithm to address mapping to unify all processor addresses. Within the limits permitted by the Graph500 test bench after 1D parallel hybrid BFS algorithm testing, our 8-core and 8-thread-per-core system achieved superior performance and efficiency compared with the prior work under the same degree of parallelism. Our system is efficient not as a special acceleration unit but as a processor platform that deals with graph searching applications.
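
    For readers unfamiliar with the kernel being accelerated, a level-synchronous BFS is only a few lines in serial form; the parallel 1D hybrid variant partitions the vertex set across cores and exchanges frontiers each level. The sketch below is a plain serial reference in C++ (the toy graph is an assumption), not the FPGA implementation described in the paper.

    ```cpp
    // Level-synchronous BFS on an adjacency-list graph; serial reference only.
    #include <vector>
    #include <queue>
    #include <cstdio>

    std::vector<int> bfs_levels(const std::vector<std::vector<int>>& adj, int source) {
        std::vector<int> level(adj.size(), -1);      // -1 marks unvisited vertices
        std::queue<int> frontier;
        level[source] = 0;
        frontier.push(source);
        while (!frontier.empty()) {                  // expand one frontier vertex at a time
            int v = frontier.front(); frontier.pop();
            for (int w : adj[v])
                if (level[w] < 0) {                  // first visit fixes the BFS level
                    level[w] = level[v] + 1;
                    frontier.push(w);
                }
        }
        return level;
    }

    int main() {
        // toy undirected graph: edges 0-1, 0-2, 1-3, 2-3, 3-4
        std::vector<std::vector<int>> adj = {{1, 2}, {0, 3}, {0, 3}, {1, 2, 4}, {3}};
        auto level = bfs_levels(adj, 0);
        for (std::size_t v = 0; v < level.size(); ++v)
            std::printf("vertex %zu at level %d\n", v, level[v]);
    }
    ```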

  18. Parallel Task Processing on a Multicore Platform in a PC-based Control System for Parallel Kinematics

    Directory of Open Access Journals (Sweden)

    Harald Michalik

    2009-02-01

    Full Text Available Multicore platforms have one physical processor chip with multiple cores interconnected via a chip-level bus. Because they deliver greater computing power through concurrency and offer greater system density, multicore platforms are well suited to address the performance bottleneck encountered in PC-based control systems for parallel kinematic robots with heavy CPU load. Heavy-load control tasks are generated by new control approaches that include features like singularity prediction, structure control algorithms, vision data integration and similar tasks. In this paper we introduce the parallel task scheduling extension of a communication architecture specially tailored for the development of PC-based control of parallel kinematics. The scheduling is specially designed for processing on a multicore platform. It breaks down the serial task processing of the robot control cycle and extends it with parallel task processing paths in order to enhance the overall control performance.

  19. Ice core records of climate variability on the Third Pole with emphasis on the Guliya ice cap, western Kunlun Mountains

    Science.gov (United States)

    Thompson, Lonnie G.; Yao, Tandong; Davis, Mary E.; Mosley-Thompson, Ellen; Wu, Guangjian; Porter, Stacy E.; Xu, Baiqing; Lin, Ping-Nan; Wang, Ninglian; Beaudon, Emilie; Duan, Keqin; Sierra-Hernández, M. Roxana; Kenny, Donald V.

    2018-05-01

    Records of recent climate from ice cores drilled in 2015 on the Guliya ice cap in the western Kunlun Mountains of the Tibetan Plateau, which with the Himalaya comprises the Third Pole (TP), demonstrate that this region has become warmer and moister since at least the middle of the 19th century. Decadal-scale linkages are suggested between ice core temperature and snowfall proxies, North Atlantic oceanic and atmospheric processes, Arctic temperatures, and Indian summer monsoon intensity. Correlations between annual-scale oxygen isotopic ratios (δ18O) and tropical western Pacific and Indian Ocean sea surface temperatures are also demonstrated. Comparisons of climate records during the last millennium from ice cores acquired throughout the TP illustrate centennial-scale differences between monsoon and westerlies dominated regions. Among these records, Guliya shows the highest rate of warming since the end of the Little Ice Age, but δ18O data over the last millennium from TP ice cores support findings that elevation-dependent warming is most pronounced in the Himalaya. This, along with the decreasing precipitation rates in the Himalaya region, is having detrimental effects on the cryosphere. Although satellite monitoring of glaciers on the TP indicates changes in surface area, only a few have been directly monitored for mass balance and ablation from the surface. This type of ground-based study is essential to obtain a better understanding of the rate of ice shrinkage on the TP.

  20. Current distribution characteristics of superconducting parallel circuits

    International Nuclear Information System (INIS)

    Mori, K.; Suzuki, Y.; Hara, N.; Kitamura, M.; Tominaka, T.

    1994-01-01

    In order to increase the current carrying capacity of the current path of a superconducting magnet system, parallel circuits such as insulated multi-strand cables or parallel persistent current switches (PCS) are used. In superconducting parallel circuits of an insulated multi-strand cable or a parallel persistent current switch (PCS), the current distribution during the current sweep, the persistent mode, and the quench process was investigated. In order to measure the current distribution, two methods were used. (1) Each strand was surrounded with a pure iron core with an air gap; a Hall probe was located in the air gap. The accuracy of this method was degraded by the magnetic hysteresis of the iron. (2) A Rogowski coil without iron was used for the current measurement of each path in a 4-parallel PCS. As a result, it was shown that the current distribution characteristics of a parallel PCS are very similar to those of an insulated multi-strand cable for the quench process

  1. Resolving climate change in the period 15-23 ka in Greenland ice cores: A new application of spectral trend analysis

    NARCIS (Netherlands)

    de Jong, M.G.G.; Nio, D.S.; Böhm, A.R.; Seijmonsbergen, H.C.; de Graaff, L.W.S.

    2009-01-01

    Northern Hemisphere climate history through and following the Last Glacial Maximum is recorded in detail in ice cores from Greenland. However, the period between Greenland Interstadials 1 and 2 (15-23 ka), i.e. the period of deglaciation following the last major glaciation, has been difficult to

  2. WASCAL - West African Science Service Center on Climate Change and Adapted Land Use Regional Climate Simulations and Land-Atmosphere Simulations for West Africa at DKRZ and elsewhere

    Science.gov (United States)

    Hamann, Ilse; Arnault, Joel; Bliefernicht, Jan; Klein, Cornelia; Heinzeller, Dominikus; Kunstmann, Harald

    2014-05-01

    accompanied by the WASCAL Graduate Research Program on the West African Climate System. The GRP-WACS provides ten scholarships per year for West African PhD students with a duration of three years. Present and future WASCAL PhD students will constitute one important user group of the Linux cluster that will be installed at the Competence Center in Ouagadougou, Burkina Faso. Regional Land-Atmosphere Simulations A key research activity of the WASCAL Core Research Program is the analysis of interactions between the land surface and the atmosphere to investigate how land surface changes affect hydro-meteorological surface fluxes such as evapotranspiration. Since current land surface models of global and regional climate models neglect dominant lateral hydrological processes such as surface runoff, a novel land surface model is used, the NCAR Distributed Hydrological Modeling System (NDHMS). This model can be coupled to WRF (WRF-Hydro) to perform two-way coupled atmospheric-hydrological simulations for the watershed of interest. Hardware and network prerequisites include an HPC cluster, network switches, internal storage media, and Internet connectivity of sufficient bandwidth. Competences needed are HPC, storage, and visualization systems optimized for climate research, parallelization and optimization of climate models and workflows, and efficient management of very large data volumes.

  3. Stable isotope analysis in ice core paleoclimatology

    International Nuclear Information System (INIS)

    Bertler, N.

    2004-01-01

    Ice cores are the most direct, continuous, and high-resolution archive for Late Quaternary paleoclimate reconstruction. Ice cores from New Zealand and the Antarctic margin provide an excellent means of addressing the lack of longer-term climate observations in the Southern Hemisphere with near instrumental quality. Their study helps us to improve our understanding of regional patterns of climate behaviour in Antarctica and its influence on New Zealand, leading to more realistic regional climate models. Such models are needed to sensibly interpret current Antarctic and New Zealand climate variability and for the development of appropriate mitigation strategies for New Zealand. (author). 23 refs., 15 figs., 1 tab

  4. Implementation of parallel processing in the basf2 framework for Belle II

    International Nuclear Information System (INIS)

    Itoh, Ryosuke; Lee, Soohyung; Katayama, N; Mineo, S; Moll, A; Kuhr, T; Heck, M

    2012-01-01

    Recent PC servers are equipped with multi-core CPUs, and it is desirable to utilize their full processing power for data analysis in large-scale HEP experiments. A software framework basf2 is being developed for use in the Belle II experiment, a new-generation B-factory experiment at KEK, and parallel event processing to utilize multi-core CPUs is part of its design for use in massive data production. The details of the implementation of parallel event processing in the basf2 framework are discussed, with a report of a preliminary performance study under realistic use on a 32-core PC server.

  5. Incorporating Parallel Computing into the Goddard Earth Observing System Data Assimilation System (GEOS DAS)

    Science.gov (United States)

    Larson, Jay W.

    1998-01-01

    Atmospheric data assimilation is a method of combining actual observations with model forecasts to produce a more accurate description of the earth system than the observations or forecast alone can provide. The output of data assimilation, sometimes called the analysis, consists of regular, gridded datasets of observed and unobserved variables. Analysis plays a key role in numerical weather prediction and is becoming increasingly important for climate research. These applications, and the need for timely validation of scientific enhancements to the data assimilation system, pose computational demands that are best met by distributed parallel software. The mission of the NASA Data Assimilation Office (DAO) is to provide datasets for climate research and to support NASA satellite and aircraft missions. The system used to create these datasets is the Goddard Earth Observing System Data Assimilation System (GEOS DAS). The core components of the GEOS DAS are: the GEOS General Circulation Model (GCM), the Physical-space Statistical Analysis System (PSAS), the Observer, the on-line Quality Control (QC) system, the Coupler (which feeds analysis increments back to the GCM), and an I/O package for processing the large amounts of data the system produces (which will be described in another presentation in this session). The discussion will center on the following issues: the computational complexity for the whole GEOS DAS, assessment of the performance of the individual elements of GEOS DAS, and parallelization strategy for some of the components of the system.

  6. Automatic Parallelization Tool: Classification of Program Code for Parallel Computing

    Directory of Open Access Journals (Sweden)

    Mustafa Basthikodi

    2016-04-01

    Full Text Available Performance growth of single-core processors has come to a halt in the past decade, but was re-enabled by the introduction of parallelism in processors. Multicore frameworks along with graphical processing units have broadly enhanced parallelism, and several compilers have been updated to address the resulting challenges of synchronization and threading. Appropriate program and algorithm classification will be of great advantage to software engineers seeking opportunities for effective parallelization. In the present work we investigate current species for the classification of algorithms; related work on classification is discussed, along with a comparison of the issues that challenge classification. A set of algorithms is chosen whose structure matches different issues and which perform a given task. We have tested these algorithms utilizing existing automatic species extraction tools along with the Bones compiler. We have added functionalities to the existing tool, providing a more detailed characterization. The contributions of our work include support for pointer arithmetic, conditional and incremental statements, user-defined types, constants and mathematical functions. With this, we can retain significant information which is not captured by the original species of algorithms. We implemented these extensions in the tool, enabling automatic characterization of program code.

  7. On efficiency of fire simulation realization: parallelization with greater number of computational meshes

    Science.gov (United States)

    Valasek, Lukas; Glasa, Jan

    2017-12-01

    Current fire simulation systems are capable of utilizing the advantages of available high-performance computing (HPC) platforms and of modelling fires efficiently in parallel. In this paper, the efficiency of a corridor fire simulation on an HPC computer cluster is discussed. The parallel MPI version of the Fire Dynamics Simulator is used to test the efficiency of selected strategies for allocating the cluster's computational resources when using a greater number of computational cores. Simulation results indicate that if the number of cores used is not equal to a multiple of the total number of cluster node cores, there are allocation strategies which provide more efficient calculations.

  8. The new landscape of parallel computer architecture

    International Nuclear Information System (INIS)

    Shalf, John

    2007-01-01

    The past few years have seen a sea change in computer architecture that will impact every facet of our society as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models

  9. The new landscape of parallel computer architecture

    Energy Technology Data Exchange (ETDEWEB)

    Shalf, John [NERSC Division, Lawrence Berkeley National Laboratory 1 Cyclotron Road, Berkeley California, 94720 (United States)

    2007-07-15

    The past few years have seen a sea change in computer architecture that will impact every facet of our society as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models.

  10. Performing an allreduce operation on a plurality of compute nodes of a parallel computer

    Science.gov (United States)

    Faraj, Ahmad [Rochester, MN

    2012-04-17

    Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node.
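
    The logical-ring idea can be illustrated with a deliberately naive scalar allreduce: each participant forwards, around the ring, the value it received in the previous step while accumulating everything that passes by. The MPI sketch below only illustrates the ring structure (production implementations use reduce-scatter plus allgather on large buffers) and is not the patented method's code.

    ```cpp
    // Naive logical-ring allreduce of one scalar contribution per rank.
    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double contribution = rank + 1.0;   // each "processing core" contributes its data
        double total = contribution;        // running allreduce result
        double in_flight = contribution;    // value currently being forwarded around the ring

        int next = (rank + 1) % size;
        int prev = (rank - 1 + size) % size;

        for (int step = 0; step < size - 1; ++step) {
            double received;
            // forward the value received last step, receive a new one from the previous rank
            MPI_Sendrecv(&in_flight, 1, MPI_DOUBLE, next, 0,
                         &received, 1, MPI_DOUBLE, prev, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            total += received;
            in_flight = received;
        }
        std::printf("rank %d: allreduce sum = %.1f\n", rank, total);
        MPI_Finalize();
    }
    ```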

  11. Stable isotope analysis in ice core paleoclimatology

    International Nuclear Information System (INIS)

    Bertler, N.

    2009-01-01

    Ice cores from New Zealand and the Antarctic margin provide an excellent means of addressing the lack of longer-term climate observations in the Southern Hemisphere with near instrumental quality. Their study helps us to improve our understanding of regional patterns of climate behaviour in Antarctica and its influence on New Zealand, leading to more realistic regional climate models. Such models are needed to sensibly interpret current Antarctic and New Zealand climate variability and for the development of appropriate mitigation strategies for New Zealand. Ice core records provide an annual-scale, 'instrumental-quality' baseline of atmospheric temperature and circulation changes back many thousands of years. (author). 45 refs., 16 figs., 2 tabs.

  12. Stable isotope analysis in ice core paleoclimatology

    International Nuclear Information System (INIS)

    Bertler, N.

    2009-01-01

    Ice cores from New Zealand and the Antarctic margin provide an excellent means of addressing the lack of longer-term climate observations in the Southern Hemisphere with near instrumental quality. Their study helps us to improve our understanding of regional patterns of climate behaviour in Antarctica and its influence on New Zealand, leading to more realistic regional climate models. Such models are needed to sensibly interpret current Antarctic and New Zealand climate variability and for the development of appropriate mitigation strategies for New Zealand. Ice core records provide an annual-scale, 'instrumental-quality' baseline of atmospheric temperature and circulation changes back many thousands of years. (author). 27 refs., 18 figs., 2 tabs

  13. Stable isotope analysis in ice core paleoclimatology

    International Nuclear Information System (INIS)

    Bertler, N.A.N.

    2012-01-01

    Ice cores from New Zealand and the Antarctic margin provide an excellent means of addressing the lack of longer-term climate observations in the Southern Hemisphere with near instrumental quality. Their study helps us to improve our understanding of regional patterns of climate behaviour in Antarctica and its influence on New Zealand, leading to more realistic regional climate models. Such models are needed to sensibly interpret current Antarctic and New Zealand climate variability and for the development of appropriate mitigation strategies for New Zealand. Ice core records provide an annual-scale, 'instrumental-quality' baseline of atmospheric temperature and circulation changes back many thousands of years. (author). 28 refs., 20 figs., 1 tab.

  14. Stable isotope analysis in ice core paleoclimatology

    International Nuclear Information System (INIS)

    Bertler, N.

    2008-01-01

    Ice cores from New Zealand and the Antarctic margin provide an excellent means of addressing the lack of longer-term climate observations in the Southern Hemisphere with near instrumental quality. Their study helps us to improve our understanding of regional patterns of climate behaviour in Antarctica and its influence on New Zealand, leading to more realistic regional climate models. Such models are needed to sensibly interpret current Antarctic and New Zealand climate variability and for the development of appropriate mitigation strategies for New Zealand. Ice core records provide an annual-scale, 'instrumental-quality' baseline of atmospheric temperature and circulation changes back many thousands of years. (author). 27 refs., 18 figs., 2 tabs

  15. Development of whole core thermal-hydraulic analysis program ACT. 3. Coupling core module with primary heat transport system module

    International Nuclear Information System (INIS)

    Ohtaka, Masahiko; Ohshima, Hiroyuki

    1998-10-01

    A whole core thermal-hydraulic analysis program ACT is being developed for the purpose of evaluating detailed in-core thermal hydraulic phenomena of fast reactors, including inter-wrapper flow, under various reactor operation conditions. In this work, the core module that forms the main part of ACT, developed last year, which simulates thermal-hydraulics in the subassemblies and the inter-subassembly gaps, was coupled with a one-dimensional plant system thermal-hydraulic analysis code, LEDHER, to simulate transients in the primary heat transport system and to give appropriate boundary conditions to the core model. An effective algorithm to couple these two calculation modules was developed, requiring minimal modification of them. In order to couple the two calculation modules on the computing system, a parallel computing technique using the PVM (Parallel Virtual Machine) programming environment was applied. The code system was applied to analyze an out-of-pile sodium experiment simulating a core with 7 subassemblies under transient conditions for code verification. It was confirmed that the analytical results show a tendency similar to the experimental results. (author)

  16. A task parallel implementation of fast multipole methods

    KAUST Repository

    Taura, Kenjiro

    2012-11-01

    This paper describes a task parallel implementation of ExaFMM, an open source implementation of fast multipole methods (FMM), using a lightweight task parallel library MassiveThreads. Although there have been many attempts on parallelizing FMM, experiences have almost exclusively been limited to formulations based on flat homogeneous parallel loops. FMM in fact contains operations that cannot be readily expressed in such conventional but restrictive models. We show that task parallelism, or parallel recursions in particular, allows us to parallelize all operations of FMM naturally and scalably. Moreover it allows us to parallelize a "mutual interaction" for force/potential evaluation, which is roughly twice as efficient as a more conventional, unidirectional force/potential evaluation. The net result is an open source FMM that is clearly among the fastest single node implementations, including those on GPUs; with a million particles on a 32-core Sandy Bridge 2.20 GHz node, it completes a single time step including tree construction and force/potential evaluation in 65 milliseconds. The study clearly showcases both programmability and performance benefits of flexible parallel constructs over more monolithic parallel loops. © 2012 IEEE.
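
    The essence of the parallel-recursion style (as opposed to flat parallel loops) can be sketched with standard C++ tasks; this is not ExaFMM or MassiveThreads code, and the cutoffs and the reduction it computes are assumptions. Each recursive call spawns a task for one subtree and descends the other itself, which is the same shape as a parallel FMM tree traversal.

    ```cpp
    // Parallel recursion over a binary partition of the data, using std::async tasks.
    #include <future>
    #include <numeric>
    #include <vector>
    #include <cstdio>

    // Recursively sum a range, spawning a task for one half while descending the other.
    double tree_sum(const std::vector<double>& x, std::size_t lo, std::size_t hi, int depth) {
        if (hi - lo < 1024 || depth <= 0)                        // leaf: do the work sequentially
            return std::accumulate(x.begin() + lo, x.begin() + hi, 0.0);
        std::size_t mid = lo + (hi - lo) / 2;
        auto left = std::async(std::launch::async, tree_sum, std::cref(x), lo, mid, depth - 1);
        double right = tree_sum(x, mid, hi, depth - 1);          // recurse in the current task
        return left.get() + right;
    }

    int main() {
        std::vector<double> x(1 << 22, 0.5);
        std::printf("sum = %.1f\n", tree_sum(x, 0, x.size(), 4));  // depth 4 -> up to 16 leaf tasks
    }
    ```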

  17. Parallelization of a Monte Carlo particle transport simulation code

    Science.gov (United States)

    Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

    2010-05-01

    We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language for improving code portability. Several pseudo-random number generators have also been integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem sizes, which are limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for the study of higher particle energies with the use of more accurate physical models, and improve statistics as more particle tracks can be simulated in a low response time.

  18. Out-of-order parallel discrete event simulation for electronic system-level design

    CERN Document Server

    Chen, Weiwei

    2014-01-01

    This book offers readers a set of new approaches, tools and techniques for facing the challenges of parallelization in the design of embedded systems. It provides an advanced parallel simulation infrastructure for efficient and effective system-level model validation and development so as to build better products in less time. Since parallel discrete event simulation (PDES) has the potential to exploit the underlying parallel computational capability in today's multi-core simulation hosts, the author begins by reviewing the parallelization of discrete event simulation, identifyin

  19. Parallel channel effects under BWR LOCA conditions

    International Nuclear Information System (INIS)

    Suzuki, H.; Hatamiya, S.; Murase, M.

    1988-01-01

    Due to parallel channel effects, different flow patterns such as liquid down-flow and gas up-flow appear simultaneously in fuel bundles of a BWR core during postulated LOCAs. Applying the parallel channel effects to the fuel bundle, water drain tubes with a restricted bottom end have been developed in order to mitigate counter-current flow limiting and to increase the falling water flow rate at the upper tie plate. The upper tie plate with water drain tubes is an especially effective means of increasing the safety margin of a reactor with narrow gaps between fuel rods and high steam velocity at the upper tie plate. The characteristics of the water drain tubes have been experimentally investigated using a small-scaled steam-water system simulating a BWR core. Then, their effect on the fuel cladding temperature was evaluated using the LOCA analysis program SAFER. (orig.)

  20. 10Be and δ2H in polar ice cores as a probe of the solar variability's influence on climate

    International Nuclear Information System (INIS)

    Raisbeck, G.M.; Yiou, F.; Jouzel, J.; Domaine Univ., 38 - St-Martin-d'Heres; Petit, J.R.

    1990-01-01

    By using the technique of accelerator mass spectrometry, it is now possible to measure detailed profiles of cosmogenic (cosmic ray produced) 10Be in polar ice cores. Recent work has demonstrated that these profiles contain information on solar activity, via its influence on the intensity of galactic cosmic rays arriving in the Earth's atmosphere. It has been known for some time that, as a result of temperature-dependent fractionation effects, the stable isotope profiles δ18O and δ2H in polar ice cores contain palaeoclimate information. Thus by comparing the 10Be and stable isotope profiles in the same ice core, one can test the influence of solar variability on climate, independently of possible uncertainties in the absolute chronology of the records. We present here the results of such a comparison for two Antarctic ice cores; one from the South Pole, covering the past ca. 1000 years, and one from Dome C, covering the past ca. 3000 years. (author)

  1. Parallel computation

    International Nuclear Information System (INIS)

    Jejcic, A.; Maillard, J.; Maurel, G.; Silva, J.; Wolff-Bacha, F.

    1997-01-01

    The work in the field of parallel processing has developed as research activities using several numerical Monte Carlo simulations related to basic or applied current problems of nuclear and particle physics. For applications utilizing the GEANT code, development and improvement work was done on the parts simulating low-energy physical phenomena like radiation, transport and interaction. The problem of actinide burning by means of accelerators was approached using a simulation with the GEANT code. A program for neutron tracking in the range of low energies up to the thermal region has been developed. It is coupled to the GEANT code and permits, in a single pass, the simulation of a hybrid reactor core receiving a proton burst. Other works in this field refer to simulations for nuclear medicine applications such as, for instance, the development of biological probes, the evaluation and characterization of gamma cameras (collimators, crystal thickness), as well as methods for dosimetric calculations. In particular, these calculations are suited for a geometrical parallelization approach especially adapted to parallel machines of the TN310 type. Other works mentioned in the same field refer to the simulation of electron channelling in crystals and the simulation of the beam-beam interaction effect in colliders. The GEANT code was also used to simulate the operation of germanium detectors designed for natural and artificial radioactivity monitoring of the environment

  2. A parallel solver for huge dense linear systems

    Science.gov (United States)

    Badia, J. M.; Movilla, J. L.; Climente, J. I.; Castillo, M.; Marqués, M.; Mayo, R.; Quintana-Ortí, E. S.; Planelles, J.

    2011-11-01

    HDSS (Huge Dense Linear System Solver) is a Fortran Application Programming Interface (API) to facilitate the parallel solution of very large dense systems to scientists and engineers. The API makes use of parallelism to yield an efficient solution of the systems on a wide range of parallel platforms, from clusters of processors to massively parallel multiprocessors. It exploits out-of-core strategies to leverage the secondary memory in order to solve huge linear systems (of the order of 100 000 equations). The API is based on the parallel linear algebra library PLAPACK, and on its Out-Of-Core (OOC) extension POOCLAPACK. Both PLAPACK and POOCLAPACK use the Message Passing Interface (MPI) as the communication layer and BLAS to perform the local matrix operations. The API provides a friendly interface to the users, hiding almost all the technical aspects related to the parallel execution of the code and the use of the secondary memory to solve the systems. In particular, the API can automatically select the best way to store and solve the systems, depending on the dimension of the system, the number of processes and the main memory of the platform. Experimental results on several parallel platforms report high performance, reaching more than 1 TFLOP with 64 cores to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors. New version program summary Program title: Huge Dense System Solver (HDSS) Catalogue identifier: AEHU_v1_1 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEHU_v1_1.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 87 062 No. of bytes in distributed program, including test data, etc.: 1 069 110 Distribution format: tar.gz Programming language: Fortran90, C Computer: Parallel architectures: multiprocessors, computer clusters Operating system

  3. Modelling the regional climate and isotopic composition of Svalbard precipitation using REMOiso: a comparison with available GNIP and ice core data

    NARCIS (Netherlands)

    Divine, D.V.; Sjolte, J.; Isaksson, E.; Meijer, H.A.J.; van de Wal, R.S.W.; Martma, T.; Pohjola, V.; Sturm, C.; Godtliebsen, F.

    2011-01-01

    Simulations of a regional (approx. 50 km resolution) circulation model REMOiso with embedded stable water isotope module covering the period 1958-2001 are compared with the two instrumental climate and four isotope series (δ18O) from western Svalbard. We examine the data from ice cores drilled on

  4. Modelling the regional climate and isotopic composition of Svalbard precipitation using REMOiso : a comparison with available GNIP and ice core data

    NARCIS (Netherlands)

    Divine, D. V.; Sjolte, J.; Isaksson, E.; Meijer, H. A. J.; van de Wal, R. S. W.; Martma, T.; Pohjola, V.; Sturm, C.; Godtliebsen, F.

    2011-01-01

    Simulations of a regional (approx. 50 km resolution) circulation model REMOiso with embedded stable water isotope module covering the period 1958-2001 are compared with the two instrumental climate and four isotope series (δ18O) from western Svalbard. We examine the data from ice cores drilled on

  5. Insights Into Deglacial Through Holocene Climate Variability At The Peru-Chile Margin From Very High Sedimentation Rate Marine Cores

    Science.gov (United States)

    Chazen, C.; Dejong, H.; Altabet, M.; Herbert, T.

    2007-12-01

    The Peru-Chile upwelling system is situated at the epicenter of the modern ENSO System. The high settling flux of organic materials and poor ventilation of subsurface waters make the Peru upwelling system one of the world's three major oxygen minimum/denitrification zones (Codispoti and Christensen, 1985). Extremely high sedimentation rates and permanently hypoxic/anoxic subsurface waters create excellent conditions for the preservation of organic matter. Despite the significance of this region for paleoceanography and paleoclimatology, relatively little work has been done to characterize past Peruvian climate because carbonate dissolution hinders the use of conventional paleoclimate methods and hiatuses frequently interrupt the record. However, using nitrogen isotopes and alkenone paleothermometry on multiple sediment cores from the margin we have managed to overcome many of these challenges to create a nearly continuous SST (UK'37), productivity (C37 total), biogenic opal and denitrification (δ15N) record from the LGM through the late Holocene. Remarkably, recent work has revealed an annually laminated core, which spans 1.4-8.0 ka uninterrupted, providing a unique window into Holocene climate variability. Modern-day upwelling-induced climate at the Peru-Chile margin is characterized by cold temperatures (21.5°C), high productivity and strong denitrification, which has persisted since the mid Holocene (4 ka). The mid Holocene also marks the beginning of a dramatic increase in seasonality and ENSO variability consistent with other tropical climate indicators. Climate variability in the mid-early Holocene shows a distinctively different pattern from that of the late Holocene; unproductive warm temperatures persist through the early Holocene in what can be described as a permanent El Niño-like state. Early tropical warming occurred near 17 ka along with an unprecedented increase in denitrification, which is decoupled from local productivity. Early onset

  6. NMR-MPar: A Fault-Tolerance Approach for Multi-Core and Many-Core Processors

    Directory of Open Access Journals (Sweden)

    Vanessa Vargas

    2018-03-01

    Full Text Available Multi-core and many-core processors are a promising solution to achieve high performance while maintaining lower power consumption. However, the degree of miniaturization makes them more sensitive to soft errors. To improve system reliability, this work proposes a fault-tolerance approach based on redundancy and partitioning principles called N-Modular Redundancy and M-Partitions (NMR-MPar). By combining both principles, this approach allows multi-/many-core processors to perform critical functions in mixed-criticality systems. Benefiting from the capabilities of these devices, NMR-MPar creates different partitions that perform independent functions. For critical functions, it is proposed that N partitions with the same configuration participate in an N-modular redundancy system. In order to validate the approach, a case study is implemented on the KALRAY Multi-Purpose Processing Array (MPPA-256) many-core processor running two parallel benchmark applications. The traveling salesman problem and matrix multiplication applications were selected to test different resources of the device. The effectiveness of NMR-MPar is assessed by software-implemented fault injection. For evaluation purposes, it is considered that the system is intended to be used in avionics. Results show the improvement of the application reliability by two orders of magnitude when implementing NMR-MPar on the system. Finally, this work opens the possibility to use massive parallelism for dependable applications in embedded systems.

  7. Finding Tropical Cyclones on a Cloud Computing Cluster: Using Parallel Virtualization for Large-Scale Climate Simulation Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Hasenkamp, Daren; Sim, Alexander; Wehner, Michael; Wu, Kesheng

    2010-09-30

    Extensive computing power has been used to tackle issues such as climate change, fusion energy, and other pressing scientific challenges. These computations produce a tremendous amount of data; however, many of the data analysis programs currently run on only a single processor. In this work, we explore the possibility of using the emerging cloud computing platform to parallelize such sequential data analysis tasks. As a proof of concept, we wrap a program for analyzing trends of tropical cyclones in a set of virtual machines (VMs). This approach allows the user to keep their familiar data analysis environment in the VMs, while we provide the coordination and data transfer services to ensure the necessary input and output are directed to the desired locations. This work extensively exercises the networking capability of the cloud computing systems and has revealed a number of weaknesses in the current cloud system software. In our tests, we are able to scale the parallel data analysis job to a modest number of VMs and achieve a speedup that is comparable to running the same analysis task using MPI. However, compared to MPI based parallelization, the cloud-based approach has a number of advantages. The cloud-based approach is more flexible because the VMs can capture arbitrary software dependencies without requiring the user to rewrite their programs. The cloud-based approach is also more resilient to failure; as long as a single VM is running, it can make progress, whereas as soon as one MPI node fails the whole analysis job fails. In short, this initial work demonstrates that a cloud computing system is a viable platform for distributed scientific data analyses traditionally conducted on dedicated supercomputing systems.

  8. Finding Tropical Cyclones on a Cloud Computing Cluster: Using Parallel Virtualization for Large-Scale Climate Simulation Analysis

    International Nuclear Information System (INIS)

    Hasenkamp, Daren; Sim, Alexander; Wehner, Michael; Wu, Kesheng

    2010-01-01

    Extensive computing power has been used to tackle issues such as climate change, fusion energy, and other pressing scientific challenges. These computations produce a tremendous amount of data; however, many of the data analysis programs currently run on only a single processor. In this work, we explore the possibility of using the emerging cloud computing platform to parallelize such sequential data analysis tasks. As a proof of concept, we wrap a program for analyzing trends of tropical cyclones in a set of virtual machines (VMs). This approach allows the user to keep their familiar data analysis environment in the VMs, while we provide the coordination and data transfer services to ensure the necessary input and output are directed to the desired locations. This work extensively exercises the networking capability of the cloud computing systems and has revealed a number of weaknesses in the current cloud system software. In our tests, we are able to scale the parallel data analysis job to a modest number of VMs and achieve a speedup that is comparable to running the same analysis task using MPI. However, compared to MPI based parallelization, the cloud-based approach has a number of advantages. The cloud-based approach is more flexible because the VMs can capture arbitrary software dependencies without requiring the user to rewrite their programs. The cloud-based approach is also more resilient to failure; as long as a single VM is running, it can make progress, whereas as soon as one MPI node fails the whole analysis job fails. In short, this initial work demonstrates that a cloud computing system is a viable platform for distributed scientific data analyses traditionally conducted on dedicated supercomputing systems.

  9. 7th International Workshop on Parallel Tools for High Performance Computing

    CERN Document Server

    Gracia, José; Nagel, Wolfgang; Resch, Michael

    2014-01-01

    Current advances in High Performance Computing (HPC) increasingly impact efficient software development workflows. Programmers for HPC applications need to consider trends such as increased core counts, multiple levels of parallelism, reduced memory per core, and I/O system challenges in order to derive well performing and highly scalable codes. At the same time, the increasing complexity adds further sources of program defects. While novel programming paradigms and advanced system libraries provide solutions for some of these challenges, appropriate supporting tools are indispensable. Such tools aid application developers in debugging, performance analysis, or code optimization and therefore make a major contribution to the development of robust and efficient parallel software. This book introduces a selection of the tools presented and discussed at the 7th International Parallel Tools Workshop, held in Dresden, Germany, September 3-4, 2013.  

  10. Hominin Sites and Paleolakes Drilling Project: A 500,000-year climate record from Chew Bahir, a key site in southern Ethiopia

    Science.gov (United States)

    Foerster, Verena E.; Asrat, Asfawossen; Chapot, Melissa S.; Cohen, Andrew S.; Dean, Jonathan R.; Deino, Alan; Günter, Christina; Junginger, Annett; Lamb, Henry F.; Leng, Melanie J.; Roberts, Helen M.; Schaebitz, Frank; Trauth, Martin H.

    2017-04-01

    What is the environmental context of human evolution and dispersal? In order to evaluate the impact that climatic shifts of different timescales and magnitudes have had on the living conditions of anatomically modern humans, the Hominin Sites and Paleolakes Drilling Project (HSPDP) has cored five predominantly-lacustrine sequences to investigate climate change in East Africa (Cohen et al., 2016). The five high-priority areas in Ethiopia and Kenya are located in close proximity to key paleoanthropological sites covering various steps in evolution. One of the five cores is from Chew Bahir. Chew Bahir is a deep tectonically-bound basin in the southern Ethiopian rift, close to the Lower Omo valley, site of the earliest known fossil of anatomically modern humans. As part of the deep drilling initiative between ICDP-HSPDP and the Collaborative Research Center (CRC806), the Chew Bahir sedimentary deposits were cored in late 2014, yielding two parallel cores that reach 280 m depth and cover the last 550 ka of environmental history. We present the initial results of on-going lithologic and stratigraphic investigation of the composite core, the results of high resolution MSCL and XRF scanning data, as well as the first results of detailed multi-proxy analysis of the Chew Bahir cores. These analyses are based on more than 14,000 discrete subsamples. An initial chronology, based on Ar/Ar and OSL dating, allows the first reconstructions of dry-wet cycles during the last 550 ka. Both geochemical and sedimentological results show that the Chew Bahir deposits are sensitive recorders of changes in moisture, sediment influx, provenance, transport and diagenetic processes. The core records will allow tests of the various hypotheses regarding the impact of climate variability (from climate flickers to orbitally driven transitions) on the evolution and dispersal of anatomically modern humans. References: Cohen, A. et al., 2016. The Hominin Sites and Paleolakes Drilling Project

  11. Parallelization of the Physical-Space Statistical Analysis System (PSAS)

    Science.gov (United States)

    Larson, J. W.; Guo, J.; Lyster, P. M.

    1999-01-01

    Atmospheric data assimilation is a method of combining observations with model forecasts to produce a more accurate description of the atmosphere than the observations or forecast alone can provide. Data assimilation plays an increasingly important role in the study of climate and atmospheric chemistry. The NASA Data Assimilation Office (DAO) has developed the Goddard Earth Observing System Data Assimilation System (GEOS DAS) to create assimilated datasets. The core computational components of the GEOS DAS include the GEOS General Circulation Model (GCM) and the Physical-space Statistical Analysis System (PSAS). The need for timely validation of scientific enhancements to the data assimilation system poses computational demands that are best met by distributed parallel software. PSAS is implemented in Fortran 90 using object-based design principles. The analysis portions of the code solve two equations. The first of these is the "innovation" equation, which is solved on the unstructured observation grid using a preconditioned conjugate gradient (CG) method. The "analysis" equation is a transformation from the observation grid back to a structured grid, and is solved by a direct matrix-vector multiplication. Use of a factored-operator formulation reduces the computational complexity of both the CG solver and the matrix-vector multiplication, rendering the matrix-vector multiplications as a successive product of operators on a vector. Sparsity is introduced to these operators by partitioning the observations using an icosahedral decomposition scheme. PSAS builds a large (approx. 128MB) run-time database of parameters used in the calculation of these operators. Implementing a message passing parallel computing paradigm into an existing yet developing computational system as complex as PSAS is nontrivial. One of the technical challenges is balancing the requirements for computational reproducibility with the need for high performance. The problem of computational
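
    As a rough illustration of the factored-operator idea described above (this is not the DAO's PSAS code), the sketch below applies an innovation-like operator as a successive product of operators inside a preconditioned conjugate gradient loop; the operator, problem sizes, and diagonal preconditioner are all illustrative assumptions.

    ```python
    import numpy as np

    # Hypothetical factored innovation operator A = H C H^T + R, applied as a
    # successive product of operators on a vector rather than formed explicitly.
    rng = np.random.default_rng(0)
    n_obs, n_state = 200, 500
    H = rng.standard_normal((n_obs, n_state)) / np.sqrt(n_state)  # observation operator
    c_diag = 1.0 + rng.random(n_state)                            # background error variances
    r_diag = 0.1 * np.ones(n_obs)                                 # observation error variances

    def apply_A(x):
        """Apply A = H C H^T + R to a vector without forming A."""
        return H @ (c_diag * (H.T @ x)) + r_diag * x

    def pcg(apply_A, b, M_inv, tol=1e-8, max_iter=500):
        """Preconditioned conjugate gradient with a matrix-free operator."""
        x = np.zeros_like(b)
        r = b - apply_A(x)
        z = M_inv * r
        p = z.copy()
        rz = r @ z
        for _ in range(max_iter):
            Ap = apply_A(p)
            alpha = rz / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            if np.linalg.norm(r) < tol * np.linalg.norm(b):
                break
            z = M_inv * r
            rz_new = r @ z
            p = z + (rz_new / rz) * p
            rz = rz_new
        return x

    b = rng.standard_normal(n_obs)                              # innovation vector (obs minus forecast)
    M_inv = 1.0 / (r_diag + np.sum(H**2 * c_diag, axis=1))      # diagonal (Jacobi) preconditioner
    w = pcg(apply_A, b, M_inv)
    print("residual norm:", np.linalg.norm(apply_A(w) - b))
    ```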

  12. Massive Asynchronous Parallelization of Sparse Matrix Factorizations

    Energy Technology Data Exchange (ETDEWEB)

    Chow, Edmond [Georgia Inst. of Technology, Atlanta, GA (United States)

    2018-01-08

    Solving sparse problems is at the core of many DOE computational science applications. We focus on the challenge of developing sparse algorithms that can fully exploit the parallelism in extreme-scale computing systems, in particular systems with massive numbers of cores per node. Our approach is to express a sparse matrix factorization as a large number of bilinear constraint equations, and then to solve these equations via an asynchronous iterative method. The unknowns in these equations are the entries of the desired factorization.
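
    The report does not spell out its algorithm here, but the general pattern (write the factorization as bilinear equations in the unknown factor entries and relax them with sweeps) can be sketched as follows, using an ILU(0)-style fixed point in which synchronous Jacobi sweeps stand in for the asynchronous updates; the test matrix and sweep count are illustrative.

    ```python
    import numpy as np

    # Bilinear constraints for an incomplete LU on a fixed sparsity pattern S:
    #   (L U)_ij = a_ij  for (i, j) in S,  with a unit diagonal on L.
    # Each unknown entry has a closed-form update in terms of the others; the
    # whole set is relaxed here with synchronous (Jacobi-style) sweeps.
    n = 50
    A = np.eye(n) * 4.0
    for i in range(n - 1):                        # small diagonally dominant test matrix
        A[i, i + 1] = A[i + 1, i] = -1.0
    S = A != 0                                    # sparsity pattern of A (ILU(0))

    L = np.tril(A, -1) / np.diag(A)               # initial guess, strictly lower part of L
    U = np.triu(A)                                # initial guess for U

    for sweep in range(10):
        L_new, U_new = L.copy(), U.copy()
        for i, j in zip(*np.nonzero(S)):
            s = L[i, :min(i, j)] @ U[:min(i, j), j]   # sum_{k < min(i,j)} l_ik u_kj
            if i > j:                                  # unknown l_ij
                L_new[i, j] = (A[i, j] - s) / U[j, j]
            else:                                      # unknown u_ij (i <= j)
                U_new[i, j] = A[i, j] - s
        L, U = L_new, U_new

    R = (np.eye(n) + L) @ U - A
    print("residual on the sparsity pattern:", np.abs(R[S]).max())
    ```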

  13. Parallel simulation of tsunami inundation on a large-scale supercomputer

    Science.gov (United States)

    Oishi, Y.; Imamura, F.; Sugawara, D.

    2013-12-01

    An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation and the computational power of recent massively parallel supercomputers is helpful to enable faster than real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), and so it is expected that very fast parallel computers will be more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we are targeting very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on TUNAMI-N2 model of Tohoku University, which is based on a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computation load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using CPUs allocated to each layer, 1-D domain decomposition is performed on each layer. In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the
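
    A minimal sketch of the load-balancing rule described above, with made-up layer sizes and CPU count: CPUs are apportioned to the nested layers in proportion to their grid-point counts, and each layer is then split among its CPUs by 1-D domain decomposition.

    ```python
    # Two-level load balancing: distribute CPUs over nested layers in proportion
    # to grid-point counts, then split each layer into contiguous row blocks.
    layers = {"layer1": (405, 343), "layer2": (1200, 900), "layer3": (2400, 1800)}  # (ny, nx), illustrative
    total_cpus = 64

    points = {name: ny * nx for name, (ny, nx) in layers.items()}
    total_points = sum(points.values())

    # At least one CPU per layer; leftover CPUs go to the most loaded layers.
    cpus = {name: max(1, int(total_cpus * p / total_points)) for name, p in points.items()}
    while sum(cpus.values()) < total_cpus:
        busiest = max(layers, key=lambda name: points[name] / cpus[name])
        cpus[busiest] += 1

    def decompose_1d(ny, n_parts):
        """Split ny rows into n_parts contiguous, nearly equal blocks."""
        base, extra = divmod(ny, n_parts)
        bounds, start = [], 0
        for p in range(n_parts):
            stop = start + base + (1 if p < extra else 0)
            bounds.append((start, stop))
            start = stop
        return bounds

    for name, (ny, nx) in layers.items():
        print(name, "cpus =", cpus[name], "first row blocks:", decompose_1d(ny, cpus[name])[:3], "...")
    ```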

  14. Core fluctuations and current profile dynamics in the MST reversed-field pinch

    International Nuclear Information System (INIS)

    Brower, D.L.; Ding, W.X.; Lei, J.

    2003-01-01

    First measurements of the current density profile, magnetic field fluctuations and electrostatic (e.s.) particle flux in the core of a high-temperature reversed-field pinch (RFP) are presented. We report three new results: (1) The current density peaks during the slow ramp phase of the sawtooth cycle and flattens promptly at the crash. Profile flattening can be linked to magnetic relaxation and the dynamo which is predicted to drive anti-parallel current in the core. Measured core magnetic fluctuations are observed to increase four-fold at the crash. Between sawtooth crashes, measurements indicate the particle flux driven by e.s. fluctuations is too small to account for the total radial particle flux. (2) Core magnetic fluctuations are observed to decrease at least twofold in plasmas where energy confinement time improves ten-fold. In this case, the radial particle flux is also reduced, suggesting core e.s. fluctuation-induced transport may play a role in confinement. (3) The parallel current density increases in the outer region of the plasma during high confinement, as expected, due to the applied edge parallel electric field. However, the core current density also increases due to dynamo reduction and the emergence of runaway electrons. (author)

  15. Fast ℓ1-SPIRiT Compressed Sensing Parallel Imaging MRI: Scalable Parallel Implementation and Clinically Feasible Runtime

    Science.gov (United States)

    Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael

    2012-01-01

    We present ℓ1-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the Wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative Self-Consistent Parallel Imaging (SPIRiT). Like many iterative MRI reconstructions, ℓ1-SPIRiT’s image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing ℓ1-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of ℓ1-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT Spoiled Gradient Echo (SPGR) sequence with up to 8× acceleration via poisson-disc undersampling in the two phase-encoded directions. PMID:22345529
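
    The cross-channel joint soft-thresholding step can be illustrated independently of the full reconstruction (this is not the authors' code; the complex coefficients below are random stand-ins for wavelet coefficients).

    ```python
    import numpy as np

    def joint_soft_threshold(coeffs, lam):
        """Joint (group) soft-thresholding across channels.

        coeffs: complex array of shape (n_channels, n_coeffs); each column is a
        group of coefficients sharing one location across all coils.
        """
        mag = np.sqrt(np.sum(np.abs(coeffs) ** 2, axis=0, keepdims=True))  # joint magnitude per location
        scale = np.maximum(1.0 - lam / np.maximum(mag, 1e-12), 0.0)        # common shrink factor
        return coeffs * scale

    # Toy example: 8 coil channels, 1024 stand-in wavelet coefficients.
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 1024)) + 1j * rng.standard_normal((8, 1024))
    x_thresh = joint_soft_threshold(x, lam=2.0)
    print("fraction of locations zeroed jointly:",
          np.mean(np.all(x_thresh == 0, axis=0)))
    ```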

  16. Running climate model on a commercial cloud computing environment: A case study using Community Earth System Model (CESM) on Amazon AWS

    Science.gov (United States)

    Chen, Xiuhong; Huang, Xianglei; Jiao, Chaoyi; Flanner, Mark G.; Raeker, Todd; Palen, Brock

    2017-01-01

    The suites of numerical models used for simulating climate of our planet are usually run on dedicated high-performance computing (HPC) resources. This study investigates an alternative to the usual approach, i.e. carrying out climate model simulations on commercially available cloud computing environment. We test the performance and reliability of running the CESM (Community Earth System Model), a flagship climate model in the United States developed by the National Center for Atmospheric Research (NCAR), on Amazon Web Service (AWS) EC2, the cloud computing environment by Amazon.com, Inc. StarCluster is used to create virtual computing cluster on the AWS EC2 for the CESM simulations. The wall-clock time for one year of CESM simulation on the AWS EC2 virtual cluster is comparable to the time spent for the same simulation on a local dedicated high-performance computing cluster with InfiniBand connections. The CESM simulation can be efficiently scaled with the number of CPU cores on the AWS EC2 virtual cluster environment up to 64 cores. For the standard configuration of the CESM at a spatial resolution of 1.9° latitude by 2.5° longitude, increasing the number of cores from 16 to 64 reduces the wall-clock running time by more than 50% and the scaling is nearly linear. Beyond 64 cores, the communication latency starts to outweigh the benefit of distributed computing and the parallel speedup becomes nearly unchanged.

  17. Feasibility Study of Core Design with a Monte Carlo Code for APR1400 Initial core

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Jinsun; Chang, Do Ik; Seong, Kibong [KEPCO NF, Daejeon (Korea, Republic of)

    2014-10-15

    Monte Carlo calculation has become more popular and useful nowadays due to the rapid progress in computing power and parallel calculation techniques. Recently, there have been many attempts to analyze a commercial core with a Monte Carlo transport code using this enhanced computing capability. In this paper, a Monte Carlo calculation of the APR1400 initial core has been performed and the results are compared with those of a conventional deterministic code to assess the feasibility of core design using a Monte Carlo code. SERPENT, a 3D continuous-energy Monte Carlo reactor physics burnup calculation code, is used for this purpose, and the KARMA-ASTRA code system is used as the deterministic code for comparison. A preliminary investigation of the feasibility of commercial core design with a Monte Carlo code was performed in this study. Simplified geometry modeling was used for the reactor core surroundings, and the reactor coolant model is based on a two-region model. The reactivity at the HZP ARO condition is consistent between the Monte Carlo code and the deterministic code, and the reactivity difference during depletion could be reduced by adopting a realistic moderator temperature. The reactivity difference calculated at the HFP, BOC, ARO equilibrium condition was 180 ±9 pcm, using the axial moderator temperature of the deterministic code. The computing time will be a significant burden for applying a Monte Carlo code to commercial core design at this time, even with parallel computing, because numerous core simulations are required for an actual loading pattern search. One remedy would be a combination of the Monte Carlo code and the deterministic code to generate the physics data. A comparison of physics parameters with sophisticated moderator temperature modeling and depletion will be performed in a further study.

  18. Ultrascale Visualization Climate Data Analysis Tools (UV-CDAT) Final Report

    Energy Technology Data Exchange (ETDEWEB)

    Williams, Dean N. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2014-05-19

    A partnership across government, academic, and private sectors has created a novel system that enables climate researchers to solve current and emerging data analysis and visualization challenges. The Ultrascale Visualization Climate Data Analysis Tools (UV-CDAT) software project utilizes the Python application programming interface (API) combined with C/C++/Fortran implementations for performance-critical software that offers the best compromise between "scalability" and “ease-of-use.” The UV-CDAT system is highly extensible and customizable for high-performance interactive and batch visualization and analysis for climate science and other disciplines of geosciences. For complex, climate data-intensive computing, UV-CDAT’s inclusive framework supports Message Passing Interface (MPI) parallelism as well as taskfarming and other forms of parallelism. More specifically, the UV-CDAT framework supports the execution of Python scripts running in parallel using the MPI executable commands and leverages Department of Energy (DOE)-funded general-purpose, scalable parallel visualization tools such as ParaView and VisIt. This is the first system to be successfully designed in this way and with these features. The climate community leverages these tools and others, in support of a parallel client-server paradigm, allowing extreme-scale, server-side computing for maximum possible speed-up.
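
    A hypothetical sketch of the MPI-parallel Python pattern mentioned above, written with mpi4py rather than UV-CDAT's own API; the script name, field dimensions, and the load_slice reader are stand-ins.

    ```python
    # Run with e.g.:  mpiexec -n 4 python mean_field.py
    # Each rank processes its share of time steps; a reduction combines results.
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    n_time, ny, nx = 120, 90, 144          # illustrative monthly field dimensions
    steps = range(rank, n_time, size)      # round-robin assignment of time steps

    def load_slice(t):
        """Stand-in for reading one time slice of a climate field."""
        rng = np.random.default_rng(t)
        return 288.0 + rng.standard_normal((ny, nx))

    local_sum = np.zeros((ny, nx))
    for t in steps:
        local_sum += load_slice(t)

    global_sum = np.zeros_like(local_sum)
    comm.Reduce(local_sum, global_sum, op=MPI.SUM, root=0)

    if rank == 0:
        time_mean = global_sum / n_time
        print("time-mean field computed, mean value:", time_mean.mean())
    ```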

  19. Parallel file system performances in fusion data storage

    International Nuclear Information System (INIS)

    Iannone, F.; Podda, S.; Bracco, G.; Manduchi, G.; Maslennikov, A.; Migliori, S.; Wolkersdorfer, K.

    2012-01-01

    High I/O flow rates, up to 10 GB/s, are required in large fusion Tokamak experiments like ITER where hundreds of nodes store simultaneously large amounts of data acquired during the plasma discharges. Typical network topologies such as linear arrays (systolic), rings, meshes (2-D arrays), tori (3-D arrays), trees, butterfly, hypercube in combination with high speed data transports like Infiniband or 10G-Ethernet, are the main areas in which the effort to overcome the so-called parallel I/O bottlenecks is most focused. The high I/O flow rates were modelled in an emulated testbed based on the parallel file systems such as Lustre and GPFS, commonly used in High Performance Computing. The test runs on High Performance Computing–For Fusion (8640 cores) and ENEA CRESCO (3392 cores) supercomputers. Message Passing Interface based applications were developed to emulate parallel I/O on Lustre and GPFS using data archival and access solutions like MDSPLUS and Universal Access Layer. These methods of data storage organization are widely diffused in nuclear fusion experiments and are being developed within the EFDA Integrated Tokamak Modelling – Task Force; the authors tried to evaluate their behaviour in a realistic emulation setup.

  20. Parallel file system performances in fusion data storage

    Energy Technology Data Exchange (ETDEWEB)

    Iannone, F., E-mail: francesco.iannone@enea.it [Associazione EURATOM-ENEA sulla Fusione, C.R.ENEA Frascati, via E.Fermi, 45 - 00044 Frascati, Rome (Italy); Podda, S.; Bracco, G. [ENEA Information Communication Tecnologies, Lungotevere Thaon di Revel, 76 - 00196 Rome (Italy); Manduchi, G. [Associazione EURATOM-ENEA sulla Fusione, Consorzio RFX, Corso Stati Uniti, 4 - 35127 Padua (Italy); Maslennikov, A. [CASPUR Inter-University Consortium for the Application of Super-Computing for Research, via dei Tizii, 6b - 00185 Rome (Italy); Migliori, S. [ENEA Information Communication Tecnologies, Lungotevere Thaon di Revel, 76 - 00196 Rome (Italy); Wolkersdorfer, K. [Juelich Supercomputing Centre-FZJ, D-52425 Juelich (Germany)

    2012-12-15

    High I/O flow rates, up to 10 GB/s, are required in large fusion Tokamak experiments like ITER where hundreds of nodes store simultaneously large amounts of data acquired during the plasma discharges. Typical network topologies such as linear arrays (systolic), rings, meshes (2-D arrays), tori (3-D arrays), trees, butterfly, hypercube in combination with high speed data transports like Infiniband or 10G-Ethernet, are the main areas in which the effort to overcome the so-called parallel I/O bottlenecks is most focused. The high I/O flow rates were modelled in an emulated testbed based on the parallel file systems such as Lustre and GPFS, commonly used in High Performance Computing. The test runs on High Performance Computing-For Fusion (8640 cores) and ENEA CRESCO (3392 cores) supercomputers. Message Passing Interface based applications were developed to emulate parallel I/O on Lustre and GPFS using data archival and access solutions like MDSPLUS and Universal Access Layer. These methods of data storage organization are widely diffused in nuclear fusion experiments and are being developed within the EFDA Integrated Tokamak Modelling - Task Force; the authors tried to evaluate their behaviour in a realistic emulation setup.

  1. High-performance whole core Pin-by-Pin calculation based on EFEN-SP_3 method

    International Nuclear Information System (INIS)

    Yang Wen; Zheng Youqi; Wu Hongchun; Cao Liangzhi; Li Yunzhao

    2014-01-01

    High-performance PWR whole-core pin-by-pin calculation with the EFEN code, based on the EFEN-SP_3 method, can be achieved by employing MPI-based spatial parallelization. To take advantage of the available computing and storage power, the entire problem spatial domain can be appropriately decomposed into sub-domains and then assigned to parallel CPUs to balance the computing load and minimize communication cost. Meanwhile, a Red-Black Gauss-Seidel nodal sweeping scheme is employed to avoid within-group iteration deterioration due to spatial parallelization. Numerical results based on whole core pin-by-pin problems designed according to commercial PWRs demonstrate the following conclusions: The EFEN code can provide results with acceptable accuracy; the communication period impacts neither the accuracy nor the parallel efficiency; domain decomposition methods with a smaller surface-to-volume ratio lead to greater parallel efficiency; a PWR whole core pin-by-pin calculation with a spatial mesh of 289 × 289 × 218 and 4 energy groups could be completed in about 900 s using 125 CPUs, and its parallel efficiency is maintained at about 90%. (authors)
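
    The Red-Black Gauss-Seidel sweeping idea can be illustrated on a simpler model problem (a 2-D Laplace equation rather than the SP_3 nodal equations; purely a didactic stand-in): points of one colour depend only on neighbours of the other colour, so each half-sweep can proceed in parallel across sub-domains.

    ```python
    import numpy as np

    # Red-Black Gauss-Seidel for a 2-D Laplace problem on a unit square.
    # "Red" points (i+j even) depend only on "black" neighbours and vice versa,
    # so each half-sweep is embarrassingly parallel across sub-domains.
    n = 64
    u = np.zeros((n, n))
    u[0, :] = 1.0                                  # Dirichlet condition on one edge

    red = np.fromfunction(lambda i, j: (i + j) % 2 == 0, (n, n))
    interior = np.zeros((n, n), dtype=bool)
    interior[1:-1, 1:-1] = True

    def half_sweep(u, mask):
        """Update the masked points with the average of their four neighbours."""
        avg = 0.25 * (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                      np.roll(u, 1, 1) + np.roll(u, -1, 1))
        u[mask] = avg[mask]

    for it in range(500):
        half_sweep(u, red & interior)              # red half-sweep
        half_sweep(u, ~red & interior)             # black half-sweep uses updated reds

    print("interior max after sweeps:", u[1:-1, 1:-1].max())
    ```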

  2. Climate Proxies: An Inquiry-Based Approach to Discovering Climate Change on Antarctica

    Science.gov (United States)

    Wishart, D. N.

    2016-12-01

    An attractive way to advance climate literacy in higher education is to emphasize its relevance while teaching climate change across the curriculum to science majors and non-science majors. An inquiry-based pedagogical approach was used to engage five groups of students on a "Polar Discovery Project" aimed at interpreting the paleoclimate history of ice cores from Antarctica. Learning objectives and student learning outcomes were clearly defined. Students were assigned several exercises ranging from examination of Antarctic topography to the application of physical and chemical measurements as proxies for climate change. Required materials included base and topographic maps of Antarctica; graph sheets for construction of topographic cross-sectional profiles from profile lines of the Western Antarctica Ice Sheet (WAIS) Divide and East Antarctica; high-resolution photographs of Antarctic ice cores; stratigraphic columns of ice cores; borehole and glaciochemical data (i.e. anions, cations, δ18O, δD etc.); and isotope data on greenhouse gases (CH4, O2, N2) extracted from gas bubbles in ice cores. The methodology was to engage students in (1) construction of topographic profiles; (2) suggestion of directions for ice flow based on simple physics; (3) formulation of decisions on suitable locations for drilling ice cores; (4) visual ice stratigraphy, including ice layer counting; (5) observation of any insoluble particles (i.e. meteoritic and volcanic material); (6) analysis of borehole temperature profiles; and (7) interpretation of several datasets to derive a paleoclimate history of these areas of the continent. The overall goal of the project was to improve the students' analytical and quantitative skills; their ability to evaluate relationships between physical and chemical properties in ice cores; and their understanding of the impending consequences of climate change, while engaging science, technology, engineering and mathematics (STEM). Student learning outcomes

  3. Core Hunter 3: flexible core subset selection.

    Science.gov (United States)

    De Beukelaer, Herman; Davenport, Guy F; Fack, Veerle

    2018-05-31

    Core collections provide genebank curators and plant breeders a way to reduce the size of their collections and populations, while minimizing the impact on genetic diversity and allele frequency. Many methods have been proposed to generate core collections, often using distance metrics to quantify the similarity of two accessions, based on genetic marker data or phenotypic traits. Core Hunter is a multi-purpose core subset selection tool that uses local search algorithms to generate subsets relying on one or more metrics, including several distance metrics and allelic richness. In version 3 of Core Hunter (CH3) we have incorporated two new, improved methods for summarizing distances to quantify diversity or representativeness of the core collection. A comparison of CH3 and Core Hunter 2 (CH2) showed that these new metrics can be effectively optimized with less complex algorithms, as compared to those used in CH2. CH3 is more effective at maximizing the improved diversity metric than CH2, still ensures a high average and minimum distance, and is faster for large datasets. Using CH3, a simple stochastic hill-climber is able to find highly diverse core collections, and the more advanced parallel tempering algorithm further increases the quality of the core and further reduces variability across independent samples. We also evaluate the ability of CH3 to simultaneously maximize diversity, and either representativeness or allelic richness, and compare the results with those of the GDOpt and SimEli methods. CH3 can sample equally representative cores as GDOpt, which was specifically designed for this purpose, and is able to construct cores that are simultaneously more diverse, and either are more representative or have higher allelic richness, than those obtained by SimEli. In version 3, Core Hunter has been updated to include two new core subset selection metrics that construct cores for representativeness or diversity, with improved performance. It combines and outperforms the
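
    A toy version of the simple stochastic hill-climber mentioned above (not Core Hunter's implementation): starting from a random core of fixed size, it repeatedly swaps a selected accession for an unselected one and keeps the swap whenever the average pairwise distance of the core improves; the marker data and objective are illustrative.

    ```python
    import numpy as np

    def hill_climb_core(dist, core_size, n_steps=5000, seed=0):
        """Stochastic hill-climber that selects a diverse core subset.

        dist: symmetric (n x n) distance matrix between accessions.
        Objective (one of several used in practice): average pairwise distance.
        """
        rng = np.random.default_rng(seed)
        n = dist.shape[0]
        core = list(rng.choice(n, size=core_size, replace=False))
        rest = [i for i in range(n) if i not in core]

        def objective(sel):
            sub = dist[np.ix_(sel, sel)]
            return sub.sum() / (len(sel) * (len(sel) - 1))

        best = objective(core)
        for _ in range(n_steps):
            i, j = rng.integers(len(core)), rng.integers(len(rest))
            core[i], rest[j] = rest[j], core[i]          # propose a swap
            new = objective(core)
            if new >= best:
                best = new                               # keep the improvement
            else:
                core[i], rest[j] = rest[j], core[i]      # undo the swap
        return sorted(core), best

    rng = np.random.default_rng(1)
    markers = rng.integers(0, 2, size=(100, 200))        # toy 0/1 marker data
    d = np.abs(markers[:, None, :] - markers[None, :, :]).mean(axis=2)  # simple matching distance
    core, value = hill_climb_core(d, core_size=20)
    print("selected core:", core[:10], "... average distance:", round(float(value), 3))
    ```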

  4. Computational performance of a smoothed particle hydrodynamics simulation for shared-memory parallel computing

    Science.gov (United States)

    Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide

    2015-09-01

    The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.

  5. Fundamental Parallel Algorithms for Private-Cache Chip Multiprocessors

    DEFF Research Database (Denmark)

    Arge, Lars Allan; Goodrich, Michael T.; Nelson, Michael

    2008-01-01

    about the way cores are interconnected, for we assume that all inter-processor communication occurs through the memory hierarchy. We study several fundamental problems, including prefix sums, selection, and sorting, which often form the building blocks of other parallel algorithms. Indeed, we present … two sorting algorithms, a distribution sort and a mergesort. Our algorithms are asymptotically optimal in terms of parallel cache accesses and space complexity under reasonable assumptions about the relationships between the number of processors, the size of memory, and the size of cache blocks. … In addition, we study sorting lower bounds in a computational model, which we call the parallel external-memory (PEM) model, that formalizes the essential properties of our algorithms for private-cache CMPs. …

  6. A Parallel Butterfly Algorithm

    KAUST Repository

    Poulson, Jack; Demanet, Laurent; Maxwell, Nicholas; Ying, Lexing

    2014-01-01

    The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform (Equation Presented.) at large numbers of target points when the kernel, K(x, y), is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In d dimensions with O(N^d) quasi-uniformly distributed source and target points, when each appropriate submatrix of K is approximately rank-r, the running time of the algorithm is at most O(r^2 N^d log N). A parallelization of the butterfly algorithm is introduced which, assuming a message latency of α and per-process inverse bandwidth of β, executes in at most (Equation Presented.) time using p processes. This parallel algorithm was then instantiated in the form of the open-source DistButterfly library for the special case where K(x, y) = exp(iΦ(x, y)), where Φ(x, y) is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms and an analogue of a three-dimensional generalized Radon transform were observed to strong-scale from 1 node/16 cores up to 1024 nodes/16,384 cores with greater than 90% and 82% efficiency, respectively. © 2014 Society for Industrial and Applied Mathematics.

  8. Hardware-Oblivious Parallelism for In-Memory Column-Stores

    NARCIS (Netherlands)

    M. Heimel; M. Saecker; H. Pirk (Holger); S. Manegold (Stefan); V. Markl

    2013-01-01

    The multi-core architectures of today’s computer systems make parallelism a necessity for performance critical applications. Writing such applications in a generic, hardware-oblivious manner is a challenging problem: Current database systems thus rely on labor-intensive and error-prone

  9. An overview of European efforts in generating climate data records

    NARCIS (Netherlands)

    Su, Z.; Timmermans, W.J.; Zeng, Y.; Schulz, J.; John, V.O.; Roebeling, R.A.; Poli, P.; Tan, D.; Kaspar, F.; Kaiser-Weiss, A.; Swinnen, E.; Tote, C.; Gregow, H.; Manninen, T.; Riihela, A.; Calvet, J.C.; Ma, Yaoming; Wen, Jun

    2018-01-01

    The Coordinating Earth Observation Data Validation for Reanalysis for Climate Services project (CORE-CLIMAX) aimed to substantiate how Copernicus observations and products can contribute to climate change analyses. CORE-CLIMAX assessed the European capability to provide climate data records (CDRs)

  10. Efficient sequential and parallel algorithms for record linkage.

    Science.gov (United States)

    Mamun, Abdullah-Al; Mi, Tian; Aseltine, Robert; Rajasekaran, Sanguthevar

    2014-01-01

    Integrating data from multiple sources is a crucial and challenging problem. Even though there exist numerous algorithms for record linkage or deduplication, they suffer from either long running times or restrictions on the number of datasets that they can integrate. In this paper we report efficient sequential and parallel algorithms for record linkage which handle any number of datasets and outperform previous algorithms. Our algorithms employ hierarchical clustering algorithms as the basis. A key idea that we use is radix sorting on certain attributes to eliminate identical records before any further processing. Another novel idea is to form a graph that links similar records and find the connected components. Our sequential and parallel algorithms have been tested on a real dataset of 1,083,878 records and synthetic datasets ranging in size from 50,000 to 9,000,000 records. Our sequential algorithm runs at least two times faster, for any dataset, than the previous best-known algorithm, the two-phase algorithm using faster computation of the edit distance (TPA (FCED)). The speedups obtained by our parallel algorithm are almost linear. For example, we get a speedup of 7.5 with 8 cores (residing in a single node), 14.1 with 16 cores (residing in two nodes), and 26.4 with 32 cores (residing in four nodes). We have compared the performance of our sequential algorithm with TPA (FCED) and found that our algorithm outperforms the previous one. The accuracy is the same as that of this previous best-known algorithm.
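
    The "link similar records, then find connected components" step can be sketched with a union-find structure; the blocking key and similarity test below are crude stand-ins for the radix-sort blocking and edit-distance comparisons used by the authors.

    ```python
    # Link similar records, then read off clusters as connected components.
    from collections import defaultdict

    records = [
        {"id": 0, "name": "John Smith",  "dob": "1970-01-01"},
        {"id": 1, "name": "Jon Smith",   "dob": "1970-01-01"},
        {"id": 2, "name": "Jane Doe",    "dob": "1985-05-12"},
        {"id": 3, "name": "Jane  Doe",   "dob": "1985-05-12"},
        {"id": 4, "name": "Alice Brown", "dob": "1990-09-30"},
    ]

    parent = list(range(len(records)))

    def find(x):                          # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    def similar(r1, r2):                  # stand-in similarity test
        n1, n2 = r1["name"].lower().split(), r2["name"].lower().split()
        return r1["dob"] == r2["dob"] and n1[-1] == n2[-1] and n1[0][0] == n2[0][0]

    # Blocking: only compare records that share a key (here, date of birth),
    # which removes most of the quadratic comparison cost.
    blocks = defaultdict(list)
    for r in records:
        blocks[r["dob"]].append(r)

    for block in blocks.values():
        for i in range(len(block)):
            for j in range(i + 1, len(block)):
                if similar(block[i], block[j]):
                    union(block[i]["id"], block[j]["id"])

    clusters = defaultdict(list)
    for r in records:
        clusters[find(r["id"])].append(r["id"])
    print("linked clusters:", list(clusters.values()))
    ```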

  11. Optimizations of Unstructured Aerodynamics Computations for Many-core Architectures

    KAUST Repository

    Al Farhan, Mohammed Ahmed; Keyes, David E.

    2018-01-01

    involving thread and data-level parallelism. Our approach is based upon a multi-level hierarchical distribution of work and data across both the threads and the SIMD units within every hardware core. On a 64-core KNL chip, we achieve nearly 2.9x speedup

  12. I/O Parallelization for the Goddard Earth Observing System Data Assimilation System (GEOS DAS)

    Science.gov (United States)

    Lucchesi, Rob; Sawyer, W.; Takacs, L. L.; Lyster, P.; Zero, J.

    1998-01-01

    The National Aeronautics and Space Administration (NASA) Data Assimilation Office (DAO) at the Goddard Space Flight Center (GSFC) has developed the GEOS DAS, a data assimilation system that provides production support for NASA missions and will support NASA's Earth Observing System (EOS) in the coming years. The GEOS DAS will be used to provide background fields of meteorological quantities to EOS satellite instrument teams for use in their data algorithms as well as providing assimilated data sets for climate studies on decadal time scales. The DAO has been involved in prototyping parallel implementations of the GEOS DAS for a number of years and is now embarking on an effort to convert the production version from shared-memory parallelism to distributed-memory parallelism using the portable Message-Passing Interface (MPI). The GEOS DAS consists of two main components, an atmospheric General Circulation Model (GCM) and a Physical-space Statistical Analysis System (PSAS). The GCM operates on data that are stored on a regular grid while PSAS works with observational data that are scattered irregularly throughout the atmosphere. As a result, the two components have different data decompositions. The GCM is decomposed horizontally as a checkerboard with all vertical levels of each box existing on the same processing element(PE). The dynamical core of the GCM can also operate on a rotated grid, which requires communication-intensive grid transformations during GCM integration. PSAS groups observations on PEs in a more irregular and dynamic fashion.

  13. Parallelized Seeded Region Growing Using CUDA

    Directory of Open Access Journals (Sweden)

    Seongjin Park

    2014-01-01

    This paper presents a novel method for parallelizing the seeded region growing (SRG) algorithm using Compute Unified Device Architecture (CUDA) technology, with the intention of overcoming a theoretical weakness of the SRG algorithm: its computation time is directly proportional to the size of the segmented region. The segmentation performance of the proposed CUDA-based SRG is compared with SRG implementations on single-core CPUs, quad-core CPUs, and shader language programming, using synthetic datasets and 20 body CT scans. Based on the experimental results, the CUDA-based SRG outperforms the other three implementations, advocating that it can substantially assist the segmentation during massive CT screening tests.
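
    For reference, the serial region-growing loop itself (the part whose cost grows with region size, and which the paper maps onto CUDA) can be written compactly; the image, seed, and threshold below are illustrative.

    ```python
    import numpy as np
    from collections import deque

    def seeded_region_grow(image, seed, tol):
        """Serial SRG: grow a region from `seed`, absorbing 4-connected
        neighbours whose intensity is within `tol` of the running region mean."""
        h, w = image.shape
        region = np.zeros((h, w), dtype=bool)
        region[seed] = True
        total, count = float(image[seed]), 1
        frontier = deque([seed])
        while frontier:
            y, x = frontier.popleft()
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not region[ny, nx]:
                    if abs(image[ny, nx] - total / count) <= tol:
                        region[ny, nx] = True
                        total += float(image[ny, nx])
                        count += 1
                        frontier.append((ny, nx))
        return region

    # Toy image: a bright square on a dark, noisy background.
    rng = np.random.default_rng(0)
    img = rng.normal(0.0, 0.05, size=(128, 128))
    img[40:90, 40:90] += 1.0
    mask = seeded_region_grow(img, seed=(64, 64), tol=0.5)
    print("segmented pixels:", int(mask.sum()))
    ```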

  14. Scariest thing about climate change: climate flips

    International Nuclear Information System (INIS)

    Beaulieu, P.

    1997-01-01

    The idea that an increase in greenhouse gases will cause the global average temperature to rise slowly over the next decades was discussed. Studies of ice cores from Greenland have shown that past climate shifts seem to have happened quickly. Some scientists fear that increasingly frequent extreme weather events could be a sign that the climate system is nearing its threshold and that a rapid climate flip may be just ahead. In the case of the global climatic system, the danger is that stresses from greenhouse gas effects are pushing the present system over the threshold where it must flip into a new warmer system that will be stable, but different from the climate on which our agriculture, economy, settlements and lives depend. 4 refs

  15. Development of massively parallel quantum chemistry program SMASH

    International Nuclear Information System (INIS)

    Ishimura, Kazuya

    2015-01-01

    A massively parallel program for quantum chemistry calculations SMASH was released under the Apache License 2.0 in September 2014. The SMASH program is written in the Fortran90/95 language with MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to make program developments simple. The speed-up of the B3LYP energy calculation for (C150H30)2 with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer

  16. A privacy-preserving parallel and homomorphic encryption scheme

    Directory of Open Access Journals (Sweden)

    Min Zhaoe

    2017-04-01

    In order to protect data privacy whilst allowing efficient access to data in multi-node cloud environments, a parallel homomorphic encryption (PHE) scheme is proposed based on the additive homomorphism of the Paillier encryption algorithm. In this paper we propose a PHE algorithm in which the plaintext is divided into several blocks and the blocks are encrypted in parallel. Experiment results demonstrate that the encryption algorithm can reach a speed-up ratio of about 7.1 in the MapReduce environment with 16 cores and 4 nodes.
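
    A toy sketch of the underlying ideas (not the paper's scheme): Paillier encryption with its additive homomorphism, applied block-wise with a process pool. The tiny fixed primes, fixed randomness, and block values are purely illustrative and not remotely secure.

    ```python
    # Block-parallel Paillier encryption sketch (toy parameters only).
    import math
    from multiprocessing import Pool

    p, q = 10007, 10009                  # toy primes; real keys use ~1024-bit primes
    n = p * q
    n2 = n * n
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                 # valid because g = n + 1 is used below

    def encrypt(m, r=7):
        # c = g^m * r^n mod n^2 with g = n + 1, so g^m = 1 + m*n (mod n^2)
        return ((1 + m * n) * pow(r, n, n2)) % n2

    def decrypt(c):
        x = pow(c, lam, n2)
        return ((x - 1) // n) * mu % n

    def homomorphic_add(c1, c2):
        return (c1 * c2) % n2            # E(m1) * E(m2) decrypts to m1 + m2

    if __name__ == "__main__":
        blocks = [12, 345, 6789, 1011]   # plaintext split into blocks
        with Pool(processes=4) as pool:  # encrypt the blocks in parallel
            ciphertexts = pool.map(encrypt, blocks)

        assert [decrypt(c) for c in ciphertexts] == blocks
        c_sum = ciphertexts[0]
        for c in ciphertexts[1:]:
            c_sum = homomorphic_add(c_sum, c)
        print("decrypted homomorphic sum:", decrypt(c_sum), "==", sum(blocks))
    ```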

  17. Parallelization Experience with Four Canonical Econometric Models Using ParMitISEM

    Directory of Open Access Journals (Sweden)

    Nalan Baştürk

    2016-03-01

    This paper presents the parallel computing implementation of the MitISEM algorithm, labeled Parallel MitISEM. The basic MitISEM algorithm provides an automatic and flexible method to approximate a non-elliptical target density using adaptive mixtures of Student-t densities, where only a kernel of the target density is required. The approximation can be used as a candidate density in Importance Sampling or Metropolis-Hastings methods for Bayesian inference on model parameters and probabilities. We present and discuss four canonical econometric models using a Graphics Processing Unit and a multi-core Central Processing Unit version of the MitISEM algorithm. The results show that the parallelization of the MitISEM algorithm on Graphics Processing Units and multi-core Central Processing Units is straightforward and fast to program using MATLAB. Moreover, the speed of the Graphics Processing Unit version is much higher than that of the Central Processing Unit version.
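
    The role of the Student-t candidate density can be illustrated with a stripped-down importance sampler; MitISEM itself fits an adaptive mixture of Student-t densities, whereas the sketch below uses a single component and an arbitrary one-dimensional toy target.

    ```python
    import numpy as np
    from math import gamma

    # Importance sampling with a single Student-t candidate density.
    rng = np.random.default_rng(0)

    def target_kernel(x):
        """Unnormalized, mildly skewed toy target density."""
        return np.exp(-0.5 * x**2) * (1.0 + 0.9 * np.tanh(2.0 * x))

    def student_t_pdf(x, df, loc, scale):
        """Density of a location-scale Student-t candidate."""
        z = (x - loc) / scale
        c = gamma((df + 1) / 2) / (gamma(df / 2) * np.sqrt(df * np.pi) * scale)
        return c * (1.0 + z**2 / df) ** (-(df + 1) / 2)

    df, loc, scale = 5.0, 0.0, 1.5
    draws = loc + scale * rng.standard_t(df, size=100_000)    # candidate draws

    w = target_kernel(draws) / student_t_pdf(draws, df, loc, scale)
    w /= w.sum()                                              # self-normalized weights

    is_mean = np.sum(w * draws)                               # estimate of E[x] under the target
    ess = 1.0 / np.sum(w**2)                                  # effective sample size
    print("IS estimate of the mean:", round(float(is_mean), 4), "ESS:", int(ess))
    ```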

  18. Simulating electron wave dynamics in graphene superlattices exploiting parallel processing advantages

    Science.gov (United States)

    Rodrigues, Manuel J.; Fernandes, David E.; Silveirinha, Mário G.; Falcão, Gabriel

    2018-01-01

    This work introduces a parallel computing framework to characterize the propagation of electron waves in graphene-based nanostructures. The electron wave dynamics is modeled using both "microscopic" and effective medium formalisms and the numerical solution of the two-dimensional massless Dirac equation is determined using a Finite-Difference Time-Domain scheme. The propagation of electron waves in graphene superlattices with localized scattering centers is studied, and the role of the symmetry of the microscopic potential in the electron velocity is discussed. The computational methodologies target the parallel capabilities of heterogeneous multi-core CPU and multi-GPU environments and are built with the OpenCL parallel programming framework which provides a portable, vendor agnostic and high throughput-performance solution. The proposed heterogeneous multi-GPU implementation achieves speedup ratios up to 75x when compared to multi-thread and multi-core CPU execution, reducing simulation times from several hours to a couple of minutes.

  19. The Parallel System for Integrating Impact Models and Sectors (pSIMS)

    Science.gov (United States)

    Elliott, Joshua; Kelly, David; Chryssanthacopoulos, James; Glotter, Michael; Jhunjhnuwala, Kanika; Best, Neil; Wilde, Michael; Foster, Ian

    2014-01-01

    We present a framework for massively parallel climate impact simulations: the parallel System for Integrating Impact Models and Sectors (pSIMS). This framework comprises a) tools for ingesting and converting large amounts of data to a versatile datatype based on a common geospatial grid; b) tools for translating this datatype into custom formats for site-based models; c) a scalable parallel framework for performing large ensemble simulations, using any one of a number of different impacts models, on clusters, supercomputers, distributed grids, or clouds; d) tools and data standards for reformatting outputs to common datatypes for analysis and visualization; and e) methodologies for aggregating these datatypes to arbitrary spatial scales such as administrative and environmental demarcations. By automating many time-consuming and error-prone aspects of large-scale climate impacts studies, pSIMS accelerates computational research, encourages model intercomparison, and enhances reproducibility of simulation results. We present the pSIMS design and use example assessments to demonstrate its multi-model, multi-scale, and multi-sector versatility.

  20. QR-decomposition based SENSE reconstruction using parallel architecture.

    Science.gov (United States)

    Ullah, Irfan; Nisar, Habab; Raza, Haseeb; Qasim, Malik; Inam, Omair; Omer, Hammad

    2018-04-01

    Magnetic Resonance Imaging (MRI) is a powerful medical imaging technique that provides essential clinical information about the human body. One major limitation of MRI is its long scan time. Implementation of advance MRI algorithms on a parallel architecture (to exploit inherent parallelism) has a great potential to reduce the scan time. Sensitivity Encoding (SENSE) is a Parallel Magnetic Resonance Imaging (pMRI) algorithm that utilizes receiver coil sensitivities to reconstruct MR images from the acquired under-sampled k-space data. At the heart of SENSE lies inversion of a rectangular encoding matrix. This work presents a novel implementation of GPU based SENSE algorithm, which employs QR decomposition for the inversion of the rectangular encoding matrix. For a fair comparison, the performance of the proposed GPU based SENSE reconstruction is evaluated against single and multicore CPU using openMP. Several experiments against various acceleration factors (AFs) are performed using multichannel (8, 12 and 30) phantom and in-vivo human head and cardiac datasets. Experimental results show that GPU significantly reduces the computation time of SENSE reconstruction as compared to multi-core CPU (approximately 12x speedup) and single-core CPU (approximately 53x speedup) without any degradation in the quality of the reconstructed images. Copyright © 2018 Elsevier Ltd. All rights reserved.
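
    The central linear-algebra step (solving the overdetermined SENSE system through a QR decomposition of the rectangular encoding matrix) can be shown independently of any GPU code; the encoding matrix below is random rather than built from measured coil sensitivities.

    ```python
    import numpy as np

    # QR-based solution of the overdetermined SENSE system E x = y, where E
    # stacks the coil-weighted encoding for all channels.  Here E is random;
    # a real E is built from coil sensitivity maps and undersampled Fourier encoding.
    rng = np.random.default_rng(0)
    n_coils, reduction, n_pixels = 8, 4, 64
    n_rows = n_coils * n_pixels // reduction            # aliased samples from all coils

    E = (rng.standard_normal((n_rows, n_pixels)) +
         1j * rng.standard_normal((n_rows, n_pixels)))
    x_true = rng.standard_normal(n_pixels) + 1j * rng.standard_normal(n_pixels)
    y = E @ x_true + 0.01 * rng.standard_normal(n_rows)

    # Least-squares solve via QR: E = Q R, then solve R x = Q^H y.
    Q, R = np.linalg.qr(E)                              # reduced QR of the encoding matrix
    x_hat = np.linalg.solve(R, Q.conj().T @ y)          # R is square upper triangular

    print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
    ```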

  1. Three dimensional Burn-up program parallelization using socket programming

    International Nuclear Information System (INIS)

    Haliyati R, Evi; Su'ud, Zaki

    2002-01-01

    A parallel computing system was built with the purpose of decreasing the execution time of a physics program. In this case, a multi-computer system was built to analyze the burn-up process of a nuclear reactor. This multi-computer system was designed using socket-based communication over the TCP/IP protocol. The system consists of one computer acting as a server and the rest as clients. The server has main control over all its clients. The server also divides the reactor core geometrically into n parts according to the number of clients; each computer, including the server, conducts the burn-up analysis of 1/n of the total reactor core volume. This burn-up analysis was conducted simultaneously and in parallel by all computers, so the program execution time approached 1/n of that of a single computer. An analysis was then carried out, showing that for calculating the atom densities in a reactor of 91 cm x 91 cm x 116 cm, a parallel system of 2 computers has the highest efficiency
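
    A minimal sketch of the server/client layout described above (not the original code): the server hands each client a slab of the axial mesh over a TCP socket, the clients return partial results, and the server combines them. The "burn-up analysis" is replaced by a placeholder computation, and the clients run as local threads purely so the example is self-contained; the host, port, and mesh size are illustrative.

    ```python
    import json
    import socket
    import threading

    HOST, PORT, N_CLIENTS, NZ = "127.0.0.1", 50007, 4, 116   # illustrative values

    def client_worker():
        with socket.create_connection((HOST, PORT)) as s:
            task = json.loads(s.makefile("r").readline())     # receive slab bounds
            z0, z1 = task["z0"], task["z1"]
            partial = float(sum(range(z0, z1)))               # placeholder "analysis"
            s.sendall((json.dumps({"partial": partial}) + "\n").encode())

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((HOST, PORT))
    server.listen(N_CLIENTS)

    threads = [threading.Thread(target=client_worker) for _ in range(N_CLIENTS)]
    for t in threads:
        t.start()

    # Server side: divide the axial mesh into slabs and collect partial results.
    bounds = [round(i * NZ / N_CLIENTS) for i in range(N_CLIENTS + 1)]
    total = 0.0
    for i in range(N_CLIENTS):
        conn, _ = server.accept()
        with conn:
            conn.sendall((json.dumps({"z0": bounds[i], "z1": bounds[i + 1]}) + "\n").encode())
            total += json.loads(conn.makefile("r").readline())["partial"]

    for t in threads:
        t.join()
    server.close()
    print("combined result:", total, "expected:", float(sum(range(NZ))))
    ```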

  2. Parallel optimization of IDW interpolation algorithm on multicore platform

    Science.gov (United States)

    Guan, Xuefeng; Wu, Huayi

    2009-10-01

    Due to increasing power consumption, heat dissipation, and other physical issues, the architecture of the central processing unit (CPU) has been turning to multicore rapidly in recent years. A multicore processor packages multiple processor cores in the same chip, which not only offers increased performance, but also presents significant challenges to application developers. In the GIS field, most current GIS algorithms are implemented serially and cannot fully exploit the parallelism potential of such multicore platforms. In this paper, we choose the Inverse Distance Weighted (IDW) spatial interpolation algorithm as an example to study how to optimize current serial GIS algorithms on multicore platforms in order to maximize performance speedup. With the help of OpenMP, a threading methodology is introduced to split and share the whole interpolation workload among processor cores. After parallel optimization, the execution time of the interpolation algorithm is greatly reduced and good performance speedup is achieved. For example, the performance speedup on an Intel Xeon 5310 is 1.943 with 2 execution threads and 3.695 with 4 execution threads, respectively. An additional output comparison between pre-optimization and post-optimization shows that parallel optimization does not affect the final interpolation result.
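
    A compact illustration of the same decomposition idea (the paper's implementation uses C++ with OpenMP, not Python): the target grid is split into chunks evaluated by a pool of worker processes, and the parallel result matches the serial one; the sample points and grid are synthetic.

    ```python
    import numpy as np
    from concurrent.futures import ProcessPoolExecutor
    from functools import partial

    # Inverse Distance Weighted (IDW) interpolation, parallelized by splitting
    # the target grid into chunks handled by a pool of worker processes.
    def idw_chunk(targets, sources, values, power=2.0):
        d = np.linalg.norm(targets[:, None, :] - sources[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)                 # avoid division by zero at sample points
        w = 1.0 / d**power
        return (w @ values) / w.sum(axis=1)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        sources = rng.random((200, 2))           # scattered sample locations
        values = np.sin(4 * sources[:, 0]) + np.cos(3 * sources[:, 1])

        gx, gy = np.meshgrid(np.linspace(0, 1, 100), np.linspace(0, 1, 100))
        targets = np.column_stack([gx.ravel(), gy.ravel()])

        chunks = np.array_split(targets, 8)      # one chunk per worker task
        worker = partial(idw_chunk, sources=sources, values=values)
        with ProcessPoolExecutor(max_workers=4) as pool:
            parts = list(pool.map(worker, chunks))
        parallel = np.concatenate(parts)

        serial = idw_chunk(targets, sources, values)
        print("parallel and serial results identical:", np.allclose(parallel, serial))
    ```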

  3. Multiscale Architectures and Parallel Algorithms for Video Object Tracking

    Science.gov (United States)

    2011-10-01

    … larger number of cores using the IBM QS22 Blade for handling higher video processing workloads (but at a higher cost per core), low power consumption and … Cell/B.E. Blade processors, which have more main memory but also higher power consumption. More detailed performance figures for HD and SD video …

  4. A PARALLEL MONTE CARLO CODE FOR SIMULATING COLLISIONAL N-BODY SYSTEMS

    International Nuclear Information System (INIS)

    Pattabiraman, Bharath; Umbreit, Stefan; Liao, Wei-keng; Choudhary, Alok; Kalogera, Vassiliki; Memik, Gokhan; Rasio, Frederic A.

    2013-01-01

    We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N ∼ 10^7 particles. Our code is based on the Hénon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures and the introduction of a parallel random number generation scheme as well as a parallel sorting algorithm required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce along with our choice of decomposition scheme minimize communication costs and ensure optimal distribution of data and workload among the processing units. Our implementation uses the Message Passing Interface library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse with the number of stars, N, spanning three orders of magnitude from 10^5 to 10^7. We find that our results are in good agreement with self-similar core-collapse solutions, and the core-collapse times generally agree with expectations from the literature. Also, we observe good total energy conservation, within ∼ …. The runtime scales well with the number of processors up to … for N = 10^5, 128 for N = 10^6, and 256 for N = 10^7. The runtime reaches saturation with the addition of processors beyond these limits, which is a characteristic of the parallel sorting algorithm. The resulting maximum speedups we achieve are approximately 60×, 100×, and 220×, respectively.

  5. Using Coarrays to Parallelize Legacy Fortran Applications: Strategy and Case Study

    Directory of Open Access Journals (Sweden)

    Hari Radhakrishnan

    2015-01-01

    This paper summarizes a strategy for parallelizing a legacy Fortran 77 program using the object-oriented (OO) and coarray features that entered Fortran in the 2003 and 2008 standards, respectively. OO programming (OOP) facilitates the construction of an extensible suite of model-verification and performance tests that drive the development. Coarray parallel programming facilitates a rapid evolution from a serial application to a parallel application capable of running on multicore processors and many-core accelerators in shared and distributed memory. We delineate 17 code modernization steps used to refactor and parallelize the program and study the resulting performance. Our initial studies were done using the Intel Fortran compiler on a 32-core shared memory server. Scaling behavior was very poor, and profile analysis using TAU showed that the bottleneck in the performance was due to our implementation of a collective, sequential summation procedure. We were able to improve the scalability and achieve nearly linear speedup by replacing the sequential summation with a parallel, binary tree algorithm. We also tested the Cray compiler, which provides its own collective summation procedure. Intel provides no collective reductions. With Cray, the program shows linear speedup even in distributed-memory execution. We anticipate similar results with other compilers once they support the new collective procedures proposed for Fortran 2015.

  6. Parallel computing of a climate model on the dawn 1000 by domain decomposition method

    Science.gov (United States)

    Bi, Xunqiang

    1997-12-01

    In this paper the parallel computing of a grid-point nine-level atmospheric general circulation model on the Dawn 1000 is introduced. The model was developed by the Institute of Atmospheric Physics (IAP), Chinese Academy of Sciences (CAS). The Dawn 1000 is a MIMD massive parallel computer made by National Research Center for Intelligent Computer (NCIC), CAS. A two-dimensional domain decomposition method is adopted to perform the parallel computing. The potential ways to increase the speed-up ratio and exploit more resources of future massively parallel supercomputation are also discussed.

  7. Modular high-temperature gas-cooled reactor simulation using parallel processors

    International Nuclear Information System (INIS)

    Ball, S.J.; Conklin, J.C.

    1989-01-01

    The MHPP (Modular HTGR Parallel Processor) code has been developed to simulate modular high-temperature gas-cooled reactor (MHTGR) transients and accidents. MHPP incorporates a very detailed model for predicting the dynamics of the reactor core, vessel, and cooling systems over a wide variety of scenarios ranging from expected transients to very-low-probability severe accidents. The simulation routines, which had originally been developed entirely as serial code, were readily adapted to parallel-processing Fortran. The resulting parallelized simulation speed was enhanced significantly. Workstation interfaces are being developed to provide for user (operator) interaction. In this paper the benefits realized by adapting previous MHTGR codes to run on a parallel processor are discussed, along with results of typical accident analyses

  8. Scaling up machine learning: parallel and distributed approaches

    National Research Council Canada - National Science Library

    Bekkerman, Ron; Bilenko, Mikhail; Langford, John

    2012-01-01

    ... presented in the book cover a range of parallelization platforms from FPGAs and GPUs to multi-core systems and commodity clusters; concurrent programming frameworks that include CUDA, MPI, MapReduce, and DryadLINQ; and various learning settings: supervised, unsupervised, semi-supervised, and online learning. Extensive coverage of parallelizat...

  9. Models of parallel computation :a survey and classification

    Institute of Scientific and Technical Information of China (English)

    ZHANG Yunquan; CHEN Guoliang; SUN Guangzhong; MIAO Qiankun

    2007-01-01

    In this paper, the state-of-the-art parallel computational model research is reviewed. We introduce various models that were developed during the past decades. According to their targeted architecture features, especially memory organization, we classify these parallel computational models into three generations. These models and their characteristics are discussed based on this three-generation classification. We believe that with the ever increasing speed gap between the CPU and memory systems, incorporating non-uniform memory hierarchy into computational models will become unavoidable. With the emergence of multi-core CPUs, the parallelism hierarchy of current computing platforms becomes more and more complicated. Describing this complicated parallelism hierarchy in future computational models becomes more and more important. A semi-automatic toolkit that can extract model parameters and their values on real computers can reduce the model analysis complexity, thus allowing more complicated models with more parameters to be adopted. Hierarchical memory and hierarchical parallelism will be two very important features that should be considered in future model design and research.

  10. Design Patterns: establishing a discipline of parallel software engineering

    CERN Multimedia

    CERN. Geneva

    2010-01-01

    Many-core processors present us with a software challenge. We must turn our serial code into parallel code. To accomplish this wholesale transformation of our software ecosystem, we must define what established practice is in parallel programming and then develop tools to support that practice. This leads to design patterns supported by frameworks optimized at runtime with advanced autotuning compilers. In this talk I provide an update of my ongoing research with the ParLab at UC Berkeley to realize this vision. In particular, I will describe our draft parallel pattern language, our early experiments with software frameworks, and the associated runtime optimization tools. About the speaker: Tim Mattson is a parallel programmer (Ph.D. Chemistry, UCSC, 1985). He does linear algebra, finds oil, shakes molecules, solves differential equations, and models electrons in simple atomic systems. He has spent his career working with computer scientists to make sure the needs of parallel applications programmers are met. Tim has ...

  11. A Parallel Algebraic Multigrid Solver on Graphics Processing Units

    KAUST Repository

    Haase, Gundolf; Liebmann, Manfred; Douglas, Craig C.; Plank, Gernot

    2010-01-01

    -vector multiplication scheme underlying the PCG-AMG algorithm is presented for the many-core GPU architecture. A performance comparison of the parallel solver shows that a single Nvidia Tesla C1060 GPU board delivers the performance of a sixteen-node Infiniband cluster

  12. Parallel family trees for transfer matrices in the Potts model

    Science.gov (United States)

    Navarro, Cristobal A.; Canfora, Fabrizio; Hitschfeld, Nancy; Navarro, Gonzalo

    2015-02-01

    The computational cost of transfer matrix methods for the Potts model is related to the question: in how many ways can two layers of a lattice be connected? Answering the question leads to the generation of a combinatorial set of lattice configurations. This set defines the configuration space of the problem, and the smaller it is, the faster the transfer matrix can be computed. The configuration space of generic (q, v) transfer matrix methods for strips is in the order of the Catalan numbers, which grows asymptotically as O(4^m) where m is the width of the strip. Other transfer matrix methods with a smaller configuration space indeed exist but they make assumptions on the temperature, number of spin states, or restrict the structure of the lattice. In this paper we propose a parallel algorithm that uses a sub-Catalan configuration space of O(3^m) to build the generic (q, v) transfer matrix in a compressed form. The improvement is achieved by grouping the original set of Catalan configurations into a forest of family trees, in such a way that the solution to the problem is now computed by solving the root node of each family. As a result, the algorithm becomes exponentially faster than the Catalan approach while still highly parallel. The resulting matrix is stored in a compressed form using O(3^m × 4^m) of space, making numerical evaluation and decompression faster than evaluating the matrix in its O(4^m × 4^m) uncompressed form. Experimental results for different sizes of strip lattices show that the parallel family trees (PFT) strategy indeed runs exponentially faster than the Catalan Parallel Method (CPM), especially when dealing with dense transfer matrices. In terms of parallel performance, we report strong-scaling speedups of up to 5.7× when running on an 8-core shared memory machine and 28× for a 32-core cluster. The best balance of speedup and efficiency for the multi-core machine was achieved when using p = 4 processors, while for the cluster

  13. Development of massively parallel quantum chemistry program SMASH

    Energy Technology Data Exchange (ETDEWEB)

    Ishimura, Kazuya [Department of Theoretical and Computational Molecular Science, Institute for Molecular Science 38 Nishigo-Naka, Myodaiji, Okazaki, Aichi 444-8585 (Japan)

    2015-12-31

    A massively parallel program for quantum chemistry calculations SMASH was released under the Apache License 2.0 in September 2014. The SMASH program is written in the Fortran90/95 language with MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to make program developments simple. The speed-up of the B3LYP energy calculation for (C₁₅₀H₃₀)₂ with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer.

  14. Parallel algorithms for testing finite state machines:Generating UIO sequences

    OpenAIRE

    Hierons, RM; Turker, UC

    2016-01-01

    This paper describes an efficient parallel algorithm that uses many-core GPUs for automatically deriving Unique Input Output sequences (UIOs) from Finite State Machines. The proposed algorithm uses the global scope of the GPU's global memory through coalesced memory access and minimises the transfer between CPU and GPU memory. The results of experiments indicate that the proposed method yields considerably better results compared to a single core UIO construction algorithm. Our algorithm is s...

  15. Core baffle for nuclear reactors

    International Nuclear Information System (INIS)

    Machado, O.J.; Berringer, R.T.

    1977-01-01

    The invention concerns the design of the core of an LWR with a large number of fuel assemblies formed by fuel rods and kept in position by spacer grids. According to the invention, match plates with openings are mounted at the level of the spacer grids, so that the upward flow of coolant is not obstructed and a parallel bypass is obtained in the space between the core barrel and the baffle plates. In case of an accident, this configuration reduces or avoids damage from overpressure reactions. (HP) [de

  16. Asynchronous Task-Based Parallelization of Algebraic Multigrid

    KAUST Repository

    AlOnazi, Amani A.

    2017-06-23

    As processor clock rates become more dynamic and workloads become more adaptive, the vulnerability to global synchronization that already complicates programming for performance in today's petascale environment will be exacerbated. Algebraic multigrid (AMG), the solver of choice in many large-scale PDE-based simulations, scales well in the weak sense, with fixed problem size per node, on tightly coupled systems when loads are well balanced and core performance is reliable. However, its strong scaling to many cores within a node is challenging. Reducing synchronization and increasing concurrency are vital adaptations of AMG to hybrid architectures. Recent communication-reducing improvements to classical additive AMG by Vassilevski and Yang improve concurrency and increase communication-computation overlap, while retaining convergence properties close to those of standard multiplicative AMG, but remain bulk synchronous. We extend the Vassilevski and Yang additive AMG to asynchronous task-based parallelism using a hybrid MPI+OmpSs (from the Barcelona Supercomputing Center) model within a node, along with MPI for internode communications. We implement a tiling approach to decompose the grid hierarchy into parallel units within task containers. We compare against the MPI-only BoomerAMG and the Auxiliary-space Maxwell Solver (AMS) in the hypre library for the 3D Laplacian operator and electromagnetic diffusion, respectively. In time to solution for a full solve, the MPI-OmpSs hybrid improves over an all-MPI approach in strong scaling at full core count (32 threads per single Haswell node of the Cray XC40) and maintains this per-node advantage as both are weak-scaled to thousands of cores, with MPI between nodes.

  17. SCORPIO - VVER core surveillance system

    International Nuclear Information System (INIS)

    Zalesky, K.; Svarny, J.; Novak, L.; Rosol, J.; Horanes, A.

    1997-01-01

    The Halden Project has developed the core surveillance system SCORPIO which has two parallel modes of operation: the Core Follow Mode and the Predictive Mode. The main motivation behind the development of SCORPIO is to make a practical tool for reactor operators which can increase the quality and quantity of information presented on core status and dynamic behavior. This can first of all improve plant safety, as undesired core conditions are detected and prevented. Secondly, more flexible and efficient plant operation is made possible. So far the system has only been implemented on western PWRs but the basic concept is applicable to a wide range of reactors including WWERs. The main differences between WWERs and typical western PWRs with respect to core surveillance requirements are outlined. The development of a WWER version of SCORPIO was initiated in cooperation with the Nuclear Research Institute at Rez and industry partners in the Czech Republic. The first system will be installed at the Dukovany NPP. (author)

  18. Climatic risks

    International Nuclear Information System (INIS)

    Lamarre, D.; Favier, R.; Bourg, D.; Marchand, J.P.

    2005-04-01

    The climatic risks are analyzed in this book from the crossed perspectives of specialists in different domains: philosophy, sociology, economic history, law, geography, climatology and hydrology. The prevention of risks and the precautionary principle are presented first. Then, the relations between climatic risk and geography are analyzed using the notion of territoriality. The territorial aspect is at the core of present-day debates about the geography of risks, in particular when the links between climate change and public health are considered. Then the main climatic risks are presented. Droughts and floods are the most damaging ones, and the difficulties of coupling prevention with indemnification remain important. (J.S.)

  19. Performance of a fine-grained parallel model for multi-group nodal-transport calculations in three-dimensional pin-by-pin reactor geometry

    International Nuclear Information System (INIS)

    Masahiro, Tatsumi; Akio, Yamamoto

    2003-01-01

    A production code SCOPE2 was developed based on a fine-grained parallel algorithm using the red/black iterative method, targeting parallel computing environments such as a PC cluster. It can perform a depletion calculation in a few hours on a PC cluster with a model based on a 9-group nodal-SP3 transport method in 3-dimensional pin-by-pin geometry for in-core fuel management of commercial PWRs. The present algorithm guarantees a convergence process identical to that of serial execution, which is very important from the viewpoint of quality management. The fine-mesh geometry is constructed by hierarchical decomposition, with the introduction of an intermediate management layer, the block, which is a quarter of a fuel assembly in the radial direction. A combination of a mesh division scheme forcing even meshes on each edge and a latency-hiding communication algorithm kept the message passing simple and efficient, enhancing parallel performance. Inter-processor communication and parallel I/O access were realized using MPI functions. Parallel performance was measured for depletion calculations by the 9-group nodal-SP3 transport method in 3-dimensional pin-by-pin geometry with 340 x 340 x 26 meshes for full-core geometry and 170 x 170 x 26 for quarter-core geometry. A PC cluster consisting of 24 Pentium-4 processors connected by Fast Ethernet was used for the performance measurement. Calculations in full-core geometry gave better speedups than those in quarter-core geometry because of the larger granularity. The fine-mesh sweep and feedback calculation parts gave almost perfect scalability since the granularity is large enough, while the 1-group coarse-mesh diffusion acceleration gave only around 80%. The speedup and parallel efficiency for the total computation time were 22.6 and 94%, respectively, for the calculation in full-core geometry with 24 processors. (authors)
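    The reproducibility property mentioned above comes from the red/black colouring itself: points of one colour depend only on points of the other colour, so all updates within a colour can be done in parallel and still match a serial sweep bit for bit. The following NumPy sketch shows one red/black Gauss-Seidel sweep for a generic 2-D Laplace problem; it illustrates the colouring idea only, not the SCOPE2 nodal-SP3 solver.

```python
import numpy as np

def redblack_sweep(u, f, h):
    """One red/black Gauss-Seidel sweep for -Laplace(u) = f on a uniform grid.
    Each half-sweep touches only one colour, whose stencil neighbours are all
    of the other colour, so the update order within a colour (and hence the
    degree of parallelism) cannot change the result."""
    for colour in (0, 1):                      # 0 = red, 1 = black
        i, j = np.meshgrid(np.arange(1, u.shape[0] - 1),
                           np.arange(1, u.shape[1] - 1), indexing="ij")
        mask = (i + j) % 2 == colour
        ii, jj = i[mask], j[mask]
        u[ii, jj] = 0.25 * (u[ii - 1, jj] + u[ii + 1, jj] +
                            u[ii, jj - 1] + u[ii, jj + 1] + h * h * f[ii, jj])
    return u

# toy usage: 32x32 grid, zero boundary, constant source term
n = 32
u, f = np.zeros((n, n)), np.ones((n, n))
for _ in range(200):
    u = redblack_sweep(u, f, h=1.0 / (n - 1))
```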

  20. Performance evaluations of advanced massively parallel platforms based on gyrokinetic toroidal five-dimensional Eulerian code GT5D

    International Nuclear Information System (INIS)

    Idomura, Yasuhiro; Jolliet, Sebastien

    2010-01-01

    A gyrokinetic toroidal five-dimensional Eulerian code GT5D is ported to six advanced massively parallel platforms and comprehensive benchmark tests are performed. A parallelisation technique based on physical properties of the gyrokinetic equation is presented. By extending the parallelisation technique with a hybrid parallel model, the scalability of the code is improved on platforms with multi-core processors. In the benchmark tests, good scalability is confirmed up to several thousand cores on all platforms, and a maximum sustained performance of ∼18.6 Tflops is achieved using 16384 cores of BX900. (author)

  1. An Efficient Parallel SAT Solver Exploiting Multi-Core Environments, Phase II

    Data.gov (United States)

    National Aeronautics and Space Administration — The hundreds of stream cores in the latest graphics processors (GPUs), and the possibility to execute non-graphics computations on them, open unprecedented levels of...

  2. The boat hull model : enabling performance prediction for parallel computing prior to code development

    NARCIS (Netherlands)

    Nugteren, C.; Corporaal, H.

    2012-01-01

    Multi-core and many-core were already major trends for the past six years and are expected to continue for the next decade. With these trends of parallel computing, it becomes increasingly difficult to decide on which processor to run a given application, mainly because the programming of these

  3. Electromagnetic Physics Models for Parallel Computing Architectures

    Science.gov (United States)

    Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.

    2016-10-01

    The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well.

  4. Stable isotope analysis in ice core paleoclimatology

    International Nuclear Information System (INIS)

    Bertler, N.A.N.

    2014-01-01

    Ice cores from New Zealand and the Antarctic margin provide an excellent means of addressing the lack of longer-term climate observations in the Southern Hemisphere with near instrumental quality. Ice core records provide an annual-scale, 'instrumental-quality' baseline of atmospheric temperature and circulation changes back many thousands of years. (author)

  5. Traditional Tracking with Kalman Filter on Parallel Architectures

    Science.gov (United States)

    Cerati, Giuseppe; Elmer, Peter; Lantz, Steven; MacNeill, Ian; McDermott, Kevin; Riley, Dan; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

    2015-05-01

    Power density constraints are limiting the performance improvements of modern CPUs. To address this, we have seen the introduction of lower-power, multi-core processors, but the future will be even more exciting. In order to stay within the power density limits but still obtain Moore's Law performance/price gains, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specialized functions like large vector units. Example technologies today include Intel's Xeon Phi and GPGPUs. Track finding and fitting is one of the most computationally challenging problems for event reconstruction in particle physics. At the High Luminosity LHC, for example, this will be by far the dominant problem. The most common track finding techniques in use today are however those based on the Kalman Filter. Significant experience has been accumulated with these techniques on real tracking detector systems, both in the trigger and offline. We report the results of our investigations into the potential and limitations of these algorithms on the new parallel hardware.

  6. Fire, Climate, and Human Activity: A Combustive Combination

    Science.gov (United States)

    Kehrwald, N. M.; Battistel, D.; Argiriadis, E.; Barbante, C.; Barber, L. B.; Fortner, S. K.; Jasmann, J.; Kirchgeorg, T.; Zennaro, P.

    2017-12-01

    Ice and lake core records demonstrate that fires caused by human activity can dominate regional biomass burning records in the Common Era. These major increases in fires are often associated with extensive land use change such as an expansion in agriculture. Regions with few humans, relatively stable human populations and/or unvarying land use often have fire histories that are dominated by climate parameters such as temperature and precipitation. Here, we examine biomass burning recorded in ice cores from northern Greenland (NEEM; 77°27'N, 51°3.6'W), Alaska (Juneau Icefield; 58°35'N, 134°29'W) and East Antarctica (EPICA Dome C; 75°06'S, 123°21'E), along with New Zealand lake cores to investigate interactions between climate, fire and human activity. Biomarkers such as levoglucosan, and its isomers mannosan and galactosan, can only be produced by cellulose combustion and therefore are specific indicators of past fire activity archived in ice and lake cores. These fire histories add another factor to climate proxies from the same core, and provide a comparison to regional fire syntheses from charcoal records and climate models. For example, fire data from the JSBACH-Spitfire model for the past 2000 years demonstrates that a climate-only scenario would not increase biomass burning in high northern latitudes for the past 2000 years, while NEEM ice core and regional pollen records demonstrate both increased fire activity and land use change that may be ascribed to human activity. Additional biomarkers such as fecal sterols in lake sediments can determine when people were in an area, and can help establish if an increased human presence in an area corresponds with intensified fire activity. This combination of specific biomarkers, other proxy data, and model output can help determine the relative impact of humans versus climate factors on regional fire activity.

  7. Compiling the functional data-parallel language SaC for Microgrids of Self-Adaptive Virtual Processors

    NARCIS (Netherlands)

    Grelck, C.; Herhut, S.; Jesshope, C.; Joslin, C.; Lankamp, M.; Scholz, S.-B.; Shafarenko, A.

    2009-01-01

    We present preliminary results from compiling the high-level, functional and data-parallel programming language SaC into a novel multi-core design: Microgrids of Self-Adaptive Virtual Processors (SVPs). The side-effect free nature of SaC in conjunction with its data-parallel foundation make it an

  8. Parallel computing solution of Boltzmann neutron transport equation

    International Nuclear Information System (INIS)

    Ansah-Narh, T.

    2010-01-01

    The focus of the research was on developing a parallel computing algorithm for solving eigenvalues of the Boltzmann Neutron Transport Equation (BNTE) in a slab geometry using a multi-grid approach. In response to the slow execution of serial computing when solving large problems such as the BNTE, the study focused on the design of parallel computing systems, an evolution of serial computing that uses multiple processing elements simultaneously to solve complex physical and mathematical problems. The finite element method (FEM) was used for the spatial discretization scheme, while angular discretization was accomplished by expanding the angular dependence in terms of Legendre polynomials. The eigenvalues representing the multiplication factors in the BNTE were determined by the power method. MATLAB Compiler Version 4.1 (R2009a) was used to compile the MATLAB codes of the BNTE. The implemented parallel algorithms were enabled with matlabpool, a Parallel Computing Toolbox function. The option UseParallel was set to 'always' (the default value of the option is 'never'). When those conditions held, the solvers computed estimated gradients in parallel. The parallel computing system was used to handle all the bottlenecks in the matrix generated from the finite element scheme and in each domain generated by the power method. The parallel algorithm was implemented on a Symmetric Multi Processor (SMP) cluster machine with 32-bit quad-core Intel x86 processors. Convergence rates and timings for the algorithm on the SMP cluster machine were obtained. Numerical experiments indicated the designed parallel algorithm could reach perfect speedup and had good stability and scalability. (au)
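    The eigenvalue step named above, the power method, is simple enough to sketch generically. The NumPy version below is not the study's MATLAB/FEM code; in a criticality calculation the matrix-vector product would be a full transport or fission-source sweep, which is exactly the part that benefits from parallel execution.

```python
import numpy as np

def power_method(A, tol=1e-10, max_iter=10_000):
    """Estimate the dominant eigenvalue (in magnitude) and eigenvector of A by
    repeated multiplication; in a criticality problem this dominant eigenvalue
    plays the role of the multiplication factor."""
    x = np.ones(A.shape[0])
    lam = 0.0
    for _ in range(max_iter):
        y = A @ x                      # the expensive, parallelisable step
        lam_new = np.linalg.norm(y)
        x = y / lam_new
        if abs(lam_new - lam) < tol * lam_new:
            break
        lam = lam_new
    return lam_new, x

# quick sanity check against NumPy's dense eigensolver
A = np.array([[4.0, 1.0], [2.0, 3.0]])
print(power_method(A)[0], max(abs(np.linalg.eigvals(A))))   # both ~5.0
```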

  9. Cache-aware data structure model for parallelism and dynamic load balancing

    International Nuclear Information System (INIS)

    Sridi, Marwa

    2016-01-01

    This PhD thesis is dedicated to the implementation of innovative parallel methods in the framework of fast transient fluid-structure dynamics. It improves existing methods within the EUROPLEXUS software in order to optimize the shared memory parallel strategy, complementary to the original distributed memory approach, bringing the two together into a global hybrid strategy for clusters of multi-core nodes. Starting from a sound analysis of the state of the art concerning data structuring techniques correlated to the hierarchical memory organization of current multi-processor architectures, the proposed work introduces an approach suitable for explicit time integration (i.e. with no linear system to solve at each step). A data structure of type 'Structure of arrays' is retained for the global data storage, providing flexibility and efficiency for common operations on kinematic fields (displacement, velocity and acceleration). By contrast, in the particular case of elementary operations (generic internal-force computations, as well as flux computations between cell faces for fluid models), which are particularly time consuming but localized in the program, a temporary data structure of type 'Array of structures' is used instead, to force an efficient filling of the cache memory and increase the performance of the resolution, for both serial and shared memory parallel processing. Switching from the global structure to the temporary one is based on a cell grouping strategy, following classic cache-blocking principles but handling, specifically for this work, the neighboring data necessary for the efficient treatment of ALE fluxes for cells on the group boundaries. The proposed approach is extensively tested, from the point of view of both the computation time and the access failures into cache memory, weighing the gains obtained within the elementary operations against the potential overhead generated by the data structure switch. Obtained results are very satisfactory, especially

  10. Verification of Electromagnetic Physics Models for Parallel Computing Architectures in the GeantV Project

    Energy Technology Data Exchange (ETDEWEB)

    Amadio, G.; et al.

    2017-11-22

    An intensive R&D and programming effort is required to accomplish new challenges posed by future experimental high-energy particle physics (HEP) programs. The GeantV project aims to narrow the gap between the performance of the existing HEP detector simulation software and the ideal performance achievable, exploiting latest advances in computing technology. The project has developed a particle detector simulation prototype capable of transporting particles in parallel through complex geometries, exploiting instruction-level microparallelism (SIMD and SIMT), task-level parallelism (multithreading) and high-level parallelism (MPI), leveraging both the multi-core and the many-core opportunities. We present preliminary verification results concerning the electromagnetic (EM) physics models developed for parallel computing architectures within the GeantV project. In order to exploit the potential of vectorization and accelerators and to make the physics model effectively parallelizable, advanced sampling techniques have been implemented and tested. In this paper we introduce a set of automated statistical tests in order to verify the vectorized models by checking their consistency with the corresponding Geant4 models and to validate them against experimental data.

  11. Stable isotope analysis in ice core paleoclimatology

    International Nuclear Information System (INIS)

    Bertler, N.

    2006-01-01

    Ice cores from New Zealand and the Antarctic margin provide an excellent means of addressing the lack of longer-term climate observations in the Southern Hemisphere with near instrumental quality. Their study helps us to improve our understanding of regional patterns of climate behaviour in Antarctica and its influence on New Zealand, leading to more realistic regional climate models. Such models are needed to sensibly interpret current Antarctic and New Zealand climate variability and for the development of appropriate mitigation strategies for New Zealand. (author). 27 refs., 18 figs., 2 tabs

  12. Stable isotope analysis in ice core paleoclimatology

    International Nuclear Information System (INIS)

    Bertler, N.

    2005-01-01

    Ice cores from New Zealand and the Antarctic margin provide an excellent means of addressing the lack of longer-term climate observations in the Southern Hemisphere with near instrumental quality. Their study helps us to improve our understanding of regional patterns of climate behaviour in Antarctica and its influence on New Zealand, leading to more realistic regional climate models. Such models are needed to sensibly interpret current Antarctic and New Zealand climate variability and for the development of appropriate mitigation strategies for New Zealand. (author). 27 refs., 18 figs., 3 tabs

  13. Stable isotope analysis in ice core paleoclimatology

    International Nuclear Information System (INIS)

    Bertler, N.

    2007-01-01

    Ice cores from New Zealand and the Antarctic margin provide an excellent means of addressing the lack of longer-term climate observations in the Southern Hemisphere with near instrumental quality. Their study helps us to improve our understanding of regional patterns of climate behaviour in Antarctica and its influence on New Zealand, leading to more realistic regional climate models. Such models are needed to sensibly interpret current Antarctic and New Zealand climate variability and for the development of appropriate mitigation strategies for New Zealand. (author). 27 refs., 18 figs., 2 tabs

  14. Integrated Current Balancing Transformer for Primary Parallel Isolated Boost Converter

    DEFF Research Database (Denmark)

    Sen, Gökhan; Ouyang, Ziwei; Thomsen, Ole Cornelius

    2011-01-01

    A simple, PCB compatible integrated solution is proposed for the current balancing requirement of the primary parallel isolated boost converter (PPIBC). Input inductor and the current balancing transformer are merged into the same core, which reduces the number of components allowing a cheaper...

  15. Stable isotope analysis in ice core paleoclimatology

    International Nuclear Information System (INIS)

    Bertler, N.A.N.

    2015-01-01

    Ice cores from New Zealand and the Antarctic margin provide an excellent means of addressing the lack of longer-term climate observations in the Southern Hemisphere with near instrumental quality. Ice core records provide an annual-scale, 'instrumental-quality' baseline of atmospheric temperature and circulation changes back many thousands of years. (author).

  16. Exploiting Thread Parallelism for Ocean Modeling on Cray XC Supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Sarje, Abhinav [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Jacobsen, Douglas W. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Williams, Samuel W. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Ringler, Todd [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Oliker, Leonid [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

    2016-05-01

    The incorporation of increasing core counts in modern processors used to build state-of-the-art supercomputers is driving application development towards exploitation of thread parallelism, in addition to distributed memory parallelism, with the goal of delivering efficient high-performance codes. In this work we describe the exploitation of threading and our experiences with it with respect to a real-world ocean modeling application code, MPAS-Ocean. We present detailed performance analysis and comparisons of various approaches and configurations for threading on the Cray XC series supercomputers.

  17. Efficient Execution of Video Applications on Heterogeneous Multi- and Many-Core Processors

    NARCIS (Netherlands)

    Pereira de Azevedo Filho, A.

    2011-01-01

    In this dissertation we present methodologies and evaluations aiming at increasing the efficiency of video coding applications for heterogeneous many-core processors composed of SIMD-only, scratchpad memory based cores. Our contributions are spread in three different fronts: thread-level parallelism

  18. GPU: the biggest key processor for AI and parallel processing

    Science.gov (United States)

    Baji, Toru

    2017-07-01

    Two types of processors exist in the market. One is the conventional CPU and the other is the Graphics Processing Unit (GPU). A typical CPU is composed of 1 to 8 cores, while a GPU has thousands of cores. The CPU is good for sequential processing, while the GPU is good at accelerating software with heavy parallel execution. The GPU was initially dedicated to 3D graphics. However, from 2006, when GPUs started to adopt general-purpose cores, it was recognized that this architecture can be used as a general-purpose massively parallel processor. NVIDIA developed a software framework, Compute Unified Device Architecture (CUDA), that makes it possible to easily program the GPU for these applications. With CUDA, GPUs started to be used widely in workstations and supercomputers. Recently two key technologies are highlighted in the industry: Artificial Intelligence (AI) and autonomous driving cars. AI requires massive parallel operations to train many layers of neural networks. With the CPU alone, it was impossible to finish the training in a practical time. The latest multi-GPU system with P100 makes it possible to finish the training in a few hours. For autonomous driving cars, TOPS-class performance is required to implement perception, localization and path planning processing, and again an SoC with an integrated GPU will play a key role there. In this paper, the evolution of the GPU, which is one of the biggest commercial devices requiring state-of-the-art fabrication technology, will be introduced, along with an overview of the key GPU-demanding applications described above.

  19. Parallel DC3 Algorithm for Suffix Array Construction on Many-Core Accelerators

    KAUST Repository

    Liao, Gang

    2015-05-01

    In bioinformatics applications, suffix arrays are widely used for DNA sequence alignment in the initial exact-match phase of heuristic algorithms. With the exponential growth and availability of data, using many-core accelerators, like GPUs, to optimize existing algorithms is very common. We present a new implementation of suffix array construction on the GPU. As a result, suffix array construction on the GPU achieves around 10x speedup on standard large data sets, which contain more than 100 million characters. The approach is simple, fast and scalable, and can easily be extended to multi-core processors and even heterogeneous architectures. © 2015 IEEE.
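    For readers unfamiliar with the data structure: a suffix array is just the lexicographically sorted list of suffix start positions. The naive Python construction below defines the object in a few lines; the point of the DC3 work is to build the same array in linear work, in a form that maps well onto many-core accelerators.

```python
def suffix_array_naive(s):
    """Suffix array of s by direct sorting of all suffixes.
    O(n^2 log n) and for illustration only; genome-scale inputs need a
    linear-time construction such as the DC3/skew algorithm of the paper."""
    return sorted(range(len(s)), key=lambda i: s[i:])

print(suffix_array_naive("banana"))   # [5, 3, 1, 0, 4, 2]
```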

  20. Parallel DC3 Algorithm for Suffix Array Construction on Many-Core Accelerators

    KAUST Repository

    Liao, Gang; Ma, Longfei; Zang, Guangming; Tang, Lin

    2015-01-01

    In bioinformatics applications, suffix arrays are widely used for DNA sequence alignment in the initial exact-match phase of heuristic algorithms. With the exponential growth and availability of data, using many-core accelerators, like GPUs, to optimize existing algorithms is very common. We present a new implementation of suffix array construction on the GPU. As a result, suffix array construction on the GPU achieves around 10x speedup on standard large data sets, which contain more than 100 million characters. The approach is simple, fast and scalable, and can easily be extended to multi-core processors and even heterogeneous architectures. © 2015 IEEE.

  1. Rapid changes in ice core gas records - Part 1: On the accuracy of methane synchronisation of ice cores

    Science.gov (United States)

    Köhler, P.

    2010-08-01

    Methane synchronisation is a concept for aligning ice core records during the rapid climate changes of the Dansgaard/Oeschger (D/O) events onto a common age scale. However, atmospheric gases are recorded in ice cores with a log-normal-shaped age distribution probability density function, whose exact shape depends mainly on the accumulation rate at the drilling site. This age distribution effectively shifts the mid-transition points of rapid changes in CH4 measured in situ in ice by about 58% of the width of the age distribution with respect to the atmospheric signal. A minimum dating uncertainty, or artefact, in the CH4 synchronisation is therefore embedded in the concept itself, which was not accounted for in previous error estimates. This synchronisation artefact between Greenland and Antarctic ice cores is, for GRIP and Byrd, less than 40 years, well within the dating uncertainty of CH4, and therefore does not call the overall concept of the bipolar seesaw into question. However, if the EPICA Dome C ice core is aligned via CH4 to NGRIP, this synchronisation artefact is, in the most recent unified ice core age scale (Lemieux-Dudon et al., 2010) and for LGM climate conditions, of the order of three centuries and might need consideration in future gas chronologies.
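    The artefact described above can be reproduced numerically: smoothing an atmospheric step change with a log-normal age-distribution density shifts the apparent mid-transition of the archived record. The NumPy sketch below uses arbitrary illustrative distribution parameters, not the GRIP, Byrd or EPICA Dome C values, and is only meant to show where the offset comes from.

```python
import numpy as np

# annual time axis; the atmospheric CH4 signal jumps from 0 to 1 at t = 0
t = np.arange(-1000, 1001)
atmos = (t >= 0).astype(float)

# assumed log-normal gas-age distribution (age offsets in years)
tau = np.arange(1, 601, dtype=float)
sigma, mu = 0.5, np.log(150.0)            # illustrative shape parameters
pdf = np.exp(-(np.log(tau) - mu) ** 2 / (2 * sigma ** 2)) / tau
pdf /= pdf.sum()

# signal archived in the ice = atmospheric history smoothed by the age distribution
record = np.convolve(atmos, pdf)[: len(t)]

mid_atmos = t[np.argmax(atmos >= 0.5)]    # 0 by construction
mid_ice = t[np.argmax(record >= 0.5)]
print("apparent shift of the mid-transition:", mid_ice - mid_atmos, "years")
```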

  2. SequenceL: Automated Parallel Algorithms Derived from CSP-NT Computational Laws

    Science.gov (United States)

    Cooke, Daniel; Rushton, Nelson

    2013-01-01

    With the introduction of new parallel architectures like the cell and multicore chips from IBM, Intel, AMD, and ARM, as well as the petascale processing available for high-end computing, a larger number of programmers will need to write parallel codes. Adding the parallel control structure to the sequence, selection, and iterative control constructs increases the complexity of code development, which often results in increased development costs and decreased reliability. SequenceL is a high-level programming language, that is, a programming language that is closer to a human's way of thinking than to a machine's. Historically, high-level languages have resulted in decreased development costs and increased reliability, at the expense of performance. In recent applications at JSC and in industry, SequenceL has demonstrated the usual advantages of high-level programming in terms of low cost and high reliability. SequenceL programs, however, have run at speeds typically comparable with, and in many cases faster than, their counterparts written in C and C++ when run on single-core processors. Moreover, SequenceL is able to generate parallel executables automatically for multicore hardware, gaining parallel speedups without any extra effort from the programmer beyond what is required to write the sequential/single-core code. A SequenceL-to-C++ translator has been developed that automatically renders readable multithreaded C++ from a combination of a SequenceL program and sample data input. The SequenceL language is based on two fundamental computational laws, Consume-Simplify-Produce (CSP) and Normalize-Transpose (NT), which enable it to automate the creation of parallel algorithms from high-level code that has no annotations of parallelism whatsoever. In our anecdotal experience, SequenceL development has been in every case less costly than development of the same algorithm in sequential (that is, single-core, single process) C or C++, and an order of magnitude less

  3. Beam dynamics simulations using a parallel version of PARMILA

    International Nuclear Information System (INIS)

    Ryne, R.D.

    1996-01-01

    The computer code PARMILA has been the primary tool for the design of proton and ion linacs in the United States for nearly three decades. Previously it was sufficient to perform simulations with of order 10000 particles, but recently the need to perform high resolution halo studies for next-generation, high intensity linacs has made it necessary to perform simulations with of order 100 million particles. With the advent of massively parallel computers such simulations are now within reach. Parallel computers already make it possible, for example, to perform beam dynamics calculations with tens of millions of particles, requiring over 10 GByte of core memory, in just a few hours. Also, parallel computers are becoming easier to use thanks to the availability of mature, Fortran-like languages such as Connection Machine Fortran and High Performance Fortran. We will describe our experience developing a parallel version of PARMILA and the performance of the new code

  4. Beam dynamics simulations using a parallel version of PARMILA

    International Nuclear Information System (INIS)

    Ryne, Robert

    1996-01-01

    The computer code PARMILA has been the primary tool for the design of proton and ion linacs in the United States for nearly three decades. Previously it was sufficient to perform simulations with of order 10000 particles, but recently the need to perform high resolution halo studies for next-generation, high intensity linacs has made it necessary to perform simulations with of order 100 million particles. With the advent of massively parallel computers such simulations are now within reach. Parallel computers already make it possible, for example, to perform beam dynamics calculations with tens of millions of particles, requiring over 10 GByte of core memory, in just a few hours. Also, parallel computers are becoming easier to use thanks to the availability of mature, Fortran-like languages such as Connection Machine Fortran and High Performance Fortran. We will describe our experience developing a parallel version of PARMILA and the performance of the new code. (author)

  5. A Multiproxy Approach to Unraveling Climate and Human Demography in the Peruvian Altiplano from a 5000 year Lake Sediment Core

    Science.gov (United States)

    Vaught-Mijares, R. M.; Hillman, A. L.; Abbott, M. B.; Werne, J. P.; Arkush, E.

    2017-12-01

    Drought and flood events are thought to have shaped the ways in which Andean societies have adapted to life in the Titicaca Basin region, particularly with regard to land use practices and settlement patterns. This study examines a small lake in the region, Laguna Orurillo. Water isotopes suggest that the lake primarily loses water through evaporation, making it hydrologically sensitive. In 2015, a 3.4 m overlapping sediment record was collected and inspected for evidence of shallow water facies and erosional unconformities to reconstruct paleohydrology. Sediment core chronology was established using 7 AMS radiocarbon dates and ²¹⁰Pb dating and indicates that the core spans 5000 years. Additional sediment core measurements include magnetic susceptibility, bulk density, organic/carbonate content, and XRD. Results show a pronounced change in sediment composition from brittle, angular salt deposits to massive calcareous silt and clay around 5000 years BP. Multiple transitions from clay to sand show potential lake level depressions at 1540, 2090, and 2230 yr BP that are supported by a drastic increase in carbonate composition from 2760-1600 yr BP. Additional shallow-water periods may be reflected in the presence of rip-up clasts from 4000 to 3000 yr BP. These early interpretations align well with existing hydrologic records from Lake Titicaca. In order to develop a more detailed climate and land use record, isotope analyses of authigenic carbonate minerals using δ¹³C and δ¹⁸O and leaf waxes using δD are being developed. Ultimately, this record will be linked with records from nearby Lagunas Arapa and Umayo. Additional proxies for human population such as fecal 5β-stanols and proximal anthropologic surveys will be synthesized to contribute to a regional understanding of Holocene climate variability and human demography in the Peruvian Altiplano.

  6. High performance statistical computing with parallel R: applications to biology and climate modelling

    International Nuclear Information System (INIS)

    Samatova, Nagiza F; Branstetter, Marcia; Ganguly, Auroop R; Hettich, Robert; Khan, Shiraj; Kora, Guruprasad; Li, Jiangtian; Ma, Xiaosong; Pan, Chongle; Shoshani, Arie; Yoginath, Srikanth

    2006-01-01

    Ultrascale computing and high-throughput experimental technologies have enabled the production of scientific data about complex natural phenomena. With this opportunity comes a new problem: the massive quantities of data so produced. Answers to fundamental questions about the nature of those phenomena remain largely hidden in the produced data. The goal of this work is to provide a scalable high performance statistical data analysis framework to help scientists perform interactive analyses of these raw data to extract knowledge. Towards this goal we have been developing an open source parallel statistical analysis package, called Parallel R, that lets scientists employ a wide range of statistical analysis routines on high performance shared and distributed memory architectures without having to deal with the intricacies of parallelizing these routines

  7. Implementing a low-latency parallel graphic equalizer with heterogeneous computing

    NARCIS (Netherlands)

    Norilo, Vesa; Verstraelen, Martinus Johannes Wilhelmina; Valimaki, Vesa; Svensson, Peter; Kristiansen, Ulf

    2015-01-01

    This paper describes the implementation of a recently introduced parallel graphic equalizer (PGE) in a heterogeneous way. The control and audio signal processing parts of the PGE are distributed to a PC and to a signal processor, of WaveCore architecture, respectively. This arrangement is

  8. A Two-Pass Exact Algorithm for Selection on Parallel Disk Systems.

    Science.gov (United States)

    Mi, Tian; Rajasekaran, Sanguthevar

    2013-07-01

    Numerous OLAP queries process selection operations of "top N", median, "top 5%", in data warehousing applications. Selection is a well-studied problem that has numerous applications in the management of data and databases since, typically, any complex data query can be reduced to a series of basic operations such as sorting and selection. The parallel selection has also become an important fundamental operation, especially after parallel databases were introduced. In this paper, we present a deterministic algorithm Recursive Sampling Selection (RSS) to solve the exact out-of-core selection problem, which we show needs no more than (2 + ε) passes (ε being a very small fraction). We have compared our RSS algorithm with two other algorithms in the literature, namely, the Deterministic Sampling Selection and QuickSelect on the Parallel Disks Systems. Our analysis shows that DSS is a (2 + ε)-pass algorithm when the total number of input elements N is a polynomial in the memory size M (i.e., N = M^c for some constant c). In contrast, our proposed algorithm RSS runs in (2 + ε) passes without any assumptions. Experimental results indicate that both RSS and DSS outperform QuickSelect on the Parallel Disks Systems. Especially, the proposed algorithm RSS is more scalable and robust to handle big data when the input size is far greater than the core memory size, including the case of N ≫ M^c.
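    Neither RSS nor DSS is spelled out in the abstract, but the general shape of a sampling-based two-pass selection is easy to sketch: a first pass samples the data and picks pivots expected to bracket rank k, and a second pass keeps only the elements between the pivots and finishes the selection in memory. The Python sketch below is a generic illustration of that idea, not an implementation of RSS; the sample rate and safety margin are arbitrary choices.

```python
import random

def two_pass_select(stream_factory, n, k, sample_rate=0.01):
    """Return the k-th smallest element (1-indexed) of a data set that is only
    seen as a stream, using a sampling pass followed by a filtering pass."""
    # Pass 1: random sample, pivots chosen around the expected rank of k
    sample = sorted(x for x in stream_factory() if random.random() < sample_rate)
    if not sample:
        return sorted(stream_factory())[k - 1]
    est = int(k * len(sample) / n)
    margin = max(10, int(3 * len(sample) ** 0.5))
    lo = sample[max(0, est - margin)]
    hi = sample[min(len(sample) - 1, est + margin)]
    # Pass 2: count elements below lo, collect candidates in [lo, hi]
    below, candidates = 0, []
    for x in stream_factory():
        if x < lo:
            below += 1
        elif x <= hi:
            candidates.append(x)
    candidates.sort()
    idx = k - below - 1
    if 0 <= idx < len(candidates):
        return candidates[idx]
    return sorted(stream_factory())[k - 1]   # pivots missed rank k: fall back

data = [random.random() for _ in range(100_000)]
print(two_pass_select(lambda: iter(data), len(data), 500) == sorted(data)[499])
```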

  9. A PARALLEL MONTE CARLO CODE FOR SIMULATING COLLISIONAL N-BODY SYSTEMS

    Energy Technology Data Exchange (ETDEWEB)

    Pattabiraman, Bharath; Umbreit, Stefan; Liao, Wei-keng; Choudhary, Alok; Kalogera, Vassiliki; Memik, Gokhan; Rasio, Frederic A., E-mail: bharath@u.northwestern.edu [Center for Interdisciplinary Exploration and Research in Astrophysics, Northwestern University, Evanston, IL (United States)

    2013-02-15

    We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N ≈ 10⁷ particles. Our code is based on the Hénon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures and the introduction of a parallel random number generation scheme as well as a parallel sorting algorithm required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce along with our choice of decomposition scheme minimize communication costs and ensure optimal distribution of data and workload among the processing units. Our implementation uses the Message Passing Interface library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse with the number of stars, N, spanning three orders of magnitude from 10⁵ to 10⁷. We find that our results are in good agreement with self-similar core-collapse solutions, and the core-collapse times generally agree with expectations from the literature. Also, we observe good total energy conservation, within ≲ 0.04% throughout all simulations. We analyze the performance of the code, and demonstrate near-linear scaling of the runtime with the number of processors up to 64 processors for N = 10⁵, 128 for N = 10⁶ and 256 for N = 10⁷. The runtime reaches saturation with the addition of processors beyond these limits, which is a characteristic of the parallel sorting algorithm. The resulting maximum speedups we achieve are approximately 60×, 100×, and 220×, respectively.

  10. Integrated Task And Data Parallel Programming: Language Design

    Science.gov (United States)

    Grimshaw, Andrew S.; West, Emily A.

    1998-01-01

    This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments: In February I presented a paper at Frontiers '95 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda: Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. Additional 1995 Activities: During the fall I collaborated

  11. An Extended Two-Phase Method for Accessing Sections of Out-of-Core Arrays

    Directory of Open Access Journals (Sweden)

    Rajeev Thakur

    1996-01-01

    A number of applications on parallel computers deal with very large data sets that cannot fit in main memory. In such applications, data must be stored in files on disks and fetched into memory during program execution. Parallel programs with large out-of-core arrays stored in files must read/write smaller sections of the arrays from/to files. In this article, we describe a method for accessing sections of out-of-core arrays efficiently. Our method, the extended two-phase method, uses collective I/O: processors cooperate to combine several I/O requests into fewer larger-granularity requests, to reorder requests so that the file is accessed in proper sequence, and to eliminate simultaneous I/O requests for the same data. In addition, the I/O workload is divided among processors dynamically, depending on the access requests. We present performance results obtained from two real out-of-core parallel applications – matrix multiplication and a Laplace's equation solver – and several synthetic access patterns, all on the Intel Touchstone Delta. These results indicate that the extended two-phase method significantly outperformed a direct (noncollective) method for accessing out-of-core array sections.

  12. Fast data reconstructed method of Fourier transform imaging spectrometer based on multi-core CPU

    Science.gov (United States)

    Yu, Chunchao; Du, Debiao; Xia, Zongze; Song, Li; Zheng, Weijian; Yan, Min; Lei, Zhenggang

    2017-10-01

    An imaging spectrometer acquires a two-dimensional spatial image and a one-dimensional spectrum at the same time, which is highly useful for color and spectral measurements, true-color image synthesis, military reconnaissance and so on. In order to realize fast reconstruction of Fourier transform imaging spectrometer data, the paper designs an optimized reconstruction algorithm using OpenMP parallel computing technology, which was further applied to the processing of data from the HyperSpectral Imager of the Chinese `HJ-1' satellite. The results show that the method based on multi-core parallel computing can manage the multi-core CPU hardware resources effectively and significantly improve the efficiency of the spectrum reconstruction processing. If the technology is applied to workstations with more cores, it will be possible to complete real-time data processing for a Fourier transform imaging spectrometer with a single computer.
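    Per pixel, the reconstruction is essentially an apodised FFT of the measured interferogram, which is why the workload splits so cleanly across cores. The Python sketch below mirrors that decomposition with a process pool in place of OpenMP; the Hanning window and the magnitude-spectrum shortcut are simplifying assumptions, not the HJ-1 processing chain.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def reconstruct_pixel(interferogram):
    """Apodise one pixel's interferogram and return its magnitude spectrum
    (a simplified stand-in for the instrument's full calibration chain)."""
    window = np.hanning(interferogram.size)
    return np.abs(np.fft.rfft(interferogram * window))

def reconstruct_cube(cube, workers=4):
    """Reconstruct every spatial pixel of an (ny, nx, n_opd) interferogram
    cube in parallel across processes, mirroring the per-pixel multi-core
    decomposition described in the paper."""
    ny, nx, n_opd = cube.shape
    pixels = list(cube.reshape(-1, n_opd))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        spectra = list(pool.map(reconstruct_pixel, pixels, chunksize=256))
    return np.stack(spectra).reshape(ny, nx, -1)

if __name__ == "__main__":
    cube = np.random.rand(64, 64, 512)          # synthetic interferogram cube
    print(reconstruct_cube(cube).shape)         # (64, 64, 257)
```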

  13. Electromagnetic Physics Models for Parallel Computing Architectures

    International Nuclear Information System (INIS)

    Amadio, G; Bianchini, C; Iope, R; Ananya, A; Apostolakis, J; Aurora, A; Bandieramonte, M; Brun, R; Carminati, F; Gheata, A; Gheata, M; Goulas, I; Nikitina, T; Bhattacharyya, A; Mohanty, A; Canal, P; Elvira, D; Jun, S Y; Lima, G; Duhem, L

    2016-01-01

    The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well. (paper)

  14. SCORPIO - WWER core surveillance system

    International Nuclear Information System (INIS)

    Hornaes, Arne; Bodal, Terje; Sunde, Svein; Zalesky, K.; Lehman, M.; Pecka, M.; Svarny, J.; Krysl, V.; Juzova, Z.; Sedlak, A.; Semmler, M.

    1998-01-01

    The Institutt for energiteknikk has developed the core surveillance system SCORPIO, which has two parallel modes of operation: the Core Follow Mode and the Predictive Mode. The main motivation behind the development of SCORPIO is to make a practical tool for reactor operators which can increase the quality and quantity of information presented on core status and dynamic behavior. This can first of all improve plant safety, as undesired core conditions are detected and prevented. Secondly, more flexible and efficient plant operation is made possible. The system has been implemented on western PWRs, but the basic concept is applicable to a wide range of reactors including WWERs. The main differences between WWERs and typical western PWRs with respect to core surveillance requirements are outlined. The development of a WWER version of SCORPIO has been done in co-operation with the Nuclear Research Institute Rez, and industry partners in the Czech Republic. The first system is installed at Dukovany NPP, where the Site Acceptance Test was completed on 6 March 1998. (Authors)

  15. SCORPIO - VVER core surveillance system

    International Nuclear Information System (INIS)

    Hornaes, A.; Bodal, T.; Sunde, S.

    1998-01-01

    The Institutt for energiteknikk has developed the core surveillance system SCORPIO, which has two parallel modes of operation: the Core Follow Mode and the Predictive Mode. The main motivation behind the development of SCORPIO is to make a practical tool for reactor operators, which can increase the quality and quantity of information presented on core status and dynamic behavior. This can first of all improve plant safety, as undesired core conditions are detected and prevented. Secondly, more flexible and efficient plant operation is made possible. The system has been implemented on western PWRs, but the basic concept is applicable to a wide range of reactors including VVERs. The main differences between VVERs and typical western PWRs with respect to core surveillance requirements are outlined. The development of a VVER version of SCORPIO has been done in co-operation with the Nuclear Research Institute Rez, and industry partners in the Czech Republic. The first system is installed at Dukovany NPP, where the Site Acceptance Test was completed on 6 March 1998. (author)

  16. Optimizing the Performance of Reactive Molecular Dynamics Simulations for Multi-core Architectures

    Energy Technology Data Exchange (ETDEWEB)

    Aktulga, Hasan Metin [Michigan State Univ., East Lansing, MI (United States); Coffman, Paul [Argonne National Lab. (ANL), Argonne, IL (United States); Shan, Tzu-Ray [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Knight, Chris [Argonne National Lab. (ANL), Argonne, IL (United States); Jiang, Wei [Argonne National Lab. (ANL), Argonne, IL (United States)

    2015-12-01

    Hybrid parallelism allows high performance computing applications to better leverage the increasing on-node parallelism of modern supercomputers. In this paper, we present a hybrid parallel implementation of the widely used LAMMPS/ReaxC package, where the construction of bonded and nonbonded lists and evaluation of complex ReaxFF interactions are implemented efficiently using OpenMP parallelism. Additionally, the performance of the QEq charge equilibration scheme is examined and a dual-solver is implemented. We present the performance of the resulting ReaxC-OMP package on a state-of-the-art multi-core architecture Mira, an IBM BlueGene/Q supercomputer. For system sizes ranging from 32 thousand to 16.6 million particles, speedups in the range of 1.5-4.5x are observed using the new ReaxC-OMP software. Sustained performance improvements have been observed for up to 262,144 cores (1,048,576 processes) of Mira with a weak scaling efficiency of 91.5% in larger simulations containing 16.6 million particles.

  17. Parallel efficient rate control methods for JPEG 2000

    Science.gov (United States)

    Martínez-del-Amor, Miguel Á.; Bruns, Volker; Sparenberg, Heiko

    2017-09-01

    Since the introduction of JPEG 2000, several rate control methods have been proposed. Among them, post-compression rate-distortion optimization (PCRD-Opt) is the most widely used, and the one recommended by the standard. The approach followed by this method is to first compress the entire image split into code blocks, and subsequently, optimally truncate the set of generated bit streams according to the maximum target bit rate constraint. The literature proposes various strategies on how to estimate ahead of time where a block will get truncated in order to stop the execution prematurely and save time. However, none of them has been designed with a parallel implementation in mind. Today, multi-core and many-core architectures are becoming popular for JPEG 2000 codec implementations. Therefore, in this paper, we analyze how some techniques for efficient rate control can be deployed in GPUs. In order to do that, the design of our GPU-based codec is extended, allowing the process to be stopped at a given point. This extension also harnesses a higher level of parallelism on the GPU, leading to a speedup of up to 40% with 4K test material on a Titan X. In a second step, three selected rate control methods are adapted and implemented in our parallel encoder. A comparison is then carried out and used to select the best candidate to be deployed in a GPU encoder, which gave an extra 40% speedup in those situations where it was actually employed.
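    The selection step at the heart of PCRD-Opt can be stated compactly: every coding pass of every code block has a byte cost and a distortion reduction, passes within a block must be kept in order, and the encoder keeps accepting the pass with the steepest distortion-per-byte slope until the rate budget is spent. The greedy Python sketch below illustrates only that selection logic, without the convex-hull pruning of a real encoder and without any of the GPU concerns discussed in the paper.

```python
import heapq

def pcrd_truncate(blocks, budget_bytes):
    """blocks[b] lists the (size_bytes, distortion_gain) coding passes of code
    block b in coding order. Passes are accepted greedily by distortion
    reduction per byte while respecting per-block pass order; returns the
    truncation point of each block and the bytes actually spent."""
    heap = []
    for b, passes in enumerate(blocks):
        if passes:
            size, gain = passes[0]
            heapq.heappush(heap, (-gain / size, b, 0))
    spent, cut = 0, [0] * len(blocks)
    while heap:
        _, b, p = heapq.heappop(heap)
        size, gain = blocks[b][p]
        if spent + size <= budget_bytes:
            spent += size
            cut[b] = p + 1                      # keep block b up to pass p
            if p + 1 < len(blocks[b]):
                nsize, ngain = blocks[b][p + 1]
                heapq.heappush(heap, (-ngain / nsize, b, p + 1))
    return cut, spent

# toy example: two code blocks with three passes each, 300-byte budget
blocks = [[(100, 50.0), (80, 20.0), (60, 5.0)],
          [(120, 70.0), (90, 30.0), (70, 8.0)]]
print(pcrd_truncate(blocks, budget_bytes=300))   # ([2, 1], 300)
```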

  18. CITYZEN climate impact studies

    Energy Technology Data Exchange (ETDEWEB)

    Schutz, Martin (ed.)

    2011-07-01

    We have estimated the impact of climate change on the chemical composition of the troposphere, comparing the current climate (2000-2010) with conditions 40 years ahead (2040-2050). The climate projection was made with the ECHAM5 model and was followed by chemistry-transport modelling using a global model, Oslo CTM2 (Isaksen et al., 2005; Søvde et al., 2008), and a regional model, EMEP. In this report we focus on carbon monoxide (CO) and surface ozone (O3), which are measures of primary and secondary air pollution. In parallel we have estimated the change in the same air pollutants resulting from changes in emissions over the same time period. (orig.)

  19. Long-term Records of Pacific Salmon Abundance From Sediment Core Analysis: Relationships to Past Climatic Change, and Implications for the Future

    Science.gov (United States)

    Finney, B.

    2002-12-01

    The response of Pacific salmon to future climatic change is uncertain, but will have large impacts on the economy, culture and ecology of the North Pacific Rim. Relationships between sockeye salmon populations and climatic change can be determined by analyzing sediment cores from lakes where sockeye return to spawn. Sockeye salmon return to their natal lake system to spawn and subsequently die following 2-3 years of feeding in the North Pacific Ocean. Sockeye salmon abundance can be reconstructed from stable nitrogen isotope analysis of lake sediment cores as returning sockeye transport significant quantities of N, relatively enriched in N-15, from the ocean to freshwater systems. Temporal changes in the input of salmon-derived N, and hence salmon abundance, can be quantified through downcore analysis of N isotopes. Reconstructions of sockeye salmon abundance from lakes in several regions of Alaska show similar temporal patterns, with variability occurring on decadal to millennial timescales. Over the past 2000 years, shifts in sockeye salmon abundance far exceed the historical decadal-scale variability. A decline occurred from about 100 BC - 800 AD, but salmon were consistently more abundant 1200 - 1900 AD. Declines since 1900 AD coincide with the period of extensive commercial fishing. Correspondence between these records and paleoclimatic data suggests that changes in salmon abundance are related to large-scale climatic changes over the North Pacific. For example, the increase in salmon abundance ca. 1200 AD corresponds to a period of glacial advance in southern Alaska, and a shift to drier conditions in western North America. Although the regionally coherent patterns in reconstructed salmon abundance are consistent with the hypothesis that climate is an important driver, the relationships do not always follow patterns observed in the 20th century. A main feature of recorded climate variability in this region is the alternation between multi-decade periods of
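    The quantitative step behind such reconstructions is typically a two-end-member isotope mixing model: sediment δ15N is treated as a mixture of marine-derived (salmon) nitrogen and terrestrial/watershed nitrogen. The sketch below uses illustrative end-member values, not the calibrated values for these Alaskan lakes.

```python
def salmon_derived_fraction(d15n_sediment, d15n_terrestrial=2.5, d15n_marine=11.0):
    """Fraction of sedimentary nitrogen attributable to marine-derived (salmon)
    sources under a simple two-end-member mixing model. The end-member
    delta-15N values used here are illustrative assumptions."""
    return (d15n_sediment - d15n_terrestrial) / (d15n_marine - d15n_terrestrial)

# a downcore horizon measured at +6.0 per mil implies ~41% salmon-derived N
print(round(salmon_derived_fraction(6.0), 2))
```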

  20. Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms

    KAUST Repository

    Quintin, Jean-Noel

    2013-10-01

    Matrix multiplication is a very important computation kernel both in its own right as a building block of many scientific applications and as a popular representative for other scientific applications. Cannon's algorithm which dates back to 1969 was the first efficient algorithm for parallel matrix multiplication providing theoretically optimal communication cost. However this algorithm requires a square number of processors. In the mid-1990s, the SUMMA algorithm was introduced. SUMMA overcomes the shortcomings of Cannon's algorithm as it can be used on a nonsquare number of processors as well. Since then the number of processors in HPC platforms has increased by two orders of magnitude making the contribution of communication in the overall execution time more significant. Therefore, the state of the art parallel matrix multiplication algorithms should be revisited to reduce the communication cost further. This paper introduces a new parallel matrix multiplication algorithm, Hierarchical SUMMA (HSUMMA), which is a redesign of SUMMA. Our algorithm reduces the communication cost of SUMMA by introducing a two-level virtual hierarchy into the two-dimensional arrangement of processors. Experiments on an IBM BlueGene/P demonstrate the reduction of communication cost up to 2.08 times on 2048 cores and up to 5.89 times on 16384 cores. © 2013 IEEE.
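
    For readers unfamiliar with SUMMA itself, the following minimal MPI sketch shows the broadcast-based structure that HSUMMA builds its hierarchy on: at each step the owning process column broadcasts a panel of A along grid rows, the owning process row broadcasts a panel of B along grid columns, and every process applies a local update. The square grid, one-block-per-process layout and matrix size are simplifying assumptions; this is plain SUMMA, not the two-level HSUMMA scheme of the paper.

      #include <cmath>
      #include <mpi.h>
      #include <vector>

      // Plain SUMMA on a q x q process grid, one (n/q) x (n/q) block of A, B, C per process.
      int main(int argc, char** argv) {
          MPI_Init(&argc, &argv);
          int rank, p;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &p);
          const int q = static_cast<int>(std::lround(std::sqrt(static_cast<double>(p)))); // assumes p is a perfect square
          const int n = 512;                      // global matrix size (assumption, divisible by q)
          const int nb = n / q;                   // local block size
          const int row = rank / q, col = rank % q;

          MPI_Comm rowComm, colComm;
          MPI_Comm_split(MPI_COMM_WORLD, row, col, &rowComm);   // processes sharing a grid row
          MPI_Comm_split(MPI_COMM_WORLD, col, row, &colComm);   // processes sharing a grid column

          std::vector<double> A(nb * nb, 1.0), B(nb * nb, 1.0), C(nb * nb, 0.0);
          std::vector<double> Abuf(nb * nb), Bbuf(nb * nb);

          for (int k = 0; k < q; ++k) {           // k-th block column of A / block row of B
              if (col == k) Abuf = A;
              if (row == k) Bbuf = B;
              MPI_Bcast(Abuf.data(), nb * nb, MPI_DOUBLE, k, rowComm);  // A panel along the row
              MPI_Bcast(Bbuf.data(), nb * nb, MPI_DOUBLE, k, colComm);  // B panel along the column
              for (int i = 0; i < nb; ++i)        // local update C += Abuf * Bbuf
                  for (int kk = 0; kk < nb; ++kk)
                      for (int j = 0; j < nb; ++j)
                          C[i * nb + j] += Abuf[i * nb + kk] * Bbuf[kk * nb + j];
          }
          MPI_Comm_free(&rowComm);
          MPI_Comm_free(&colComm);
          MPI_Finalize();
          return 0;
      }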

  1. Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms

    KAUST Repository

    Quintin, Jean-Noel; Hasanov, Khalid; Lastovetsky, Alexey

    2013-01-01

    Matrix multiplication is a very important computation kernel both in its own right as a building block of many scientific applications and as a popular representative for other scientific applications. Cannon's algorithm which dates back to 1969 was the first efficient algorithm for parallel matrix multiplication providing theoretically optimal communication cost. However this algorithm requires a square number of processors. In the mid-1990s, the SUMMA algorithm was introduced. SUMMA overcomes the shortcomings of Cannon's algorithm as it can be used on a nonsquare number of processors as well. Since then the number of processors in HPC platforms has increased by two orders of magnitude making the contribution of communication in the overall execution time more significant. Therefore, the state of the art parallel matrix multiplication algorithms should be revisited to reduce the communication cost further. This paper introduces a new parallel matrix multiplication algorithm, Hierarchical SUMMA (HSUMMA), which is a redesign of SUMMA. Our algorithm reduces the communication cost of SUMMA by introducing a two-level virtual hierarchy into the two-dimensional arrangement of processors. Experiments on an IBM BlueGene/P demonstrate the reduction of communication cost up to 2.08 times on 2048 cores and up to 5.89 times on 16384 cores. © 2013 IEEE.

  2. The boat hull model : adapting the roofline model to enable performance prediction for parallel computing

    NARCIS (Netherlands)

    Nugteren, C.; Corporaal, H.

    2012-01-01

    Multi-core and many-core have been major trends for the past six years and are expected to continue for the coming decades. With these trends of parallel computing, it becomes increasingly difficult to decide on which architecture to run a given application. In this work, we use an algorithm

  3. Domain Specific Language for Geant4 Parallelization for Space-based Applications, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — A major limiting factor in HPC growth is the requirement to parallelize codes to leverage emerging architectures, especially as single core performance has plateaued...

  4. High-resolution global climate modelling: the UPSCALE project, a large-simulation campaign

    Directory of Open Access Journals (Sweden)

    M. S. Mizielinski

    2014-08-01

    Full Text Available The UPSCALE (UK on PRACE: weather-resolving Simulations of Climate for globAL Environmental risk) project constructed and ran an ensemble of HadGEM3 (Hadley Centre Global Environment Model 3) atmosphere-only global climate simulations over the period 1985–2011, at resolutions of N512 (25 km), N216 (60 km) and N96 (130 km), as used in current global weather forecasting, seasonal prediction and climate modelling respectively. Alongside these present climate simulations a parallel ensemble looking at extremes of future climate was run, using a time-slice methodology to consider conditions at the end of this century. These simulations were primarily performed using a 144 million core hour, single year grant of computing time from PRACE (the Partnership for Advanced Computing in Europe) in 2012, with additional resources supplied by the Natural Environment Research Council (NERC) and the Met Office. Almost 400 terabytes of simulation data were generated on the HERMIT supercomputer at the High Performance Computing Center Stuttgart (HLRS), and transferred to the JASMIN super-data cluster provided by the Science and Technology Facilities Council Centre for Data Archival (STFC CEDA) for analysis and storage. In this paper we describe the implementation of the project, present the technical challenges in terms of optimisation, data output, transfer and storage that such a project involves and include details of the model configuration and the composition of the UPSCALE data set. This data set is available for scientific analysis to allow assessment of the value of model resolution in both present and potential future climate conditions.

  5. Parallel Breadth-First Search on Distributed Memory Systems

    Energy Technology Data Exchange (ETDEWEB)

    Computational Research Division; Buluc, Aydin; Madduri, Kamesh

    2011-04-15

    Data-intensive, graph-based computations are pervasive in several scientific applications, and are known to be quite challenging to implement on distributed memory systems. In this work, we explore the design space of parallel algorithms for Breadth-First Search (BFS), a key subroutine in several graph algorithms. We present two highly-tuned parallel approaches for BFS on large parallel systems: a level-synchronous strategy that relies on a simple vertex-based partitioning of the graph, and a two-dimensional sparse matrix-partitioning-based approach that mitigates parallel communication overhead. For both approaches, we also present hybrid versions with intra-node multithreading. Our novel hybrid two-dimensional algorithm reduces communication times by up to a factor of 3.5, relative to a common vertex-based approach. Our experimental study identifies execution regimes in which these approaches will be competitive, and we demonstrate extremely high performance on leading distributed-memory parallel systems. For instance, for a 40,000-core parallel execution on Hopper, an AMD Magny-Cours based system, we achieve a BFS performance rate of 17.8 billion edge visits per second on an undirected graph of 4.3 billion vertices and 68.7 billion edges with skewed degree distribution.
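
    As a point of reference for the level-synchronous strategy mentioned above, here is a minimal serial sketch of level-synchronous BFS on a CSR graph; the paper distributes this same frontier-expansion structure across MPI ranks (vertex-based or 2-D matrix partitioning) and adds intra-node multithreading on top. The CSR layout and function signature are illustrative assumptions.

      #include <cstdint>
      #include <vector>

      // Level-synchronous BFS on a graph in CSR form (rowPtr/colIdx); returns each vertex's level.
      std::vector<std::int64_t> bfs(const std::vector<std::int64_t>& rowPtr,
                                    const std::vector<std::int64_t>& colIdx,
                                    std::int64_t source) {
          std::vector<std::int64_t> level(rowPtr.size() - 1, -1);
          std::vector<std::int64_t> frontier{source};
          level[source] = 0;
          for (std::int64_t depth = 1; !frontier.empty(); ++depth) {
              std::vector<std::int64_t> next;                     // the next frontier
              for (std::int64_t u : frontier)
                  for (std::int64_t e = rowPtr[u]; e < rowPtr[u + 1]; ++e) {
                      const std::int64_t v = colIdx[e];
                      if (level[v] == -1) {                       // first visit: assign level, enqueue
                          level[v] = depth;
                          next.push_back(v);
                      }
                  }
              frontier.swap(next);                                // synchronize at the end of each level
          }
          return level;
      }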

  6. Method for orienting a borehole core

    International Nuclear Information System (INIS)

    Henry, W.

    1980-01-01

    A method is described for longitudinally orienting a borehole core with respect to the longitudinal axis of the drill string which drilled said borehole core in such a manner that the original longitudinal attitude of said borehole core within the earth may be determined. At least a portion of said borehole core is partially demagnetized in steps to thereby at least partially remove in steps the artificial remanent magnetism imparted to said borehole core by said drill string. The artificial remanent magnetism is oriented substantially parallel to the longitudinal axis of said drill string. The direction and intensity of the total magnetism of said borehole core is measured at desired intervals during the partial demagnetizing procedure. An artificial remanent magnetism vector is established which extends from the final measurement of the direction and intensity of the total magnetism of said borehole core taken during said partial demagnetizing procedure towards the initial measurement of the direction and intensity of the total magnetism of said borehole core taken during said partial demagnetizing procedure. The borehole core is oriented in such a manner that said artificial remanent magnetism vector points at least substantially downwardly towards the bottom of said borehole core for a borehole in the northern hemisphere and points at least substantially upwardly towards the top of said borehole core for a borehole in the southern hemisphere

  7. Denali Ice Core MSA: A Record of North Pacific Primary Productivity

    Science.gov (United States)

    Polashenski, D.; Osterberg, E. C.; Winski, D.; Kreutz, K. J.; Wake, C. P.; Ferris, D. G.; Introne, D.; Campbell, S. W.

    2017-12-01

    The high nutrient, low chlorophyll region of the North Pacific is one of the most biologically productive marine ecosystems in the world and forms the basis of commercial, sport, and subsistence fisheries worth more than a billion dollars annually. Marine phytoplankton prove to be important both as the primary producers in these ecosystems and as a major source of biogenic sulfur emissions which have long been hypothesized to serve as a biological control on Earth's climate system. Despite their importance, the record of marine phytoplankton abundance and the flux of biogenic sulfur from these regions is not well constrained. In situ measurements of marine phytoplankton from oceanographic cruises over the past several decades are limited in both spatial and temporal resolution. Meanwhile, marine sediment records may provide insight on million year timescales, but lack decadal resolution due to slow sediment deposition rates and bioturbation. In this study, we aim to investigate changes in marine phytoplankton productivity of the northeastern subarctic Pacific Ocean (NSPO) over the twentieth century using the methanesulfonic acid (MSA) record from the Mt. Hunter ice cores drilled in Denali National Park, Alaska. These parallel, 208 meter long ice cores were drilled during the 2013 field season on the Mt. Hunter plateau (63° N, 151° W, 4,000 m above sea level). Hybrid Single-Particle Lagrangian Integrated Trajectory (HYSPLIT) modeling is used to identify likely source areas in the NSPO for MSA being transported to the core site. SeaWiFS satellite imagery allows for a direct comparison of chlorophyll a concentrations in these source areas with MSA concentrations in the core record through time. Our findings suggest that the Denali ice core MSA record reflects changes in the biological productivity of marine phytoplankton and shows a significant decline in MSA beginning in 1961. We investigate several hypotheses for potential mechanisms driving this MSA decline

  8. PLAST: parallel local alignment search tool for database comparison

    Directory of Open Access Journals (Sweden)

    Lavenier Dominique

    2009-10-01

    Full Text Available Abstract. Background: Sequence similarity searching is an important and challenging task in molecular biology, and next-generation sequencing should further strengthen the need for faster algorithms to process such vast amounts of data. At the same time, the internal architecture of current microprocessors is tending towards more parallelism, leading to the use of chips with two, four and more cores integrated on the same die. The main purpose of this work was to design an effective algorithm to fit with the parallel capabilities of modern microprocessors. Results: A parallel algorithm for comparing large genomic banks and targeting middle-range computers has been developed and implemented in PLAST software. The algorithm exploits two key parallel features of existing and future microprocessors: the SIMD programming model (SSE instruction set) and the multithreading concept (multicore). Compared to multithreaded BLAST software, tests performed on an 8-processor server have shown speedup ranging from 3 to 6 with a similar level of accuracy. Conclusion: A parallel algorithmic approach driven by the knowledge of the internal microprocessor architecture allows significant speedup to be obtained while preserving standard sensitivity for similarity search problems.

  9. The significance of volcanic ash in Greenland ice cores during the Common Era

    Science.gov (United States)

    Plunkett, G.; Pilcher, J. R.; McConnell, J. R.; Sigl, M.; Chellman, N.

    2017-12-01

    Volcanic forcing is now widely regarded as a leading natural factor in short-term climate variability. Polar ice cores provide an unrivalled and continuous record of past volcanism through their chemical and particulate content. With an almost annual precision for the Common Era, the ice core volcanic record can be combined with historical data to investigate the climate and social impacts of the eruptions. The sulfate signature in ice cores is critical for determining the possible climate effectiveness of an eruption, but the presence and characterization of volcanic ash (tephra) in the ice is requisite for establishing the source eruption so that location and eruptive style can be better factored in to climate models. Here, we review the Greenland tephra record for the Common Era, and present the results of targeted sampling for tephra of volcanic events that are of interest either because of their suspected climate and societal impacts or because of their potential as isochrons in paleoenvironmental (including ice core) archives. The majority of identifiable tephras derive from Northern Hemisphere mid- to high latitude eruptions, demonstrating the significance of northern extra-tropical volcanic regions as a source of sulfates in Greenland. A number of targets are represented by sparse or no tephra, or shards that cannot be firmly correlated with a source. We consider the challenges faced in isolating and characterizing tephra from low latitude eruptions, and the implications for accurately modelling climate response to large, tropical events. Finally, we compare the ice core tephra record with terrestrial tephrostratigraphies in the circum-North Atlantic area to evaluate the potential for intercontinental tephra linkages and the refinement of volcanic histories.

  10. Initial Assessment of Parallelization of Monte Carlo Calculation using Graphics Processing Units

    International Nuclear Information System (INIS)

    Choi, Sung Hoon; Joo, Han Gyu

    2009-01-01

    Monte Carlo (MC) simulation is an effective tool for calculating neutron transport in complex geometry. However, because Monte Carlo simulates each neutron behavior one by one, it takes a very long computing time if enough neutrons are used for high precision of calculation. Accordingly, methods that reduce the computing time are required. In a Monte Carlo code, parallel calculation is well-suited since it simulates the behavior of each neutron independently and thus parallel computation is natural. The parallelization of Monte Carlo codes, however, has traditionally been done using multiple CPUs. Driven by the global demand for high quality 3D graphics, the Graphics Processing Unit (GPU) has developed into a highly parallel, multi-core processor. This parallel processing capability of GPUs can be made available to engineering computing once a suitable interface is provided. Recently, NVIDIA introduced CUDA, a general purpose parallel computing architecture. CUDA is a software environment that allows developers to manage GPUs using C/C++ or other languages. In this work, a GPU-based Monte Carlo code is developed and an initial assessment of its parallel performance is investigated.
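
    The property being exploited, that each particle history is independent, can be illustrated with a toy example. The sketch below is a 1-D slab-attenuation Monte Carlo written in C++ with OpenMP rather than CUDA; the cross section, slab width and scattering model are arbitrary assumptions, and it is only meant to show why the method parallelizes so naturally across particle histories.

      #include <cmath>
      #include <cstdio>
      #include <random>

      int main() {
          const double sigmaT = 1.0;        // total macroscopic cross section (arbitrary assumption)
          const double slabWidth = 5.0;     // slab thickness in mean free paths (arbitrary assumption)
          const long histories = 1000000;
          long transmitted = 0;

      #pragma omp parallel reduction(+ : transmitted)   // each thread runs independent histories
          {
              std::mt19937_64 rng(std::random_device{}());
              std::uniform_real_distribution<double> uni(0.0, 1.0);
      #pragma omp for
              for (long h = 0; h < histories; ++h) {
                  double x = 0.0, mu = 1.0;                         // start on the near face, moving inward
                  while (true) {
                      x += mu * (-std::log(uni(rng)) / sigmaT);     // sample flight distance to next collision
                      if (x >= slabWidth) { ++transmitted; break; } // leaked out the far side
                      if (x < 0.0) break;                           // leaked back out the near side
                      mu = 2.0 * uni(rng) - 1.0;                    // isotropic scatter (toy physics)
                  }
              }
          }
          std::printf("transmitted fraction = %f\n",
                      static_cast<double>(transmitted) / static_cast<double>(histories));
          return 0;
      }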

  11. Summary of multi-core hardware and programming model investigations

    Energy Technology Data Exchange (ETDEWEB)

    Kelly, Suzanne Marie; Pedretti, Kevin Thomas Tauke; Levenhagen, Michael J.

    2008-05-01

    This report summarizes our investigations into multi-core processors and programming models for parallel scientific applications. The motivation for this study was to better understand the landscape of multi-core hardware, future trends, and the implications on system software for capability supercomputers. The results of this study are being used as input into the design of a new open-source light-weight kernel operating system being targeted at future capability supercomputers made up of multi-core processors. A goal of this effort is to create an agile system that is able to adapt to and efficiently support whatever multi-core hardware and programming models gain acceptance by the community.

  12. Parallel implementation of DNA sequences matching algorithms using PWM on GPU architecture.

    Science.gov (United States)

    Sharma, Rahul; Gupta, Nitin; Narang, Vipin; Mittal, Ankush

    2011-01-01

    Positional Weight Matrices (PWMs) are widely used in the representation and detection of Transcription Factor Binding Sites (TFBSs) on DNA. We implement an online PWM search algorithm on a parallel architecture. Large PWM data sets can be processed on Graphics Processing Unit (GPU) systems in parallel, which can help in matching sequences at a faster rate. Our method makes extensive use of the highly multithreaded architecture and shared memory of multi-core GPUs. An efficient use of shared memory is required to optimise parallel reduction in CUDA. Our optimised method achieves a speedup of 230-280x over the linear implementation, on a GeForce GTX 280 GPU.
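
    As an illustration of why PWM scanning maps well onto massively parallel hardware, the sketch below scores every window of a DNA sequence against a log-odds PWM; each window is independent, so the outer loop can be distributed across GPU threads (here it is simply an OpenMP loop). The types, the OpenMP pragma and the A/C/G/T encoding are illustrative assumptions, not the authors' CUDA kernel.

      #include <array>
      #include <cstddef>
      #include <string>
      #include <vector>

      using PWM = std::vector<std::array<double, 4>>;   // one row of log-odds scores per motif position

      static int baseIndex(char c) {                    // map A/C/G/T to a column index (T and others -> 3)
          switch (c) { case 'A': return 0; case 'C': return 1; case 'G': return 2; default: return 3; }
      }

      // Score every window of 'seq' against the PWM; windows scoring above some threshold
      // would be reported as putative TFBS hits. Each window is independent of the others.
      std::vector<double> scanSequence(const std::string& seq, const PWM& pwm) {
          const std::size_t m = pwm.size();
          const std::size_t nWindows = seq.size() >= m ? seq.size() - m + 1 : 0;
          std::vector<double> scores(nWindows, 0.0);
      #pragma omp parallel for                           // windows can be scored in parallel
          for (long long i = 0; i < static_cast<long long>(nWindows); ++i) {
              double s = 0.0;
              for (std::size_t j = 0; j < m; ++j)
                  s += pwm[j][baseIndex(seq[static_cast<std::size_t>(i) + j])];
              scores[static_cast<std::size_t>(i)] = s;
          }
          return scores;
      }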

  13. Parallel computing for homogeneous diffusion and transport equations in neutronics; Calcul parallele pour les equations de diffusion et de transport homogenes en neutronique

    Energy Technology Data Exchange (ETDEWEB)

    Pinchedez, K

    1999-06-01

    Parallel computing meets the ever-increasing requirements for neutronic computer code speed and accuracy. In this work, two different approaches have been considered. We first parallelized the sequential algorithm used by the neutronics code CRONOS developed at the French Atomic Energy Commission. The algorithm computes the dominant eigenvalue associated with PN simplified transport equations by a mixed finite element method. Several parallel algorithms have been developed on distributed memory machines. The performances of the parallel algorithms have been studied experimentally by implementation on a T3D Cray and theoretically by complexity models. A comparison of various parallel algorithms has confirmed the chosen implementations. We next applied a domain sub-division technique to the two-group diffusion eigenproblem. In the modal synthesis-based method, the global spectrum is determined from the partial spectra associated with sub-domains. Then the eigenproblem is expanded on a family composed, on the one hand, of eigenfunctions associated with the sub-domains and, on the other hand, of functions corresponding to the contribution from the interface between the sub-domains. For a 2-D homogeneous core, this modal method has been validated and its accuracy has been measured. (author)

  14. Ice Sheets & Ice Cores

    DEFF Research Database (Denmark)

    Mikkelsen, Troels Bøgeholm

    Since the discovery of the Ice Ages it has been evident that Earth’s climate is liable to undergo dramatic changes. The previous climatic period, known as the Last Glacial, saw large oscillations in the extent of ice sheets covering the Northern hemisphere. Understanding these oscillations, known ... The first part concerns time series analysis of ice core data obtained from the Greenland Ice Sheet. We analyze parts of the time series where DO-events occur using the so-called transfer operator and compare the results with time series from a simple model capable of switching by either undergoing ...

  15. Patterns of volcanism, weathering, and climate history from high-resolution geochemistry of the BINGO core, Mono Lake, California, USA

    Science.gov (United States)

    Zimmerman, S. R.; Starratt, S.; Hemming, S. R.

    2012-12-01

    Mono Lake, California is a closed-basin lake on the east side of the Sierra Nevada, and inflow from snowmelt dominates the modern hydrology. Changes in wetness during the last glacial period (>12,000 years ago) and over the last 2,000 years have been extensively described, but are poorly known for the intervening period. We have recovered a 6.25 m-long core from ~3 m of water in the western embayment of Mono Lake, which is shown by initial radiocarbon dates to cover at least the last 10,000 years. The sediments of the core are variable, ranging from black to gray silts near the base, laminated olive-green silt through the center, to layers of peach-colored carbonate nodules interbedded with gray and olive silts and pea-green organic ooze. Volcanic tephras from Bodie and Adobe Hills to the north, east, and south. The rhyolitic tephras of the Mono-Inyo Craters are much lower in TiO2 than the bedrock (10,000 calibrated years before present (cal yr BP) higher in the core, and significant disruption of the fine layers, this interval likely indicates a relatively deep lake persisting into the early Holocene, after the initial dramatic regression from late Pleistocene levels. The finely laminated olive-green silt of the period ~10,700 to ~7500 cal yr BP is very homogenous chemically, probably indicating a stable, stratified lake and a relatively wet climate. This section merits mm-scale scanning and petrographic examination in the future. The upper boundary of the laminated section shows rising Ca/K and decreasing Ti and Si/K, marking the appearance of authigenic carbonate layers. After ~7500 cal yr BP, the sediment in BINGO becomes highly variable, with increased occurrence of tephra layers and carbonate, indicating a lower and more variable lake level. A short interval of olive-green, laminated fine sand/silt just above a radiocarbon date of 3870 ± 360 cal yr BP may record the Dechambeau Ranch highstand of Stine (1990; PPP v. 78 pp 333-381), and is marked by a distinct

  16. Paleoceanographic records in the sedimentary cores from the middle Okinawa Trough

    Institute of Scientific and Technical Information of China (English)

    2003-01-01

    Two gravity piston cores (Cores 155 and 180) involved in this study were collected from the middle Okinawa Trough. Stratigraphy of the two cores was divided and classified based on the features of planktonic foraminifera oxygen isotope changes together with depositional sequence, millennium-scale climatic event comparison, carbonate cycles and AMS 14C dating. Some paleoclimatic information contained in the sediments of these cores was extracted to discuss patterns of paleoclimatic change and the short-timescale events present in the interglacial period. Analysis of the variation of oxygen isotope values in stage two shows that the middle part of the Okinawa Trough may have been affected by fresh water from the Yellow River and the Yangtze River during the Last Glacial Maximum (LGM). The oscillation ranges of the oxygen isotope values in the cores verify that the marginal sea has an amplifying effect on climate changes. The δ13C of the benthic foraminifera Uvigerina was lighter in the glacial period than in the interglacial period, which indicates that the Paleo-Kuroshio's main stream moved eastward and its influence area decreased. According to the temperature difference during the "YD" period recorded in Core 180 and other data, we can reach the conclusion that the climatic changes in the middle Okinawa Trough area were controlled by global climatic changes, but some regional factors also had considerable influence on the climate changes. Some results in this paper support Fairbanks's point that the "YD" event was a brief stagnation of sea-level rise during the global warming process. Moreover, the falling of sea level in the glacial period weakened the exchange between the bottom water of the Okinawa Trough and the deep water of the northwestern Pacific Ocean and resulted in a low-oxygen state of the bottom water in this area. These processes are the reason why the carbonate cycle in the Okinawa Trough area is consistent with the "Atlantic type" carbonate cycle.

  17. ClimateSpark: An in-memory distributed computing framework for big climate data analytics

    Science.gov (United States)

    Hu, Fei; Yang, Chaowei; Schnase, John L.; Duffy, Daniel Q.; Xu, Mengchao; Bowen, Michael K.; Lee, Tsengdar; Song, Weiwei

    2018-06-01

    The unprecedented growth of climate data creates new opportunities for climate studies, and yet big climate data pose a grand challenge to climatologists to efficiently manage and analyze big data. The complexity of climate data content and analytical algorithms increases the difficulty of implementing algorithms on high performance computing systems. This paper proposes an in-memory, distributed computing framework, ClimateSpark, to facilitate complex big data analytics and time-consuming computational tasks. A chunked data structure improves parallel I/O efficiency, while a spatiotemporal index built over the chunks avoids unnecessary data reading and preprocessing. An integrated, multi-dimensional, array-based data model (ClimateRDD) and ETL operations are developed to address big climate data variety by integrating the processing components of the climate data lifecycle. ClimateSpark utilizes Spark SQL and Apache Zeppelin to develop a web portal to facilitate the interaction among climatologists, climate data, analytic operations and computing resources (e.g., using SQL query and Scala/Python notebook). Experimental results show that ClimateSpark conducts different spatiotemporal data queries/analytics with high efficiency and data locality. ClimateSpark is easily adaptable to other big multiple-dimensional, array-based datasets in various geoscience domains.

  18. Parallel Transport with Sheath and Collisional Effects in Global Electrostatic Turbulent Transport in FRCs

    Science.gov (United States)

    Bao, Jian; Lau, Calvin; Kuley, Animesh; Lin, Zhihong; Fulton, Daniel; Tajima, Toshiki; Tri Alpha Energy, Inc. Team

    2017-10-01

    Collisional and turbulent transport in a field reversed configuration (FRC) is studied by global particle simulation using GTC (gyrokinetic toroidal code). The global FRC geometry is incorporated in GTC by using a field-aligned mesh in cylindrical coordinates, which enables global simulation coupling the core and scrape-off layer (SOL) across the separatrix. Furthermore, fully kinetic ions are implemented in GTC to treat the magnetic-null point in the FRC core. Both global simulations coupling the core and SOL regions and independent SOL-region simulations have been carried out to study turbulence. In this work, the ``logical sheath boundary condition'' is implemented to study parallel transport in the SOL. This method helps to relax the time and spatial steps without resolving the electron plasma frequency and Debye length, which enables turbulent transport simulations with sheath effects. We will study collisional and turbulent SOL parallel transport with mirror geometry and sheath boundary condition in the C2-W divertor.

  19. Rapid evolution of phenology during range expansion with recent climate change.

    Science.gov (United States)

    Lustenhouwer, Nicky; Wilschut, Rutger A; Williams, Jennifer L; van der Putten, Wim H; Levine, Jonathan M

    2018-02-01

    Although climate warming is expected to make habitat beyond species' current cold range edge suitable for future colonization, this new habitat may present an array of biotic or abiotic conditions not experienced within the current range. Species' ability to shift their range with climate change may therefore depend on how populations evolve in response to such novel environmental conditions. However, due to the recent nature of thus far observed range expansions, the role of rapid adaptation during climate change migration is only beginning to be understood. Here, we evaluated evolution during the recent native range expansion of the annual plant Dittrichia graveolens, which is spreading northward in Europe from the Mediterranean region. We examined genetically based differentiation between core and edge populations in their phenology, a trait that is likely under selection with shorter growing seasons and greater seasonality at northern latitudes. In parallel common garden experiments at range edges in Switzerland and the Netherlands, we grew plants from Dutch, Swiss, and central and southern French populations. Population genetic analysis following RAD-sequencing of these populations supported the hypothesized central France origins of the Swiss and Dutch range edge populations. We found that in both common gardens, northern plants flowered up to 4 weeks earlier than southern plants. This differentiation in phenology extended from the core of the range to the Netherlands, a region only reached from central France over approximately the last 50 years. Fitness decreased as plants flowered later, supporting the hypothesized benefits of earlier flowering at the range edge. Our results suggest that native range expanding populations can rapidly adapt to novel environmental conditions in the expanded range, potentially promoting their ability to spread. © 2017 John Wiley & Sons Ltd.

  20. EEA core set of indicators. Guide

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2005-07-01

    This guide provides information on the quality of the 37 indicators in the EEA core set. Its primary role is to support improved implementation of the core set in the EEA, European topic centres and the European environment information and observation network (Eionet). In parallel, it is aimed at helping users outside the EEA/Eionet system make best use of the indicators in their own work. It is hoped that the guide will promote cooperation on improving indicator methodologies and data quality as part of the wider process to streamline and improve environmental reporting in the European Union and beyond. (au)

  1. Design of electromagnetic field FDTD multi-core parallel program based on OpenMP

    Institute of Scientific and Technical Information of China (English)

    吕忠亭; 张玉强; 崔巍

    2013-01-01

    The method of electromagnetic field FDTD multi-core parallel program design based on OpenMP is discussed, with the aim of achieving a more substantial performance improvement when the method is applied to more sophisticated algorithms. For a one-dimensional electromagnetic field FDTD algorithm, the calculation method and process are described briefly. In a Fortran language environment, parallelism is achieved with OpenMP in a fine-grained parallel way, that is, parallel computation is performed only for the loop part. The parallel method was verified in a three-dimensional transient electromagnetic field FDTD program for dipole radiation. The parallel algorithm achieved a faster speedup and higher efficiency than other parallel FDTD algorithms. The results indicate that the electromagnetic field FDTD parallel algorithm based on OpenMP has very good speedup and efficiency.
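
    A minimal sketch of this fine-grained approach is given below: only the field-update loops of a 1-D FDTD time step are parallelized with OpenMP, while the time-marching loop stays sequential. It is written in C++ with OpenMP pragmas (the original work uses Fortran), and the grid size, update coefficients and source term are arbitrary assumptions.

      #include <cmath>
      #include <vector>

      int main() {
          const int nz = 2000, nt = 1000;                  // grid points and time steps (assumptions)
          std::vector<double> ez(nz, 0.0), hy(nz, 0.0);    // electric and magnetic field arrays
          for (int t = 0; t < nt; ++t) {                   // time marching stays sequential
      #pragma omp parallel for                             // magnetic-field update loop runs in parallel
              for (int k = 0; k < nz - 1; ++k)
                  hy[k] += 0.5 * (ez[k + 1] - ez[k]);
      #pragma omp parallel for                             // electric-field update loop runs in parallel
              for (int k = 1; k < nz; ++k)
                  ez[k] += 0.5 * (hy[k] - hy[k - 1]);
              ez[nz / 2] += std::exp(-0.01 * (t - 30.0) * (t - 30.0));   // soft Gaussian source
          }
          return 0;
      }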

  2. Core cooling system for reactor

    International Nuclear Information System (INIS)

    Kondo, Ryoichi; Amada, Tatsuo.

    1976-01-01

    Purpose: To improve the residual heat dissipation function of the reactor core in case of emergency by providing a secondary cooling system flow channel, through which flows fluid that has undergone heat exchange with the fluid flowing in a primary cooling system flow channel, with a core residual heat removal system in parallel with a main cooling system provided with a steam generator. Constitution: Heat generated in the core during normal reactor operation is transferred from a primary cooling system flow channel to a secondary cooling system flow channel through a main heat exchanger and then transferred through a steam generator to a water-steam system flow channel. In the event that removal of heat from the core by the main cooling system becomes impossible due to a cause such as breakage of the duct line of the primary cooling system flow channel or a trouble in a primary cooling system pump, a flow control valve is opened and the steam generator inlet and outlet valves are closed, thus increasing the flow rate in the core residual heat removal system. Thereafter, a blower is started to cause dissipation of the core residual heat from the flow channel of a system for heat dissipation to the atmosphere. (Seki, T.)

  3. Extensive lake sediment coring survey on Sub-Antarctic Indian Ocean Kerguelen Archipelago (French Austral and Antarctic Lands)

    Science.gov (United States)

    Arnaud, Fabien; Fanget, Bernard; Malet, Emmanuel; Poulenard, Jérôme; Støren, Eivind; Leloup, Anouk; Bakke, Jostein; Sabatier, Pierre

    2016-04-01

    Recent paleo-studies have revealed southern high-latitude climate evolution patterns that are crucial for understanding global climate evolution (1,2). Among others, the strength and north-south shifts of the westerly winds appeared to be a key parameter (3). However, virtually no land is located south of the 45th parallel south between South Georgia (60°W) and New Zealand (170°E), precluding the establishment of paleoclimate records of past westerly dynamics. Located around 50°S and 70°E, lost in the middle of the sub-Antarctic Indian Ocean, the Kerguelen archipelago is a major, geomorphologically complex land mass that is covered by hundreds of lakes of various sizes. It hence offers a unique opportunity to reconstruct past climate and environment dynamics in a region where virtually nothing is known about them, except the remarkable recent reconstructions based on a Lateglacial peatbog sequence (4). During the 2014-2015 austral summer, a French-Norwegian team led the very first extensive lake sediment coring survey on the Kerguelen Archipelago under the umbrella of the PALAS program supported by the French Polar Institute (IPEV). Two main areas were investigated: i) the southwest of the mainland, the so-called Golfe du Morbihan, where glaciers are currently absent, and ii) the northernmost Kerguelen mainland peninsula, so-called Loranchet, where cirque glaciers are still present. This double-target strategy aims at reconstructing several independent indirect records of precipitation (glacier advance, flood dynamics) and wind speed (marine spray chemical species, wind-borne terrigenous input) to tackle Holocene climate variability. Despite particularly harsh climate conditions and difficult logistics, we were able to core 6 lake sediment sites: 5 in Golfe du Morbihan and one in the Loranchet peninsula. Among them are two sequences taken in the 4 km-long Lake Armor using a UWITEC re-entry piston coring system at 20 and 100 m water depth (6 and 7 m long, respectively). One

  4. Discontinuous Galerkin Dynamical Core in HOMME

    Energy Technology Data Exchange (ETDEWEB)

    Nair, R. D. [Univ. of Colorado, Boulder, CO (United States); Tufo, Henry [Univ. of Colorado, Boulder, CO (United States)

    2012-08-14

    Atmospheric numerical modeling has gone through radical changes over the past decade. One major reason for this trend is the recent paradigm change in scientific computing, triggered by the arrival of petascale computing resources with core counts in the tens of thousands to hundreds of thousands. Modern atmospheric modelers must adapt grid systems and numerical algorithms to facilitate unprecedented levels of scalability on these modern highly parallel computer architectures. Numerical algorithms that can address these challenges should have local properties such as a high ratio of on-processor floating-point operations to bytes moved and minimal parallel communication overhead.

  5. Optimization of the coherence function estimation for multi-core central processing unit

    Science.gov (United States)

    Cheremnov, A. G.; Faerman, V. A.; Avramchuk, V. S.

    2017-02-01

    The paper considers the use of parallel processing on a multi-core central processing unit for optimization of the coherence function evaluation arising in digital signal processing. The coherence function, along with other methods of spectral analysis, is commonly used for vibration diagnosis of rotating machinery and its particular nodes. An algorithm is given for evaluating the function for signals represented by digital samples. The algorithm is analyzed with respect to its software implementation and computational problems. Optimization measures are described, including algorithmic, architecture and compiler optimization, and their results are assessed for multi-core processors from different manufacturers. Speed-up of the parallel execution with respect to sequential execution was studied, and results are presented for Intel Core i7-4720HQ and AMD FX-9590 processors. The results show comparatively high efficiency of the optimization measures taken. In particular, acceleration indicators and average CPU utilization have been significantly improved, showing a high degree of parallelism of the constructed calculating functions. The developed software underwent state registration and will be used as part of a software and hardware solution for rotating machinery fault diagnosis and pipeline leak location with the acoustic correlation method.
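
    For reference, the quantity being estimated is the magnitude-squared coherence; a standard Welch-type estimator averages cross- and auto-spectra over K windowed signal segments (this is the textbook form, not necessarily the exact estimator used in the paper):

      C_{xy}(f) = \frac{\left|\hat{S}_{xy}(f)\right|^{2}}{\hat{S}_{xx}(f)\,\hat{S}_{yy}(f)},
      \qquad
      \hat{S}_{xy}(f) = \frac{1}{K} \sum_{k=1}^{K} X_{k}(f)\, Y_{k}^{*}(f),

    where X_k(f) and Y_k(f) are the discrete Fourier transforms of the k-th segments of the two signals. C_{xy}(f) lies between 0 and 1 and indicates, per frequency, how strongly the two signals are linearly related, which is what makes it useful for vibration diagnosis and correlation-based leak location.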

  6. [Constructing climate. From classical climatology to modern climate research].

    Science.gov (United States)

    Heymann, Matthias

    2009-01-01

    Both climate researchers and historians of climate science have conceived climate as a stable and well defined category. This article argues that such a conception is flawed. In the course of the 19th and 20th century the very concept of climate changed considerably. Scientists came up with different definitions and concepts of climate, which implied different understandings, interests, and research approaches. Understanding climate shifted from a timeless, spatial concept at the end of the 19th century to a spaceless, temporal concept at the end of the 20th. Climatologists in the 19th and early 20th centuries considered climate as a set of atmospheric characteristics associated with specific places or regions. In this context, while the weather was subject to change, climate remained largely stable. Of particular interest was the impact of climate on human beings and the environment. In modern climate research at the close of the 20th century, the concept of climate lost its temporal stability. Instead, climate change has become a core feature of the understanding of climate and a focus of research interests. Climate has also lost its immediate association with specific geographical places and become global. The interest is now focused on the impact of human beings on climate. The paper attempts to investigate these conceptual shifts and their origins and impacts in order to provide a more comprehensive perspective on the history of climate research.

  7. Engineering-Based Thermal CFD Simulations on Massive Parallel Systems

    KAUST Repository

    Frisch, Jérôme

    2015-05-22

    The development of parallel Computational Fluid Dynamics (CFD) codes is a challenging task that entails efficient parallelization concepts and strategies in order to achieve good scalability values when running those codes on modern supercomputers with several thousands to millions of cores. In this paper, we present a hierarchical data structure for massive parallel computations that supports the coupling of a Navier–Stokes-based fluid flow code with the Boussinesq approximation in order to address complex thermal scenarios for energy-related assessments. The newly designed data structure is specifically designed with the idea of interactive data exploration and visualization during runtime of the simulation code; a major shortcoming of traditional high-performance computing (HPC) simulation codes. We further show and discuss speed-up values obtained on one of Germany’s top-ranked supercomputers with up to 140,000 processes and present simulation results for different engineering-based thermal problems.

  8. Domain decomposition method using a hybrid parallelism and a low-order acceleration for solving the Sn transport equation on unstructured geometry

    International Nuclear Information System (INIS)

    Odry, Nans

    2016-01-01

    Deterministic calculation schemes are devised to numerically solve the neutron transport equation in nuclear reactors. Dealing with core-sized problems is very challenging for computers, so much so that dedicated core calculations have no choice but to allow simplifying assumptions (assembly-scale then core-scale steps...). The PhD work aims at overcoming some of these approximations: thanks to important changes in computer architecture and capacities (HPC), nowadays one can solve 3D core-sized problems using both high mesh refinement and the transport operator. It is an essential step forward in order to perform, in the future, reference calculations using deterministic schemes. This work focuses on a spatial domain decomposition method (DDM). Using massive parallelism, DDM allows much more ambitious computations in terms of both memory requirements and calculation time. Developments were performed inside the Sn core solver Minaret, from the new CEA neutronics platform APOLLO3. Only fast reactors (hexagonal periodicity) are considered, even if all kinds of geometries can be dealt with using Minaret. The work has been divided in four steps: 1) The spatial domain decomposition with no overlap is inserted into the standard algorithmic structure of Minaret. The fundamental idea involves splitting a core-sized problem into smaller, independent, spatial sub-problems. Angular flux is exchanged between adjacent sub-domains. In doing so, all combined sub-problems converge to the global solution at the outcome of an iterative process. Various strategies were explored regarding both data management and algorithm design. Results (k eff and flux) are systematically compared to the reference in a numerical verification step. 2) Introducing more parallelism is an unprecedented opportunity to heighten the performances of deterministic schemes. Domain decomposition is particularly suited to this. A two-layer hybrid parallelism strategy, suited to HPC, is chosen. It benefits from the

  9. High performance computing of density matrix renormalization group method for 2-dimensional model. Parallelization strategy toward peta computing

    International Nuclear Information System (INIS)

    Yamada, Susumu; Igarashi, Ryo; Machida, Masahiko; Imamura, Toshiyuki; Okumura, Masahiko; Onishi, Hiroaki

    2010-01-01

    We parallelize the density matrix renormalization group (DMRG) method, which is a ground-state solver for one-dimensional quantum lattice systems. The parallelization allows us to extend the applicable range of the DMRG to n-leg ladders, i.e., quasi-two-dimensional cases. Such an extension is expected to bring about several breakthroughs in, e.g., quantum physics, chemistry, and nano-engineering. However, the straightforward parallelization requires all-to-all communications between all processes, which are unsuitable for multi-core systems, the mainstream of current parallel computers. Therefore, we optimize the all-to-all communications in the following two steps. The first is the elimination of the communications between all processes by rearranging the data distribution while keeping the amount of communicated data unchanged. The second is the avoidance of communication conflicts by rescheduling the calculation and the communication. We evaluate the performance of the DMRG method on multi-core supercomputers and confirm that our two-step tuning is quite effective. (author)

  10. Neural networks within multi-core optic fibers.

    Science.gov (United States)

    Cohen, Eyal; Malka, Dror; Shemer, Amir; Shahmoon, Asaf; Zalevsky, Zeev; London, Michael

    2016-07-07

    Hardware implementation of artificial neural networks facilitates real-time parallel processing of massive data sets. Optical neural networks offer low-volume 3D connectivity together with large bandwidth and minimal heat production in contrast to electronic implementation. Here, we present a conceptual design for in-fiber optical neural networks. Neurons and synapses are realized as individual silica cores in a multi-core fiber. Optical signals are transferred transversely between cores by means of optical coupling. Pump driven amplification in erbium-doped cores mimics synaptic interactions. We simulated three-layered feed-forward neural networks and explored their capabilities. Simulations suggest that networks can differentiate between given inputs depending on specific configurations of amplification; this implies classification and learning capabilities. Finally, we tested experimentally our basic neuronal elements using fibers, couplers, and amplifiers, and demonstrated that this configuration implements a neuron-like function. Therefore, devices similar to our proposed multi-core fiber could potentially serve as building blocks for future large-scale small-volume optical artificial neural networks.

  11. Core clamping device for a nuclear reactor

    International Nuclear Information System (INIS)

    Guenther, R.W.

    1974-01-01

    The core clamping device for a fast neutron reactor includes clamps to support the fuel zone against the pressure vessel. The clamps are arranged around the circumference of the core. They consist of torsion bars arranged parallel at some distance around the core with lever arms attached to the ends whose force is directed in the opposite direction, pressing against the wall of the pressure vessel. The lever arms and pressure plates also actuated by the ends of the torsion bars transfer the stress, the pressure plates acting upon the fuel elements or fuel assemblies. Coupling between the ends of the torsion bars and the pressure plates is achieved by end carrier plates directly attached to the torsion bars and radially movable. This clamping device follows the thermal expansions of the core, allows specific elements to be disengaged in sections and saves space between the core and the neutron reflectors. (DG) [de

  12. Parallel Implementation of Triangular Cellular Automata for Computing Two-Dimensional Elastodynamic Response on Arbitrary Domains

    Science.gov (United States)

    Leamy, Michael J.; Springer, Adam C.

    In this research we report parallel implementation of a Cellular Automata-based simulation tool for computing elastodynamic response on complex, two-dimensional domains. Elastodynamic simulation using Cellular Automata (CA) has recently been presented as an alternative, inherently object-oriented technique for accurately and efficiently computing linear and nonlinear wave propagation in arbitrarily-shaped geometries. The local, autonomous nature of the method should lead to straight-forward and efficient parallelization. We address this notion on symmetric multiprocessor (SMP) hardware using a Java-based object-oriented CA code implementing triangular state machines (i.e., automata) and the MPI bindings written in Java (MPJ Express). We use MPJ Express to reconfigure our existing CA code to distribute a domain's automata to cores present on a dual quad-core shared-memory system (eight total processors). We note that this message passing parallelization strategy is directly applicable to computer clustered computing, which will be the focus of follow-on research. Results on the shared memory platform indicate nearly-ideal, linear speed-up. We conclude that the CA-based elastodynamic simulator is easily configured to run in parallel, and yields excellent speed-up on SMP hardware.

  13. Development of parallel 3D discrete ordinates transport program on JASMIN framework

    International Nuclear Information System (INIS)

    Cheng, T.; Wei, J.; Shen, H.; Zhong, B.; Deng, L.

    2015-01-01

    A parallel 3D discrete ordinates radiation transport code JSNT-S is developed, aiming at simulating real-world radiation shielding and reactor physics applications in a reasonable time. Through the patch-based domain partition algorithm, the memory requirement is shared among processors and a space-angle parallel sweeping algorithm is developed based on data-driven algorithm. Acceleration methods such as partial current rebalance are implemented. The correctness is proved through the VENUS-3 and other benchmark models. In the radiation shielding calculation of the Qinshan-II reactor pressure vessel model with 24.3 billion DoF, only 88 seconds is required and the overall parallel efficiency of 44% is achieved on 1536 CPU cores. (author)

  14. Grain-size data from four cores from Walker Lake, Nevada

    International Nuclear Information System (INIS)

    Yount, J.C.; Quimby, M.F.

    1990-01-01

    A number of cores, taken from within and near Walker Lake, Nevada, are being studied by various investigators in order to evaluate the late-Pleistocene paleoclimate of the west-central Great Basin. In particular, the cores provide records that can be interpreted in terms of past climate and compared to proposed numerical models of the region's climate. All of these studies are being carried out as part of an evaluation of the regional paleoclimatic setting of a proposed high-level nuclear waste storage facility at Yucca Mountain, Nevada. Changes in past climate often manifest themselves in changes in sedimentary processes or in changes in the volume of sediment transported by those processes. One fundamental sediment property that can be related to depositional processes is grain size. Grain size affects other physical properties of sediment such as porosity and permeability which, in turn, affect the movement and chemistry of fluids. The purposes of this report are: (1) to document procedures of sample preparation and analysis, and (2) to summarize grain-size statistics for 659 samples from Walker Lake cores 84-4, 84-5, 84-8 and 85-2. Plots of mean particle diameter, percent sand, and the ratio of silt to clay are illustrated for various depth intervals within each core. Summary plots of mean grain size, sorting, and skewness parameters allow comparison of textural data between each core. 15 refs., 8 figs., 3 tabs

  15. Climate Changes Documented in Ice Core Records from Third Pole Glaciers, with Emphasis on the Guliya Ice Cap in the Western Kunlun Mountains over the Last 100 Years

    Science.gov (United States)

    Thompson, L. G.; Yao, T.; Beaudon, E.; Mosley-Thompson, E.; Davis, M. E.; Kenny, D. V.; Lin, P. N.

    2016-12-01

    The Third Pole (TP) is a rapidly warming region containing 100,000 km2 of ice cover that collectively holds one of Earth's largest stores of freshwater that feeds Asia's largest rivers and helps sustain 1.5 billion people. Information on the accelerating warming in the region, its impact on the glaciers and subsequently on future water resources is urgently needed to guide mitigation and adaptation policies. Ice core histories collected over the last three decades across the TP demonstrate its climatic complexity and diversity. Here we present preliminary results from the flagship project of the Third Pole Environment Program, the 2015 Sino-American cooperative ice core drilling of the Guliya ice cap in the Kunlun Mountains in the western TP near the northern limit of the region influenced by the southwest monsoon. Three ice cores, each 51 meters in length, were recovered from the summit (~6700 masl) while two deeper cores, one to bedrock (~310 meters), were recovered from the plateau (~6200 masl). Across the ice cap the net balance (accumulation) has increased annually by 2.3 cm of water equivalent from 1963-1992 to 1992-2015, and average oxygen isotopic ratios (δ18O) have enriched by 2‰. This contrasts with the recent ablation on the Naimona'nyi glacier located 540 km south of Guliya in the western Himalaya. Borehole temperatures in 2015 on the Guliya plateau have warmed substantially in the upper 30 meters of the ice compared to temperatures in 1992, when the first deep-drilling of the Guliya plateau was conducted. Compared with glaciers in the northern and western TP, the Himalayan ice fields are more sensitive to both fluctuations in the South Asian Monsoon and rising temperatures in the region. We examine the climatic changes of the last century preserved in ice core records from sites throughout the TP and compare them with those reconstructed for earlier warm epochs, such as the Medieval Climate Anomaly (~950-1250 AD), the early Holocene "Hypsithermal

  16. A New Approach of Parallelism and Load Balance for the Apriori Algorithm

    Directory of Open Access Journals (Sweden)

    BOLINA, A. C.

    2013-06-01

    Full Text Available The main goal of data mining is to discover relevant information in digital content. The Apriori algorithm is widely used for this objective, but its sequential version performs poorly when executed over large volumes of data. Among the solutions for this problem is the parallel implementation of the algorithm, and among the parallel implementations based on Apriori presented in the literature, the DPA (Distributed Parallel Apriori) [10] stands out. This paper presents the DMTA (Distributed Multithread Apriori) algorithm, which is based on DPA and exploits thread-level parallelism in order to increase performance. Besides, DMTA can be executed on heterogeneous hardware platforms, using different numbers of cores. The results showed that DMTA outperforms DPA, presents load balance among processes and threads, and is effective in current multicore architectures.
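
    The step that thread-level parallelism targets here is support counting, which dominates Apriori's run time. The sketch below counts candidate-itemset supports with the transaction database split among OpenMP threads and per-thread counters merged at the end; it is a simplified illustration under our own data layout, not the DMTA implementation itself.

      #include <algorithm>
      #include <cstddef>
      #include <vector>

      using Itemset = std::vector<int>;        // items stored in ascending order
      using Transaction = std::vector<int>;    // items stored in ascending order

      // Count how many transactions contain each candidate itemset, splitting the
      // transaction database among OpenMP threads and merging per-thread counters.
      std::vector<long> countSupport(const std::vector<Transaction>& db,
                                     const std::vector<Itemset>& candidates) {
          std::vector<long> support(candidates.size(), 0);
      #pragma omp parallel
          {
              std::vector<long> local(candidates.size(), 0);   // private counters avoid contention
      #pragma omp for nowait
              for (long t = 0; t < static_cast<long>(db.size()); ++t)
                  for (std::size_t c = 0; c < candidates.size(); ++c)
                      if (std::includes(db[t].begin(), db[t].end(),
                                        candidates[c].begin(), candidates[c].end()))
                          ++local[c];
      #pragma omp critical
              for (std::size_t c = 0; c < candidates.size(); ++c)
                  support[c] += local[c];
          }
          return support;
      }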

  17. Parallel computing for homogeneous diffusion and transport equations in neutronics

    International Nuclear Information System (INIS)

    Pinchedez, K.

    1999-06-01

    Parallel computing meets the ever-increasing requirements for neutronic computer code speed and accuracy. In this work, two different approaches have been considered. We first parallelized the sequential algorithm used by the neutronics code CRONOS developed at the French Atomic Energy Commission. The algorithm computes the dominant eigenvalue associated with PN simplified transport equations by a mixed finite element method. Several parallel algorithms have been developed on distributed memory machines. The performances of the parallel algorithms have been studied experimentally by implementation on a T3D Cray and theoretically by complexity models. A comparison of various parallel algorithms has confirmed the chosen implementations. We next applied a domain sub-division technique to the two-group diffusion eigenproblem. In the modal synthesis-based method, the global spectrum is determined from the partial spectra associated with sub-domains. Then the eigenproblem is expanded on a family composed, on the one hand, of eigenfunctions associated with the sub-domains and, on the other hand, of functions corresponding to the contribution from the interface between the sub-domains. For a 2-D homogeneous core, this modal method has been validated and its accuracy has been measured. (author)

  18. Emergency core cooling device

    International Nuclear Information System (INIS)

    Suzaki, Kiyoshi; Inoue, Akihiro.

    1979-01-01

    Purpose: To improve the core cooling effect by making the operation region for a plurality of water injection pumps broader. Constitution: An emergency reactor core cooling device, actuated upon failure of recycling pipeways, is adapted to be fed with cooling water through a thermal sleeve by way of a plurality of water injection pumps from pool water in a condensate storage tank and a pressure suppression chamber as water feed sources. Exhaust pipes and suction pipes of each of the pumps are connected by way of switching valves, and the valves are switched so that the pumps are set to series operation if the pressure in the pressure vessel is high and to parallel operation if the pressure in the pressure vessel is low. (Furukawa, Y.)

  19. Parallel PDE-Based Simulations Using the Common Component Architecture

    International Nuclear Information System (INIS)

    McInnes, Lois C.; Allan, Benjamin A.; Armstrong, Robert; Benson, Steven J.; Bernholdt, David E.; Dahlgren, Tamara L.; Diachin, Lori; Krishnan, Manoj Kumar; Kohl, James A.; Larson, J. Walter; Lefantzi, Sophia; Nieplocha, Jarek; Norris, Boyana; Parker, Steven G.; Ray, Jaideep; Zhou, Shujia

    2006-01-01

    The complexity of parallel PDE-based simulations continues to increase as multimodel, multiphysics, and multi-institutional projects become widespread. A goal of component based software engineering in such large-scale simulations is to help manage this complexity by enabling better interoperability among various codes that have been independently developed by different groups. The Common Component Architecture (CCA) Forum is defining a component architecture specification to address the challenges of high-performance scientific computing. In addition, several execution frameworks, supporting infrastructure, and general purpose components are being developed. Furthermore, this group is collaborating with others in the high-performance computing community to design suites of domain-specific component interface specifications and underlying implementations. This chapter discusses recent work on leveraging these CCA efforts in parallel PDE-based simulations involving accelerator design, climate modeling, combustion, and accidental fires and explosions. We explain how component technology helps to address the different challenges posed by each of these applications, and we highlight how component interfaces built on existing parallel toolkits facilitate the reuse of software for parallel mesh manipulation, discretization, linear algebra, integration, optimization, and parallel data redistribution. We also present performance data to demonstrate the suitability of this approach, and we discuss strategies for applying component technologies to both new and existing applications

  20. Full core reactor analysis: Running Denovo on Jaguar

    Energy Technology Data Exchange (ETDEWEB)

    Jarrell, J. J.; Godfrey, A. T.; Evans, T. M.; Davidson, G. G. [Oak Ridge National Laboratory, PO Box 2008, Oak Ridge, TN 37831 (United States)

    2012-07-01

    Fully-consistent, full-core, 3D, deterministic neutron transport simulations using the orthogonal mesh code Denovo were run on the massively parallel computing architecture Jaguar XT5. Using energy and spatial parallelization schemes, Denovo was able to efficiently scale to more than 160 k processors. Cell-homogenized cross sections were used with step-characteristics, linear-discontinuous finite element, and trilinear-discontinuous finite element spatial methods. It was determined that using the finite element methods gave considerably more accurate eigenvalue solutions for large-aspect ratio meshes than using step-characteristics. (authors)

  1. Optimization and parallelization of the thermal–hydraulic subchannel code CTF for high-fidelity multi-physics applications

    International Nuclear Information System (INIS)

    Salko, Robert K.; Schmidt, Rodney C.; Avramova, Maria N.

    2015-01-01

    Highlights: • COBRA-TF was adopted by the Consortium for Advanced Simulation of LWRs. • We have improved code performance to support running large-scale LWR simulations. • Code optimization has led to reductions in execution time and memory usage. • An MPI parallelization has reduced full-core simulation time from days to minutes. - Abstract: This paper describes major improvements to the computational infrastructure of the CTF subchannel code so that full-core, pincell-resolved (i.e., one computational subchannel per real bundle flow channel) simulations can now be performed in much shorter run-times, either in stand-alone mode or as part of coupled-code multi-physics calculations. These improvements support the goals of the Department Of Energy Consortium for Advanced Simulation of Light Water Reactors (CASL) Energy Innovation Hub to develop high fidelity multi-physics simulation tools for nuclear energy design and analysis. A set of serial code optimizations—including fixing computational inefficiencies, optimizing the numerical approach, and making smarter data storage choices—are first described and shown to reduce both execution time and memory usage by about a factor of ten. Next, a “single program multiple data” parallelization strategy targeting distributed memory “multiple instruction multiple data” platforms utilizing domain decomposition is presented. In this approach, data communication between processors is accomplished by inserting standard Message-Passing Interface (MPI) calls at strategic points in the code. The domain decomposition approach implemented assigns one MPI process to each fuel assembly, with each domain being represented by its own CTF input file. The creation of CTF input files, both for serial and parallel runs, is also fully automated through use of a pressurized water reactor (PWR) pre-processor utility that uses a greatly simplified set of user input compared with the traditional CTF input. To run CTF in
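
    As a rough illustration of the "one MPI rank per assembly" decomposition described above (and not the CTF code itself), the following sketch has each rank advance a placeholder assembly-local state and agree on a global convergence check through a single reduction per iteration; the variable names and the stand-in local solve are assumptions.

        // One MPI rank per "assembly": local work plus a global reduction.
        #include <mpi.h>
        #include <cmath>
        #include <cstdio>

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank = 0, size = 1;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            double local = 1.0 + rank;      // placeholder assembly-local state
            double global_residual = 1.0;

            for (int it = 0; it < 100 && global_residual > 1e-8; ++it) {
                double previous = local;
                local = 0.5 * (local + 2.0 / local);   // stand-in local solve
                double local_residual = std::fabs(local - previous);
                // Communication "at strategic points": one reduction per sweep.
                MPI_Allreduce(&local_residual, &global_residual, 1,
                              MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);
            }
            std::printf("rank %d of %d converged to %f\n", rank, size, local);
            MPI_Finalize();
            return 0;
        }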

  2. Fast parallel tracking algorithm for the muon detector of the CBM experiment at FAIR

    International Nuclear Information System (INIS)

    Lebedev, A.; Hoehne, C.; Kisel', I.; Ososkov, G.

    2010-01-01

    Particle trajectory recognition is an important and challenging task in the Compressed Baryonic Matter (CBM) experiment at the future FAIR accelerator at Darmstadt. The tracking algorithms have to process terabytes of input data produced in particle collisions. Therefore, the speed of the tracking software is extremely important for data analysis. In this contribution, a fast parallel track reconstruction algorithm, which uses available features of modern processors, is presented. These features comprise a SIMD instruction set (SSE) and multithreading. The first allows one to pack several data items into one register and to operate on all of them in parallel, thus achieving more operations per cycle. The second feature enables the routines to exploit all available CPU cores and hardware threads. This parallel version of the tracking algorithm has been compared to the initial serial scalar version which uses a similar approach for tracking. A speed-up factor of 487 was achieved (from 730 to 1.5 ms/event) for a computer with 2 x Intel Core i7 processors at 2.66 GHz.
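
    The SIMD idea mentioned above (packing several data items into one register and operating on all of them with one instruction) can be illustrated with a few SSE intrinsics; the example below is a generic sketch with made-up track parameters, not code from the CBM tracking package.

        // Packing four floats into one SSE register and updating them at once.
        #include <xmmintrin.h>   // SSE intrinsics
        #include <cstdio>

        int main() {
            alignas(16) float x[4]  = {1.0f, 2.0f, 3.0f, 4.0f};   // e.g. four track parameters
            alignas(16) float dx[4] = {0.1f, 0.2f, 0.3f, 0.4f};   // their per-step updates

            __m128 vx  = _mm_load_ps(x);    // pack 4 floats into one register
            __m128 vdx = _mm_load_ps(dx);
            vx = _mm_add_ps(vx, vdx);       // 4 additions with one instruction
            _mm_store_ps(x, vx);

            for (int i = 0; i < 4; ++i) std::printf("%f\n", x[i]);
            return 0;
        }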

  3. Massive parallelization of a 3D finite difference electromagnetic forward solution using domain decomposition methods on multiple CUDA enabled GPUs

    Science.gov (United States)

    Schultz, A.

    2010-12-01

    3D forward solvers lie at the core of inverse formulations used to image the variation of electrical conductivity within the Earth's interior. This property is associated with variations in temperature, composition, phase, presence of volatiles, and in specific settings, the presence of groundwater, geothermal resources, oil/gas or minerals. The high cost of 3D solutions has been a stumbling block to wider adoption of 3D methods. Parallel algorithms for modeling frequency domain 3D EM problems have not achieved wide scale adoption, with emphasis on fairly coarse grained parallelism using MPI and similar approaches. The communications bandwidth as well as the latency required to send and receive network communication packets is a limiting factor in implementing fine grained parallel strategies, inhibiting wide adoption of these algorithms. Leading Graphics Processor Unit (GPU) companies now produce GPUs with hundreds of GPU processor cores per die. The footprint, in silicon, of the GPU's restricted instruction set is much smaller than the general purpose instruction set required of a CPU. Consequently, the density of processor cores on a GPU can be much greater than on a CPU. GPUs also have local memory, registers and high speed communication with host CPUs, usually through PCIe type interconnects. The extremely low cost and high computational power of GPUs provides the EM geophysics community with an opportunity to achieve fine grained (i.e. massive) parallelization of codes on low cost hardware. The current generation of GPUs (e.g. NVidia Fermi) provides 3 billion transistors per chip die, with nearly 500 processor cores and up to 6 GB of fast (DDR5) GPU memory. This latest generation of GPU supports fast hardware double precision (64 bit) floating point operations of the type required for frequency domain EM forward solutions. Each Fermi GPU board can sustain nearly 1 TFLOP in double precision, and multiple boards can be installed in the host computer system. We

  4. Late Quaternary climatic changes in the Ross Sea area, Antarctica

    International Nuclear Information System (INIS)

    Brambati, A.; Melis, R.; Quaia, T.; Salvi, G.

    2002-01-01

    Ten cores from the Ross Sea continental margin were investigated to detect Late Quaternary climatic changes. Two main climatic cycles over the last 300,000 yr (isotope stages 1-8) were recognised in cores from the continental slope, whereas minor fluctuations over the last 30,000 yr were found in cores from the continental shelf. The occurrence of calcareous taxa within the Last Glacial interval and their subsequent disappearance reveal a general raising of the CCD during the last climatic cycle. In addition, periodical trends of c. 400, c. 700, and c. 1400 yr determined on calcareous foraminifers from sediments of the Joides Basin, indicate fluctuations of the Ross Ice Shelf between 15 and 30 ka BP. (author). 24 refs., 5 figs

  5. Paleoclimates: Understanding climate change past and present

    Science.gov (United States)

    Cronin, Thomas M.

    2010-01-01

    The field of paleoclimatology relies on physical, chemical, and biological proxies of past climate changes that have been preserved in natural archives such as glacial ice, tree rings, sediments, corals, and speleothems. Paleoclimate archives obtained through field investigations, ocean sediment coring expeditions, ice sheet coring programs, and other projects allow scientists to reconstruct climate change over much of earth's history. When combined with computer model simulations, paleoclimatic reconstructions are used to test hypotheses about the causes of climatic change, such as greenhouse gases, solar variability, earth's orbital variations, and hydrological, oceanic, and tectonic processes. This book is a comprehensive, state-of-the art synthesis of paleoclimate research covering all geological timescales, emphasizing topics that shed light on modern trends in the earth's climate. Thomas M. Cronin discusses recent discoveries about past periods of global warmth, changes in atmospheric greenhouse gas concentrations, abrupt climate and sea-level change, natural temperature variability, and other topics directly relevant to controversies over the causes and impacts of climate change. This text is geared toward advanced undergraduate and graduate students and researchers in geology, geography, biology, glaciology, oceanography, atmospheric sciences, and climate modeling, fields that contribute to paleoclimatology. This volume can also serve as a reference for those requiring a general background on natural climate variability.

  6. The influence of collisional and anomalous radial diffusion on parallel ion transport in edge plasmas

    International Nuclear Information System (INIS)

    Helander, P.; Hazeltine, R.D.; Catto, P.J.

    1996-01-01

    The orderings in the kinetic equations commonly used to study the plasma core of a tokamak do not allow a balance between parallel ion streaming and radial diffusion, and are, therefore, inappropriate in the plasma edge. Different orderings are required in the edge region where radial transport across the steep gradients associated with the scrape-off layer is large enough to balance the rapid parallel flow caused by conditions close to collecting surfaces (such as the Bohm sheath condition). In the present work, we derive and solve novel kinetic equations, allowing for such a balance, and construct distinctive transport laws for impure, collisional, edge plasmas in which the perpendicular transport is (i) due to Coulomb collisions of ions with heavy impurities, or (ii) governed by anomalous diffusion driven by electrostatic turbulence. In both the collisional and anomalous radial transport cases, we find that one single diffusion coefficient determines the radial transport of particles, momentum and heat. The parallel transport laws and parallel thermal force in the scrape-off layer assume an unconventional form, in which the relative ion-impurity flow is driven by a combination of the conventional parallel gradients, and new (i) collisional or (ii) anomalous terms involving products of radial derivatives of the temperature and density with the radial shear of the parallel velocity. Thus, in the presence of anomalous radial diffusion, the parallel ion transport cannot be entirely classical, as usually assumed in numerical edge computations. The underlying physical reason is the appearance of a novel type of parallel thermal force resulting from the combined action of anomalous diffusion and radial temperature and velocity gradients. In highly sheared flows the new terms can modify impurity penetration into the core plasma

  7. Comparison Of Hybrid Sorting Algorithms Implemented On Different Parallel Hardware Platforms

    Directory of Open Access Journals (Sweden)

    Dominik Zurek

    2013-01-01

    Full Text Available Sorting is a common problem in computer science. There are many well-known sorting algorithms designed for sequential execution on a single processor. Recent hardware platforms, however, enable widely parallel algorithms: standard processors consist of multiple cores, and hardware accelerators such as GPUs are available. Graphics cards, with their parallel architecture, offer new possibilities to speed up many algorithms. In this paper we describe the results of implementing several different sorting algorithms on GPU cards and multicore processors. A hybrid algorithm is then presented, consisting of parts executed on both platforms, the standard CPU and the GPU.
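
    A CPU-only analogue of the hybrid scheme can make the idea concrete: the data set is split into chunks, the chunks are sorted concurrently (on the GPU and CPU in the paper, on two CPU threads here), and the sorted runs are merged. The sketch below is a minimal illustration under those assumptions, not the authors' implementation.

        // Two sorted halves produced concurrently, then merged.
        #include <algorithm>
        #include <cstddef>
        #include <iostream>
        #include <random>
        #include <thread>
        #include <vector>

        int main() {
            std::vector<int> data(1 << 20);
            std::mt19937 rng(42);
            for (auto& v : data) v = static_cast<int>(rng());

            const std::size_t half = data.size() / 2;
            // First half sorted on a worker thread, second half on the main
            // thread (the paper splits the work between GPU and CPU instead).
            std::thread worker([&] { std::sort(data.begin(), data.begin() + half); });
            std::sort(data.begin() + half, data.end());
            worker.join();

            std::inplace_merge(data.begin(), data.begin() + half, data.end());
            std::cout << std::boolalpha
                      << std::is_sorted(data.begin(), data.end()) << "\n";
            return 0;
        }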

  8. Hybrid parallel computing architecture for multiview phase shifting

    Science.gov (United States)

    Zhong, Kai; Li, Zhongwei; Zhou, Xiaohui; Shi, Yusheng; Wang, Congjun

    2014-11-01

    The multiview phase-shifting method shows its powerful capability in achieving high resolution three-dimensional (3-D) shape measurement. Unfortunately, this ability results in very high computation costs and 3-D computations have to be processed offline. To realize real-time 3-D shape measurement, a hybrid parallel computing architecture is proposed for multiview phase shifting. In this architecture, the central processing unit can co-operate with the graphics processing unit (GPU) to achieve hybrid parallel computing. The high computation cost procedures, including lens distortion rectification, phase computation, correspondence, and 3-D reconstruction, are implemented on the GPU, and a three-layer kernel function model is designed to simultaneously realize coarse-grained and fine-grained parallel computing. Experimental results verify that the developed system can perform 50 fps (frames per second) real-time 3-D measurement with 260 K 3-D points per frame. A speedup of up to 180 times is obtained for the performance of the proposed technique using an NVIDIA GT560Ti graphics card rather than a sequential C implementation on a 3.4 GHz Intel Core i7 3770.

  9. Core microbiomes for sustainable agroecosystems.

    Science.gov (United States)

    Toju, Hirokazu; Peay, Kabir G; Yamamichi, Masato; Narisawa, Kazuhiko; Hiruma, Kei; Naito, Ken; Fukuda, Shinji; Ushio, Masayuki; Nakaoka, Shinji; Onoda, Yusuke; Yoshida, Kentaro; Schlaeppi, Klaus; Bai, Yang; Sugiura, Ryo; Ichihashi, Yasunori; Minamisawa, Kiwamu; Kiers, E Toby

    2018-05-01

    In an era of ecosystem degradation and climate change, maximizing microbial functions in agroecosystems has become a prerequisite for the future of global agriculture. However, managing species-rich communities of plant-associated microbiomes remains a major challenge. Here, we propose interdisciplinary research strategies to optimize microbiome functions in agroecosystems. Informatics now allows us to identify members and characteristics of 'core microbiomes', which may be deployed to organize otherwise uncontrollable dynamics of resident microbiomes. Integration of microfluidics, robotics and machine learning provides novel ways to capitalize on core microbiomes for increasing resource-efficiency and stress-resistance of agroecosystems.

  10. Resolutions of the Coulomb operator: VIII. Parallel implementation using the modern programming language X10.

    Science.gov (United States)

    Limpanuparb, Taweetham; Milthorpe, Josh; Rendell, Alistair P

    2014-10-30

    Use of the modern parallel programming language X10 for computing long-range Coulomb and exchange interactions is presented. By using X10, a partitioned global address space language with support for task parallelism and the explicit representation of data locality, the resolution of the Ewald operator can be parallelized in a straightforward manner, including use of both intranode and internode parallelism. We evaluate four different schemes for dynamic load balancing of the integral calculation using X10's work-stealing runtime, and report performance results for long-range HF energy calculations of a large molecule with a high-quality basis set running on up to 1024 cores of a high-performance cluster machine. Copyright © 2014 Wiley Periodicals, Inc.
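
    X10's work-stealing runtime has no direct C++ counterpart, but the effect of dynamic load balancing can be approximated with a shared atomic task counter from which threads pull integral "tasks" as they finish previous ones, so faster threads automatically take more work. The sketch below uses a placeholder task cost; the task granularity and names are assumptions, and this is an analogue rather than the scheme evaluated in the paper.

        // Threads pull "integral tasks" from a shared counter until none remain.
        #include <algorithm>
        #include <atomic>
        #include <cmath>
        #include <cstdio>
        #include <thread>
        #include <vector>

        int main() {
            const int ntasks = 1000;          // e.g. batches of integrals
            std::atomic<int> next{0};
            std::atomic<long> done{0};

            auto worker = [&] {
                for (int task = next.fetch_add(1); task < ntasks;
                     task = next.fetch_add(1)) {
                    double acc = 0.0;         // placeholder, uneven task cost
                    for (int k = 0; k < 1000 * (task % 7 + 1); ++k)
                        acc = acc + std::sqrt(static_cast<double>(k));
                    if (acc >= 0.0) done.fetch_add(1);
                }
            };

            unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());
            std::vector<std::thread> pool;
            for (unsigned t = 0; t < nthreads; ++t) pool.emplace_back(worker);
            for (auto& th : pool) th.join();
            std::printf("completed %ld of %d tasks\n", done.load(), ntasks);
            return 0;
        }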

  11. Climate scenarios for California

    Science.gov (United States)

    Cayan, Daniel R.; Maurer, Ed; Dettinger, Mike; Tyree, Mary; Hayhoe, Katharine; Bonfils, Celine; Duffy, Phil; Santer, Ben

    2006-01-01

    Possible future climate changes in California are investigated from a varied set of climate change model simulations. These simulations, conducted by three state-of-the-art global climate models, provide trajectories from three greenhouse gas (GHG) emission scenarios. These scenarios and the resulting climate simulations are not “predictions,” but rather are a limited sample from among the many plausible pathways that may affect California’s climate. Future GHG concentrations are uncertain because they depend on future social, political, and technological pathways, and thus the IPCC has produced four “families” of emission scenarios. To explore some of these uncertainties, emissions scenarios A2 (medium-high emissions) and B1 (low emissions) were selected from the IPCC Fourth Assessment, which provides several recent model simulations driven by A2 and B1 emissions. The global climate model simulations addressed here were from PCM1, the Parallel Climate Model from the National Center for Atmospheric Research (NCAR) and U.S. Department of Energy (DOE) group, and CM2.1 from the National Oceanic and Atmospheric Administration (NOAA) Geophysical Fluid Dynamics Laboratory (GFDL).

  12. Atmospheric CO2 variations over the last three glacial-interglacial climatic cycles deduced from the Dome Fuji deep ice core, Antarctica using a wet extraction technique

    International Nuclear Information System (INIS)

    Kawamura, Kenji; Nakazawa, Takakiyo; Aoki, Shuji

    2003-01-01

    A deep ice core drilled at Dome Fuji, East Antarctica was analyzed for the CO2 concentration using a wet extraction method in order to reconstruct its atmospheric variations over the past 320 kyr, which includes three full glacial-interglacial climatic cycles, with a mean time resolution of about 1.1 kyr. The CO2 concentration values derived for the past 65 kyr are very close to those obtained from other Antarctic ice cores using dry extraction methods, although the wet extraction method is generally thought to be inappropriate for the determination of the CO2 concentration. The comparison between the CO2 and Ca2+ concentrations deduced from the Dome Fuji core suggests that calcium carbonate emitted from lands was mostly neutralized in the atmosphere before reaching the central part of Antarctica, or that only a small part of calcium carbonate was involved in CO2 production during the wet extraction process. The CO2 concentration for the past 320 kyr deduced from the Dome Fuji core varies between 190 and 300 ppmv, showing clear glacial-interglacial variations similar to the result of the Vostok ice core. However, for some periods, the concentration values of the Dome Fuji core are higher by up to 20 ppmv than those of the Vostok core. There is no clear indication that such differences are related to variations of chemical components of Ca2+, microparticle and acidity of the Dome Fuji core.

  13. Climatic projections and socio economic impacts of the climatic change in Colombia

    International Nuclear Information System (INIS)

    Eslava R, Jesus Antonio; Pabon Caicedo, Jose Daniel

    2001-01-01

    For the task of working out climate change projections, different methodologies have been in use, from simple extrapolations to sophisticated statistical and mathematical tools. Today, the tools most used are the models of the general circulation of the atmosphere and ocean, which include many processes of other climate components (biosphere, cryosphere, continental surface models, etc.). Different global and regional scenarios have been generated with those models. They may be of great utility in calculating projections and future scenarios for Colombia, but the representation of the country's climate in those models has to be improved in order to get projections with a higher level of certainty. The application of climate models and of downscaling techniques in studies of climate change is new both in Colombia and tropical America, and was introduced through the National University of Colombia's project on local and national climate change. In the first phase of the project, version 3 of the CCM (Community Climate Model) of NCAR was implemented. Parallel to that, and based on national (grid) data, maps have been prepared of the monthly temperature and precipitation of Colombia, which were used to validate the model.

  14. Computational Performance of a Parallelized Three-Dimensional High-Order Spectral Element Toolbox

    Science.gov (United States)

    Bosshard, Christoph; Bouffanais, Roland; Clémençon, Christian; Deville, Michel O.; Fiétier, Nicolas; Gruber, Ralf; Kehtari, Sohrab; Keller, Vincent; Latt, Jonas

    In this paper, a comprehensive performance review of an MPI-based high-order three-dimensional spectral element method C++ toolbox is presented. The focus is put on the performance evaluation of several aspects with a particular emphasis on the parallel efficiency. The performance evaluation is analyzed with help of a time prediction model based on a parameterization of the application and the hardware resources. A tailor-made CFD computation benchmark case is introduced and used to carry out this review, stressing the particular interest for clusters with up to 8192 cores. Some problems in the parallel implementation have been detected and corrected. The theoretical complexities with respect to the number of elements, to the polynomial degree, and to communication needs are correctly reproduced. It is concluded that this type of code has a nearly perfect speed up on machines with thousands of cores, and is ready to make the step to next-generation petaflop machines.

  15. Simulation of the present-day climate with the climate model INMCM5

    Science.gov (United States)

    Volodin, E. M.; Mortikov, E. V.; Kostrykin, S. V.; Galin, V. Ya.; Lykossov, V. N.; Gritsun, A. S.; Diansky, N. A.; Gusev, A. V.; Iakovlev, N. G.

    2017-12-01

    In this paper we present the fifth generation of the INMCM climate model that is being developed at the Institute of Numerical Mathematics of the Russian Academy of Sciences (INMCM5). The most important changes with respect to the previous version (INMCM4) were made in the atmospheric component of the model. Its vertical resolution was increased to resolve the upper stratosphere and the lower mesosphere. A more sophisticated parameterization of condensation and cloudiness formation was introduced as well. An aerosol module was incorporated into the model. The upgraded oceanic component has a modified dynamical core optimized for better implementation on parallel computers and has two times higher resolution in both horizontal directions. Analysis of the present-day climatology of the INMCM5 (based on the data of the historical run for 1979-2005) shows moderate improvements in the reproduction of basic circulation characteristics with respect to the previous version. Biases in the near-surface temperature and precipitation are slightly reduced compared with INMCM4, as well as biases in oceanic temperature, salinity and sea surface height. The most notable improvement over INMCM4 is the capability of the new model to reproduce the equatorial stratospheric quasi-biennial oscillation and statistics of sudden stratospheric warmings.

  16. The ICON-1.2 hydrostatic atmospheric dynamical core on triangular grids – Part 1: Formulation and performance of the baseline version

    Directory of Open Access Journals (Sweden)

    H. Wan

    2013-06-01

    Full Text Available As part of a broader effort to develop next-generation models for numerical weather prediction and climate applications, a hydrostatic atmospheric dynamical core is developed as an intermediate step to evaluate a finite-difference discretization of the primitive equations on spherical icosahedral grids. Based on the need for mass-conserving discretizations for multi-resolution modelling as well as scalability and efficiency on massively parallel computing architectures, the dynamical core is built on triangular C-grids using relatively small discretization stencils. This paper presents the formulation and performance of the baseline version of the new dynamical core, focusing on properties of the numerical solutions in the setting of globally uniform resolution. Theoretical analysis reveals that the discrete divergence operator defined on a single triangular cell using the Gauss theorem is only first-order accurate, and introduces grid-scale noise to the discrete model. The noise can be suppressed by fourth-order hyper-diffusion of the horizontal wind field using a time-step and grid-size-dependent diffusion coefficient, at the expense of stronger damping than in the reference spectral model. A series of idealized tests of different complexity are performed. In the deterministic baroclinic wave test, solutions from the new dynamical core show the expected sensitivity to horizontal resolution, and converge to the reference solution at R2B6 (35 km grid spacing. In a dry climate test, the dynamical core correctly reproduces key features of the meridional heat and momentum transport by baroclinic eddies. In the aqua-planet simulations at 140 km resolution, the new model is able to reproduce the same equatorial wave propagation characteristics as in the reference spectral model, including the sensitivity of such characteristics to the meridional sea surface temperature profile. These results suggest that the triangular-C discretization provides a
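
    For orientation only, a schematic form of the fourth-order hyper-diffusion damping described above is given below; the exact dependence of the coefficient on time step and grid size is an assumption for illustration, not the formula from the paper.

        % Schematic only: fourth-order hyper-diffusion of the horizontal wind v_h.
        % The proportionality of K_4 to grid size and time step is an assumed form.
        \[
          \frac{\partial \mathbf{v}_h}{\partial t}
            = \dots \; - \; K_4 \, \nabla^4 \mathbf{v}_h ,
          \qquad
          K_4 \propto \frac{(\Delta x)^4}{\Delta t} .
        \]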

  17. 3D visualization of ultra-fine ICON climate simulation data

    Science.gov (United States)

    Röber, Niklas; Spickermann, Dela; Böttinger, Michael

    2016-04-01

    Advances in high performance computing and model development allow the simulation of finer and more detailed climate experiments. The new ICON model is based on an unstructured triangular grid and can be used for a wide range of applications, ranging from global coupled climate simulations down to very detailed and high resolution regional experiments. It consists of an atmospheric and an oceanic component and scales very well to high numbers of cores. This allows us to conduct very detailed climate experiments with ultra-fine resolutions. ICON is jointly developed in partnership with DKRZ by the Max Planck Institute for Meteorology and the German Weather Service. This presentation discusses our current workflow for analyzing and visualizing this high resolution data. The ICON model has been used for eddy-resolving simulations, and we have developed specific plugins for the freely available visualization software ParaView and Vapor, which allow us to read and handle that much data. Within ParaView, we can additionally compare prognostic variables with performance data side by side to investigate the performance and scalability of the model. With the simulation running in parallel on several hundred nodes, an equal load balance is imperative. In our presentation we show visualizations of high-resolution ICON oceanographic and HDCP2 atmospheric simulations that were created using ParaView and Vapor. Furthermore we discuss our current efforts to improve our visualization capabilities, thereby exploring the potential of regular in-situ visualization, as well as of in-situ compression / post visualization.

  18. Oxytocin: parallel processing in the social brain?

    Science.gov (United States)

    Dölen, Gül

    2015-06-01

    Early studies attempting to disentangle the network complexity of the brain exploited the accessibility of sensory receptive fields to reveal circuits made up of synapses connected both in series and in parallel. More recently, extension of this organisational principle beyond the sensory systems has been made possible by the advent of modern molecular, viral and optogenetic approaches. Here, evidence supporting parallel processing of social behaviours mediated by oxytocin is reviewed. Understanding oxytocinergic signalling from this perspective has significant implications for the design of oxytocin-based therapeutic interventions aimed at disorders such as autism, where disrupted social function is a core clinical feature. Moreover, identification of opportunities for novel technology development will require a better appreciation of the complexity of the circuit-level organisation of the social brain. © 2015 The Authors. Journal of Neuroendocrinology published by John Wiley & Sons Ltd on behalf of British Society for Neuroendocrinology.

  19. Greenland climate change

    DEFF Research Database (Denmark)

    Masson-Delmotte, Valérie; Swingedouw, Didier; Landais, Amaëlle

    2012-01-01

    Climate archives available from deep-sea and marine shelf sediments, glaciers, lakes and ice cores in and around Greenland allow us to place the current trends in regional climate, ice sheet dynamics, and land surface changes in a broader perspective. We show that during the last decade (2000s......), atmospheric and sea-surface temperatures are reaching levels last encountered millennia ago when northern high latitude summer insolation was higher due to a different orbital configuration. Concurrently, records from lake sediments in southern Greenland document major environmental and climatic conditions...... regional climate and ice sheet dynamics. The magnitude and rate of future changes in Greenland temperature, in response to increasing greenhouse gas emissions, may be faster than any past abrupt events occurring under interglacial conditions. Projections indicate that within one century Greenland may...

  20. Parallel multireference configuration interaction calculations on mini-beta-carotenes and beta-carotene.

    Science.gov (United States)

    Kleinschmidt, Martin; Marian, Christel M; Waletzke, Mirko; Grimme, Stefan

    2009-01-28

    We present a parallelized version of a direct selecting multireference configuration interaction (MRCI) code [S. Grimme and M. Waletzke, J. Chem. Phys. 111, 5645 (1999)]. The program can be run either in ab initio mode or as semiempirical procedure combined with density functional theory (DFT/MRCI). We have investigated the efficiency of the parallelization in case studies on carotenoids and porphyrins. The performance is found to depend heavily on the cluster architecture. While the speed-up on the older Intel Netburst technology is close to linear for up to 12-16 processes, our results indicate that it is not favorable to use all cores of modern Intel Dual Core or Quad Core processors simultaneously for memory intensive tasks. Due to saturation of the memory bandwidth, we recommend to run less demanding tasks on the latter architectures in parallel to two (Dual Core) or four (Quad Core) MRCI processes per node. The DFT/MRCI branch has been employed to study the low-lying singlet and triplet states of mini-n-beta-carotenes (n=3, 5, 7, 9) and beta-carotene (n=11) at the geometries of the ground state, the first excited triplet state, and the optically bright singlet state. The order of states depends heavily on the conjugation length and the nuclear geometry. The (1)B(u) (+) state constitutes the S(1) state in the vertical absorption spectrum of mini-3-beta-carotene but switches order with the 2 (1)A(g) (-) state upon excited state relaxation. In the longer carotenes, near degeneracy or even root flipping between the (1)B(u) (+) and (1)B(u) (-) states is observed whereas the 3 (1)A(g) (-) state is found to remain energetically above the optically bright (1)B(u) (+) state at all nuclear geometries investigated here. The DFT/MRCI method is seen to underestimate the absolute excitation energies of the longer mini-beta-carotenes but the energy gaps between the excited states are reproduced well. In addition to singlet data, triplet-triplet absorption energies are

  1. Parallel multireference configuration interaction calculations on mini-β-carotenes and β-carotene

    Science.gov (United States)

    Kleinschmidt, Martin; Marian, Christel M.; Waletzke, Mirko; Grimme, Stefan

    2009-01-01

    We present a parallelized version of a direct selecting multireference configuration interaction (MRCI) code [S. Grimme and M. Waletzke, J. Chem. Phys. 111, 5645 (1999)]. The program can be run either in ab initio mode or as semiempirical procedure combined with density functional theory (DFT/MRCI). We have investigated the efficiency of the parallelization in case studies on carotenoids and porphyrins. The performance is found to depend heavily on the cluster architecture. While the speed-up on the older Intel Netburst technology is close to linear for up to 12-16 processes, our results indicate that it is not favorable to use all cores of modern Intel Dual Core or Quad Core processors simultaneously for memory intensive tasks. Due to saturation of the memory bandwidth, we recommend to run less demanding tasks on the latter architectures in parallel to two (Dual Core) or four (Quad Core) MRCI processes per node. The DFT/MRCI branch has been employed to study the low-lying singlet and triplet states of mini-n-β-carotenes (n =3, 5, 7, 9) and β-carotene (n =11) at the geometries of the ground state, the first excited triplet state, and the optically bright singlet state. The order of states depends heavily on the conjugation length and the nuclear geometry. The B1u+ state constitutes the S1 state in the vertical absorption spectrum of mini-3-β-carotene but switches order with the 2 A1g- state upon excited state relaxation. In the longer carotenes, near degeneracy or even root flipping between the B1u+ and B1u- states is observed whereas the 3 A1g- state is found to remain energetically above the optically bright B1u+ state at all nuclear geometries investigated here. The DFT/MRCI method is seen to underestimate the absolute excitation energies of the longer mini-β-carotenes but the energy gaps between the excited states are reproduced well. In addition to singlet data, triplet-triplet absorption energies are presented. For β-carotene, where these transition

  2. Parallel processing implementation for the coupled transport of photons and electrons using OpenMP

    Science.gov (United States)

    Doerner, Edgardo

    2016-05-01

    In this work the use of OpenMP to implement the parallel processing of the Monte Carlo (MC) simulation of the coupled transport for photons and electrons is presented. This implementation was carried out using a modified EGSnrc platform which enables the use of the Microsoft Visual Studio 2013 (VS2013) environment, together with the developing tools available in the Intel Parallel Studio XE 2015 (XE2015). The performance study of this new implementation was carried out in a desktop PC with a multi-core CPU, taking as a reference the performance of the original platform. The results were satisfactory, both in terms of scalability as parallelization efficiency.
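
    The parallelization pattern described above (independent particle histories distributed over threads, each with its own random stream) can be sketched with a generic OpenMP loop; the toy tally and the per-thread seeding shown below are illustrative assumptions, not EGSnrc code.

        // Independent particle histories over OpenMP threads with per-thread RNG.
        #include <omp.h>
        #include <cstdio>
        #include <random>

        int main() {
            const long nhistories = 1000000;
            double deposited = 0.0;            // toy tally

            #pragma omp parallel reduction(+ : deposited)
            {
                // Each thread gets its own generator, seeded by its thread id.
                std::mt19937_64 rng(12345 + omp_get_thread_num());
                std::exponential_distribution<double> step(1.0);
                #pragma omp for
                for (long i = 0; i < nhistories; ++i)
                    deposited += step(rng);    // placeholder "energy deposit"
            }
            std::printf("mean deposit per history: %f\n", deposited / nhistories);
            return 0;
        }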

  3. Failure mechanisms for compacted uranium oxide fuel cores

    International Nuclear Information System (INIS)

    Berghaus, D.G.; Peacock, H.B.

    1980-01-01

    Tension, compression, and shear tests were performed on test specimens of aluminum-clad, compacted powder fuel cores to determine failure mechanisms of the core material. The core, which consists of 70% uranium oxide in an aluminum matrix, frequently fails during post-extrusion drawing. Tests were conducted to various strain levels up to failure of the core. Sections were made of tested specimens to microscopically study initiation of failure. Two failure modes were observed. The tensile failure mode is initiated by prior tensile failure of uranium oxide particles, with the separation path strongly influenced by the arrangement of particles. The delamination mode consists of the separation of laminae formed during extrusion of tubes. Separation proceeds from fine cracks formed parallel to the laminae. The tensile failure mode was experienced in tension and shear tests. The delamination mode was produced in compression tests.

  4. Performance and advantages of a soft-core based parallel architecture for energy peak detection in the calorimeter Level 0 trigger for the NA62 experiment at CERN

    International Nuclear Information System (INIS)

    Ammendola, R.; Barbanera, M.; Bizzarri, M.; Bonaiuto, V.; Ceccucci, A.; Simone, N. De; Fantechi, R.; Fucci, A.; Lupi, M.; Ryjov, V.; Checcucci, B.; Papi, A.; Piccini, M.; Federici, L.; Paoluzzi, G.; Salamon, A.; Salina, G.; Sargeni, F.; Venditti, S.

    2017-01-01

    The NA62 experiment at the CERN SPS has started its data-taking. Its aim is to measure the branching ratio of the ultra-rare decay K+ → π+νν̅. In this context, rejecting the background is a crucial topic. One of the main backgrounds to the measurement is the K+ → π+π0 decay. In the 1-8.5 mrad decay region this background is rejected by the calorimetric trigger processor (Cal-L0). In this work we present the performance of a soft-core based parallel architecture built on FPGAs for the energy peak reconstruction, as an alternative to an implementation written entirely in VHDL.

  5. Performance and advantages of a soft-core based parallel architecture for energy peak detection in the calorimeter Level 0 trigger for the NA62 experiment at CERN

    Science.gov (United States)

    Ammendola, R.; Barbanera, M.; Bizzarri, M.; Bonaiuto, V.; Ceccucci, A.; Checcucci, B.; De Simone, N.; Fantechi, R.; Federici, L.; Fucci, A.; Lupi, M.; Paoluzzi, G.; Papi, A.; Piccini, M.; Ryjov, V.; Salamon, A.; Salina, G.; Sargeni, F.; Venditti, S.

    2017-03-01

    The NA62 experiment at the CERN SPS has started its data-taking. Its aim is to measure the branching ratio of the ultra-rare decay K+ → π+νν̅. In this context, rejecting the background is a crucial topic. One of the main backgrounds to the measurement is the K+ → π+π0 decay. In the 1-8.5 mrad decay region this background is rejected by the calorimetric trigger processor (Cal-L0). In this work we present the performance of a soft-core based parallel architecture built on FPGAs for the energy peak reconstruction, as an alternative to an implementation written entirely in VHDL.

  6. Millennial-scale climate variability recorded by gamma logging curve in Chaidam Basin

    International Nuclear Information System (INIS)

    Yuan Linwang; Chen Ye; Liu Zechun

    2000-01-01

    Using a natural gamma-ray logging curve of the Dacan-1 core to reconstruct paleoclimate changes in the Chaidam Basin, the process of environmental change over the past 150,000 years has been revealed. Heinrich events and D-O cycles were identified and can be matched well with those recorded in the Greenland ice core. This suggests that the GR curve can identify tectonic and climatic events and is a sensitive proxy indicator of environmental and climatic change.

  7. Paleoceanographic records in the sedimentary cores from the middle Okinawa Trough

    Institute of Scientific and Technical Information of China (English)

    LIU Yanguang; FU Yunxia; DU Dewen; MENG Xianwei; LIANG Ruicai; LI Tiegang; WU Shiying

    2003-01-01

    Two gravity piston cores (Cores 155 and 180) involved in this study were collected from the middle Okinawa Trough. Stratigraphy of the two cores was divided and classified based on the features of planktonic foraminifera oxygen isotope changes together with depositional sequence, millennium-scale climatic event comparison, carbonate cycles and AMS 14C dating. Some paleoclimatic information contained in sediments of these cores was extracted to discuss the patterns of paleoclimatic change and the short-timescale events present in the interglacial period. Analysis of the variation of oxygen isotope values in stage two shows that the middle part of the Okinawa Trough may have been affected by fresh water from the Yellow River and the Yangtze River during the Last Glacial Maximum (LGM). The oscillating ranges of the oxygen isotope values of the cores verify that the marginal sea has an amplifying effect on climate changes. The δ13C of the benthic foraminifera Uvigerina was lighter in the glacial period than in the interglacial period, which indicates that the Paleo-Kuroshio's main stream moved eastward and its influence area decreased. According to the temperature difference during the “YD” period recorded in Core 180 and other data, we conclude that the climatic changes in the middle Okinawa Trough area were controlled by global climatic changes, but some regional factors also had considerable influence on the climate changes. Some results in this paper support Fairbanks's point that the “YD” event was a brief stagnation of sea-level rise during the global warming process. Moreover, the falling of sea level in the glacial period weakened the exchange between the bottom water of the Okinawa Trough and the deep water of the northwestern Pacific Ocean and resulted in a low-oxygen state of the bottom water in this area. These processes are the reason the carbonate cycle in the Okinawa Trough area is consistent with the “Atlantic type” carbonate cycle.

  8. Computing effective properties of random heterogeneous materials on heterogeneous parallel processors

    Science.gov (United States)

    Leidi, Tiziano; Scocchi, Giulio; Grossi, Loris; Pusterla, Simone; D'Angelo, Claudio; Thiran, Jean-Philippe; Ortona, Alberto

    2012-11-01

    In recent decades, finite element (FE) techniques have been extensively used for predicting effective properties of random heterogeneous materials. In the case of very complex microstructures, the choice of numerical methods for the solution of this problem can offer some advantages over classical analytical approaches, and it allows the use of digital images obtained from real material samples (e.g., using computed tomography). On the other hand, having a large number of elements is often necessary for properly describing complex microstructures, ultimately leading to extremely time-consuming computations and high memory requirements. With the final objective of reducing these limitations, we improved an existing freely available FE code for the computation of effective conductivity (electrical and thermal) of microstructure digital models. To allow execution on hardware combining multi-core CPUs and a GPU, we first translated the original algorithm from Fortran to C, and we subdivided it into software components. Then, we enhanced the C version of the algorithm for parallel processing with heterogeneous processors. With the goal of maximizing the obtained performances and limiting resource consumption, we utilized a software architecture based on stream processing, event-driven scheduling, and dynamic load balancing. The parallel processing version of the algorithm has been validated using a simple microstructure consisting of a single sphere located at the centre of a cubic box, yielding consistent results. Finally, the code was used for the calculation of the effective thermal conductivity of a digital model of a real sample (a ceramic foam obtained using X-ray computed tomography). On a computer equipped with dual hexa-core Intel Xeon X5670 processors and an NVIDIA Tesla C2050, the parallel application version features near to linear speed-up progression when using only the CPU cores. It executes more than 20 times faster when additionally using the GPU.

  9. Parallel Distributed Processing at 25: Further Explorations in the Microstructure of Cognition

    Science.gov (United States)

    Rogers, Timothy T.; McClelland, James L.

    2014-01-01

    This paper introduces a special issue of "Cognitive Science" initiated on the 25th anniversary of the publication of "Parallel Distributed Processing" (PDP), a two-volume work that introduced the use of neural network models as vehicles for understanding cognition. The collection surveys the core commitments of the PDP…

  10. CMS readiness for multi-core workload scheduling

    Science.gov (United States)

    Perez-Calero Yzquierdo, A.; Balcas, J.; Hernandez, J.; Aftab Khan, F.; Letts, J.; Mason, D.; Verguilov, V.

    2017-10-01

    In the present run of the LHC, CMS data reconstruction and simulation algorithms benefit greatly from being executed as multiple threads running on several processor cores. The complexity of the Run 2 events requires parallelization of the code to reduce the memory-per-core footprint constraining serial execution programs, thus optimizing the exploitation of present multi-core processor architectures. The allocation of computing resources for multi-core tasks, however, becomes a complex problem in itself. The CMS workload submission infrastructure employs multi-slot partitionable pilots, built on HTCondor and GlideinWMS native features, to enable scheduling of single and multi-core jobs simultaneously. This provides a solution for the scheduling problem in a uniform way across grid sites running a diversity of gateways to compute resources and batch system technologies. This paper presents this strategy and the tools on which it has been implemented. The experience of managing multi-core resources at the Tier-0 and Tier-1 sites during 2015, along with the deployment phase to Tier-2 sites during early 2016 is reported. The process of performance monitoring and optimization to achieve efficient and flexible use of the resources is also described.

  11. CMS Readiness for Multi-Core Workload Scheduling

    Energy Technology Data Exchange (ETDEWEB)

    Perez-Calero Yzquierdo, A. [Madrid, CIEMAT; Balcas, J. [Caltech; Hernandez, J. [Madrid, CIEMAT; Aftab Khan, F. [NCP, Islamabad; Letts, J. [UC, San Diego; Mason, D. [Fermilab; Verguilov, V. [CLMI, Sofia

    2017-11-22

    In the present run of the LHC, CMS data reconstruction and simulation algorithms benefit greatly from being executed as multiple threads running on several processor cores. The complexity of the Run 2 events requires parallelization of the code to reduce the memory-per-core footprint constraining serial execution programs, thus optimizing the exploitation of present multi-core processor architectures. The allocation of computing resources for multi-core tasks, however, becomes a complex problem in itself. The CMS workload submission infrastructure employs multi-slot partitionable pilots, built on HTCondor and GlideinWMS native features, to enable scheduling of single and multi-core jobs simultaneously. This provides a solution for the scheduling problem in a uniform way across grid sites running a diversity of gateways to compute resources and batch system technologies. This paper presents this strategy and the tools on which it has been implemented. The experience of managing multi-core resources at the Tier-0 and Tier-1 sites during 2015, along with the deployment phase to Tier-2 sites during early 2016 is reported. The process of performance monitoring and optimization to achieve efficient and flexible use of the resources is also described.

  12. δ13C-CH4 in ice core samples

    DEFF Research Database (Denmark)

    Sperlich, Peter

    Ice core records of δ13C-CH4 reflect the variability of CH4 biogeochemistry in response to climate change and show this system is far more complex than expected. The first part of this work is concerned with the development of analytical techniques that allow 1) precise referencing and 2) measurements of δ13C-CH4 in ice core samples, as is required when δ13C-CH4 records measured in several laboratories are merged for analysis. Both the referencing and measurement techniques have been compared with other laboratories, which proved the accuracy of the analytical systems. The second part

  13. Co-simulation of dynamic systems in parallel and serial model configurations

    International Nuclear Information System (INIS)

    Sweafford, Trevor; Yoon, Hwan Sik

    2013-01-01

    Recent advancements in simulation software and computation hardware make it feasible to simulate complex dynamic systems comprised of multiple submodels developed in different modeling languages. So-called co-simulation enables one to study various aspects of a complex dynamic system with heterogeneous submodels in a cost-effective manner. Among several different model configurations for co-simulation, the synchronized parallel configuration is expected to expedite the simulation process by simulating multiple submodels concurrently on a multi-core processor. In this paper, computational accuracy as well as computation time is studied for three different co-simulation frameworks: integrated, serial, and parallel. For this purpose, analytical evaluations of the three different methods are made using the explicit Euler method and then applied to two-DOF mass-spring systems. The results show that while the parallel simulation configuration produces the same accurate results as the integrated configuration, the results of the serial configuration show a slight deviation. It is also shown that the computation time can be reduced by running the simulation in the parallel configuration. Therefore, it can be concluded that the synchronized parallel simulation methodology is the best for both simulation accuracy and time efficiency.
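
    For reference, the test system named above can be written down directly: a two-DOF mass-spring chain advanced with explicit Euler in the monolithic ("integrated") configuration. The parameter values in the sketch below are assumptions, and the serial/parallel co-simulation split is not shown.

        // Explicit Euler for a two-DOF mass-spring chain (integrated setup).
        #include <cstdio>

        int main() {
            double m1 = 1.0, m2 = 1.0, k1 = 10.0, k2 = 5.0;  // assumed parameters
            double x1 = 1.0, v1 = 0.0, x2 = 0.0, v2 = 0.0;   // initial conditions
            const double dt = 1e-3;

            for (int n = 0; n <= 10000; ++n) {
                if (n % 2000 == 0)
                    std::printf("t=%5.2f  x1=%8.4f  x2=%8.4f\n", n * dt, x1, x2);
                double f1 = -k1 * x1 + k2 * (x2 - x1);       // force on mass 1
                double f2 = -k2 * (x2 - x1);                 // force on mass 2
                x1 += dt * v1;  v1 += dt * f1 / m1;          // explicit Euler step
                x2 += dt * v2;  v2 += dt * f2 / m2;
            }
            return 0;
        }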

  14. Efficient parallel implementation of active appearance model fitting algorithm on GPU.

    Science.gov (United States)

    Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou

    2014-01-01

    The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods which has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine grain parallelism in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on the Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures.

  15. Data-Parallel Mesh Connected Components Labeling and Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Harrison, Cyrus; Childs, Hank; Gaither, Kelly

    2011-04-10

    We present a data-parallel algorithm for identifying and labeling the connected sub-meshes within a domain-decomposed 3D mesh. The identification task is challenging in a distributed-memory parallel setting because connectivity is transitive and the cells composing each sub-mesh may span many or all processors. Our algorithm employs a multi-stage application of the Union-find algorithm and a spatial partitioning scheme to efficiently merge information across processors and produce a global labeling of connected sub-meshes. Marking each vertex with its corresponding sub-mesh label allows us to isolate mesh features based on topology, enabling new analysis capabilities. We briefly discuss two specific applications of the algorithm and present results from a weak scaling study. We demonstrate the algorithm at concurrency levels up to 2197 cores and analyze meshes containing up to 68 billion cells.
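
    The serial core of the Union-find step described above can be sketched compactly: cells that share a face are unioned, and each cell is then labeled by its sub-mesh representative. The multi-stage merging of labels across distributed-memory processors is not shown, and the toy mesh below is an assumption.

        // Union-find labeling of cells that share faces (serial core, C++17).
        #include <cstdio>
        #include <numeric>
        #include <utility>
        #include <vector>

        struct UnionFind {
            std::vector<int> parent;
            explicit UnionFind(int n) : parent(n) {
                std::iota(parent.begin(), parent.end(), 0);
            }
            int find(int a) {                 // with path halving
                while (parent[a] != a) a = parent[a] = parent[parent[a]];
                return a;
            }
            void unite(int a, int b) { parent[find(a)] = find(b); }
        };

        int main() {
            // Five cells; shared faces connect {0,1,2} and {3,4} into two sub-meshes.
            UnionFind uf(5);
            std::vector<std::pair<int, int>> shared_faces = {{0, 1}, {1, 2}, {3, 4}};
            for (auto [a, b] : shared_faces) uf.unite(a, b);
            for (int c = 0; c < 5; ++c)
                std::printf("cell %d -> label %d\n", c, uf.find(c));
            return 0;
        }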

  16. Providing the climatic component in human-climate interaction studies: 550,000 years of climate history in the Chew Bahir basin, a key HSPDP site in southern Ethiopia.

    Science.gov (United States)

    Foerster, V. E.; Asrat, A.; Bronk Ramsey, C.; Chapot, M. S.; Cohen, A. S.; Dean, J. R.; Deocampo, D.; Deino, A. L.; Guenter, C.; Junginger, A.; Lamb, H. F.; Leng, M. J.; Roberts, H. M.; Schaebitz, F.; Trauth, M. H.

    2017-12-01

    As a contribution towards an enhanced understanding of human-climate interactions, the Hominin Sites and Paleolakes Drilling Project (HSPDP) has cored six predominantly lacustrine archives of climate change spanning much of the last 3.5 Ma in eastern Africa. All six sites in Ethiopia and Kenya are adjacent to key paleoanthropological sites encompassing diverse milestones in human evolution, dispersal, and technological innovation. The 280 m-long Chew Bahir sediment core, recovered from a tectonically-bound basin in the southern Ethiopian rift in late 2014, covers the past 550 ka of environmental history, an interval marked by intense climatic changes and includes the transition to the Middle Stone Age and the origin and dispersal of modern Homo sapiens. We present the outcome of lithologic and stratigraphic investigations, first interpretations of high resolution MSCL and XRF scanning data, and initial results of detailed multi-indicator analysis of the Chew Bahir cores. These analyses are based on more than 14,000 discrete samples, including grain size analyses and X-ray diffraction. An initial chronology, based on Ar/Ar and OSL dating, allows temporal calibration of our reconstruction of dry-wet cycles. Both geochemical and sedimentological data show that the Chew Bahir deposits are sensitive recorders of climate change on millennial to centennial timescales. Initial statistical analyses identify phases marked by abrupt climatic changes, whereas several long-term wet-dry oscillations reveal variations mostly in the precession ( 15-25 kyr), but also in the obliquity ( 40 kyr) and eccentricity frequency bands ( 90-120 kyr). The Chew Bahir record will help decode climate variation on several different time scales, as a consequence of orbitally-driven high-latitude glacial-interglacial shifts and variations in greenhouse gases, Indian and Atlantic Ocean sea-surface temperatures, as well as local solar irradiance. This 550 ka record of environmental change in eastern

  17. Parallel hyperbolic PDE simulation on clusters: Cell versus GPU

    Science.gov (United States)

    Rostrup, Scott; De Sterck, Hans

    2010-12-01

    Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. Program summaryProgram title: SWsolver Catalogue identifier: AEGY_v1_0 Program summary URL
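
    The kind of loop such accelerators target is an explicit update sweep over a structured grid. The sketch below uses first-order upwind advection on a 1-D periodic grid as a stand-in for the shallow-water update in SWsolver; it is a generic illustration, not the package's code.

        // Explicit sweep over a structured 1-D grid (upwind advection stand-in).
        #include <cmath>
        #include <cstdio>
        #include <vector>

        int main() {
            const int n = 1000;
            const double dx = 1.0 / n, c = 1.0, dt = 0.5 * dx / c;  // CFL-limited step
            std::vector<double> u(n), unew(n);
            for (int i = 0; i < n; ++i) {
                double xi = i * dx;
                u[i] = std::exp(-100.0 * (xi - 0.5) * (xi - 0.5));  // initial pulse
            }

            for (int step = 0; step < 500; ++step) {
                for (int i = 1; i < n; ++i)                 // the data-parallel loop
                    unew[i] = u[i] - c * dt / dx * (u[i] - u[i - 1]);
                unew[0] = u[0] - c * dt / dx * (u[0] - u[n - 1]);   // periodic wrap
                u.swap(unew);
            }
            std::printf("u at mid-domain after 500 steps: %f\n", u[n / 2]);
            return 0;
        }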

  18. A framework for grand scale parallelization of the combined finite discrete element method in 2d

    Science.gov (United States)

    Lei, Z.; Rougier, E.; Knight, E. E.; Munjiza, A.

    2014-09-01

    Within the context of rock mechanics, the Combined Finite-Discrete Element Method (FDEM) has been applied to many complex industrial problems such as block caving, deep mining techniques (tunneling, pillar strength, etc.), rock blasting, seismic wave propagation, packing problems, dam stability, rock slope stability, rock mass strength characterization problems, etc. The reality is that most of these were accomplished in a 2D and/or single-processor realm. In this work a hardware-independent FDEM parallelization framework has been developed using the Virtual Parallel Machine for FDEM (V-FDEM). With V-FDEM, parallel FDEM software can be adapted to different parallel architecture systems ranging from just a few to thousands of cores.

  19. Investigation of two-phase flow instability under SMART-P core conditions

    International Nuclear Information System (INIS)

    Hwang, Dae Hyun; Lee, Chung Chan

    2005-01-01

    An integral-type advanced light water reactor, named SMART-P, is being continuously studied at KAERI. The reactor core consists of hundreds of closed-channel type fuel assemblies with vertical upward flows. The upper and lower parts of the fuel assembly channels are connected to the common heads. The constant pressure drop imposed on the channel is responsible for the occurrence of density wave oscillations under local boiling and/or natural circulation conditions. A fuel assembly channel with oscillatory flow is highly susceptible to experiencing CHF, which may cause fuel failure due to a sudden increase in cladding temperature. Thus, prevention of flow instability is an important criterion for the SMART-P core design. Experimental and analytical studies have been conducted in order to investigate the onset of flow instability (OFI) under SMART core conditions. Parallel channel oscillations were observed in a high-pressure water-loop test facility. A linear stability analysis model in the frequency domain was developed for the prediction of the marginal stability boundary (MSB) in the parallel boiling channels

  20. Visualization and Analysis of Climate Simulation Performance Data

    Science.gov (United States)

    Röber, Niklas; Adamidis, Panagiotis; Behrens, Jörg

    2015-04-01

    Visualization is the key process of transforming abstract (scientific) data into a graphical representation, to aid in the understanding of the information hidden within the data. Climate simulation data sets are typically quite large, time varying, and consist of many different variables sampled on an underlying grid. A large variety of climate models - and sub models - exist to simulate various aspects of the climate system. Generally, one is mainly interested in the physical variables produced by the simulation runs, but model developers are also interested in performance data measured along with these simulations. Climate simulation models are carefully developed complex software systems, designed to run in parallel on large HPC systems. An important goal thereby is to utilize the entire hardware as efficiently as possible, that is, to distribute the workload as evenly as possible among the individual components. This is a very challenging task, and detailed performance data, such as timings, cache misses, etc., have to be used to locate and understand performance problems in order to optimize the model implementation. Furthermore, the correlation of performance data to the processes of the application and the sub-domains of the decomposed underlying grid is vital when addressing communication and load imbalance issues. High resolution climate simulations are carried out on tens to hundreds of thousands of cores, thus yielding a vast amount of profiling data, which cannot be analyzed without appropriate visualization techniques. This PICO presentation displays and discusses the ICON simulation model, which is jointly developed by the Max Planck Institute for Meteorology and the German Weather Service, in partnership with DKRZ. The visualization and analysis of the model's performance data allow us to optimize and fine-tune the model, as well as to understand its execution on the HPC system. We show and discuss our workflow, as well as present new ideas and

  1. Radiation-hard/high-speed parallel optical links

    Energy Technology Data Exchange (ETDEWEB)

    Gan, K.K., E-mail: gan@mps.ohio-state.edu [Department of Physics, The Ohio State University, Columbus, OH 43210 (United States); Buchholz, P.; Heidbrink, S. [Fachbereich Physik, Universität Siegen, Siegen (Germany); Kagan, H.P.; Kass, R.D.; Moore, J.; Smith, D.S. [Department of Physics, The Ohio State University, Columbus, OH 43210 (United States); Vogt, M.; Ziolkowski, M. [Fachbereich Physik, Universität Siegen, Siegen (Germany)

    2016-09-21

    We have designed and fabricated a compact parallel optical engine for transmitting data at 5 Gb/s. The device consists of a 4-channel ASIC driving a VCSEL (Vertical Cavity Surface Emitting Laser) array in an optical package. The ASIC is designed using only core transistors in a 65 nm CMOS process to enhance the radiation-hardness. The ASIC contains an 8-bit DAC to control the bias and modulation currents of the individual channels in the VCSEL array. The performance of the optical engine at 5 Gb/s is satisfactory.

  2. A parallel algorithm for the two-dimensional time fractional diffusion equation with implicit difference method.

    Science.gov (United States)

    Gong, Chunye; Bao, Weimin; Tang, Guojian; Jiang, Yuewen; Liu, Jie

    2014-01-01

    It is very time-consuming to solve fractional differential equations. The computational complexity of the two-dimensional fractional differential equation (2D-TFDE) solved with an iterative implicit finite difference method is O(M_x M_y N^2). In this paper, we present a parallel algorithm for the 2D-TFDE and give an in-depth discussion of this algorithm. A task distribution model and a data layout with virtual boundaries are designed for this parallel algorithm. The experimental results show that the results of the parallel algorithm compare well with the exact solution. The parallel algorithm on a single Intel Xeon X5540 CPU runs 3.16-4.17 times faster than the serial algorithm on a single CPU core. The parallel efficiency of 81 processes is up to 88.24% compared with 9 processes on a distributed memory cluster system. We believe that parallel computing technology will become a basic method for computationally intensive fractional applications in the near future.
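
    As a hedged illustration of where the O(M_x M_y N^2) cost and the parallelism come from, the sketch below (not the authors' code; grid sizes, fractional order, and the Gruenwald-Letnikov discretization are assumptions) computes the fractional-derivative history term for one time level. The sum over all previous levels is independent for every grid point, which is the kind of work the paper distributes across processes with a virtual-boundary data layout; here it is simply spread over the cores of one node with OpenMP.

```cpp
// Hedged sketch: the time-fractional term couples every new time level to all
// previous ones, giving the N^2 factor; the history sum is embarrassingly
// parallel over the Mx*My spatial points.
#include <omp.h>
#include <vector>
#include <cstdio>

int main() {
    const int Mx = 128, My = 128, N = 200;    // grid and time levels (assumed)
    const double alpha = 0.8;                 // fractional order (assumed)

    // Gruenwald-Letnikov weights: g_0 = 1, g_k = g_{k-1} * (1 - (alpha+1)/k).
    std::vector<double> g(N, 1.0);
    for (int k = 1; k < N; ++k) g[k] = g[k - 1] * (1.0 - (alpha + 1.0) / k);

    // u[n] holds the solution at time level n, flattened to Mx*My values.
    std::vector<std::vector<double>> u(N, std::vector<double>(Mx * My, 1.0));
    std::vector<double> history(Mx * My, 0.0);

    const int n = N - 1;  // evaluate the memory term needed at the last level
    #pragma omp parallel for
    for (int idx = 0; idx < Mx * My; ++idx) {
        double s = 0.0;
        for (int k = 1; k <= n; ++k)          // sum over all previous levels
            s += g[k] * u[n - k][idx];
        history[idx] = s;                      // feeds the implicit spatial solve
    }
    std::printf("history[0] = %f\n", history[0]);
    return 0;
}
```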

  3. Hybrid shared/distributed parallelism for 3D characteristics transport solvers

    International Nuclear Information System (INIS)

    Dahmani, M.; Roy, R.

    2005-01-01

    In this paper, we will present a new hybrid parallel model for solving large-scale 3-dimensional neutron transport problems used in nuclear reactor simulations. Large heterogeneous reactor problems, like the ones that occur when simulating CANDU cores, have remained computationally intensive and impractical for routine applications on single-node or even vector computers. Based on the characteristics method, this new model is designed to solve the transport equation after distributing the calculation load on a network of shared-memory multi-processors. The tracks are either generated on the fly at each characteristics sweep or stored in sequential files. Load balancing is taken into account by estimating the calculation load of tracks and by distributing batches of uniform load on each node of the network. Moreover, the communication overhead can be predicted by benchmarking the latency and bandwidth using an appropriate network test suite. These models are useful for predicting the performance of the parallel applications and for analyzing the scalability of the parallel systems. (authors)
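
    A minimal sketch of the hybrid scheme described above, under the assumption that per-track cost estimates are available (the track data and the sweep kernel below are placeholders, not the authors' code): batches of roughly uniform estimated load are formed with a greedy rule, one batch is assigned to each MPI process, and each process sweeps its batch with OpenMP threads on its shared-memory node.

```cpp
// Hedged sketch of hybrid MPI + OpenMP load balancing for track batches.
#include <mpi.h>
#include <omp.h>
#include <vector>
#include <algorithm>
#include <cstdio>

struct Track { int id; double cost; };   // cost = estimated calculation load

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    // Every rank builds the same track list and the same partition, so no
    // communication is needed to agree on the batches.
    const int nTracks = 10000;
    std::vector<Track> tracks(nTracks);
    for (int i = 0; i < nTracks; ++i) tracks[i] = {i, 1.0 + (i % 7)};

    // Greedy balancing: heaviest remaining track goes to the lightest batch.
    std::sort(tracks.begin(), tracks.end(),
              [](const Track& a, const Track& b) { return a.cost > b.cost; });
    std::vector<double> load(nprocs, 0.0);
    std::vector<std::vector<Track>> batch(nprocs);
    for (const Track& t : tracks) {
        int p = std::min_element(load.begin(), load.end()) - load.begin();
        load[p] += t.cost;
        batch[p].push_back(t);
    }

    // Shared-memory sweep over the local batch.
    double local = 0.0;
    #pragma omp parallel for reduction(+ : local)
    for (std::size_t i = 0; i < batch[rank].size(); ++i)
        local += batch[rank][i].cost;        // stand-in for the characteristics sweep

    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) std::printf("swept total load %.1f on %d nodes\n", total, nprocs);
    MPI_Finalize();
    return 0;
}
```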

  4. Flexibility and Performance of Parallel File Systems

    Science.gov (United States)

    Kotz, David; Nieuwejaar, Nils

    1996-01-01

    As we gain experience with parallel file systems, it becomes increasingly clear that a single solution does not suit all applications. For example, it appears to be impossible to find a single appropriate interface, caching policy, file structure, or disk-management strategy. Furthermore, the proliferation of file-system interfaces and abstractions makes applications difficult to port. We propose that the traditional functionality of parallel file systems be separated into two components: a fixed core that is standard on all platforms, encapsulating only primitive abstractions and interfaces, and a set of high-level libraries to provide a variety of abstractions and application-programmer interfaces (APIs). We present our current and next-generation file systems as examples of this structure. Their features, such as a three-dimensional file structure, strided read and write interfaces, and I/O-node programs, are specifically designed with the flexibility and performance necessary to support a wide range of applications.

  5. Comparative analysis of the serial/parallel numerical calculation of boiling channels thermohydraulics; Analisis comparativo del calculo numerico serie/paralelo de la termohidraulica de canales con ebullicion

    Energy Technology Data Exchange (ETDEWEB)

    Cecenas F, M., E-mail: mcf@iie.org.mx [Instituto Nacional de Electricidad y Energias Limpias, Reforma 113, Col. Palmira, 62490 Cuernavaca, Morelos (Mexico)

    2017-09-15

    A parallel channel model with boiling and point neutron kinetics is used to compare a conventional implementation in the C language with a parallel programming scheme. In both cases the subroutines written in C are practically the same, but they differ in the way the tasks that calculate the different channels are controlled. Parallel Virtual Machine is used for the parallel solution, which allows message passing between tasks to control convergence and to transfer the variables of interest between the tasks that run simultaneously on a platform equipped with a multi-core microprocessor. For some problems defined as a study case, such as the one presented in this paper, a computer with two cores can reduce the computation time to 54-56% of the time required by the same program in its conventional sequential version. Similarly, a processor with four cores can reduce the time to 22-33% of the execution time of the conventional serial version. These substantial reductions in computation time are very encouraging for applications that can be parallelized and whose execution time is an important factor. (Author)
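
    The following C++ sketch is a hedged analogue of the scheme described above: the original work uses Parallel Virtual Machine, whereas this sketch swaps in MPI for the message passing, and the boiling-channel thermohydraulics is replaced by a toy hydraulic model (w = sqrt(dP/k), with assumed resistances and target flow). Each task models one channel under a common pressure drop, and a collective reduction drives the shared boundary condition and the convergence check.

```cpp
// Hedged illustration: one MPI rank per channel, common pressure drop iterated
// until the summed channel flows match the target core flow.
#include <mpi.h>
#include <cmath>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nChannels;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nChannels);

    const double k = 1.0 + 0.1 * rank;     // hydraulic resistance of this channel (assumed)
    const double targetFlow = 10.0;        // required total core flow (assumed units)
    double dP = 1.0;                       // common pressure drop, iterated

    for (int it = 0; it < 50; ++it) {
        double w = std::sqrt(dP / k);      // flow of this channel at the current dP
        double total = 0.0;
        MPI_Allreduce(&w, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        double err = std::fabs(total - targetFlow) / targetFlow;
        if (err < 1e-10) break;            // all ranks see the same error and exit together
        dP *= (targetFlow / total) * (targetFlow / total);  // since w ~ sqrt(dP)
    }
    double w = std::sqrt(dP / k);
    std::printf("channel %d: flow %.4f at dP %.4f\n", rank, w, dP);
    MPI_Finalize();
    return 0;
}
```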

  6. Denali Ice Core Record of North Pacific Sea Surface Temperatures and Marine Primary Productivity

    Science.gov (United States)

    Polashenski, D.; Osterberg, E. C.; Kreutz, K. J.; Winski, D.; Wake, C. P.; Ferris, D. G.; Introne, D.; Campbell, S. W.

    2016-12-01

    Chemical analyses of precipitation preserved in glacial ice cores provide a unique opportunity to study changes in atmospheric circulation patterns and ocean surface conditions through time. In this study, we aim to investigate changes in both the physical and biological parameters of the north-central Pacific Ocean and Bering Sea over the twentieth century using the deuterium excess (d-excess) and methanesulfonic acid (MSA) records from the Mt. Hunter ice cores drilled in Denali National Park, Alaska. These parallel, 208 m-long ice cores were drilled to bedrock during the 2013 field season on the Mt. Hunter plateau (63° N, 151° W, 3,900 m above sea level) by a collaborative research team consisting of members from Dartmouth College and the Universities of Maine and New Hampshire. The cores were sampled on a continuous melter system at Dartmouth College and analyzed for the concentrations of major ions (Dionex IC) and trace metals (Element2 ICPMS), and for stable water isotope ratios (Picarro). The depth-age scale has been accurately dated to 400 AD using annual layer counting of several chemical species and further validated using known historical volcanic eruptions and the Cesium-137 spike associated with nuclear weapons testing in 1963. We use HYSPLIT back trajectory modeling to identify likely source areas of moisture and aerosol MSA being transported to the core site. Satellite imagery allows for a direct comparison between chlorophyll a concentrations in these source areas and MSA concentrations in the core record. Preliminary analysis of chlorophyll a and MSA concentrations, both derived almost exclusively from marine biota, suggests that the Mt. Hunter ice cores reflect changes in North Pacific and Bering Sea marine primary productivity. Analysis of the water isotope and MSA data in conjunction with climate reanalysis products shows significant correlations with sea surface temperatures in the Bering Sea and North Central Pacific. These findings, coupled with

  7. Physics Structure Analysis of Parallel Waves Concept of Physics Teacher Candidate

    International Nuclear Information System (INIS)

    Sarwi, S; Linuwih, S; Supardi, K I

    2017-01-01

    The aim of this research was to find the parallel structure of wave physics concepts and the factors that influence the formation of parallel conceptions among physics teacher candidates. The method used was qualitative research of the cross-sectional design type. The subjects were five third-semester basic physics students and six fifth-semester wave course students. Data collection techniques included think-aloud protocols and written tests. Quantitative data were analysed with a descriptive percentage technique. The data analysis technique for belief and awareness of answers uses explanatory analysis. Results of the research include: 1) the structure of the concept can be displayed through the illustration of a map containing the theoretical core, supplements to the theory, and phenomena that occur daily; 2) trends of parallel conceptions of wave physics were identified for stationary waves, resonance of sound, and the propagation of transverse electromagnetic waves; 3) the parallel conceptions are influenced by less comprehensive reading of textbooks and by partial understanding of the knowledge forming the structure of the theory. (paper)

  8. A scalable implementation of RI-SCF on parallel computers

    International Nuclear Information System (INIS)

    Fruechtl, H.A.; Kendall, R.A.; Harrison, R.J.

    1996-01-01

    In order to avoid the integral bottleneck of conventional SCF calculations, the Resolution of the Identity (RI) method is used to obtain an approximate solution to the Hartree-Fock equations. In this approximation only three-center integrals are needed to build the Fock matrix. It has been implemented as part of the NWChem package of portable and scalable ab initio programs for parallel computers. Utilizing the V-approximation, both the Coulomb and exchange contribution to the Fock matrix can be calculated from a transformed set of three-center integrals which have to be precalculated and stored. A distributed in-core method as well as a disk based implementation have been programmed. Details of the implementation as well as the parallel programming tools used are described. We also give results and timings from benchmark calculations

  9. Heat remains unaccounted for in thermal physiology and climate change research [version 2; referees: 2 approved]

    Directory of Open Access Journals (Sweden)

    Andreas D. Flouris

    2017-03-01

    In the aftermath of the Paris Agreement, there is a crucial need for scientists in both thermal physiology and climate change research to develop the integrated approaches necessary to evaluate the health, economic, technological, social, and cultural impacts of 1.5°C warming. Our aim was to explore the fidelity of remote temperature measurements for quantitatively identifying the continuous redistribution of heat within both the Earth and the human body. Not accounting for the regional distribution of warming and heat storage patterns can undermine the results of thermal physiology and climate change research. These concepts are discussed herein using two parallel examples: the so-called slowdown of the Earth's surface temperature warming in the period 1998-2013; and the controversial results in thermal physiology arising from relying heavily on core temperature measurements. In sum, the concept of heat is of major importance for the integrity of systems such as the Earth and the human body. At present, our understanding of the interplay of key factors modulating the heat distribution on the surface of the Earth and in the human body remains incomplete. Identifying and accounting for the interconnections among these factors will be instrumental in improving the accuracy of both climate models and health guidelines.

  10. ACME: A scalable parallel system for extracting frequent patterns from a very long sequence

    KAUST Repository

    Sahli, Majed; Mansour, Essam; Kalnis, Panos

    2014-01-01

    -long sequences and is the first to support supermaximal motifs. ACME is a versatile parallel system that can be deployed on desktop multi-core systems, or on thousands of CPUs in the cloud. However, merely using more compute nodes does not guarantee efficiency

  11. A High-Resolution Continuous Flow Analysis System for Polar Ice Cores

    DEFF Research Database (Denmark)

    Dallmayr, Remi; Goto-Azuma, Kumiko; Kjær, Helle Astrid

    2016-01-01

    In recent decades, the development of continuous flow analysis (CFA) technology for ice core analysis has enabled greater sample throughput and greater depth resolution compared with the classic discrete sampling technique. We developed the first Japanese CFA system at the National Institute of Polar Research (NIPR) in Tokyo. The system allows the continuous analysis of stable water isotopes and electrical conductivity, as well as the collection of discrete samples from both inner and outer parts of the core. This CFA system was designed to have sufficiently high temporal resolution to detect signals of abrupt climate change in deep polar ice cores. To test its performance, we used the system to analyze different climate intervals in ice drilled at the NEEM (North Greenland Eemian Ice Drilling) site, Greenland. The quality of our continuous measurement of stable water isotopes has been ...

  12. The Destabilization of Protected Soil Organic Carbon Following Experimental Drought at the Pore and Core scale

    Science.gov (United States)

    Smith, A. P.; Bond-Lamberty, B. P.; Tfaily, M. M.; Todd-Brown, K. E.; Bailey, V. L.

    2015-12-01

    The movement of water and solutes through the pore matrix controls the distribution and transformation of carbon (C) in soils. Thus, a change in hydrologic connectivity, such as increased saturation, disturbance, or drought, may alter C mineralization and greenhouse gas (GHG) fluxes to the atmosphere. While these processes occur at the pore scale, they are often investigated at coarser scales. This project investigates pore- and core-scale soil C dynamics under varying hydrologic factors (simulated precipitation, groundwater-led saturation, and drought) to assess how climate change-induced shifts in hydrologic connectivity influence the destabilization of protected C in soils. Surface soil cores (0-15 cm depth) were collected from the Disney Wilderness Preserve, Florida, USA, where water dynamics, particularly water table rise and fall, appear to exert a strong control on the emissions of GHGs and the persistence of soil organic matter in these soils. We measured CO2 and CH4 from soils allowed to freely imbibe water from below to a steady state, starting from either field-moist conditions or following experimental drought. Parallel treatments included the addition of similar quantities of water from above to simulate precipitation. Overall, respiration increased in soil cores subjected to drought compared to field-moist cores, independent of wetting type. Cumulative CH4 production was higher in drought-treated soils, especially in the soils subjected to experimental groundwater-led saturation. Overall, more C (as CO2 and CH4) was lost from drought-treated soils than from field-moist cores. Our results indicate that future drought events could have profound effects on the destabilization of protected C, especially in groundwater-fed soils. Our next steps focus on how to accurately capture drought-induced C destabilization mechanisms in earth system models.

  13. Synthetic Aperture Sequential Beamforming implemented on multi-core platforms

    DEFF Research Database (Denmark)

    Kjeldsen, Thomas; Lassen, Lee; Hemmsen, Martin Christian

    2014-01-01

    This paper compares several computational approaches to Synthetic Aperture Sequential Beamforming (SASB) targeting consumer-level parallel processors such as multi-core CPUs and GPUs. The proposed implementations demonstrate that ultrasound imaging using SASB can be executed in real time (... per second) on an Intel Core i7 2600 CPU with an AMD HD7850 and a NVIDIA GTX680 GPU. The fastest CPU and GPU implementations use 14% and 1.3% of the real-time budget of 62 ms/frame, respectively. The maximum achieved processing rate is 1265 frames/s.

  14. Parallelization of the preconditioned IDR solver for modern multicore computer systems

    Science.gov (United States)

    Bessonov, O. A.; Fedoseyev, A. I.

    2012-10-01

    This paper presents the analysis, parallelization, and optimization approach for the large sparse matrix solver CNSPACK on modern multicore microprocessors. CNSPACK is an advanced solver successfully used for the coupled solution of stiff problems arising in multiphysics applications such as CFD, semiconductor transport, kinetic and quantum problems. It employs an iterative IDR algorithm with ILU preconditioning (of user-chosen order). CNSPACK has been successfully used during the last decade for solving problems in several application areas, including fluid dynamics and semiconductor device simulation. However, recent years have brought a dramatic change in processor architectures and computer system organization. Due to this, performance criteria and methods have been revisited, and the solver and preconditioner have been parallelized using the OpenMP environment. Results of the successful implementation of efficient parallelization are presented for modern computer systems (Intel Core i7-9xx or two-processor Xeon 55xx/56xx).
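
    A minimal sketch of the kind of loop such a parallelization targets (this is not the CNSPACK code): the dominant kernels of an IDR-type iterative solver are sparse matrix-vector products and preconditioner solves, and the row loop of a CSR matrix-vector product has independent iterations that OpenMP can split across the cores of a multicore node.

```cpp
// Hedged sketch: OpenMP-parallel sparse matrix-vector product in CSR format,
// the workhorse kernel of preconditioned iterative solvers.
#include <omp.h>
#include <vector>
#include <cstdio>

// y = A*x for a matrix stored in compressed sparse row (CSR) format.
void spmv_csr(const std::vector<int>& rowPtr, const std::vector<int>& col,
              const std::vector<double>& val, const std::vector<double>& x,
              std::vector<double>& y) {
    const int n = static_cast<int>(rowPtr.size()) - 1;
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i) {          // rows are independent: safe to split
        double s = 0.0;
        for (int j = rowPtr[i]; j < rowPtr[i + 1]; ++j)
            s += val[j] * x[col[j]];
        y[i] = s;
    }
}

int main() {
    // 3x3 example: tridiagonal matrix with 2 on the diagonal and -1 off it.
    std::vector<int>    rowPtr = {0, 2, 5, 7};
    std::vector<int>    col    = {0, 1, 0, 1, 2, 1, 2};
    std::vector<double> val    = {2, -1, -1, 2, -1, -1, 2};
    std::vector<double> x = {1, 1, 1}, y(3);
    spmv_csr(rowPtr, col, val, x, y);
    std::printf("y = [%g %g %g]\n", y[0], y[1], y[2]);  // expect [1 0 1]
    return 0;
}
```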

  15. Just-in-Time Compilation-Inspired Methodology for Parallelization of Compute Intensive Java Code

    Directory of Open Access Journals (Sweden)

    GHULAM MUSTAFA

    2017-01-01

    Compute-intensive programs generally consume a significant fraction of their execution time in a small amount of repetitive code. Such repetitive code is commonly known as hotspot code. We observed that compute-intensive hotspots often possess exploitable loop-level parallelism. A JIT (Just-in-Time) compiler profiles a running program to identify its hotspots. Hotspots are then translated into native code for efficient execution. Using a similar approach, we propose a methodology to identify hotspots and exploit their parallelization potential on multicore systems. The proposed methodology selects and parallelizes each DOALL loop that is either contained in a hotspot method or calls a hotspot method. The methodology could be integrated into the front-end of a JIT compiler to parallelize sequential code just before native translation. However, compilation to native code is out of the scope of this work. As a case study, we analyze eighteen JGF (Java Grande Forum) benchmarks to determine the parallelization potential of hotspots. Eight benchmarks demonstrate a speedup of up to 7.6x on an 8-core system.

  16. Just-in-time compilation-inspired methodology for parallelization of compute intensive java code

    International Nuclear Information System (INIS)

    Mustafa, G.; Ghani, M.U.

    2017-01-01

    Compute-intensive programs generally consume a significant fraction of their execution time in a small amount of repetitive code. Such repetitive code is commonly known as hotspot code. We observed that compute-intensive hotspots often possess exploitable loop-level parallelism. A JIT (Just-in-Time) compiler profiles a running program to identify its hotspots. Hotspots are then translated into native code for efficient execution. Using a similar approach, we propose a methodology to identify hotspots and exploit their parallelization potential on multicore systems. The proposed methodology selects and parallelizes each DOALL loop that is either contained in a hotspot method or calls a hotspot method. The methodology could be integrated into the front-end of a JIT compiler to parallelize sequential code just before native translation. However, compilation to native code is out of the scope of this work. As a case study, we analyze eighteen JGF (Java Grande Forum) benchmarks to determine the parallelization potential of hotspots. Eight benchmarks demonstrate a speedup of up to 7.6x on an 8-core system. (author)
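
    For illustration, the sketch below shows the kind of transformation the two records above describe, written as a C++/OpenMP analogue rather than in Java (the array sizes and the loop body are assumptions): a DOALL loop has no loop-carried dependences, so once the enclosing hotspot is identified each iteration can be dispatched to a different core.

```cpp
// Hedged analogue of DOALL-loop parallelization of a hotspot.
#include <omp.h>
#include <vector>
#include <cstdio>

int main() {
    const int n = 1 << 20;
    std::vector<double> a(n, 1.5), b(n, 2.5), c(n);

    // Hotspot DOALL loop: c[i] depends only on a[i] and b[i], never on c[j].
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        c[i] = a[i] * b[i] + 1.0;

    std::printf("c[0] = %g, threads available = %d\n", c[0], omp_get_max_threads());
    return 0;
}
```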

  17. Design, fabrication and characterization of a micro-fluxgate intended for parallel robot application

    Science.gov (United States)

    Kirchhoff, M. R.; Bogdanski, G.; Büttgenbach, S.

    2009-05-01

    This paper presents a micro-magnetometer based on the fluxgate principle. Fluxgates detect the magnitude and direction of DC and low-frequency AC magnetic fields. The detectable flux density typically ranges from several 10 nT to about 1 mT. The introduced fluxgate sensor is fabricated using MEMS technologies, basically UV depth lithography and electroplating for manufacturing high-aspect-ratio structures. It consists of helical copper coils around a soft magnetic nickel-iron (NiFe) core. The core is designed in a so-called racetrack geometry, whereby the directional sensitivity of the sensor is considerably higher compared to common ring-core fluxgates. The electrical operation is based on analyzing the 2nd harmonic of the AC output signal. Configuration, manufacturing and selected characteristics of the fluxgate magnetometer are discussed in this work. The fluxgate forms the basis of an innovative angular sensor system for a parallel robot with HEXA structure. Integrated into the passive joints of the parallel robot, the fluxgates are combined with permanent magnets rotating on the joint shafts. The magnet transmits the angular information via its magnetic orientation. In this way, the angles between the kinematic elements are measured, which allows self-calibration of the robot and the fast analytical solution of direct kinematics for advanced workspace monitoring.

  18. A prediction and explanation of 'climatic swing'

    Science.gov (United States)

    Barkin, Yury

    2010-05-01

    Introduction. In previous works by the author [1, 2], a mechanism was proposed, and a scenario described, for the formation of glaciations and warmings of the Earth and for their inverse and asymmetric manifestations in opposite hemispheres. These planetary thermal processes are connected with gravitationally forced oscillations of the core-mantle system of the Earth, which control and direct the supply of heat to the upper layers of the mantle and to the surface of the Earth. It is shown that the action of this mechanism should be observed on various time scales. In particular, significant changes of climate should occur with thousand-year periods, and with periods of tens and hundreds of thousands of years. Here, the excitation of the core-mantle system is caused by planetary secular orbital perturbations and by perturbations of the Earth's rotation, which, as is known, are characterized by significant amplitudes. On shorter time scales, climate variations with interannual and decadal periods should also be observed, as dynamic consequences of the swing of the core-mantle system of the Earth with the same periods [3]. The fundamental phenomenon of the secular polar drift of the core relative to the viscous-elastic and changeable mantle [4] has in recent years obtained convincing confirmation from various geosciences. A reliable signature of the influence of core oscillations on variations of natural processes is their inversion property, whereby, for example, the activity of a process increases in the northern hemisphere and decreases in the southern hemisphere. Such contrasting secular changes in the northern and southern (N/S) hemispheres have been predicted on the basis of the geodynamic model [1] and revealed in observations: in gravimetric measurements of gravity [5]; in determinations of the secular trend of sea level, both globally and in the northern and southern hemispheres [6, 7]; in the redistribution of air masses [6, 8]; and in geodetic measurements of changes of the average radii of the northern and

  19. Magnetic Fields in the Massive Dense Cores of the DR21 Filament: Weakly Magnetized Cores in a Strongly Magnetized Filament

    Energy Technology Data Exchange (ETDEWEB)

    Ching, Tao-Chung; Lai, Shih-Ping [Institute of Astronomy and Department of Physics, National Tsing Hua University, Hsinchu 30013, Taiwan (China); Zhang, Qizhou; Girart, Josep M. [Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge MA 02138 (United States); Qiu, Keping [School of Astronomy and Space Science, Nanjing University, 163 Xianlin Avenue, Nanjing 210023 (China); Liu, Hauyu B., E-mail: chingtaochung@gmail.com [European Southern Observatory (ESO), Karl-Schwarzschild-Str. 2, D-85748 Garching (Germany)

    2017-04-01

    We present Submillimeter Array 880 μm dust polarization observations of six massive dense cores in the DR21 filament. The dust polarization shows complex magnetic field structures in the massive dense cores with sizes of ~0.1 pc, in contrast to the ordered magnetic fields of the parsec-scale filament. The major axes of the massive dense cores appear to be aligned either parallel or perpendicular to the magnetic fields of the filament, indicating that the parsec-scale magnetic fields play an important role in the formation of the massive dense cores. However, the correlation between the major axes of the cores and the magnetic fields of the cores is less significant, suggesting that during the core formation, the magnetic fields below 0.1 pc scales become less important than the magnetic fields above 0.1 pc scales in supporting a core against gravity. Our analysis of the angular dispersion functions of the observed polarization segments yields a plane-of-sky magnetic field strength of 0.4–1.7 mG for the massive dense cores. We estimate the kinematic, magnetic, and gravitational virial parameters of the filament and the cores. The virial parameters show that the gravitational energy in the filament dominates magnetic and kinematic energies, while the kinematic energy dominates in the cores. Our work suggests that although magnetic fields may play an important role in a collapsing filament, the kinematics arising from gravitational collapse must become more important than magnetic fields during the evolution from filaments to massive dense cores.

  20. Acting efficiently on climate change

    International Nuclear Information System (INIS)

    Appert, Olivier; Moncomble, Jean-Eudes

    2015-01-01

    Climate change is a major issue. A survey of the utility companies that account for 80% of the world's electric power was released during the 20th climate conference in Lima as part of the World Energy Council's Global Electricity Initiative. It concluded that all these utilities see climate change as being real and declare that policies for adapting to it are as important as policies for limiting it. Nonetheless, 97% of these utilities think that consumers will refuse to pay more for decarbonized electricity. This is the core problem in the fight against climate change: all agree that the issue is urgent, some agree about what should be done, but none wants to pay

  1. Enhancing parallelism of tile bidiagonal transformation on multicore architectures using tree reduction

    KAUST Repository

    Ltaief, Hatem

    2012-01-01

    The objective of this paper is to enhance the parallelism of the tile bidiagonal transformation using tree reduction on multicore architectures. First introduced by Ltaief et al. [LAPACK Working Note #247, 2011], the bidiagonal transformation using tile algorithms with a two-stage approach has shown very promising results on square matrices. However, for tall and skinny matrices, the inherent problem of processing the panel in a domino-like fashion generates unnecessary sequential tasks. By using tree reduction, the panel is horizontally split, which creates another dimension of parallelism and engenders many concurrent tasks to be dynamically scheduled on the available cores. The results reported in this paper are very encouraging. The new tile bidiagonal transformation, targeting tall and skinny matrices, outperforms the state-of-the-art numerical linear algebra libraries LAPACK V3.2 and Intel MKL ver. 10.3 by up to 29-fold speedup and the standard two-stage PLASMA BRD by up to 20-fold speedup, on an eight-socket hexa-core AMD Opteron multicore shared-memory system. © 2012 Springer-Verlag.
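
    As a rough, hedged sketch of the tree-reduction idea (not the PLASMA kernels; the pairwise "combine" below is a placeholder for the QR-based panel factorization kernels): splitting the panel into blocks and reducing them pairwise replaces a sequential domino chain with a tree whose independent pairs can run concurrently, expressed here with OpenMP tasks.

```cpp
// Hedged sketch of a pairwise (tree) reduction over panel blocks using tasks.
#include <omp.h>
#include <vector>
#include <cstdio>

// Reduce blocks [lo, hi) pairwise; returns the combined value.
double tree_reduce(std::vector<double>& block, int lo, int hi) {
    if (hi - lo == 1) return block[lo];
    int mid = lo + (hi - lo) / 2;
    double left, right;
    #pragma omp task shared(left)
    left = tree_reduce(block, lo, mid);
    #pragma omp task shared(right)
    right = tree_reduce(block, mid, hi);
    #pragma omp taskwait
    return left + right;                 // pairwise combine (placeholder kernel)
}

int main() {
    std::vector<double> block(64, 1.0);  // 64 panel blocks, each "factored" to 1.0
    double result = 0.0;
    #pragma omp parallel
    #pragma omp single
    result = tree_reduce(block, 0, static_cast<int>(block.size()));
    std::printf("combined %zu blocks -> %g\n", block.size(), result);
    return 0;
}
```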

  2. Study of core support barrel vibration monitoring using ex-core neutron noise analysis and fuzzy logic algorithm

    International Nuclear Information System (INIS)

    Christian, Robby; Song, Seon Ho; Kang, Hyun Gook

    2015-01-01

    The application of neutron noise analysis (NNA) to the ex-core neutron detector signal for monitoring the vibration characteristics of a reactor core support barrel (CSB) was investigated. Ex-core flux data were generated by using a nonanalog Monte Carlo neutron transport method in a simulated CSB model where the implicit capture and Russian roulette techniques were utilized. First and third order beam and shell modes of CSB vibration were modeled based on parallel processing simulation. A NNA module was developed to analyze the ex-core flux data based on its time variation, normalized power spectral density, normalized cross-power spectral density, coherence, and phase differences. The data were then analyzed with a fuzzy logic module to determine the vibration characteristics. The ex-core neutron signal fluctuation was directly proportional to the CSB's vibration, observed at 8 Hz and 15 Hz in the beam mode vibration, and at 8 Hz in the shell mode vibration. The coherence result between flux pairs was unity at the vibration peak frequencies. A distinct pattern of phase differences was observed for each of the vibration models. The developed fuzzy logic module demonstrated successful recognition of the vibration frequencies, modes, orders, directions, and phase differences within 0.4 ms for the beam and shell mode vibrations.

  3. Work-Efficient Parallel Skyline Computation for the GPU

    DEFF Research Database (Denmark)

    Bøgh, Kenneth Sejdenfaden; Chester, Sean; Assent, Ira

    2015-01-01

    offers the potential for parallelizing skyline computation across thousands of cores. However, attempts to port skyline algorithms to the GPU have prioritized throughput and failed to outperform sequential algorithms. In this paper, we introduce a new skyline algorithm, designed for the GPU, that uses ... a global, static partitioning scheme. With the partitioning, we can permit controlled branching to exploit transitive relationships and avoid most point-to-point comparisons. The result is a non-traditional GPU algorithm, SkyAlign, that prioritizes work-efficiency and respectable throughput, rather than...

  4. Efficient Parallel Implementation of Active Appearance Model Fitting Algorithm on GPU

    Directory of Open Access Journals (Sweden)

    Jinwei Wang

    2014-01-01

    The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods and has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine-grained parallelism, in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on Nvidia’s GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures.

  5. A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale

    Energy Technology Data Exchange (ETDEWEB)

    Moreland, Kenneth [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Geveci, Berk [Kitware, Inc., Clifton Park, NY (United States)

    2014-11-01

    The evolution of the computing world from teraflop to petaflop has been relatively effortless, with several of the existing programming models scaling effectively to the petascale. The migration to exascale, however, poses considerable challenges. All industry trends infer that the exascale machine will be built using processors containing hundreds to thousands of cores per chip. It can be inferred that efficient concurrency on exascale machines requires a massive amount of concurrent threads, each performing many operations on a localized piece of data. Currently, visualization libraries and applications are based on what is known as the visualization pipeline. In the pipeline model, algorithms are encapsulated as filters with inputs and outputs. These filters are connected by setting the output of one component to the input of another. Parallelism in the visualization pipeline is achieved by replicating the pipeline for each processing thread. This works well for today’s distributed memory parallel computers but cannot be sustained when operating on processors with thousands of cores. Our project investigates a new visualization framework designed to exhibit the pervasive parallelism necessary for extreme scale machines. Our framework achieves this by defining algorithms in terms of worklets, which are localized stateless operations. Worklets are atomic operations that execute when invoked, unlike filters, which execute when a pipeline request occurs. The worklet design allows execution on a massive amount of lightweight threads with minimal overhead. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for efficient computation on an exascale machine.

  6. Climate Change: The Evidence and Our Options

    Science.gov (United States)

    Thompson, Lonnie G.

    2010-01-01

    Glaciers serve as early indicators of climate change. Over the last 35 years, our research team has recovered ice-core records of climatic and environmental variations from the polar regions and from low-latitude high-elevation ice fields from 16 countries. The ongoing widespread melting of high-elevation glaciers and ice caps, particularly in low…

  7. Evaluating the scalability of HEP software and multi-core hardware

    CERN Document Server

    Jarp, S; Leduc, J; Nowak, A

    2011-01-01

    As researchers have reached the practical limits of processor performance improvements by frequency scaling, it is clear that the future of computing lies in the effective utilization of parallel and multi-core architectures. Since this significant change in computing is well underway, it is vital for HEP programmers to understand the scalability of their software on modern hardware and the opportunities for potential improvements. This work aims to quantify the benefit of new mainstream architectures to the HEP community through practical benchmarking on recent hardware solutions, including the usage of parallelized HEP applications.

  8. In-core fuel management practice in HANARO

    International Nuclear Information System (INIS)

    Kim Hark Rho; Lee Choong Sung; Lee Jo Bok

    1997-01-01

    KAERI (Korea Atomic Energy Research Institute) completed the system performance tests for HANARO (Hi-flux Advanced Neutron Application Research Reactor) in December 1994. Its initial criticality was achieved on February 8, 1995. A variety of reactor physics experiments were performed in parallel with configuring the first cycle core, and HANARO is now in its third cycle of operation. The in-core fuel management in HANARO is performed on the following strategy: 1) the cycle length of the equilibrium core is at least 4 weeks in full power days (FPDs), 2) the maximum linear heat generation rate should be within the design limit, 3) the reactor should have a shutdown margin of at least 1% Δk/k, 4) the available thermal flux should satisfy the users' requirements. This paper presents the fuel management practice in HANARO. Section II briefly describes the design features of HANARO, the method of analysis follows in Section III, Section IV describes the in-core fuel management practice, and conclusions are given in the final section. (author)

  9. From sequential to parallel programming with patterns

    CERN Document Server

    CERN. Geneva

    2018-01-01

    To increase both performance and efficiency, our programming models need to adapt to better exploit modern processors. The classic idioms and patterns for programming, such as loops, branches or recursion, are the pillars of almost every code and are well known among all programmers. These patterns all have in common that they are sequential in nature. Embracing parallel programming patterns, which allow us to program for multi- and many-core hardware in a natural way, greatly simplifies the task of designing a program that scales and performs on modern hardware, independently of the used programming language, and in a generic way.

  10. Computational models of stellar collapse and core-collapse supernovae

    International Nuclear Information System (INIS)

    Ott, Christian D; O'Connor, Evan; Schnetter, Erik; Loeffler, Frank; Burrows, Adam; Livne, Eli

    2009-01-01

    Core-collapse supernovae are among Nature's most energetic events. They mark the end of massive star evolution and pollute the interstellar medium with the life-enabling ashes of thermonuclear burning. Despite their importance for the evolution of galaxies and life in the universe, the details of the core-collapse supernova explosion mechanism remain in the dark and pose a daunting computational challenge. We outline the multi-dimensional, multi-scale, and multi-physics nature of the core-collapse supernova problem and discuss computational strategies and requirements for its solution. Specifically, we highlight the axisymmetric (2D) radiation-MHD code VULCAN/2D and present results obtained from the first full-2D angle-dependent neutrino radiation-hydrodynamics simulations of the post-core-bounce supernova evolution. We then go on to discuss the new code Zelmani which is based on the open-source HPC Cactus framework and provides a scalable AMR approach for 3D fully general-relativistic modeling of stellar collapse, core-collapse supernovae and black hole formation on current and future massively-parallel HPC systems. We show Zelmani's scaling properties to more than 16,000 compute cores and discuss first 3D general-relativistic core-collapse results.

  11. cudaBayesreg: Parallel Implementation of a Bayesian Multilevel Model for fMRI Data Analysis

    Directory of Open Access Journals (Sweden)

    Adelino R. Ferreira da Silva

    2011-10-01

    Graphics processing units (GPUs) are rapidly gaining maturity as powerful general parallel computing devices. A key feature in the development of modern GPUs has been the advancement of the programming model and programming tools. Compute Unified Device Architecture (CUDA) is a software platform for massively parallel high-performance computing on Nvidia many-core GPUs. In functional magnetic resonance imaging (fMRI), the volume of the data to be processed and the type of statistical analysis to perform call for high-performance computing strategies. In this work, we present the main features of the R-CUDA package cudaBayesreg, which implements in CUDA the core of a Bayesian multilevel model for the analysis of brain fMRI data. The statistical model implements a Gibbs sampler for multilevel/hierarchical linear models with a normal prior. The main contribution to the increased performance comes from the use of separate threads for fitting the linear regression model at each voxel in parallel. The R-CUDA implementation of the Bayesian model proposed here has been able to reduce significantly the run-time processing of Markov chain Monte Carlo (MCMC) simulations used in Bayesian fMRI data analyses. Presently, cudaBayesreg is only configured for Linux systems with Nvidia CUDA support.

  12. An innovative approach to undergraduate climate change education: Sustainability in the workplace

    Science.gov (United States)

    Robinson, Z. P.

    2009-04-01

    Climate change and climate science are a core component of environment-related degree programmes, but there are many programmes, for example business studies, that have clear linkages to climate change and sustainability issues which often have no or limited coverage of the subject. Although an in-depth coverage of climate science is not directly applicable to all programmes of study, the subject of climate change is of great relevance to all of society. Graduates from the higher education system are often viewed as society's ‘future leaders', hence it can be argued that it is important that all graduates are conversant in the issues of climate change and strategies for moving towards a sustainable future. Rather than an in depth understanding of climate science it may be more important that a wider range of students are educated in strategies for positive action. One aspect of climate change education that may be missing, including in programmes where climate change is a core topic, is practical strategies, skills and knowledge for reducing our impact on the climate system. This presentation outlines an innovative approach to undergraduate climate change education which focuses on the strategies for moving towards sustainability, but which is supported by climate science understanding taught within this context. Students gain knowledge and understanding of the motivations and strategies for businesses to improve their environmental performance, and develop skills in identifying areas of environmental improvement and recommending actions for change. These skills will allow students to drive positive change in their future careers. Such courses are relevant to students of all disciplines and can give the opportunity to students for whom climate change education is not a core part of their programme, to gain greater understanding of the issues and an awareness of practical changes that can be made at all levels to move towards a more sustainable society.

  13. Digi-Clima Grid: image processing and distributed computing for recovering historical climate data

    Directory of Open Access Journals (Sweden)

    Sergio Nesmachnow

    2015-12-01

    This article describes the Digi-Clima Grid project, whose main goals are to design and implement semi-automatic techniques for digitalizing and recovering historical climate records by applying parallel computing techniques over distributed computing infrastructures. The specific tool developed for image processing is described, and the implementation over grid and cloud infrastructures is reported. An experimental analysis over institutional and volunteer-based grid/cloud distributed systems demonstrates that the proposed approach is an efficient tool for recovering historical climate data. The parallel implementations allow the processing load to be distributed, achieving accurate speedup values.

  14. ANNarchy: a code generation approach to neural simulations on parallel hardware

    Science.gov (United States)

    Vitay, Julien; Dinkelbach, Helge Ü.; Hamker, Fred H.

    2015-01-01

    Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which allows one to easily define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into efficient C++ code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957

  15. Concept for Multi-cycle Nuclear Fuel Optimization Based On Parallel Simulated Annealing With Mixing of States

    International Nuclear Information System (INIS)

    Kropaczek, David J.

    2008-01-01

    A new concept for performing nuclear fuel optimization over a multi-cycle planning horizon is presented. The method provides for an implicit coupling between traditionally separate in-core and out-of-core fuel management decisions including determination of: fresh fuel batch size, enrichment and bundle design; exposed fuel reuse; and core loading pattern. The algorithm uses simulated annealing optimization, modified with a technique called mixing of states that allows for deployment in a scalable parallel environment. Analysis of algorithm performance for a transition cycle design (i.e. a PWR 6 month cycle length extension) demonstrates the feasibility of the approach as a production tool for fuel procurement and multi-cycle core design. (authors)
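
    A hedged sketch of the optimization scheme named above (the loading-pattern evaluation is replaced by a toy one-dimensional objective, and the segment lengths, cooling schedule, and mixing interval are assumptions, not values from the paper): each thread anneals an independent chain for a segment, and the chains are then "mixed" by restarting every chain from the best state found so far.

```cpp
// Hedged sketch of parallel simulated annealing with periodic mixing of states.
#include <omp.h>
#include <random>
#include <cmath>
#include <cstdio>

double objective(double x) { return (x - 3.0) * (x - 3.0); }  // toy core-design "cost"

int main() {
    const int segments = 20, stepsPerSegment = 500;
    double bestX = 10.0, bestCost = objective(bestX);   // shared mixed state
    double T = 5.0;                                     // annealing temperature

    for (int s = 0; s < segments; ++s, T *= 0.8) {
        double segBestX = bestX, segBestCost = bestCost;
        #pragma omp parallel
        {
            std::mt19937 rng(1234 + 97 * omp_get_thread_num() + s);
            std::normal_distribution<double> step(0.0, 0.5);
            std::uniform_real_distribution<double> u(0.0, 1.0);
            double x = bestX, cost = bestCost;          // every chain starts from the mixed state
            for (int k = 0; k < stepsPerSegment; ++k) {
                double xNew = x + step(rng);
                double cNew = objective(xNew);
                // Metropolis acceptance: always take improvements, sometimes uphill moves.
                if (cNew < cost || u(rng) < std::exp((cost - cNew) / T)) { x = xNew; cost = cNew; }
            }
            #pragma omp critical                         // mixing: keep the best chain state
            if (cost < segBestCost) { segBestCost = cost; segBestX = x; }
        }
        bestX = segBestX; bestCost = segBestCost;
    }
    std::printf("best x = %.4f, cost = %.6f\n", bestX, bestCost);
    return 0;
}
```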

  16. Large-scale parallel genome assembler over cloud computing environment.

    Science.gov (United States)

    Das, Arghya Kusum; Koppa, Praveen Kumar; Goswami, Sayan; Platania, Richard; Park, Seung-Jong

    2017-06-01

    The size of high throughput DNA sequencing data has already reached the terabyte scale. To manage this huge volume of data, many downstream sequencing applications started using locality-based computing over different cloud infrastructures to take advantage of elastic (pay as you go) resources at a lower cost. However, the locality-based programming model (e.g. MapReduce) is relatively new. Consequently, developing scalable data-intensive bioinformatics applications using this model and understanding the hardware environment that these applications require for good performance, both require further research. In this paper, we present a de Bruijn graph oriented Parallel Giraph-based Genome Assembler (GiGA), as well as the hardware platform required for its optimal performance. GiGA uses the power of Hadoop (MapReduce) and Giraph (large-scale graph analysis) to achieve high scalability over hundreds of compute nodes by collocating the computation and data. GiGA achieves significantly higher scalability with competitive assembly quality compared to contemporary parallel assemblers (e.g. ABySS and Contrail) over traditional HPC cluster. Moreover, we show that the performance of GiGA is significantly improved by using an SSD-based private cloud infrastructure over traditional HPC cluster. We observe that the performance of GiGA on 256 cores of this SSD-based cloud infrastructure closely matches that of 512 cores of traditional HPC cluster.

  17. Climatic changes inferred from analyses of lake-sediment cores, Walker Lake, Nevada

    International Nuclear Information System (INIS)

    Yang, In Che.

    1989-01-01

    Organic and inorganic fractions of sediment collected from the bottom of Walker Lake, Nevada, have been dated by carbon-14 techniques. Sedimentation rates and the organic-carbon content of the sediment were correlated with climatic change. The cold climate between 25,000 and 21,000 years ago caused little runoff, snow accumulation on the mountains, and rapid substantial glacial advances; this period of cold climate resulted in a slow sedimentation rate (0.20 millimeter per year) and in a small organic-carbon content in the sediment. Also, organic-carbon accumulation rates in the lake during this period were slow. The most recent period of slow sedimentation rate and small organic-carbon content occurred between 10,000 and 5500 years ago, indicative of low lake stage and dry climatic conditions. This period of dry climate also was evidenced by dry conditions for Lake Lahontan in Nevada and Searles Lake in California, as cited in the literature. Walker Lake filled rapidly with water between 5500 and 4500 years ago. The data published in this report was not produced under an approved Site Investigation Plan (SIP) or Study Plan (SP) and will not be used in the licensing process. 10 refs., 3 figs., 2 tabs

  18. Political economy of climate change, ecological destruction and uneven development

    International Nuclear Information System (INIS)

    O'Hara, Phillip Anthony

    2009-01-01

    The purpose of this paper is to analyze climate change and ecological destruction through the prism of the core general principles of political economy. The paper starts with the principle of historical specificity, and the various waves of climate change through successive cooler and warmer periods on planet Earth, including the most recent climate change escalation through the open circuit associated with the treadmill of production. Then we scrutinize the principle of contradiction associated with the disembedded economy, social costs, entropy and destructive creation. The principle of uneven development is then explored through core-periphery dynamics, ecologically unequal exchange, metabolic rift and asymmetric global (in)justice. The principles of circular and cumulative causation (CCC) and uncertainty are then related to climate change dynamics through non-linear transformations, complex interaction of dominant variables, and threshold effects. Climate change and ecological destruction are impacting on most areas, especially the periphery, earlier and more intensely than previously thought likely. A political economy approach to climate change is able to enrich the analysis of ecological economics and put many critical themes in a broad context. (author)

  19. PFLOTRAN User Manual: A Massively Parallel Reactive Flow and Transport Model for Describing Surface and Subsurface Processes

    Energy Technology Data Exchange (ETDEWEB)

    Lichtner, Peter C. [OFM Research, Redmond, WA (United States); Hammond, Glenn E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Lu, Chuan [Idaho National Lab. (INL), Idaho Falls, ID (United States); Karra, Satish [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Bisht, Gautam [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Andre, Benjamin [National Center for Atmospheric Research, Boulder, CO (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Mills, Richard [Intel Corporation, Portland, OR (United States); Univ. of Tennessee, Knoxville, TN (United States); Kumar, Jitendra [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

    2015-01-20

    PFLOTRAN solves a system of generally nonlinear partial differential equations describing multi-phase, multicomponent and multiscale reactive flow and transport in porous materials. The code is designed to run on massively parallel computing architectures as well as workstations and laptops (e.g. Hammond et al., 2011). Parallelization is achieved through domain decomposition using the PETSc (Portable Extensible Toolkit for Scientific Computation) libraries for the parallelization framework (Balay et al., 1997). PFLOTRAN has been developed from the ground up for parallel scalability and has been run on up to 2^18 processor cores with problem sizes up to 2 billion degrees of freedom. Written in object oriented Fortran 90, the code requires the latest compilers compatible with Fortran 2003. At the time of this writing this requires gcc 4.7.x, Intel 12.1.x and PGC compilers. As a requirement of running problems with a large number of degrees of freedom, PFLOTRAN allows reading input data that is too large to fit into memory allotted to a single processor core. The current limitation to the problem size PFLOTRAN can handle is the limitation of the HDF5 file format used for parallel IO to 32 bit integers. Noting that 2^32 = 4,294,967,296, this gives an estimate of the maximum problem size that can be currently run with PFLOTRAN. Hopefully this limitation will be remedied in the near future.

  20. Exchange of transverse plasmons and electrical conductivity of neutron star cores

    International Nuclear Information System (INIS)

    Shternin, P. S.

    2008-01-01

    We study the electrical conductivity in magnetized neutron star cores produced by collisions between charged particles. We take into account the ordinary exchange of longitudinal plasmons and the exchange of transverse plasmons in collisions between particles. The exchange of transverse plasmons is important for collisions between relativistic particles, but it has been disregarded previously when calculating the electrical conductivity. We show that taking this exchange into account changes the electrical conductivity, including its temperature dependence (thus, for example, the temperature dependence of the electrical resistivity along the magnetic field in the low-temperature limit takes the form R_∥ ∝ T^(5/3) instead of the standard dependence R_∥ ∝ T^2 for degenerate Fermi systems). We briefly describe the effect of possible neutron and proton superfluidity in neutron star cores on the electrical conductivity and discuss various scenarios for the evolution of neutron star magnetic fields.

  1. Estimating Past Temperature Change in Antarctica Based on Ice Core Stable Water Isotope Diffusion

    Science.gov (United States)

    Kahle, E. C.; Markle, B. R.; Holme, C.; Jones, T. R.; Steig, E. J.

    2017-12-01

    The magnitude of the last glacial-interglacial transition is a key target for constraining climate sensitivity on long timescales. Ice core proxy records and general circulation models (GCMs) both provide insight on the magnitude of climate change through the last glacial-interglacial transition, but appear to provide different answers. In particular, the magnitude of the glacial-interglacial temperature change reconstructed from East Antarctic ice-core water-isotope records is greater (~9 °C) than that from most GCM simulations (~6 °C). A possible source of this difference is error in the linear-scaling of water isotopes to temperature. We employ a novel, nonlinear temperature-reconstruction technique using the physics of water-isotope diffusion to infer past temperature. Based on new, ice-core data from the South Pole, this diffusion technique suggests East Antarctic temperature change was smaller than previously thought. We are able to confirm this result using a simple, water-isotope fractionation model to nonlinearly reconstruct temperature change at ice core locations across Antarctica based on combined oxygen and hydrogen isotope ratios. Both methods produce a temperature change of ~6 °C for South Pole, agreeing with GCM results for East Antarctica. Furthermore, both produce much larger changes in West Antarctica, also in agreement with GCM results and independent borehole thermometry. These results support the fidelity of GCMs in simulating last glacial maximum climate, and contradict the idea, based on previous work, that the climate sensitivity of current GCMs is too low.

  2. Parallel Distributed Processing theory in the age of deep networks

    OpenAIRE

    Bowers, Jeffrey

    2017-01-01

    Parallel Distributed Processing (PDP) models in psychology are the precursors of deep networks used in computer science. However, only PDP models are associated with two core psychological claims, namely, that all knowledge is coded in a distributed format, and cognition is mediated by non-symbolic computations. These claims have long been debated within cognitive science, and recent work with deep networks speaks to this debate. Specifically, single-unit recordings show that deep networks le...

  3. Concurrent, parallel, multiphysics coupling in the FACETS project

    Energy Technology Data Exchange (ETDEWEB)

    Cary, J R; Carlsson, J A; Hakim, A H; Kruger, S E; Miah, M; Pletzer, A; Shasharina, S [Tech-X Corporation, 5621 Arapahoe Avenue, Suite A, Boulder, CO 80303 (United States); Candy, J; Groebner, R J [General Atomics (United States); Cobb, J; Fahey, M R [Oak Ridge National Laboratory (United States); Cohen, R H; Epperly, T [Lawrence Livermore National Laboratory (United States); Estep, D J [Colorado State University (United States); Krasheninnikov, S [University of California at San Diego (United States); Malony, A D [ParaTools, Inc (United States); McCune, D C [Princeton Plasma Physics Laboratory (United States); McInnes, L; Balay, S [Argonne National Laboratory (United States); Pankin, A, E-mail: cary@txcorp.co [Lehigh University (United States)

    2009-07-01

    FACETS (Framework Application for Core-Edge Transport Simulations) is now in its third year. The FACETS team has developed a framework for concurrent coupling of parallel computational physics for use on Leadership Class Facilities (LCFs). In the course of the last year, FACETS has tackled many of the difficult problems of moving to parallel, integrated modeling by developing algorithms for coupled systems, extracting legacy applications as components, modifying them to run on LCFs, and improving the performance of all components. The development of FACETS abides by rigorous engineering standards, including cross platform build and test systems, with the latter covering regression, performance, and visualization. In addition, FACETS has demonstrated the ability to incorporate full turbulence computations for the highest fidelity transport computations. Early indications are that the framework, using such computations, scales to multiple tens of thousands of processors. These accomplishments were the result of an interdisciplinary collaboration among computational physicists, computer scientists and applied mathematicians on the team.

  4. Intelligent spatial ecosystem modeling using parallel processors

    International Nuclear Information System (INIS)

    Maxwell, T.; Costanza, R.

    1993-01-01

    Spatial modeling of ecosystems is essential if one's modeling goals include developing a relatively realistic description of past behavior and predictions of the impacts of alternative management policies on future ecosystem behavior. Development of these models has been limited in the past by the large amount of input data required and the difficulty of even large mainframe serial computers in dealing with large spatial arrays. These two limitations have begun to erode with the increasing availability of remote sensing data and GIS systems to manipulate it, and the development of parallel computer systems which allow computation of large, complex, spatial arrays. Although many forms of dynamic spatial modeling are highly amenable to parallel processing, the primary focus in this project is on process-based landscape models. These models simulate spatial structure by first compartmentalizing the landscape into some geometric design and then describing flows within compartments and spatial processes between compartments according to location-specific algorithms. The authors are currently building and running parallel spatial models at the regional scale for the Patuxent River region in Maryland, the Everglades in Florida, and Barataria Basin in Louisiana. The authors are also planning a project to construct a series of spatially explicit linked ecological and economic simulation models aimed at assessing the long-term potential impacts of global climate change

  5. Highly scalable parallel processing of extracellular recordings of Multielectrode Arrays.

    Science.gov (United States)

    Gehring, Tiago V; Vasilaki, Eleni; Giugliano, Michele

    2015-01-01

    Technological advances in Multielectrode Arrays (MEAs) used for multisite, parallel electrophysiological recordings lead to an ever-increasing amount of raw data being generated. Arrays with hundreds up to a few thousands of electrodes are slowly seeing widespread use, and the expectation is that more sophisticated arrays will become available in the near future. In order to process the large data volumes resulting from MEA recordings there is a pressing need for new software tools able to process many data channels in parallel. Here we present a new tool for processing MEA data recordings that makes use of new programming paradigms and recent technology developments to unleash the power of modern highly parallel hardware, such as multi-core CPUs with vector instruction sets or GPGPUs. Our tool builds on and complements existing MEA data analysis packages. It shows high scalability and can be used to speed up some performance-critical pre-processing steps such as data filtering and spike detection, helping to make the analysis of larger data sets tractable.
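
    The channel-parallel pre-processing described above can be sketched in a few lines: each electrode channel is thresholded independently, so channels can be farmed out to CPU cores. This is a minimal illustration with synthetic data and an assumed 5-sigma threshold, not the authors' tool.

        import numpy as np
        from multiprocessing import Pool

        def detect_spikes(channel):
            """Very simple amplitude-threshold spike detector for one channel."""
            noise = np.median(np.abs(channel)) / 0.6745        # robust noise estimate (MAD)
            threshold = 5.0 * noise
            crossings = np.where(channel < -threshold)[0]      # negative-going threshold crossings
            if crossings.size == 0:
                return crossings
            return crossings[np.insert(np.diff(crossings) > 1, 0, True)]  # first sample of each event

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            data = rng.normal(size=(384, 30_000))              # synthetic: 384 channels of noise
            with Pool() as pool:                               # one worker per CPU core by default
                spikes = pool.map(detect_spikes, list(data))
            print(sum(len(s) for s in spikes), "threshold crossings detected")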

  6. pyPaSWAS: Python-based multi-core CPU and GPU sequence alignment.

    Science.gov (United States)

    Warris, Sven; Timal, N Roshan N; Kempenaar, Marcel; Poortinga, Arne M; van de Geest, Henri; Varbanescu, Ana L; Nap, Jan-Peter

    2018-01-01

    Our previously published CUDA-only application PaSWAS for Smith-Waterman (SW) sequence alignment of any type of sequence on NVIDIA-based GPUs is platform-specific and was therefore adopted less than it could have been. The OpenCL language is supported more widely and allows use on a variety of hardware platforms. Moreover, there is a need to promote the adoption of parallel computing in bioinformatics by making its use and extension simpler through better application of high-level languages commonly used in bioinformatics, such as Python. The novel application pyPaSWAS presents the parallel SW sequence alignment code fully packed in Python. It is a generic SW implementation running on several hardware platforms with multi-core systems and/or GPUs that provides accurate sequence alignments that can also be inspected for alignment details. Additionally, pyPaSWAS supports the affine gap penalty. Python libraries are used for automated system configuration, I/O and logging. This way, the Python environment will stimulate further extension and use of pyPaSWAS. pyPaSWAS presents an easy Python-based environment for accurate and retrievable parallel SW sequence alignments on GPUs and multi-core systems. The strategy of integrating Python with high-performance parallel compute languages to create a developer- and user-friendly environment should be considered for other computationally intensive bioinformatics algorithms.
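
    For readers unfamiliar with the underlying algorithm, a plain CPU-only Smith-Waterman scoring sketch with a linear gap penalty is shown below; it is not pyPaSWAS and omits the affine gaps, GPU offloading and traceback that the package provides.

        import numpy as np

        def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
            """Best local-alignment score between sequences a and b (linear gap penalty)."""
            H = np.zeros((len(a) + 1, len(b) + 1), dtype=np.int32)
            for i in range(1, len(a) + 1):
                for j in range(1, len(b) + 1):
                    diag = H[i - 1, j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
                    H[i, j] = max(0, diag, H[i - 1, j] + gap, H[i, j - 1] + gap)
            return int(H.max())

        print(smith_waterman_score("ACACACTA", "AGCACACA"))    # score for two toy sequences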

  7. A Parallel Sweeping Preconditioner for Heterogeneous 3D Helmholtz Equations

    KAUST Repository

    Poulson, Jack

    2013-05-02

    A parallelization of a sweeping preconditioner for three-dimensional Helmholtz equations without large cavities is introduced and benchmarked for several challenging velocity models. The setup and application costs of the sequential preconditioner are shown to be O(γ^2 N^(4/3)) and O(γ N log N), where γ(ω) denotes the modestly frequency-dependent number of grid points per perfectly matched layer. Several computational and memory improvements are introduced relative to using black-box sparse-direct solvers for the auxiliary problems, and competitive runtimes and iteration counts are reported for high-frequency problems distributed over thousands of cores. Two open-source packages are released along with this paper: Parallel Sweeping Preconditioner (PSP) and the underlying distributed multifrontal solver, Clique. © 2013 Society for Industrial and Applied Mathematics.

  8. A GPU-paralleled implementation of an enhanced face recognition algorithm

    Science.gov (United States)

    Chen, Hao; Liu, Xiyang; Shao, Shuai; Zan, Jiguo

    2013-03-01

    Face recognition algorithms based on compressed sensing and sparse representation have been debated intensely in recent years. This scheme increases the recognition rate as well as the noise robustness. However, the computational cost is high and has become a main restricting factor for real-world applications. In this paper, we introduce a GPU-accelerated hybrid variant of a face recognition algorithm named parallel face recognition algorithm (pFRA). We describe how to carry out the parallel optimization design to take full advantage of the many-core structure of a GPU. The pFRA is tested and compared with several other implementations under different data sample sizes. Finally, our pFRA, implemented with an NVIDIA GPU and the Compute Unified Device Architecture (CUDA) programming model, achieves a significant speedup over the traditional CPU implementations.

  9. Forces on zonal flows in tokamak core turbulence

    International Nuclear Information System (INIS)

    Hallatschek, K.; Itoh, K.

    2005-01-01

    The saturation of stationary zonal flows (ZF) in the core of a tokamak has been analyzed in numerical fluid turbulence computer studies. The model was chosen to properly represent the kinetic global plasma flows, i.e., undamped stationary toroidal or poloidal flows and Landau damped geodesic acoustic modes. Reasonable agreement with kinetic simulations in terms of magnitude of transport and occurrence of the Dimits shift was verified. Contrary to common perception, in the final saturated state of turbulence and ZFs, the customary perpendicular Reynolds stress continues to drive the ZFs. The force balance is established by the essentially quasilinear parallel Reynolds stress acting on the parallel return flows required by incompressibility. (author)

  10. Experimental investigation of boiling-water nuclear-reactor parallel-channel effects during a postulated loss-of-coolant accident

    International Nuclear Information System (INIS)

    Conlon, W.M.; Lahey, R.T. Jr.

    1982-12-01

    This report describes an experimental study of the influence of parallel channel effects (PCE) on the distribution of emergency core spray cooling water in a Boiling Water Nuclear Reactor (BWR) following a postulated design basis loss-of-coolant accident (LOCA). The experiments were conducted in a scaled test section in which the reactor coolant was simulated by Freon-114 at conditions similar to those postulated to occur in the reactor vessel shortly after a LOCA. A BWR/4 was simulated by a PCE test section which contained three parallel heated channels to simulate fuel assemblies, a core bypass channel, and a jet pump channel. The test section also included scaled regions to simulate the lower and upper plena, downcomer, and steam separation regions of a BWR. A series of nine transient experiments were conducted, in which the lower plenum vaporization rate and heater rod power were varied while the core spray flow rate was held constant to simulate that of a BWR/4. During these experiments the flow distribution and heat transfer phenomena were observed and measured

  11. Parallel solutions of the two-group neutron diffusion equations

    International Nuclear Information System (INIS)

    Zee, K.S.; Turinsky, P.J.

    1987-01-01

    Recent efforts to adapt various numerical solution algorithms to parallel computer architectures have addressed the possibility of substantially reducing the running time of few-group neutron diffusion calculations. The authors have developed an efficient iterative parallel algorithm and an associated computer code for the rapid solution of the finite difference method representation of the two-group neutron diffusion equations on the CRAY X/MP-48 supercomputer having multiple CPUs and vector pipelines. For realistic simulation of light water reactor cores, the code employs a macroscopic depletion model with trace capability for selected fission product transients and critical boron. In addition, moderator and fuel temperature feedback models are also incorporated into the code. The physics models used in the code were benchmarked against qualified codes and proved accurate. This work is an extension of previous work in that various feedback effects are accounted for in the system; the entire code is structured to accommodate extensive vectorization; and additional parallelism by multitasking is achieved not only for the solution of the matrix equations associated with the inner iterations but also for the other segments of the code, e.g., outer iterations
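
    The inner/outer iteration structure that such codes parallelize can be illustrated with a serial, 1-D sketch: a bare slab, two energy groups, dense solves and made-up cross-sections. This is only a toy version of the scheme, not the vectorized CRAY code described above.

        import numpy as np

        def two_group_slab(n=200, width=100.0, D=(1.4, 0.4), sig_a=(0.010, 0.080),
                           sig_s12=0.017, nu_sig_f=(0.006, 0.110)):
            """k-eigenvalue of a bare 1-D slab with two-group diffusion (zero-flux boundaries)."""
            h = width / n
            idx = np.arange(n)

            def operator(Dg, removal):
                # finite-difference operator: -Dg * d2/dx2 + removal (tridiagonal)
                A = np.zeros((n, n))
                np.fill_diagonal(A, 2.0 * Dg / h**2 + removal)
                A[idx[:-1], idx[:-1] + 1] = -Dg / h**2
                A[idx[1:], idx[1:] - 1] = -Dg / h**2
                return A

            A1 = operator(D[0], sig_a[0] + sig_s12)                # fast group: absorption + downscatter
            A2 = operator(D[1], sig_a[1])                          # thermal group: absorption only

            phi1, phi2, k = np.ones(n), np.ones(n), 1.0
            for _ in range(500):                                   # outer (power) iterations
                fission = nu_sig_f[0] * phi1 + nu_sig_f[1] * phi2
                phi1 = np.linalg.solve(A1, fission / k)            # group-1 inner solve
                phi2 = np.linalg.solve(A2, sig_s12 * phi1)         # group-2 solve, downscatter source
                new_fission = nu_sig_f[0] * phi1 + nu_sig_f[1] * phi2
                k_new = k * new_fission.sum() / fission.sum()
                if abs(k_new - k) < 1e-8:
                    break
                k = k_new
            return k_new

        print("k-effective of the toy slab:", round(two_group_slab(), 5))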

  12. Automatic Thread-Level Parallelization in the Chombo AMR Library

    Energy Technology Data Exchange (ETDEWEB)

    Christen, Matthias; Keen, Noel; Ligocki, Terry; Oliker, Leonid; Shalf, John; Van Straalen, Brian; Williams, Samuel

    2011-05-26

    The increasing on-chip parallelism has some substantial implications for HPC applications. Currently, hybrid programming models (typically MPI+OpenMP) are employed for mapping software to the hardware in order to leverage the hardware's architectural features. In this paper, we present an approach that automatically introduces thread-level parallelism into Chombo, a parallel adaptive mesh refinement framework for finite difference type PDE solvers. In Chombo, core algorithms are specified in ChomboFortran, a macro language extension to F77 that is part of the Chombo framework. This domain-specific language provides a convenient target language for automatically migrating the large number of existing algorithms to a hybrid MPI+OpenMP implementation. It also provides access to the auto-tuning methodology that enables tuning certain aspects of an algorithm to hardware characteristics. Performance measurements are presented for a few of the most relevant kernels with respect to a specific application benchmark using this technique, as well as benchmark results for the entire application. The kernel benchmarks show that, using auto-tuning, up to a factor of 11 in performance was gained with 4 threads with respect to the serial reference implementation.

  13. CMFD and GPU acceleration on method of characteristics for hexagonal cores

    International Nuclear Information System (INIS)

    Han, Yu; Jiang, Xiaofeng; Wang, Dezhong

    2014-01-01

    Highlights: • A merged hex-mesh CMFD method solved via tri-diagonal matrix inversion. • Alternative hardware acceleration using an inexpensive GPU. • A hex-core benchmark with solution to confirm the two acceleration methods. - Abstract: Coarse Mesh Finite Difference (CMFD) has been widely adopted as an effective way to accelerate the source iteration of transport calculations. However, in a core with hexagonal assemblies there are non-hexagonal meshes around the edges of assemblies, causing a problem for CMFD if the CMFD equations are still to be solved via tri-diagonal matrix inversion by simply scanning the whole core meshes in different directions. To solve this problem, we propose an unequal-mesh CMFD formulation that combines the non-hexagonal cells on the boundary of neighboring assemblies into non-regular hexagonal cells. We also investigated the alternative hardware acceleration of using a graphics processing unit (GPU) on a graphics card in a personal computer. The tool CUDA is employed, which is a parallel computing platform and programming model invented by NVIDIA for harnessing the power of GPUs. To investigate and implement these two acceleration methods, a 2-D hexagonal core transport code using the method of characteristics (MOC) is developed. A hexagonal mini-core benchmark problem is established to confirm the accuracy of the MOC code and to assess the effectiveness of CMFD and GPU parallel acceleration. For this benchmark problem, the CMFD acceleration increases the speed 16 times while the GPU acceleration speeds it up 25 times. When used simultaneously, they provide a speed gain of 292 times
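
    The tri-diagonal systems that arise when the CMFD equations are scanned line by line are usually solved with the Thomas algorithm; a generic sketch (not the paper's hexagonal-mesh or CUDA code) is given below.

        import numpy as np

        def thomas(a, b, c, d):
            """Solve a tri-diagonal system: a = sub-diagonal, b = diagonal, c = super-diagonal, d = RHS."""
            n = len(d)
            cp, dp = np.empty(n), np.empty(n)
            cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
            for i in range(1, n):                          # forward elimination
                m = b[i] - a[i] * cp[i - 1]
                cp[i] = c[i] / m if i < n - 1 else 0.0
                dp[i] = (d[i] - a[i] * dp[i - 1]) / m
            x = np.empty(n)
            x[-1] = dp[-1]
            for i in range(n - 2, -1, -1):                 # back substitution
                x[i] = dp[i] - cp[i] * x[i + 1]
            return x

        n = 5
        a = np.full(n, -1.0); b = np.full(n, 2.0); c = np.full(n, -1.0); d = np.ones(n)
        print(thomas(a, b, c, d))                          # matches np.linalg.solve for this system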

  14. CMFD and GPU acceleration on method of characteristics for hexagonal cores

    Energy Technology Data Exchange (ETDEWEB)

    Han, Yu, E-mail: hanyu1203@gmail.com [School of Nuclear Science and Engineering, Shanghai Jiaotong University, Shanghai 200240 (China); Jiang, Xiaofeng [Shanghai NuStar Nuclear Power Technology Co., Ltd., No. 81 South Qinzhou Road, XuJiaHui District, Shanghai 200000 (China); Wang, Dezhong [School of Nuclear Science and Engineering, Shanghai Jiaotong University, Shanghai 200240 (China)

    2014-12-15

    Highlights: • A merged hex-mesh CMFD method solved via tri-diagonal matrix inversion. • Alternative hardware acceleration using an inexpensive GPU. • A hex-core benchmark with solution to confirm the two acceleration methods. - Abstract: Coarse Mesh Finite Difference (CMFD) has been widely adopted as an effective way to accelerate the source iteration of transport calculations. However, in a core with hexagonal assemblies there are non-hexagonal meshes around the edges of assemblies, causing a problem for CMFD if the CMFD equations are still to be solved via tri-diagonal matrix inversion by simply scanning the whole core meshes in different directions. To solve this problem, we propose an unequal-mesh CMFD formulation that combines the non-hexagonal cells on the boundary of neighboring assemblies into non-regular hexagonal cells. We also investigated the alternative hardware acceleration of using a graphics processing unit (GPU) on a graphics card in a personal computer. The tool CUDA is employed, which is a parallel computing platform and programming model invented by NVIDIA for harnessing the power of GPUs. To investigate and implement these two acceleration methods, a 2-D hexagonal core transport code using the method of characteristics (MOC) is developed. A hexagonal mini-core benchmark problem is established to confirm the accuracy of the MOC code and to assess the effectiveness of CMFD and GPU parallel acceleration. For this benchmark problem, the CMFD acceleration increases the speed 16 times while the GPU acceleration speeds it up 25 times. When used simultaneously, they provide a speed gain of 292 times.

  15. Acceleration of Blender Cycles Path-Tracing Engine Using Intel Many Integrated Core Architecture

    OpenAIRE

    Jaroš, Milan; Říha, Lubomír; Strakoš, Petr; Karásek, Tomáš; Vašatová, Alena; Jarošová, Marta; Kozubek, Tomáš

    2015-01-01

    Part 2: Algorithms; International audience; This paper describes the acceleration of the most computationally intensive kernels of the Blender rendering engine, Blender Cycles, using Intel Many Integrated Core architecture (MIC). The proposed parallelization, which uses OpenMP technology, also improves the performance of the rendering engine when running on multi-core CPUs and multi-socket servers. Although the GPU acceleration is already implemented in Cycles, its functionality is limited. O...

  16. SWAP-Assembler: scalable and efficient genome assembly towards thousands of cores.

    Science.gov (United States)

    Meng, Jintao; Wang, Bingqiang; Wei, Yanjie; Feng, Shengzhong; Balaji, Pavan

    2014-01-01

    There is a widening gap between the throughput of massive parallel sequencing machines and the ability to analyze these sequencing data. Traditional assembly methods requiring long execution times and large amounts of memory on a single workstation limit their use on these massive data. This paper presents a highly scalable assembler named SWAP-Assembler for processing massive sequencing data using thousands of cores, where SWAP is an acronym for Small World Asynchronous Parallel model. In the paper, a mathematical description of the multi-step bi-directed graph (MSG) is provided to resolve the computational interdependence on merging edges, and a highly scalable computational framework for SWAP is developed to automatically perform the parallel computation of all operations. Graph cleaning and contig extension are also included for generating contigs with high quality. Experimental results show that SWAP-Assembler scales up to 2048 cores on the Yanhuang dataset using only 26 minutes, which is better than several other parallel assemblers, such as ABySS, Ray, and PASHA. Results also show that SWAP-Assembler can generate high-quality contigs with good N50 size and low error rate; in particular, it generated the longest N50 contig sizes for the Fish and Yanhuang datasets. In this paper, we presented a highly scalable and efficient genome assembly software, SWAP-Assembler. Compared with several other assemblers, it showed very good performance in terms of scalability and contig quality. This software is available at: https://sourceforge.net/projects/swapassembler.

  17. The future of commodity computing and many-core versus the interests of HEP software

    CERN Multimedia

    CERN. Geneva

    2012-01-01

    As the mainstream computing world has shifted from multi-core to many-core platforms, the situation for software developers has changed as well. With the numerous hardware and software options available, choices balancing programmability and performance are becoming a significant challenge. The expanding multiplicative dimensions of performance offer a growing number of possibilities that need to be assessed and addressed on several levels of abstraction. This paper reviews the major tradeoffs forced upon the software domain by the changing landscape of parallel technologies – hardware and software alike. Recent developments, paradigms and techniques are considered with respect to their impact on the rather traditional HEP programming models. Other considerations addressed include aspects of efficiency and reasonably achievable targets for the parallelization of large scale HEP workloads.

  18. A multiple-proxy approach to understanding rapid Holocene climate change in Southeast Greenland

    Science.gov (United States)

    Davin, S. H.; Bradley, R. S.; Balascio, N. L.; de Wet, G.

    2012-12-01

    The susceptibility of the Arctic to climate change has made it an excellent workshop for paleoclimatological research. Although there have been previous studies concerning climate variability carried out in the Arctic, there remains a critical dearth of knowledge due to the limited number of high-resolution Holocene climate-proxy records available from this region. This gap skews our understanding of observed and predicted climate change, and fuels uncertainty both in the realms of science and policy. This study takes a comprehensive approach to tracking Holocene climate variability in the vicinity of Tasiilaq, Southeast Greenland using a ~5.6 m sediment core from Lower Sermilik Lake. An age-depth model for the core has been established using 8 radiocarbon dates, the oldest of which was taken at 4 m down core and has been dated to approximately 6.2 kyr BP. The bottom meter of the core below the final radiocarbon date contains a transition from cobbles and coarse sand to organic-rich laminations, indicating the termination of direct glacial influence and therefore likely marking the end of the last glacial period in this region. The remainder of the core is similarly organic-rich, with light-to-dark brown laminations ranging from 0.5-1 cm in thickness and riddled with turbidites. Using this core in tandem with findings from an on-site assessment of the geomorphic history of the locale, we attempt to assess and infer the rapid climatic shifts associated with the Holocene on a sub-centennial scale. Such changes include the termination of the last glacial period, the Mid-Holocene Climatic Optimum, the Neoglacial Period, the Medieval Climatic Optimum, and the Little Ice Age. A multiple proxy approach including magnetic susceptibility, bulk organic geochemistry, elemental profiles acquired by XRF scanning, grain-size, and spectral data will be used to characterize the sediment and infer paleoclimate conditions. Additionally, percent biogenic silica by weight has been

  19. Large-Scale, Parallel, Multi-Sensor Data Fusion in the Cloud

    Science.gov (United States)

    Wilson, B. D.; Manipon, G.; Hua, H.

    2012-12-01

    NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover and access multiple datasets from remote sites, find the space/time "matchups" between instrument swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, assemble merged datasets, and compute fused products for further scientific and statistical analysis. To efficiently assemble such decade-scale datasets in a timely manner, we are utilizing Elastic Computing in the Cloud and parallel map/reduce-based algorithms. "SciReduce" is a Hadoop-like parallel analysis system, programmed in parallel Python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in the Cloud. Unlike Hadoop, in which simple tuples (keys & values) are passed between the map and reduce functions, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Thus, SciReduce uses the native datatypes (geolocated grids, swaths, and points) that geo-scientists are familiar with. We are deploying within Sci

  20. A relationship between ion balance and the chemical compounds of salt inclusions found in the Greenland Ice Core Project and Dome Fuji ice cores

    DEFF Research Database (Denmark)

    Johnsen, Sigfus Johann; Dahl-Jensen, Dorthe; Steffensen, Jørgen Peder

    2008-01-01

    We have proposed a method of deducing the chemical compounds found in deep polar ice cores by analyzing the balance between six major ions (Cl⁻, NO₃⁻, SO₄²⁻, Na⁺, Mg²⁺, and Ca²⁺). The method is demonstrated for the Holocene and last glacial maximum regions of the Dome Fuji and GRIP ice cores... on individual salt inclusions. The abundances in the ice cores are shown to reflect differences in climatic periods (the acidic environment of the Holocene versus the reductive environment of the last glacial maximum) and regional conditions (the marine environment of Antarctica versus the continental...

  1. Efficient multi-objective calibration of a computationally intensive hydrologic model with parallel computing software in Python

    Science.gov (United States)

    With enhanced data availability, distributed watershed models for large areas with high spatial and temporal resolution are increasingly used to understand water budgets and examine effects of human activities and climate change/variability on water resources. Developing parallel computing software...

  2. Control rod drop transient analysis with the coupled parallel code pCTF-PARCSv2.7

    International Nuclear Information System (INIS)

    Ramos, Enrique; Roman, Jose E.; Abarca, Agustín; Miró, Rafael; Bermejo, Juan A.

    2016-01-01

    Highlights: • An MPI parallel version of the thermal–hydraulic subchannel code COBRA-TF has been developed. • The parallel code has been coupled to the 3D neutron diffusion code PARCSv2.7. • The new codes are validated with a control rod drop transient. - Abstract: In order to reduce the response time when simulating large reactors in detail, a parallel version of the thermal–hydraulic subchannel code COBRA-TF (CTF) has been developed using the standard Message Passing Interface (MPI). The parallelization is oriented to reactor cells, so it is best suited for models consisting of many cells. The generation of the Jacobian matrix is parallelized, in such a way that each processor is in charge of generating the data associated with a subset of cells. Also, the solution of the linear system of equations is done in parallel, using the PETSc toolkit. With the goal of creating a powerful tool to simulate the reactor core behavior during asymmetrical transients, the 3D neutron diffusion code PARCSv2.7 (PARCS) has been coupled with the parallel version of CTF (pCTF) using the Parallel Virtual Machine (PVM) technology. In order to validate the correctness of the parallel coupled code, a control rod drop transient has been simulated, comparing the results with the experimental measurements acquired during a real NPP test.
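
    The cell-wise decomposition of the Jacobian described above can be sketched with a generic mpi4py snippet: cells are assigned round-robin to ranks and each rank builds only its own rows. The function and sizes are illustrative placeholders, not the pCTF implementation.

        from mpi4py import MPI
        import numpy as np

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        n_cells = 100_000                                  # illustrative model size
        my_cells = range(rank, n_cells, size)              # round-robin ownership of reactor cells

        def local_jacobian_rows(cells):
            """Placeholder for the per-cell Jacobian coefficients a subchannel code would assemble."""
            return {c: np.random.rand(6) for c in cells}   # 6 dummy coefficients per cell

        rows = local_jacobian_rows(my_cells)
        counts = comm.gather(len(rows), root=0)            # a real code would hand its block to PETSc
        if rank == 0:
            print("Jacobian rows assembled per rank:", counts)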

  3. A MULTI-CORE PARALLEL MOSAIC ALORITHM FOR MULTI-VIEW UAV IMAGES

    Directory of Open Access Journals (Sweden)

    X. Pan

    2017-09-01

    Full Text Available Error propagation and accumulation often lead to distortion or failure of the mosaic when stitching multi-view UAV (Unmanned Aerial Vehicle) images. In this paper, to solve this problem, we propose a mosaic strategy that constructs a mosaic ring, with multi-level grouping parallel acceleration as an auxiliary. First, the input images are divided into several groups, and each group is stitched in a ring. Then, SIFT is used for matching and RANSAC is used to remove wrong matching points. The perspective transformation matrix is then calculated. Finally, the error is weakened by using the adjustment equation. All these steps run between the different groups at the same time. Using real UAV images, the experimental results show that this method can effectively reduce the influence of accumulative error, improve the precision of the mosaic and reduce the mosaic time by 60 %. The proposed method can be used as one of the effective ways to minimize the accumulative error.
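
    The SIFT-plus-RANSAC step described above is standard and can be sketched with OpenCV; the file names are hypothetical and the snippet covers only a single image pair, not the ring/grouping strategy or the adjustment equation.

        import cv2
        import numpy as np

        def pairwise_homography(img_a, img_b, ratio=0.75):
            """Estimate the perspective transform mapping img_b onto img_a (SIFT + ratio test + RANSAC)."""
            sift = cv2.SIFT_create()
            kp_a, des_a = sift.detectAndCompute(img_a, None)
            kp_b, des_b = sift.detectAndCompute(img_b, None)
            matches = cv2.BFMatcher().knnMatch(des_b, des_a, k=2)
            good = [m for m, n in matches if m.distance < ratio * n.distance]   # Lowe's ratio test
            src = np.float32([kp_b[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
            dst = np.float32([kp_a[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
            H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)                # RANSAC rejects outliers
            return H

        img_a = cv2.imread("uav_001.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical file names
        img_b = cv2.imread("uav_002.jpg", cv2.IMREAD_GRAYSCALE)
        H = pairwise_homography(img_a, img_b)
        mosaic = cv2.warpPerspective(img_b, H, (img_a.shape[1] * 2, img_a.shape[0]))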

  4. High-resolution record of Northern Hemisphere climate extending into the last interglacial period

    DEFF Research Database (Denmark)

    North Greenland Ice Core Project members; Andersen, Katrine K.; Azuma, N.

    2004-01-01

    Two deep ice cores from central Greenland, drilled in the 1990s, have played a key role in climate reconstructions of the Northern Hemisphere, but the oldest sections of the cores were disturbed in chronology owing to ice folding near the bedrock. Here we present an undisturbed climate record from... the initiation of the last glacial period. Our record reveals a hitherto unrecognized warm period initiated by an abrupt climate warming about 115,000 years ago, before glacial conditions were fully developed. This event does not appear to have an immediate Antarctic counterpart, suggesting that the climate see-saw between the hemispheres (which dominated the last glacial period) was not operating at this time....

  5. Coupling of unidimensional neutron kinetics to thermal hydraulics in parallel channels

    International Nuclear Information System (INIS)

    Cecenas F, M.; Campos G, R.M.

    2003-01-01

    In this work the dynamic behavior of a system of fifteen parallel channels representing the core of a BWR-type reactor, coupled to a one-dimensional neutron kinetics model, is studied by means of time series. The channel arrangement is obtained by collapsing the fuel assemblies of the core into an arrangement of channels aligned in rows, and it is coupled to the one-dimensional solution of the neutron diffusion equation. This solution represents the radial power distribution; the static solution is obtained first to verify that the modeled core is critical. The coupled neutronic-thermal-hydraulic set is solved numerically by a network of CPUs working in a master-slave scheme under the Parallel Virtual Machine (PVM), subject to the restriction that the pressure drop is equal for every channel, which is enforced by iterating on the coolant flow distribution. The channels are dimensioned according to the stability benchmark of the Swedish Ringhals plant organized by the Nuclear Energy Agency in 1994. From the benchmark data the axial power profile of each channel is obtained, which is assumed invariant in time. To obtain the time series, the system is excited with white noise (a sequence that statistically obeys a normal distribution with zero mean), so that the power generated in each channel has the same characteristics as a typical neutron-flux signal acquired in a BWR reactor. (Author)
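
    The equal-pressure-drop constraint that the flow iteration enforces can be illustrated with a minimal sketch, assuming a simple quadratic pressure loss per channel (illustrative coefficients, not the benchmark geometry): the total flow is redistributed until every channel sees the same pressure drop.

        import numpy as np

        def split_flow(k, w_total, iters=50):
            """Distribute a total flow among parallel channels so that each channel sees the
            same pressure drop, assuming a quadratic loss dp_i = k_i * w_i**2 per channel."""
            w = np.full(len(k), w_total / len(k))      # start from a uniform split
            for _ in range(iters):
                dp = np.mean(k * w**2)                 # provisional common pressure drop
                w = np.sqrt(dp / k)                    # flow each channel passes at that dp
                w *= w_total / w.sum()                 # rescale to conserve the total flow
            return w, k[0] * w[0]**2

        k = np.array([1.0, 1.2, 0.8, 1.5, 1.1])        # illustrative channel loss coefficients
        w, dp = split_flow(k, w_total=5.0)
        print(np.round(w, 3), "total =", round(w.sum(), 3), "dp =", round(dp, 3))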

  6. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers

    Directory of Open Access Journals (Sweden)

    Mark James Abraham

    2015-09-01

    Full Text Available GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights, through several new and enhanced parallelization algorithms. These work on every level: SIMD registers inside cores, multithreading, heterogeneous CPU–GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. The latest best-in-class compressed trajectory storage format is supported.

  7. Climate Justice in Rural Southeastern United States: A Review of Climate Change Impacts and Effects on Human Health.

    Science.gov (United States)

    Gutierrez, Kristie S; LePrevost, Catherine E

    2016-02-03

    Climate justice is a local, national, and global movement to protect at-risk populations who are disproportionately affected by climate change. The social context for this review is the Southeastern region of the United States, which is particularly susceptible to climate change because of the geography of the area and the vulnerabilities of the inhabiting populations. Negative human health effects on variable and vulnerable populations within the Southeast region due to changing climate are concerning, as health threats are not expected to produce parallel effects among all individuals. Vulnerable communities, such as communities of color, indigenous people, the geographically isolated, and those who are socioeconomically disadvantaged and already experiencing poor environmental quality, are least able to respond and adapt to climate change. Focusing on vulnerable populations in the Southeastern United States, this review is a synthesis of the recent (2010 to 2015) literature-base on the health effects connected to climate change. This review also addresses local and regional mitigation and adaptation strategies for citizens and leaders to combat direct and indirect human health effects related to a changing climate.

  8. Practical parallel computing

    CERN Document Server

    Morse, H Stephen

    1994-01-01

    Practical Parallel Computing provides information pertinent to the fundamental aspects of high-performance parallel processing. This book discusses the development of parallel applications on a variety of equipment.Organized into three parts encompassing 12 chapters, this book begins with an overview of the technology trends that converge to favor massively parallel hardware over traditional mainframes and vector machines. This text then gives a tutorial introduction to parallel hardware architectures. Other chapters provide worked-out examples of programs using several parallel languages. Thi

  9. Two-state ion heating at quasi-parallel shocks

    International Nuclear Information System (INIS)

    Thomsen, M.F.; Gosling, J.T.; Bame, S.J.; Onsager, T.G.; Russell, C.T.

    1990-01-01

    In a previous study of ion heating at quasi-parallel shocks, the authors showed a case in which the ion distributions downstream from the shock alternated between a cooler, denser, core/shoulder type and a hotter, less dense, more Maxwellian type. In this paper they further document the alternating occurrence of two different ion states downstream from several quasi-parallel shocks. Three separate lines of evidence are presented to show that the two states are not related in an evolutionary sense, but rather both are produced alternately at the shock: (1) the asymptotic downstream plasma parameters (density, ion temperature, and flow speed) are intermediate between those characterizing the two different states closer to the shock, suggesting that the asymptotic state is produced by a mixing of the two initial states; (2) examples of apparently interpenetrating (i.e., mixing) distributions can be found during transitions from one state to the other; and (3) examples of both types of distributions can be found at actual crossings of the shock ramp. The alternation between the two different types of ion distribution provides direct observational support for the idea that the dissipative dynamics of at least some quasi-parallel shocks is non-stationary and cyclic in nature, as demonstrated by recent numerical simulations. Typical cycle times between intervals of similar ion heating states are ∼2 upstream ion gyroperiods. Both the simulations and the in situ observations indicate that a process of coherent ion reflection is commonly an important part of the dissipation at quasi-parallel shocks

  10. Stable water isotopes of precipitation and firn cores from the northern Antarctic Peninsula region as a proxy for climate reconstruction

    Directory of Open Access Journals (Sweden)

    F. Fernandoy

    2012-03-01

    Full Text Available In order to investigate the climate variability in the northern Antarctic Peninsula region, this paper focuses on the relationship between the stable isotope content of precipitation and firn, and the main meteorological variables (air temperature, relative humidity, sea surface temperature, and sea ice extent). Between 2008 and 2010, we collected precipitation samples and retrieved firn cores from several key sites in this region. We conclude that the deuterium excess oscillation represents a robust indicator of the meteorological variability on a seasonal to sub-seasonal scale. Low absolute deuterium excess values and the synchronous variation of both deuterium excess and air temperature imply that the evaporation of moisture occurs in the adjacent Southern Ocean. The δ18O-air temperature relationship is complicated and significant only at a (multi)seasonal scale. Backward trajectory calculations show that air parcels arriving at the region during precipitation events predominantly originate at the South Pacific Ocean and Bellingshausen Sea. These investigations will be used as a calibration for ongoing and future research in the area, suggesting that appropriate locations for future ice core research are located above 600 m a.s.l. We selected the Plateau Laclavere, Antarctic Peninsula as the most promising site for a deeper drilling campaign.
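
    The deuterium excess used as an indicator above is a simple derived quantity, d = δD - 8·δ18O; a tiny sketch with made-up per-sample values shows the calculation.

        import numpy as np

        d18O = np.array([-12.4, -15.1, -18.3])     # illustrative δ18O values, in permil
        dD = np.array([-90.0, -115.0, -140.0])     # illustrative δD values, in permil
        d_excess = dD - 8.0 * d18O                 # standard definition: d = δD - 8·δ18O
        print(np.round(d_excess, 1))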

  11. Hybrid MPI/OpenMP parallelization of the explicit Volterra integral equation solver for multi-core computer architectures

    KAUST Repository

    Al Jarro, Ahmed

    2011-08-01

    A hybrid MPI/OpenMP scheme for efficiently parallelizing the explicit marching-on-in-time (MOT)-based solution of the time-domain volume (Volterra) integral equation (TD-VIE) is presented. The proposed scheme equally distributes tested field values and operations pertinent to the computation of tested fields among the nodes using the MPI standard, while the source field values are stored on all nodes. Within each node, the OpenMP standard is used to further accelerate the computation of the tested fields. Numerical results demonstrate that the proposed parallelization scheme scales well for problems involving three million or more spatial discretization elements. © 2011 IEEE.

  12. Parallel rendering

    Science.gov (United States)

    Crockett, Thomas W.

    1995-01-01

    This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.

  13. Optimal task mapping in safety-critical real-time parallel systems; Placement optimal de taches pour les systemes paralleles temps-reel critiques

    Energy Technology Data Exchange (ETDEWEB)

    Aussagues, Ch

    1998-12-11

    This PhD thesis deals with the correct design of safety-critical real-time parallel systems. Such systems constitute a fundamental part of high-performance command and control systems that can be found in the nuclear domain or, more generally, in parallel embedded systems. The verification of their temporal correctness is the core of this thesis. Our contribution consists mainly of the following three points: the analysis and extension of a programming model for such real-time parallel systems; the proposal of an original method based on a new operator, the synchronized product of state-machine task graphs; and the validation of the approach by its implementation and evaluation. The work particularly addresses the main problem of optimal task mapping on a parallel architecture, such that the temporal constraints are globally guaranteed, i.e. the timeliness property is valid. The results also incorporate optimality criteria for the sizing and correct dimensioning of a parallel system, for instance in the number of processing elements. These criteria are connected with operational constraints of the application domain. Our approach is based on the off-line analysis of the feasibility of the deadline-driven dynamic scheduling that is used to schedule tasks inside one processor. This leads us to define the synchronized product, from which a system of linear constraints is automatically generated; this allows the maximum load of a group of tasks to be calculated and their timeliness constraints to be verified. The communications, their timeliness verification and their incorporation into the mapping problem are the second main contribution of this thesis. Finally, the global solving technique dealing with both task and communication aspects has been implemented and evaluated in the framework of the OASIS project at the LETI research center at CEA/Saclay. (author) 96 refs.
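
    As a point of reference for the deadline-driven (EDF) scheduling analyzed off-line in the thesis, the classical single-processor feasibility condition for preemptive EDF with implicit deadlines is just a utilization bound; the sketch below is this textbook test, not the thesis's synchronized-product method.

        def edf_feasible(tasks):
            """Utilization test for preemptive EDF on one processor with implicit deadlines:
            the task set is schedulable if and only if sum(C_i / T_i) <= 1."""
            return sum(c / t for c, t in tasks) <= 1.0

        tasks = [(2, 10), (3, 15), (5, 20)]        # (worst-case execution time, period), illustrative
        print(edf_feasible(tasks))                 # True: utilization = 0.2 + 0.2 + 0.25 = 0.65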

  14. Visual Analysis of North Atlantic Hurricane Trends Using Parallel Coordinates and Statistical Techniques

    National Research Council Canada - National Science Library

    Steed, Chad A; Fitzpatrick, Patrick J; Jankun-Kelly, T. J; Swan II, J. E

    2008-01-01

    ... for a particular dependent variable. These capabilities are combined into a unique visualization system that is demonstrated via a North Atlantic hurricane climate study using a systematic workflow. This research corroborates the notion that enhanced parallel coordinates coupled with statistical analysis can be used for more effective knowledge discovery and confirmation in complex, real-world data sets.
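
    Parallel-coordinates plots of the kind used in that system are available off the shelf; a minimal sketch with synthetic storm attributes (not the study's data) can be drawn with pandas and matplotlib.

        import pandas as pd
        import matplotlib.pyplot as plt
        from pandas.plotting import parallel_coordinates

        # synthetic storm attributes for illustration only
        storms = pd.DataFrame({
            "max_wind_kt":   [65, 120, 90, 140, 75],
            "min_slp_hPa":   [990, 940, 965, 915, 985],
            "duration_days": [4, 9, 6, 11, 5],
            "ace":           [3.2, 18.5, 9.1, 25.4, 4.0],
            "category":      ["TS", "Major", "Cat1-2", "Major", "TS"],
        })
        parallel_coordinates(storms, class_column="category", colormap="viridis")
        plt.title("Parallel-coordinates view of synthetic hurricane attributes")
        plt.show()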

  15. Computational models of stellar collapse and core-collapse supernovae

    Energy Technology Data Exchange (ETDEWEB)

    Ott, Christian D; O' Connor, Evan [TAPIR, Mailcode 350-17, California Institute of Technology, Pasadena, CA (United States); Schnetter, Erik; Loeffler, Frank [Center for Computation and Technology, Louisiana State University, Baton Rouge, LA (United States); Burrows, Adam [Department of Astrophysical Sciences, Princeton University, Princeton, NJ (United States); Livne, Eli, E-mail: cott@tapir.caltech.ed [Racah Institute of Physics, Hebrew University, Jerusalem (Israel)

    2009-07-01

    Core-collapse supernovae are among Nature's most energetic events. They mark the end of massive star evolution and pollute the interstellar medium with the life-enabling ashes of thermonuclear burning. Despite their importance for the evolution of galaxies and life in the universe, the details of the core-collapse supernova explosion mechanism remain in the dark and pose a daunting computational challenge. We outline the multi-dimensional, multi-scale, and multi-physics nature of the core-collapse supernova problem and discuss computational strategies and requirements for its solution. Specifically, we highlight the axisymmetric (2D) radiation-MHD code VULCAN/2D and present results obtained from the first full-2D angle-dependent neutrino radiation-hydrodynamics simulations of the post-core-bounce supernova evolution. We then go on to discuss the new code Zelmani which is based on the open-source HPC Cactus framework and provides a scalable AMR approach for 3D fully general-relativistic modeling of stellar collapse, core-collapse supernovae and black hole formation on current and future massively-parallel HPC systems. We show Zelmani's scaling properties to more than 16,000 compute cores and discuss first 3D general-relativistic core-collapse results.

  16. Two Extreme Climate Events of the Last 1000 Years Recorded in Himalayan and Andean Ice Cores: Impacts on Humans

    Science.gov (United States)

    Thompson, L. G.; Mosley-Thompson, E. S.; Davis, M. E.; Kenny, D. V.; Lin, P.

    2013-12-01

    In the last few decades numerous studies have linked pandemic influenza, cholera, malaria, and viral pneumonia, as well as droughts, famines and global crises, to the El Niño-Southern Oscillation (ENSO). Two annually resolved ice core records, one from Dasuopu Glacier in the Himalaya and one from the Quelccaya Ice Cap in the tropical Peruvian Andes, provide an opportunity to investigate these relationships on opposite sides of the Pacific Basin for the last 1000 years. The Dasuopu record provides an annual history from 1440 to 1997 CE and a decadally resolved record from 1000 to 1440 CE while the Quelccaya ice core provides annual resolution over the last 1000 years. Major ENSO events are often recorded in the oxygen isotope, insoluble dust, and chemical records from these cores. Here we investigate outbreaks of diseases, famines and global crises during two of the largest events recorded in the chemistry of these cores, particularly large peaks in the concentrations of chloride (Cl⁻) and fluoride (F⁻). One event is centered on 1789 to 1800 CE and the second begins abruptly in 1345 and tapers off after 1360 CE. These Cl⁻ and F⁻ peaks represent major droughts and reflect the abundance of continental atmospheric dust, derived in part from dried lake beds in drought-stricken regions upwind of the core sites. For Dasuopu the likely sources are in India while for Quelccaya the sources would be the Andean Altiplano. Both regions are subject to drought conditions during the El Niño phase of the ENSO cycle. These two events persist longer (10 to 15 years) than today's typical ENSO events in the Pacific Ocean Basin. The 1789 to 1800 CE event was associated with a very strong El Niño event and was coincidental with the Boji Bara famine resulting from extended droughts that led to over 600,000 deaths in central India by 1792. Similarly extensive droughts are documented in Central and South America. Likewise, the 1345 to 1360 CE event, although poorly documented

  17. Parallel computations

    CERN Document Server

    1982-01-01

    Parallel Computations focuses on parallel computation, with emphasis on algorithms used in a variety of numerical and physical applications and for many different types of parallel computers. Topics covered range from vectorization of fast Fourier transforms (FFTs) and of the incomplete Cholesky conjugate gradient (ICCG) algorithm on the Cray-1 to calculation of table lookups and piecewise functions. Single tridiagonal linear systems and vectorized computation of reactive flow are also discussed.Comprised of 13 chapters, this volume begins by classifying parallel computers and describing techn

  18. The Spatial and Temporal Variability of the North Atlantic Oscillation Recorded in Ice Core Major Ion Time Series

    Science.gov (United States)

    Wawrzeniak, T. L.; Wake, C. P.; Fischer, H.; Fisher, D. A.; Schwikowski, M.

    2006-05-01

    The North Atlantic Oscillation represents a significant mode of atmospheric variability for the Arctic and sub-Arctic climate system. Developing a longer-term record of the spatial and temporal variability of the NAO could improve our understanding of natural climate variability in the region. Previous work has shown a significant relationship between Greenland ice core records and the NAO. Here, we have compared sea-salt and dust records from nine ice cores around the Arctic region to sea level pressure and NAO indices to evaluate the extent to which these ice cores can be used to reconstruct the NAO.
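
    The comparison of ice-core series against the NAO index is, at its simplest, a correlation exercise; the sketch below uses synthetic series (not the nine-core data set) to show the kind of calibration statistic involved.

        import numpy as np

        rng = np.random.default_rng(1)
        nao_index = rng.normal(size=150)                              # synthetic winter NAO index, 150 "years"
        sea_salt = 0.6 * nao_index + rng.normal(scale=0.8, size=150)  # synthetic ice-core sea-salt series
        r = np.corrcoef(nao_index, sea_salt)[0, 1]
        print(f"correlation with the NAO index: r = {r:.2f}")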

  19. Massive Parallelism of Monte-Carlo Simulation on Low-End Hardware using Graphic Processing Units

    International Nuclear Information System (INIS)

    Mburu, Joe Mwangi; Hah, Chang Joo Hah

    2014-01-01

    Within the past decade, research has been done on utilizing GPU massive parallelization in core simulation with impressive results but, unfortunately, not much commercial application has been made in the nuclear field, especially in reactor core simulation. The purpose of this paper is to give an introductory concept on the topic and illustrate the potential of exploiting the massively parallel nature of GPU computing on a simple Monte Carlo simulation with very minimal hardware specifications. To do a comparative analysis, a simple two-dimensional Monte Carlo simulation is implemented for both the CPU and the GPU in order to evaluate the performance gain of each computing device. The heterogeneous platform utilized in this analysis is a slow notebook with only a 1 GHz processor. The end results are quite surprising, with high speedups of almost a factor of 10. In this work, we have utilized heterogeneous computing in a GPU-based approach to a potentially arithmetic-intensive calculation. By running a complex Monte Carlo simulation on the GPU platform, we have sped up the computational process by almost a factor of 10 based on one million neutrons. This shows how easy, cheap and efficient it is to use GPUs to accelerate scientific computing, and the results should encourage further exploration of this avenue, especially in nuclear reactor physics simulation where deterministic and stochastic calculations are quite favourable for parallelization

  20. Massive Parallelism of Monte-Carlo Simulation on Low-End Hardware using Graphic Processing Units

    Energy Technology Data Exchange (ETDEWEB)

    Mburu, Joe Mwangi; Hah, Chang Joo Hah [KEPCO International Nuclear Graduate School, Ulsan (Korea, Republic of)

    2014-05-15

    Within the past decade, research has been done on utilizing GPU massive parallelization in core simulation with impressive results but, unfortunately, not much commercial application has been made in the nuclear field, especially in reactor core simulation. The purpose of this paper is to give an introductory concept on the topic and illustrate the potential of exploiting the massively parallel nature of GPU computing on a simple Monte Carlo simulation with very minimal hardware specifications. To do a comparative analysis, a simple two-dimensional Monte Carlo simulation is implemented for both the CPU and the GPU in order to evaluate the performance gain of each computing device. The heterogeneous platform utilized in this analysis is a slow notebook with only a 1 GHz processor. The end results are quite surprising, with high speedups of almost a factor of 10. In this work, we have utilized heterogeneous computing in a GPU-based approach to a potentially arithmetic-intensive calculation. By running a complex Monte Carlo simulation on the GPU platform, we have sped up the computational process by almost a factor of 10 based on one million neutrons. This shows how easy, cheap and efficient it is to use GPUs to accelerate scientific computing, and the results should encourage further exploration of this avenue, especially in nuclear reactor physics simulation where deterministic and stochastic calculations are quite favourable for parallelization.

  1. Parallel sorting algorithms

    CERN Document Server

    Akl, Selim G

    1985-01-01

    Parallel Sorting Algorithms explains how to use parallel algorithms to sort a sequence of items on a variety of parallel computers. The book reviews the sorting problem, the parallel models of computation, parallel algorithms, and the lower bounds on parallel sorting problems. The text also presents twenty different algorithms for architectures such as linear arrays, mesh-connected computers, and cube-connected computers. Another example where an algorithm can be applied is on shared-memory SIMD (single instruction stream, multiple data stream) computers, in which the whole sequence to be sorted can fit in the
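
    As a concrete illustration of the divide-sort-merge structure that shared-memory parallel sorting exploits, the sketch below (not taken from the book) sorts independent chunks in separate processes and merges the pre-sorted results; chunk and worker counts are arbitrary.

```python
# Hedged sketch: a shared-memory parallel sort; chunks are sorted in worker
# processes and the sorted runs are merged afterwards.
import random
from heapq import merge
from multiprocessing import Pool


def parallel_sort(data, workers: int = 4):
    chunk = max(1, len(data) // workers)
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with Pool(workers) as pool:
        sorted_chunks = pool.map(sorted, chunks)   # independent sorts in parallel
    result = []
    for part in sorted_chunks:
        result = list(merge(result, part))         # k-way merge done pairwise
    return result


if __name__ == "__main__":
    xs = [random.randint(0, 999) for _ in range(10_000)]
    assert parallel_sort(xs) == sorted(xs)
    print("ok")
```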

  2. Parallel halftoning technique using dot diffusion optimization

    Science.gov (United States)

    Molina-Garcia, Javier; Ponomaryov, Volodymyr I.; Reyes-Reyes, Rogelio; Cruz-Ramos, Clara

    2017-05-01

    In this paper, a novel approach for halftone images is proposed and implemented for images obtained by the Dot Diffusion (DD) method. The designed technique is based on an optimization of the so-called class matrix used in the DD algorithm: it generates new versions of the class matrix containing no baron and near-baron entries, in order to minimize inconsistencies during the distribution of the error. The proposed class matrices have different properties, each designed for one of two applications: applications where inverse halftoning is necessary, and applications where it is not required. The proposed method has been implemented on a GPU (NVIDIA GeForce GTX 750 Ti) and on multicore processors (AMD FX-6300 six-core and Intel Core i5-4200U), using CUDA and OpenCV on a Linux PC. Experimental results show that the novel framework generates good-quality halftone images and inverse-halftone images. The simulation results on parallel architectures demonstrate the efficiency of the novel technique when implemented for real-time processing.

  3. A parallel sweeping preconditioner for frequency-domain seismic wave propagation

    KAUST Repository

    Poulson, Jack

    2012-09-01

    We present a parallel implementation of Engquist and Ying's sweeping preconditioner, which exploits radiation boundary conditions in order to form an approximate block LDLT factorization of the Helmholtz operator with only O(N^(4/3)) work and an application (and memory) cost of only O(N log N). The approximate factorization is then used as a preconditioner for GMRES, and we show that essentially O(1) iterations are required for convergence, even for the full SEG/EAGE over-thrust model at 30 Hz. In particular, we demonstrate the solution of said problem in a mere 15 minutes on 8192 cores of TACC's Lonestar, which may be the largest-scale 3D heterogeneous Helmholtz calculation to date. Generalizations of our parallel strategy are also briefly discussed for time-harmonic linear elasticity and Maxwell's equations.
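
    The sweeping preconditioner itself is not reproduced here, but the way an approximate factorization is wrapped as a preconditioner for GMRES can be sketched with standard tools. In the hedged example below, an incomplete LU factorization of an illustrative 1-D Helmholtz-like matrix stands in for the block LDLT sweeping factorization; the matrix, wavenumber and tolerance are assumptions for demonstration only.

```python
# Hedged sketch: an approximate factorization used as a GMRES preconditioner.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 2000
k = 0.5  # illustrative wavenumber-like parameter
# Simple 1-D Helmholtz-like operator: -Laplacian - k^2 I (stand-in problem).
A = sp.diags([-1.0, 2.0 - k**2, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

ilu = spla.spilu(A, drop_tol=1e-4)                  # approximate factorization
M = spla.LinearOperator((n, n), matvec=ilu.solve)   # preconditioner action

x, info = spla.gmres(A, b, M=M)
print("converged" if info == 0 else f"info={info}",
      "residual:", np.linalg.norm(A @ x - b))
```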

  4. Climate of the future: the testimony of the past

    International Nuclear Information System (INIS)

    Jouzel, J.; Lorius, C.; Raynaud, D.

    1994-01-01

    Human activities are substantially increasing the atmospheric concentrations of greenhouse gases. Such an increase may induce a significant warming over the next decades. Beyond complex predictive climate models, the archives of past climate contain information relevant to this future of our climate. It concerns, in particular, the link between climate and greenhouse gases in the past and the natural variability of the Earth's climate. Both are recorded in polar ice, which thus provides records essential for a better understanding of the behaviour of the climate system. This is examined using results recently obtained from deep ice cores from Greenland and Antarctica. (authors). 21 refs., 5 figs

  5. 10Be evidence for the Matuyama-Brunhes geomagnetic reversal in the EPICA Dome C ice core.

    Science.gov (United States)

    Raisbeck, G M; Yiou, F; Cattani, O; Jouzel, J

    2006-11-02

    An ice core drilled at Dome C, Antarctica, is the oldest ice core so far retrieved. On the basis of ice flow modelling and a comparison of the deuterium signal in the ice with climate records from marine sediment cores, the ice at a depth of 3,190 m in the Dome C core is believed to have been deposited around 800,000 years ago, offering a rare opportunity to study climatic and environmental conditions over this time period. However, an independent determination of this age is important because the deuterium profile below a depth of 3,190 m does not show the expected correlation with the marine record. Here we present evidence for enhanced 10Be deposition in the ice at 3,160-3,170 m, which we interpret as a result of the low dipole field strength during the Matuyama-Brunhes geomagnetic reversal, which occurred about 780,000 years ago. If correct, this provides a crucial tie point between ice cores, marine cores and a radiometric timescale.

  6. Parallelizing AT with MatlabMPI

    International Nuclear Information System (INIS)

    2011-01-01

    The Accelerator Toolbox (AT) is a high-level collection of tools and scripts specifically oriented toward solving problems in computational accelerator physics. It is integrated into the MATLAB environment, which provides an accessible, intuitive interface for accelerator physicists, allowing researchers to focus the majority of their efforts on simulations and calculations rather than programming and debugging difficulties. Efforts toward parallelization of AT have been put in place to upgrade its performance to modern standards of computing. We utilized the packages MatlabMPI and pMatlab, developed by MIT Lincoln Laboratory, to set up a message-passing environment that could be called within MATLAB, providing the necessary prerequisites for multithreaded processing capabilities. On local quad-core CPUs, we were able to demonstrate processor efficiencies of roughly 95% and speed increases of nearly 380%. By exploiting modern-day parallel computing, we demonstrated highly efficient per-processor speed gains in AT's beam-tracking functions. Extrapolating from these results, we expect to reduce week-long computation runtimes to less than 15 minutes. This is a huge performance improvement and has enormous implications for the future computing power of the accelerator physics group at SSRL. However, one of the downfalls of parringpass is its current lack of transparency; the pMatlab and MatlabMPI packages must first be well understood by the user before the system can be configured to run the scripts. In addition, the instantiation of argument parameters requires internal modification of the source code. Thus, parringpass cannot be directly run from the MATLAB command line, which detracts from its flexibility and user-friendliness. Future work in AT's parallelization will focus on development of external functions and scripts that can be called from within MATLAB and configured on multiple nodes, while

  7. Reconstructing paleoceanographic conditions in the westernmost Mediterranean during the last 4.000 yr: tracking rapid climate variability

    Science.gov (United States)

    Nieto-Moreno, V.; Martínez-Ruiz, F.; Jiménez-Espejo, F. J.; Gallego-Torres, D.; Rodrigo-Gámiz, M.; Sakamoto, T.; Böttcher, M.; García-Orellana, J.; Ortega-Huertas, M.

    2009-04-01

    The westernmost Mediterranean (Alboran Sea basin) is a key location for paleoceanographic and paleoclimatic reconstructions, since high sedimentation rates provide ultra high-resolution records at centennial and millennial scales. Here, we present a paleoenvironmental reconstruction for the last 4000 yr, based on a multi-proxy approach that includes major and trace element-content fluctuations and the mineral composition of marine sediments. The investigated materials correspond to several gravity and box cores recovered in the Alboran Sea basin during different oceanographic cruises (TTR-14 and TTR-17), which have been sampled at very high resolution. Comparative analysis of these cores allows climate oscillations at centennial to millennial scales to be established. Although relatively more attention has been devoted to major climate changes during the last glacial cycle, such as the Last Glacial Maximum, the deglaciation and abrupt cooling events (Heinrich and Younger Dryas), the late Holocene has also been punctuated by significant rapid climate variability, including polar cooling, aridity and changes in the intensity of the atmospheric circulation. These climate oscillations coincide with significant fluctuations in the chemical and mineral composition of marine sediments. Thus, bulk and clay mineralogy, REE composition and Rb/Al, Zr/Al, La/Lu ratios provide information on the sedimentary regime (eolian-fluvial input and source areas), Ba-based proxies on fluctuations in marine productivity, and redox-sensitive elements on oxygen conditions at the time of deposition. A decrease in fluvial-derived elements/minerals (e.g., Rb, detrital mica) takes place during the so-called Late Bronze Age-Iron Age, Dark Age, and Little Ice Age periods, whereas an increase is evidenced during the Medieval Warm Period and the Roman Humid Period. This last trend runs parallel to a decline of elements/minerals of typical eolian source (Zr, kaolinite) with the exception of the Roman Humid

  8. Representation of Northern Hemisphere winter storm tracks in climate models

    Energy Technology Data Exchange (ETDEWEB)

    Greeves, C.Z.; Pope, V.D.; Stratton, R.A.; Martin, G.M. [Met Office Hadley Centre for Climate Prediction and Research, Exeter (United Kingdom)

    2007-06-15

    Northern Hemisphere winter storm tracks are a key element of the winter weather and climate at mid-latitudes. Before projections of climate change are made for these regions, it is necessary to be sure that climate models are able to reproduce the main features of observed storm tracks. The simulated storm tracks are assessed for a variety of Hadley Centre models and are shown to be well modelled on the whole. The atmosphere-only model with the semi-Lagrangian dynamical core produces generally more realistic storm tracks than the model with the Eulerian dynamical core, provided the horizontal resolution is high enough. The two models respond in different ways to changes in horizontal resolution: the model with the semi-Lagrangian dynamical core has much reduced frequency and strength of cyclonic features at lower resolution due to reduced transient eddy kinetic energy. The model with Eulerian dynamical core displays much smaller changes in frequency and strength of features with changes in horizontal resolution, but the location of the storm tracks as well as secondary development are sensitive to resolution. Coupling the atmosphere-only model (with semi-Lagrangian dynamical core) to an ocean model seems to affect the storm tracks largely via errors in the tropical representation. For instance a cold SST bias in the Pacific and a lack of ENSO variability lead to large changes in the Pacific storm track. Extratropical SST biases appear to have a more localised effect on the storm tracks. (orig.)

  9. Humor Climate of the Primary Schools

    Science.gov (United States)

    Sahin, Ahmet

    2018-01-01

    The aim of this study is to determine the opinions of primary school administrators and teachers on humor climates in primary schools. The study was modeled as a convergent parallel design, one of the mixed-methods designs. The data gathered from 253 administrator questionnaires and 651 teacher questionnaires were evaluated for the quantitative part of the…

  10. Solution-phase parallel synthesis of a library of delta(2)-pyrazolines.

    Science.gov (United States)

    Manyem, Shankar; Sibi, Mukund P; Lushington, Gerald H; Neuenswander, Benjamin; Schoenen, Frank; Aubé, Jeffrey

    2007-01-01

    A parallel synthesis of a library (80 members) of 2-pyrazolines in solution phase is described. The 2-pyrazoline core was accessed through the [3 + 2] cycloaddition of nitrilimines with enoyl oxazolidinones. The cycloaddition provided two regioisomers, the major product being the C regioisomer. The oxazolidinone moiety was further reduced to the primary alcohol, producing another library of 5-hydroxymethyl-2-pyrazolines. The Lipinski profiles and calculated ADME properties of the compounds are also reported.

  11. Using Load Balancing to Scalably Parallelize Sampling-Based Motion Planning Algorithms

    KAUST Repository

    Fidel, Adam; Jacobs, Sam Ade; Sharma, Shishir; Amato, Nancy M.; Rauchwerger, Lawrence

    2014-01-01

    Motion planning, which is the problem of computing feasible paths in an environment for a movable object, has applications in many domains ranging from robotics, to intelligent CAD, to protein folding. The best methods for solving this PSPACE-hard problem are so-called sampling-based planners. Recent work introduced uniform spatial subdivision techniques for parallelizing sampling-based motion planning algorithms that scaled well. However, such methods are prone to load imbalance, as planning time depends on region characteristics and, for most problems, the heterogeneity of the subproblems increases as the number of processors increases. In this work, we introduce two techniques to address load imbalance in the parallelization of sampling-based motion planning algorithms: an adaptive work stealing approach and bulk-synchronous redistribution. We show that applying these techniques to representatives of the two major classes of parallel sampling-based motion planning algorithms, probabilistic roadmaps and rapidly-exploring random trees, results in a more scalable and load-balanced computation on more than 3,000 cores. © 2014 IEEE.
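
    The work stealing idea can be illustrated independently of the planners themselves: an idle worker pulls tasks from the opposite end of a busy worker's queue. The sketch below is a minimal thread-based version in Python, not the authors' implementation; task contents, granularity and worker counts are placeholders.

```python
# Hedged sketch: a minimal work-stealing scheduler over per-worker deques.
import threading
from collections import deque


class WorkStealingPool:
    def __init__(self, tasks_per_worker, worker_fn):
        self.deques = [deque(tasks) for tasks in tasks_per_worker]
        self.locks = [threading.Lock() for _ in tasks_per_worker]
        self.worker_fn = worker_fn
        self.results = []
        self.results_lock = threading.Lock()
        self.n = len(tasks_per_worker)

    def _pop_local(self, i):
        with self.locks[i]:
            return self.deques[i].pop() if self.deques[i] else None

    def _steal(self, i):
        # Steal from the opposite end of another worker's deque.
        for j in range(self.n):
            if j != i:
                with self.locks[j]:
                    if self.deques[j]:
                        return self.deques[j].popleft()
        return None

    def _run(self, i):
        while True:
            task = self._pop_local(i)
            if task is None:
                task = self._steal(i)
            if task is None:
                return  # nothing left anywhere (no new tasks are spawned here)
            result = self.worker_fn(task)
            with self.results_lock:
                self.results.append(result)

    def run(self):
        threads = [threading.Thread(target=self._run, args=(i,)) for i in range(self.n)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return self.results


if __name__ == "__main__":
    # Deliberately imbalanced initial distribution of "planning subproblems".
    tasks = [list(range(1, 41)), list(range(41, 46)), [], []]
    pool = WorkStealingPool(tasks, worker_fn=lambda x: x * x)
    print(len(pool.run()))  # 45 results despite the initial imbalance
```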

  12. Using Load Balancing to Scalably Parallelize Sampling-Based Motion Planning Algorithms

    KAUST Repository

    Fidel, Adam

    2014-05-01

    Motion planning, which is the problem of computing feasible paths in an environment for a movable object, has applications in many domains ranging from robotics, to intelligent CAD, to protein folding. The best methods for solving this PSPACE-hard problem are so-called sampling-based planners. Recent work introduced uniform spatial subdivision techniques for parallelizing sampling-based motion planning algorithms that scaled well. However, such methods are prone to load imbalance, as planning time depends on region characteristics and, for most problems, the heterogeneity of the subproblems increases as the number of processors increases. In this work, we introduce two techniques to address load imbalance in the parallelization of sampling-based motion planning algorithms: an adaptive work stealing approach and bulk-synchronous redistribution. We show that applying these techniques to representatives of the two major classes of parallel sampling-based motion planning algorithms, probabilistic roadmaps and rapidly-exploring random trees, results in a more scalable and load-balanced computation on more than 3,000 cores. © 2014 IEEE.

  13. Sub-core permeability and relative permeability characterization with Positron Emission Tomography

    Science.gov (United States)

    Zahasky, C.; Benson, S. M.

    2017-12-01

    This study utilizes preclinical micro-Positron Emission Tomography (PET) to image and quantify the transport behavior of pulses of a conservative aqueous radiotracer injected during single and multiphase flow experiments in a Berea sandstone core with axial parallel bedding heterogeneity. The core is discretized into streamtubes, and using the micro-PET data, expressions are derived from spatial moment analysis for calculating sub-core scale tracer flux and pore water velocity. Using the flux and velocity data, it is then possible to calculate porosity and saturation from volumetric flux balance, and calculate permeability and water relative permeability from Darcy's law. Full 3D simulations are then constructed based on this core characterization. Simulation results are compared with experimental results in order to test the assumptions of the simple streamtube model. Errors and limitations of this analysis will be discussed. These new methods of imaging and sub-core permeability and relative permeability measurements enable experimental quantification of transport behavior across scales.

  14. Ice-sheet flow conditions deduced from mechanical tests of ice core

    DEFF Research Database (Denmark)

    Miyamoto, Atsushi; Narita, Hideki; Hondoh, Takeo

    1999-01-01

    Uniaxial compression tests were performed on samples of the Greenland Ice Core Project (GRIP) deep ice core, both in the field and later in a cold-room laboratory, in order to understand the ice-flow behavior of large ice sheets. Experiments were conducted under conditions of constant strain rate on ice-core samples with basal planes parallel to the horizontal plane of the ice sheet. The ice-flow enhancement factors show a gradual increase with depth down to approximately 2000 m. These results can be interpreted in terms of an increase in the fourth-order Schmid factor. Below 2000 m depth, the flow... It was revealed that cloudy bands affect ice-deformation processes, but the details remain unclear.

  15. Parallel MR imaging.

    Science.gov (United States)

    Deshmane, Anagha; Gulani, Vikas; Griswold, Mark A; Seiberlich, Nicole

    2012-07-01

    Parallel imaging is a robust method for accelerating the acquisition of magnetic resonance imaging (MRI) data, and has made possible many new applications of MR imaging. Parallel imaging works by acquiring a reduced amount of k-space data with an array of receiver coils. These undersampled data can be acquired more quickly, but the undersampling leads to aliased images. One of several parallel imaging algorithms can then be used to reconstruct artifact-free images from either the aliased images (SENSE-type reconstruction) or from the undersampled data (GRAPPA-type reconstruction). The advantages of parallel imaging in a clinical setting include faster image acquisition, which can be used, for instance, to shorten breath-hold times resulting in fewer motion-corrupted examinations. In this article the basic concepts behind parallel imaging are introduced. The relationship between undersampling and aliasing is discussed and two commonly used parallel imaging methods, SENSE and GRAPPA, are explained in detail. Examples of artifacts arising from parallel imaging are shown and ways to detect and mitigate these artifacts are described. Finally, several current applications of parallel imaging are presented and recent advancements and promising research in parallel imaging are briefly reviewed. Copyright © 2012 Wiley Periodicals, Inc.

  16. The Colorado Plateau Coring Project: A Continuous Cored Non-Marine Record of Early Mesozoic Environmental and Biotic Change

    Science.gov (United States)

    Irmis, Randall; Olsen, Paul; Geissman, John; Gehrels, George; Kent, Dennis; Mundil, Roland; Rasmussen, Cornelia; Giesler, Dominique; Schaller, Morgan; Kürschner, Wolfram; Parker, William; Buhedma, Hesham

    2017-04-01

    The early Mesozoic is a critical time in earth history that saw the origin of modern ecosystems set against the backdrop of mass extinction and sudden climate events in a greenhouse world. Non-marine sedimentary strata in western North America preserve a rich archive of low latitude terrestrial ecosystem and environmental change during this time. Unfortunately, frequent lateral facies changes, discontinuous outcrops, and a lack of robust geochronologic constraints make lithostratigraphic and chronostratigraphic correlation difficult, and thus prevent full integration of these paleoenvironmental and paleontologic data into a regional and global context. The Colorado Plateau Coring Project (CPCP) seeks to remedy this situation by recovering a continuous cored record of early Mesozoic sedimentary rocks from the Colorado Plateau of the western United States. CPCP Phase 1 was initiated in 2013, with NSF- and ICDP-funded drilling of Triassic units in Petrified Forest National Park, northern Arizona, U.S.A. This phase recovered a 520 m core (1A) from the northern part of the park, and a 240 m core (2B) from the southern end of the park, comprising the entire Lower-Middle Triassic Moenkopi Formation, and most of the Upper Triassic Chinle Formation. Since the conclusion of drilling, the cores have been CT scanned at the University of Texas - Austin, and split, imaged, and scanned (e.g., XRF, gamma, and magnetic susceptibility) at the University of Minnesota LacCore facility. Subsequently, at the Rutgers University Core Repository, core 1A was comprehensively sampled for paleomagnetism, zircon geochronology, petrography, palynology, and soil carbonate stable isotopes. LA-ICPMS U-Pb zircon analyses are largely complete, and CA-TIMS U-Pb zircon, paleomagnetic, petrographic, and stable isotope analyses are on-going. Initial results reveal numerous horizons with a high proportion of Late Triassic-aged primary volcanic zircons, the age of which appears to be a close

  17. Experimental studies in a single-phase parallel channel natural circulation system. Preliminary results

    International Nuclear Information System (INIS)

    Bodkha, Kapil; Pilkhwal, D.S.; Jana, S.S.; Vijayan, P.K.

    2016-01-01

    Natural circulation systems find extensive applications in industrial engineering systems. One such application is in nuclear reactors, where the decay heat is removed by natural circulation of the fluid under off-normal conditions. Upcoming reactor designs make use of natural circulation to remove heat from the core under normal operating conditions as well. These reactors employ multiple vertical fuel channels with provision for on-power refueling/defueling. Natural circulation systems are relatively simple, safe and reliable when compared to forced circulation systems. However, natural circulation systems are prone to flow instabilities, which are highly undesirable for various reasons. The presence of parallel channels under natural circulation makes the system more complicated. To examine the behavior of a parallel channel system, studies were carried out for single-phase natural circulation flow in a multiple vertical channel system. The objective of the present work is to study the flow behavior of the parallel heated channel system under natural circulation for different operating conditions. Steady state and transient studies have been carried out in a parallel channel natural circulation system with three heated channels. The paper brings out the details of the system considered, the different cases analyzed and preliminary results of the studies carried out on a single-phase parallel channel system.

  18. High-Performance Psychometrics: The Parallel-E Parallel-M Algorithm for Generalized Latent Variable Models. Research Report. ETS RR-16-34

    Science.gov (United States)

    von Davier, Matthias

    2016-01-01

    This report presents results on a parallel implementation of the expectation-maximization (EM) algorithm for multidimensional latent variable models. The developments presented here are based on code that parallelizes both the E step and the M step of the parallel-E parallel-M algorithm. Examples presented in this report include item response…
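
    The report's models and code are not reproduced here, but the shape of a parallel E step can be sketched: the data are split into chunks, responsibilities are computed for each chunk in a separate worker, and the results are aggregated before the (here serial) M step. The two-component Gaussian mixture below is a deliberately simple stand-in for the generalized latent variable models discussed in the report; all parameter values are illustrative.

```python
# Hedged sketch: a parallel E step for a 1-D two-component Gaussian mixture.
import numpy as np
from multiprocessing import Pool


def e_step_chunk(args):
    x, means, variances, weights = args
    # Responsibilities of each component for each observation in the chunk.
    dens = np.stack([
        w * np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)
        for m, v, w in zip(means, variances, weights)
    ], axis=1)
    return dens / dens.sum(axis=1, keepdims=True)


def parallel_e_step(x, means, variances, weights, workers=4):
    chunks = np.array_split(x, workers)
    with Pool(workers) as pool:
        parts = pool.map(e_step_chunk, [(c, means, variances, weights) for c in chunks])
    return np.concatenate(parts)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-2, 1, 5000), rng.normal(3, 1, 5000)])
    resp = parallel_e_step(x, means=np.array([-1.0, 1.0]),
                           variances=np.array([1.0, 1.0]),
                           weights=np.array([0.5, 0.5]))
    # M step (serial here): update the means from the aggregated responsibilities.
    new_means = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
    print(new_means)
```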

  19. Performance Characterization of Multi-threaded Graph Processing Applications on Intel Many-Integrated-Core Architecture

    OpenAIRE

    Liu, Xu; Chen, Langshi; Firoz, Jesun S.; Qiu, Judy; Jiang, Lei

    2017-01-01

    Intel Xeon Phi many-integrated-core (MIC) architectures usher in a new era of terascale integration. Among emerging killer applications, parallel graph processing has been a critical technique to analyze connected data. In this paper, we empirically evaluate various computing platforms including an Intel Xeon E5 CPU, a Nvidia Geforce GTX1070 GPU and a Xeon Phi 7210 processor codenamed Knights Landing (KNL) in the domain of parallel graph processing. We show that the KNL gains encouraging per...

  20. Distributed and multi-core computation of 2-loop integrals

    International Nuclear Information System (INIS)

    De Doncker, E; Yuasa, F

    2014-01-01

    For an automatic computation of Feynman loop integrals in the physical region we rely on an extrapolation technique where the integrals of the sequence are obtained with iterated/repeated adaptive methods from the QUADPACK 1D quadrature package. The integration rule evaluations in the outer level, corresponding to independent inner integral approximations, are assigned to threads dynamically via the OpenMP runtime in the parallel implementation. Furthermore, multi-level (nested) parallelism enables an efficient utilization of hyperthreading or larger numbers of cores. For a class of loop integrals in the unphysical region, which do not suffer from singularities in the interior of the integration domain, we find that the distributed adaptive integration methods in the multivariate PARINT package are highly efficient and accurate. We apply these techniques without resorting to integral transformations and report on the capabilities of the algorithms and the parallel performance for a test set including various types of two-loop integrals
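
    The extrapolation machinery and the actual loop integrands are beyond a short example, but the outer-level parallelism described above can be sketched: each evaluation of the outer rule requires an adaptive inner integration and is dispatched to a worker. The integrand, node counts and worker count below are illustrative assumptions, not a two-loop integral.

```python
# Hedged sketch: iterated 2-D integration with the outer rule evaluations
# (each an adaptive inner quadrature) distributed over worker processes.
import numpy as np
from multiprocessing import Pool
from scipy.integrate import quad


def inner_integral(x):
    # Adaptive 1-D integration over y for a fixed outer node x.
    val, _ = quad(lambda y: np.exp(-(x * x + y * y)), 0.0, 1.0)
    return val


def outer_parallel(n_nodes=64, workers=4):
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    # Map Gauss-Legendre nodes and weights from [-1, 1] to [0, 1].
    x = 0.5 * (nodes + 1.0)
    w = 0.5 * weights
    with Pool(workers) as pool:
        inner_vals = pool.map(inner_integral, x)   # outer evaluations in parallel
    return float(np.dot(w, inner_vals))


if __name__ == "__main__":
    print(outer_parallel())  # ~0.5577 for this test integrand
```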

  1. Using high resolution tritium profiles to quantify the effects of melt on two Spitsbergen ice cores

    NARCIS (Netherlands)

    van der Wel, L.G.; Streurman, H.J.; Isaksson, E.; Helsen, M.M.; van de Wal, R.S.W.; Martma, T.; Pohjola, V.A.; Moore, J.C.; Meijer, H.A.J.

    2011-01-01

    Ice cores from small ice caps provide valuable climatic information, additional to that of Greenland and Antarctica. However, their integrity is usually compromised by summer meltwater percolation. To determine to what extent this can affect such ice cores, we performed high-resolution tritium

  2. Using high-resolution tritium profiles to quantify the effects of melt on two Spitsbergen ice cores

    NARCIS (Netherlands)

    Wel, L.G. van der; Streurman, H.J.; Isaksson, E.; Helsen, M.M.; Wal, R.S.W. van de; Martma, T.; Pohjola, V.A.; Moore, J.C.; Meijer, H.A.J.

    2011-01-01

    Ice cores from small ice caps provide valuable climatic information, additional to that of Greenland and Antarctica. However, their integrity is usually compromised by summer meltwater percolation. To determine to what extent this can affect such ice cores, we performed high-resolution tritium

  3. Optimizations of Unstructured Aerodynamics Computations for Many-core Architectures

    KAUST Repository

    Al Farhan, Mohammed Ahmed

    2018-04-13

    We investigate several state-of-the-practice shared-memory optimization techniques applied to key routines of an unstructured computational aerodynamics application with irregular memory accesses. We illustrate for the Intel KNL processor, as a representative of the processors in contemporary leading supercomputers, identifying and addressing performance challenges without compromising the floating point numerics of the original code. We employ low- and high-level architecture-specific code optimizations involving thread- and data-level parallelism. Our approach is based upon a multi-level hierarchical distribution of work and data across both the threads and the SIMD units within every hardware core. On a 64-core KNL chip, we achieve nearly 2.9x speedup of the dominant routines relative to the baseline. These exhibit almost linear strong scalability up to 64 threads, and thereafter some improvement with hyperthreading. At substantially fewer Watts, we achieve up to 1.7x speedup relative to the performance of 72 threads of a 36-core Haswell CPU and roughly equivalent performance to 112 threads of a 56-core Skylake scalable processor. These optimizations are expected to be of value for many other unstructured mesh PDE-based scientific applications as multi- and many-core architectures evolve.

  4. Gravity driven emergency core cooling experiments with the PACTEL facility

    International Nuclear Information System (INIS)

    Munther, R.; Kalli, H.; Kouhia, J.

    1996-01-01

    PACTEL (Parallel Channel Test Loop) is an experimental out-of-pile facility designed to simulate the major components and system behaviour of a commercial Pressurized Water Reactor (PWR) during different postulated LOCAs and transients. The reference reactor for the PACTEL facility is the Loviisa-type WWER-440. Recent modifications also enable experiments on passive core cooling. In these experiments the passive core cooling system consisted of one core makeup tank (CMT) and pressure balancing lines from the pressurizer and from a cold leg connected to the top of the CMT, in order to maintain the tank in pressure equilibrium with the primary system during ECC injection. The line from the pressurizer to the core makeup tank was normally open. The ECC flow was provided from the CMT, located at a higher elevation than the main part of the primary system. A total of nine experiments have been performed so far. 4 refs, 7 figs, 3 tabs

  5. Gravity driven emergency core cooling experiments with the PACTEL facility

    Energy Technology Data Exchange (ETDEWEB)

    Munther, R; Kalli, H [University of Technology, Lappeenranta (Finland); Kouhia, J [Technical Research Centre of Finland, Lappeenranta (Finland)

    1996-12-01

    PACTEL (Parallel Channel Test Loop) is an experimental out-of-pile facility designed to simulate the major components and system behaviour of a commercial Pressurized Water Reactor (PWR) during different postulated LOCAs and transients. The reference reactor for the PACTEL facility is the Loviisa-type WWER-440. Recent modifications also enable experiments on passive core cooling. In these experiments the passive core cooling system consisted of one core makeup tank (CMT) and pressure balancing lines from the pressurizer and from a cold leg connected to the top of the CMT, in order to maintain the tank in pressure equilibrium with the primary system during ECC injection. The line from the pressurizer to the core makeup tank was normally open. The ECC flow was provided from the CMT, located at a higher elevation than the main part of the primary system. A total of nine experiments have been performed so far. 4 refs, 7 figs, 3 tabs.

  6. Perceived climate in physical activity settings.

    Science.gov (United States)

    Gill, Diane L; Morrow, Ronald G; Collins, Karen E; Lucey, Allison B; Schultz, Allison M

    2010-01-01

    This study focused on the perceived climate for LGBT youth and other minority groups in physical activity settings. A large sample of undergraduates and a selected sample including student teachers/interns and a campus Pride group completed a school climate survey and rated the climate in three physical activity settings (physical education, organized sport, exercise). Overall, the school climate survey results paralleled results from national samples, revealing high levels of homophobic remarks and low levels of intervention. Physical activity climate ratings were mid-range, but a multivariate analysis of variance (MANOVA) revealed clear differences, with all settings rated more inclusive for racial/ethnic minorities and most exclusive for gays/lesbians and people with disabilities. The results are in line with national surveys and research suggesting sexual orientation and physical characteristics are often the basis for harassment and exclusion in sport and physical activity. The current results also indicate that future physical activity professionals recognize exclusion, suggesting they could benefit from programs that move beyond awareness to skills and strategies for creating more inclusive programs.

  7. Parallelizing ATLAS Reconstruction and Simulation: Issues and Optimization Solutions for Scaling on Multi- and Many-CPU Platforms

    International Nuclear Information System (INIS)

    Leggett, C; Jackson, K; Tatarkhanov, M; Yao, Y; Binet, S; Levinthal, D

    2011-01-01

    Thermal limitations have forced CPU manufacturers to shift from simply increasing clock speeds to improve processor performance, to producing chip designs with multi- and many-core architectures. Further, the cores themselves can run multiple threads with a zero-overhead context switch, allowing low-level resource sharing (Intel Hyperthreading). To maximize bandwidth and minimize memory latency, memory access has become non-uniform (NUMA). As manufacturers add more cores to each chip, a careful understanding of the underlying architecture is required in order to fully utilize the available resources. We present AthenaMP and the ATLAS event loop manager, the driver of the simulation and reconstruction engines, which have been rewritten to make use of multiple cores by means of event-based parallelism and final-stage I/O synchronization. However, initial studies on 8 and 16 core Intel architectures have shown marked non-linearities as parallel process counts increase, with as much as 30% reductions in event throughput in some scenarios. Since the Intel Nehalem architecture (both Gainestown and Westmere) will be the most common choice for the next round of hardware procurements, an understanding of these scaling issues is essential. Using hardware-based event counters and Intel's Performance Tuning Utility, we have studied the performance bottlenecks at the hardware level and discovered optimization schemes to maximize processor throughput. We have also produced optimization mechanisms, common to all large experiments, that address the extreme nature of today's HEP code, which, due to its size, places huge burdens on the memory infrastructure of today's processors.

  8. Laboratory Mid-frequency (Kilohertz) Range Seismic Property Measurements and X-ray CT Imaging of Fractured Sandstone Cores During Supercritical CO2 Injection

    Science.gov (United States)

    Nakagawa, S.; Kneafsey, T. J.; Chang, C.; Harper, E.

    2014-12-01

    During geological sequestration of CO2, fractures are expected to play a critical role in controlling the migration of the injected fluid in reservoir rock. To detect the invasion of supercritical (sc-) CO2 and to determine its saturation, velocity and attenuation of seismic waves can be monitored. When both fractures and matrix porosity connected to the fractures are present, wave-induced dynamic poroelastic interactions between these two different types of rock porosity—high-permeability, high-compliance fractures and low-permeability, low-compliance matrix porosity—result in complex velocity and attenuation changes of compressional waves as scCO2 invades the rock. We conducted core-scale laboratory scCO2 injection experiments on small (diameter 1.5 inches, length 3.5-4 inches), medium-porosity/permeability (porosity 15%, matrix permeability 35 md) sandstone cores. During the injection, the compressional and shear (torsion) wave velocities and attenuations of the entire core were determined using our Split Hopkinson Resonant Bar (short-core resonant bar) technique in the frequency range of 1-2 kHz, and the distribution and saturation of the scCO2 determined via X-ray CT imaging using a medical CT scanner. A series of tests were conducted on (1) intact rock cores, (2) a core containing a mated, core-parallel fracture, (3) a core containing a sheared core-parallel fracture, and (4) a core containing a sheared, core-normal fracture. For intact cores and a core containing a mated sheared fracture, injections of scCO2 into an initially water-saturated sample resulted in large and continuous decreases in the compressional velocity as well as temporary increases in the attenuation. For a sheared core-parallel fracture, large attenuation was also observed, but almost no changes in the velocity occurred. In contrast, a sample containing a core-normal fracture exhibited complex behavior of compressional wave attenuation: the attenuation peaked as the leading edge of

  9. Climate and hydrology of the last interglaciation (MIS 5) in Owens Basin, California: Isotopic and geochemical evidence from core OL-92

    Science.gov (United States)

    Li, H.-C.; Bischoff, J.L.; Ku, T.-L.; Zhu, Z.-Y.

    2004-01-01

    δ18O, δ13C, total organic carbon, total inorganic carbon, and acid-leachable Li, Mg and Sr concentrations on 443 samples from 32 to 83 m depth in Owens Lake core OL-92 were analyzed to study the climatic and hydrological conditions between 60 and 155 ka with a resolution of ~200 a. The multi-proxy data show that Owens Lake overflowed during wet/cold conditions of marine isotope stages (MIS) 4, 5b and 6, and was closed during the dry/warm conditions of MIS 5a, c and e. The lake partially overflowed during MIS 5d. Our age model places the MIS 4/5 boundary at ca 72.5 ka and the MIS 5/6 boundary (Termination II) at ca 140 ka, agreeing with the Devils Hole chronology. The diametrical precipitation intensities between the Great Basin (cold/wet) and eastern China (cold/dry) on Milankovitch time scales imply a climatic teleconnection across the Pacific. It also probably reflects the effect of high-latitude ice sheets on the southward shifts of both the summer monsoon frontal zone in eastern Asia and the polar jet stream in western North America during glacial periods. © 2003 Elsevier Ltd. All rights reserved.

  10. A SPECT reconstruction method for extending parallel to non-parallel geometries

    International Nuclear Information System (INIS)

    Wen Junhai; Liang Zhengrong

    2010-01-01

    Due to its simplicity, parallel-beam geometry is usually assumed for the development of image reconstruction algorithms. The established reconstruction methodologies are then extended to fan-beam, cone-beam and other non-parallel geometries for practical application. This situation occurs for quantitative SPECT (single photon emission computed tomography) imaging in inverting the attenuated Radon transform. Novikov reported an explicit parallel-beam formula for the inversion of the attenuated Radon transform in 2000. Thereafter, a formula for fan-beam geometry was reported by Bukhgeim and Kazantsev (2002 Preprint N. 99 Sobolev Institute of Mathematics). At the same time, we presented a formula for varying focal-length fan-beam geometry. Sometimes, the reconstruction formula is so implicit that we cannot obtain the explicit reconstruction formula in the non-parallel geometries. In this work, we propose a unified reconstruction framework for extending parallel-beam geometry to any non-parallel geometry using ray-driven techniques. Studies by computer simulations demonstrated the accuracy of the presented unified reconstruction framework for extending parallel-beam to non-parallel geometries in inverting the attenuated Radon transform.

  11. The language parallel Pascal and other aspects of the massively parallel processor

    Science.gov (United States)

    Reeves, A. P.; Bruner, J. D.

    1982-01-01

    A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.

  12. Parallel Atomistic Simulations

    Energy Technology Data Exchange (ETDEWEB)

    HEFFELFINGER,GRANT S.

    2000-01-18

    Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed: the replicated-data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories, those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed, and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains, are discussed.

  13. An Optimized Parallel FDTD Topology for Challenging Electromagnetic Simulations on Supercomputers

    Directory of Open Access Journals (Sweden)

    Shugang Jiang

    2015-01-01

    Full Text Available It may not be a challenge to run a Finite-Difference Time-Domain (FDTD) code for electromagnetic simulations on a supercomputer with more than 10 thousand CPU cores; however, making the FDTD code work with the highest efficiency is a challenge. In this paper, the performance of parallel FDTD is optimized through the MPI (message passing interface) virtual topology, based on which a communication model is established. The general rules for an optimal topology are presented according to the model. The performance of the method is tested and analyzed on three high-performance computing platforms with different architectures in China. Simulations including an airplane with a 700-wavelength wingspan and a complex microstrip antenna array with nearly 2000 elements are performed very efficiently using a maximum of 10240 CPU cores.
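
    The optimal-topology rules themselves depend on the communication model in the paper, but the basic mechanism, an MPI Cartesian virtual topology that maps neighbouring FDTD subdomains onto neighbouring ranks, can be sketched with mpi4py. The 2-D decomposition, non-periodic boundaries and file name below are assumptions for illustration.

```python
# Hedged sketch (mpi4py): a Cartesian virtual topology for FDTD-style halo exchange.
# Run with e.g.: mpirun -n 4 python fdtd_topology.py   (file name illustrative)
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()

# Let MPI choose a balanced 2-D process grid, then build the Cartesian topology.
dims = MPI.Compute_dims(size, 2)
cart = comm.Create_cart(dims, periods=[False, False], reorder=True)

rank = cart.Get_rank()
coords = cart.Get_coords(rank)

# Ranks of the neighbours used for halo exchange of field components.
left, right = cart.Shift(0, 1)
down, up = cart.Shift(1, 1)

print(f"rank {rank} at {coords}: x-neighbours ({left},{right}), "
      f"y-neighbours ({down},{up})")
```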

  14. Parallel computation of multigroup reactivity coefficient using iterative method

    Science.gov (United States)

    Susmikanti, Mike; Dewayatna, Winter

    2013-09-01

    One of the research activities supporting the commercial radioisotope production program is safety research on the irradiation of FPM (Fission Product Molybdenum) targets. An FPM target takes the form of a stainless-steel tube in which high-enriched uranium is superimposed. The FPM irradiation tube is intended to produce fission products, which are widely used in the form of kits in nuclear medicine. Irradiating FPM tubes in the reactor core can interfere with core performance; one such disturbance comes from changes in flux or reactivity. It is therefore necessary to study a method for calculating safety margins for ongoing configuration changes during the life of the reactor, so making the code faster becomes an absolute necessity. The neutron safety margin for the research reactor can be reused without modification to the calculation of the reactivity of the reactor, which is an advantage of using the perturbation method. The criticality and flux in a multigroup diffusion model were calculated at various irradiation positions for several uranium contents. This model involves complex computation. Several parallel algorithms with iterative methods have been developed for sparse and large matrix solutions. The red-black Gauss-Seidel iteration and the parallel power iteration method can be used to solve the multigroup diffusion equation system and to calculate the criticality and the reactivity coefficient. In this research, a code for reactivity calculation, as one part of the safety analysis, was developed with parallel processing. The calculation can be done more quickly and efficiently by utilizing the parallel processing of a multicore computer. This code was applied to the calculation of safety limits for irradiated FPM targets with increasing uranium content.
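
    The abstract names the power iteration as one building block of the criticality calculation. The hedged sketch below shows only that serial kernel on an illustrative dense non-negative matrix; the actual multigroup diffusion operators and their red-black Gauss-Seidel inner solves are not reproduced.

```python
# Hedged sketch: power iteration for the dominant eigenpair of an operator.
import numpy as np


def power_iteration(A, tol=1e-10, max_iter=10_000):
    n = A.shape[0]
    x = np.ones(n) / np.sqrt(n)
    lam = 0.0
    for _ in range(max_iter):
        y = A @ x
        lam_new = np.linalg.norm(y)   # dominant eigenvalue estimate (positive here)
        x_new = y / lam_new
        if abs(lam_new - lam) < tol:
            return lam_new, x_new
        lam, x = lam_new, x_new
    return lam, x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    M = rng.random((200, 200))        # non-negative matrix: dominant eigenvalue is real
    k_eff, flux = power_iteration(M)
    print(k_eff, np.max(np.abs(M @ flux - k_eff * flux)))
```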

  15. GENIE: a software package for gene-gene interaction analysis in genetic association studies using multiple GPU or CPU cores

    Directory of Open Access Journals (Sweden)

    Wang Kai

    2011-05-01

    Full Text Available Abstract Background Gene-gene interaction in genetic association studies is computationally intensive when a large number of SNPs are involved. Most of the latest Central Processing Units (CPUs) have multiple cores, whereas Graphics Processing Units (GPUs) also have hundreds of cores and have recently been used to implement faster scientific software. However, currently there are no genetic analysis software packages that allow users to fully utilize the computing power of these multi-core devices for genetic interaction analysis for binary traits. Findings Here we present a novel software package, GENIE, which utilizes the power of multiple GPU or CPU processor cores to parallelize the interaction analysis. GENIE reads an entire genetic association study dataset into memory and partitions the dataset into fragments with non-overlapping sets of SNPs. For each fragment, GENIE analyzes in parallel: (1) the interactions of SNPs within the fragment, and (2) the interactions between the SNPs of the current fragment and those of other fragments. We tested GENIE on a large-scale candidate gene study on high-density lipoprotein cholesterol. Using an NVIDIA Tesla C1060 graphics card, the GPU mode of GENIE achieves a speedup of 27 times over its single-core CPU mode run. Conclusions GENIE is open-source, economical, user-friendly, and scalable. Since the computing power and memory capacity of graphics cards are increasing rapidly while their cost is going down, we anticipate that GENIE will achieve greater speedups with faster GPU cards. Documentation, source code, and precompiled binaries can be downloaded from http://www.cceb.upenn.edu/~mli/software/GENIE/.

  16. A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale

    Energy Technology Data Exchange (ETDEWEB)

    Ma, Kwan-Liu [Univ. of California, Davis, CA (United States)

    2017-02-01

    Most of today's visualization libraries and applications are based on what is known today as the visualization pipeline. In the visualization pipeline model, algorithms are encapsulated as "filtering" components with inputs and outputs. These components can be combined by connecting the outputs of one filter to the inputs of another filter. The visualization pipeline model is popular because it provides a convenient abstraction that allows users to combine algorithms in powerful ways. Unfortunately, the visualization pipeline cannot run effectively on exascale computers. Experts agree that the exascale machine will comprise processors that contain many cores. Furthermore, physical limitations will prevent data movement in and out of the chip (that is, between main memory and the processing cores) from keeping pace with improvements in overall compute performance. To use these processors to their fullest capability, it is essential to carefully consider memory access. This is where the visualization pipeline fails. Each filtering component in the visualization library is expected to take a data set in its entirety, perform some computation across all of the elements, and output the complete results. The process of iterating over all elements must be repeated in each filter, which is one of the worst possible ways to traverse memory when trying to maximize the number of executions per memory access. This project investigates a new type of visualization framework that exhibits a pervasive parallelism necessary to run on exascale machines. Our framework achieves this by defining algorithms in terms of functors, which are localized, stateless operations. Functors can be composed in much the same way as filters in the visualization pipeline, but their design allows them to run concurrently on massive numbers of lightweight threads. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for
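
    The project's actual framework and API are not shown in the abstract; the sketch below only illustrates the idea it describes, composing stateless element-wise functors into a single fused traversal that is applied concurrently to fine-grained chunks. The functor names and chunk counts are arbitrary.

```python
# Hedged sketch: fusing stateless element-wise "functors" and applying the
# composite over chunks of an array concurrently.
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def compose(*functors):
    """Fuse element-wise functors so each chunk is traversed only once."""
    def fused(chunk):
        for f in functors:
            chunk = f(chunk)
        return chunk
    return fused


# Example functors: each is a stateless transformation of a chunk of values.
scale = lambda a: a * 2.0
clamp = lambda a: np.clip(a, 0.0, 1.0)
gamma = lambda a: a ** 0.5


def apply_parallel(functor, data, n_chunks=8):
    chunks = np.array_split(data, n_chunks)
    with ThreadPoolExecutor() as pool:
        return np.concatenate(list(pool.map(functor, chunks)))


if __name__ == "__main__":
    field = np.random.default_rng(0).random(1_000_000) - 0.25
    out = apply_parallel(compose(scale, clamp, gamma), field)
    print(out.shape, float(out.min()), float(out.max()))
```

    Because the fused composite visits each element of a chunk only once, the repeated full-array traversals of the pipeline model are avoided, which is the memory-access concern raised above.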

  17. Domain decomposition parallel computing for transient two-phase flow of nuclear reactors

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Jae Ryong; Yoon, Han Young [KAERI, Daejeon (Korea, Republic of); Choi, Hyoung Gwon [Seoul National University, Seoul (Korea, Republic of)

    2016-05-15

    KAERI (Korea Atomic Energy Research Institute) has been developing a multi-dimensional two-phase flow code named CUPID for multi-physics and multi-scale thermal hydraulics analysis of light water reactors (LWRs). The CUPID code has been validated against a set of conceptual problems and experimental data. In this work, the CUPID code has been parallelized based on the domain decomposition method with the Message Passing Interface (MPI) library. For domain decomposition, the CUPID code provides both manual and automatic methods with the METIS library. For effective memory management, the Compressed Sparse Row (CSR) format is adopted, which is one of the methods of representing a sparse asymmetric matrix. The CSR format stores only the non-zero values together with their column indices and per-row pointers into those arrays. By performing verification on the fundamental problem set, the parallelization of CUPID has been successfully confirmed. Since the scalability of a parallel simulation is generally known to be better for fine mesh systems, three different scales of mesh systems are considered: 40000 meshes for the coarse mesh system, 320000 meshes for the mid-size mesh system, and 2560000 meshes for the fine mesh system. In the given geometry, both single- and two-phase calculations were conducted. In addition, two types of preconditioners for the matrix solver were compared: a diagonal and an incomplete LU preconditioner. To enhance the parallel performance, hybrid OpenMP and MPI parallel computing for the pressure solver was examined. It is revealed that the scalability of the hybrid calculation is enhanced for multi-core parallel computation.
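
    As an aside on the storage scheme named above, the following hedged sketch uses scipy.sparse to show exactly which arrays the CSR layout keeps (values, column indices, and row pointers) and the sparse matrix-vector product that a Krylov pressure solver repeatedly applies; the small matrix is illustrative, not a CUPID system.

```python
# Hedged sketch: what the Compressed Sparse Row (CSR) layout stores.
import numpy as np
from scipy.sparse import csr_matrix

A = np.array([[4.0, 0.0, 0.0, 1.0],
              [0.0, 3.0, 0.0, 0.0],
              [2.0, 0.0, 5.0, 0.0],
              [0.0, 0.0, 1.0, 6.0]])

S = csr_matrix(A)
print("values      :", S.data)     # non-zero values, row by row
print("col indices :", S.indices)  # column index of each stored value
print("row pointers:", S.indptr)   # start of each row inside data/indices

# A sparse matrix-vector product, the core kernel of iterative pressure solvers.
x = np.arange(4, dtype=float)
print("S @ x =", S @ x)
```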

  18. Tectonic/climatic control on sediment provenance in the Cape Roberts Project core record (southern Victoria Land, Antarctica): A pulsing late Oligocene/early Miocene signal from south revealed by detrital thermochronology

    Science.gov (United States)

    Olivetti, V.; Balestrieri, M. L.; Rossetti, F.; Talarico, F. M.

    2012-04-01

    The Mesozoic-Cenozoic West Antarctic Rift System (WARS) is one of the largest intracontinental rifts on Earth. The Transantarctic Mountains (TAM) form its western shoulder, marking the boundary between East and West Antarctica. The rifting evolution is commonly considered polyphase and involves an Early Cretaceous phase linked to the Gondwana break-up, followed by a major Cenozoic one starting at c. 50-40 Ma. This Cenozoic episode corresponds to the major uplift/denudation phase of the TAM, which occurred concurrently with the transition from orthogonal to oblique rifting. The Cenozoic rift reorganization occurred concurrently with a major change in the global climate system and a global reorganization of plate motions. This area thus provides an outstanding natural laboratory for studying a range of geological problems that involve feedback relationships between tectonics and climate. A key to addressing the tectonic/climate feedback relations is to look for apparent synchronicity in erosion signals between different segments, and to compare these with well-dated regional and global climatic events. However, due to the paucity of Cenozoic rock sequences exposed along the TAM front, little information is available about the neotectonics of the rift and rift-flank uplift system. The direct physical record of the tectonic/climate history of the WARS recovered by core drillings along the western margin of the Ross Sea (DSDP, CIROS, Cape Roberts and ANDRILL projects) provides an invaluable tool to address this issue. Twenty-three samples distributed throughout the entire composite drill-cored stratigraphic succession of Cape Roberts were analyzed. Age probability plots of eighteen detrital samples with depositional ages between 34 Ma and the Pliocene were decomposed into statistically significant age populations or peaks using binomial peak-fitting. Moreover, three granitic pebbles, one dolerite clast and one sample of Beacon sandstones have been dated. From detrital samples

  19. Linux block IO: introducing multi-queue SSD access on multi-core systems

    DEFF Research Database (Denmark)

    Bjørling, Matias; Axboe, Jens; Nellans, David

    2013-01-01

    The IO performance of storage devices has accelerated from hundreds of IOPS five years ago, to hundreds of thousands of IOPS today, and tens of millions of IOPS projected in five years. This sharp evolution is primarily due to the introduction of NAND-flash devices and their data parallel desig...... generation block layer that is capable of handling tens of millions of IOPS on a multi-core system equipped with a single storage device. Our experiments show that our design scales gracefully with the number of cores, even on NUMA systems with multiple sockets.

  20. Multi-Core Emptiness Checking of Timed Büchi Automata using Inclusion Abstraction

    DEFF Research Database (Denmark)

    Laarman, Alfons; Olesen, Mads Chr.; Dalsgaard, Andreas

    2013-01-01

    This paper contributes to the multi-core model checking of timed automata (TA) with respect to liveness properties, by investigating checking of TA Büchi emptiness under the very coarse inclusion abstraction or zone subsumption, an open problem in this field. We show that in general Büchi emptiness...... parallel LTL model checking algorithm for timed automata. The algorithms are implemented in LTSmin, and experimental evaluations show the effectiveness and scalability of both contributions: subsumption halves the number of states in the real-world FDDI case study, and the multi-core algorithm yields...