SWE
Functions
/import/home/rettenbs/src/SWE/src/blocks/cuda/SWE_WavePropagationBlockCuda_kernels.hh File Reference

Go to the source code of this file.

Functions

__global__ void computeNetUpdatesKernel (const float *i_h, const float *i_hu, const float *i_hv, const float *i_b, float *o_hNetUpdatesLeftD, float *o_hNetUpdatesRightD, float *o_huNetUpdatesLeftD, float *o_huNetUpdatesRightD, float *o_hNetUpdatesBelowD, float *o_hNetUpdatesAboveD, float *o_hvNetUpdatesBelowD, float *o_hvNetUpdatesAboveD, float *o_maximumWaveSpeeds, const int i_nx, const int i_ny, const int i_offsetX=0, const int i_offsetY=0, const int i_blockOffSetX=0, const int i_blockOffSetY=0)
__global__ void updateUnknownsKernel (const float *i_hNetUpdatesLeftD, const float *i_hNetUpdatesRightD, const float *i_huNetUpdatesLeftD, const float *i_huNetUpdatesRightD, const float *i_hNetUpdatesBelowD, const float *i_hNetUpdatesAboveD, const float *i_hvNetUpdatesBelowD, const float *i_hvNetUpdatesAboveD, float *io_h, float *io_hu, float *io_hv, const float i_updateWidthX, const float i_updateWidthY, const int i_nx, const int i_ny)
__device__ int computeOneDPositionKernel (const int i_i, const int i_j, const int i_nx)

Detailed Description

This file is part of SWE.

Author:
Alexander Breuer (breuera AT in.tum.de, http://www5.in.tum.de/wiki/index.php/Dipl.-Math._Alexander_Breuer)

LICENSE

SWE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

SWE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with SWE. If not, see <http://www.gnu.org/licenses/>.

DESCRIPTION

CUDA Kernels for a SWE_Block, which uses solvers in the wave propagation formulation.


Function Documentation

__global__ void computeNetUpdatesKernel ( const float *  i_h,
const float *  i_hu,
const float *  i_hv,
const float *  i_b,
float *  o_hNetUpdatesLeftD,
float *  o_hNetUpdatesRightD,
float *  o_huNetUpdatesLeftD,
float *  o_huNetUpdatesRightD,
float *  o_hNetUpdatesBelowD,
float *  o_hNetUpdatesAboveD,
float *  o_hvNetUpdatesBelowD,
float *  o_hvNetUpdatesAboveD,
float *  o_maximumWaveSpeeds,
const int  i_nX,
const int  i_nY,
const int  i_offsetX,
const int  i_offsetY,
const int  i_blockOffSetX,
const int  i_blockOffSetY 
)

The compute net-updates kernel calls the solver for a defined CUDA-Block and does a reduction over the computed wave speeds within this block.

Remark: In overall we have nx+1 / ny+1 edges. Therefore the edges "simulation domain"/"top ghost layer" and "simulation domain"/"right ghost layer" will not be computed in a typical call of the function: computeNetUpdatesKernel<<<dimGrid,dimBlock>>>( hd, hud, hvd, bd, hNetUpdatesLeftD, hNetUpdatesRightD, huNetUpdatesLeftD, huNetUpdatesRightD, hNetUpdatesBelowD, hNetUpdatesAboveD, hvNetUpdatesBelowD, hvNetUpdatesAboveD, l_maximumWaveSpeedsD, i_nx, i_ny ); To reduce the effect of branch-mispredictions the kernel provides optional offsets, which can be used to compute the missing edges.

SWE_WavePropagationBlockCuda::computeNumericalFluxes() explains the coalesced memory access.

Parameters:
i_hwater heights (CUDA-array).
i_humomentums in x-direction (CUDA-array).
i_hvmomentums in y-direction (CUDA-array).
i_bbathymetry values (CUDA-array).
o_hNetUpdatesLeftDleft going net-updates for the water height (CUDA-array).
o_hNetUpdatesRightDright going net-updates for the water height (CUDA-array).
o_huNetUpdatesLeftDleft going net-updates for the momentum in x-direction (CUDA-array).
o_huNetUpdatesRightDright going net-updates for the momentum in x-direction (CUDA-array).
o_hNetUpdatesBelowDdownwards going net-updates for the water height (CUDA-array).
o_hNetUpdatesAboveDupwards going net-updates for the water height (CUDA-array).
o_hvNetUpdatesBelowDdownwards going net-updates for the momentum in y-direction (CUDA-array).
o_hvNetUpdatesAboveDupwards going net-updates for the momentum in y-direction (CUDA-array).
o_maximumWaveSpeedsmaximum wave speed which occurred within the CUDA-block is written here (CUDA-array).
i_nxnumber of cells within the simulation domain in x-direction (excludes ghost layers).
i_nynumber of cells within the simulation domain in y-direction (excludes ghost layers).
i_offsetXcell/edge offset in x-direction.
i_offsetYcell/edge offset in y-direction.

array maximum wave speed within this CUDA-block

thread local index in the shared maximum wave speed array

index (l_cellIndexI,l_cellIndexJ) of the cell lying on the right side of the edge/above the edge where the thread works on.

array which holds the thread local net-updates.

location of the thread local cells in the global CUDA-arrays.

reduction partner for a thread

Position of the maximum wave speed in the global device array.

In the 'main' part (e.g. gridDim = nx/TILE_SIZEm ny/TILE_SIZE) the position is simply given by the blockId in x- and y-direction with a stride of gridDim.x + 1. The +1 results from the speeds in the 'boundary' case, see below.

In the 'boundary' case, where the edges lie between the computational domain and the right/top ghost layer, this is more complicated. In this case block offsets in x- and y-direction are used. The offsets define how many blocks in the resp. direction have to be added to get a valid result. Computational domain - right ghost layer: In this case the dimension of the grid in x-direction is 1. Computational domain - top ghost layer: In this case the dimension of the grid in y-direction is 1.

Same Example as in SWE_WavePropagationBlockCuda::computeNumericalFluxes(), assume the CUDA-grid/-blocks has the following layout:

                                                          *
                                                         **        top ghost layer,
                        * block 8 * block 9 * block 10* ********   cell ids
                        ******************************** **
                        *         *         *         *   *
                        *  block  *  block  *  block  * b
                        *    4    *    5    *    6    * 7
                        *         *         *         *
                        ********************************
                        *         *         *         *
                        *  block  *  block  *  block  * b
                    *   *    0    *    1    *    2    * 3
     bottom         **  *         *         *         *
     ghost     ******** ********************************
     layer          **
                    *   *                              *
                       ***                            ***
                        *                              *
                        *                              *
                  left ghost layer              right ghost layer
 

This results in a 'main' part containing of (3*2) blocks and two 'boundary' parts containing of (1*2) blocks and (3*1) blocks.

The maximum wave speed array on the device represents therefore logically a (4 * 3)-1 2D-array (-1: no block on the top right). The 'main' part writes into cells 0, 1, 2, 4, 5 and 6. The 'computational domain - right ghost layer' part writes into 3 and 7 with offset in x-direction = 3 The 'computational domain - top ghost layer' part writes into 8, 9, 10 with offset in y-direction = 2

__device__ int computeOneDPositionKernel ( const int  i_i,
const int  i_j,
const int  i_ny 
) [inline]

Compute the position of 2D coordinates in a 1D array. array[i][j] -> i * ny + j

Parameters:
i_irow index.
i_jcolumn index.
i_ny#(cells in y-direction).
Returns:
1D index.
__global__ void updateUnknownsKernel ( const float *  i_hNetUpdatesLeftD,
const float *  i_hNetUpdatesRightD,
const float *  i_huNetUpdatesLeftD,
const float *  i_huNetUpdatesRightD,
const float *  i_hNetUpdatesBelowD,
const float *  i_hNetUpdatesAboveD,
const float *  i_hvNetUpdatesBelowD,
const float *  i_hvNetUpdatesAboveD,
float *  io_h,
float *  io_hu,
float *  io_hv,
const float  i_updateWidthX,
const float  i_updateWidthY,
const int  i_nX,
const int  i_nY 
)

The "update unknowns"-kernel updates the unknowns in the cells with precomputed net-updates.

SWE_WavePropagationBlockCuda::computeNumericalFluxes() explains the coalesced memory access.

Parameters:
i_hNetUpdatesLeftDleft going net-updates for the water height (CUDA-array).
i_hNetUpdatesRightDright going net-updates for the water height (CUDA-array).
i_huNetUpdatesLeftDleft going net-updates for the momentum in x-direction (CUDA-array).
i_huNetUpdatesRightDright going net-updates for the momentum in x-direction (CUDA-array).
i_hNetUpdatesBelowDdownwards going net-updates for the water height (CUDA-array).
i_hNetUpdatesAboveDupwards going net-updates for the water height (CUDA-array).
i_hvNetUpdatesBelowDdownwards going net-updates for the momentum in y-direction (CUDA-array).
i_hvNetUpdatesAboveDupwards going net-updates for the momentum in y-direction (CUDA-array).
io_hwater heights (CUDA-array).
io_humomentums in x-direction (CUDA-array).
io_hvmomentums in y-direction (CUDA-array).
i_updateWidthXupdate width in x-direction.
i_updateWidthYupdate width in y-direction.
i_nxnumber of cells within the simulation domain in x-direction (excludes ghost layers).
i_nynumber of cells within the simulation domain in y-direction (excludes ghost layers).

cell indices (l_cellIndexI,l_cellIndexJ) of the cell which the thread updates.

location of the thread local cell in the global CUDA-arrays.

positions of the net-updates in the global CUDA-arrays.

Compute the positions of the net updates relative to a given cell

                  netUpdateRight(i-1, j)
                           |           |
                           |           |
                           |           |
  netUpdateBelow(i,j) -----*************-----
                           *           *
                           * cell(i,j) *
                           *           *
                      -----*************------ netUpdateAbove(i, j-1)
                           |           |
                           |           |
                           |           |
                                netUpdatesLeft(i,j)
 
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends