SWE_WavePropagationBlockCuda Class Reference

#include <SWE_WavePropagationBlockCuda.hh>

Inheritance diagram for SWE_WavePropagationBlockCuda:
SWE_Block
    SWE_BlockCUDA
        SWE_WavePropagationBlockCuda


Public Member Functions

 SWE_WavePropagationBlockCuda (int l_nx, int l_ny, float l_dx, float l_dy)
 ~SWE_WavePropagationBlockCuda ()
void simulateTimestep (float i_dT)
float simulate (float, float)
void computeNumericalFluxes ()
void updateUnknowns (const float i_deltaT)

Detailed Description

SWE_WavePropagationBlockCuda is an implementation of the abstract class SWE_BlockCUDA. It uses a wave propagation solver, which is selected via the preprocessor flag WAVE_PROPAGATION_SOLVER.

Possible wave propagation solvers are: F-Wave; the Approximate Augmented Riemann and Hybrid (f-wave + augmented) solvers are not available in this implementation. (Details can be found in the corresponding source files.)
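
As a rough sketch of what such a compile-time selection can look like (the macro value and the solver type below are illustrative assumptions, not the actual definitions from the source files):

    // Sketch only: compile-time selection of the wave propagation solver.
    // The macro value and the solver type are assumptions for illustration.
    #ifndef WAVE_PROPAGATION_SOLVER
    #define WAVE_PROPAGATION_SOLVER 1            // assume 1 selects the F-Wave solver
    #endif

    #if WAVE_PROPAGATION_SOLVER == 1
    struct FWaveSolver { /* hypothetical stand-in for the F-Wave solver */ };
    typedef FWaveSolver Solver;
    #else
    #error "Unsupported WAVE_PROPAGATION_SOLVER value"
    #endif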


Constructor & Destructor Documentation

SWE_WavePropagationBlockCuda::SWE_WavePropagationBlockCuda ( int l_nx, int l_ny, float l_dx, float l_dy )

Constructor of a SWE_WavePropagationBlockCuda.

Allocates the variables for the simulation. Please note: the definition of indices differs from the CPU implementation.

The unknowns hd, hud, hvd, bd stored on the CUDA device are defined for grid indices [0,..,nx+1]*[0,..,ny+1] (see abstract class SWE_BlockCUDA); the computational domain is [1,..,nx]*[1,..,ny], surrounded by a ghost cell layer.

Net-updates are defined for edges with indices [0,..,nx]*[0,..,ny]; for simplicity, the same index range is used for horizontal and vertical edges (one layer is not strictly necessary).

A left/right net update with index (i-1,j) is located on the edge between cells with index (i-1,j) and (i,j):

   *********************
   *         *         *
   * (i-1,j) *  (i,j)  *
   *         *         *
   *********************
             *
            ***
           *****
             *
             *
   NetUpdatesLeft(i-1,j)
             or
   NetUpdatesRight(i-1,j)
 

A below/above net update with index (i, j-1) is located on the edge between cells with index (i, j-1) and (i,j):

   ***********
   *         *
   * (i, j)  *   *
   *         *  **  NetUpdatesBelow(i,j-1)
   *********** *****         or
   *         *  **  NetUpdatesAbove(i,j-1)
   * (i,j-1) *   *
   *         *
   ***********
 
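
To make the edge indexing concrete, here is a small sketch of one possible row-major linearization of these net-update arrays; the function and array names are assumptions for illustration, and the actual device memory layout may differ:

    // Sketch only: one possible linearization of the net-update arrays
    // described above (edge indices [0,..,nx] x [0,..,ny]).
    inline int edgeIndex(int i, int j, int nx) {
      return j * (nx + 1) + i;                 // nx+1 edges per row
    }

    // Net update across the edge between cells (i-1,j) and (i,j):
    //   netUpdatesLeft [edgeIndex(i-1, j, nx)]  is applied to cell (i-1, j)
    //   netUpdatesRight[edgeIndex(i-1, j, nx)]  is applied to cell (i,   j)
    // Net update across the edge between cells (i,j-1) and (i,j):
    //   netUpdatesBelow[edgeIndex(i, j-1, nx)]  is applied to cell (i, j-1)
    //   netUpdatesAbove[edgeIndex(i, j-1, nx)]  is applied to cell (i,   j)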
Parameters:
i_offsetX  spatial offset of the block in x-direction.
i_offsetY  spatial offset of the block in y-direction.
i_cudaDevice  ID of the CUDA device which should be used.

SWE_WavePropagationBlockCuda::~SWE_WavePropagationBlockCuda ()

Destructor of a SWE_WavePropagationBlockCuda.

Frees all of the memory that was allocated within the constructor and resets the CUDA device. Resetting the device is useful if an error occurred and printf is used on the device (the printf buffer is flushed on reset).


Member Function Documentation

void SWE_WavePropagationBlockCuda::computeNumericalFluxes () [virtual]

Compute the numerical fluxes (net-update formulation here) on all edges.

The maximum wave speed is computed within the net-updates kernel for each CUDA block. To finalize the method, the Thrust library is called to perform the reduction over all block-wise maxima. In the wave speed reduction step, the actual cell widths in x- and y-direction are not taken into account.

TODO: A splitting or direct computation of the time step width might increase the total time step size.

Example: dx = 11, dy = 6; maximum wave speed in x-direction: 10; maximum wave speed in y-direction: 5.5; maximum wave speed over both directions: 10.

=> maximum time step (current implementation): min(11/10, 6/10) = 0.6
=> maximum time step (splitting the dimensions): min(11/10, 6/5.5) = 1.09..
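
The same arithmetic as a small, self-contained C++ snippet (variable names are illustrative only):

    // Numerical illustration of the example above.
    #include <algorithm>
    #include <cstdio>

    int main() {
      float dx = 11.f, dy = 6.f;
      float maxSpeedX = 10.f, maxSpeedY = 5.5f;
      float maxSpeedBoth = std::max(maxSpeedX, maxSpeedY);                // = 10

      // current implementation: one global maximum wave speed for both directions
      float dtCurrent = std::min(dx / maxSpeedBoth, dy / maxSpeedBoth);   // = 0.6
      // dimensional splitting: per-direction maximum wave speeds
      float dtSplit   = std::min(dx / maxSpeedX, dy / maxSpeedY);         // ~= 1.09

      std::printf("current: %f  split: %f\n", dtCurrent, dtSplit);
      return 0;
    }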

Row-major vs column-major

C/C++ arrays are row-major whereas warps are constructed in column-major order from threads/blocks. To get coalesced memory access in CUDA, we can use a 2-dimensional CUDA structure but we have to switch x and y inside a block.

This means, we have to switch threadIdx.x <-> threadIdx.y as well as blockDim.x <-> blockDim.y. Important: blockDim has to be switched for the kernel call as well!
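
A minimal CUDA sketch of this index switch; the kernel name and the assumed row-major array layout (y-index contiguous in memory) are illustrative assumptions, not the actual SWE kernels:

    // Sketch only: switched thread indices for coalesced access to an array
    // stored as data[i * cellsY + j] (j contiguous in memory).
    __global__ void exampleKernel(float* data, int cellsX, int cellsY) {
      // threadIdx.y drives i, threadIdx.x drives j; blockDim.x/.y are switched
      // accordingly.
      int i = blockIdx.x * blockDim.y + threadIdx.y;   // domain x-direction
      int j = blockIdx.y * blockDim.x + threadIdx.x;   // domain y-direction

      if (i < cellsX && j < cellsY)
        data[i * cellsY + j] += 1.0f;  // consecutive threads of a warp access
                                       // consecutive addresses -> coalesced
    }

    // The launch configuration has to use the switched block dimensions as well,
    // e.g. dim3 dimBlock(blockSizeY, blockSizeX).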

Definition of one CUDA block. Typical sizes are 8*8 or 16*16.

Definition of the "main" CUDA-grid. This grid covers only edges 0..#(edges in x-direction)-2 and 0..#(edges in y-direction)-2.

An example with a computational domain of size nx = 24, ny = 16 and a one-cell ghost layer would result in a grid with (nx+2)*(ny+2) = 26*18 cells and (nx+1)*(ny+1) = 25*17 edges.

The CUDA-blocks (here 8*8) mentioned above would cover all edges except the ones lying between the computational domain and the right/top ghost layer:

                                                          *
                                                         **        top ghost layer,
                                                        ********   cell ids
                        *******************************  **        = (*, ny+1)
                        *         *         *         *   *
                        *   8*8   *   8*8   *   8*8   *
                        *  block  *  block  *  block  *
                        *         *         *         *
                        *******************************
                        *         *         *         *
                        *   8*8   *   8*8   *   8*8   *
                    *   *  block  *  block  *  block  *
     bottom         **  *         *         *         *
     ghost     ******** *******************************
     layer,         **
     cell ids       *   *                              *
     =(*,0)            ***                            ***
                        *                              *
                        *                              *
                  left ghost layer,             right ghost layer,
                  cell ids = (0,*)             cell ids = (nx+1, *)
 
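
For the example above, a launch configuration along these lines would produce the 3*2 arrangement of 8*8 blocks shown in the sketch; the variable names are illustrative assumptions:

    // Sketch only: block/grid dimensions for nx = 24, ny = 16 and 8*8 blocks.
    int nx = 24, ny = 16;
    dim3 dimBlock(8, 8);                   // one CUDA block (8*8 is symmetric,
                                           // so the x/y switch does not matter here)
    dim3 dimGrid(nx / dimBlock.x,          // = 3 blocks of edges in x-direction
                 ny / dimBlock.y);         // = 2 blocks of edges in y-direction
    // The edges between the computational domain and the right/top ghost layer
    // are not covered by this "main" grid and are handled separately.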

Implements SWE_Block.

__host__ float SWE_WavePropagationBlockCuda::simulate ( float tStart, float tEnd ) [virtual]

Perform forward-Euler time steps, starting at simulation time tStart, until simulation time tEnd is reached. The device-global variables hd, hud, hvd are updated; the unknowns h, hu, hv in main memory are not updated. Ghost layers and bathymetry sources are updated between time steps. Intended as the main simulation loop between two checkpoints.

Implements SWE_Block.
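
A hedged sketch of what such a checkpoint-to-checkpoint loop can look like; setGhostLayer() and getMaxTimestep() are assumptions about the surrounding SWE_Block interface, not verified signatures:

    // Sketch only: forward-Euler loop between two checkpoints, written as a
    // free function over a hypothetical block interface.
    template <typename Block>
    float simulateSketch(Block& block, float tStart, float tEnd) {
      float t = tStart;
      while (t < tEnd) {
        block.setGhostLayer();                 // assumed helper: ghost layers and
                                               // bathymetry sources
        block.computeNumericalFluxes();        // also determines the max wave speed
        float dt = block.getMaxTimestep();     // assumed accessor for the admissible step
        if (t + dt > tEnd) dt = tEnd - t;      // do not step past the checkpoint
        block.updateUnknowns(dt);              // apply the net-updates
        t += dt;
      }
      return t;
    }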

__host__ void SWE_WavePropagationBlockCuda::simulateTimestep ( float  i_dT) [virtual]

Compute a single global time step of a given time step width. Remark: the user has to take care of the time step width; no additional check is done. The time step width is typically available after the computation of the numerical fluxes (which is hidden inside this method).

First the net-updates are computed. Then the cells are updated with the net-updates and the given time step width.

Parameters:
i_dT  time step width in seconds.

Implements SWE_Block.

void SWE_WavePropagationBlockCuda::updateUnknowns ( const float  i_deltaT) [virtual]

Update the cells with a given global time step.

Parameters:
i_deltaT  time step size.
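
For reference, the per-cell update applied in the net-update formulation can be written as follows (standard wave-propagation notation, not copied from the source):

    Q_{i,j}^{n+1} = Q_{i,j}^{n}
      - \frac{\Delta t}{\Delta x}\left( A^{+}\Delta Q_{i-1/2,j} + A^{-}\Delta Q_{i+1/2,j} \right)
      - \frac{\Delta t}{\Delta y}\left( B^{+}\Delta Q_{i,j-1/2} + B^{-}\Delta Q_{i,j+1/2} \right)

where Q = (h, hu, hv)^T and the A/B terms are the left/right and below/above net-updates at the corresponding edges.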

Definition of one CUDA block. Typical sizes are 8*8 or 16*16.

Definition of the CUDA grid.

Implements SWE_Block.

