SWE

#include <SWE_WavePropagationBlockCuda.hh>
Public Member Functions  
SWE_WavePropagationBlockCuda (int l_nx, int l_ny, float l_dx, float l_dy)  
~SWE_WavePropagationBlockCuda ()  
void  simulateTimestep (float i_dT) 
float  simulate (float, float) 
void  computeNumericalFluxes () 
void  updateUnknowns (const float i_deltaT) 
SWE_WavePropagationBlockCuda is an implementation of the abstract class SWE_BlockCuda. It uses a wave propagation solver which is selected with the preprocessor flag WAVE_PROPAGATION_SOLVER.
Possible wave propagation solvers are: F-Wave, Approximate Augmented Riemann, Hybrid (f-wave + augmented). (Details can be found in the corresponding source files.)
SWE_WavePropagationBlockCuda::SWE_WavePropagationBlockCuda (int l_nx, int l_ny, float l_dx, float l_dy)
Constructor of a SWE_WavePropagationBlockCuda.
Allocates the variables for the simulation. Please note: the definition of indices changed compared to the CPU implementation.
The unknowns hd, hud, hvd, bd stored on the CUDA device are defined for grid indices [0,..,nx+1]*[0,..,ny+1] (see the abstract class SWE_BlockCUDA): the computational domain is [1,..,nx]*[1,..,ny], plus a ghost cell layer.
The net updates are defined for edges with indices [0,..,nx]*[0,..,ny], for both horizontal and vertical edges for simplicity (one layer is not necessary).
A left/right net update with index (i-1,j) is located on the edge between the cells with indices (i-1,j) and (i,j):

    *********************
    *         *         *
    * (i-1,j) *  (i,j)  *
    *         *         *
    *********************
              |
    NetUpdatesLeft(i-1,j) or NetUpdatesRight(i-1,j)
A below/above net update with index (i,j-1) is located on the edge between the cells with indices (i,j-1) and (i,j):

    ***********
    *         *
    *  (i,j)  *
    *         *
    ***********
         |        NetUpdatesBelow(i,j-1)
         |               or
    ***********   NetUpdatesAbove(i,j-1)
    *         *
    * (i,j-1) *
    *         *
    ***********
i_offsetX  spatial offset of the block in x-direction. 
i_offsetY  spatial offset of the block in y-direction. 
i_cudaDevice  ID of the CUDA device which should be used. 
Destructor of a SWE_WavePropagationBlockCuda.
Frees all of the memory that was allocated within the constructor. Resets the CUDA device: useful if an error occurred and printf was used on the device (flushes the device printf buffer).
void SWE_WavePropagationBlockCuda::computeNumericalFluxes  (  )  [virtual] 
Compute the numerical fluxes (net-update formulation here) on all edges.
The maximum wave speed is computed within the net-updates kernel for each CUDA block. To finalize the method, the Thrust library is called, which does the reduction over all block-wise maxima. In the wave speed reduction step the actual cell widths in x- and y-direction are not taken into account.
TODO: A splitting or a direct computation of the time step width might increase the total time step size. Example: dx = 11, dy = 6; max wave speed in x-direction: 10; max wave speed in y-direction: 5.5; max wave speed in both directions: 10.
=> maximum time step (current implementation): min(11/10, 6/10) = 0.6
=> maximum time step (splitting the dimensions): min(11/10, 6/5.5) = 1.09..
Row-major vs. column-major
C/C++ arrays are row-major, whereas warps are constructed in column-major order from threads/blocks. To get coalesced memory access in CUDA, we can use a 2-dimensional CUDA structure, but we have to switch x and y inside a block.
This means we have to switch threadIdx.x <-> threadIdx.y as well as blockDim.x <-> blockDim.y. Important: blockDim has to be switched for the kernel call as well!
Definition of one CUDA block. Typical sizes are 8*8 or 16*16.
Definition of the "main" CUDA grid. This grid covers only the edges 0..#(edges in x-direction)-2 and 0..#(edges in y-direction)-2.
An example with a computational domain of size nx = 24, ny = 16 with a 1 cell ghost layer would result in a grid with (nx+2)*(ny+2) = (26*18) cells and (nx+1)*(ny+1) = (25*17) edges.
The CUDA blocks (here 8*8) mentioned above would cover all edges except the ones lying between the computational domain and the right/top ghost layer:
[ASCII diagram: 8*8 CUDA blocks tiling the computational domain, surrounded by the ghost layers: bottom ghost layer with cell ids (*,0), top ghost layer with cell ids (*,ny+1), left ghost layer with cell ids (0,*), right ghost layer with cell ids (nx+1,*).]
Implements SWE_Block.
__host__ float SWE_WavePropagationBlockCuda::simulate (float tStart, float tEnd) [virtual]
Perform forward-Euler time steps, starting at simulation time tStart, until simulation time tEnd is reached. The device-global variables hd, hud, hvd are updated; the unknowns h, hu, hv in main memory are not updated. Ghost layers and bathymetry sources are updated between time steps. Intended as the main simulation loop between two checkpoints.
Implements SWE_Block.
__host__ void SWE_WavePropagationBlockCuda::simulateTimestep  (  float  i_dT  )  [virtual] 
Compute a single global time step of a given time step width. Remark: the user has to take care of the time step width; no additional check is done. The time step width is typically available after the computation of the numerical fluxes (hidden in this method).
First the net updates are computed, then the cells are updated with the net updates and the given time step width.
i_dT  time step width in seconds. 
Implements SWE_Block.
void SWE_WavePropagationBlockCuda::updateUnknowns  (  const float  i_deltaT  )  [virtual] 
Update the cells with a given global time step.
i_deltaT  time step size. 
Definition of one CUDA block. Typical sizes are 8*8 or 16*16.
Definition of the CUDA grid.
Implements SWE_Block.