ls1-MarDyn
ls1-MarDyn molecular dynamics code
Public Member Functions | Friends | List of all members
KDDecomposition Class Reference

KD tree based domain decomposition for better load balancing. More...

#include <KDDecomposition.h>

Inheritance diagram for KDDecomposition:
DomainDecompMPIBase DomainDecompBase MemoryProfilable

Public Member Functions

 KDDecomposition (double cutoffRadius, int numParticleTypes, int updateFrequency=100, int fullSearchThreshold=2)
 create an initial decomposition tree More...
 
void init (Domain *domain)
 
virtual void readXML (XMLfileUnits &xmlconfig) override
 Read in XML configuration for KDDecomposition and all its included objects. More...
 
virtual void prepareNonBlockingStage (bool forceRebalancing, ParticleContainer *moleculeContainer, Domain *domain, unsigned int stageNumber) override
 
virtual void finishNonBlockingStage (bool forceRebalancing, ParticleContainer *moleculeContainer, Domain *domain, unsigned int stageNumber) override
 
bool queryBalanceAndExchangeNonBlocking (bool forceRebalancing, ParticleContainer *moleculeContainer, Domain *domain, double etime) override
 Checks whether the balance and exchange step can be performed non-blocking. More...
 
void balanceAndExchange (double lastTraversalTime, bool forceRebalancing, ParticleContainer *moleculeContainer, Domain *domain) override
 balance the load (and optimize communication) and exchange boundary particles More...
 
double getBoundingBoxMin (int dimension, Domain *domain) override
 
double getBoundingBoxMax (int dimension, Domain *domain) override
 
int getUpdateFrequency () const
 
void setUpdateFrequency (int frequency)
 
std::vector< int > getNeighbourRanks () override
 
std::vector< int > getNeighbourRanksFullShell () override
 
std::vector< CommunicationPartnergetNeighboursFromHaloRegion (Domain *domain, const HaloRegion &haloRegion, double cutoff) override
 
void fillTimeVecs (CellProcessor **cellProc)
 
void printTree (std::ostream &ostream)
 
- Public Member Functions inherited from DomainDecompMPIBase
void barrier () const override
 synchronizes all processes More...
 
unsigned Ndistribution (unsigned localN, float *minrnd, float *maxrnd)
 returns total number of molecules More...
 
void assertIntIdentity (int IX)
 checks identity of random number generators More...
 
void assertDisjunctivity (ParticleContainer *moleculeContainer) const override
 
void collCommInit (int numValues, int key=0) override
 has to call init method of a CollComm class More...
 
void collCommFinalize () override
 has to call finalize method of a CollComm class More...
 
void collCommAppendInt (int intValue) override
 has to call appendInt method of a CollComm class More...
 
void collCommAppendUnsLong (unsigned long unsLongValue) override
 has to call appendUnsLong method of a CollComm class More...
 
void collCommAppendFloat (float floatValue) override
 has to call appendFloat method of a CollComm class More...
 
void collCommAppendDouble (double doubleValue) override
 has to call appendDouble method of a CollComm class More...
 
void collCommAppendLongDouble (long double longDoubleValue) override
 has to call appendLongDouble method of a CollComm class More...
 
int collCommGetInt () override
 has to call getInt method of a CollComm class More...
 
unsigned long collCommGetUnsLong () override
 has to call getUnsLong method of a CollComm class More...
 
float collCommGetFloat () override
 has to call getFloat method of a CollComm class More...
 
double collCommGetDouble () override
 has to call getDouble method of a CollComm class More...
 
long double collCommGetLongDouble () override
 has to call getLongDouble method of a CollComm class More...
 
void collCommAllreduceSum () override
 has to call allreduceSum method of a CollComm class (none in sequential version) More...
 
void collCommAllreduceSumAllowPrevious () override
 has to call allreduceSum method of a CollComm class (none in sequential version), allows for values of previous iteration. More...
 
void collCommAllreduceCustom (ReduceType type) override
 has to call allreduceCustom method of a CollComm class (none in sequential version) More...
 
void collCommScanSum () override
 has to call scanSum method of a CollComm class (none in sequential version) More...
 
void collCommBroadcast (int root=0) override
 has to call broadcast method of a CollComm class (none in sequential version) More...
 
virtual void balanceAndExchangeInitNonBlocking (bool forceRebalancing, ParticleContainer *moleculeContainer, Domain *domain)
 
void exchangeMoleculesMPI (ParticleContainer *moleculeContainer, Domain *domain, MessageType msgType, bool doHaloPositionCheck=true, bool removeRecvDuplicates=false)
 exchange molecules between processes More...
 
void exchangeForces (ParticleContainer *moleculeContainer, Domain *domain) override
 Exchanges forces at the domain boundaries if it's required by the cell container. More...
 
MPI_Datatype getMPIParticleType ()
 
MPI_Datatype getMPIParticleForceType ()
 
MPI_Comm getCommunicator () override
 
virtual void setCommunicationScheme (const std::string &scheme, const std::string &comScheme)
 
virtual int getNonBlockingStageCount () override
 
virtual size_t getTotalSize () override
 
virtual void printSubInfo (int offset) override
 
virtual void printDecomp (const std::string &filename, Domain *domain, ParticleContainer *particleContainer) override
 writes information about the current decomposition into the given file More...
 
virtual std::string getName () override
 
void printCommunicationPartners (std::string filename) const override
 
- Public Member Functions inherited from DomainDecompBase
 DomainDecompBase ()
 The Constructor determines the own rank and the number of the neighbours *‍/.
 
 ~DomainDecompBase () override
 The Destructor finalizes MPI.
 
void exchangeMolecules (ParticleContainer *moleculeContainer, Domain *domain)
 exchange molecules between processes More...
 
virtual bool procOwnsPos (double x, double y, double z, Domain *domain) final
 find out whether the given position belongs to the domain of this process More...
 
void getBoundingBoxMinMax (Domain *domain, double *min, double *max)
 get the minimum and maximum coordinate of the bounding box of this process' domain More...
 
virtual int getRank () const
 returns the own rank More...
 
virtual int getNumProcs () const
 returns the number of processes More...
 
virtual double getTime () const
 returns the time in seconds since some time in the past
 
double getIOCutoffRadius (int dim, Domain *domain, ParticleContainer *moleculeContainer)
 returns an cutoff radius for a dimension for a global linked cells datastructure More...
 
void writeMoleculesToMPIFileBinary (const std::string &filename, ParticleContainer *moleculeContainer) const
 appends molecule data to the file. The format is the same as that of the input file This version uses, MPI IO. More...
 
void writeMoleculesToFile (const std::string &filename, ParticleContainer *moleculeContainer, bool binary=false) const
 appends molecule data to the file. The format is the same as that of the input file If MPI is enabled and binary files are supposed to be written this function will call writeMoleculesToMPIFileBinary(). Otherwise, to ensure that not more than one process writes to the file at any time, there is a loop over all processes with a barrier in between. More...
 
void updateSendLeavingWithCopies (bool sendTogether)
 
bool sendLeavingWithCopies () const
 
virtual std::vector< std::vector< std::vector< int > > > getAllRanks ()
 

Friends

class KDDecompositionTest
 

Additional Inherited Members

- Protected Member Functions inherited from DomainDecompMPIBase
virtual void prepareNonBlockingStageImpl (ParticleContainer *moleculeContainer, Domain *domain, unsigned int stageNumber, MessageType msgType, bool removeRecvDuplicates=false)
 
virtual void finishNonBlockingStageImpl (ParticleContainer *moleculeContainer, Domain *domain, unsigned int stageNumber, MessageType msgType, bool removeRecvDuplicates=false)
 
- Protected Member Functions inherited from DomainDecompBase
void addLeavingMolecules (std::vector< Molecule > &&invalidMolecules, ParticleContainer *moleculeContainer)
 
void handleDomainLeavingParticles (unsigned dim, ParticleContainer *moleculeContainer) const
 
void handleDomainLeavingParticlesDirect (const HaloRegion &haloRegion, ParticleContainer *moleculeContainer, std::vector< Molecule > &invalidParticles) const
 
void handleForceExchange (unsigned dim, ParticleContainer *moleculeContainer) const
 Does the force exchange for each dimension. Will be called for dim=0, 1 and 2. More...
 
virtual void handleForceExchangeDirect (const HaloRegion &haloRegion, ParticleContainer *moleculeContainer) const
 
void populateHaloLayerWithCopies (unsigned dim, ParticleContainer *moleculeContainer) const
 
void populateHaloLayerWithCopiesDirect (const HaloRegion &haloRegion, ParticleContainer *moleculeContainer, bool positionCheck=true) const
 
- Protected Attributes inherited from DomainDecompMPIBase
MPI_Datatype _mpiParticleType
 
MPI_Datatype _mpiParticleForceType
 
MPI_Comm _comm
 
std::unique_ptr< NeighbourCommunicationScheme_neighbourCommunicationScheme
 
bool _forceDirectPP {false}
 
- Protected Attributes inherited from DomainDecompBase
int _rank
 the id of the current process
 
int _numProcs
 total number of processes in the simulation
 

Detailed Description

KD tree based domain decomposition for better load balancing.

The basic idea is to build up all possible subdivisions and do a A*-like search of the best subdivision.

The function to minimize is the load imbalance:

\[\sum_{i \in \text{children}} (\text{load}_i - \text{optimalLoad})^2\]

During the downward pass an estimate is the expected load imbalance, which is averaged over all children, in the upward pass, the exact imbalance is calculated.

Note
It is important for the A*-search that the estimate is an underestimation (<=) of the load imbalance.
Some computation of the deviation / expected deviation is done in KDNode.

Constructor & Destructor Documentation

◆ KDDecomposition()

KDDecomposition::KDDecomposition ( double  cutoffRadius,
int  numParticleTypes,
int  updateFrequency = 100,
int  fullSearchThreshold = 2 
)

create an initial decomposition tree

The constructor determines the number of cells and creates an initial decomposition of the domain (not yet balanced), which is stored in _decompTree and _ownArea.

Parameters
cutoffRadiuslargest cutoff radius of a molecule (determines a basic cell for loadbalancing)
domain
updateFrequencyevery n-th timestep, load will be balanced.
fullSearchThresholdIf a KDNode has a processor count less or equal this number, all possible decompositions will be investigated, so it influences the quality of the load balancing. I recommend to set it to 2 - 4.

Member Function Documentation

◆ balanceAndExchange()

void KDDecomposition::balanceAndExchange ( double  lastTraversalTime,
bool  forceRebalancing,
ParticleContainer moleculeContainer,
Domain domain 
)
overridevirtual

balance the load (and optimize communication) and exchange boundary particles

This method is used to perform a new decomposition of the global domain with the goal of getting the equal load (and possibly additional goals) on each process. This method is then also responsible for redistributing all particles, so after the method was called, each process has a domain with all particles belonging to this domain (as if exchangeParticles was called after the new decomposition).

Parameters
balanceif true, a rebalancing is forced; otherwise automatic balancing of Decomposition is applied
moleculeContainerneeded for calculating load and to get the particles
domainis e.g. needed to get the size of the local domain

Reimplemented from DomainDecompBase.

◆ finishNonBlockingStage()

void KDDecomposition::finishNonBlockingStage ( bool  forceRebalancing,
ParticleContainer moleculeContainer,
Domain domain,
unsigned int  stageNumber 
)
overridevirtual

Finishes the stageNumber'th stage. This includes receiving the particles, that have to be received in that stage.

Parameters
forceRebalancingtrue if rebalancing should be forced
moleculeContainerpointer to the molecule container
domainpointer to the domain
stageNumberthe number of the stage, the communication is in.

Implements DomainDecompMPIBase.

◆ getBoundingBoxMax()

double KDDecomposition::getBoundingBoxMax ( int  dimension,
Domain domain 
)
overridevirtual
Todo:
comment and thing

Reimplemented from DomainDecompBase.

◆ getBoundingBoxMin()

double KDDecomposition::getBoundingBoxMin ( int  dimension,
Domain domain 
)
overridevirtual
Todo:
comment and thing

Reimplemented from DomainDecompBase.

◆ getNeighbourRanks()

std::vector< int > KDDecomposition::getNeighbourRanks ( )
overridevirtual

Implements DomainDecompMPIBase.

◆ getNeighbourRanksFullShell()

std::vector< int > KDDecomposition::getNeighbourRanksFullShell ( )
overridevirtual

Implements DomainDecompMPIBase.

◆ getNeighboursFromHaloRegion()

std::vector< CommunicationPartner > KDDecomposition::getNeighboursFromHaloRegion ( Domain domain,
const HaloRegion haloRegion,
double  cutoff 
)
overridevirtual

Implements DomainDecompMPIBase.

◆ prepareNonBlockingStage()

void KDDecomposition::prepareNonBlockingStage ( bool  forceRebalancing,
ParticleContainer moleculeContainer,
Domain domain,
unsigned int  stageNumber 
)
overridevirtual

Prepares the stageNumber'th stage. This includes sending particles, that have to be send in that stage.

Parameters
forceRebalancingtrue if rebalancing should be forced
moleculeContainerpointer to the molecule container
domainpointer to the domain
stageNumberthe number of the stage, the communication is in.

Implements DomainDecompMPIBase.

◆ printTree()

void KDDecomposition::printTree ( std::ostream &  ostream)

Prints the tree to the desired ostream.

Parameters
ostream

◆ queryBalanceAndExchangeNonBlocking()

bool KDDecomposition::queryBalanceAndExchangeNonBlocking ( bool  forceRebalancing,
ParticleContainer moleculeContainer,
Domain domain,
double  etime 
)
overridevirtual

Checks whether the balance and exchange step can be performed non-blocking.

A non-blocking behaviour is typically possible, as long as no rebalancing has to be done.

Parameters
forceRebalancingif true, a rebalancing is forced; otherwise automatic balancing of Decomposition is applied
moleculeContainerneeded for calculating load and to get the particles
domainis e.g. needed to get the size of the local domain

Reimplemented from DomainDecompBase.

◆ readXML()

void KDDecomposition::readXML ( XMLfileUnits xmlconfig)
overridevirtual

Read in XML configuration for KDDecomposition and all its included objects.

The following xml object structure is handled by this method:

<parallelisation type="KDDecomposition">
<!-- Indicates the frequency at which the KDD is checked for rebalancing.
Rebalancing is then only performed if the imbalance is above the rebalanceLimit.
Default: 100-->
<updateFrequency>INTEGER</updateFrequency>
<!-- If the imbalance is smaller than the rebalanceLimit then a rebalancing will not be performed.
Default: 0 (i.e., rebalancing will always be performed.-->
<rebalanceLimit>DOUBLE</rebalanceLimit>
<!-- Indicates for how many levels a full search of the possible decompositions shall be performed. High values
increase the time to find a decomposition, but might lead to better decompositions. Above this threshold,
the domain is simply split into two subdomains with equal load.
Default: 2-->
<fullSearchThreshold>INTEGER</fullSearchThreshold>
<!-- Enable the support for heterogeneous compute systems. If this value is true then the performance of the
individual ranks are calculated. The performance measurement uses the FlopCounter and the timing values
for the force calculation.
Default: False-->
<heterogeneousSystems>BOOL</heterogeneousSystems>
<!-- Indicates whether the vectorization tuner should be used. If it is used, the load for cells is measured
depending on the number of particles. Repeated measurements are performed using one or two cells.
If deactivated a simple fall back using a quadratic model is used for the cost model (i.e., Cost =
particle number * particle number). This quadratic model tends to underestimate the cost needed for low
density regions.
Note: Does not work with more than 2 particle types.
Default: False-->
<useVectorizationTuner>BOOL</useVectorizationTuner>
<!-- For vectorization tuner: Generates a file with the measured performance characteristics if enabled.
Default: True-->
<generateNewFiles>BOOL</generateNewFiles>
<!-- For vectorization tuner: Load the file generated via generateNewFiles in a previous run. If no file
exists, the vectorization tuner is run.
Default: True-->
<useExistingFiles>BOOL</useExistingFiles>
<!-- For vectorization tuner: Allows an allreduce for the measured performances. This should be disabled if the
compute resources don't all have the same performance. If disabled the performances are measured for each
process separately, which is (mostly) not what you want. Only for heterogeneous systems, it makes sense to
disable this option.
Default: True-->
<vecTunerAllowMPIReduce>BOOL</vecTunerAllowMPIReduce>
<!-- A cluster version of load balancing. This option is useful if multiple clusters/islands are used that use
different hardware, e.g., one cluster/island using AMD and one cluster/island using Intel processors.
Currently only two clusters are supported. The clusters are identified based on the node names which are
read via MPI_Get_processor_name().
Default: False-->
<clusterHetSys>BOOL</clusterHetSys>
<!-- Option to indicate to always split the domain along the biggest dimension. Splitting along the biggest
dimension creates more cubic subdomains, but might lead to worse overall load balancing. If this is false,
all dimensions are tested and in the end, the one producing the lowest imbalance is chosen.
Default: True-->
<splitBiggestDimension>BOOL</splitBiggestDimension>
<!-- Alternative threshold for splitBiggestDimension. If more than splitThreshold processes are assigned to a
node it is split in only the biggest dimension. splitBiggestDimension overrides this threshold.
Default: disabled(std::numeric_limits<int>::max())-->
<splitThreshold>INTEGER</splitThreshold>
<!-- Option to ignore fullSearchThreshold. If true then the domain is always split into two pieces and less
searching is performed. Searching among the 3 different dimensions still happens, unless
splitBiggestDimension is true.
Default: False-->
<forceRatio>BOOL</forceRatio>
<!-- Indicates whether the MeasureLoad load estimator should be used. See MeasureLoad.
Default: False-->
<doMeasureLoadCalc>BOOL</doMeasureLoadCalc>
<!-- Defines at which index the quadratic interpolation of the measureLoad algorithm starts.
-1 to disable, 0 to always use interpolation.
Default: 1-->
<measureLoadInterpolationStartsAt>INTEGER</measureLoadInterpolationStartsAt>
<!-- Option for MeasureLoad: Forces increasing values for the load estimation (more particles = more load).
Default: True-->
<measureLoadIncreasingTimeValues>BOOL</measureLoadIncreasingTimeValues>
<!-- The reduction operation for the deviation calculation.
Default: sum-->
<deviationReductionOperation>max OR sum</deviationReductionOperation>
<!-- The minimal number of cells in each dimension for a partition. Has to be at least 1.
Increasing this value ensures bigger subdomains (and less overhead when using the full shell method, but
might lead to worse load balance or can make a domain splitting impossible.
Default: 1-->
<minNumCellsPerDimension>UINT</minNumCellsPerDimension>
</parallelisation>

Reimplemented from DomainDecompMPIBase.


The documentation for this class was generated from the following files: