DPN

DPN is a neural solver for min-max vehicle routing problems that decouples partition and navigation tasks using a specialized attention-based encoder and leverages agent-permutation symmetry to enhance solution quality. dpn

DPNInitialization

Bases: ARInitialization

Methods:

  • In training mode, the policy is executed to collect rewards and likelihoods.

  • In evaluation mode, the policy is executed to collect rewards and calculate scores.

NoIteration

Bases: Iteration

DPNPolicy

The DPNPolicy class implements the neural network policy for DPN. It follows a standard encoder-decoder architecture.

Methods:

  • pre_forward(td): This method is called once before the decoding (rollout) process begins. It computes the embeddings for the entire problem instance using the encoder and passes them to the decoder to be used during the decoding steps.

  • forward(td): This method is called at each step of the decoding process. It takes the current state td and uses the decoder to select the next action, returning the selected action and its probability.

  • set_decoder_strategy(strategy): Sets the decoding strategy, such as “sampling” or “greedy”.

DPNEncoder

The DPNEncoder is responsible for generating rich, contextual embeddings for the problem entities (nodes, agents, and depots). It employs a sophisticated architecture that separates the logic for different problem types to handle their unique characteristics.

  • Problem-Specific Architectures: The encoder uses distinct sub-modules for different problems, implementing the “Partition and Navigation” attention mechanism described in the DPN paper:

    • Partition_Navigation_Encoder_TSP: For mTSP.

    • Partition_Navigation_Encoder_PDP: For mPDP.

    • Partition_Navigation_Encoder_MDVRRP: For MDVRP and FMDVRP.

  • Initial Embeddings: It begins by projecting the initial 2D coordinates of nodes, agents, and depots into a higher-dimensional space.

  • Rotary Positional Encoding (RotatePostionalEncoding): It uses rotary positional embeddings (RoPE) to inject relative positional information into the attention mechanism, which is more effective for routing problems than standard sinusoidal encodings.

DPNDecoder

The DPNDecoder autoregressively constructs the solution by selecting the next node for each agent at each step. It uses an attention-based mechanism to decide which node to visit next.

  • Attention-Based Selection: It uses a multi-head attention mechanism to compute a probability distribution over all possible next nodes. The query is constructed from the current step’s context, and the keys and values are derived from the encoder’s output embeddings.

  • Dynamic Context: At each decoding step, it generates a step_context that includes information about the current agent, its current location, the number of remaining agents to dispatch, and the number of unvisited nodes. This context is crucial for making informed decisions.

  • Precomputation and Caching: To improve efficiency, the decoder precomputes and caches fixed components of the attention mechanism (keys, values, and projections) in the set_fixed method before the decoding process starts.

  • Complex Masking: The get_mask method implements problem-specific logic to ensure solution feasibility. It masks out invalid actions, such as visiting an already visited node or returning to the depot prematurely.

Usage

You can run the following command lines to execute the code.

python train.py/eval.py settings=dpn_settings mode={train/test} problem={mtsp/mpdp/mdvrp/fmdvrp} scale={problem size} decoder_strategy={sampling/greedy} settings.env.agent_min={agent_min} settings.env.agent_max={agent_max} settings.env.depot_min={depot_min} settings.env.depot_max={depot_max}

Note: For the setting in mTSP and mPDP, the scale parameter includes the single depot. For MDVRP and FMDVRP, the scale includes all depots.

For more details, the training settings are listed in the tables below. (original paper table5)

Problem

mPDP50 & mTSP50

mPDP100 & mTSP100

Fintune or not

No

No

number of agents($M$)

[2,10]

[2,10]

number of depots($D$)

1

1

number of encoder layers (L)

6

6

learning rate

1.00E-04

1.00E-04

learning rate decay

1

1

batch size

256

256

epoch size

500

500

epochs

256000

256000

number of permutations ($K$)

60

60

Problem mPDP200 & mTSP200 mPDP500 & mTSP500
Fintune or not From 100 From 100
number of agents($M$) [10,20] [30,50]
number of depots($D$) 1 1
number of encoder layers (L) 6 6
learning rate 1.00E-05 1.00E-05
learning rate decay 1 1
batch size 64 16
epoch size 20 20
epochs 64000 16000
number of permutations ($K$) 60 60
Problem MDVRP50 & FMDVRP50 MDVRP100 & FMDVRP100
Fintune or not No No
number of agents($M$) [2,10] [2,10]
number of depots($D$) [2,10] [2,10]
number of encoder layers (L) 6 6
learning rate 1.00E-04 1.00E-04
learning rate decay 1 1
batch size 256 256
epoch size 500 500
epochs 256000 256000
number of permutations ($K$) 60 60
Problem MDVRP50-F & FMDVRP50-F MDVRP100-F & FMDVRP100-F
Fintune or not From 50 From 100
number of agents($M$) [3,7] [5,10]
number of depots($D$) 8 8
number of encoder layers (L) 6 6
learning rate 1.00E-05 1.00E-05
learning rate decay 1 1
batch size 128 128
epoch size 20 20
epochs 128000 128000
number of permutations ($K$) 60 60

Tasks

Supported Tasks: min-max mTSP, min-max mPDP, MDVRP, FMDVRP.

Required Data Generator:

Training

DPN applies autoregressive reinforcement learning class ARREINFORCENARLightning() for training.