LIH

LIH is a combinatorial optimization framework that combines deep reinforcement learning with a self-attention-based architecture, leveraging learned improvement heuristics and policy network-guided local search to derive high-quality solutions for routing problems.

lih

LIHInitialization

Bases: Initialization

Methods: Initialization_tool

  • The Initialization_tool is used to generate initial solutions and calculate the length of these solutions. For the TSP (Traveling Salesman Problem), initial solutions are generated via the random initial method; for the CVRP (Capacitated Vehicle Routing Problem), initial solutions are generated using the nearest insertion method.

LIHIteration

Bases: Iteration

Methods:

  • 2-opt: It improves the solution with the 2-opt operator under given iterations.

LIHPolicy

  • class LIHPolicy:

    def __init__(
        self,
        env_name: str = "tsp",  
        embed_dim: int = 128,
        init_embedding: nn.Module = None,
        num_heads: int = 1,
        qkv_dim: int = 128,
        num_encoder_layers: int = 3,
        normalization: str = "batch", 
        feedforward_hidden: int = 512,
        **kwargs
    )

LIHModel

LIH is based on the Transformer architecture. In the encoder, it employs single-head attention, while in the decoder, full-graph maxpooling is added as supplementary information.

Usage

You can run the following command lines to execute the code.

python train.py settings=lih_settings mode=train problem={problem} settings.module.T={T} settings.module.n_step_return={n_step_return}
python train.py settings=lih_settings mode=train problem=tsp
python train.py settings=lih_settings mode=train problem=tsp settings.module.T=200 settings.module.n_step_return=4
problem scale T n_step_return
tsp 20 200 4
50 200 4
100 200 4
cvrp 20 360 10
50 480 12
100 480 12

Tasks

Supported Tasks: TSP, CVRP

Required Data Generator:

Policy

main loop

  • LIH starts by randomly generating an initial solution. The neural network then selects a pair of nodes for the 2-opt operation to improve the solution iteratively.

  • In the CVRP problem, each node is represented by a 7-dimensional vector, formed by concatenating the previous node $x_{t-1}$, the current node $x_t$, the next node $x_{t+1}$, and the demand.

Training

LIH applies autoregressive reinforcement learning class ARREINFORCELightning() for training.

training paradigm

  • TSP is trained using the A2C algorithm.

  • CVRP is trained using the PPO algorithm

  • Both of the training processes apply multi-step discounted rewards to improve stability and learning efficiency.