Environments

Env is the module responsible for state-action interaction. It maintains the problem state, processes the model's actions, computes rewards and state transitions, and provides this information to the policy model in a structured form.

Base Env

Each problem is implemented in its own environment class, but all of them share the following main methods, which support the workflow of loading problem data, executing actions in the environment, transitioning to new states, and providing feedback.

Bases: EnvBase

Methods:

  • load_problems: Loads the input problem data to be solved, and performs preprocessing and augmentation based on the configuration.

  • _reset: Resets the environment state and initializes the solution state for a new episode.

  • pre_step: Provides information about the current environment state to support the model’s next decision.

  • _step: Receives the model’s action, performs a step update in the environment, updates the state, checks for termination, and computes the reward.
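
The workflow formed by these four methods can be sketched with a toy environment. This is a minimal illustration in plain Python, not the framework's code: `ToyTSPEnv`, `greedy_policy`, and the dictionary returned by `pre_step` are hypothetical names and structures, and the sketch uses neither `EnvBase` nor `TensorDict`. It also calls `_reset` and `_step` directly for brevity; how the framework dispatches to them is not shown here.

```python
import math
import random


class ToyTSPEnv:
    """Hypothetical stand-in for a problem environment (not the framework's TSPEnv)."""

    def __init__(self, problem_size, seed=2024):
        self.problem_size = problem_size
        self.rng = random.Random(seed)
        self.problems = None

    def load_problems(self, problems=None):
        # Use the given city coordinates, or sample random ones in the unit square.
        if problems is None:
            problems = [(self.rng.random(), self.rng.random())
                        for _ in range(self.problem_size)]
        self.problems = problems

    def _reset(self):
        # Start a new episode with an empty tour.
        self.tour = []
        self.visited = [False] * self.problem_size
        self.done = False

    def pre_step(self):
        # Expose what the policy needs for its next decision.
        return {
            "current_node": self.tour[-1] if self.tour else None,
            "mask": list(self.visited),  # True means "already visited"
        }

    def _step(self, action):
        # Apply the action, update the state, check termination, compute the reward.
        assert not self.visited[action], "node already visited"
        self.tour.append(action)
        self.visited[action] = True
        self.done = len(self.tour) == self.problem_size
        reward = -self._tour_length() if self.done else None
        return reward, self.done

    def _tour_length(self):
        # Length of the closed tour through the visited cities.
        pts = [self.problems[i] for i in self.tour]
        return sum(math.dist(pts[i], pts[(i + 1) % len(pts)])
                   for i in range(len(pts)))


def greedy_policy(state):
    # Hypothetical policy: pick the lowest-index unvisited node.
    return state["mask"].index(False)


env = ToyTSPEnv(problem_size=4)
env.load_problems([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)])
env._reset()
done = False
while not done:
    state = env.pre_step()
    reward, done = env._step(greedy_policy(state))
```

In this sketch the reward is only produced at the end of the episode, as the negative tour length. The actual environments may compute and return rewards differently.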

💡 For details on each problem's environment and its methods, please refer to the corresponding description.

  • Routing Problems: TSPEnv, ATSPEnv, PCTSPEnv, MTSPEnv, MDVRPEnv, MPDPEnv, OPEnv, SOPEnv

  • Scheduling Problems: FFSPEnv, RCPSPEnv, SMTWTPEnv

  • Packing Problems: KPEnv, MKPEnv, BPPEnv

DUMMYEnv

Some problems do not require an environment class. To keep the framework interface consistent, we provide a DUMMYEnv class whose methods do not perform any operations.

Code of class DUMMYEnv:

from typing import Optional

import torch
from tensordict import TensorDict
from torchrl.envs import EnvBase  # assumption: EnvBase is TorchRL's base environment class


class DUMMYEnv(EnvBase):
    # batch_locked = False
    def __init__(self,
                 problem_size: int,
                 pomo_size: int = 1,  # number of multi-start trajectories; 1 by default in AM-based models
                 device: str = 'cpu',
                 seed: int = 2024,
                 **kwargs):
        super().__init__(device=device)
        self.problem_size = problem_size
        self.pomo_size = pomo_size
        self.device = device
        # Always "dummy"; an 'env_name' passed in kwargs is ignored.
        self.env_name = "dummy"
        # State placeholders, kept for consistency with the other environments.
        self.problems = None
        self.selected_count = None
        self.current_node = None
        self.ninf_mask = None
        self.selected_node_list = None
        # self.first_node = None
        self.dummy_flag_bool = None
        self.dummy_flag_long = None
        self.seed_value = seed
        self._set_seed(seed=seed)
        self.method_name = kwargs.get('method_name')
        # Data augmentation is enabled only if a type is given and the factor exceeds 1.
        self.aug_type = kwargs.get('aug_type', None)
        self.aug_factor = kwargs.get('aug_factor', 1)
        self.mix_prop = kwargs.get('mix_prop')
        self.aug_flag = self.aug_type is not None and self.aug_factor > 1
        self.distribution = kwargs.get("distribution", "uniform")

    def _set_seed(self, seed: Optional[int]):
        # Seed PyTorch's global RNG and keep a handle to the returned generator.
        rng = torch.manual_seed(seed)
        self.rng = rng

    def _reset(self, td: TensorDict, batch_size=None) -> TensorDict:
        pass

    def _step(self, td: TensorDict) -> TensorDict:
        pass

    def __getstate__(self):
        """Return the state of the environment (no-op here). By default, we want to avoid
        pickling the random number generator directly, as `deepcopy` does not allow it.
        """
        pass

    def __setstate__(self, state):
        """Set the state of the environment (no-op here); counterpart of `__getstate__`."""
        pass
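
The benefit of such a placeholder can be illustrated with a small sketch: code that drives the workflow can always call the environment methods, even when the solver never uses an environment. Everything below is hypothetical and written in plain Python for illustration; `NoOpEnv` and `run_pipeline` are not part of the framework, and `NoOpEnv` only mirrors the idea of DUMMYEnv, not its actual code.

```python
class NoOpEnv:
    """Hypothetical placeholder whose methods accept any arguments and do nothing."""

    def load_problems(self, *args, **kwargs):
        pass

    def _reset(self, *args, **kwargs):
        pass

    def pre_step(self, *args, **kwargs):
        pass

    def _step(self, *args, **kwargs):
        pass


def run_pipeline(env, solver, data):
    # Hypothetical driver: it always talks to an env, so it needs no
    # special case for solvers that work on the raw data directly.
    env.load_problems(data)
    env._reset()
    return solver(data)


# A solver that needs no environment; Python's built-in `sorted` stands in for it.
result = run_pipeline(NoOpEnv(), solver=sorted, data=[3, 1, 2])
```

The design choice shown here is the null-object pattern: the driver keeps a single code path, and the placeholder absorbs the calls that would otherwise need to be skipped.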