interface#

Classes#

EnvParams

Environment parameters contain the action and observation space information needed to configure RL algorithms.

EnvironmentInterface

The environment interface wraps other simulation environments to provide a consistent interface for the RL library.

MultiAgentEnvParams

Multi-agent environment parameters contain the action and observation space information needed to configure multi-agent RL algorithms.

MultiAgentEnvironmentInterface

The multi-agent environment interface wraps other simulation environments to provide a consistent interface for multi-agent RL algorithms.

MultiGroupEnvParams

Multi-group environment parameters extend the multi-agent parameters to group agents of the same type together.

MultiGroupEnvironmentInterface

The multi-group environment interface wraps other simulation environments to provide a consistent interface for multi-group RL algorithms.

class prt_rl.env.interface.EnvParams(action_len: int, action_continuous: bool | List[bool], action_min: int | float | List[float | int], action_max: int | float | List[float | int], observation_shape: tuple, observation_continuous: bool, observation_min: int | float | List[float], observation_max: int | float | List[float])[source]#

Environment parameters contain the action and observation space information needed to configure RL algorithms.

Parameters:
  • action_len (int) – Number of actions in action space

  • action_continuous (bool | List[bool]) – True if the actions are continuous or False if they are discrete. The signature also accepts a list of booleans.

  • action_min – Minimum action value. If the actions are discrete, this is the minimum integer value; if the actions are continuous, it matches the action shape, with the minimum value for each action

  • action_max – Maximum action value. If the actions are discrete, this is the maximum integer value; if the actions are continuous, it matches the action shape, with the maximum value for each action

  • observation_shape (tuple) – Shape of the observation space

  • observation_continuous (bool) – True if the observations are continuous or False if they are discrete

  • observation_min – Minimum observation value. If the observations are discrete, this is the minimum integer value; if the observations are continuous, it matches the observation shape, with the minimum value for each observation

  • observation_max – Maximum observation value. If the observations are discrete, this is the maximum integer value; if the observations are continuous, it matches the observation shape, with the maximum value for each observation
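As a hedged illustration of how these fields fit together, the sketch below uses a stand-in dataclass named `EnvParamsSketch`. It is hypothetical and is not the library class; it only mirrors the constructor signature documented above so the example runs without `prt_rl` installed. The numeric bounds are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Union

Number = Union[int, float]


@dataclass
class EnvParamsSketch:
    """Hypothetical stand-in that mirrors the documented EnvParams fields."""
    action_len: int
    action_continuous: Union[bool, List[bool]]
    action_min: Union[Number, List[Number]]
    action_max: Union[Number, List[Number]]
    observation_shape: tuple
    observation_continuous: bool
    observation_min: Union[Number, List[float]]
    observation_max: Union[Number, List[float]]


# Discrete example: one action taking the integer values 0 or 1,
# observed through four continuous values (illustrative numbers only).
discrete_params = EnvParamsSketch(
    action_len=1,
    action_continuous=False,
    action_min=0,
    action_max=1,
    observation_shape=(4,),
    observation_continuous=True,
    observation_min=[-4.8, -10.0, -0.42, -10.0],
    observation_max=[4.8, 10.0, 0.42, 10.0],
)

# Continuous example: two actions, each with its own bounds.
continuous_params = EnvParamsSketch(
    action_len=2,
    action_continuous=True,
    action_min=[-1.0, -2.0],
    action_max=[1.0, 2.0],
    observation_shape=(3,),
    observation_continuous=True,
    observation_min=-1.0,
    observation_max=1.0,
)
```

In the discrete case the bounds are single integers, while in the continuous case they are given per action, matching the parameter descriptions above.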

get_action_max_tensor() Tensor[source]#

Converts action_max to a tensor of shape (action_len, 1).

  • If action_max is a float, it is broadcast across all actions.

  • If it is a list, its length must match action_len.

get_action_min_tensor() Tensor[source]#

Converts action_min to a tensor of shape (action_len, 1).

  • If action_min is a float, it is broadcast across all actions.

  • If it is a list, its length must match action_len.
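The broadcasting rule shared by both conversion methods can be sketched as follows. This is not the library's implementation: `bounds_as_column` is a hypothetical helper, nested Python lists stand in for tensors so the sketch runs without PyTorch, and raising `ValueError` on a length mismatch is an assumption about how the requirement might be enforced.

```python
from typing import List, Union

Number = Union[int, float]


def bounds_as_column(value: Union[Number, List[Number]], action_len: int) -> List[List[float]]:
    """Return bounds as a nested list with shape (action_len, 1).

    Hypothetical helper: nested lists stand in for a tensor.
    """
    if isinstance(value, (int, float)):
        # A single number is broadcast across all actions.
        return [[float(value)] for _ in range(action_len)]
    if len(value) != action_len:
        # Assumed error handling; the source only states the lengths must match.
        raise ValueError(f"expected {action_len} bounds, got {len(value)}")
    # A list supplies one bound per action.
    return [[float(v)] for v in value]


scalar_bounds = bounds_as_column(1.0, action_len=3)
list_bounds = bounds_as_column([-1.0, 0.0, 2.5], action_len=3)
```

Both results have three rows and one column, which corresponds to the documented (action_len, 1) shape.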

class prt_rl.env.interface.EnvironmentInterface(render_mode: str | None = None, num_envs: int = 1)[source]#

The environment interface wraps other simulation environments to provide a consistent interface for the RL library.

The interface for agents is based around tensors and a Gymnasium-like API. The main extensions to the gym API are the addition of the environment parameters and the ability to put the rgb_array in the info dictionary for rendering.

Single Agent Interface

For a single agent, the step function returns the following structure:

next_state, reward, done, info = env.step(action)

The shape of each tensor is (N, M), where N is the number of environments and M is the size of the value. For example, if an agent has two output actions and we are training with four environments, then the action tensor will have shape (4, 2).

close() None[source]#

Closes the environment and cleans up any resources.

get_num_envs() int[source]#

Returns the number of environments in the interface.

Returns:

Number of environments

Return type:

int

abstractmethod get_parameters() EnvParams[source]#

Returns the EnvParams object which contains information about the sizes of observations and actions needed for setting up RL agents.

Returns:

environment parameters object

Return type:

EnvParams

abstractmethod reset(seed: int | None = None) Tuple[Tensor, Dict[str, Any]][source]#

Resets the environment to the initial state and returns the initial observation.

Parameters:

seed (int | None) – Sets the random seed.

Returns:

Tuple containing the initial observation tensor and the info dictionary

Return type:

Tuple

abstractmethod step(action: Tensor) Tuple[Tensor, Tensor, Tensor, Dict[str, Any]][source]#

Steps the simulation using the action tensor and returns the new trajectory.

Parameters:

action (torch.Tensor) – Action tensor with shape (# env, # actions)

Returns:

Tuple containing the next state, reward, and done tensors and the info dictionary

Return type:

Tuple
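The reset/step contract described above can be exercised with a toy environment. The sketch below is hedged: `CountdownEnvSketch` is hypothetical, it does not subclass the real `EnvironmentInterface`, it omits `get_parameters()` and `close()`, and nested lists stand in for tensors so it runs without PyTorch. Giving the reward and done values one column each is also an assumption.

```python
from typing import Any, Dict, List, Optional, Tuple

Batch = List[List[float]]  # nested list standing in for an (N, M) tensor


class CountdownEnvSketch:
    """Hypothetical environment: each env counts down from 3 and is done at 0."""

    def __init__(self, num_envs: int = 1) -> None:
        self.num_envs = num_envs
        self.counters = [3] * num_envs

    def get_num_envs(self) -> int:
        return self.num_envs

    def reset(self, seed: Optional[int] = None) -> Tuple[Batch, Dict[str, Any]]:
        # The seed is unused because this toy environment is deterministic.
        self.counters = [3] * self.num_envs
        return [[float(c)] for c in self.counters], {}

    def step(self, action: Batch) -> Tuple[Batch, Batch, List[List[bool]], Dict[str, Any]]:
        assert len(action) == self.num_envs  # one action row per environment
        self.counters = [max(c - 1, 0) for c in self.counters]
        next_state = [[float(c)] for c in self.counters]
        reward = [[1.0] for _ in self.counters]
        done = [[c == 0] for c in self.counters]
        return next_state, reward, done, {}


env = CountdownEnvSketch(num_envs=4)
state, info = env.reset(seed=0)
total_reward = [0.0] * env.get_num_envs()
steps = 0
while True:
    # Two zero actions per environment, i.e. an action batch of shape (4, 2).
    action = [[0.0, 0.0] for _ in range(env.get_num_envs())]
    next_state, reward, done, info = env.step(action)
    total_reward = [t + r[0] for t, r in zip(total_reward, reward)]
    steps += 1
    if all(d[0] for d in done):
        break
```

Every value returned by `step` keeps the environment index as its first dimension, which is what lets one loop drive all four environments at once.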

class prt_rl.env.interface.MultiAgentEnvParams(num_agents: int, agent: EnvParams)[source]#

Multi-agent environment parameters contain the action and observation space information needed to configure multi-agent RL algorithms.

Notes

This is still a work in progress.

group = { name: (num_agents, EnvParams) }

class prt_rl.env.interface.MultiAgentEnvironmentInterface(render_mode: str | None = None, num_envs: int = 1)[source]#

The multi-agent environment interface wraps other simulation environments to provide a consistent interface for multi-agent RL algorithms.

The interface for agents is based around tensors and a Gymnasium-like API. The main extensions to the gym API are the addition of the environment parameters and the ability to put the rgb_array in the info dictionary for rendering.

Multi-Agent Interface

For multiple agents, the step function returns the following structure:

next_state, reward, done, info = env.step(action)

The shape of each tensor is (N, A, M), where N is the number of environments, A is the number of agents, and M is the size of the value. For example, if an agent has two output actions, there are three agents, and we are training with four environments, then the action tensor will have shape (4, 3, 2).
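The (N, A, M) layout in the example above can be checked with a small sketch. Nested Python lists stand in for tensors here so the code runs without PyTorch, and `nested_shape` is a hypothetical helper, not part of the library.

```python
from typing import List, Tuple


def nested_shape(value) -> Tuple[int, ...]:
    """Return the shape of a non-empty rectangular nested list."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0]
    return tuple(shape)


num_envs, num_agents, action_len = 4, 3, 2

# Zero actions laid out as (environment, agent, action).
actions: List[List[List[float]]] = [
    [[0.0] * action_len for _ in range(num_agents)]
    for _ in range(num_envs)
]

action_shape = nested_shape(actions)

# Indexing follows the same order: environment 1, agent 2, action 0.
actions[1][2][0] = 0.5
```

The indexing order (environment, then agent, then value) is the practical consequence of the shape convention: the agent dimension always sits between the environment dimension and the value dimension.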

close() None[source]#

Closes the environment and cleans up any resources.

get_num_envs() int[source]#

Returns the number of environments in the interface.

Returns:

Number of environments

Return type:

int

abstractmethod get_parameters() MultiAgentEnvParams[source]#

Returns the MultiAgentEnvParams object, which contains information about the sizes of observations and actions needed for setting up RL agents.

Returns:

environment parameters object

Return type:

MultiAgentEnvParams

abstractmethod reset(seed: int | None = None) Tuple[Tensor, Dict[str, Any]][source]#

Resets the environment to the initial state and returns the initial observation.

Parameters:

seed (int | None) – Sets the random seed.

Returns:

Tuple containing the initial observation tensor and the info dictionary

Return type:

Tuple

abstractmethod step(action: Tensor) Tuple[Tensor, Tensor, Tensor, Dict[str, Any]][source]#

Steps the simulation using the action tensor and returns the new trajectory.

Parameters:

action (torch.Tensor) – Action tensor with shape (# env, # agents, # actions)

Returns:

Tuple containing the next state, reward, and done tensors and the info dictionary

Return type:

Tuple

class prt_rl.env.interface.MultiGroupEnvParams(group: Dict[str, MultiAgentEnvParams])[source]#

Multi-group environment parameters extend the multi-agent parameters to group agents of the same type together. This allows heterogeneous multi-agent teams to be trained together.
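The nesting implied by the three parameter classes can be sketched with stand-in dataclasses. All three classes below are hypothetical and trimmed down; they mirror only the documented constructor signatures `MultiAgentEnvParams(num_agents, agent)` and `MultiGroupEnvParams(group)`. The group names and sizes are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class AgentParamsSketch:
    """Hypothetical, trimmed-down stand-in for EnvParams."""
    action_len: int
    observation_shape: tuple


@dataclass
class MultiAgentParamsSketch:
    """Stand-in mirroring MultiAgentEnvParams(num_agents, agent)."""
    num_agents: int
    agent: AgentParamsSketch


@dataclass
class MultiGroupParamsSketch:
    """Stand-in mirroring MultiGroupEnvParams(group)."""
    group: Dict[str, MultiAgentParamsSketch]


# Two agent types with different action and observation sizes.
params = MultiGroupParamsSketch(
    group={
        "drones": MultiAgentParamsSketch(
            num_agents=3,
            agent=AgentParamsSketch(action_len=2, observation_shape=(8,)),
        ),
        "rovers": MultiAgentParamsSketch(
            num_agents=2,
            agent=AgentParamsSketch(action_len=4, observation_shape=(12,)),
        ),
    }
)

total_agents = sum(g.num_agents for g in params.group.values())
```

Because each group carries its own agent parameters, agents within a group share one action and observation layout while different groups may differ.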

class prt_rl.env.interface.MultiGroupEnvironmentInterface(render_mode: str | None = None, num_envs: int = 1)[source]#

The multi-group environment interface wraps other simulation environments to provide a consistent interface for multi-group RL algorithms.

The interface for agents is based around tensors and a Gymnasium-like API. The main extensions to the gym API are the addition of the environment parameters and the ability to put the rgb_array in the info dictionary for rendering.

Multi-Group Interface

For multiple groups, the step function returns the following structure:

next_state, reward, done, info = env.step(action)

The shape of each tensor is (N, G, A, M), where N is the number of environments, G is the number of groups, A is the number of agents in that group, and M is the size of the value. For example, if an agent has two output actions, there are three groups with varying numbers of agents, and we are training with four environments, then the action will have shape (4, 3, A, 2), where A depends on the group.
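Because A can differ between groups, and the step signature for this class takes a dictionary of tensors, one possible reading (an assumption, not confirmed by the source) is that each group carries its own (N, A, M) tensor. The sketch below computes those per-group shapes for the example above; the group names and agent counts are hypothetical.

```python
from typing import Dict, Tuple

num_envs = 4
action_len = 2

# Hypothetical group names mapped to the number of agents in each group.
agents_per_group: Dict[str, int] = {"scouts": 1, "carriers": 3, "guards": 2}

# Assumed layout: one (N, A, M) shape per group, with A varying by group.
action_shapes: Dict[str, Tuple[int, int, int]] = {
    name: (num_envs, num_agents, action_len)
    for name, num_agents in agents_per_group.items()
}
```

Under this reading, the group dimension G is expressed by the dictionary keys rather than by a tensor axis.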

close() None[source]#

Closes the environment and cleans up any resources.

get_num_envs() int[source]#

Returns the number of environments in the interface.

Returns:

Number of environments

Return type:

int

abstractmethod get_parameters() MultiGroupEnvParams[source]#

Returns the MultiGroupEnvParams object, which contains information about the sizes of observations and actions needed for setting up RL agents.

Returns:

environment parameters object

Return type:

MultiGroupEnvParams

abstractmethod reset(seed: int | None = None) Dict[str, Tuple[Tensor, Dict[str, Any]]][source]#

Resets the environment to the initial state and returns the initial observation.

Parameters:

seed (int | None) – Sets the random seed.

Returns:

Dictionary of tuples, each containing an initial observation tensor and an info dictionary

Return type:

Dict

abstractmethod step(action: Dict[str, Tensor]) Dict[str, Tuple[Tensor, Tensor, Tensor, Dict[str, Any]]][source]#

Steps the simulation using the action tensor and returns the new trajectory.

Parameters:

action (Dict[str, torch.Tensor]) – Dictionary mapping string keys to action tensors; see the class description above for tensor shapes

Returns:

Dictionary of tuples, each containing the next state, reward, and done tensors and an info dictionary

Return type:

Dict
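Consuming the dictionary-shaped result of a multi-group step might look like the sketch below. It is hedged on several points: `step_sketch` is a hypothetical stand-in for an implementation of `step`, the dictionary keys are assumed to be group names, and nested lists stand in for tensors so the code runs without PyTorch.

```python
from typing import Any, Dict, List, Tuple

Batch = List[Any]  # nested list standing in for a tensor
StepResult = Tuple[Batch, Batch, Batch, Dict[str, Any]]


def step_sketch(action: Dict[str, Batch]) -> Dict[str, StepResult]:
    """Hypothetical step: returns one (next_state, reward, done, info) per key."""
    results: Dict[str, StepResult] = {}
    for name, group_action in action.items():
        num_envs = len(group_action)
        next_state = [[0.0] for _ in range(num_envs)]
        reward = [[1.0] for _ in range(num_envs)]
        done = [[False] for _ in range(num_envs)]
        results[name] = (next_state, reward, done, {})
    return results


# Two hypothetical groups, each supplying actions for two environments.
actions = {
    "drones": [[0.1, 0.2], [0.3, 0.4]],
    "rovers": [[1.0], [0.0]],
}

rewards_by_group = {}
for name, (next_state, reward, done, info) in step_sketch(actions).items():
    rewards_by_group[name] = [r[0] for r in reward]
```

Each dictionary value unpacks exactly like the single-agent step result, so per-group training code can reuse the same four-way unpacking.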