gymnasium_envs#

Wrapper for Gymnasium environments.

Classes#

class prt_rl.env.wrappers.gymnasium_envs.GymnasiumWrapper(gym_name: str | None = None, env_factory: Callable[[int, int | None], Env] | None = None, num_envs: int = 1, render_mode: str | None = None, seed: int | None = None, device: str = 'cpu', **kwargs)[source]#

Wraps Gymnasium environments in the Environment interface.

Parameters:

gym_name – Name of the Gymnasium environment.
env_factory – Callable that constructs a Gymnasium environment for each env index.
num_envs – Number of parallel environments to create.
render_mode – Sets the rendering mode. Defaults to None.

Examples

import gymnasium as gym

from prt_rl.env.wrappers import GymnasiumWrapper
from prt_rl.common.policy import RandomPolicy

# Create a single environment by name
env = GymnasiumWrapper(
    gym_name="CarRacing-v3",
    render_mode="rgb_array",
    continuous=True
)

# Or use a factory (useful for domain randomization wrappers)
vec_env = GymnasiumWrapper(
    env_factory=lambda env_index, seed: gym.make("Pendulum-v1"),
    num_envs=4
)

policy = RandomPolicy(env_params=env.get_parameters())

state, info = env.reset()
done = False

# Roll out one episode in the single environment
while not done:
    action = policy.get_action(state)
    next_state, reward, done, info = env.step(action)
    state = next_state  # advance to the newly observed state

# Release resources held by both wrappers
env.close()
vec_env.close()
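
The factory form above can also be written as a named function. The following is a minimal sketch, assuming the wrapper calls the factory as factory(env_index, seed), as the lambda in the example suggests. The names derive_env_seed and make_pendulum, the seed-offset scheme, and the gravity range are illustrative and not part of prt_rl; this page does not document which seed value the wrapper passes in.

```python
import random
from typing import Optional


def derive_env_seed(env_index: int, seed: Optional[int]) -> Optional[int]:
    """Offset the base seed by the env index so each copy gets its own stream.

    Hypothetical helper; returns None when no base seed is given.
    """
    return None if seed is None else seed + env_index


def make_pendulum(env_index: int, seed: Optional[int]):
    """Hypothetical factory matching Callable[[int, int | None], Env]."""
    # Imported lazily so the seed helper above works without Gymnasium installed.
    import gymnasium as gym

    # Per-environment random generator; random.Random(None) seeds itself
    # from system entropy, so an unseeded run is still valid.
    rng = random.Random(derive_env_seed(env_index, seed))
    gravity = rng.uniform(8.0, 12.0)  # illustrative randomization range
    # Pendulum-v1 accepts the gravitational constant through the g keyword.
    return gym.make("Pendulum-v1", g=gravity)
```

Such a function could then be passed as env_factory=make_pendulum together with num_envs, in place of the lambda.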

close()[source]#: Closes the environment and cleans up any resources.

get_num_envs() → int#

Returns the number of environments in the interface.

Returns:: Number of environments
Return type:: int

get_parameters() → EnvParams[source]#

Returns the EnvParams object, which contains the observation and action sizes needed for setting up RL agents.

Returns:: Environment parameters object
Return type:: EnvParams

reset(seed: int | None = None) → Tuple[Tensor, Dict[str, Any]][source]#

Resets the environment to the initial state and returns the initial observation.

Parameters:: seed (int | None) – Sets the random seed.
Returns:: Tuple containing the initial observation tensor and the info dictionary
Return type:: Tuple[Tensor, Dict[str, Any]]

reset_index(index: int, seed: int | None = None) → Tuple[Tensor, Dict[str, Any]][source]#

Resets only the environment at the given index, for example one that is done.

Parameters:: index (int) – Index of the environment to reset.
seed (int | None) – Sets the random seed.
Returns:: The new observations and info dict
Return type:: Tuple[torch.Tensor, Dict[str, Any]]
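
Because reset_index takes a single index, resetting every finished environment means calling it once per index. The helper below is a rough sketch of that pattern; reset_finished is a hypothetical name, not a prt_rl function, and it works on any object that exposes reset_index(index). It assumes the done flags have already been flattened to a plain sequence of booleans.

```python
from typing import Any, Dict, Iterable, Tuple


def reset_finished(env: Any, done_flags: Iterable[bool]) -> Dict[int, Tuple[Any, Any]]:
    """Call env.reset_index(i) for every index i whose done flag is true.

    Hypothetical helper. Returns a dict mapping each reset index to the
    (observation, info) tuple that reset_index returned for it.
    """
    results: Dict[int, Tuple[Any, Any]] = {}
    for index, finished in enumerate(done_flags):
        if finished:
            results[index] = env.reset_index(index)
    return results
```

If done is a boolean torch.Tensor of shape (num_envs, 1) or (num_envs,), done.reshape(-1).tolist() yields a flat list suitable for done_flags.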

step(action: Tensor) → Tuple[Tensor, Tensor, Tensor, Dict[str, Any]][source]#

Steps the simulation using the action tensor and returns the resulting transition.

Parameters:: action (torch.Tensor) – Action tensor with shape (# env, # actions)
Returns:: Tuple containing the next state, reward, and done tensors and the info dictionary
Return type:: Tuple[Tensor, Tensor, Tensor, Dict[str, Any]]
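
With more than one environment, done holds one flag per environment, so a loop of the form while not done no longer applies (PyTorch refuses to convert a multi-element tensor to a single boolean). One alternative is a fixed number of steps. The sketch below assumes only the reset() and step(action) return shapes documented above; collect_steps and choose_action are hypothetical names, and the sketch does not reset finished environments.

```python
from typing import Any, Callable, List, Tuple


def collect_steps(
    env: Any,
    choose_action: Callable[[Any], Any],
    num_steps: int,
) -> List[Tuple[Any, Any, Any, Any]]:
    """Run num_steps steps and return (state, action, reward, done) tuples.

    env is any object exposing reset() and step(action) as documented above;
    choose_action maps a state to an action (for example policy.get_action).
    """
    transitions: List[Tuple[Any, Any, Any, Any]] = []
    state, _info = env.reset()
    for _ in range(num_steps):
        action = choose_action(state)
        next_state, reward, done, _info = env.step(action)
        transitions.append((state, action, reward, done))
        state = next_state  # advance to the newly observed state
    return transitions
```

For example, collect_steps(env, policy.get_action, 100) would gather 100 steps from a wrapper and policy constructed as in the Examples section.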
