evaluators#

Classes#

Evaluator

Base class for all evaluators in the PRT-RL framework.

NumberOfStepsEvaluator

Evaluator that measures how few steps an agent needs to reach a minimum reward threshold.

RewardEvaluator

Evaluators are used to assess the performance of agents or policies.

class prt_rl.common.evaluators.Evaluator(eval_freq: int = 1)

Base class for all evaluators in the PRT-RL framework. This class provides a common interface for evaluating agents in different environments with different objectives.

Parameters: eval_freq (int) – Frequency of evaluation in terms of steps, iterations, or optimization steps.

__init__(eval_freq: int = 1) → None

Initialize the evaluator with the evaluation frequency.

Parameters: eval_freq (int) – Frequency of evaluation in terms of steps, iterations, or optimization steps.

close() → None: Close the evaluator and release any resources. Subclasses can override this method if needed.

evaluate(agent, iteration: int, is_last: bool = False) → None

Evaluate the agent’s performance in the given environment.

Parameters:

agent – The agent to be evaluated.
iteration (int) – The current iteration number.
is_last (bool) – Whether this is the last evaluation.

Returns:

None
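As a rough illustration of the interface above, the sketch below subclasses a local stand-in for the base class. The stand-in `Evaluator` only mirrors the documented signatures; storing `eval_freq` as an attribute, raising `NotImplementedError` in the base `evaluate`, and the `RecordingEvaluator` subclass are all assumptions made for this example, not the library's implementation.

```python
class Evaluator:
    """Local stand-in that mirrors the documented signatures only."""

    def __init__(self, eval_freq: int = 1) -> None:
        # Assumption: the frequency is kept as a plain attribute.
        self.eval_freq = eval_freq

    def evaluate(self, agent, iteration: int, is_last: bool = False) -> None:
        # Assumption: subclasses are expected to provide the behaviour.
        raise NotImplementedError

    def close(self) -> None:
        # Nothing to release in the stand-in.
        pass


class RecordingEvaluator(Evaluator):
    """Hypothetical subclass that records every call it receives."""

    def __init__(self, eval_freq: int = 1) -> None:
        super().__init__(eval_freq=eval_freq)
        self.calls = []

    def evaluate(self, agent, iteration: int, is_last: bool = False) -> None:
        # A real evaluator would run episodes here; this one only logs the call.
        self.calls.append((iteration, is_last))

    def close(self) -> None:
        # Release the only resource this sketch holds.
        self.calls.clear()


evaluator = RecordingEvaluator(eval_freq=1)
for iteration in range(1, 4):
    evaluator.evaluate(agent=None, iteration=iteration, is_last=(iteration == 3))
recorded = list(evaluator.calls)
evaluator.close()
```

Overriding `close` is optional per the description above; it is shown here only to make the lifecycle explicit.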

class prt_rl.common.evaluators.NumberOfStepsEvaluator(env: EnvironmentInterface, reward_threshold: float, num_episodes: int = 1, logger: Logger | None = None, keep_best: bool = False, eval_freq: int = 1, deterministic: bool = False)

Evaluator that measures how few steps an agent needs to reach a minimum reward threshold. Use it when several agents can all achieve the maximum desired reward and you want to compare which one learns the fastest.

Parameters:

env (EnvironmentInterface) – The environment to evaluate the agent in.
reward_threshold (float) – The minimum reward threshold to achieve.
num_episodes (int) – The number of episodes to run for evaluation.
logger (Optional[Logger]) – Logger for evaluation metrics.
keep_best (bool) – Whether to keep the best agent based on evaluation performance.
eval_freq (int) – Frequency of evaluation in terms of steps, iterations, or optimization steps.
deterministic (bool) – Whether to use a deterministic policy during evaluation.
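The "fewest steps to reach the threshold" idea can be sketched in a few lines. The helper `steps_to_threshold` and the sample histories below are hypothetical; how the evaluator itself aggregates rewards over `num_episodes` is not documented here, so the sketch simply assumes one mean reward per evaluation.

```python
from typing import Optional, Sequence, Tuple


def steps_to_threshold(
    history: Sequence[Tuple[int, float]], reward_threshold: float
) -> Optional[int]:
    """Return the first step count whose mean reward meets the threshold.

    `history` holds (num_steps, mean_reward) pairs in training order.
    Returns None if the threshold is never reached.
    """
    for num_steps, mean_reward in history:
        if mean_reward >= reward_threshold:
            return num_steps
    return None


# Made-up evaluation histories for two agents.
agent_a = [(1000, 50.0), (2000, 180.0), (3000, 200.0)]
agent_b = [(1000, 120.0), (2000, 150.0), (3000, 199.0)]

steps_a = steps_to_threshold(agent_a, reward_threshold=175.0)
steps_b = steps_to_threshold(agent_b, reward_threshold=175.0)
faster = "a" if steps_a < steps_b else "b"
```

Both agents end near the same reward, so the final score alone would not separate them; the step count at which each first crosses the threshold does.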

close() → None: Close the evaluator and release any resources.

evaluate(policy: Policy, iteration: int, is_last: bool = False) → None

Evaluate the policy’s performance in the given environment based on timesteps.

Parameters:

policy (Policy) – The policy to be evaluated.
iteration (int) – The current iteration number.
is_last (bool) – Whether this is the last evaluation.

Returns:

None

get_best_policy() → Policy | None

Get the best policy based on evaluation performance.

Returns: The best policy if keep_best is True and a best policy exists, otherwise None.
Return type: Optional[Policy]

class prt_rl.common.evaluators.RewardEvaluator(env: EnvironmentInterface, num_episodes: int = 1, logger: Logger | None = None, keep_best: bool = False, eval_freq: int = 1, deterministic: bool = False)

Evaluators are used to assess the performance of agents or policies.

The eval_freq value must use the same units as the iteration value passed to the evaluate method. For example, if eval_freq is expressed in steps, pass num_steps as the iteration value. Otherwise, evaluations will not occur at the intended times.

Parameters:

env (EnvironmentInterface) – The environment to evaluate the agent in.
num_episodes (int) – The number of episodes to run for evaluation.
logger (Optional[Logger]) – Logger for evaluation metrics.
keep_best (bool) – Whether to keep the best agent based on evaluation performance.
eval_freq (int) – Frequency of evaluation in terms of steps, iterations, or optimization steps.
deterministic (bool) – Whether to use a deterministic policy during evaluation.
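To see why eval_freq and the iteration value need matching units, consider the sketch below. `FrequencyGate` is a hypothetical helper, and its rule (fire whenever the iteration value crosses a new multiple of `eval_freq`, or when `is_last` is set) is an assumption chosen for illustration; the library's actual scheduling logic may differ.

```python
class FrequencyGate:
    """Hypothetical gate: fires when `iteration` crosses a multiple of eval_freq."""

    def __init__(self, eval_freq: int = 1) -> None:
        self.eval_freq = eval_freq
        self._last_bucket = 0

    def should_evaluate(self, iteration: int, is_last: bool = False) -> bool:
        # Which multiple of eval_freq has the iteration value reached so far?
        bucket = iteration // self.eval_freq
        fired = bucket > self._last_bucket
        self._last_bucket = max(self._last_bucket, bucket)
        # Assumption: the final call always triggers an evaluation.
        return fired or is_last


steps_per_update = 2048  # made-up rollout size

# Matched units: eval_freq in steps, iteration passed as num_steps.
gate = FrequencyGate(eval_freq=5000)
matched = [
    n * steps_per_update
    for n in range(1, 6)
    if gate.should_evaluate(n * steps_per_update)
]

# Mismatched units: eval_freq in steps, iteration passed as an update index.
gate = FrequencyGate(eval_freq=5000)
mismatched = [n for n in range(1, 6) if gate.should_evaluate(n)]
```

Under this assumed rule, the matched run evaluates twice in five updates, while the mismatched run never evaluates because an update index of 1 to 5 never approaches 5000.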

close() → None: Close the evaluator and release any resources.

evaluate(policy: Policy, iteration: int, is_last: bool = False) → None

Evaluate the policy’s performance in the given environment.

Parameters:

policy (Policy) – The policy to be evaluated.
iteration (int) – The current iteration number.
is_last (bool) – Whether this is the last evaluation.

get_best_policy() → Policy | None

Get the best policy based on evaluation performance.

Returns: The best policy if keep_best is True and a best policy exists, otherwise None.
Return type: Optional[Policy]
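The keep_best and get_best_policy contract described above can be approximated with a small tracker. `BestPolicyTracker`, its `update` method, and the use of `copy.deepcopy` to snapshot the policy are assumptions made for this sketch; the page does not say how the library stores the best policy.

```python
import copy
from typing import Any, Optional


class BestPolicyTracker:
    """Hypothetical helper that snapshots the highest-scoring policy."""

    def __init__(self, keep_best: bool = False) -> None:
        self.keep_best = keep_best
        self.best_score = float("-inf")
        self.best_policy: Optional[Any] = None

    def update(self, policy: Any, score: float) -> None:
        # Only store a snapshot when tracking is enabled and the score improves.
        if self.keep_best and score > self.best_score:
            self.best_score = score
            # Copy so later in-place training updates do not alter the snapshot.
            self.best_policy = copy.deepcopy(policy)

    def get_best_policy(self) -> Optional[Any]:
        # Mirrors the documented contract: None unless keep_best is set
        # and a best policy has been recorded.
        return self.best_policy if self.keep_best else None


# A dict stands in for a policy object here.
policy = {"weights": [0.0]}
tracker = BestPolicyTracker(keep_best=True)
tracker.update(policy, score=10.0)

policy["weights"][0] = 1.0          # training keeps mutating the live policy
tracker.update(policy, score=5.0)   # worse score, snapshot is kept as-is

best = tracker.get_best_policy()
```

Copying matters because the policy is usually updated in place during training; without a copy, the "best" reference would silently track the latest weights.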
