evaluators

Classes

Evaluator

Base class for all evaluators in the PRT-RL framework.

NumberOfStepsEvaluator

Evaluator that measures how many steps an agent needs to reach a minimum reward threshold.

RewardEvaluator

Evaluators are used to assess the performance of agents or policies.

class prt_rl.common.evaluators.Evaluator(eval_freq: int = 1)

Base class for all evaluators in the PRT-RL framework. This class provides a common interface for evaluating agents in different environments with different objectives.

Parameters:

eval_freq (int) – Frequency of evaluation in terms of steps, iterations, or optimization steps.

__init__(eval_freq: int = 1) -> None

Initialize the evaluator with the evaluation frequency.

Parameters:

eval_freq (int) – Frequency of evaluation in terms of steps, iterations, or optimization steps.

close() -> None

Close the evaluator and release any resources. This method can be overridden by subclasses if needed.

evaluate(agent, iteration: int, is_last: bool = False) -> None

Evaluate the agent’s performance in the given environment.

Parameters:
  • agent – The agent to be evaluated.

  • iteration (int) – The current iteration number.

  • is_last (bool) – Whether this is the last evaluation.

Returns:

None
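
As a rough illustration of the interface above, the sketch below defines a minimal custom evaluator. To stay self-contained it uses a stand-in Evaluator class that mirrors only the documented signatures; in real code you would presumably subclass prt_rl.common.evaluators.Evaluator instead. RecordingEvaluator and its calls attribute are hypothetical names and are not part of PRT-RL.

```python
class Evaluator:
    """Stand-in that mirrors the documented Evaluator signatures (assumption)."""

    def __init__(self, eval_freq: int = 1) -> None:
        self.eval_freq = eval_freq

    def evaluate(self, agent, iteration: int, is_last: bool = False) -> None:
        raise NotImplementedError

    def close(self) -> None:
        # The documented base method may be overridden; here it does nothing.
        pass


class RecordingEvaluator(Evaluator):
    """Hypothetical subclass that records every evaluate() call it receives."""

    def __init__(self, eval_freq: int = 1) -> None:
        super().__init__(eval_freq=eval_freq)
        self.calls = []

    def evaluate(self, agent, iteration: int, is_last: bool = False) -> None:
        # A real evaluator would run the agent in an environment here.
        self.calls.append((iteration, is_last))

    def close(self) -> None:
        # Release the only "resource" this sketch holds.
        self.calls.clear()


evaluator = RecordingEvaluator(eval_freq=1)
for iteration in range(1, 4):
    evaluator.evaluate(agent=None, iteration=iteration, is_last=(iteration == 3))
recorded = list(evaluator.calls)
evaluator.close()
```

The training loop passes is_last=True on its final call so that an evaluator can treat the last evaluation specially, and calls close() once evaluation is finished.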

class prt_rl.common.evaluators.NumberOfStepsEvaluator(env: EnvironmentInterface, reward_threshold: float, num_episodes: int = 1, logger: Logger | None = None, keep_best: bool = False, eval_freq: int = 1, deterministic: bool = False)

Evaluator that measures how many steps an agent needs to reach a minimum reward threshold. It is intended for cases where an agent is able to achieve the maximum desired reward and you want to compare which agent learns the fastest.

Parameters:
  • env (EnvironmentInterface) – The environment to evaluate the agent in.

  • reward_threshold (float) – The minimum reward threshold to achieve.

  • num_episodes (int) – The number of episodes to run for evaluation.

  • logger (Optional[Logger]) – Logger for evaluation metrics.

  • keep_best (bool) – Whether to keep the best agent based on evaluation performance.

  • eval_freq (int) – Frequency of evaluation in terms of steps, iterations, or optimization steps.

  • deterministic (bool) – Whether to use a deterministic policy during evaluation.
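
One plausible reading of this metric, offered as a sketch rather than as the library's actual implementation, is "the first training iteration at which the mean evaluation reward meets reward_threshold". The helper first_iteration_reaching and the sample histories below are hypothetical and exist only for illustration.

```python
from typing import List, Optional, Tuple


def first_iteration_reaching(
    history: List[Tuple[int, float]], reward_threshold: float
) -> Optional[int]:
    """Return the first iteration whose mean reward meets the threshold.

    history holds (iteration, mean_evaluation_reward) pairs in training order.
    Returns None if the threshold is never reached.
    """
    for iteration, mean_reward in history:
        if mean_reward >= reward_threshold:
            return iteration
    return None


# Made-up evaluation histories for two agents (illustrative data only).
agent_a = [(100, 20.0), (200, 196.0), (300, 200.0)]
agent_b = [(100, 50.0), (200, 90.0), (300, 195.0)]

steps_a = first_iteration_reaching(agent_a, reward_threshold=195.0)
steps_b = first_iteration_reaching(agent_b, reward_threshold=195.0)
```

Under this reading both agents reach the same reward level, but agent_a is ranked higher because it gets there in fewer steps.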

close() -> None

Close the evaluator and release any resources.

evaluate(policy: Policy, iteration: int, is_last: bool = False) -> None

Evaluate the policy’s performance in the given environment based on timesteps.

Parameters:
  • policy – The policy to be evaluated.

  • iteration (int) – The current iteration number.

  • is_last (bool) – Whether this is the last evaluation.

Returns:

None

get_best_policy() -> Policy | None

Get the best policy based on evaluation performance.

Returns:

The best policy if keep_best is True and a best policy exists, otherwise None.

Return type:

Optional[Policy]

class prt_rl.common.evaluators.RewardEvaluator(env: EnvironmentInterface, num_episodes: int = 1, logger: Logger | None = None, keep_best: bool = False, eval_freq: int = 1, deterministic: bool = False)

Evaluators are used to assess the performance of agents or policies.

The eval_freq value must be in the same units as the iteration value passed to the evaluate method. For example, if eval_freq is set in steps, pass num_steps as the iteration value. This ensures that evaluations occur at the correct times.

Parameters:
  • env (EnvironmentInterface) – The environment to evaluate the agent in.

  • num_episodes (int) – The number of episodes to run for evaluation.

  • logger (Optional[Logger]) – Logger for evaluation metrics.

  • keep_best (bool) – Whether to keep the best agent based on evaluation performance.

  • eval_freq (int) – Frequency of evaluation in terms of steps, iterations, or optimization steps.

  • deterministic (bool) – Whether to use a deterministic policy during evaluation.
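
The unit-matching requirement for eval_freq described above can be sketched with a toy training loop. The gating rule in should_evaluate (evaluate when the iteration value is a multiple of eval_freq) is an assumption made for illustration; the library's actual scheduling logic may differ. All names in this block are hypothetical.

```python
def should_evaluate(iteration: int, eval_freq: int) -> bool:
    # Assumed gating rule, not taken from the PRT-RL source.
    return iteration % eval_freq == 0


eval_freq = 500            # expressed in environment steps
steps_per_iteration = 100  # each training iteration collects 100 steps
num_iterations = 10

matched = []     # iteration value passed in steps (same units as eval_freq)
mismatched = []  # iteration value passed as a loop index (wrong units)

num_steps = 0
for index in range(1, num_iterations + 1):
    num_steps += steps_per_iteration
    if should_evaluate(num_steps, eval_freq):
        matched.append(num_steps)
    if should_evaluate(index, eval_freq):
        mismatched.append(index)
```

With matching units the evaluation fires at 500 and 1000 steps. Passing the loop index instead never reaches a multiple of 500 within ten iterations, so no evaluation runs at all.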

close() -> None

Close the evaluator and release any resources.

evaluate(policy: Policy, iteration: int, is_last: bool = False) -> None

Evaluate the policy’s performance in the given environment.

Parameters:
  • policy – The policy to be evaluated.

  • iteration (int) – The current iteration number.

  • is_last (bool) – Whether this is the last evaluation.

get_best_policy() -> Policy | None

Get the best policy based on evaluation performance.

Returns:

The best policy if keep_best is True and a best policy exists, otherwise None.

Return type:

Optional[Policy]
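
The keep_best and get_best_policy behaviour documented above could be implemented along the following lines. This is a sketch under stated assumptions, not the PRT-RL source: BestPolicyTracker, its update method, and the use of copy.deepcopy to snapshot the policy are all hypothetical choices, and a plain dict stands in for a Policy object.

```python
import copy
from typing import Any, Optional


class BestPolicyTracker:
    """Hypothetical helper that keeps a snapshot of the best-scoring policy."""

    def __init__(self, keep_best: bool = False) -> None:
        self.keep_best = keep_best
        self.best_reward = float("-inf")
        self._best_policy: Optional[Any] = None

    def update(self, policy: Any, mean_reward: float) -> None:
        # Snapshot the policy only when tracking is enabled and the score improves.
        if self.keep_best and mean_reward > self.best_reward:
            self.best_reward = mean_reward
            self._best_policy = copy.deepcopy(policy)

    def get_best_policy(self) -> Optional[Any]:
        # Mirrors the documented contract: None unless keep_best is True
        # and a best policy has been recorded.
        return self._best_policy if self.keep_best else None


tracker = BestPolicyTracker(keep_best=True)
policy = {"weights": [0.0]}          # stand-in for a Policy object
tracker.update(policy, mean_reward=10.0)
policy["weights"][0] = 1.0           # training keeps changing the live policy
tracker.update(policy, mean_reward=5.0)
best = tracker.get_best_policy()

disabled = BestPolicyTracker(keep_best=False)
disabled.update(policy, mean_reward=99.0)
```

Copying the policy matters in this sketch because the live policy keeps changing during training; storing only a reference would leave the "best" policy silently tracking later, possibly worse, parameters.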