• Stable Baselines / User Guide / Tensorboard Integration

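    Translated from the Chinese edition of the official Stable Baselines documentation. The translator notes that this is a best-effort translation and welcomes corrections.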

  • Basic Usage

    To use Tensorboard with the RL baselines, you only need to define a log location for the RL agent:

    import gym
    from stable_baselines import A2C
    model = A2C('MlpPolicy', 'CartPole-v1', verbose=1, tensorboard_log="./a2c_cartpole_tensorboard/")
    model.learn(total_timesteps=10000)

    Or after loading an existing model (by default the log path is not saved with the model):

    import gym
    from stable_baselines.common.vec_env import DummyVecEnv
    from stable_baselines import A2C
    env = gym.make('CartPole-v1')
    env = DummyVecEnv([lambda: env])  # The algorithms require a vectorized environment to run
    model = A2C.load("./a2c_cartpole.pkl", env=env, tensorboard_log="./a2c_cartpole_tensorboard/")
    model.learn(total_timesteps=10000)

    Once the learn function has been called, you can monitor the RL agent during or after training with the following command:

    tensorboard --logdir ./a2c_cartpole_tensorboard/

    You can also add past logging folders:

    tensorboard --logdir ./a2c_cartpole_tensorboard/;./ppo2_cartpole_tensorboard/

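    It displays information such as the model graph, the episode reward, the model losses, the observations, and other parameters specific to some models.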

    Tensorboard Integration - Figure 1

    Tensorboard Integration - Figure 2

    Tensorboard Integration - Figure 3

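  • Logging More Values

    Using a callback, you can easily log more values with TensorBoard. Here is a simple example of how to log both an additional tensor and an arbitrary scalar value: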

    ```python
    import tensorflow as tf
    import numpy as np

    from stable_baselines import SAC

    model = SAC("MlpPolicy", "Pendulum-v0", tensorboard_log="/tmp/sac/", verbose=1)
    # Define a new property to avoid global variable
    model.is_tb_set = False


    def callback(locals_, globals_):
        self_ = locals_['self']
        # Log additional tensor
        if not self_.is_tb_set:
            with self_.graph.as_default():
                tf.summary.scalar('value_target', tf.reduce_mean(self_.value_target))
                self_.summary = tf.summary.merge_all()
            self_.is_tb_set = True
        # Log scalar value (here a random variable)
        value = np.random.random()
        summary = tf.Summary(value=[tf.Summary.Value(tag='random_value', simple_value=value)])
        locals_['writer'].add_summary(summary, self_.num_timesteps)
        return True


    model.learn(50000, callback=callback)
    ```
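If the callback writes a summary on every call, the event files can grow quickly. The following is a minimal sketch of one way to throttle it, assuming only the callback contract visible in the example above (two dictionary arguments, returning `True`). The wrapper name `every_n_calls` and the `"step"` key in the demo are hypothetical and not part of Stable Baselines.

```python
def every_n_calls(n, inner):
    """Hypothetical wrapper: forward to `inner` only on every n-th call."""
    state = {"calls": 0}  # kept in a closure, so no global variable is needed

    def callback(locals_, globals_):
        state["calls"] += 1
        if state["calls"] % n == 0:
            return inner(locals_, globals_)
        # On skipped calls, return True just as the logging callback does.
        return True

    return callback


# Stand-alone demonstration with a dummy inner callback (no training involved).
seen = []


def record(locals_, globals_):
    seen.append(locals_["step"])  # "step" exists only in this demo dictionary
    return True


throttled = every_n_calls(3, record)
for step in range(1, 10):
    throttled({"step": step}, {})
```

With a real model, the wrapped callback would be passed in the same way as before, for example `model.learn(50000, callback=every_n_calls(100, callback))`.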

  • Legacy Integration

    All the information displayed in the terminal (the default logging) can also be shown in *tensorboard*. To do so, you need to define several environment variables:

    ```bash
    # formats are comma-separated, but for tensorboard you only need the last one
    # stdout -> terminal
    export OPENAI_LOG_FORMAT='stdout,log,csv,tensorboard'
    export OPENAI_LOGDIR=path/to/tensorboard/data
    ```

    Configure the logger with:

    from stable_baselines.logger import configure
    configure()

    Then start tensorboard:

    tensorboard --logdir=$OPENAI_LOGDIR
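The two environment variables can also be set from inside a Python script instead of the shell. This is a minimal sketch, assuming that `configure()` reads `OPENAI_LOG_FORMAT` and `OPENAI_LOGDIR` from the process environment when it is called, as the steps above imply. The helper name `set_tensorboard_env` is hypothetical.

```python
import os


def set_tensorboard_env(log_dir, formats=("stdout", "log", "csv", "tensorboard")):
    """Hypothetical helper: set the two logger variables for the current process."""
    os.environ["OPENAI_LOG_FORMAT"] = ",".join(formats)
    os.environ["OPENAI_LOGDIR"] = log_dir


set_tensorboard_env("path/to/tensorboard/data")

# Afterwards, with stable_baselines installed, configure the logger as shown above:
# from stable_baselines.logger import configure
# configure()
```

Variables set this way apply only to the running Python process and its children, so the `tensorboard --logdir` command in a separate shell still needs the directory spelled out.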