Project - Reinforcement Learning - How to make a computer learn to play the CartPole game


Defining the play_multiple_episodes function

  • Now let's create another function that relies on play_one_step() to play multiple episodes and return the rewards and gradients for every step of every episode.
INSTRUCTIONS
  • Define the play_multiple_episodes function, which takes env, n_episodes, n_max_steps, model, and loss_fn as input arguments and returns all_rewards and all_grads: the rewards and the gradients for every step of each episode, respectively.

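    def play_multiple_episodes(env, n_episodes, n_max_steps, model, loss_fn):
        # One inner list per episode; each inner list has one entry per step.
        all_rewards = []
        all_grads = []
        for episode in range(n_episodes):
            current_rewards = []
            current_grads = []
            # Start a new episode (this assumes a Gym API where reset()
            # returns only the initial observation).
            obs = env.reset()
            for step in range(n_max_steps):
                # Play one step, using the latest observation as input.
                obs, reward, done, grads = play_one_step(env, obs, model, loss_fn)
                current_rewards.append(reward)
                current_grads.append(grads)
                # Stop early if the episode ended before n_max_steps.
                if done:
                    break
            all_rewards.append(current_rewards)
            all_grads.append(current_grads)
        return all_rewards, all_grads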
    
    • As we can see, the function collects one list of rewards and one list of gradients per episode. Each reward list holds the reward of every step played in that episode, and each gradient list holds the gradients computed at every step of that episode.

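    • In each iteration we play one step, get the new observation, the reward, and the gradients for that step, and store the reward and the gradients. The new observation is then used to play the next step. This continues until the episode ends (done is True) or n_max_steps steps have been played, and the whole process is repeated for each of the n_episodes episodes.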
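To see the shape of the returned lists without training anything, here is a minimal sketch that runs the function on stand-ins. ToyEnv and this simplified play_one_step are hypothetical placeholders, not the CartPole environment or the play_one_step() defined earlier in the project: the toy environment just ends each episode after a preset number of steps, and the "gradients" are dummy values.

```python
class ToyEnv:
    """Hypothetical stand-in for a Gym environment (reset() returns only obs)."""

    def __init__(self, episode_lengths):
        # Number of steps after which each successive episode reports done.
        self._lengths = iter(episode_lengths)
        self._remaining = 0

    def reset(self):
        self._remaining = next(self._lengths)
        return 0  # dummy observation

    def step(self, action):
        self._remaining -= 1
        done = self._remaining == 0
        return 0, 1.0, done, {}  # obs, reward, done, info


def play_one_step(env, obs, model, loss_fn):
    # Placeholder only: no model is called and no real gradients are computed.
    obs, reward, done, info = env.step(0)
    grads = [0.0]  # dummy value standing in for the per-step gradients
    return obs, reward, done, grads


def play_multiple_episodes(env, n_episodes, n_max_steps, model, loss_fn):
    all_rewards = []
    all_grads = []
    for episode in range(n_episodes):
        current_rewards = []
        current_grads = []
        obs = env.reset()
        for step in range(n_max_steps):
            obs, reward, done, grads = play_one_step(env, obs, model, loss_fn)
            current_rewards.append(reward)
            current_grads.append(grads)
            if done:
                break
        all_rewards.append(current_rewards)
        all_grads.append(current_grads)
    return all_rewards, all_grads


# Episodes would naturally last 3, 5, and 2 steps, but n_max_steps caps them at 4.
env = ToyEnv([3, 5, 2])
all_rewards, all_grads = play_multiple_episodes(
    env, n_episodes=3, n_max_steps=4, model=None, loss_fn=None)

episode_lengths = [len(rewards) for rewards in all_rewards]
print(episode_lengths)  # [3, 4, 2]
```

The inner lists have different lengths: the first and third episodes stop as soon as done is True, while the second is cut off by n_max_steps. all_grads has the same nested structure as all_rewards, with one entry per step played.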
