Project - Reinforcement Learning - How to make computer learn to play CartPole game

Defining play_multiple_episodes function

Now let's create another function that relies on the play_one_step() function to play multiple episodes. It returns all the rewards and gradients, for each episode and each step.

INSTRUCTIONS

Define the play_multiple_episodes function, which takes env, n_episodes, n_max_steps, model, and loss_fn as input arguments and returns the rewards and gradients for every step of every episode as all_rewards and all_grads respectively.
```
def play_multiple_episodes(env, n_episodes, n_max_steps, model, loss_fn):
    # One entry per episode; each entry is a list with one item per step.
    all_rewards = []
    all_grads = []
    for episode in range(n_episodes):
        current_rewards = []
        current_grads = []
        obs = env.reset()  # start each episode from a fresh observation
        for step in range(n_max_steps):
            # The returned observation is fed into the next call.
            obs, reward, done, grads = play_one_step(env, obs, model, loss_fn)
            current_rewards.append(reward)
            current_grads.append(grads)
            if done:  # the episode ended before reaching n_max_steps
                break
        all_rewards.append(current_rewards)
        all_grads.append(current_grads)
    return all_rewards, all_grads
```
- The function collects one list of rewards and one list of gradients per episode. Each per-episode list holds one entry per step played in that episode, so all_rewards and all_grads are lists of lists.
- In each episode we play one step, get the observation, reward, and gradients for that step, and store the reward and gradients. The new observation is then used to play the next step. This continues until the episode ends (done is True) or n_max_steps steps have been played.
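To make the shape of the returned lists concrete, here is a minimal sketch that runs the same loop against stand-ins. StubEnv and the stub play_one_step below are hypothetical placeholders introduced only for illustration. They are not the CartPole environment or the play_one_step defined earlier in this project, and the "gradients" are plain strings.

```python
class StubEnv:
    """Hypothetical stand-in environment: every episode ends after `length` steps."""

    def __init__(self, length):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # dummy initial observation

    def step(self):
        self.t += 1
        done = self.t >= self.length
        return self.t, 1.0, done  # observation, reward, done


def play_one_step(env, obs, model, loss_fn):
    # Stub (assumption): ignores model and loss_fn, returns a placeholder gradient.
    obs, reward, done = env.step()
    grads = ["grad@step%d" % obs]
    return obs, reward, done, grads


def play_multiple_episodes(env, n_episodes, n_max_steps, model, loss_fn):
    all_rewards = []
    all_grads = []
    for episode in range(n_episodes):
        current_rewards = []
        current_grads = []
        obs = env.reset()
        for step in range(n_max_steps):
            obs, reward, done, grads = play_one_step(env, obs, model, loss_fn)
            current_rewards.append(reward)
            current_grads.append(grads)
            if done:
                break
        all_rewards.append(current_rewards)
        all_grads.append(current_grads)
    return all_rewards, all_grads


env = StubEnv(length=3)

# Episodes end on their own after 3 steps, well before n_max_steps.
all_rewards, all_grads = play_multiple_episodes(env, 2, 10, None, None)
print(all_rewards)                        # [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
print(len(all_grads), len(all_grads[0]))  # 2 3

# With n_max_steps=2 the inner loop stops first, so each episode has 2 steps.
capped_rewards, capped_grads = play_multiple_episodes(env, 2, 2, None, None)
print(capped_rewards)                     # [[1.0, 1.0], [1.0, 1.0]]
```

The outer index selects the episode and the inner index selects the step, so all_rewards[i][j] and all_grads[i][j] refer to the same step j of episode i.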
