Next, we define the play_multiple_episodes function, which relies on the play_one_step() function to play multiple episodes and return all the rewards and gradients for each episode and each step. It takes env, n_episodes, n_max_steps, model, and loss_fn as input arguments and returns the rewards and gradients for all the steps in each episode as all_rewards and all_grads, respectively.
def play_multiple_episodes(env, n_episodes, n_max_steps, model, loss_fn):
    all_rewards = []    # per-episode lists of rewards
    all_grads = []      # per-episode lists of gradients
    for episode in range(n_episodes):
        current_rewards = []    # rewards for each step of this episode
        current_grads = []      # gradients for each step of this episode
        obs = env.reset()
        for step in range(n_max_steps):
            obs, reward, done, grads = play_one_step(env, obs, model, loss_fn)
            current_rewards.append(reward)
            current_grads.append(grads)
            if done:
                break
        all_rewards.append(current_rewards)
        all_grads.append(current_grads)
    return all_rewards, all_grads
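As a quick usage sketch, you might call it like this. The environment name, model architecture, and loss function below are illustrative assumptions, not part of the exercise itself:

import gym
import tensorflow as tf
from tensorflow import keras

# Hypothetical setup: a CartPole policy network with one output (probability of action 0)
env = gym.make("CartPole-v1")
model = keras.Sequential([
    keras.layers.Dense(5, activation="elu", input_shape=[4]),
    keras.layers.Dense(1, activation="sigmoid"),
])
loss_fn = keras.losses.binary_crossentropy

# Play 10 episodes of at most 200 steps each
all_rewards, all_grads = play_multiple_episodes(env, 10, 200, model, loss_fn)
print(len(all_rewards))     # number of episodes played
print(len(all_rewards[0]))  # number of steps in the first episode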
As you can see, this function collects the rewards and gradients for every episode: all_rewards is a list containing one reward list per episode (one reward per step), and all_grads has the same structure for the gradients.
At each step we call play_one_step to get the new observation, the reward, the done flag, and the gradients for that step, and we store the reward and gradients. The next step is then played from the new observation, and this continues until the episode ends or n_max_steps is reached.
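For reference, here is a minimal sketch of what the play_one_step function used above might look like, following the standard policy-gradient pattern for CartPole (a single-output policy sampled at random and trained with binary cross-entropy); the exact version from the earlier exercise may differ:

import numpy as np
import tensorflow as tf

def play_one_step(env, obs, model, loss_fn):
    with tf.GradientTape() as tape:
        left_proba = model(obs[np.newaxis])                 # probability of action 0 (left)
        action = (tf.random.uniform([1, 1]) > left_proba)   # sample an action at random
        y_target = tf.constant([[1.]]) - tf.cast(action, tf.float32)
        loss = tf.reduce_mean(loss_fn(y_target, left_proba))
    # Gradients computed as if the chosen action were the correct one
    grads = tape.gradient(loss, model.trainable_variables)
    obs, reward, done, info = env.step(int(action[0, 0].numpy()))
    return obs, reward, done, grads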