Use the play_one_step() function to play multiple episodes, returning all the rewards and gradients for each episode and each step.
Define the play_multiple_episodes function, which takes env, n_episodes, n_max_steps, model, and loss_fn as input arguments and returns the rewards and gradients for every step of every episode as all_rewards and all_grads, respectively.
def play_multiple_episodes(env, n_episodes, n_max_steps, model, loss_fn):
    all_rewards = []  # one list of step rewards per episode
    all_grads = []    # one list of step gradients per episode
    for episode in range(n_episodes):
        current_rewards = []
        current_grads = []
        obs = env.reset()
        for step in range(n_max_steps):
            obs, reward, done, grads = play_one_step(env, obs, model, loss_fn)
            current_rewards.append(reward)
            current_grads.append(grads)
            if done:  # the episode ended before n_max_steps
                break
        all_rewards.append(current_rewards)
        all_grads.append(current_grads)
    return all_rewards, all_grads
As the code shows, we collect one list of rewards and one list of gradients per episode. Each episode's reward list holds the reward for every step of that episode, and the same holds for the gradients.
We play one step, get the observation, reward, and gradients for that step, and store the reward and gradients. The observation returned by that step is then used to play the next step. This continues until the episode ends (done is true) or n_max_steps steps have been played, and the whole process repeats for each of the n_episodes episodes.
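To make the shape of the returned lists concrete, here is a minimal sketch that runs the same loop without Gym or TensorFlow. CountdownEnv and the stubbed play_one_step below are hypothetical stand-ins invented for this illustration; the real play_one_step runs the model and computes actual gradients, so treat this only as a picture of the nesting, not of the training logic.

```python
class CountdownEnv:
    """Hypothetical toy environment: every episode ends after `length` steps."""

    def __init__(self, length):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # dummy initial observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return self.t, 1.0, done, {}  # obs, reward, done, info


def play_one_step(env, obs, model, loss_fn):
    # Stub with the same signature as in the lesson (assumption: the real
    # version picks an action with the model and returns real gradients).
    obs, reward, done, _ = env.step(0)
    grads = ["grad-at-step-%d" % obs]  # placeholder instead of tensors
    return obs, reward, done, grads


def play_multiple_episodes(env, n_episodes, n_max_steps, model, loss_fn):
    all_rewards = []
    all_grads = []
    for episode in range(n_episodes):
        current_rewards = []
        current_grads = []
        obs = env.reset()
        for step in range(n_max_steps):
            obs, reward, done, grads = play_one_step(env, obs, model, loss_fn)
            current_rewards.append(reward)
            current_grads.append(grads)
            if done:
                break
        all_rewards.append(current_rewards)
        all_grads.append(current_grads)
    return all_rewards, all_grads


env = CountdownEnv(length=3)

# Episodes finish on their own after 3 steps, well before n_max_steps.
all_rewards, all_grads = play_multiple_episodes(
    env, n_episodes=2, n_max_steps=10, model=None, loss_fn=None)
print(len(all_rewards))               # number of episodes
print([len(r) for r in all_rewards])  # steps played in each episode

# With a smaller step cap, n_max_steps cuts each episode short instead.
capped_rewards, capped_grads = play_multiple_episodes(
    env, n_episodes=2, n_max_steps=2, model=None, loss_fn=None)
print([len(r) for r in capped_rewards])
```

Both outer lists have one entry per episode, and each inner list has one entry per step actually played, so all_rewards[i][j] and all_grads[i][j] always refer to the same step j of episode i.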