Project - Reinforcement Learning - How to make a computer learn to play the CartPole game


Building the Neural Network

  • The idea is to feed all the values of the previous observation (the one returned after the last step) as input to the neural network, and have the network output the action for the current step.

  • The observation is nothing but the full game state at the end of the previous step: cart position, cart velocity, pole angle and pole angular velocity. So let us see whether using the full previous game state makes the agent play smarter.

  • The neural network estimates the probability of each action from the input observation of the previous state. We then choose the action at random according to that probability, by comparing it with a random threshold value.
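A minimal sketch of this sampling rule in plain Python, assuming the network has already produced `p_left`, the probability of action 0 (left). The helper name `sample_action` and the example value 0.7 are hypothetical. A uniform random threshold in [0, 1) is drawn; if it falls below `p_left` the agent goes left, otherwise right, so left is chosen with probability `p_left`.

```python
import random

def sample_action(p_left: float, rng: random.Random) -> int:
    """Return 0 (left) with probability p_left, else 1 (right).

    Hypothetical helper: draws a uniform threshold in [0, 1) and compares
    it with the network's output probability for action 0.
    """
    threshold = rng.random()          # uniform in [0.0, 1.0)
    return 0 if threshold < p_left else 1

rng = random.Random(42)               # seeded so the run is reproducible
p_left = 0.7                          # example network output (assumed value)
actions = [sample_action(p_left, rng) for _ in range(10_000)]
left_share = actions.count(0) / len(actions)
print(f"chose left {left_share:.1%} of the time")
```

Unlike always taking the most probable action, which would mean going left every time when `p_left = 0.7`, this rule still goes right roughly 30% of the time.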

  • You may wonder why we pick a random action based on the probability given by the policy network, rather than just picking the action with the highest probability. The intention is to let the agent find the right balance between exploring new actions and exploiting the actions that are known to work well. Here's an analogy: suppose you go to a restaurant for the first time, and all the dishes look equally appealing, so you pick one at random. If it turns out to be good, you can increase the probability of ordering it next time, but you shouldn't increase that probability to 100%, or else you will never try the other dishes, some of which may be even better than the one you tried.

  • In the case of the CartPole environment, there are just two possible actions (left or right), so we only need one output neuron: it outputs the probability p of action 0 (left), and the probability of action 1 (right) is then 1 - p.

  • Hence the last layer (output layer) of the neural network has only one neuron, with a sigmoid activation so that its output p always lies between 0 and 1.


  • tf.keras.layers.Dense: just your regular densely-connected (fully connected) neural network layer.

  • keras.backend.clear_session(): resets all state generated by Keras.

  • Why do we need clear_session?

    It is useful when you're creating multiple models in succession, such as during hyperparameter search or cross-validation. Each model you build adds nodes (potentially thousands of them) to the graph. Over time, models become slower and slower to train, and you may also run out of memory. Clearing the session removes all the nodes left over from previous models, freeing memory and preventing the slowdown.

  • Let us first clear the session:

  • Set the TensorFlow (tf) and NumPy (np) random seeds so that the results are reproducible:
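The code for these two steps is not shown here. In Keras they are typically `keras.backend.clear_session()`, `tf.random.set_seed(...)` and `np.random.seed(...)`; the seed value 42 below is an assumption. This sketch runs only the NumPy part, to show why a fixed seed matters: re-seeding reproduces exactly the same random thresholds, and therefore the same sampled actions.

```python
import numpy as np

# In the Keras setup these calls would sit alongside:
#   keras.backend.clear_session()
#   tf.random.set_seed(42)
# They are not run here, so this sketch needs only NumPy.

def draw_thresholds(seed: int, n: int = 5) -> np.ndarray:
    """Seed NumPy's global generator, then draw n uniform values in [0, 1)."""
    np.random.seed(seed)
    return np.random.rand(n)

first = draw_thresholds(42)
second = draw_thresholds(42)
print(np.array_equal(first, second))  # → True: same seed, same draws
```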

  • Let us set the number of inputs to 4, since we feed all four values of the observation obs:

     n_inputs = 4

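For reference, a CartPole observation holds four floats (cart position, cart velocity, pole angle, pole angular velocity), which is why `n_inputs = 4`. Keras layers expect a leading batch dimension, so a single observation is usually reshaped to `(1, n_inputs)` before it is passed to the model. The observation values below are made up for illustration.

```python
import numpy as np

n_inputs = 4

# Hypothetical single CartPole observation:
# [cart position, cart velocity, pole angle, pole angular velocity]
obs = np.array([0.02, -0.15, 0.03, 0.27], dtype=np.float32)

# Add a leading batch axis: shape (4,) -> (1, 4), a batch of one observation
batch = obs[np.newaxis, :]
print(obs.shape, batch.shape)  # → (4,) (1, 4)
```

A batch of this shape is what a Keras model's `predict` method accepts.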
  • Now, let us build the neural network as follows:

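    from tensorflow import keras

    model = keras.models.Sequential([
        keras.layers.Dense(5, activation="elu", input_shape=[n_inputs]),  # hidden layer: 5 ELU units
        keras.layers.Dense(1, activation="sigmoid"),  # output: probability p of action 0 (left)
    ])
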
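To make the architecture concrete, here is a NumPy sketch of what this two-layer model computes for one observation. The weights are random placeholders (Keras initializes its own), so the printed value itself means nothing; what matters are the shapes and the two activations: ELU in the hidden layer and a sigmoid that squashes the single output into the probability p of action 0 (left).

```python
import numpy as np

def elu(x: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """ELU: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def sigmoid(x: np.ndarray) -> np.ndarray:
    """Logistic sigmoid: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

n_inputs, n_hidden = 4, 5
rng = np.random.default_rng(42)

# Placeholder parameters with the same shapes the Keras layers would create
W1 = rng.normal(scale=0.5, size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, 1))
b2 = np.zeros(1)

def forward(batch: np.ndarray) -> np.ndarray:
    """Dense(5, elu) -> Dense(1, sigmoid); returns p(left) for each row."""
    hidden = elu(batch @ W1 + b1)     # shape (batch_size, 5)
    return sigmoid(hidden @ W2 + b2)  # shape (batch_size, 1)

batch = np.array([[0.02, -0.15, 0.03, 0.27]])  # one made-up observation
p_left = forward(batch)
print(p_left.shape, float(p_left[0, 0]))
```

The probability of action 1 (right) is simply `1 - p_left`, which is why one output neuron covers both actions.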
  • Let us view the summary of the model using model.summary().
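`model.summary()` lists each layer with its output shape and parameter count. As a cross-check, a Dense layer has `inputs × units` weights plus `units` biases, so this model should report 4 × 5 + 5 = 25 parameters for the hidden layer, 5 × 1 + 1 = 6 for the output layer, and 31 in total. The hypothetical helper `dense_params` below does that arithmetic:

```python
def dense_params(n_in: int, units: int) -> int:
    """Trainable parameters of a Dense layer: weight matrix plus bias vector."""
    return n_in * units + units

layer_sizes = [4, 5, 1]  # n_inputs -> hidden units -> output neuron
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer, sum(per_layer))  # → [25, 6] 31
```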
