We will be using OpenAI gym, a great toolkit for developing and comparing Reinforcement Learning algorithms.
OpenAI gym provides many environments for our learning agents to interact with.
The algorithm used to solve a Reinforcement Learning problem is represented by an Agent.
An environment represents the task or problem to be solved.
An environment is essentially a class that exposes a few functions we can call, chiefly reset and step.
The reset function initializes (or resets) the environment, as if we were starting from the very beginning. We use it as follows:
obs = env.reset()
Returns:
observation - the current state of the game, after a step is performed or after it is reset. Observations are environment-dependent values. For the CartPole game, it is a 1D NumPy array composed of 4 floats:
1. the horizontal position of the cart
2. the velocity of the cart
3. the angle of the pole
   a. 0 means the pole is vertical
   b. a positive value (i.e., > 0) means that the pole is slanting towards the right
   c. a negative value (i.e., < 0) means that the pole is slanting towards the left
4. the angular velocity of the pole
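The layout of the CartPole observation described above can be illustrated with a small helper. This is only a sketch: describe_observation is a hypothetical name, not part of gym, and the sample values are made up rather than taken from a real environment.

```python
def describe_observation(obs):
    """Label the four CartPole observation values in the order described above."""
    # Works for a list, tuple, or 1D NumPy array of length 4.
    cart_position, cart_velocity, pole_angle, pole_angular_velocity = obs

    # Interpret the sign of the pole angle.
    if pole_angle > 0:
        lean = "right"
    elif pole_angle < 0:
        lean = "left"
    else:
        lean = "vertical"

    return {
        "cart_position": cart_position,
        "cart_velocity": cart_velocity,
        "pole_angle": pole_angle,
        "pole_angular_velocity": pole_angular_velocity,
        "pole_lean": lean,
    }


# Made-up sample values, not output from a real environment.
sample_obs = [0.01, -0.02, 0.03, 0.04]
print(describe_observation(sample_obs)["pole_lean"])  # right
```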
The step function performs one step: it takes an action as input and returns four values. We use it as follows:
obs, reward, done, info = env.step(action)
Input argument:
action - a number denoting which action to perform. For example, in the CartPole game, action=0 means push the cart to the left, and action=1 means push it to the right.
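To make the action values concrete, here is a minimal sketch of a hand-written rule that picks an action from an observation. The rule itself (push towards the side the pole is leaning) and the names LEFT, RIGHT and basic_policy are assumptions made for illustration; they are not provided by gym.

```python
LEFT = 0   # action=0 in the CartPole game
RIGHT = 1  # action=1 in the CartPole game


def basic_policy(obs):
    """Hypothetical rule: push the cart towards the side the pole is leaning."""
    pole_angle = obs[2]  # third value of the CartPole observation
    return LEFT if pole_angle < 0 else RIGHT


# Made-up observations, used only to show the mapping.
print(basic_policy([0.0, 0.0, -0.05, 0.0]))  # 0
print(basic_policy([0.0, 0.0, 0.05, 0.0]))   # 1
```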
Returns:
observation - the current state of the game, after a step is performed or after it is reset. Observations are environment-dependent values. For the CartPole game, it is a 1D NumPy array composed of 4 floats:
1. the horizontal position of the cart
2. the velocity of the cart
3. the angle of the pole
   a. 0 means the pole is vertical
   b. a positive value (i.e., > 0) means that the pole is slanting towards the right
   c. a negative value (i.e., < 0) means that the pole is slanting towards the left
4. the angular velocity of the pole
reward - the reward the agent received for the step it just performed.
done - a boolean which is True at the end of the episode, else False. The sequence of steps between the moment the environment is reset and the moment it is done is called an "episode". An episode ends when the pole tilts too much, when the cart goes off the screen, or when the maximum number of steps for an episode is reached (in this last case, you have won).
info - an environment-specific dictionary that can provide extra information useful for debugging or for training.
Let's start by importing gym:
import gym
Let's list all the available environments in OpenAI gym:
gym.envs.registry.all()
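The reset and step functions described above are typically combined into an episode loop. The sketch below shows that pattern against a toy class so that it runs without gym installed. ToyEnv, run_episode and always_right are hypothetical names; ToyEnv is not Gym's CartPole and only mimics the shape of the return values (an observation from reset, and four values from step).

```python
class ToyEnv:
    """Hypothetical stand-in with the reset/step interface described above.

    This is NOT Gym's CartPole; there is no physics here.
    """

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.steps_taken = 0

    def reset(self):
        # Start a new episode and return the initial observation.
        self.steps_taken = 0
        return [0.0, 0.0, 0.0, 0.0]

    def step(self, action):
        # Advance one step; the observation is a placeholder value.
        self.steps_taken += 1
        obs = [0.0, 0.0, 0.01 * self.steps_taken, 0.0]
        reward = 1.0
        done = self.steps_taken >= self.max_steps  # episode ends at the step limit
        info = {}
        return obs, reward, done, info


def run_episode(env, policy):
    """Run one episode (reset until done) and return the total reward."""
    obs = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(obs)
        obs, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward


def always_right(obs):
    return 1  # action=1 in the CartPole convention


print(run_episode(ToyEnv(max_steps=5), always_right))  # 5.0
```

With a Gym release that follows the four-value step signature shown earlier, the same loop should work with a real environment in place of ToyEnv.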