
# Introduction to OpenAI gym environment

• We will be using OpenAI gym, a great toolkit for developing and comparing Reinforcement Learning algorithms.

• OpenAI gym provides many environments for our learning agents to interact with.

• The algorithm used to solve a Reinforcement Learning problem is represented by an Agent.

• We can think of an environment as the task or problem to be solved.

• An environment is a class that exposes a few functions (methods) the agent uses to interact with it. The two main ones, `reset` and `step`, are described below.
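To make the "environment is a class" idea concrete, here is a minimal sketch of a toy environment that follows the same convention as this lesson: `reset()` returns the initial observation and `step(action)` returns `(observation, reward, done, info)`. `CorridorEnv` and its rules are hypothetical, invented for illustration; it is not part of Gym.

```python
class CorridorEnv:
    """Hypothetical toy environment using the classic Gym-style interface.

    The agent stands in a corridor of cells 0..length-1 and starts in the middle.
    Action 0 moves one cell left, action 1 moves one cell right.
    The episode ends when the agent reaches either end of the corridor.
    """

    def __init__(self, length=5):
        self.length = length
        self.position = None

    def reset(self):
        # Start every episode from the middle cell and return the initial observation.
        self.position = self.length // 2
        return self.position

    def step(self, action):
        if action not in (0, 1):
            raise ValueError("action must be 0 (left) or 1 (right)")
        # Move left for action 0, right for action 1.
        self.position += 1 if action == 1 else -1
        # The episode is over once the agent hits either end.
        done = self.position in (0, self.length - 1)
        # Reward 1.0 only for reaching the right end (an arbitrary toy rule).
        reward = 1.0 if self.position == self.length - 1 else 0.0
        info = {}  # no extra information in this toy example
        return self.position, reward, done, info


env = CorridorEnv()
obs = env.reset()                          # obs == 2
obs, reward, done, info = env.step(1)      # (3, 0.0, False, {})
obs, reward, done, info = env.step(1)      # (4, 1.0, True, {})
```

Real Gym environments are richer (observation and action spaces, rendering, seeding), but an agent talks to them through the same two calls.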

• The `reset` function initializes (or resets) the environment, as if we were starting it from the very beginning. We use it as follows:

```python
obs = env.reset()  # start a new episode and get the initial observation
```

Returns:

• `observation` - the current state of the game, after a step is performed or after it is reset. Observations are environment-dependent values. For the CartPole game, it is a 1D NumPy array of 4 floats:
1. horizontal position of the cart

2. velocity of the cart

3. angle of the pole

a. `0` means the pole is vertical.

b. A positive (i.e., > 0) value means the pole is slanting towards the right.

c. A negative (i.e., < 0) value means the pole is slanting towards the left.

4. angular velocity of the pole
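A small helper can make the four CartPole observation values easier to read. This is a sketch: the field names are our own labels for the order listed above (they are not defined by Gym), and the sample values are made up rather than produced by a real environment. The pole angle is in radians.

```python
from collections import namedtuple

# Our own labels for the 4 CartPole observation values, in the order listed above.
CartPoleObs = namedtuple(
    "CartPoleObs",
    ["cart_position", "cart_velocity", "pole_angle", "pole_angular_velocity"],
)


def describe_observation(obs):
    """Unpack a 4-value CartPole observation and report which way the pole leans."""
    o = CartPoleObs(*(float(v) for v in obs))  # also works for a NumPy array
    if o.pole_angle > 0:
        lean = "right"
    elif o.pole_angle < 0:
        lean = "left"
    else:
        lean = "vertical"
    return o, lean


# Made-up example values, not taken from a real episode.
sample = [0.01, -0.2, 0.03, 0.25]
fields, lean = describe_observation(sample)
print(fields.pole_angle, lean)  # 0.03 right
```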

• The `step` function performs one step: it takes an `action` and returns four values. We use it as follows:

```python
obs, reward, done, info = env.step(action)  # apply the action and observe the result
```

Input argument:

• `action` - an integer denoting which action to perform. For example, in the CartPole game, `action=0` pushes the cart to the left and `action=1` pushes it to the right.
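Combining the action encoding with the pole angle gives a very simple hand-written policy: push the cart towards the side the pole is leaning, so the cart moves back under it. This is a sketch of a fixed rule, not a learned agent, and `basic_policy` is a name chosen here for illustration.

```python
def basic_policy(obs):
    """Return a CartPole action from a 4-value observation.

    Index 2 of the observation is the pole angle.
    Returns 1 (push right) if the pole slants right, otherwise 0 (push left).
    """
    angle = obs[2]
    # Positive angle: pole slants right, so push the cart right to get under it.
    return 1 if angle > 0 else 0


print(basic_policy([0.0, 0.0, 0.05, 0.0]))   # 1
print(basic_policy([0.0, 0.0, -0.02, 0.0]))  # 0
```

Such a rule keeps the pole up for a while but tends to let the cart drift, which is the kind of limitation a learning agent is meant to overcome.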

Returns:

• `observation` - the new state of the game after the action has been performed, in the same format as the observation returned by `reset` (for CartPole, the array of 4 floats described above).

• `reward` - the reward the agent received for the action it just took.

• `done` - a boolean that is `True` when the episode has ended and `False` otherwise. An "episode" is the sequence of steps from the moment the environment is reset until it is done. In CartPole, this happens when the pole tilts too much, when the cart goes off the screen, or after the maximum number of steps (200 for `CartPole-v0`, 500 for `CartPole-v1`); in this last case, you have won.

• `info` - an environment-specific dictionary that may provide extra information useful for debugging or training.
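Putting `reset`, `step` and `done` together gives the usual episode loop. The sketch below assumes the 4-value `step` API described above; as a hedge for newer releases (Gym 0.26 and later, and Gymnasium), it also accepts `reset()` returning `(observation, info)` and `step()` returning `(observation, reward, terminated, truncated, info)`, treating the episode as done when either flag is set. `run_episode` and the `FixedLengthEnv` stub are hypothetical names used only for illustration; the stub mimics CartPole's reward of 1 per step so the example runs without Gym installed.

```python
def run_episode(env, policy, max_steps=1000):
    """Play one episode and return (total_reward, steps_taken)."""
    obs = env.reset()
    # Newer API: reset() returns (observation, info) instead of just the observation.
    if isinstance(obs, tuple) and len(obs) == 2 and isinstance(obs[1], dict):
        obs = obs[0]
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        result = env.step(policy(obs))
        if len(result) == 5:
            # Newer API: the episode ends when it terminates or is truncated.
            obs, reward, terminated, truncated, info = result
            done = terminated or truncated
        else:
            # Classic API used in this lesson.
            obs, reward, done, info = result
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps


class FixedLengthEnv:
    """Hypothetical stub: reward 1 per step, episode ends after `length` steps."""

    def __init__(self, length=3):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.length, {}


print(run_episode(FixedLengthEnv(length=3), policy=lambda obs: 0))  # (3.0, 3)
```

With Gym installed, the same function could be called with an environment created by `gym.make(...)` and any function that maps an observation to `0` or `1`. Since CartPole gives a reward of 1 for every step, the total reward equals the number of steps the pole stayed up.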
INSTRUCTIONS
• Let's start by importing `gym`:

```python
import gym
```
• Let's list all the available environments in OpenAI gym:

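```python
# Older Gym releases; in recent releases `gym.envs.registry` is a dict, so use gym.envs.registry.keys()
gym.envs.registry.all()
```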
