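Deep Q-learning CartPole TensorFlow code example

Example: the CartPole DQN reward maxes out at 200 because each episode is cut off by a step limit. The limit can be raised by overriding _max_episode_steps (shown here on MountainCar-v0):

import gym

env = gym.envs.make("MountainCar-v0")
env._max_episode_steps = 4000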
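
To see why the episode return tops out at the step limit, and what overriding _max_episode_steps changes, here is a minimal sketch that does not use Gym itself. AlwaysAliveEnv, StepLimit and episode_return are hypothetical names made up for this illustration; the toy environment assumes a reward of +1 per step and the classic 4-tuple step API, and it only mimics the idea of a time-limit wrapper rather than reproducing Gym's own implementation.

```python
class AlwaysAliveEnv:
    """Hypothetical stand-in environment: pays +1 per step and never fails."""

    def reset(self):
        return 0

    def step(self, action):
        # (observation, reward, done, info), the classic 4-tuple step API
        return 0, 1.0, False, {}


class StepLimit:
    """Hypothetical wrapper that ends an episode after _max_episode_steps steps."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self._max_episode_steps = max_episode_steps
        self._elapsed = 0

    def reset(self):
        self._elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._elapsed += 1
        # Force the episode to end once the step budget is used up.
        if self._elapsed >= self._max_episode_steps:
            done = True
        return obs, reward, done, info


def episode_return(env):
    """Run one episode with a fixed action and sum the rewards."""
    env.reset()
    total, done = 0.0, False
    while not done:
        _, reward, done, _ = env.step(0)
        total += reward
    return total


env = StepLimit(AlwaysAliveEnv(), max_episode_steps=200)
capped = episode_return(env)    # return is bounded by the 200-step limit

env._max_episode_steps = 4000   # same override as in the snippet above
raised = episode_return(env)    # the bound moves with the limit
```

With +1 reward per step, the best possible return equals the step limit, so a perfect agent cannot score above 200 until the limit is raised.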