CartPole | Various DQN methods with CartPole
kandi X-RAY | CartPole Summary
Top functions reviewed by kandi - BETA
- Build the model
- Sample noise
- Train the model
- Return the action corresponding to the given state
- Add an experience to the replay memory
- Load a saved model
- Save the model weights
- Update the target network weights
CartPole Key Features
CartPole Examples and Code Snippets
import gym, torch, numpy as np, torch.nn as nn
from torch.utils.tensorboard import SummaryWriter
import tianshou as ts

task = 'CartPole-v0'
lr, epoch, batch_size = 1e-3, 10, 64        # learning rate, training epochs, minibatch size
train_num, test_num = 10, 100               # number of training / test environments
gamma, n_step, target_freq = 0.9, 3, 320    # discount factor, n-step return, target network update frequency
def main():
    env = gym.make('CartPole-v0')
    D = env.observation_space.shape[0]   # observation dimension (4 for CartPole)
    K = env.action_space.n               # number of discrete actions (2 for CartPole)
    pmodel = PolicyModel(D, K, [])       # policy network (defined elsewhere in the repo)
    vmodel = ValueModel(D, [10])         # value network with one hidden layer of 10 units
    init = tf.global_variables_initializer()
    session = tf.InteractiveSession()
    gamma = 0.99
    if 'monitor' in sys.argv:
        filename = os.path.basename(__file__)
Community Discussions
Trending Discussions on CartPole
QUESTION
I am working on a DQN training model for the game "CartPole-v1". The program did not report any errors in the terminal; however, the evaluation results got worse. This is the output data:
ANSWER
Answered 2022-Apr-17 at 15:08 Check out the code. For the most part it's the same as in the snippet above, but there are some changes:
- for steps in the replay buffer (which is called memory_store in the code) a namedtuple is used, so in the update it's much easier to read t.reward than to work out what every index of a step t means (see the sketch below)
- class DQN has a method update; it's better to keep the optimizer as an attribute of the class than to create it every time backprbgt is called
- the usage of torch.autograd.Variable here is unnecessary, so it was also taken away
- the update in backprbgt is taken per batch
- the hidden layer size was decreased from 360 to 32, while the batch size was increased from 40 to 128
- the network is updated once every 10 episodes, but on 10 batches from the replay buffer
- the average score is printed every 50 episodes, based on the last 10 episodes
- seeds were added
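For illustration, here is a minimal sketch of the namedtuple-based replay buffer described above; the names Transition, ReplayBuffer, and the field order are assumptions, not the linked code.

import random
from collections import deque, namedtuple

# Hypothetical names; the actual code in the linked answer may differ.
Transition = namedtuple('Transition', ('state', 'action', 'reward', 'next_state', 'done'))

class ReplayBuffer:
    def __init__(self, capacity=10000):
        self.memory = deque(maxlen=capacity)

    def memory_store(self, *args):
        # Store each step as a namedtuple so later code can read t.reward
        # instead of guessing what t[2] means.
        self.memory.append(Transition(*args))

    def sample(self, batch_size):
        return random.sample(self.memory, batch_size)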
Also, in RL it takes a long time to learn anything, so hoping that after 100 episodes it will be close to even 100 points is somewhat optimistic. For the code in the link, averaging over 5 runs gives the following dynamics:
X axis -- number of episodes (yes, 70K, but that is about 20 minutes of real time)
Y axis -- number of steps per episode
As can be seen, after 70K episodes the algorithm achieves a reward comparable to the highest possible in this environment (which is 500). A faster rate can be achieved by tweaking hyperparameters, but also remember this is DQN without any modifications.
QUESTION
I want to compile my DQN agent but I get the error:
AttributeError: 'Adam' object has no attribute '_name'
ANSWER
Answered 2022-Apr-16 at 15:05 Your error comes from importing Adam with from keras.optimizer_v1 import Adam. You can solve the problem with tf.keras.optimizers.Adam from TensorFlow >= 2, like below. (The lr argument is deprecated; it's better to use learning_rate instead.)
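A minimal sketch of the suggested fix, assuming a small Keras Q-network; the layer sizes and loss are placeholders rather than the asker's actual model.

import tensorflow as tf
from tensorflow import keras

# Placeholder Q-network for CartPole (4 observations, 2 actions).
model = keras.Sequential([
    keras.layers.Dense(24, activation='relu', input_shape=(4,)),
    keras.layers.Dense(2, activation='linear'),
])

# Use tf.keras.optimizers.Adam instead of keras.optimizer_v1.Adam,
# and pass learning_rate rather than the deprecated lr argument.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss='mse')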
QUESTION
I am trying to create a batched environment version of an SAC agent example from the TensorFlow Agents library; the original code can be found here. I am also using a custom environment.
I am pursuing a batched environment setup to better leverage GPU resources and speed up training. My understanding is that by passing batches of trajectories to the GPU, there will be less overhead incurred when passing data from the host (CPU) to the device (GPU).
My custom environment is called SacEnv, and I attempt to create a batched environment like so:
ANSWER
Answered 2022-Feb-19 at 18:11 It turns out I neglected to pass batch_size when initializing the AverageReturnMetric and AverageEpisodeLengthMetric instances.
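A hedged sketch of the fix, assuming TF-Agents' tf_metrics module; the parallel-environment count is a placeholder.

from tf_agents.metrics import tf_metrics

num_parallel_environments = 4  # assumed batch size of the batched environment

train_metrics = [
    tf_metrics.NumberOfEpisodes(),
    tf_metrics.EnvironmentSteps(),
    # Pass batch_size so the metrics aggregate correctly over the batched environment.
    tf_metrics.AverageReturnMetric(batch_size=num_parallel_environments),
    tf_metrics.AverageEpisodeLengthMetric(batch_size=num_parallel_environments),
]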
QUESTION
Environment:
- Python: 3.9
- OS: Windows 10
When I try to create the ten-armed bandits environment using the following code, an error is thrown and I am not sure of the reason.
ANSWER
Answered 2022-Feb-08 at 08:01 It could be a problem with your Python version: the k-armed-bandits library was made 4 years ago, when Python 3.9 didn't exist. Besides this, the configuration files in the repo indicate that the Python version is 2.7 (not 3.9).
If you create an environment with Python 2.7 and follow the setup instructions, it works correctly on Windows:
QUESTION
I'm trying to implement a DQN. As a warm up I want to solve CartPole-v0 with an MLP consisting of two hidden layers along with input and output layers. The input is a 4 element array [cart position, cart velocity, pole angle, pole angular velocity] and the output is an action value for each action (left or right). I am not exactly implementing the DQN from the "Playing Atari with DRL" paper (no frame stacking for inputs, etc.). I also made a few non-standard choices, like putting done and the target network's prediction of the action value in the experience replay, but those choices shouldn't affect learning.
In any case I'm having a lot of trouble getting the thing to work. No matter how long I train the agent it keeps predicting a higher value for one action over another, for example Q(s, Right) > Q(s, Left) for all states s. Below is my learning code, my network definition, and some results I get from training.
ANSWER
Answered 2021-Dec-19 at 16:09 There was nothing wrong with the network definition. It turns out the learning rate was too high and reducing it to 0.00025 (as in the original Nature paper introducing the DQN) led to an agent that can solve CartPole-v0.
That said, the learning algorithm was incorrect. In particular, I was using the wrong target action-value predictions. Note that the algorithm laid out above does not use the most recent version of the target network to make predictions. This leads to poor results as training progresses because the agent learns from stale target data. The way to fix this is to just put (s, a, r, s', done) into the replay memory and then make target predictions using the most up-to-date version of the target network when sampling a mini-batch. See the code below for an updated learning loop.
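As a rough illustration of that fix, here is a minimal sketch of computing targets at sampling time with the current target network; the function and tensor names are assumptions, not the asker's code.

import torch

def compute_targets(target_net, rewards, next_states, dones, gamma=0.99):
    # y = r + gamma * max_a' Q_target(s', a') for non-terminal transitions,
    # evaluated with the *current* target network when the mini-batch is sampled,
    # rather than with stale predictions stored in the replay memory.
    # dones is expected to be a float tensor of 0/1 flags.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
    return rewards + gamma * next_q * (1.0 - dones)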
QUESTION
I am implementing REINFORCE applied to the CartPole-v0 OpenAI gym environment. I am trying 2 different implementations, and the issue I am not able to resolve is the following:
Upon passing a single state to the Policy Network, I get an output Tensor of size 2, containing the action probabilities of the 2 actions. However, when I pass a 'batch of states' to the Policy Network to compute the output action probabilities for all of them, the values that I obtain are very different from when each state is passed individually to the network.
Can someone help me understand the issue?
My code for the same is below: (Note: this is NOT the complete REINFORCE algorithm -- I am aware that I need to compute the loss from the probabilities. But I am trying to understand the difference in the computation of the two probabilities, which I think should be the same, before proceeding.)
ANSWER
Answered 2021-Nov-27 at 08:21 In your policy, you have Softmax over dim 0. This normalizes the probability of each action across your batch. You want to do it across actions, with dim=1.
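A minimal sketch of the fix, assuming the policy network ends in an nn.Softmax layer; the layer sizes are placeholders.

import torch.nn as nn

policy = nn.Sequential(
    nn.Linear(4, 128),
    nn.ReLU(),
    nn.Linear(128, 2),
    # For a (batch, actions) input, dim=1 normalizes across actions;
    # dim=0 would incorrectly normalize each action's probability across the batch.
    nn.Softmax(dim=1),
)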
QUESTION
I successfully followed this official tensorflow tutorial for training an agent to solve the 'CartPole-v0' gym environment. I only diverged from the tutorial in that I did not use reverb, because it's not supported on Windows. I tried to modify the example to train the agent to solve my own (extremely simple) environment, but it fails to converge on a solution after 10,000 iterations, which I feel should be more than plenty.
I tried adjusting training iterations, learning rates, batch sizes, discounts, and everything else I could think of. Nothing had an effect on the result.
I would like the agent to converge on a policy that always gets +1 reward (ideally in only a few hundred iterations, since this environment is so extremely simple), instead of one that occasionally dips to -1. Instead, here's a graph of the actual outcome:
(The text is small so I will say that orange is episode length in steps, and blue is the average reward. The X axis is the number of training iterations, from 0 to 10,000.)
CODE: Everything here is run top to bottom, but I put it in separate code blocks to make it easier to read/debug.
Imports
ANSWER
Answered 2021-Oct-16 at 23:56 The cause of the issue was that the agent had no incentive to solve the problem quickly, because going to the right after 10 steps and after 3 steps both result in equal reward. Because the step counter was not observed, the agent could not possibly correlate taking too long with losing; so it would occasionally take more than 10 steps, lose, and be unable to learn from the experience.
I solved this by giving a -0.1 reward on every step, which incentivized the agent to solve the environment in as few steps as possible (causing it to never break the 10-step loss rule).
I also sped up the learning process by increasing the epsilon_greedy parameter of the DqnAgent's constructor to 0.5 (from its default of 0.1) to allow it to explore the entire environment more quickly.
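A hedged sketch of the epsilon_greedy change in TF-Agents, using CartPole-v0 as a stand-in for the custom environment; the network size, optimizer, and loss are assumptions, and the -0.1 per-step reward would live inside the custom environment's step logic (not shown here).

import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.networks import q_network
from tf_agents.utils import common

# CartPole-v0 stands in for the custom environment described above.
train_env = tf_py_environment.TFPyEnvironment(suite_gym.load('CartPole-v0'))

q_net = q_network.QNetwork(
    train_env.observation_spec(),
    train_env.action_spec(),
    fc_layer_params=(100,))

agent = dqn_agent.DqnAgent(
    train_env.time_step_spec(),
    train_env.action_spec(),
    q_network=q_net,
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    # Raised from the default of 0.1 so the agent explores more aggressively.
    epsilon_greedy=0.5,
    td_errors_loss_fn=common.element_wise_squared_loss)
agent.initialize()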
QUESTION
I've been reading through the Drake docs and any tutorials I found, but I've yet to see any detailed information about this problem/goal, so I thought I'd try to ask here.
Goal:
My current project is that I have a real-world furuta pendulum with sensors to obtain the cartpole equivalent of the state vector q_sim = [x, theta, xdot, thetadot].
I want to have a real-world state vector, q_real, that "overwrites" the simulated cartpole state vector, q_sim (acquired from the MBP derived from Parser(plant=cart_pole).AddModelFromFile(sdf_path)), so that the resulting behavior would be something like this: moving the real-world pendulum upright would be reflected (in real time) in the simulation pendulum moving upright. Eventually my goal would be to have the furuta pendulum do a swing-up and balance behavior with the cartpole simulation reflecting the state vector, q_real, in real time.
What I've Thought About/Tried:
- Extending LeafSystem to read the real-world sensors and return a state vector as if it were a modeled dynamical system, with its own actuator_port for the controller output. My problem with this approach was that I wasn't sure if it is possible to do this whilst still visualizing a cart_pole MBP with the same state vector. I came to this dead end through my reading of the MultibodyPlant doc [ https://drake.mit.edu/doxygen_cxx/classdrake_1_1multibody_1_1_multibody_plant.html#details ]. I did not see any function to replace/overwrite the simulation MBP's state_output_vector.
- I also considered the possibility of having two separate MBPs, one being the cart_pole derived from an .sdf file and the other being an extended LeafSystem that takes an actuator input, reads sensors, and returns a state vector output. However, with this setup I don't believe I am able to achieve the desired behavior of "moving the real-world pendulum upright and having the simulation pendulum move upright in real time".
Extra:
I also noticed that there's an issue open on the Drake GitHub ( https://github.com/RobotLocomotion/drake/issues/12912 ) about an official tutorial on Drake's MBP from 2020. Does anyone know if there's any update to that tutorial beyond the Doxygen?
ANSWER
Answered 2021-Sep-07 at 01:54 I don't think you want to overwrite your MBP plant with the data from simulation. The normal workflow would be to offer a different system that reads from your sensors and offers the same ports that your MBP would have offered.
In the extreme, you can even wrap all of your sensor and actuator drivers up into a system of their own that acts like a "mock" of the MBP. That's the pattern I offered in the ManipulationStation and ManipulationStationHardwareInterface. In addition to the doxygen, you could look here: https://manipulation.csail.mit.edu/robot.html#section4
QUESTION
I am doing reinforcement learning for CartPole and I ran into this problem:
ANSWER
Answered 2021-Aug-30 at 06:02 You have given 4 inputs, and for these 4 inputs the model is predicting 4 outputs. As your output layer has 2 neurons, each of the 4 outputs has 2 values. Everything seems to be fine, and the output shape is (4, 2), not (2, 4).
If you are wondering how it is counted as (4, 2): to find the shape of a tensor manually, start from the left side. Entering the outermost [ you will find 4 one-dimensional tensors; similarly, accessing the inside of any of these tensors you will find 2 zero-dimensional tensors (i.e. scalars). Once you have reached a zero-dimensional tensor, stop the process. This is how the shape is (4, 2).
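A small illustration of that counting, using a made-up prediction for 4 CartPole states and 2 actions:

import numpy as np

# Hypothetical model output: one row per input state, one column per action.
output = [[0.1, 0.2],
          [0.3, 0.4],
          [0.5, 0.6],
          [0.7, 0.8]]

# The outer brackets contain 4 one-dimensional tensors,
# and each of those contains 2 scalars, hence shape (4, 2).
print(np.array(output).shape)  # (4, 2)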
QUESTION
I'm a student teaching myself Drake, specifically pydrake with Dr. Russ Tedrake's excellent Underactuated Robotics course. I am trying to write a combined energy shaping and lqr controller for keeping a cartpole system balanced upright. I based the diagram on the cartpole example found in Chapter 3 of Underactuated Robotics [http://underactuated.mit.edu/acrobot.html], and the SwingUpAndBalanceController on Chapter 2: [http://underactuated.mit.edu/pend.html].
I have found that, due to my use of the cart_pole.sdf model, I have to create an abstract input port to receive a FramePoseVector from cart_pole.get_output_port(0). From there I know that I have to create a control signal output of type BasicVector to feed into a Saturation block before feeding into the cartpole's actuation port.
The problem I'm encountering right now is that I'm not sure how to get the system's current state data in the DeclareVectorOutputPort's callback function. I was under the assumption I would use the LeafContext parameter in the callback function, OutputControlSignal, to obtain the BasicVector continuous state vector. However, this resulting vector, x_bar, is always NaN. Out of desperation (and to test that the rest of my program worked) I set x_bar to the controller's initialization cart_pole_context and found that the simulation runs with a control signal of 0.0 (as expected). I can also set output to 100 and the cartpole simulation just flies off into endless space (as expected).
TL;DR: What is the proper way to obtain the continuous state vector in a custom controller extending LeafSystem with a DeclareVectorOutputPort?
Thank you for any help! I really appreciate it :) I've been teaching myself so it's been a little arduous haha.
ANSWER
Answered 2021-Aug-29 at 09:02 Here are two things that might help:
- If you want to get the state of the cart-pole from MultibodyPlant, you probably want to be connecting to the continuous_state output port, which gives you a normal vector instead of the abstract-type FramePoseVector. In that case, your call to get_input_port().Eval(context) should work just fine (see the sketch after this list).
- If you do really want to read the FramePoseVector, then you have to evaluate the input port slightly differently. You can find an example of that here.
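For illustration, a minimal sketch of the first suggestion under stated assumptions: a controller LeafSystem with a 4-element vector input port wired to the plant's state output port. The class name, port names, and the zero control law are placeholders, not the answer's code.

import numpy as np
from pydrake.systems.framework import BasicVector, LeafSystem

class CartPoleController(LeafSystem):
    def __init__(self):
        LeafSystem.__init__(self)
        # Vector-valued state input, to be connected to the plant's state output port.
        self.DeclareVectorInputPort("cart_pole_state", BasicVector(4))
        self.DeclareVectorOutputPort("control", BasicVector(1), self.OutputControlSignal)

    def OutputControlSignal(self, context, output):
        # Evaluate the state input port from the controller's own context.
        x = self.get_input_port(0).Eval(context)
        u = np.zeros(1)  # placeholder control law based on x
        output.SetFromVector(u)

# Wiring sketch (inside a DiagramBuilder, with cart_pole being the MultibodyPlant):
#   builder.Connect(cart_pole.get_state_output_port(),
#                   controller.get_input_port(0))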
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install CartPole
You can use CartPole like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.