cartpole | OpenAI's CartPole env solver | Machine Learning library
kandi X-RAY | cartpole Summary
A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The system is controlled by applying a force of +1 or -1 to the cart. The pendulum starts upright, and the goal is to prevent it from falling over. A reward of +1 is provided for every timestep that the pole remains upright. The episode ends when the pole is more than 15 degrees from vertical, or the cart moves more than 2.4 units from the center. (source)
Top functions reviewed by kandi - BETA
- Saves the results to a PNG file.
- Run a cart-pole episode.
- Add a score.
- Perform an experience replay.
- Initialize the model.
- Save the score to a CSV file.
- Return the action corresponding to the given state.
- Store a context manager.
cartpole Key Features
cartpole Examples and Code Snippets
import gym, torch, numpy as np, torch.nn as nn
from torch.utils.tensorboard import SummaryWriter
import tianshou as ts

task = 'CartPole-v0'
lr, epoch, batch_size = 1e-3, 10, 64      # learning rate, training epochs, minibatch size
train_num, test_num = 10, 100             # number of parallel training / test environments
gamma, n_step, target_freq = 0.9, 3, 320  # discount factor, n-step return, target-network sync interval
def main():
    env = gym.make('CartPole-v0')
    D = env.observation_space.shape[0]  # state dimensionality (4 for CartPole)
    K = env.action_space.n              # number of discrete actions (2)
    pmodel = PolicyModel(D, K, [])      # policy network (actor)
    vmodel = ValueModel(D, [10])        # value network (critic), one hidden layer of 10 units
    init = tf.global_variables_initializer()
    session = tf.InteractiveSession()
def main():
    env = gym.make('CartPole-v0')
    D = env.observation_space.shape[0]
    K = env.action_space.n
    pmodel = PolicyModel(D, K, [])
    vmodel = ValueModel(D, [10])
    gamma = 0.99
    if 'monitor' in sys.argv:
        filename = os.path.basename(__file__)
Community Discussions
Trending Discussions on cartpole
QUESTION
I am implementing a simple DQN algorithm using pytorch, to solve the CartPole environment from gym. I have been debugging for a while now, and I can't figure out why the model is not learning.
Observations:
- using SmoothL1Loss performs worse than MSELoss, but loss increases for both
- a smaller LR in Adam does not work; I have tested 0.0001, 0.00025, 0.0005 and the default
Notes:
- I have debugged various parts of the algorithm individually, and can say with good confidence that the issue is in the learn function. I am wondering if this bug is due to me misunderstanding detach in pytorch or some other framework mistake I'm making.
- I am trying to stick as close to the original paper as possible (linked above).
References:
...ANSWER
Answered 2021-Jun-02 at 17:39
The main problem, I think, is the discount factor, gamma. You are setting it to 1.0, which means that you are giving the same weight to future rewards as to the current one. Usually in reinforcement learning we care more about the immediate reward than about the future, so gamma should always be less than 1.
Just to give it a try, I set gamma = 0.99 and ran your code.
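To see concretely how much weight gamma puts on later rewards, here is a small illustration (not part of the original answer); it assumes CartPole's +1-per-step reward over a 200-step episode:

rewards = [1.0] * 200  # CartPole-v0 gives +1 per surviving timestep

def discounted_return(rewards, gamma):
    # with gamma = 1.0 every future reward counts as much as the immediate one;
    # with gamma = 0.99 the weight decays geometrically with the delay
    return sum(gamma ** t * r for t, r in enumerate(rewards))

print(discounted_return(rewards, 1.0))   # 200.0
print(discounted_return(rewards, 0.99))  # ~86.6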
QUESTION
I have a concern in understanding the Cartpole code as an example for Deep Q Learning. The DQL Agent part of the code is as follows:
...ANSWER
Answered 2021-May-31 at 22:21
self.model.predict(state) will return a tensor of shape (1, 2) containing the estimated Q values for each action (in cartpole the action space is {0, 1}).
As you know the Q value is a measure of the expected reward.
By setting self.model.predict(state)[0][action] = target (where target is the expected sum of rewards) it is creating a target Q value on which to train the model. By then calling model.fit(state, train_target) it is using that target Q value to train said model to approximate better Q values for each state.
I don't understand why you are saying that the loss becomes 0: the target is set to the discounted sum of rewards plus the current reward.
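The replay step being described would look roughly like the sketch below (hedged: model, gamma, and the state/action/reward/done variables are assumed to exist as in a typical Keras DQN, not taken from the original code):

import numpy as np

target = reward
if not done:
    # bootstrap from the best next-state Q value
    target = reward + gamma * np.amax(model.predict(next_state)[0])

train_target = model.predict(state)        # shape (1, 2): current Q estimates
train_target[0][action] = target           # overwrite only the taken action's value
model.fit(state, train_target, verbose=0)  # regress the network toward the new target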
QUESTION
I am making a comparison between both kinds of algorithms against the CartPole environment. Having the imports as:
...ANSWER
Answered 2021-Feb-11 at 18:29
The A2C code fails due to the configuration you copied from the PPO trial: "sgd_minibatch_size", "kl_coeff" and many others are PPO-specific configs, which cause the problem when running A2C.
The error is explained in the "error.txt" in the logdir.
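As a hedged illustration of the fix (the keys and stopping criterion below are assumptions for a minimal run, not values from the original trial), an A2C experiment would keep only algorithm-agnostic settings and drop the PPO-only ones:

import ray
from ray import tune

ray.init()
a2c_config = {
    "env": "CartPole-v0",
    "framework": "torch",
    "lr": 0.001,
    "gamma": 0.99,
    # no "sgd_minibatch_size", "kl_coeff", etc. -- those are PPO-specific
}
tune.run("A2C", config=a2c_config, stop={"episode_reward_mean": 195})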
QUESTION
I apologize in advance for the question in the title not being very clear. I'm trying to train a reinforcement learning policy using tf-agents in which there exists some unobservable stochastic variable that affects the state.
For example, consider the standard CartPole problem, but we add wind where the velocity changes over time. I don't want to train an agent that relies on having observed the wind velocity at each step; I instead want the wind to affect the position and angular velocity of the pole, and the agent to learn to adapt just as it would in the wind-free environment. In this example however, we would need the wind velocity at the current time to be correlated with the wind velocity at the previous time e.g. we wouldn't want the wind velocity to change from 10m/s at time t to -10m/s at time t+1.
The problem I'm trying to solve is how to track the state of the exogenous variable without making it part of the observation spec that gets fed into the neural network when training the agent. Any guidance would be appreciated.
...ANSWER
Answered 2021-Jan-13 at 08:17
Yes, that is no problem at all. Your environment object (a subclass of PyEnvironment or TFEnvironment) can do whatever you want within it. The observation_spec requirement is only related to the TimeStep that you output in the step and reset methods (more precisely, in your implementation of the _step and _reset abstract methods).
Your environment, however, is completely free to have any additional attributes that you might want (like parameters to control wind generation) and any number of additional methods you like (like methods to generate the wind at this timestep according to self._wind_hyper_params). A quick schematic of your code is below:
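The original schematic is not reproduced here; a minimal sketch of the idea, assuming the standard tf_agents PyEnvironment API (the class name and the random-walk wind model are illustrative), could look like this:

import numpy as np
from tf_agents.environments import py_environment
from tf_agents.specs import array_spec
from tf_agents.trajectories import time_step as ts

class WindyCartPoleEnv(py_environment.PyEnvironment):
    """Wind lives in internal attributes, not in the observation spec."""

    def __init__(self):
        self._action_spec = array_spec.BoundedArraySpec(
            shape=(), dtype=np.int32, minimum=0, maximum=1, name='action')
        self._observation_spec = array_spec.ArraySpec(
            shape=(4,), dtype=np.float32, name='observation')  # cart-pole state only
        self._wind = 0.0                      # exogenous variable, never observed
        self._state = np.zeros(4, dtype=np.float32)

    def action_spec(self):
        return self._action_spec

    def observation_spec(self):
        return self._observation_spec

    def _update_wind(self):
        # random walk keeps the wind at time t correlated with the wind at time t-1
        self._wind += np.random.normal(0.0, 0.1)

    def _reset(self):
        self._state = np.zeros(4, dtype=np.float32)
        self._wind = 0.0
        return ts.restart(self._state)

    def _step(self, action):
        self._update_wind()
        # ...apply cart-pole dynamics plus a force proportional to self._wind...
        return ts.transition(self._state, reward=1.0)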
QUESTION
I implemented a simple actor-critic model in Tensorflow==2.3.1 to learn the Cartpole environment. But it is not learning at all. The average score over every 50 episodes is below 20. Can someone please point out why the model isn't learning?
I based my algorithm on the following pseudocode:
...ANSWER
Answered 2020-Dec-02 at 13:11
I was able to fix your code. Main changes:
- replace math.log() with tfp.distributions.Categorical.log_prob()
- change the error calculation method
But I'm not entirely sure why it works this way, so further clarification is appreciated.
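For reference, the distribution-based replacement mentioned above would look roughly like this (the logits and the advantage value are illustrative placeholders, not taken from the original code):

import tensorflow as tf
import tensorflow_probability as tfp

logits = tf.constant([[0.2, 1.5]])      # actor output for one state, shape (1, n_actions)
dist = tfp.distributions.Categorical(logits=logits)
action = dist.sample()                  # sampled action
log_prob = dist.log_prob(action)        # replaces the manual math.log(prob)
advantage = tf.constant([0.7])          # placeholder one-step advantage
actor_loss = -log_prob * tf.stop_gradient(advantage)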
QUESTION
I want to extend the PPO agent class in ChainerRL. I did the following:
...ANSWER
Answered 2020-Nov-25 at 11:50
action = super().act_and_train(obs, reward)
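For context, a hedged sketch of how that call might sit inside a ChainerRL subclass (the class name and the extra bookkeeping are hypothetical):

from chainerrl.agents import PPO

class LoggingPPO(PPO):
    """Hypothetical extension: delegate to the parent, then add custom logic."""

    def act_and_train(self, obs, reward):
        action = super().act_and_train(obs, reward)
        # custom bookkeeping (logging, reward shaping hooks, etc.) goes here
        return action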
QUESTION
I've created a diagram with an LQR controller, a MultibodyPlant, a SceneGraph, and a PlanarSceneGraphVisualizer.
While trying to run this simulation, I set the random initial conditions using the function context.SetDiscreteState(randInitState). However, with this, I get the following error:
RuntimeError: Context::SetDiscreteState(): expected exactly 1 discrete state group but there were 2 groups. Use the other signature if you have multiple groups.
And indeed when I check the number of groups using context.num_discrete_state_groups(), it returns 2. So then I have to specify the group index while setting the state, using the command context.SetDiscreteState(0, randInitState). This works, but I don't exactly know why. I understand that I have to select the correct group to set the state for, but what exactly is a group here? In the cartpole example given here, the context was set using context.SetContinuousState(UprightState() + 0.1 * np.random.randn(4,)) without specifying any group(s).
Are groups only valid for discrete systems? The context documentation talks about groups but doesn't define them.
Is there a place to find the definition of what a group is while setting up a drake simulation with multiple systems inside a diagram and how to check the group index of a system?
...ANSWER
Answered 2020-Nov-24 at 17:57
We would typically recommend that you use a workflow that sets the context using a subsystem interface. E.g.
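The answer's example code is not reproduced here; a hedged sketch of that subsystem-interface workflow, assuming diagram, plant, and UprightState() from the cart-pole example are already defined, could look like:

import numpy as np
from pydrake.systems.analysis import Simulator

context = diagram.CreateDefaultContext()
# ask the diagram for the plant's own sub-context, then set state through the plant
plant_context = diagram.GetMutableSubsystemContext(plant, context)
plant.SetPositionsAndVelocities(
    plant_context, UprightState() + 0.1 * np.random.randn(4,))
simulator = Simulator(diagram, context)
simulator.AdvanceTo(10.0)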
QUESTION
I would like to record multiple meshcat visualizers with different prefix ids so I can eventually save them to an HTML file that I can interact with later. I would like the different visualizations to play on top of each other and to be able to select/unselect different visualizations.
Here is a minimal replication of the problem. I expect there to be two cartpole visualizations that are on top of each other. However, I only end up seeing one of them being recorded. The other cartpole seems stuck at the end position when I play back the recording.
...ANSWER
Answered 2020-Nov-11 at 02:41
Here is one solution. I don't love it, but it accomplishes the stated goal. Note the changes:
- saving and publishing only one animation
- naming the model in the MultibodyPlant with a distinct name for each sim
- setting delete_prefix_on_load=False so it doesn't clear the old geometry
QUESTION
Here is my implementation of DQN and DDQN for CartPole-v0 which I think is correct.
...ANSWER
Answered 2020-Nov-09 at 16:36
One mistake in your implementation is that you never add the end of an episode to your replay buffer. In your train function you return if sign == 1 (end of the episode). Remove that return and adjust the target calculation via (1 - dones) * ... in case you sample a transition from the end of an episode. The reason the end of the episode is important is that it is the only experience where the target is not approximated via bootstrapping. Then DQN trains. For reproducibility I used a discount rate of 0.99 and the seed 2020 (for torch, numpy and the gym environment). I achieved a reward of 199.100 after 241 episodes of training.
Hope that helps, code is very readable btw.
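A hedged sketch of the adjusted target computation (the network and batch variable names are hypothetical, not from the original code):

import torch
import torch.nn.functional as F

gamma = 0.99
# states, actions, rewards, next_states, dones sampled from the replay buffer,
# including transitions where the episode ended (dones == 1)
q_values = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
with torch.no_grad():
    next_q = target_net(next_states).max(dim=1).values
targets = rewards + gamma * (1 - dones) * next_q  # no bootstrapping past episode end
loss = F.mse_loss(q_values, targets)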
QUESTION
I have made a small script in Python to solve various Gym environments with policy gradients.
...ANSWER
Answered 2020-Sep-09 at 06:58
The loss here depends on what each problem outputs. Generally, the loss you backpropagate should be a single number that summarizes everything you have processed. For policy gradients, it compares the reward the policy expects to get with the reward actually received, and the log simply maps the action probability back into a quantity you can weight by that reward, so you end up with a single scalar. If you want to inspect what the code is doing, always check the shapes/dimensions between each step to fully understand it.
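As a hedged illustration of that "single number" for the policy-gradient case (the tensor names are hypothetical):

import torch

log_probs = torch.stack(episode_log_probs)   # log pi(a_t | s_t) per step, shape [T]
returns = torch.as_tensor(episode_returns)   # discounted return G_t per step, shape [T]
loss = -(log_probs * returns).sum()          # one scalar summarizing the whole episode
loss.backward()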
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install cartpole
You can use cartpole like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.
Support