cartpole | OpenAI's CartPole env solver | Machine Learning library

by gsurma | Python Version: Current | License: MIT

kandi X-RAY | cartpole Summary

cartpole is a Python library typically used in Artificial Intelligence, Machine Learning, Deep Learning, and PyTorch applications. cartpole has no reported bugs or vulnerabilities, has a build file available, carries a permissive license, and has low support. You can download it from GitHub.

A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The system is controlled by applying a force of +1 or -1 to the cart. The pendulum starts upright, and the goal is to prevent it from falling over. A reward of +1 is provided for every timestep that the pole remains upright. The episode ends when the pole is more than 15 degrees from vertical, or the cart moves more than 2.4 units from the center. (Source: OpenAI Gym.)
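For context, here is a minimal interaction loop with the environment using a random policy. This is only an illustration of the observation/action/reward cycle described above (assuming the classic Gym API), not the solver implemented in this repository.

    import gym

    # A random agent on CartPole-v0: sample pushes until the pole falls
    # or the cart leaves the track, collecting +1 reward per surviving timestep.
    env = gym.make("CartPole-v0")
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = env.action_space.sample()                   # 0 = push left, 1 = push right
        observation, reward, done, info = env.step(action)   # reward is +1 each step
        total_reward += reward
    print("Episode reward:", total_reward)
    env.close()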

            Support

              cartpole has a low-activity ecosystem.
              It has 137 star(s) with 111 fork(s). There are 10 watchers for this library.
              It had no major release in the last 6 months.
              There are 2 open issues and 2 closed issues. There are 2 open pull requests and 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of cartpole is current.

            Quality

              cartpole has 0 bugs and 0 code smells.

            Security

              Neither cartpole nor its dependent libraries have any reported vulnerabilities.
              cartpole code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            License

              cartpole is licensed under the MIT License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            Reuse

              cartpole releases are not available. You will need to build from source code and install.
              Build file is available. You can build the component from source.
              cartpole saves you 60 person hours of effort in developing the same functionality from scratch.
              It has 156 lines of code, 9 functions and 3 files.
              It has medium code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed cartpole and discovered the below as its top functions. This is intended to give you an instant insight into the functionality cartpole implements, and to help you decide if it suits your requirements.
            • Saves the results to a PNG file.
            • Do a cart pole.
            • Add a score.
            • Perform an experience replay.
            • Initialize the model.
            • Save the score to a CSV file.
            • Return the action corresponding to the given state.
            • Store a context manager.

            cartpole Key Features

            No Key Features are available at this moment for cartpole.

            cartpole Examples and Code Snippets

            Quick Start
            pypi | Lines of Code: 42 | License: No License
            import gym, torch, numpy as np, torch.nn as nn
            from torch.utils.tensorboard import SummaryWriter
            import tianshou as ts
            
            
            task = 'CartPole-v0'
            lr, epoch, batch_size = 1e-3, 10, 64
            train_num, test_num = 10, 100
            gamma, n_step, target_freq = 0.9, 3, 320
              
            Runs a CartPole-v0.
            python | Lines of Code: 35 | License: No License
            def main():
              env = gym.make('CartPole-v0')
              D = env.observation_space.shape[0]
              K = env.action_space.n
              pmodel = PolicyModel(D, K, [])
              vmodel = ValueModel(D, [10])
              init = tf.global_variables_initializer()
              session = tf.InteractiveSession()
               
            Play a CartPole-v0.
            python | Lines of Code: 30 | License: No License
            def main():
              env = gym.make('CartPole-v0')
              D = env.observation_space.shape[0]
              K = env.action_space.n
              pmodel = PolicyModel(D, K, [])
              vmodel = ValueModel(D, [10])
              gamma = 0.99
            
              if 'monitor' in sys.argv:
                filename = os.path.basename(__fi  

            Community Discussions

            QUESTION

            DQN Pytorch Loss keeps increasing
            Asked 2021-Jun-02 at 17:39

            I am implementing a simple DQN algorithm using PyTorch to solve the CartPole environment from Gym. I have been debugging for a while now, and I can't figure out why the model is not learning.

            Observations:

            • Using SmoothL1Loss performs worse than MSELoss, but the loss increases for both.
            • A smaller LR in Adam does not help; I have tested 0.0001, 0.00025, 0.0005, and the default.

            Notes:

            • I have debugged various parts of the algorithm individually, and can say with good confidence that the issue is in the learn function. I am wondering if this bug is due to me misunderstanding detach() in PyTorch, or some other framework mistake I'm making.
            • I am trying to stick as close to the original paper as possible (linked above).

            References:

            ...

            ANSWER

            Answered 2021-Jun-02 at 17:39

            The main problem, I think, is the discount factor, gamma. You are setting it to 1.0, which means that you are giving the same weight to future rewards as to the current one. Usually in reinforcement learning we care more about the immediate reward than about the future, so gamma should always be less than 1.
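            As a toy illustration of that point (not the asker's code; the tensors below are made-up placeholders), this is how gamma < 1 enters the one-step DQN target:

            import torch

            gamma = 0.99  # weight future rewards slightly less than the immediate one

            # Made-up replay batch: rewards, done flags, and max_a Q_target(s', a).
            rewards = torch.tensor([1.0, 1.0, 1.0])
            dones   = torch.tensor([0.0, 0.0, 1.0])    # last transition ends the episode
            next_q  = torch.tensor([20.0, 18.5, 0.0])

            # One-step target: bootstrap only on non-terminal transitions.
            targets = rewards + gamma * (1.0 - dones) * next_q
            print(targets)  # tensor([20.8000, 19.3150,  1.0000])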

            Just to give it a try, I set gamma = 0.99 and ran your code:

            Source https://stackoverflow.com/questions/67789148

            QUESTION

            Deep Q Learning - Cartpole Environment
            Asked 2021-May-31 at 22:21

            I have a concern in understanding the CartPole code as an example of Deep Q Learning. The DQL agent part of the code is as follows:

            ...

            ANSWER

            Answered 2021-May-31 at 22:21

            self.model.predict(state) will return a tensor of shape (1, 2) containing the estimated Q values for each action (in CartPole the action space is {0, 1}). As you know, the Q value is a measure of the expected reward.

            By setting self.model.predict(state)[0][action] = target (where target is the expected sum of rewards) it is creating a target Q value on which to train the model. By then calling model.fit(state, train_target) it is using the target Q value to train said model to approximate better Q values for each state.

            I don't understand why you are saying that the loss becomes 0: the target is set to the discounted sum of rewards plus the current reward.
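            A hedged sketch of the update being described, assuming model is a compiled Keras network with a 2-unit output and that state, next_state, action, reward, done, and gamma are the asker's usual DQN variables (names here are illustrative):

            import numpy as np

            # Q-values for both actions in this state, shape (1, 2).
            q_values = model.predict(state)

            # Overwrite only the taken action's entry with the bootstrapped target.
            target = reward if done else reward + gamma * np.amax(model.predict(next_state)[0])
            q_values[0][action] = target

            # Fit towards the patched Q-vector: the untouched action keeps its own
            # prediction, so only the taken action's entry contributes to the loss.
            model.fit(state, q_values, verbose=0)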

            Source https://stackoverflow.com/questions/67773479

            QUESTION

            RLLib tunes PPOTrainer but not A2CTrainer
            Asked 2021-Feb-11 at 18:29

            I am making a comparison between both kinds of algorithms on the CartPole environment, with the imports as:

            ...

            ANSWER

            Answered 2021-Feb-11 at 18:29

            The A2C code fails due to the configuration you copied from the PPO trial: "sgd_minibatch_size", "kl_coeff", and many others are PPO-specific config keys, which cause the problem when running A2C.

            The error is explained in the "error.txt" in the logdir.
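            A hedged sketch of one way to keep the configs separate (the exact keys and stopping criterion are illustrative, using the string-based tune.run API):

            from ray import tune

            # Shared settings that both algorithms understand.
            base_config = {"env": "CartPole-v0", "num_workers": 1, "lr": 1e-3}

            # PPO-only keys live only in the PPO config...
            ppo_config = {**base_config, "sgd_minibatch_size": 128, "kl_coeff": 0.3}
            # ...so the A2C config never sees them.
            a2c_config = dict(base_config)

            tune.run("PPO", config=ppo_config, stop={"episode_reward_mean": 195})
            tune.run("A2C", config=a2c_config, stop={"episode_reward_mean": 195})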

            Source https://stackoverflow.com/questions/65668160

            QUESTION

            Can a tf-agents environment be defined with an unobservable exogenous state?
            Asked 2021-Jan-13 at 08:17

            I apologize in advance for the question in the title not being very clear. I'm trying to train a reinforcement learning policy using tf-agents in which there exists some unobservable stochastic variable that affects the state.

            For example, consider the standard CartPole problem, but with wind added whose velocity changes over time. I don't want to train an agent that relies on having observed the wind velocity at each step; I instead want the wind to affect the position and angular velocity of the pole, and the agent to learn to adapt just as it would in the wind-free environment. In this example, however, we would need the wind velocity at the current time to be correlated with the wind velocity at the previous time; e.g., we wouldn't want the wind velocity to change from 10 m/s at time t to -10 m/s at time t+1.

            The problem I'm trying to solve is how to track the state of the exogenous variable without making it part of the observation spec that gets fed into the neural network when training the agent. Any guidance would be appreciated.

            ...

            ANSWER

            Answered 2021-Jan-13 at 08:17

            Yes, that is no problem at all. Your environment object (a subclass of PyEnvironment or TFEnvironment) can do whatever you want within it. The observation_spec requirement is only related to the TimeStep that you output in the step and reset methods (more precisely in your implementation of the _step and _reset abstract methods).

            Your environment, however, is completely free to have any additional attributes you might want (like parameters to control wind generation) and any number of additional methods (like methods to generate the wind at this timestep according to self._wind_hyper_params). A quick schematic of what your code could look like is below:
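            A hypothetical sketch along those lines (this is not the answerer's original schematic; the WindyCartPoleEnv class and its internals are illustrative assumptions):

            import numpy as np
            from tf_agents.environments import py_environment
            from tf_agents.specs import array_spec
            from tf_agents.trajectories import time_step as ts

            class WindyCartPoleEnv(py_environment.PyEnvironment):
                """The wind lives on the environment object, not in the observation
                spec, so the agent never observes it."""

                def __init__(self):
                    self._action_spec = array_spec.BoundedArraySpec(
                        shape=(), dtype=np.int32, minimum=0, maximum=1, name='action')
                    self._observation_spec = array_spec.ArraySpec(
                        shape=(4,), dtype=np.float32, name='observation')
                    self._wind = 0.0                              # exogenous, unobserved
                    self._state = np.zeros(4, dtype=np.float32)

                def action_spec(self):
                    return self._action_spec

                def observation_spec(self):
                    return self._observation_spec

                def _reset(self):
                    self._wind = 0.0
                    self._state = np.zeros(4, dtype=np.float32)
                    return ts.restart(self._state)

                def _step(self, action):
                    # Random-walk update keeps successive wind values correlated.
                    self._wind += np.random.normal(scale=0.1)
                    # ... apply cart-pole dynamics plus the wind force to self._state ...
                    return ts.transition(self._state, reward=1.0)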

            Source https://stackoverflow.com/questions/65694416

            QUESTION

            Problem with implementing temporal difference based on actor-critic
            Asked 2020-Dec-02 at 13:11

            I implemented a simple actor-critic model in Tensorflow==2.3.1 to learn the CartPole environment, but it is not learning at all. The average score over every 50 episodes is below 20. Can someone please point out why the model isn't learning?

            I based my algorithm on the following pseudocode:

            ...

            ANSWER

            Answered 2020-Dec-02 at 13:11

            I was able to fix your code. Main changes:

            • replace math.log() with tfp.distributions.Categorical.log_prob() (see the sketch below)
            • change error calculation method

            But I'm not entirely sure why it works this way, so further clarification is appreciated.
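            A hedged sketch of the first change (action_logits and td_error are placeholder values standing in for the actor's output and the critic's TD error, not the asker's variables):

            import tensorflow as tf
            import tensorflow_probability as tfp

            # Log-probability of the sampled action via a Categorical distribution,
            # instead of math.log() on a plain Python float.
            action_logits = tf.constant([[0.2, -0.1]])        # actor output for one state
            dist = tfp.distributions.Categorical(logits=action_logits)
            action = dist.sample()
            log_prob = dist.log_prob(action)                  # differentiable log pi(a|s)

            td_error = tf.constant(0.5)                       # placeholder critic TD error
            actor_loss = -log_prob * td_error                 # minimized w.r.t. actor weights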

            Source https://stackoverflow.com/questions/65029828

            QUESTION

            How to extend an agent class in ChainerRL in Python
            Asked 2020-Nov-25 at 11:52

            I want to extend the PPO agent class in ChainerRL. I did the following:

            ...

            ANSWER

            Answered 2020-Nov-25 at 11:50
            action = super().act_and_train(obs, reward)
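            Put in context, a hypothetical subclass could look like the sketch below (MyPPO and the extra bookkeeping are illustrative, not the asker's actual class):

            from chainerrl.agents import PPO

            class MyPPO(PPO):
                """Extend act_and_train while delegating action selection to the parent."""

                def act_and_train(self, obs, reward):
                    action = super().act_and_train(obs, reward)
                    # ... any extra bookkeeping on obs / action / reward goes here ...
                    return action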
            

            Source https://stackoverflow.com/questions/65003111

            QUESTION

            Understanding State Groups in Context in Drake
            Asked 2020-Nov-24 at 17:57

            I've created a diagram with an LQR Controller, a MultiBodyPlant, a scenegraph, and a PlanarSceneGraphVisualizer.

            While trying to run this simulation, I set the random initial conditions using the function: context.SetDiscreteState(randInitState). However, with this, I get the following error:

            RuntimeError: Context::SetDiscreteState(): expected exactly 1 discrete state group but there were 2 groups. Use the other signature if you have multiple groups.

            And indeed when I check the number of groups using context.num_discrete_state_groups(), it returns 2. So, then I have to specify the group index while setting the state using the command context.SetDiscreteState(0, randInitState). This works but I don't exactly know why. I understand that I have to select a correct group to set the state for but what exactly is a group here? In the cartpole example given here, the context was set using context.SetContinuousState(UprightState() + 0.1 * np.random.randn(4,)) without specifying any group(s).

            Are groups only valid for discrete systems? The context documentation talks about groups but doesn't define them.

            Is there a place to find the definition of what a group is while setting up a drake simulation with multiple systems inside a diagram and how to check the group index of a system?

            ...

            ANSWER

            Answered 2020-Nov-24 at 17:57

            We would typically recommend that you use a workflow that sets the context using a subsystem interface. E.g.
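            A rough sketch of that workflow, assuming plant is the MultibodyPlant added to the diagram (names and values here are illustrative, not the answerer's exact example):

            import numpy as np
            from pydrake.systems.analysis import Simulator

            # Create the diagram-level context once, then address the plant's own
            # sub-context instead of guessing discrete-state group indices.
            diagram_context = diagram.CreateDefaultContext()
            plant_context = plant.GetMyMutableContextFromRoot(diagram_context)

            # Cart-pole: 2 positions + 2 velocities, i.e. a 4-vector state.
            plant.SetPositionsAndVelocities(plant_context, 0.1 * np.random.randn(4))

            simulator = Simulator(diagram, diagram_context)
            simulator.AdvanceTo(5.0)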

            Source https://stackoverflow.com/questions/64987916

            QUESTION

            Recording multiple meshcat visualizers
            Asked 2020-Nov-11 at 02:41

            I would like to record multiple meshcat visualizers with different prefix ids so I can eventually save them to an HTML file that I can interact with later. I would like the different visualizations to play on top of each other and to be able to select/unselect different visualizations.

            Here is a minimal replication of the problem. I expect there to be two cartpole visualizations on top of each other. However, I only end up seeing one of them being recorded; the other cartpole seems stuck at the end position when I play back the recording.

            ...

            ANSWER

            Answered 2020-Nov-11 at 02:41

            Here is one solution. I don't love it, but it accomplishes the stated goal. Note the changes:

            1. saving and publishing only one animation
            2. naming the model in the MultibodyPlant with a distinct name for each sim
            3. setting delete_prefix_on_load=False so it doesn't clear the old geometry

            Source https://stackoverflow.com/questions/64757012

            QUESTION

            PyTorch DQN/DDQN using .detach() causes very weird loss (increases exponentially) and does not learn at all
            Asked 2020-Nov-09 at 16:36

            Here is my implementation of DQN and DDQN for CartPole-v0 which I think is correct.

            ...

            ANSWER

            Answered 2020-Nov-09 at 16:36

            One mistake in your implementation is that you never add the end of an episode to your replay buffer. In your train function you return if sign == 1 (end of the episode). Remove that return and adjust the target calculation via (1 - dones) * ... in case you sample an end-of-episode transition. The reason the end of the episode is important is that it is the only experience where the target is not approximated via bootstrapping. With that change, the DQN trains. For reproducibility I used a discount rate of 0.99 and the seed 2020 (for torch, numpy, and the gym environment). I achieved a reward of 199.100 after 241 episodes of training.

            Hope that helps; the code is very readable, by the way.
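            A hedged sketch of the target calculation described above (target_net is assumed to be the target Q-network; the batch unpacking is illustrative):

            import torch

            def compute_targets(target_net, batch, gamma=0.99):
                # Terminal transitions stay in the replay buffer; (1 - dones) turns off
                # bootstrapping past the end of an episode.
                states, actions, rewards, next_states, dones = batch
                with torch.no_grad():
                    next_q = target_net(next_states).max(dim=1).values
                return rewards + gamma * (1.0 - dones) * next_q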

            Source https://stackoverflow.com/questions/64690471

            QUESTION

            What Loss Or Reward Is Backpropagated In Policy Gradients For Reinforcement Learning?
            Asked 2020-Oct-24 at 15:29

            I have made a small script in Python to solve various Gym environments with policy gradients.

            ...

            ANSWER

            Answered 2020-Sep-09 at 06:58

            The loss here depends on the output of each problem. Generally, the loss you backpropagate should be a single number that summarizes everything you have processed. For policy gradients, it is built from the reward the policy expects to get compared with the reward actually obtained; the log is just a way to bring the action probability into that calculation as a single scalar. If you want to inspect the behavior behind the code, you should always check the shapes/dimensions between each step to fully understand what is going on.
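            As a hedged illustration of what actually gets backpropagated (the numbers are made up; in real code the log-probabilities come from the policy network):

            import torch

            # REINFORCE-style loss: one scalar, -log pi(a|s) weighted by the return.
            log_probs = torch.tensor([-0.69, -0.51, -0.35], requires_grad=True)
            returns   = torch.tensor([ 3.0,   2.0,   1.0])   # discounted return per step

            loss = -(log_probs * returns).sum()   # single number for the whole episode
            loss.backward()                       # gradients flow back into the policy weights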

            Source https://stackoverflow.com/questions/63602222

            Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install cartpole

            You can download it from GitHub.
            You can use cartpole like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.

            Support

            For any new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask on the Stack Overflow community page.
            CLONE
          • HTTPS

            https://github.com/gsurma/cartpole.git

          • CLI

            gh repo clone gsurma/cartpole

          • SSH

            git@github.com:gsurma/cartpole.git
