pytorch-a2c-ppo-acktr-gail | PyTorch implementation of Advantage Actor Critic | Reinforcement Learning library

by ikostrikov | Python | Version: Current | License: MIT

kandi X-RAY | pytorch-a2c-ppo-acktr-gail Summary

pytorch-a2c-ppo-acktr-gail is a Python library typically used in Telecommunications, Media, Entertainment, Artificial Intelligence, Reinforcement Learning, Deep Learning, and PyTorch applications. It has no bugs and no vulnerabilities, a build file is available, it has a permissive license, and it has medium support. You can download it from GitHub.

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

            Support

              pytorch-a2c-ppo-acktr-gail has a medium active ecosystem.
              It has 3217 star(s) with 809 fork(s). There are 67 watchers for this library.
              It had no major release in the last 6 months.
              There are 83 open issues and 146 have been closed. On average issues are closed in 127 days. There are 5 open pull requests and 0 closed requests.
              It has a neutral sentiment in the developer community.
              The latest version of pytorch-a2c-ppo-acktr-gail is current.

            Quality

              pytorch-a2c-ppo-acktr-gail has 0 bugs and 0 code smells.

            Security

              pytorch-a2c-ppo-acktr-gail has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              pytorch-a2c-ppo-acktr-gail code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            License

              pytorch-a2c-ppo-acktr-gail is licensed under the MIT License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            Reuse

              pytorch-a2c-ppo-acktr-gail releases are not available. You will need to build from source code and install.
              Build file is available. You can build the component from source.
              Installation instructions are not available. Examples and code snippets are available.
              pytorch-a2c-ppo-acktr-gail saves you 676 person hours of effort in developing the same functionality from scratch.
              It has 1573 lines of code, 95 functions and 18 files.
              It has medium code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed pytorch-a2c-ppo-acktr-gail and identified the functions below as its top functions. This is intended to give you instant insight into the functionality pytorch-a2c-ppo-acktr-gail implements and to help you decide whether it suits your requirements.
            • Updates the payoff function
            • Generate the recurrent generator
            • Returns a feed-forward generator
            • Evaluate the given action
            • Evaluate an actor
            • Create a VecNetEnvEnv
            • Create an environment
            • Compute the action
            • Calculate gradients for a given module
            • Compute the covariance matrix
            • Update the running stat
            • Predict reward function
            • Update expert loss function
            • Serialize the model to the given device
            • Saves input tensor
            • Extract patches from x
            • Get the render function for a given venv
            • Perform the forward computation
            • Get venv normalize
            • Saves the model to the given device
            • Get argument parser
            • Compute the critic
            • Inserts the given observation
            • Create vectors for VecPy
            • Compute returns for the given reward
            • Update the objective function
            • Copies the observation and masks

            pytorch-a2c-ppo-acktr-gail Key Features

            No Key Features are available at this moment for pytorch-a2c-ppo-acktr-gail.

            pytorch-a2c-ppo-acktr-gail Examples and Code Snippets

            OPEN AI BASELINES,Benchmark Results,A2C
            Python | Lines of Code: 38 | License: Permissive (MIT)
                #            DEFINE YOUR "BASELINES" PARAMETERS HERE 
                train_env_id =  'merge-v0'
                play_env_id = ''
                alg = 'a2c  
            PPO-BiHyb,Environment set up
            Python | Lines of Code: 19 | License: No License
            export TORCH=1.7.0
            export CUDA=cu101
            pip install torch==1.7.1+${CUDA} torchvision==0.8.2+${CUDA} torchaudio===0.7.2 -f
            pip install --no-index --upgrade torch-scatter -f  
            6. Algorithms,6.1. A2C
            Python | Lines of Code: 19 | License: Permissive (MIT)
             xagents train a2c --env PongNoFrameskip-v4 --target-reward 19 --n-envs 16 --preprocess --checkpoints
            xagents train a2c --env BipedalWalker-v3 --target-reward 100 --n-envs 16 --checkpoints
            from tensorflow.keras.opt  
            tianshou - atari ppo
            Python | Lines of Code: 254 | License: Permissive (MIT License)
            import argparse
            import datetime
            import os
            import pprint
            import numpy as np
            import torch
            from atari_network import DQN, layer_init, scale_obs
            from atari_wrapper import make_atari_env
            from torch.optim.lr_scheduler import LambdaLR
            from torch.utils.tens  
            tianshou - vizdoom ppo
            Python | Lines of Code: 246 | License: Permissive (MIT License)
            import argparse
            import datetime
            import os
            import pprint
            import numpy as np
            import torch
            from env import make_vizdoom_env
            from network import DQN
            from torch.optim.lr_scheduler import LambdaLR
            from torch.utils.tensorboard import SummaryWriter
            from ti  
            tianshou - irl gail
            Python | Lines of Code: 225 | License: Permissive (MIT License)
            #!/usr/bin/env python3
            import argparse
            import datetime
            import os
            import pprint
            import d4rl
            import gym
            import numpy as np
            import torch
            from torch import nn
            from torch.distributions import Independent, Normal
            from torch.optim.lr_scheduler import Lamb  

            Community Discussions


            Keras: AttributeError: 'Adam' object has no attribute '_name'
            Asked 2022-Apr-16 at 15:05

            I want to compile my DQN agent, but I get the error: AttributeError: 'Adam' object has no attribute '_name'.



            Answered 2022-Apr-16 at 15:05

            The error comes from importing Adam with from keras.optimizer_v1 import Adam. You can solve the problem by using tf.keras.optimizers.Adam from TensorFlow >= v2 instead.

            (The lr argument is deprecated; it's better to use learning_rate instead.)



            What are vectorized environments in reinforcement learning?
            Asked 2022-Mar-25 at 10:37

            I'm having a hard time wrapping my head around what and when vectorized environments should be used. If you can provide an example of a use case, that would be great.

            Documentation of vectorized environments in SB3:



            Answered 2022-Mar-25 at 10:37

            Vectorized environments are a method for stacking multiple independent environments into a single environment. Instead of running and training an agent on one environment per step, they let you train the agent on multiple environments per step.

            Usually you also want these environments to have different seeds in order to gain more diverse experience. This is very useful for speeding up training.

            I think they are called "vectorized" because at each training step the agent observes multiple states (collected in a vector), outputs multiple actions (one for each environment, also collected in a vector), and receives multiple rewards.
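
The idea can be sketched in plain Python. This is a minimal illustration, not the SB3 implementation: CoinFlipEnv and SimpleVecEnv are hypothetical names invented for this example, and the toy environment auto-resets when an episode ends.

```python
import random


class CoinFlipEnv:
    """Toy environment (hypothetical): guess the next coin flip."""

    def __init__(self, seed, episode_length=5):
        self.rng = random.Random(seed)  # each environment gets its own seed
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # observation: the current time step

    def step(self, action):
        coin = self.rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        self.t += 1
        done = self.t >= self.episode_length
        obs = self.reset() if done else self.t  # auto-reset at episode end
        return obs, reward, done


class SimpleVecEnv:
    """Steps several independent environments with a single call."""

    def __init__(self, envs):
        self.envs = envs

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        # One action per environment in, one result per environment out.
        results = [env.step(a) for env, a in zip(self.envs, actions)]
        obs, rewards, dones = (list(x) for x in zip(*results))
        return obs, rewards, dones


vec_env = SimpleVecEnv([CoinFlipEnv(seed=s) for s in range(4)])
observations = vec_env.reset()
observations, rewards, dones = vec_env.step([0, 1, 0, 1])
```

A single call to step here advances four environments, so one policy evaluation can produce four transitions instead of one.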



            How does a gradient backpropagates through random samples?
            Asked 2022-Mar-25 at 03:06

            I'm learning about policy gradients and I'm having hard time understanding how does the gradient passes through a random operation. From here: It is not possible to directly backpropagate through random samples. However, there are two main methods for creating surrogate functions that can be backpropagated through.

            They have an example of the score function:



            Answered 2021-Nov-30 at 05:48

            It is indeed true that sampling is not a differentiable operation per se. However, there are two broad ways to work around this: [1] the REINFORCE way and [2] the reparameterization way. Since your example is related to [1], I will restrict my answer to REINFORCE.

            REINFORCE removes the sampling operation from the computation graph entirely; the sampling still happens, but outside the graph. So your statement

            .. how does the gradient passes through a random operation ..

            isn't correct. The gradient does not pass through any random operation. Let's see your example
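
As a generic illustration of the score-function idea (not the example from the question), the sketch below estimates the gradient of an expected reward for a Bernoulli policy and compares it with the exact value. The reward values and the sigmoid parameterization are assumptions made for this example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reward(action):
    # Hypothetical reward: action 1 pays 3.0, action 0 pays 1.0.
    return 3.0 if action == 1 else 1.0


def score_function_gradient(theta, num_samples, rng):
    """Monte Carlo estimate of d/dtheta E[reward(a)], a ~ Bernoulli(sigmoid(theta))."""
    p = sigmoid(theta)
    total = 0.0
    for _ in range(num_samples):
        # Sampling is treated as a constant: nothing is differentiated through it.
        action = 1 if rng.random() < p else 0
        # d/dtheta log pi(action) is (1 - p) for action 1 and -p for action 0.
        grad_log_prob = (1.0 - p) if action == 1 else -p
        total += reward(action) * grad_log_prob
    return total / num_samples


def exact_gradient(theta):
    """Closed form: d/dtheta [p * r(1) + (1 - p) * r(0)] with p = sigmoid(theta)."""
    p = sigmoid(theta)
    return p * (1.0 - p) * (reward(1) - reward(0))


rng = random.Random(0)
estimate = score_function_gradient(theta=0.2, num_samples=200_000, rng=rng)
exact = exact_gradient(0.2)
```

Only the log-probability of the sampled action is differentiated; the sampled action itself enters as a fixed number, which is why no gradient needs to flow through the random draw.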



            Relationship of Horizon and Discount factor in Reinforcement Learning
            Asked 2022-Mar-13 at 17:50

            What is the connection between the discount factor gamma and the horizon in RL?

            What I have learned so far is that the horizon is the agent's time to live. Intuitively, an agent with a finite horizon will choose actions differently than if it had to live forever. In the latter case, the agent will try to maximize all the expected rewards it may get far in the future.

            But the idea of the discount factor is the same. Do values of gamma near zero make the horizon finite?



            Answered 2022-Mar-13 at 17:50

            Horizon refers to how many steps into the future the agent cares about the reward it can receive, which is a little different from the agent's time to live. In general, you can define any horizon you want as the objective. You could define a 10-step horizon, in which the agent makes the decision that maximizes the reward it will receive in the next 10 time steps. Or you could choose a 100-, 1000-, or n-step horizon.

            Usually, the n-step horizon is defined using n = 1 / (1 - gamma). Therefore, a 10-step horizon is achieved with gamma = 0.9, while a 100-step horizon is achieved with gamma = 0.99.

            Therefore, any value of gamma less than 1 implies that the horizon is finite.
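
The relationship n = 1 / (1 - gamma) can be checked numerically: the discount weights 1, gamma, gamma^2, ... form a geometric series whose sum is 1 / (1 - gamma). The helper names below are made up for this sketch.

```python
def effective_horizon(gamma):
    """Effective horizon 1 / (1 - gamma) for a discount factor 0 <= gamma < 1."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    return 1.0 / (1.0 - gamma)


def discounted_weight_sum(gamma, steps):
    """Sum of gamma**t for t = 0 .. steps - 1 (total weight on future rewards)."""
    return sum(gamma ** t for t in range(steps))


for gamma in (0.9, 0.99):
    horizon = effective_horizon(gamma)
    weight = discounted_weight_sum(gamma, 10_000)
    print(gamma, round(horizon, 6), round(weight, 6))
```

Rewards far beyond the effective horizon still receive a nonzero weight, but their combined contribution is small, which is the sense in which the horizon is "finite".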



            OpenAI-Gym and Keras-RL: DQN expects a model that has one dimension for each action
            Asked 2022-Mar-02 at 10:55

            I am trying to set a Deep-Q-Learning agent with a custom environment in OpenAI Gym. I have 4 continuous state variables with individual limits and 3 integer action variables with individual limits.

            Here is the code:



            Answered 2021-Dec-23 at 11:19

            As we talked about in the comments, it seems that the Keras-rl library is no longer supported (the last update in the repository was in 2019), so it's possible that everything is inside Keras now. I took a look at the Keras documentation, and there are no high-level functions to build a reinforcement learning model, but it is possible to use lower-level functions for this.

            • Here is an example of how to use Deep Q-Learning with Keras: link

            Another solution may be to downgrade to Tensorflow 1.0 as it seems the compatibility problem occurs due to some changes in version 2.0. I didn't test, but maybe the Keras-rl + Tensorflow 1.0 may work.

            There is also a branch of Keras-rl to support Tensorflow 2.0, the repository is archived, but there is a chance that it will work for you



            gym package not identifying ten-armed-bandits-v0 env
            Asked 2022-Feb-08 at 08:01


            • Python: 3.9
            • OS: Windows 10

            When I try to create the ten-armed bandits environment using the following code, an error is thrown, and I am not sure of the reason.



            Answered 2022-Feb-08 at 08:01

            It could be a problem with your Python version: the k-armed-bandits library was made 4 years ago, when Python 3.9 didn't exist. Besides this, the configuration files in the repo indicate that the Python version is 2.7 (not 3.9).

            If you create an environment with Python 2.7 and follow the setup instructions it works correctly on Windows:



            ValueError: Input 0 of layer "max_pooling2d" is incompatible with the layer: expected ndim=4, found ndim=5. Full shape received: (None, 3, 51, 39, 32)
            Asked 2022-Feb-01 at 07:31

            I have two different problems occurring at the same time.

            I am having dimensionality problems with MaxPooling2d and the same dimensionality problem with DQNAgent.

            The thing is, I can fix them separately but not at the same time.

            First Problem

            I am trying to build a CNN network with several layers. After I build my model, when I try to run it, it gives me an error.



            Answered 2022-Feb-01 at 07:31

            The issue is with input_shape. Use input_shape=input_shape[1:].

            Working sample code



            Stablebaselines3 logging reward with custom gym
            Asked 2021-Dec-25 at 01:10

            I have this custom callback to log the reward in my custom vectorized environment, but the reward always appears in the console as [0] and is not logged in TensorBoard at all.



            Answered 2021-Dec-25 at 01:10

            You need to add [0] as an index.

            Where you wrote self.logger.record('reward', self.training_env.get_attr('total_reward')), you just need to index the result: self.logger.record('reward', self.training_env.get_attr('total_reward')[0])
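
The reason for the [0] is that a vectorized environment returns one attribute value per wrapped environment, so the result is a list rather than a number. The sketch below uses ToyEnv and ToyVecEnv, hypothetical stand-ins written for this example rather than the Stable-Baselines3 classes, to show the difference.

```python
class ToyEnv:
    """Stand-in for a custom environment that tracks `total_reward`."""

    def __init__(self, total_reward):
        self.total_reward = total_reward


class ToyVecEnv:
    """Hypothetical stand-in that mimics a per-environment `get_attr`."""

    def __init__(self, envs):
        self.envs = envs

    def get_attr(self, name):
        # One value per wrapped environment, returned as a list.
        return [getattr(env, name) for env in self.envs]


training_env = ToyVecEnv([ToyEnv(total_reward=12.5)])
as_list = training_env.get_attr("total_reward")       # a list with one entry
as_scalar = training_env.get_attr("total_reward")[0]  # the number to log
```

With several environments, indexing with [0] logs only the first one; averaging the list is an alternative if a single summary value is wanted.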



            What is the purpose of [np.arange(0, self.batch_size), action] after the neural network?
            Asked 2021-Dec-23 at 11:07

            I followed a PyTorch tutorial to learn reinforcement learning (TRAIN A MARIO-PLAYING RL AGENT), but I am confused about the following code:



            Answered 2021-Dec-23 at 11:07

            Essentially, the output of the net is being indexed to get the desired part of the Q table.

            The (somewhat confusing) index [np.arange(0, self.batch_size), action] supplies one index per axis. For axis 0, np.arange(0, self.batch_size) selects every row of the batch. For axis 1, action selects the column. When action is an array with one entry per sample, the two index arrays are paired element-wise, so row i yields the value at column action[i].

            If action is a single integer and self.batch_size equals the length of dimension 0, the index can be simplified to [:, action], which is probably more familiar to most users. With a per-sample action array, [:, action] is not equivalent, because it selects all of those columns for every row.
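
To make the pairing concrete, here is a pure-Python sketch of what the index does when action holds one entry per sample. The function name and the numbers are invented for illustration; nested lists stand in for the network's output tensor.

```python
def select_action_values(q_values, actions):
    """Pick q_values[i][actions[i]] for every row i of the batch."""
    batch_indices = range(len(q_values))  # plays the role of np.arange(0, batch_size)
    return [q_values[i][a] for i, a in zip(batch_indices, actions)]


# Hypothetical network output: a batch of 3 states with 2 actions each.
q_values = [
    [0.1, 0.9],
    [0.5, 0.2],
    [0.3, 0.7],
]
actions = [1, 0, 1]  # the action taken in each sampled transition

chosen = select_action_values(q_values, actions)
```

The result has one Q value per sample, namely the value of the action that was actually taken, which is what the TD loss compares against its target.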



            DQN predicts same action value for every state (cart pole)
            Asked 2021-Dec-22 at 15:55

            I'm trying to implement a DQN. As a warm-up I want to solve CartPole-v0 with an MLP consisting of two hidden layers along with input and output layers. The input is a 4-element array [cart position, cart velocity, pole angle, pole angular velocity] and the output is an action value for each action (left or right). I am not exactly implementing the DQN from the "Playing Atari with DRL" paper (no frame stacking for inputs, etc.). I also made a few non-standard choices, like putting done and the target network's prediction of the action value in the experience replay, but those choices shouldn't affect learning.

            In any case I'm having a lot of trouble getting the thing to work. No matter how long I train the agent, it keeps predicting a higher value for one action over another, for example Q(s, Right) > Q(s, Left) for all states s. Below is my learning code, my network definition, and some results I get from training.



            Answered 2021-Dec-19 at 16:09

            There was nothing wrong with the network definition. It turns out the learning rate was too high, and reducing it to 0.00025 (as in the original Nature paper introducing the DQN) led to an agent that can solve CartPole-v0.

            That said, the learning algorithm was incorrect. In particular I was using the wrong target action-value predictions. Note the algorithm laid out above does not use the most recent version of the target network to make predictions. This leads to poor results as training progresses because the agent is learning based on stale target data. The way to fix this is to just put (s, a, r, s', done) into the replay memory and then make target predictions using the most up to date version of the target network when sampling a mini batch. See the code below for an updated learning loop.
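
The answer's original learning loop is not reproduced here. As a rough sketch of the fix it describes, the function below computes targets at sampling time from stored (s, a, r, s', done) tuples, using whatever target network is current. The callable interface target_q(state) and the stand-in network are assumptions made for this example.

```python
def td_targets(batch, target_q, gamma=0.99):
    """Compute r + gamma * max_a' Q_target(s', a') for each transition.

    `batch` holds (s, a, r, s_next, done) tuples; `target_q` is a callable
    returning a list of action values for a state (hypothetical interface).
    """
    targets = []
    for _state, _action, reward, next_state, done in batch:
        if done:
            targets.append(reward)  # no bootstrapping past a terminal state
        else:
            targets.append(reward + gamma * max(target_q(next_state)))
    return targets


def current_target_network(state):
    # Stand-in for the latest target network: two action values per state.
    return [0.5 * state, 1.0 * state]


# Transitions as they would be drawn from the replay memory.
replay_sample = [
    (0.0, 1, 1.0, 2.0, False),
    (2.0, 0, 1.0, 3.0, True),
]
targets = td_targets(replay_sample, current_target_network, gamma=0.5)
```

Because the replay memory stores only raw transitions, the targets always reflect the target network as it is when the mini-batch is sampled, not as it was when the transition was recorded.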


            Community Discussions, Code Snippets contain sources that include Stack Exchange Network


            No vulnerabilities reported

            Install pytorch-a2c-ppo-acktr-gail

            You can download it from GitHub.
            You can use pytorch-a2c-ppo-acktr-gail like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.


            I highly recommend PyBullet as a free, open-source alternative to MuJoCo for continuous control tasks. All environments are operated using exactly the same Gym interface. See their documentation for a comprehensive list.

            To use the DeepMind Control Suite environments, set the flag --env-name dm.<domain_name>.<task_name>, where domain_name and task_name are the names of a domain (e.g. hopper) and a task within that domain (e.g. stand) from the DeepMind Control Suite. Refer to their repo and their tech report for a full list of available domains and tasks. Other than setting the task, the API for interacting with the environment is exactly the same as for all the Gym environments, thanks to dm_control2gym.
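
To illustrate the dm.<domain_name>.<task_name> naming scheme, the helper below splits such a name into its parts. parse_dm_env_name is a hypothetical function written for this example; it is not the repository's own parsing code.

```python
def parse_dm_env_name(env_name):
    """Split 'dm.<domain_name>.<task_name>' into (domain_name, task_name)."""
    parts = env_name.split(".", 2)
    if len(parts) != 3 or parts[0] != "dm" or not parts[1] or not parts[2]:
        raise ValueError(f"expected dm.<domain_name>.<task_name>, got {env_name!r}")
    return parts[1], parts[2]


# The domain and task from the text above: hopper and stand.
domain, task = parse_dm_env_name("dm.hopper.stand")
```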
          • CLI

            gh repo clone ikostrikov/pytorch-a2c-ppo-acktr-gail
