DDPG | Continuous Control with Deep Reinforcement Learning | Reinforcement Learning library
kandi X-RAY | DDPG Summary
A reimplementation of DDPG from "Continuous Control with Deep Reinforcement Learning," built on OpenAI Gym and TensorFlow. Applying Batch Normalization to the critic network is still an open problem, although the actor network works well with Batch Normalization. Some MuJoCo environments on OpenAI Gym remain unsolved.
Top functions reviewed by kandi - BETA
- Create Q network
- Batch norm layer
- Return a tf Variable
- Creates the network
- Batch normalization layer
- Returns a tf Variable
- Creates the target network
- Evaluate a given transition
- Train the model
- Add an experience
- Return the number of experiences
- Return a batch of data from the buffer
- Resets the state
- Creates a target q - norm
- Calculate noise
- Compute the action based on the current policy
- Generate the noise
- Returns the action of the actor
DDPG Key Features
DDPG Examples and Code Snippets
Community Discussions
Trending Discussions on DDPG
QUESTION
Is there a way to model action masking for continuous action spaces? I want to model economic problems with reinforcement learning. These problems often have continuous action and state spaces. In addition, the state often influences what actions are possible and, thus, the allowed actions change from step to step.
Simple example:
The agent has wealth (continuous state) and decides about spending (continuous action). The next period's wealth is then wealth minus spending. But the agent is restricted by a budget constraint: it is not allowed to spend more than its wealth. What is the best way to model this?
What I tried: For discrete actions it is possible to use action masking, so at each time step I provided the agent with information about which actions are allowed and which are not. I also tried to do this with a continuous action space by providing lower and upper bounds on the allowed actions and clipping the actions sampled from the actor network (e.g. DDPG).
I am wondering if this is a valid thing to do (it works in a simple toy model), because I did not find any RL library that implements this. Or is there a smarter way/best practice to give the agent information about allowed actions?
...ANSWER
Answered 2022-Mar-17 at 08:28: I think you are on the right track. I've looked into masked actions and found two possible approaches: give a negative reward when the agent tries to take an invalid action (without letting the environment evolve), or dive deeper into the neural network code and let the network output only valid actions. I've always considered this last approach the most efficient, and your approach of introducing boundaries seems very similar to it. So as long as this is the type of mask (boundaries) you are looking for, I think you are good to go.
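For the budget example above, a minimal sketch of that boundary-clipping idea might look like the following (the helper name constrain_spending and the scalar state/action are hypothetical, not from the question):

```python
import numpy as np

def constrain_spending(raw_action, wealth):
    """Clip the actor's raw spending proposal to the budget constraint.

    `raw_action` is the (possibly noisy) output of a DDPG-style actor and
    `wealth` is the current continuous state; spending must lie in [0, wealth].
    """
    return float(np.clip(raw_action, 0.0, wealth))

# Example: the actor proposes to spend 1.3 units while only 1.0 is available.
spending = constrain_spending(raw_action=1.3, wealth=1.0)
next_wealth = 1.0 - spending  # 0.0 -- the budget constraint holds
```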
QUESTION
I am trying to create a batched environment version of an SAC agent example from the TensorFlow Agents library; the original code can be found here. I am also using a custom environment.
I am pursuing a batched environment setup to better leverage GPU resources and speed up training. My understanding is that by passing batches of trajectories to the GPU, there will be less overhead when passing data from the host (CPU) to the device (GPU).
My custom environment is called SacEnv, and I attempt to create a batched environment like so:
ANSWER
Answered 2022-Feb-19 at 18:11: It turns out I neglected to pass batch_size when initializing the AverageReturnMetric and AverageEpisodeLengthMetric instances.
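The corrected initialization isn't shown above, but a hedged sketch of passing batch_size to those TF-Agents metrics (the num_parallel_environments value is an assumption for illustration) could look like this:

```python
from tf_agents.metrics import tf_metrics

# With a batched TF environment, the per-step metrics need to know how many
# parallel environments they aggregate over.
num_parallel_environments = 4  # assumed value, should match the batched env

train_metrics = [
    tf_metrics.NumberOfEpisodes(),
    tf_metrics.EnvironmentSteps(),
    tf_metrics.AverageReturnMetric(batch_size=num_parallel_environments),
    tf_metrics.AverageEpisodeLengthMetric(batch_size=num_parallel_environments),
]
```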
QUESTION
I am training a DDPG agent on a custom environment that I wrote using OpenAI Gym, and I am getting an error while training the model.
When I searched for a solution on the web, I found that some people who faced a similar issue were able to resolve it by initializing the variables.
...ANSWER
Answered 2021-Jun-10 at 07:00: For now I was able to solve this error by replacing the imports from keras with imports from tensorflow.keras, although I don't know why keras itself doesn't work.
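A hedged sketch of that import swap (the exact modules depend on the original code; Sequential and Dense are just placeholders):

```python
# Before: imports from standalone Keras, which raised the error
# from keras.models import Sequential
# from keras.layers import Dense

# After: import the Keras API bundled with TensorFlow instead
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
```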
QUESTION
Background
I'm currently trying to implement a DDPG framework to control a simple car agent. At first, the car agent would only need to learn how to reach the end of a straight path as quickly as possible by adjusting its acceleration. This task was simple enough, so I decided to introduce an additional steering action as well. I updated my observation and action spaces accordingly.
The lines below are the for loop that runs each episode:
...ANSWER
Answered 2021-Jun-05 at 19:06: The issue has been resolved thanks to some simple but helpful advice I received on Reddit. I was disrupting the tracking of my variables by making changes using my custom for-loop; I should have used a TensorFlow function instead. The following changes fixed the problem for me:
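The poster's actual changes are not reproduced here, but one common DDPG spot where a plain Python loop over weights disrupts variable tracking is the target-network soft update; a hedged sketch of doing it with TensorFlow ops inside a tf.function (the model names are assumptions) might look like this:

```python
import tensorflow as tf

@tf.function
def soft_update(target_variables, online_variables, tau=0.005):
    # Polyak-average the online weights into the target network using
    # TensorFlow ops, so the update stays inside the graph and variable
    # tracking is not disrupted by NumPy-side mutation.
    for target_var, online_var in zip(target_variables, online_variables):
        target_var.assign(tau * online_var + (1.0 - tau) * target_var)

# Usage with hypothetical actor/target_actor models:
# soft_update(target_actor.variables, actor.variables)
```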
QUESTION
I want to set "actor_hiddens", a.k.a. the hidden layers of the policy network of PPO in RLlib, and be able to set their weights. Is this possible? If yes, please tell me how. I know how to do it for DDPG in RLlib, but the problem with PPO is that I can't find the policy network. Thanks.
...ANSWER
Answered 2021-May-28 at 09:59: You can always create your own custom policy network; then you have full control over the layers and also the initialization of the weights.
If you want to use the default model you have the following params to adapt it to your needs:
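The parameter listing from the original answer is not reproduced above; as a hedged sketch (RLlib's trainer API from around the time of the answer, with example layer sizes), adapting the default fully connected model and reading or writing its weights could look like this:

```python
from ray.rllib.agents.ppo import PPOTrainer

# Example sizes only; "fcnet_hiddens" controls the hidden layers of the
# default fully connected policy network.
config = {
    "env": "CartPole-v0",
    "model": {
        "fcnet_hiddens": [64, 64],
        "fcnet_activation": "tanh",
    },
}

trainer = PPOTrainer(config=config)

# Read and write the policy network weights.
weights = trainer.get_policy().get_weights()
trainer.get_policy().set_weights(weights)
```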
QUESTION
Below is a high-level diagram of how my agent should look in order to be able to interact with a custom gym environment I made.
States and actions: The environment has three states [s1, s2, s3] and six actions [a1, a2, a3, a4, a5, a6]; states and actions can be any value between 0 and 1.
Question: Which algorithms are suitable for my problem? I am aware that there are algorithms that are good at handling continuous action spaces (DDPG, PPO, etc.), but I can't see how they might operate when they should output multiple actions at each time step. Finally, are there any gym environments that have the described property (multiple actions), and are there any Python implementations for solving those particular environments?
...ANSWER
Answered 2021-Mar-02 at 04:01: As you mentioned in your question, PPO, DDPG, TRPO, SAC, etc. are indeed suitable for handling continuous action spaces in reinforcement learning problems. These algorithms output a vector whose size equals your action dimension, and each element of this vector is a real number instead of a discrete value. Note that stochastic algorithms like PPO give a multivariate probability distribution from which you sample the actions.
Most of the robotic environments in Mujoco-py, PyBullet, Robosuite, etc. are environments with multiple continuous actions. Here the action space can be of the form [torque_for_joint_1, torque_for_joint_2, ..., torque_for_joint_n], where torque_for_joint_i is a real-valued number determining how much that joint moves.
Regarding implementations for solving these environments, robosuite does offer sample solutions for benchmarking the environments with different algorithms. You could also look up stable-baselines or one of the standard RL libraries.
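As a hedged illustration of such a multi-dimensional continuous action space, a toy gym environment (hypothetical class name, old-style gym API matching that era) might declare its spaces like this:

```python
import numpy as np
import gym
from gym import spaces

class SixActionEnv(gym.Env):
    """Toy environment with 3 continuous states and 6 continuous actions,
    all bounded in [0, 1] (for illustration only)."""

    def __init__(self):
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(3,), dtype=np.float32)
        self.action_space = spaces.Box(low=0.0, high=1.0, shape=(6,), dtype=np.float32)

    def reset(self):
        return self.observation_space.sample()

    def step(self, action):
        # `action` is a length-6 vector; a DDPG/PPO/SAC actor outputs the
        # whole vector in a single forward pass.
        obs = self.observation_space.sample()
        return obs, 0.0, False, {}
```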
QUESTION
I have an Actor-Critic neural network where the Actor is its own class and the Critic is its own class with its own neural network and .forward() function. I am then creating an object of each of these classes in a larger Model class. My setup is as follows:
...ANSWER
Answered 2021-Jan-20 at 19:09: Yes, you shouldn't do it like that. What you should do instead is propagate through parts of the graph.
What the graph contains: Now the graph contains both actor and critic. If the computations pass through the same part of the graph (say, twice through actor), it will raise this error.
- And they will, as you clearly use actor and critic joined with the loss value (this line: loss_actor = -self.critic(state, action)).
- Different optimizers do not change anything here, as it's a backward problem (optimizers simply apply calculated gradients onto models).
- This is how to fix it in GANs, but not in this case; see the Actual fix paragraph below, and read on if you are curious about the topic.
Actual fix: If part of a neural network (critic in this case) does not take part in the current optimization step, it should be treated as a constant (and vice versa). To do that, you could disable gradients using the torch.no_grad context manager (documentation) and set critic to eval mode (documentation), something along those lines:
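The answer's original snippet isn't reproduced above; a hedged sketch of the "treat the part that is not being optimised as a constant" idea in a DDPG-style update (all network, optimiser, and batch names are assumptions) could look like this:

```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, actor_target, critic_target,
                actor_optimizer, critic_optimizer, gamma=0.99):
    """One DDPG-style update; each backward pass touches the graph only once."""
    state, action, reward, next_state, done = batch

    # Critic step: the bootstrapped target is a constant, so build it under
    # torch.no_grad() -- no gradients flow through the target networks.
    with torch.no_grad():
        next_action = actor_target(next_state)
        target_q = reward + gamma * (1.0 - done) * critic_target(next_state, next_action)
    critic_optimizer.zero_grad()
    critic_loss = F.mse_loss(critic(state, action), target_q)
    critic_loss.backward()
    critic_optimizer.step()

    # Actor step: a fresh forward pass, so backward() does not revisit the
    # graph used for the critic loss; only the actor's parameters are stepped.
    actor_optimizer.zero_grad()
    actor_loss = -critic(state, actor(state)).mean()
    actor_loss.backward()
    actor_optimizer.step()
```

Note that in the actor step the critic is deliberately kept inside the graph, since loss_actor = -critic(state, actor(state)) needs gradients to flow through the critic back into the actor; the no_grad/eval treatment applies to the parts that should stay constant, such as the target networks above.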
QUESTION
In the training phase of Deep Deterministic Policy Gradient (DDPG) algorithm, the action selection would be simply
...ANSWER
Answered 2021-Jan-04 at 02:09: The actor is usually a neural network, and the reason the actor's action is restricted to [-1, 1] is usually that the output layer of the actor net uses an activation function like Tanh; one can then rescale this output so the action belongs to any range.
The reason the actor can choose a good action depending on the environment is that, in an MDP (Markov decision process), the actor does trial and error in the environment and gets a reward or a penalty for doing well or badly, i.e. the actor net gets gradients pointing towards better actions.
Note that algorithms like PPG, PPO, SAC, and DDPG can guarantee the actor selects the best action for all states only in theory (i.e. assuming infinite learning time, infinite actor net capacity, etc.); in practice there is usually no guarantee unless the action space is discrete and the environment is very simple.
Understanding the ideas behind RL algorithms will greatly help you understand their source code; after all, code is an implementation of the idea.
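A hedged sketch of rescaling a Tanh-bounded actor output to an arbitrary action range (the function name is illustrative):

```python
def scale_action(tanh_output, low, high):
    """Map an actor output in [-1, 1] (e.g. from a Tanh output layer)
    linearly onto the environment's action range [low, high]."""
    return low + 0.5 * (tanh_output + 1.0) * (high - low)

# Example: a Tanh output of 0.5 mapped onto a torque range of [-2, 2].
print(scale_action(0.5, low=-2.0, high=2.0))  # 1.0
```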
QUESTION
Here they mention the need to include optim.zero_grad() when training, to zero the parameter gradients. My question is: could I do net.zero_grad() as well, and would that have the same effect? Or is it necessary to do optim.zero_grad()? Moreover, what happens if I do both? If I do neither, then the gradients get accumulated, but what does that exactly mean? Do they get added? In other words, what's the difference between doing optim.zero_grad() and net.zero_grad()? I am asking because here, at line 115, they use net.zero_grad(), and it is the first time I see that. It is an implementation of a reinforcement learning algorithm, where one has to be especially careful with the gradients because there are multiple networks and gradients, so I suppose there is a reason for them to use net.zero_grad() as opposed to optim.zero_grad().
ANSWER
Answered 2020-May-19 at 22:05: net.zero_grad() sets the gradients of all its parameters (including the parameters of submodules) to zero. If you call optim.zero_grad(), that will do the same, but for all parameters that have been specified to be optimised. If you are using only net.parameters() in your optimiser, e.g. optim = Adam(net.parameters(), lr=1e-3), then both are equivalent, since they contain the exact same parameters.
You could have other parameters that are being optimised by the same optimiser, which are not part of net, in which case you would either have to manually set their gradients to zero (and therefore keep track of all the parameters), or you can simply call optim.zero_grad() to ensure that all parameters being optimised have their gradients set to zero.
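A hedged toy sketch of that difference (the extra parameter is hypothetical):

```python
import torch
from torch import nn
from torch.optim import Adam

net = nn.Linear(4, 1)
extra_param = nn.Parameter(torch.zeros(1))  # optimised, but not part of `net`

# The optimiser tracks net's parameters plus the extra parameter.
optim = Adam(list(net.parameters()) + [extra_param], lr=1e-3)

loss = net(torch.randn(2, 4)).sum() + extra_param.sum()
loss.backward()

net.zero_grad()          # clears only net's gradients
print(extra_param.grad)  # still tensor([1.])
optim.zero_grad()        # clears gradients of every optimised parameter
```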
Moreover, what happens if I do both?
Nothing, the gradients would just be set to zero again, but since they were already zero, it makes absolutely no difference.
If I do none, then the gradients get accumulated, but what does that exactly mean? do they get added?
Yes, they are added to the existing gradients. In the backward pass the gradients with respect to every parameter are calculated, and then each gradient is added to the parameter's gradient (param.grad). That allows you to have multiple backward passes that affect the same parameters, which would not be possible if the gradients were overwritten instead of added.
For example, you could accumulate the gradients over multiple batches if you need bigger batches for training stability but don't have enough memory to increase the batch size. This is trivial to achieve in PyTorch: essentially, leave off optim.zero_grad() and delay optim.step() until you have gathered enough steps, as shown in HuggingFace - Training Neural Nets on Larger Batches: Practical Tips for 1-GPU, Multi-GPU & Distributed setups.
That flexibility comes at the cost of having to manually set the gradients to zero. Frankly, one line is a very small cost to pay, even though many users won't make use of it and especially beginners might find it confusing.
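A hedged sketch of that gradient-accumulation pattern (the toy model and data are placeholders):

```python
import torch
from torch import nn
from torch.optim import Adam

model = nn.Linear(10, 1)
optim = Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

loader = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(16)]  # toy data
accumulation_steps = 4  # effective batch size = 4 x actual batch size

for step, (inputs, targets) in enumerate(loader):
    loss = loss_fn(model(inputs), targets)
    # Scale so the accumulated gradient matches one big-batch update;
    # backward() *adds* to the existing .grad buffers.
    (loss / accumulation_steps).backward()

    if (step + 1) % accumulation_steps == 0:
        optim.step()       # apply the accumulated gradients
        optim.zero_grad()  # only now reset them to zero
```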
QUESTION
I try to save the model using the saver method (I use the save function in the DDPG class to save), but when restoring the model, the result is far from the one I saved (I save the model when the episodic reward is zero; the restore method in the code is commented out). My code is below with all the features. I use Python 3.7, gym 0.16.0, and TensorFlow version 1.13.1.
...ANSWER
Answered 2020-May-05 at 16:54: I solved this problem completely by rewriting the code and adding the learning function in a separate session.
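The rewritten code is not shown in the answer, but a minimal TensorFlow 1.x save/restore sketch (the variable and checkpoint path are illustrative) looks roughly like this:

```python
import tensorflow as tf  # TensorFlow 1.x, as used in the question

# A single variable stands in for the real DDPG graph (actor/critic, train ops).
weights = tf.get_variable("weights", shape=[4, 2])

saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... training loop would run here ...
    saver.save(sess, "./checkpoints/ddpg.ckpt")

# Later: restore into the same (or an identically built) graph.
with tf.Session() as sess:
    saver.restore(sess, "./checkpoints/ddpg.ckpt")
    restored = sess.run(weights)
```

A common pitfall is running tf.global_variables_initializer() again after saver.restore(), which overwrites the restored weights, so restoring and evaluating are best kept in a session where the variables are not re-initialized.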
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install DDPG
You can use DDPG like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.