ddpg | Implements the DDPG model from Lillicrap et al | Machine Learning library
kandi X-RAY | ddpg Summary
Implements the DDPG model from Lillicrap et al.
Top functions reviewed by kandi - BETA
- Create the train op.
- Run the episode loop.
- Build the critic.
- Return a random batch of observations.
- Build the actor.
- Initialize the simulation.
- Add an observation to the model.
- Update the optimizer.
ddpg Key Features
ddpg Examples and Code Snippets
Community Discussions
Trending Discussions on ddpg
QUESTION
I am training a DDPG agent on a custom environment that I wrote using OpenAI Gym, and I am getting an error while training the model.
When I searched for a solution on the web, I found that some people who faced a similar issue were able to resolve it by initializing the variables.
...ANSWER
Answered 2021-Jun-10 at 07:00
For now I was able to solve this error by replacing the imports from keras with imports from tensorflow.keras, although I don't know why keras itself doesn't work.
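The fix amounts to swapping the import source; a minimal sketch of what that looks like is below (the specific layers and model are illustrative assumptions, not the asker's code):

# Before (standalone Keras), which triggered the uninitialized-variable error:
# from keras.models import Sequential
# from keras.layers import Dense

# After: use the Keras bundled with TensorFlow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Illustrative model only; the asker's actual DDPG networks are not shown here.
model = Sequential([
    Dense(64, activation="relu", input_shape=(8,)),
    Dense(2, activation="tanh"),
])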
QUESTION
Background
I'm currently trying to implement a DDPG framework to control a simple car agent. At first, the car agent would only need to learn how to reach the end of a straight path as quickly as possible by adjusting its acceleration. This task was simple enough, so I decided to introduce an additional steering action as well. I updated my observation and action spaces accordingly.
The lines below are the for loop that runs each episode:
...ANSWER
Answered 2021-Jun-05 at 19:06
The issue has been resolved thanks to some simple but helpful advice I received on Reddit. I was disrupting the tracking of my variables by making changes in my custom for-loop. I should have used a TensorFlow function instead. The following changes fixed the problem for me:
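The asker's actual diff is not included in this excerpt. As a hedged sketch of the kind of change described, here is a target-network soft update done with TensorFlow ops (Variable.assign inside a tf.function) rather than plain Python mutation; the model shapes and the tau value are assumptions:

import tensorflow as tf

# Illustrative actor and target-actor networks (shapes are assumptions).
actor = tf.keras.Sequential([tf.keras.layers.Dense(2, input_shape=(3,))])
target_actor = tf.keras.Sequential([tf.keras.layers.Dense(2, input_shape=(3,))])
target_actor.set_weights(actor.get_weights())

@tf.function
def soft_update(target_vars, source_vars, tau):
    # target <- tau * source + (1 - tau) * target, done with Variable.assign
    # so TensorFlow keeps tracking the variables.
    for t, s in zip(target_vars, source_vars):
        t.assign(tau * s + (1.0 - tau) * t)

soft_update(target_actor.trainable_variables, actor.trainable_variables, tau=0.005)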
QUESTION
I want to set "actor_hiddens", a.k.a. the hidden layers of the policy network, for PPO in RLlib, and be able to set their weights. Is this possible? If so, how? I know how to do it for DDPG in RLlib, but the problem with PPO is that I can't find the policy network. Thanks.
...ANSWER
Answered 2021-May-28 at 09:59
You can always create your own custom policy network; then you have full control over the layers and also over the initialization of the weights.
If you want to use the default model, you have the following params to adapt it to your needs:
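The answer's snippet is not shown here; as a hedged sketch, adapting the default model and reading/writing its weights in RLlib can look roughly like this (the import path and config keys assume an older Ray release with the ray.rllib.agents API, and the environment is a placeholder):

import ray
from ray.rllib.agents.ppo import PPOTrainer  # newer Ray versions use ray.rllib.algorithms.ppo

ray.init(ignore_reinit_error=True)

config = {
    "env": "CartPole-v1",                 # placeholder environment
    "framework": "torch",
    "model": {
        "fcnet_hiddens": [256, 256],      # hidden layers of the default policy network
        "fcnet_activation": "tanh",
    },
}
trainer = PPOTrainer(config=config)

# The policy network's weights can then be read and written:
weights = trainer.get_policy().get_weights()
trainer.get_policy().set_weights(weights)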
QUESTION
Below is a high-level diagram of how my agent should look in order to be able to interact with a custom gym environment I made.
States and actions
The environment has three states [s1, s2, s3] and six actions [a1, a2, a3, a4, a5, a6]. States and actions can be any value between 0 and 1.
Question
Which algorithms are suitable for my problem? I am aware that there are algorithms that are good at handling continuous action spaces (DDPG, PPO, etc.), but I can't see how they might operate when they have to output multiple actions at each time step. Finally, are there any gym environments that have the described property (multiple actions), and are there any Python implementations for solving those particular environments?
...ANSWER
Answered 2021-Mar-02 at 04:01
As you mentioned in your question, PPO, DDPG, TRPO, SAC, etc. are indeed suitable for handling continuous action spaces in reinforcement learning problems. These algorithms output a vector whose size equals your action dimension, and each element in this vector is a real number rather than a discrete value. Note that stochastic algorithms like PPO output a multivariate probability distribution from which you sample the actions.
Most of the robotic environments in Mujoco-py, PyBullet, Robosuite, etc. are environments with multiple continuous actions. Here the action can be of the form [torque_for_joint_1, torque_for_joint_2, ..., torque_for_joint_n], where torque_for_joint_i is a real-valued number determining by how much that joint moves.
Regarding implementations for solving these environments, robosuite does offer sample solutions for benchmarking the environments with different algorithms. You could also look up stable-baselines or one of the standard RL libraries.
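To make the vector-valued-action point concrete, here is a minimal sketch of a custom environment in the classic gym API whose observation and action are continuous vectors in [0, 1], as in the question (the class name and the placeholder reward logic are assumptions):

import numpy as np
import gym
from gym import spaces

class MultiActionEnv(gym.Env):
    """Toy env with 3 continuous states and 6 continuous actions, all in [0, 1]."""

    def __init__(self):
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(3,), dtype=np.float32)
        self.action_space = spaces.Box(low=0.0, high=1.0, shape=(6,), dtype=np.float32)

    def reset(self):
        return self.observation_space.sample()

    def step(self, action):
        # `action` is an array of shape (6,); the reward here is a placeholder.
        obs = self.observation_space.sample()
        reward, done, info = 0.0, False, {}
        return obs, reward, done, info

env = MultiActionEnv()
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())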
QUESTION
I have an Actor-Critic neural network where the Actor is its own class and the Critic is its own class, each with its own neural network and .forward() function. I then create an object of each of these classes in a larger Model class. My setup is as follows:
...ANSWER
Answered 2021-Jan-20 at 19:09
Yes, you shouldn't do it like that. What you should do instead is propagate only through parts of the graph.
What the graph contains
Right now the graph contains both actor and critic. If the computations pass through the same part of the graph twice (say, twice through actor), it will raise this error. And they will, as you clearly use actor and critic joined in the loss value (this line: loss_actor = -self.critic(state, action)).
- Different optimizers do not change anything here, as it's a backward problem (optimizers simply apply the calculated gradients onto the models).
- This is how to fix it in GANs, but not in this case; see the Actual fix paragraph below, and read on if you are curious about the topic.
Actual fix
If part of a neural network (the critic in this case) does not take part in the current optimization step, it should be treated as a constant (and vice versa). To do that, you could disable gradients using the torch.no_grad context manager (documentation) and set the critic to eval mode (documentation), something along those lines:
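The answer's original snippet is not reproduced in this excerpt. Below is a minimal, self-contained sketch of the pattern it describes, applied to a DDPG-style update: the target critic is evaluated under torch.no_grad() (and eval mode), and the actor update gets its own fresh forward pass so no graph is traversed twice. All network sizes and variable names are assumptions, not the asker's code:

import torch
import torch.nn as nn
import torch.nn.functional as F

state_dim, action_dim, gamma = 3, 1, 0.99

actor = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(), nn.Linear(32, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 32), nn.ReLU(), nn.Linear(32, 1))
critic_target = nn.Sequential(nn.Linear(state_dim + action_dim, 32), nn.ReLU(), nn.Linear(32, 1))
actor_optim = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_optim = torch.optim.Adam(critic.parameters(), lr=1e-3)

# Dummy batch standing in for replay-buffer samples.
states = torch.randn(8, state_dim)
actions = torch.randn(8, action_dim)
rewards = torch.randn(8, 1)
next_states = torch.randn(8, state_dim)

# Target values: critic_target takes no part in this step, so no_grad + eval.
critic_target.eval()
with torch.no_grad():
    target_q = rewards + gamma * critic_target(torch.cat([next_states, actor(next_states)], dim=1))

# Critic update: backward through the critic's own graph.
critic_loss = F.mse_loss(critic(torch.cat([states, actions], dim=1)), target_q)
critic_optim.zero_grad()
critic_loss.backward()
critic_optim.step()

# Actor update: a fresh forward pass builds a new graph, avoiding the
# "backward through the graph a second time" error.
actor_loss = -critic(torch.cat([states, actor(states)], dim=1)).mean()
actor_optim.zero_grad()
actor_loss.backward()
actor_optim.step()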
QUESTION
In the training phase of the Deep Deterministic Policy Gradient (DDPG) algorithm, the action selection would simply be
...ANSWER
Answered 2021-Jan-04 at 02:09
The actor usually is a neural network, and the reason the actor's action is restricted to [-1, 1] is usually that the output layer of the actor net uses an activation function like Tanh; one can then rescale this output so the action falls into any desired range.
The reason the actor can choose good actions for the environment is that in an MDP (Markov decision process) the actor does trial and error in the environment and receives a reward or penalty for acting well or badly, i.e. the actor net gets gradients pointing towards better actions.
Note that algorithms like PPG, PPO, SAC, and DDPG can guarantee that the actor selects the best action for all states only in theory (i.e. assuming infinite learning time, infinite actor net capacity, etc.); in practice there is usually no such guarantee unless the action space is discrete and the environment is very simple.
Understanding the idea behind RL algorithms will greatly help you understand their source code; after all, the code is an implementation of the idea.
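As a concrete illustration of the Tanh point above, here is a hedged sketch of an actor network whose raw output lies in [-1, 1] and is rescaled into an arbitrary [low, high] action range (the layer sizes and bounds are illustrative assumptions):

import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim=3, action_dim=1, low=-2.0, high=2.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh(),   # raw output in [-1, 1]
        )
        self.low, self.high = low, high

    def forward(self, state):
        raw = self.net(state)                        # in [-1, 1]
        # Rescale linearly into [low, high].
        return self.low + (raw + 1.0) * 0.5 * (self.high - self.low)

actor = Actor()
action = actor(torch.randn(1, 3))                    # value in [-2, 2]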
QUESTION
Here they mention the need to include optim.zero_grad() when training, to zero the parameter gradients. My question is: could I do net.zero_grad() as well, and would that have the same effect? Or is it necessary to do optim.zero_grad()? Moreover, what happens if I do both? And if I do neither, the gradients get accumulated, but what does that exactly mean? Do they get added? In other words, what's the difference between doing optim.zero_grad() and net.zero_grad()? I am asking because here, at line 115, they use net.zero_grad(), and it is the first time I have seen that. It is an implementation of a reinforcement learning algorithm, where one has to be especially careful with the gradients because there are multiple networks and gradients, so I suppose there is a reason for them to use net.zero_grad() as opposed to optim.zero_grad().
ANSWER
Answered 2020-May-19 at 22:05
net.zero_grad() sets the gradients of all of its parameters (including the parameters of submodules) to zero. Calling optim.zero_grad() does the same, but for all parameters that have been specified to be optimised. If you are using only net.parameters() in your optimiser, e.g. optim = Adam(net.parameters(), lr=1e-3), then both are equivalent, since they contain exactly the same parameters.
You could have other parameters that are being optimised by the same optimiser which are not part of net, in which case you would either have to set their gradients to zero manually (and therefore keep track of all the parameters), or you can simply call optim.zero_grad() to ensure that all parameters being optimised have had their gradients set to zero.
Moreover, what happens if I do both?
Nothing, the gradients would just be set to zero again, but since they were already zero, it makes absolutely no difference.
If I do neither, then the gradients get accumulated, but what does that exactly mean? Do they get added?
Yes, they are added to the existing gradients. In the backward pass the gradients with respect to every parameter are calculated, and each gradient is then added to the parameter's existing gradient (param.grad). That allows you to have multiple backward passes that affect the same parameters, which would not be possible if the gradients were overwritten instead of added.
For example, you could accumulate the gradients over multiple batches if you need bigger batches for training stability but don't have enough memory to increase the batch size. This is trivial to achieve in PyTorch: essentially leave out optim.zero_grad() and delay optim.step() until you have gathered enough steps, as shown in HuggingFace - Training Neural Nets on Larger Batches: Practical Tips for 1-GPU, Multi-GPU & Distributed setups.
That flexibility comes at the cost of having to manually set the gradients to zero. Frankly, one line is a very small cost to pay, even though many users won't make use of it and especially beginners might find it confusing.
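A minimal sketch of that accumulation pattern, assuming an illustrative model, random data, and an accumulation factor of 4:

import torch
import torch.nn as nn

net = nn.Linear(10, 1)
optim = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
accumulation_steps = 4

optim.zero_grad()
for step in range(100):
    x, y = torch.randn(8, 10), torch.randn(8, 1)
    loss = loss_fn(net(x), y) / accumulation_steps  # scale so the sum matches one big batch
    loss.backward()                                 # gradients are *added* to param.grad
    if (step + 1) % accumulation_steps == 0:
        optim.step()                                # update with the accumulated gradients
        optim.zero_grad()                           # reset them for the next effective batch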
QUESTION
I try to save the model using the saver method (I use the save function in the DDPG class to save), but when restoring the model the result is far from the one I saved (I save the model when the episodic reward is zero; the restore method in the code is commented out). My code is below with all the features. I use Python 3.7, gym 0.16.0, and TensorFlow 1.13.1.
...ANSWER
Answered 2020-May-05 at 16:54
I solved this problem completely by rewriting the code and putting the learning function in a separate session.
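The asker's full code is not shown here. For reference, a bare-bones save/restore cycle with the TensorFlow 1.x Saver (matching the TensorFlow 1.13 setup in the question) looks roughly like this; the variable and checkpoint path are illustrative:

import tensorflow as tf  # TensorFlow 1.x API, as in the question

x = tf.Variable(0.0, name="x")
saver = tf.train.Saver()

# Save after initialising (or training) the variables.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    save_path = saver.save(sess, "./ddpg.ckpt")

# Restore in a fresh session; do not re-run the initializer afterwards,
# or the restored values will be overwritten.
with tf.Session() as sess:
    saver.restore(sess, save_path)
    print(sess.run(x))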
QUESTION
I'm trying to use huskarl and load the demo files to test that I have installed everything correctly. However, when I run any of the demo files I am met with this trace:
...ANSWER
Answered 2020-Feb-10 at 17:19
So I figured out that huskarl is only compatible with tensorflow==2.0.0a0. I found that out by uninstalling tf, reinstalling it, and catching the error. :/
QUESTION
Running a DDPG reinforcement learner on TensorFlow 2.0. The training is pretty slow for the batch size I'm using, so I'm looking to run the training in a separate thread from the execution.
However, when trying to train the TensorFlow model on a thread separate from the one where I use it to predict and execute, I run into problems.
First there's the graph error:
ValueError: Tensor("dense_2/kernel/Read/ReadVariableOp:0", shape=(3, 9), dtype=float32) must be from the same graph as Tensor("Const:0", shape=(32, 3), dtype=float32).
This one is pretty easy to resolve by restoring the graph from the primary thread on the training thread:
...ANSWER
Answered 2019-Dec-15 at 07:07
Eventually I got this working. I was running TensorFlow 1.4 when I thought I was using TensorFlow 2, and thus eager execution was disabled by default. Once I enabled eager execution I didn't need to do anything special on the other thread; it just worked.
I created a separate copy of the model on the main thread and copied the weights from the training thread, under a lock, right before execution.
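A hedged sketch of that weight-copying approach, assuming two copies of a small tf.keras model and a threading.Lock (the model architecture is a placeholder):

import threading
import numpy as np
import tensorflow as tf

weights_lock = threading.Lock()

def make_model():
    return tf.keras.Sequential([
        tf.keras.layers.Dense(9, activation="tanh", input_shape=(3,)),
    ])

train_model = make_model()   # updated by the training thread
exec_model = make_model()    # used by the main thread for prediction

def sync_weights():
    # Called on the main thread right before execution; the training thread
    # should hold the same lock while it writes to train_model.
    with weights_lock:
        exec_model.set_weights(train_model.get_weights())

sync_weights()
action = exec_model.predict(np.zeros((1, 3)))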
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install ddpg
You can use ddpg like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution with header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid making changes to the system.