TF_RL | Eagerly Experimentable!!! | Machine Learning library
kandi X-RAY | TF_RL Summary
This is the repo for implementing and experimenting with a variety of RL algorithms using TensorFlow Eager Execution. And since our Lord Google graciously allows us to use their precious GPU resources with almost no restrictions, I have decided to make most of the code runnable on Google Colab. So if you don't have GPUs, please feel free to try it out there. Note: Eager mode is known to be slower than graph execution in general, so in this repo I use Eager for debugging and graph mode for training!!! And here is the beauty of eager mode: we can flexibly switch between eager mode and graph mode with minimal modification (@tf.contrib.eager.defun), pls check the link.
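As a minimal sketch of that switch (assuming TF 1.x with eager execution enabled, which is where tf.contrib.eager.defun lives; the function and variable names below are illustrative, not from this repo):

```python
# Minimal sketch: the same Python function run eagerly (for debugging)
# and compiled into a graph with tf.contrib.eager.defun (for training speed).
import tensorflow as tf

tf.enable_eager_execution()

def dense_forward(x, w, b):
    # A simple affine layer with ReLU; runs eagerly by default.
    return tf.nn.relu(tf.matmul(x, w) + b)

# Wrapping the function traces it into a graph on first call,
# which is typically faster when called repeatedly in a training loop.
dense_forward_graph = tf.contrib.eager.defun(dense_forward)

x = tf.random_normal([4, 8])
w = tf.random_normal([8, 2])
b = tf.zeros([2])

print(dense_forward(x, w, b))        # eager: easy to step through
print(dense_forward_graph(x, w, b))  # graph: same result, compiled
```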
Top functions reviewed by kandi - BETA
- Train the model
- Return a random storage index
- Store an episode
- Append data pairs
- Perform pretraining with the given policy
- Sample from the model
- Sample the probability distribution of a given size
- Add an item to the heap
- Move agent
- Train DDPG on policy
- Train the DRQN
- Perform a DQFD
- Train DQN per step
- Train DDPG
- Train a TFNQN_DQN
- Train a DQN
- Train a simulated agent
- Pre-train the agent without prioritization
- Train the DQN algorithm
- Train double DQN
- Train a TRPO model
- Run the model
- Explains an environment
- Advances the agent
- Start training
- Plot results
TF_RL Key Features
TF_RL Examples and Code Snippets
Community Discussions
Trending Discussions on TF_RL
QUESTION
I am trying to write my own DQN algorithm in Python, using TensorFlow and following the paper (Mnih et al., 2015). The train_DQN function defines the training procedure, and DQN_CartPole defines the function approximator (a simple 3-layer neural network). For the loss function, Huber loss or MSE is implemented, followed by gradient clipping (between -1 and 1). Then, I have implemented a soft-update method instead of a hard update of the target network that copies the weights of the main network.
I am trying it on the CartPole environment (OpenAI Gym), but the rewards do not improve as they do with other people's algorithms, such as keras-rl. Any help will be appreciated.
If possible, could you have a look at the source code?
- DQN model: https://github.com/Rowing0914/TF_RL/blob/master/agents/DQN_model.py
- Training Script: https://github.com/Rowing0914/TF_RL/blob/master/agents/DQN_train.py
- Reddit post: https://www.reddit.com/r/reinforcementlearning/comments/ba7o55/question_dqn_algorithm_does_not_work_well_on/?utm_source=share&utm_medium=web2x
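For reference, here is a hedged sketch of the pieces the question describes (Huber loss, per-element gradient clipping to [-1, 1], and a soft target update) in TF 1.x eager style; the names q_net, target_net, and tau are illustrative and not taken from the linked code:

```python
import tensorflow as tf

tf.enable_eager_execution()

def train_step(optimizer, q_net, states, td_targets):
    # One gradient step: Huber loss, then clip each gradient element to [-1, 1].
    with tf.GradientTape() as tape:
        q_values = q_net(states)  # assumed to match td_targets in shape
        loss = tf.losses.huber_loss(td_targets, q_values)
    grads = tape.gradient(loss, q_net.trainable_variables)
    grads = [tf.clip_by_value(g, -1.0, 1.0) for g in grads]
    optimizer.apply_gradients(zip(grads, q_net.trainable_variables))
    return loss

def soft_update(target_net, q_net, tau=0.005):
    # Soft update: target <- tau * main + (1 - tau) * target,
    # instead of periodically hard-copying the main network's weights.
    for t, m in zip(target_net.variables, q_net.variables):
        t.assign(tau * m + (1.0 - tau) * t)
```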
ANSWER
Answered 2019-Apr-06 at 19:33

Briefly looking over, it seems that the dones variable is a binary vector where 1 denotes done, and 0 denotes not-done.

You then use dones here:
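The snippet the answer refers to is not preserved in this excerpt. As a general illustration only (not the repo's actual code), dones typically masks out the bootstrapped term of the TD target, so terminal transitions contribute only the immediate reward:

```python
import numpy as np

def td_targets(rewards, dones, next_q_max, gamma=0.99):
    # dones: 1.0 where the episode ended, 0.0 otherwise.
    # For terminal steps, the (1 - dones) factor zeroes out the bootstrap.
    return rewards + gamma * (1.0 - dones) * next_q_max
```

If dones were inverted or omitted, the target would bootstrap across episode boundaries, which is a common reason CartPole rewards fail to improve.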
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install TF_RL
Install from Github source
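A typical way to do that (a sketch only; this assumes the repo has a standard setup.py at its root, which I have not verified):

```bash
# Clone the repo and install it in editable mode.
git clone https://github.com/Rowing0914/TF_RL.git
cd TF_RL
pip install -e .
```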