deep-q-learning | PyTorch Implementation of Deep Q | Reinforcement Learning library
kandi X-RAY | deep-q-learning Summary
PyTorch Implementation of Deep Q-Learning with Experience Replay in Atari Game Environments, as made public by Google DeepMind
Top functions reviewed by kandi - BETA
- Generate an action using an epsilon-greedy policy
- Convert NumPy array to Variable
- Sample random variates
- Load the replay memory
- Load NumPy arrays
- Make directory
- Calculate the average Q loss
- Log data to TensorBoard
- Save the replay memory instance
- Save NumPy arrays
- Generate phi map of images
- Convert 3D images to a 4D NumPy array
- Save the model to a directory
- Get list items
- Return True if there are enough samples in the stream
- Return a copy of the network
- Add reward to episode
- Write the model to the log file
- Reset the current episode
- Obtain Q values for a given model
- Calculate the Q value for a given model
- Add an experience
- Greedy action function
- Gradient descent function
- Define flags
- Set epsilon
deep-q-learning Key Features
deep-q-learning Examples and Code Snippets
Community Discussions
Trending Discussions on deep-q-learning
QUESTION
ANSWER
Answered 2019-Aug-22 at 09:15
In Reinforcement Learning (RL) there is often a lot of CPU computation required for each sample step (dependent on the environment, of course; some environments can use the GPU too). The RL model has a hard time understanding the rewards and which action caused a specific reward, since a good reward can depend on a much earlier action. Therefore we want simple model architectures (shallow, with fewer weights) when doing RL, otherwise the training time will be far too slow. Hence your system's bottleneck is likely gathering samples rather than training on the data. Also note that not all TensorFlow architectures scale equally well with a GPU. Deep models with a high number of weights, as in most image cases, scale very well (like CNN and MLP networks with MNIST), while a time-dependent RNN has less speedup potential (see this Stack Exchange question). So set your expectations accordingly when using a GPU.
Through my RL experience I have figured out some possible speedups I can share, and I would love to see more suggestions!
- A single sample step can be sped up by creating multiple environments that run in parallel, equal to the number of CPU cores (there are packages for parallel processing in Python you can use for this). This can potentially speed up data sampling in proportion to the number of CPU cores.
- Between sampling steps you have to run model predictions for the next action. Instead of calling model.predict at each step for each environment, you can call a single model.predict for all your parallel states (using a batch_size equal to the number of parallel environments). This speeds up prediction time, as there are more optimization options (see the sketch after this list).
- The switch from updating model weights to prediction is surprisingly slow. Hopefully this will be sped up in the future, but while the switch is as slow as it is today, you can speed up training by holding the model constant and doing lots of sampling and prediction (for example a whole episode, or multiple steps within an episode), then training the model on all the newly gathered data afterwards. In my case this resulted in periodically high GPU utilization.
- Since sampling is most likely the bottleneck, you can keep a historical repository of states, actions, and rewards. Then, at training time, you can randomly sample data from this repository and train on it together with the newly gathered data. This is known as "Experience Replay" in RL (a minimal buffer sketch follows at the end of this answer).
- Maybe the most fun, and the highest potential for improvement, is using more advanced RL learning architectures: for example changing the loss function (check out PPO), using and tuning the "generalized advantage estimation" computed from the rewards, or changing the model, for example by including time dependencies with an RNN or VAC, or by combining them all.
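As a hedged illustration of the first two points, here is a minimal PyTorch sketch of epsilon-greedy action selection batched over several parallel environments; the network, sizes, and epsilon value are assumptions made up for the example, not code from this repository:

import numpy as np
import torch
import torch.nn as nn

n_envs, n_obs, n_actions = 8, 4, 2   # assumed sizes for the example
epsilon = 0.1                        # assumed exploration rate

# A deliberately shallow Q-network, in line with the advice above.
q_net = nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(), nn.Linear(64, n_actions))

# One state per parallel environment, stacked into a single batch.
states = np.random.randn(n_envs, n_obs).astype(np.float32)  # stand-in for real observations

with torch.no_grad():
    q_values = q_net(torch.from_numpy(states))  # one forward pass for all environments
greedy_actions = q_values.argmax(dim=1).numpy()

# Epsilon-greedy: replace a random subset of the greedy actions with random ones.
explore = np.random.rand(n_envs) < epsilon
actions = np.where(explore, np.random.randint(n_actions, size=n_envs), greedy_actions)

The same batching idea applies to a Keras model: one model.predict call on an array of shape (n_envs, n_obs) instead of n_envs separate calls.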
Hopefully this helps you speed up the training time, and maybe get more utilization out of your GPU.
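And a minimal sketch of the experience replay buffer mentioned in the fourth point above (the capacity and batch size are arbitrary example values, not taken from this repository):

import random
from collections import deque

class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=100000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences are dropped automatically

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform random sampling breaks the correlation between consecutive steps.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

Training then alternates between adding new transitions and drawing mixed batches of old and new experience from the buffer.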
QUESTION
I have the following code:
...
ANSWER
Answered 2019-Jul-18 at 08:52
It is because [12, 2] is a list, and the notation that follows it, [0] or [1], is indexing. You can test it: if you try to print print([12, 2][2]), you should get an index out of range error.
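To make the indexing concrete, a quick example (none of this comes from the question's code):

pair = [12, 2]
print(pair[0])  # 12
print(pair[1])  # 2
print(pair[2])  # raises IndexError: list index out of range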
EDIT: To answer your second question: it is hard to say. target_f = self.model.predict(state) is some kind of structure, and I can't find information about this structure in the link you put above. But we can consider a similar structure. Let's say you have:
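The example the answer was building toward is cut off here; as a hedged sketch of the kind of structure it was describing, assume model.predict returns a 2-D NumPy array with one row of Q-values per input state, which is indexed the same way as a nested list:

import numpy as np

# Hypothetical stand-in for the output of self.model.predict(state):
# one row of Q-values per input state, shape (1, n_actions).
target_f = np.array([[0.1, 0.7, 0.2]])

print(target_f[0])     # the first (and only) row: [0.1 0.7 0.2]
print(target_f[0][1])  # the Q-value of action 1: 0.7

target_f[0][1] = 1.5   # overwrite the Q-value for the chosen action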
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install deep-q-learning
You can use deep-q-learning like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.