deep-q-learning | PyTorch Implementation of Deep Q | Reinforcement Learning library
kandi X-RAY | deep-q-learning Summary
PyTorch Implementation of Deep Q-Learning with Experience Replay in Atari Game Environments, as made public by Google DeepMind
Top functions reviewed by kandi - BETA
- Generate an action using an epsilon-greedy policy
- Convert NumPy array to Variable
- Sample random variates
- Load the replay memory
- Load NumPy arrays
- Make directory
- Calculate the average Q loss
- Log data to TensorBoard
- Save the replay memory instance
- Save NumPy arrays
- Generate phi map of images
- Convert 3D images to a 4D NumPy array
- Save the model to a directory
- Get list items
- Return True if there are enough samples in the stream
- Return a copy of the network
- Add reward to episode
- Write the model to the log file
- Reset the current episode
- Obtain Q values for a given model
- Calculate the Q value for a given model
- Add an experience
- Greedy action function
- Gradient descent function
- Define flags
- Set epsilon
deep-q-learning Key Features
deep-q-learning Examples and Code Snippets
Community Discussions
Trending Discussions on deep-q-learning
QUESTION
ANSWER
Answered 2019-Aug-22 at 09:15
In Reinforcement Learning (RL) there is often a lot of CPU computation required for each sample step (dependent on the environment, of course; some environments can use the GPU too). The RL model has a hard time understanding the rewards and which action caused a specific reward, since a good reward can depend on a much earlier action. Therefore we want simple model architectures (shallow, with fewer weights) when doing RL, otherwise the training time will be far too slow. Hence your system's bottleneck is likely gathering samples rather than training on the data. Also note that not all TensorFlow architectures scale equally well with a GPU. Deep models with a high number of weights, as in most image cases, scale very well (like CNN and MLP networks with MNIST), while a time-dependent RNN has less speedup potential (see this Stack Exchange question). So set your expectations accordingly when using a GPU.
Through my RL experience I have figured out some possible speedups I can share, and I would love to see more suggestions!
- A single sample step can be sped up by creating multiple environments that run in parallel, equal to the number of CPU cores (there are packages for parallel processing in Python you can use for this). This can potentially speed up data sampling in proportion to the number of CPU cores.
- Between sampling steps you have to run model predictions for the next action. Instead of calling model.predict at each step for each environment, you can call a single model.predict for all your parallel states (using a batch_size equal to the number of parallel environments). This speeds up prediction time, as there are more optimization options (see the sketch after this list).
- The switch from updating model weights to prediction is surprisingly slow. Hopefully this will be sped up in the future, but while the switch is as slow as it is today, you can speed up training by holding the model constant and doing lots of sampling and prediction (for example a whole episode, or multiple steps within an episode), then training the model on all the newly gathered data afterwards. In my case this resulted in periodically high GPU utilization.
- Since sampling is most likely the bottleneck, you can keep a historical repository of states, actions, and rewards. Then, at training time, you can randomly sample data from this repository and train on it together with the newly gathered data. This is known as "Experience Replay" in RL (a minimal buffer sketch follows at the end of this answer).
- Maybe the most fun, and the highest potential for improvement, is using more advanced RL learning architectures: for example changing the loss function (check out PPO), using and tuning the "generalized advantage estimation" computed from the rewards, or changing the model, for example by including time dependencies with an RNN or VAC, or by combining them all.
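As a hedged illustration of the first two points, here is a minimal PyTorch sketch of epsilon-greedy action selection batched over several parallel environments; the network, sizes, and epsilon value are assumptions made up for the example, not code from this repository:

import numpy as np
import torch
import torch.nn as nn

n_envs, n_obs, n_actions = 8, 4, 2   # assumed sizes for the example
epsilon = 0.1                        # assumed exploration rate

# A deliberately shallow Q-network, in line with the advice above.
q_net = nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(), nn.Linear(64, n_actions))

# One state per parallel environment, stacked into a single batch.
states = np.random.randn(n_envs, n_obs).astype(np.float32)  # stand-in for real observations

with torch.no_grad():
    q_values = q_net(torch.from_numpy(states))  # one forward pass for all environments
greedy_actions = q_values.argmax(dim=1).numpy()

# Epsilon-greedy: replace a random subset of the greedy actions with random ones.
explore = np.random.rand(n_envs) < epsilon
actions = np.where(explore, np.random.randint(n_actions, size=n_envs), greedy_actions)

The same batching idea applies to a Keras model: one model.predict call on an array of shape (n_envs, n_obs) instead of n_envs separate calls.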
Hopefully this helps you speed up the training time, and maybe get more utilization out of your GPU.
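And a minimal sketch of the experience replay buffer mentioned in the fourth point above (the capacity and batch size are arbitrary example values, not taken from this repository):

import random
from collections import deque

class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=100000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences are dropped automatically

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform random sampling breaks the correlation between consecutive steps.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

Training then alternates between adding new transitions and drawing mixed batches of old and new experience from the buffer.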
QUESTION
I have the following code:
...
ANSWER
Answered 2019-Jul-18 at 08:52
It is because [12, 2] is a list, and the notation that follows it, [0] or [1], is indexing. You can test it: if you try to print print([12, 2][2]), you should get an index out of range error.
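To make the indexing concrete, a quick example (none of this comes from the question's code):

pair = [12, 2]
print(pair[0])  # 12
print(pair[1])  # 2
print(pair[2])  # raises IndexError: list index out of range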
EDIT: To answer your second question: it is hard to say. target_f = self.model.predict(state) is some kind of structure, and I can't find information about this structure in the link you put above. But we can consider a similar structure. Let's say you have:
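The example the answer was building toward is cut off here; as a hedged sketch of the kind of structure it was describing, assume model.predict returns a 2-D NumPy array with one row of Q-values per input state, which is indexed the same way as a nested list:

import numpy as np

# Hypothetical stand-in for the output of self.model.predict(state):
# one row of Q-values per input state, shape (1, n_actions).
target_f = np.array([[0.1, 0.7, 0.2]])

print(target_f[0])     # the first (and only) row: [0.1 0.7 0.2]
print(target_f[0][1])  # the Q-value of action 1: 0.7

target_f[0][1] = 1.5   # overwrite the Q-value for the chosen action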
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install deep-q-learning
You can use deep-q-learning like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.