A3C-Continuous | Tensorflow implementation | Reinforcement Learning library
kandi X-RAY | A3C-Continuous Summary
Tensorflow implementation of the asynchronous advantage actor-critic (A3C) reinforcement learning algorithm (paper) for continuous action space. Code is mostly based on Morvan Zhou (github).
Top functions reviewed by kandi - BETA
- Run the action loop
- Choose a single action
- Pulls all parameters from the global variables
- Update global variables
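The action-selection step listed above can be sketched as follows. This is a minimal NumPy illustration, not the repository's actual code: the real implementation builds a TensorFlow Normal distribution from the network's predicted mean and std, but the sample-then-clip logic is the same.

```python
import numpy as np

A_BOUND = [-240, 240]  # action bounds for the mouse game from the discussion below

def choose_action(mean, std, rng=np.random.default_rng()):
    """Sample a continuous action from the policy's Gaussian and clip it
    to the allowed range (illustrative sketch of A3C continuous control)."""
    action = rng.normal(mean, std)
    return float(np.clip(action, A_BOUND[0], A_BOUND[1]))

a = choose_action(mean=0.0, std=30.0)
assert A_BOUND[0] <= a <= A_BOUND[1]
```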
A3C-Continuous Key Features
A3C-Continuous Examples and Code Snippets
Community Discussions
Trending Discussions on A3C-Continuous
QUESTION
I want to implement reinforcement learning for a game that is played with the mouse. The game only cares about the x-axis of the mouse.
My first try was to make it discrete. The game had 3 actions: two to move the mouse 30 pixels to the left or right, and one to stand still. That worked, but now I want to make it continuous.
What I have done is make the neural network output a mean and std, exactly like this code: https://github.com/stefanbo92/A3C-Continuous/blob/master/a3c.py (I even used this code on a second try). The width of the game is 480, so A_BOUND is [-240, 240]. To keep the mouse position non-negative, I add 240 to the predicted action and set the mouse position to the result.
For example: if the action is -240, the mouse x position becomes 240 + (-240) = 0. The problem is that my neural network outputs only the extremes, 240 or -240, consistently within seconds of the start.
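The mapping described in the question can be written as a small helper (the name `action_to_mouse_x` is ours, for illustration):

```python
GAME_WIDTH = 480
A_BOUND = [-240, 240]  # action bounds from the question

def action_to_mouse_x(action):
    """Shift an action in [-240, 240] to a valid x pixel in [0, 480]."""
    return action + GAME_WIDTH // 2

print(action_to_mouse_x(-240))  # 0
print(action_to_mouse_x(240))   # 480
```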
...ANSWER
Answered 2018-Sep-14 at 17:05
The reason for your problem is that the output of your neural network is being squashed by an activation function. This is a problem because very few input values produce an output that is not the max or min value.
[Plot of the hyperbolic tangent (tanh) activation function.]
As the plot shows, the output is only non-max/min when the input is roughly between -3 and 3; any value outside that range is squashed to the max or min.
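The saturation behaviour is easy to verify numerically:

```python
import math

# tanh saturates quickly: inputs outside roughly [-3, 3] map to nearly ±1
for x in [-5.0, -3.0, 0.0, 3.0, 5.0]:
    print(f"tanh({x:+.1f}) = {math.tanh(x):+.4f}")
```

With bounds of ±240, even a modestly large pre-activation drives the scaled output straight to one of the extremes.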
To overcome this, you must initialize your neural network with very small weights. You can initialize the weights with random uniform values between -0.003 and 0.003; those are the values I use. This way your network initially outputs values close to 0, and as the weights are updated the learning is more stable.
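A NumPy sketch of the effect (the layer sizes here are made up for illustration; the repository itself would do this with a TensorFlow weight initializer):

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical output layer: 200 hidden units -> 1 action;
# weights drawn uniformly from [-0.003, 0.003] as the answer suggests
W_out = rng.uniform(-0.003, 0.003, size=(200, 1))
b_out = rng.uniform(-0.003, 0.003, size=(1,))

h = rng.standard_normal(200)          # some hidden activation
pre_activation = h @ W_out + b_out    # tiny magnitude, so tanh stays near 0
print(np.tanh(pre_activation))        # output close to 0, not saturated
```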
To further correct for this, add a small penalty for large changes in state.
For example: penalty = (state * 0.01) ^ 2, where state ∈ [-240, 240].
This way your neural network learns that there is a higher loss associated with large changes, so it will make them sparingly, and only when necessary.
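The suggested penalty term, written out as code (the function name is ours; in practice this term would be added to the policy loss):

```python
def action_penalty(state_x, coeff=0.01):
    """Quadratic penalty that discourages large mouse displacements,
    per the answer's penalty = (state * 0.01) ** 2 with state in [-240, 240]."""
    return (state_x * coeff) ** 2

# the penalty is zero at rest and largest at the extremes
print(action_penalty(0))
print(action_penalty(240))
```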
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install A3C-Continuous
You can use A3C-Continuous like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.
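The setup described above might look like this in practice (the dependency names are assumptions; check the repository for its actual requirements):

```shell
# create an isolated virtual environment, as recommended above
python3 -m venv venv
source venv/bin/activate

# keep the packaging tools up to date
pip install --upgrade pip setuptools wheel

# install likely dependencies (assumed, not confirmed by the repo)
pip install tensorflow gym

# fetch the code itself
git clone https://github.com/stefanbo92/A3C-Continuous.git
```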