tf-agent | TensorFlow reinforcement learning agents for OpenAI Gym | Reinforcement Learning library
kandi X-RAY | tf-agent Summary
TensorFlow reinforcement learning agents for OpenAI Gym environments
Top functions reviewed by kandi - BETA
- Run a rollout
- Resample a frame
- Discrete discount function
- Estimate policy logits
tf-agent Key Features
tf-agent Examples and Code Snippets
Community Discussions
Trending Discussions on tf-agent
QUESTION
I am trying to create a batched environment version of an SAC agent example from the TensorFlow Agents library; the original code can be found here. I am also using a custom environment.
I am pursuing a batched environment setup in order to better leverage GPU resources and speed up training. My understanding is that by passing batches of trajectories to the GPU, there will be less overhead incurred when passing data from the host (CPU) to the device (GPU).
My custom environment is called SacEnv, and I attempt to create a batched environment like so:
ANSWER
Answered 2022-Feb-19 at 18:11
It turns out I neglected to pass batch_size when initializing the AverageReturnMetric and AverageEpisodeLengthMetric instances.
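A minimal sketch of the resulting setup, assuming the custom SacEnv py_environment from the question and a recent tf-agents release; the number of parallel environments is an illustrative choice:

```python
# Sketch only: SacEnv is the question's custom py_environment; 4 is illustrative.
from tf_agents.environments import batched_py_environment, tf_py_environment
from tf_agents.metrics import tf_metrics

NUM_PARALLEL_ENVS = 4

# Wrap several copies of the custom environment into one batched environment.
batched_env = batched_py_environment.BatchedPyEnvironment(
    [SacEnv() for _ in range(NUM_PARALLEL_ENVS)])
tf_env = tf_py_environment.TFPyEnvironment(batched_env)

# The metrics must be told the batch size; otherwise they assume batch_size=1.
train_metrics = [
    tf_metrics.AverageReturnMetric(batch_size=tf_env.batch_size),
    tf_metrics.AverageEpisodeLengthMetric(batch_size=tf_env.batch_size),
]
```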
QUESTION
I have a bunch of Java code that constitutes an environment and an agent. I want to use one of the Python reinforcement learning libraries (stable-baselines, tf-agents, rllib, etc.) to train a policy for the Java agent/environment. And then deploy the policy on the Java side for production. Is there standard practice for incorporating other languages into Python RL libraries? I was thinking of one of the following solutions:
- Wrap Java env/agent code into REST API, and implement custom environment in Python that calls that API to step through the environment.
- Use Py4j to invoke Java from Python and implement custom environment.
Which one would be better? Are there any other ways?
Edit: I ended up going with the former - deploying a web server that encapsulates the environments. It works quite well for me. Leaving the question open in case there is a better practice for handling this kind of situation!
ANSWER
Answered 2021-Sep-20 at 09:13
The first approach is fine. RLlib implements it the same way for its PolicyServerInput, which is used for external environments: https://github.com/ray-project/ray/blob/82465f9342cf05d86880e7542ffa37676c2b7c4f/rllib/env/policy_server_input.py
So take a look at their implementation. It uses Python data serialization, so I guess a custom implementation would be best for connecting to Java.
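As a rough sketch of the first option, here is what a tf-agents PyEnvironment that steps a remote (e.g. Java) environment over HTTP could look like; the URL, JSON schema, and specs are illustrative assumptions, not part of any existing service:

```python
import numpy as np
import requests
from tf_agents.environments import py_environment
from tf_agents.specs import array_spec
from tf_agents.trajectories import time_step as ts


class RemoteEnv(py_environment.PyEnvironment):
    """Steps an environment hosted behind a REST API (hypothetical endpoints)."""

    def __init__(self, base_url="http://localhost:8080"):
        super().__init__()
        self._base_url = base_url
        # Illustrative specs; they must match what the remote environment returns.
        self._action_spec = array_spec.BoundedArraySpec(
            shape=(), dtype=np.int32, minimum=0, maximum=3, name="action")
        self._observation_spec = array_spec.ArraySpec(
            shape=(4,), dtype=np.float32, name="observation")

    def action_spec(self):
        return self._action_spec

    def observation_spec(self):
        return self._observation_spec

    def _reset(self):
        obs = requests.post(f"{self._base_url}/reset").json()["observation"]
        return ts.restart(np.asarray(obs, dtype=np.float32))

    def _step(self, action):
        data = requests.post(
            f"{self._base_url}/step", json={"action": int(action)}).json()
        obs = np.asarray(data["observation"], dtype=np.float32)
        if data["done"]:
            return ts.termination(obs, reward=data["reward"])
        return ts.transition(obs, reward=data["reward"])
```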
QUESTION
I have a policy that I read from disk using the function SavedModelPyTFEagerPolicy. For troubleshooting the environment definitions, I would like to examine the predicted value of different states.
I have had success using these instructions to extract the actions from the policy for test cases. Is there a function that will allow me to extract the predicted values associated with those actions?
ANSWER
Answered 2021-Aug-23 at 10:41
Looking at the TensorFlow DqnAgent documentation, you hand a Q-network to the agent at creation time. It gets saved as an instance variable named _q_network and can be accessed via agent._q_network. To quote the documentation:
The network will be called with call(observation, step_type) and should emit logits over the action space.
Those logits are your respective state-action values.
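For illustration, a short hedged snippet assuming an existing DqnAgent instance named agent and a batched time_step collected from its environment:

```python
import tensorflow as tf

# q_values has shape [batch_size, num_actions]: one logit per action.
q_values, _ = agent._q_network(time_step.observation, time_step.step_type)

# Predicted value of the greedy action in each state (illustrative usage).
state_values = tf.reduce_max(q_values, axis=-1)
```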
QUESTION
I am currently working on using Python.NET to build C# environments that interact with TensorFlow Agents, and I am receiving a TensorFlow error when it attempts to load the CUDA DLLs.
When I run pure Python examples, TensorFlow loads the CUDA DLLs without issue:
ANSWER
Answered 2021-Apr-20 at 00:23
I solved this issue. It was due to bad Python.NET wiki documentation showing how to use Python.NET in a virtual environment.
The fix, for others facing this or very similar issues, is to not use the code in the Wiki:
QUESTION
I apologize in advance for the question in the title not being very clear. I'm trying to train a reinforcement learning policy using tf-agents in which there exists some unobservable stochastic variable that affects the state.
For example, consider the standard CartPole problem, but with added wind whose velocity changes over time. I don't want to train an agent that relies on having observed the wind velocity at each step; I instead want the wind to affect the position and angular velocity of the pole, and the agent to learn to adapt just as it would in the wind-free environment. In this example, however, we would need the wind velocity at the current time to be correlated with the wind velocity at the previous time; e.g. we wouldn't want the wind velocity to change from 10 m/s at time t to -10 m/s at time t+1.
The problem I'm trying to solve is how to track the state of the exogenous variable without making it part of the observation spec that gets fed into the neural network when training the agent. Any guidance would be appreciated.
ANSWER
Answered 2021-Jan-13 at 08:17
Yes, that is no problem at all. Your environment object (a subclass of PyEnvironment or TFEnvironment) can do whatever you want within it. The observation_spec requirement is only related to the TimeStep that you output in the step and reset methods (more precisely, in your implementation of the _step and _reset abstract methods).
Your environment, however, is completely free to have any additional attributes that you might want (like parameters to control wind generation) and any number of additional methods you like (like methods to generate the wind at this timestep according to self._wind_hyper_params). A quick schematic of what your code could look like is below:
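The answer's original schematic is not reproduced here; a rough reconstruction along those lines, with placeholder dynamics, specs, and _wind_hyper_params contents that are purely illustrative, might look like this:

```python
import numpy as np
from tf_agents.environments import py_environment
from tf_agents.specs import array_spec
from tf_agents.trajectories import time_step as ts


class WindyCartPoleEnv(py_environment.PyEnvironment):
    """CartPole-like environment with a hidden, autocorrelated wind state."""

    def __init__(self, wind_std=0.5, wind_decay=0.9):
        super().__init__()
        self._action_spec = array_spec.BoundedArraySpec(
            shape=(), dtype=np.int32, minimum=0, maximum=1, name="action")
        # Only the usual CartPole observations; the wind is NOT exposed here.
        self._observation_spec = array_spec.ArraySpec(
            shape=(4,), dtype=np.float32, name="observation")
        self._wind_hyper_params = {"std": wind_std, "decay": wind_decay}
        self._wind = 0.0
        self._state = np.zeros(4, dtype=np.float32)

    def action_spec(self):
        return self._action_spec

    def observation_spec(self):
        return self._observation_spec

    def _update_wind(self):
        # AR(1) wind: correlated over time, so it cannot jump from 10 to -10.
        hp = self._wind_hyper_params
        self._wind = hp["decay"] * self._wind + np.random.normal(0.0, hp["std"])

    def _reset(self):
        self._wind = 0.0
        self._state = np.zeros(4, dtype=np.float32)
        return ts.restart(self._state)

    def _step(self, action):
        self._update_wind()
        # Placeholder dynamics: the wind enters as an extra force on the cart.
        force = 2.0 * float(action) - 1.0 + self._wind
        self._state = (self._state + 0.01 * force).astype(np.float32)
        if abs(self._state[2]) > 0.2:  # pole angle out of bounds
            return ts.termination(self._state, reward=0.0)
        return ts.transition(self._state, reward=1.0)
```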
QUESTION
I'm using TF-Agents library for reinforcement learning, and I would like to take into account that, for a given state, some actions are invalid.
How can this be implemented?
Should I define an "observation_and_action_constraint_splitter" function when creating the DqnAgent?
If yes: do you know any tutorial on this?
ANSWER
Answered 2020-Dec-13 at 12:39
Yes, you need to define the function, pass it to the agent, and also appropriately change the environment output so that the function can work with it. I am not aware of any tutorials on this, but you can look at this repo I have been working on.
Note that it is very messy, a lot of the files in there are not actually being used, and the docstrings are terrible and often wrong (I forked this and didn't bother to sort everything out). However, it is definitely working correctly. The parts that are relevant to your question are:
- rl_env.py, in HanabiEnv.__init__, where the _observation_spec is defined as a dictionary of ArraySpecs (here). You can ignore game_obs, hand_obs and knowledge_obs, which are used to run the environment verbosely; they are not fed to the agent.
- rl_env.py, in HanabiEnv._reset at line 110, gives an idea of how the timestep observations are constructed and returned from the environment. legal_moves are passed through an np.logical_not since my specific environment marks legal moves with 0 and illegal ones with -inf, whilst TF-Agents expects a 1/True for a legal move. My vector, when cast to bool, would therefore result in the exact opposite of what it should be for TF-Agents.
- These observations are then fed to the observation_and_action_constraint_splitter in utility.py (here), where a tuple containing the observations and the action constraints is returned. Note that game_obs, hand_obs and knowledge_obs are implicitly thrown away (and not fed to the agent, as previously mentioned).
- Finally, this observation_and_action_constraint_splitter is fed to the agent in utility.py, in the create_agent function at line 198, for example.
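Schematically (and only as a sketch, not the repo's exact code), the splitter and agent wiring look something like this, assuming a TFPyEnvironment tf_env whose observation is a dict with an "observations" tensor and a 0/1 "legal_moves" mask (illustrative key names):

```python
import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.networks import q_network


def observation_and_action_constraint_splitter(observation):
    # Returns (network_input, action_mask); the agent only considers legal moves.
    return observation["observations"], observation["legal_moves"]


# The Q-network only sees the "observations" part of the dict.
q_net = q_network.QNetwork(
    input_tensor_spec=tf_env.observation_spec()["observations"],
    action_spec=tf_env.action_spec())

agent = dqn_agent.DqnAgent(
    tf_env.time_step_spec(),
    tf_env.action_spec(),
    q_network=q_net,
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    observation_and_action_constraint_splitter=observation_and_action_constraint_splitter)
```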
QUESTION
Suppose that two integer arrays min and max are given and they have equal shape. How can I generate all NumPy arrays ar such that min[indices] <= ar[indices] <= max[indices] for all indices in np.ndindex(shape)? I have looked at the NumPy array creation routines, but none of them seem to do what I want. I also considered starting with the min array and looping over its indices, adding 1 until the corresponding entry in max was reached, but I want to know if NumPy provides methods to do this more cleanly. As an example, if
ANSWER
Answered 2020-Oct-02 at 18:48
This will work:
Also, since range and itertools.product both return generators, it's memory efficient (O(1) space).
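The answer's code is not reproduced above; a hedged reconstruction of the approach it describes (per-entry ranges combined with itertools.product) would look roughly like this:

```python
import itertools
import numpy as np


def arrays_between(lo, hi):
    """Yield every integer array ar with lo <= ar <= hi elementwise."""
    lo, hi = np.asarray(lo), np.asarray(hi)
    # One range per entry of the flattened arrays, then their Cartesian product.
    ranges = [range(a, b + 1) for a, b in zip(lo.ravel(), hi.ravel())]
    for values in itertools.product(*ranges):
        yield np.array(values).reshape(lo.shape)


# Example: all 2x1 arrays between [[0], [1]] and [[1], [2]].
for ar in arrays_between([[0], [1]], [[1], [2]]):
    print(ar.tolist())
```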
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install tf-agent
You can use tf-agent like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.
Support