OmniIsaacGymEnvs-UR10Reacher | UR10 Reacher Reinforcement Learning Sim2Real Environment | Reinforcement Learning library
kandi X-RAY | OmniIsaacGymEnvs-UR10Reacher Summary
This repository adds a UR10Reacher environment based on OmniIsaacGymEnvs (commit d0eaf2e), and includes Sim2Real code to control a real-world UR10 with the policy learned by reinforcement learning in Omniverse Isaac Gym/Sim. We target Isaac Sim 2022.1.1 and test the RL code on Windows 10 and Ubuntu 18.04. The Sim2Real code is tested on Linux and a real UR5 CB3 (since we don't have access to a real UR10). This repo is compatible with OmniIsaacGymEnvs-DofbotReacher.
Community Discussions
Trending Discussions on Reinforcement Learning
QUESTION
I want to compile my DQN Agent, but I get this error:
AttributeError: 'Adam' object has no attribute '_name'
ANSWER
Answered 2022-Apr-16 at 15:05
Your error comes from importing Adam with from keras.optimizer_v1 import Adam. You can solve the problem by using tf.keras.optimizers.Adam from TensorFlow >= v2 instead, as shown below. (The lr argument is deprecated; it's better to use learning_rate instead.)
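A minimal sketch of the suggested fix; the model architecture and hyperparameters here are illustrative placeholders, not taken from the original question:

# Use the TF2 Keras optimizer instead of keras.optimizer_v1.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(24, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(2, activation="linear"),
])

# learning_rate replaces the deprecated lr argument.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="mse",
)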
QUESTION
I'm having a hard time wrapping my head around what vectorized environments are and when they should be used. If you can provide an example of a use case, that would be great.
Documentation of vectorized environments in SB3: https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html
ANSWER
Answered 2022-Mar-25 at 10:37
Vectorized environments are a method for stacking multiple independent environments into a single environment. Instead of executing and training an agent on one environment per step, they allow the agent to be trained on multiple environments per step.
Usually you also want these environments to have different seeds in order to gain more diverse experience, which is very useful for speeding up training.
They are called "vectorized" because at each training step the agent observes multiple states (stacked in a vector), outputs multiple actions (one per environment, also stacked in a vector), and receives multiple rewards; hence the term "vectorized".
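As an illustration, here is a minimal sketch of vectorized environments in Stable-Baselines3; CartPole-v1 and PPO are placeholders chosen for the example, not mentioned in the original question:

# Run CartPole-v1 in 4 parallel (vectorized) copies with SB3.
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# make_vec_env stacks n_envs independent environments behind one interface;
# each step returns a batch of observations, rewards, and dones.
vec_env = make_vec_env("CartPole-v1", n_envs=4, seed=0)

model = PPO("MlpPolicy", vec_env, verbose=1)
model.learn(total_timesteps=10_000)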
QUESTION
I'm learning about policy gradients and I'm having a hard time understanding how the gradient passes through a random operation. From here: "It is not possible to directly backpropagate through random samples. However, there are two main methods for creating surrogate functions that can be backpropagated through."
They have an example of the score function:
ANSWER
Answered 2021-Nov-30 at 05:48
It is indeed true that sampling is not a differentiable operation per se. However, there are two (broad) ways to mitigate this: [1] the REINFORCE way and [2] the reparameterization way. Since your example is related to [1], I will stick to REINFORCE in my answer.
What REINFORCE does is remove the sampling operation from the computation graph entirely; the sampling operation remains outside the graph. So your statement
".. how the gradient passes through a random operation .."
isn't correct: the gradient does not pass through any random operation. Let's look at your example.
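As a hedged illustration of the score-function (REINFORCE) idea described above, here is a small PyTorch sketch; the logits and the reward value are placeholders, not taken from the original example:

import torch
from torch.distributions import Categorical

logits = torch.tensor([0.2, 0.5, 0.3], requires_grad=True)
probs = torch.softmax(logits, dim=-1)

dist = Categorical(probs)
action = dist.sample()       # the sample itself stays outside the graph
reward = torch.tensor(1.0)   # placeholder reward for illustration

# The gradient is taken of -log pi(a|s) * R, not of the sample.
loss = -dist.log_prob(action) * reward
loss.backward()
print(logits.grad)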
QUESTION
What is the connection between the discount factor gamma and the horizon in RL?
What I have learned so far is that the horizon is the agent's time to live. Intuitively, an agent with a finite horizon will choose actions differently than one that has to live forever. In the latter case, the agent will try to maximize all the expected rewards it may get far in the future.
But the idea of the discount factor seems to be the same. Does a value of gamma near zero make the horizon finite?
ANSWER
Answered 2022-Mar-13 at 17:50
Horizon refers to how many steps into the future the agent cares about the reward it can receive, which is a little different from the agent's time to live. In general, you could potentially define any arbitrary horizon you want as the objective. You could define a 10-step horizon, in which the agent makes a decision that will enable it to maximize the reward it will receive in the next 10 time steps. Or we could choose a 100-, 1000-, or n-step horizon!
Usually, the n-step horizon is defined using n = 1 / (1 - gamma). Therefore, a 10-step horizon is achieved with gamma = 0.9, while a 100-step horizon is achieved with gamma = 0.99.
Therefore, any value of gamma less than 1 implies that the horizon is finite.
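A quick numerical check of the n = 1 / (1 - gamma) rule of thumb:

# Effective horizon implied by a few common discount factors.
for gamma in (0.9, 0.99, 0.999):
    horizon = 1.0 / (1.0 - gamma)
    print(f"gamma={gamma}: effective horizon is about {horizon:.0f} steps")
# Prints roughly 10, 100, and 1000 steps, respectively.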
QUESTION
I am trying to set up a Deep Q-Learning agent with a custom environment in OpenAI Gym. I have 4 continuous state variables with individual limits and 3 integer action variables with individual limits.
Here is the code:
ANSWER
Answered 2021-Dec-23 at 11:19
As we talked about in the comments, it seems that the Keras-rl library is no longer supported (the last update in the repository was in 2019), so it's possible that everything is inside Keras now. I took a look at the Keras documentation: there are no high-level functions to build a reinforcement learning model, but it is possible to use lower-level functions for this.
- Here is an example of how to use Deep Q-Learning with Keras: link
Another solution may be to downgrade to TensorFlow 1.0, as the compatibility problem seems to be caused by some changes in version 2.0. I didn't test it, but Keras-rl + TensorFlow 1.0 may work.
There is also a branch of Keras-rl that supports TensorFlow 2.0; the repository is archived, but there is a chance it will work for you.
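The code from the question is not reproduced here. As a hedged sketch of the setup the question describes (4 continuous state variables and 3 integer action variables, each with individual limits), the observation and action spaces of such a custom environment might be declared as follows; all limits below are placeholders:

import numpy as np
import gym
from gym import spaces

class CustomEnv(gym.Env):
    def __init__(self):
        super().__init__()
        # 4 continuous state variables with individual limits (placeholder values).
        self.observation_space = spaces.Box(
            low=np.array([-1.0, -2.0, 0.0, -3.0], dtype=np.float32),
            high=np.array([1.0, 2.0, 5.0, 3.0], dtype=np.float32),
        )
        # 3 integer action variables with individual limits (placeholder values).
        self.action_space = spaces.MultiDiscrete([3, 5, 2])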
QUESTION
Environment:
- Python: 3.9
- OS: Windows 10
When I try to create the ten-armed bandits environment using the following code, an error is thrown and I'm not sure of the reason.
ANSWER
Answered 2022-Feb-08 at 08:01
It could be a problem with your Python version: the k-armed-bandits library was made 4 years ago, when Python 3.9 didn't exist. Besides this, the configuration files in the repo indicate that the Python version is 2.7 (not 3.9).
If you create an environment with Python 2.7 and follow the setup instructions, it works correctly on Windows:
QUESTION
I have two different problems occurring at the same time.
I am having a dimensionality problem with MaxPooling2D and the same dimensionality problem with DQNAgent.
The thing is, I can fix them separately but not at the same time.
First Problem
I am trying to build a CNN network with several layers. After I build my model, when I try to run it, it gives me an error.
ANSWER
Answered 2022-Feb-01 at 07:31
The issue is with input_shape; it should be input_shape=input_shape[1:], i.e., the leading batch dimension must be dropped.
Working sample code:
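The original sample code is not included here; below is a hedged sketch of the fix, assuming an image-like observation whose first entry is the batch dimension (all shapes and layer sizes are placeholders):

import tensorflow as tf

observation_shape = (1, 84, 84, 1)   # illustrative (batch, height, width, channels)

model = tf.keras.Sequential([
    # Keras layers expect input_shape WITHOUT the batch dimension,
    # so slice it off: (84, 84, 1) instead of (1, 84, 84, 1).
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=observation_shape[1:]),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4, activation="linear"),
])
model.summary()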
QUESTION
I have this custom callback to log the reward in my custom vectorized environment, but the reward always appears in the console as [0] and is not logged in TensorBoard at all.
ANSWER
Answered 2021-Dec-25 at 01:10
You need to add [0] as an index: where you wrote self.logger.record('reward', self.training_env.get_attr('total_reward')), you just need to index the result with self.logger.record('reward', self.training_env.get_attr('total_reward')[0]).
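A hedged sketch of the corrected callback, assuming a total_reward attribute on the underlying custom environment as described in the question:

from stable_baselines3.common.callbacks import BaseCallback

class RewardLoggingCallback(BaseCallback):
    def _on_step(self) -> bool:
        # get_attr returns one value per sub-environment (a list),
        # so [0] picks the value from the first sub-environment.
        total_rewards = self.training_env.get_attr("total_reward")
        self.logger.record("reward", total_rewards[0])
        return True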
QUESTION
I followed a PyTorch tutorial to learn reinforcement learning (Train a Mario-playing RL Agent), but I am confused by the following code:
ANSWER
Answered 2021-Dec-23 at 11:07
Essentially, what happens here is that the output of the net is being sliced to get the desired part of the Q-table.
The (somewhat confusing) index [np.arange(0, self.batch_size), action] indexes each axis. So, for the axis with index 1, we pick the item indicated by action; for index 0, we pick all items between 0 and self.batch_size.
If self.batch_size is the same as the length of dimension 0 of this array, then this slice can be simplified to [:, action], which is probably more familiar to most users.
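A small self-contained demo of this indexing pattern (the values are placeholders):

import numpy as np

batch_size = 3
q_values = np.array([[1.0, 2.0],
                     [3.0, 4.0],
                     [5.0, 6.0]])   # shape (batch_size, n_actions)
action = np.array([0, 1, 0])        # one action index per batch element

# For each row i, pick the Q-value of the action taken in that row.
selected = q_values[np.arange(0, batch_size), action]
print(selected)                     # [1. 4. 5.]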
QUESTION
I'm trying to implement a DQN. As a warm-up I want to solve CartPole-v0 with an MLP consisting of two hidden layers along with input and output layers. The input is a 4-element array [cart position, cart velocity, pole angle, pole angular velocity] and the output is an action value for each action (left or right). I am not exactly implementing the DQN from the "Playing Atari with DRL" paper (no frame stacking for inputs, etc.). I also made a few non-standard choices, like putting done and the target network's prediction of the action value in the experience replay, but those choices shouldn't affect learning.
In any case, I'm having a lot of trouble getting the thing to work. No matter how long I train the agent, it keeps predicting a higher value for one action over another, for example Q(s, Right) > Q(s, Left) for all states s. Below are my learning code, my network definition, and some results I get from training.
ANSWER
Answered 2021-Dec-19 at 16:09
There was nothing wrong with the network definition. It turns out the learning rate was too high, and reducing it to 0.00025 (as in the original Nature paper introducing the DQN) led to an agent that can solve CartPole-v0.
That said, the learning algorithm was incorrect. In particular, I was using the wrong target action-value predictions. Note that the algorithm laid out above does not use the most recent version of the target network to make predictions. This leads to poor results as training progresses because the agent is learning based on stale target data. The way to fix this is to just put (s, a, r, s', done) into the replay memory and then make target predictions using the most up-to-date version of the target network when sampling a mini-batch. See the code below for an updated learning loop.
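The updated learning loop itself is not reproduced here; below is a hedged PyTorch-style sketch of the corrected target computation it describes, with targets produced from the current target network at sampling time (function and variable names are illustrative):

import torch

def compute_targets(target_net, rewards, next_states, dones, gamma=0.99):
    # rewards, dones: shape (batch,); next_states: shape (batch, state_dim)
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
    # No bootstrapping from terminal states.
    return rewards + gamma * next_q * (1.0 - dones.float())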
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install OmniIsaacGymEnvs-UR10Reacher
Install Omniverse Isaac Sim 2022.1.1 (you must set up Cache and Nucleus)
Your computer & GPU should be able to run the Cartpole example in OmniIsaacGymEnvs
(Optional) Set up a UR3/UR5/UR10 in the real world
Clone this repository:
Linux
cd ~
git clone https://github.com/j3soon/OmniIsaacGymEnvs-UR10Reacher.git
Windows
cd %USERPROFILE%
git clone https://github.com/j3soon/OmniIsaacGymEnvs-UR10Reacher.git
Generate instanceable UR10 assets for training: Launch the Script Editor in Isaac Sim. Copy the content in omniisaacgymenvs/utils/usd_utils/create_instanceable_ur10.py and execute it inside the Script Editor window. Wait until you see the text Done!.
(Optional) Install ROS Melodic for Ubuntu and Set up a catkin workspace for UR10. Please change all catkin_ws in the commands to ur_ws, and make sure you can control the robot with rqt-joint-trajectory-controller. ROS support is not tested on Windows.
Download and install Anaconda.
# For 64-bit Linux (x86_64/x64/amd64/intel64)
wget https://repo.anaconda.com/archive/Anaconda3-2022.10-Linux-x86_64.sh
bash Anaconda3-2022.10-Linux-x86_64.sh
For Windows users, make sure to use Anaconda Prompt instead of Command Prompt or PowerShell for the following commands.
Patch Isaac Sim 2022.1.1
Linux
export ISAAC_SIM="$HOME/.local/share/ov/pkg/isaac_sim-2022.1.1"
cp $ISAAC_SIM/setup_python_env.sh $ISAAC_SIM/setup_python_env.sh.bak
cp ~/OmniIsaacGymEnvs-UR10Reacher/isaac_sim-2022.1.1-patch/setup_python_env.sh $ISAAC_SIM/setup_python_env.sh
Windows
set ISAAC_SIM="%LOCALAPPDATA%\ov\pkg\isaac_sim-2022.1.1"
copy %USERPROFILE%\OmniIsaacGymEnvs-UR10Reacher\isaac_sim-2022.1.1-patch\windows\setup_conda_env.bat %ISAAC_SIM%\setup_conda_env.bat
Set up the conda environment for Isaac Sim
Linux
# conda remove --name isaac-sim --all
export ISAAC_SIM="$HOME/.local/share/ov/pkg/isaac_sim-2022.1.1"
cd $ISAAC_SIM
conda env create -f environment.yml
conda activate isaac-sim
cd ~/OmniIsaacGymEnvs-UR10Reacher
pip install -e .
# Below is optional
pip install pyyaml rospkg
Windows
# conda remove --name isaac-sim --all
set ISAAC_SIM="%LOCALAPPDATA%\ov\pkg\isaac_sim-2022.1.1"
cd %ISAAC_SIM%
conda env create -f environment.yml
conda activate isaac-sim
:: Fix incorrect importlib-metadata version (isaac-sim 2022.1.1)
pip install importlib-metadata==4.11.4
cd %USERPROFILE%\OmniIsaacGymEnvs-UR10Reacher
pip install -e .
:: Fix incorrect torch version (isaac-sim 2022.1.1)
conda install -y pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 -c pytorch
Activate the conda & ROS environment
Linux
export ISAAC_SIM="$HOME/.local/share/ov/pkg/isaac_sim-2022.1.1"
cd $ISAAC_SIM
conda activate isaac-sim
source setup_conda_env.sh
# Below are optional
cd ~/ur_ws
source devel/setup.bash # or setup.zsh if you're using zsh
Windows
set ISAAC_SIM="%LOCALAPPDATA%\ov\pkg\isaac_sim-2022.1.1"
cd %ISAAC_SIM%
conda activate isaac-sim
call setup_conda_env.bat
Follow the Isaac Sim documentation to install the latest Isaac Sim release. Once installed, this repository can be used as a python module, omniisaacgymenvs, with the python executable provided in Isaac Sim.