Open X-Embodiment: an ultra-large-scale open-source real-robot dataset

Recently, DeepMind, Google's cutting-edge artificial intelligence lab, gathered data from 22 different robot types, assembled the Open X-Embodiment dataset, and made it open source. The dataset marks a major leap forward in how the RT-2 robot model they developed is built and trained.

Some analysts report that RT-2-X, trained on this dataset, roughly doubles performance on real-world robot skills and has mastered many new skills by learning from the new data. NVIDIA senior artificial intelligence scientist Jim Fan even stated publicly that this dataset may be the ImageNet moment for robotics.

Google has opened the Open X-Embodiment repository (robotics-transformer-X.github.io) to host the Open X-Embodiment dataset. It is an open-source repository that includes large-scale data for X-embodiment robot learning research along with checkpoints of pre-trained models.

To facilitate research on embodied robotics and make data preparation more efficient, OpenDataLab (opendatalab.com) has organized and published the Open X-Embodiment dataset released by DeepMind; everyone is welcome to download and explore it.

Dataset overview

Open X-Embodiment sub-dataset information list:

https://docs.google.com/spreadsheets/d/1rPBD77tk60AEIGZrGSODwyyzs5FgCU9Uz3h-3_t2A9g/edit?pli=1#gid=0

Key facts:

● 21 scientific research institutions

● 22 robots

● Fusion of 60 existing data sets

● 527 skills

● 160,266 tasks

● 1,402,930 episodes of data (about 3,600 GB in total)

Data processing:

All source datasets are uniformly converted to the RLDS format.
To handle the differing formats and contents of the source data, the following processing was applied:

1. For datasets with multiple camera views, only one "canonical" view is kept (the author's guess is that it is the view closest to a top-down, first-person/egocentric perspective).

2. Images are resized to 320×256 (width×height).

3. All original actions (such as joint positions) are converted into end-effector (EE) actions, although the action values may be relative or absolute. After the model outputs action tokens in [0, 255], they are de-normalized differently for each robot before specific control commands are issued (see the loading sketch after this list).
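To make this pipeline concrete, below is a minimal sketch of loading one sub-dataset with the tensorflow_datasets RLDS loader, resizing frames, and de-normalizing action tokens. The builder path, the observation key name, and the action bounds are illustrative assumptions; they vary across sub-datasets and robots.

# Minimal sketch: load an RLDS-format sub-dataset, resize frames to 320x256
# (width x height), and map action tokens in [0, 255] back to continuous values.
import tensorflow as tf
import tensorflow_datasets as tfds

builder = tfds.builder_from_directory(
    builder_dir="gs://gresearch/robotics/bridge/0.1.0"  # assumed example path
)
ds = builder.as_dataset(split="train[:10]")

def resize_image(image):
    # tf.image.resize takes (height, width), i.e. (256, 320) for 320x256.
    return tf.cast(tf.image.resize(image, (256, 320)), tf.uint8)

def detokenize_action(tokens, low, high, num_bins=256):
    # De-normalization is robot-specific: `low` and `high` are placeholder
    # per-dimension bounds for the embodiment that produced the data.
    tokens = tf.cast(tokens, tf.float32)
    return low + tokens / (num_bins - 1) * (high - low)

for episode in ds:
    for step in episode["steps"]:
        image = resize_image(step["observation"]["image"])  # key name varies
        action = step["action"]  # EE action; relative or absolute per dataset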

Dataset characteristics:

1. The robots covered by the 60 datasets include single-arm, dual-arm, and quadruped robots, with Franka arms accounting for the majority.

2. In terms of data volume, xArm accounts for the largest share, mainly because Language Table contributes a large dataset of about 440,000 episodes; the Kuka iiwa data comes mainly from QT-Opt; in addition, there is the RT-1 data originally collected on Everyday Robots (now called Google Robot in the paper).

3. Skills are still concentrated on pick-and-place, and the overall distribution is long-tailed, with harder skills such as wiping and assembling in the tail.

4. The main scenes and manipulated objects are concentrated in home and kitchen settings: furniture, food, tableware, and similar items. [1]

Introduction to sub-datasets

No.1 RoboVQA

● Published by: University of Bremen

● Time: 2019

● Introduction

RobotVQA takes a scene RGB(D) image as input and outputs the corresponding scene graph. RobotVQA stands for Robot Visual Question Answering. The authors demonstrate the transferability of RobotVQA knowledge from the virtual world to the real world and its applicability to robot control programs.

● Download address

https://opendatalab.com/OpenDataLab/RoboVQA

● Paper address

https://arxiv.org/pdf/1709.10489.pdf

No.2 RoboNet

● Published by: Carnegie Mellon University·University of Pennsylvania·Stanford University

● Time: 2020

● Introduction

RoboNet is an open database for sharing robotic experience. It provides an initial pool of 15 million video frames from 7 different robot platforms, and the authors investigate how it can be used to learn generalizable models for vision-based robot manipulation.

● Download address

https://opendatalab.com/OpenDataLab/RoboNet

● Paper address

https://arxiv.org/pdf/1910.11215v2.pdf

No.3 BridgeData V2

● Published by: Google·University of California, Berkeley·Stanford University

● Time: 2023

● Introduction

BridgeData V2 is a large and diverse dataset of robot manipulation behaviors designed to promote research on scalable robot learning. The dataset is compatible with open-vocabulary, multi-task learning methods conditioned on goal images or natural language instructions. Skills learned from the data generalize to new objects and environments, and across institutions.

● Download address

https://opendatalab.com/OpenDataLab/BridgeData_V2

● Paper address

https://arxiv.org/pdf/2308.12952.pdf

No.4 Language Table

● Published by: Google

● Time: 2022

● Introduction

Language-Table is a human-collected dataset and a multi-task continuous-control benchmark for open-vocabulary visual-language-motor learning.

● Download address

https://opendatalab.com/OpenDataLab/Language_Table

● Paper address

https://arxiv.org/pdf/2210.06407.pdf

No.5 BC-Z

● Published by: Carnegie Mellon University·University of Pennsylvania·Stanford University

● Time: 2020

● Introduction

The authors collected a large-scale VR teleoperation demonstration dataset covering 100 manipulation tasks and trained a convolutional neural network to perform closed-loop imitation from RGB pixel observations.

● Download address

https://opendatalab.com/OpenDataLab/BC_Z

● Paper address

https://arxiv.org/pdf/1910.11215v2.pdf

No.6 CMU Food Manipulation (Food Playing Dataset)

● Published by: Carnegie Mellon University Robotics Institute

● Time: 2021

● Introduction

A diverse dataset of 21 unique food items with different slice types and properties, collected using a robotic arm and a set of sensors synchronized via ROS. A visual embedding network that combines proprioceptive, audio, and visual data encodes the similarity between foods using a triplet loss formulation (see the sketch after this entry).

● Download address

https://opendatalab.com/OpenDataLab/CMU_Food_Manipulation

● Paper address

https://arxiv.org/pdf/2309.14320.pdf
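As a rough illustration of the triplet-loss idea mentioned above (a minimal NumPy sketch, not the authors' actual network): the loss pulls an anchor embedding toward a similar food and pushes it away from a dissimilar one by at least a margin.

# Minimal triplet-loss sketch on random embeddings; illustrative only.
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Squared-distance hinge: encourage d(anchor, positive) + margin <= d(anchor, negative).
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

# Example with random 32-dimensional embeddings for a batch of 4 triplets.
rng = np.random.default_rng(0)
a, p, n = (rng.normal(size=(4, 32)) for _ in range(3))
print(triplet_loss(a, p, n))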

No.7 TOTO Benchmark

● Published by: New York University·Meta AI·Carnegie Mellon University

● Time: 2022

● Introduction

Train Offline, Test Online (TOTO) is a benchmark that provides open-source manipulation datasets and access to shared robots for evaluation.

● Download address

https://opendatalab.com/OpenDataLab/TOTO_Benchmark

● Paper address

https://arxiv.org/pdf/2306.00942.pdf

No.8 QUT Dynamic Grasping

● Published by: Queensland University of Technology

● Time: 2022

● Introduction

This dataset contains 812 successful trajectories of top-down dynamic grasping of moving objects with a Franka Panda robotic arm. Objects are randomly placed on an XY motion platform, which can move them along arbitrary trajectories at different speeds. The system is built around a CoreXY motion platform, and all parts of the design can be 3D printed or easily sourced.

● Download address

https://opendatalab.com/OpenDataLab/QUT_Dynamic_Grasping

● Paper address

https://arxiv.org/pdf/2309.02754.pdf

No.9 Task-Agnostic Real World Robot Play

● Published by: University of Freiburg·University of Erlangen-Nuremberg

● Time: 2023

● Introduction

Episodes of a 7-DoF robotic arm with a parallel-jaw gripper performing various undirected manipulation tasks, with approximately 1% of the data annotated with natural language embeddings. The data is collected via teleoperation with a VR controller, with the operator driving the robot remotely without being given a specific task. Each state-action pair is stored in a NumPy npz file and consists of RGB-D images from a static camera and a gripper camera, the proprioceptive state, and the robot's future action corresponding to that state (see the loading sketch after this entry).

● Download address

https://opendatalab.com/OpenDataLab/Task_Agnostic_Real_World_Robot_Play

● Paper address

http://tacorl.cs.uni-freiburg.de/paper/taco-rl.pdf
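For reference, here is a minimal sketch of inspecting one of the npz files described above; the file name and the exact key names are assumptions based on the description, not verified against the released data.

# Minimal sketch: inspect one state-action npz file from the play dataset.
# Expect entries such as static/gripper RGB-D images, the proprioceptive
# state, and the future action, though the actual key names may differ.
import numpy as np

data = np.load("episode_0000000.npz", allow_pickle=True)  # assumed file name
print(list(data.keys()))

for key in data.keys():
    value = data[key]
    print(key, getattr(value, "shape", None), getattr(value, "dtype", None))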

No.10 Roboturk

● Published by: Stanford University

● Time: 2019

● Introduction

The RoboTurk Real Robot Dataset covers three different real-world tasks: laundry layout, tower creation, and object search. All three datasets were collected remotely by crowdsourced workers using the RoboTurk platform. The dataset contains 2,144 demonstrations from 54 different users, and it is released both in full for training and as smaller subsamples for exploration.

● Download address

https://opendatalab.com/OpenDataLab/Roboturk

● Paper address

https://arxiv.org/pdf/1911.04052.pdf

Due to limited space, only a selection is listed here. For more open-source robot learning datasets, please visit OpenDataLab:

https://opendatalab.org.cn/

Reference: [1] https://www.zhihu.com/question/624716226
