Much of the recent technological progress across many subfields of machine learning stems from transferring knowledge from large, task-agnostic datasets to expressive models capable of efficiently absorbing all this data. This ability has been demonstrated remarkably well in areas such as computer vision, natural language processing, and speech recognition, yet its application to robotics remains an open question. A major contributor to this limitation is the lack of extensive and diverse robotic data, which restricts a model's ability to cover a wide range of robotic experiences. A further concern is the lack of scalable models able to generalize from such large datasets.
Google AI researchers worked in this direction, arguing that open-ended, task-agnostic training combined with a high-capacity architecture capable of absorbing all the varied robotic data is the key to successful general-purpose robot models. To test this hypothesis, the team created Robotics Transformer 1 (RT-1), a multitask model that tokenizes robot inputs and output actions to enable efficient runtime inference and real-time control. The model was developed using a real-world robotic dataset of over 130,000 episodes gathered by 13 robots from Everyday Robots (EDR) over an extended period of time.
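To make action tokenization concrete: RT-1 emits actions as discrete tokens by binning each continuous action dimension uniformly (the paper uses 256 bins per dimension). A minimal sketch of that scheme is below; the function names and action bounds are illustrative, not taken from the released code.

```python
import numpy as np

def tokenize_action(action, low, high, num_bins=256):
    """Map each continuous action dimension to a discrete bin index.

    RT-1 discretizes every action dimension into 256 uniform bins;
    the bounds `low`/`high` here are illustrative placeholders.
    """
    action = np.clip(action, low, high)
    # Normalize to [0, 1], then scale to bin indices 0..num_bins-1.
    norm = (action - low) / (high - low)
    return np.minimum((norm * num_bins).astype(int), num_bins - 1)

def detokenize_action(tokens, low, high, num_bins=256):
    """Recover bin-center values from token indices."""
    return low + (tokens + 0.5) / num_bins * (high - low)

# Example: a zero action in [-1, 1] lands in the middle bin.
low, high = np.array([-1.0]), np.array([1.0])
print(tokenize_action(np.array([0.0]), low, high))  # [128]
```

Discretizing this way lets the transformer treat action prediction as ordinary token prediction, one token per action dimension.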
The main distinguishing features of RT-1 are image tokenization, action tokenization, and token compression. The transformer architecture underlying RT-1's design allows it to efficiently generate tokenized actions from its inputs, which consist of a short history of images taken by the robot's camera and task descriptions written in natural language. During image tokenization, input images are run through a model pre-trained on ImageNet, and the output is then flattened accordingly. The image tokenizer uses FiLM layers to extract the image features relevant to the task at hand. Finally, the learned TokenLearner attention module adaptively selects soft combinations of image tokens that can be compressed, which accelerates inference.
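The two operations named above are easy to sketch. FiLM applies a feature-wise affine modulation whose scale and shift are predicted from the language instruction, and TokenLearner compresses the grid of image-patch tokens into a small set of learned tokens, each a soft (attention-weighted) combination of the inputs. The sketch below uses random weights purely for illustration; in RT-1 these are learned, and the shapes shown are assumptions, not the model's actual dimensions.

```python
import numpy as np

def film(features, gamma, beta):
    """FiLM: feature-wise affine modulation.
    gamma, beta: (D,) vectors predicted from the instruction embedding."""
    return gamma * features + beta

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def token_learner(tokens, weights):
    """Compress N input tokens down to K learned tokens.

    tokens:  (N, D) image-patch features from the vision backbone.
    weights: (D, K) projection producing one attention map per output
             token (random here; learned in the real model).
    """
    # One attention distribution over the N inputs per output token.
    attn = softmax(tokens @ weights, axis=0)  # (N, K)
    # Each output token is a soft combination of the input tokens.
    return attn.T @ tokens                    # (K, D)

rng = np.random.default_rng(0)
patches = rng.normal(size=(81, 512))  # e.g. a 9x9 grid of features
gamma, beta = rng.normal(size=512), rng.normal(size=512)
patches = film(patches, gamma, beta)  # condition on the instruction
out = token_learner(patches, rng.normal(size=(512, 8)))
print(out.shape)  # (8, 512)
```

Shrinking 81 patch tokens to 8 is what cuts the transformer's sequence length and yields the inference speedup described above.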
The researchers highlighted the need for a large and diverse dataset of robot trajectories in order to build a system that generalizes to new tasks and is robust to varied distractors and backgrounds. To create such a dataset, they used 13 EDR robots to collect 130,000 episodes over 17 months. The dataset includes activities such as picking and placing objects, opening and closing drawers, and knocking objects over. Additionally, they annotated each episode with a written description of the robot's action.
The team evaluated the generalization capabilities and performance of RT-1 against three baseline models across four categories: performance on seen tasks, performance on unseen tasks, robustness, and long-horizon scenarios. In all four, RT-1 performs significantly better than the baselines, showing markedly superior zero-shot generalization to new tasks, environments, and objects. They also took an in-depth look at the effects of tokenization, action representation, dataset composition, and many other design decisions behind the model and training set.
In a nutshell, the RT-1 Robotics Transformer is a simple and scalable action-generation model suitable for real-world robotic tasks. For future work, the researchers will focus on scaling the robot's skill set faster by developing techniques that allow even novices to train the robot through guided data collection and model prompting. They anticipate that scalable attention and memory will improve the reaction speed and context retention of robot transformers. Google has also open-sourced the RT-1 code in the hope that it will prove a useful tool for future research on scaling robot learning. The project website and other details can be viewed here.
Check out the Paper and Blog. All credit for this research goes to the researchers on this project. Also, don't forget to join our Reddit page and Discord channel, where we share the latest AI research news, cool AI projects, and more.
Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Goa. She is passionate about machine learning, natural language processing, and web development. She enjoys learning more about the technical field by participating in various challenges.