myCobot Robot ChatGPT Application: Design Principles and Model Capabilities

We extend the capabilities of ChatGPT to robots and intuitively control multiple platforms such as robotic arms, drones, and home assistant robots through language.

Have you ever wanted to tell a robot what to do in your own words, just as you would a human? Wouldn't it be amazing to just tell your home assistant robot "Please heat up my lunch" and let it find the microwave by itself? Although language is the most intuitive way for us to express our intentions, we still rely heavily on handwritten code to control robots. Our team has been exploring how to change this reality and enable natural human-robot interaction using ChatGPT, OpenAI's new AI language model.

ChatGPT is a language model trained on a large corpus of text and human interactions, enabling it to generate coherent and grammatically correct responses to a variety of prompts and questions. Our goal in this research is to see if ChatGPT can think beyond text and reason about the physical world to help with robotic tasks. We want to help people interact with robots more easily without having to learn complex programming languages or detailed information about robotic systems. The key challenge here is to teach ChatGPT how to solve problems that take into account the laws of physics, the context of the operating environment, and how the robot's physical actions change the state of the world.

It turns out that ChatGPT can do a lot on its own, but it still needs some help. Our technical paper describes a set of design principles that can be used to guide language models to solve robotic tasks. These include, but are not limited to, special prompt structures, high-level APIs, and human feedback via text. We believe that our work is just the beginning of a shift in the way we develop robotic systems, and we hope to inspire other researchers to enter this exciting field. Read on for more technical details about our approach and ideas.

The Challenges of Robotics Today, and How ChatGPT Can Help

The current robotics pipeline begins with an engineer or technical user who needs to translate the requirements of a task into system code. Engineers sit in the loop, which means they need to write new code and specifications to correct the robot's behavior. Overall, the process is slow (users need to write low-level code), expensive (requires highly skilled users with deep robotics knowledge), and inefficient (requires multiple interactions to get things working).

ChatGPT unlocks a new paradigm in robotics, allowing (potentially non-technical) users to sit on the loop, providing high-level feedback to the large language model (LLM) while monitoring the robot's performance. By following our design principles, ChatGPT can generate code for robotic scenarios. Without any fine-tuning, we leverage the LLM's knowledge to control different robot form factors for various tasks. In our work, we show multiple examples of ChatGPT solving difficult robotic problems, as well as complex robot deployments in the manipulation, aerial, and navigation domains.

Robotics Using ChatGPT: Design Principles

Prompting an LLM is a highly empirical science. Through trial and error, we constructed a methodology and a set of design principles for writing prompts for robotic tasks:

1. First, we define a set of high-level robot APIs or a function library. This library can be specific to a particular robot and should map to existing low-level implementations in the robot's control stack or perception library. It is important to use descriptive names for the high-level APIs so that ChatGPT can reason about their behavior.

2. Next, we write a text prompt for ChatGPT that describes the task objective while explicitly stating which functions in the high-level library are available. The prompt can also contain information about task constraints, or about how ChatGPT should form its answers (a specific coding language, the use of auxiliary parsing elements).

3. The user evaluates ChatGPT's code output, either by direct inspection or using a simulator, and, if needed, provides ChatGPT with feedback on answer quality and safety in natural language.

4. When the user is satisfied with the solution, the final code can be deployed to the robot.
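As a minimal sketch of steps 1 and 2, the prompt can be assembled from the API descriptions. Everything below is illustrative: the function names mirror the example transcript later in this post, not a real robot SDK.

```python
# Hypothetical sketch of assembling a prompt per the principles above.
# The API names mirror the example transcript in this post; none of this is a real SDK.
API_DOCS = [
    "grab(): turn on the suction pump to grab the object",
    "release(): turn off the suction pump to release the object",
    "get_position(object): return [X, Y, Z, yaw, pitch, roll] for the named object",
    "move_to(position): move the suction pump to [X, Y, Z, yaw, pitch, roll]",
]

def build_prompt(task):
    """Combine the scenario description, allowed functions, and task objective."""
    lines = [
        "Imagine we are working with a 6-DOF robotic arm with a suction pump.",
        "You may only use the following functions:",
    ]
    lines += [f"- {doc}" for doc in API_DOCS]
    lines += [
        "Positions are in millimeters and angles are in degrees.",
        f"Task: {task}",
    ]
    return "\n".join(lines)

prompt = build_prompt("Pick up the red block and place it on the white pad.")
print(prompt)
```

The code ChatGPT returns for such a prompt is then reviewed by the user (step 3) before any of it is deployed on hardware (step 4).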

Enough of the theory... What exactly can ChatGPT do?

Let's look at an example. You can find more case studies in our code repository.

(This post includes only one interesting example; the original article contains other robot examples.)

User on the loop: when complex tasks require a conversation

Next, we used ChatGPT in a robotic arm manipulation scenario. Using dialogue feedback, we taught the model to compose the initially provided APIs into more complex high-level functions: ChatGPT codes itself. Using a curriculum-based strategy, the model was able to logically chain these learned skills together to perform actions such as stacking blocks.

Additionally, the model shows an interesting example of bridging textual and physical domains when building the Microsoft logo out of wooden blocks. Not only is it able to call up the logo from its internal knowledge base, but it is also able to "draw" the logo (as SVG code) and then use the skills learned above to determine which existing robot actions can make up its physical form.
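For illustration only, a four-square SVG of the kind described above might look like the output of this sketch. The hex values are the commonly cited Microsoft logo colors; the actual SVG ChatGPT produced in the experiment may differ.

```python
# Minimal sketch of a four-square logo as SVG markup.
# Colors and coordinates are illustrative, not taken from the actual conversation.
SQUARES = [
    ("#F25022", 0, 0),    # red, top left
    ("#7FBA00", 22, 0),   # green, top right
    ("#00A4EF", 0, 22),   # blue, bottom left
    ("#FFB900", 22, 22),  # yellow, bottom right
]

def logo_svg():
    """Emit a tiny SVG document with one rect per colored square."""
    rects = "".join(
        f'<rect x="{x}" y="{y}" width="20" height="20" fill="{c}"/>'
        for c, x, y in SQUARES
    )
    return f'<svg xmlns="http://www.w3.org/2000/svg" width="42" height="42">{rects}</svg>'

print(logo_svg())
```

A drawing like this gives the model a spatial layout (which color goes in which corner) that it can then translate into pick-and-place actions.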

The following is a conversation with ChatGPT about how to control a robotic arm to build the Microsoft logo, with its four colors, out of blocks.

user:

Imagine we are working with a manipulator robot. This is a 6-DOF robotic arm with a suction pump attached to its end effector. I hope you can assist me in sending commands to this robot, given a scene and a task.

At any time, you can access the following functions:

grab(): Turn on the suction pump to grab the object

release(): Turn off the suction pump to release the object

get_position(object): Given a string with an object name, returns the coordinates and orientation of the suction pump touching the top of the object [X, Y, Z, yaw, pitch, roll]

move_to(position): Moves the suction pump to a given position [X, Y, Z, yaw, pitch, roll]

You can use these functions to create new functions, but you are not allowed to use any other hypothetical functions.

Keep the solution simple and straightforward. Positions are in millimeters and angles are in degrees. You can also ask clarification questions using the tag "Questions - ". Below is an example scenario that demonstrates how to ask clarifying questions.

Suppose a scene contains two spheres.

chat:

Question - There are two spheres. Which one do you want me to pick up?

user:

Sphere 1, please. Use Python code to express your solution. Are you ready?

chat:

Yes, I am ready. Please describe the scenario and task.

user:

The scene has the following objects: white pad, box, blue block, yellow block, green block, red block, brown block 1, brown block 2. The blocks are 40 mm cubes inside a box with a depth of 80 mm, and they can only be reached from the top of the box. I want you to learn how to pick up an object and hold it. To do this, you need to move a safe distance (100 mm) above the object, reach the object, grab it, and lift it.

chat:

Understood. Here's a Python function that takes the name of an object, moves the robotic arm above the object, grabs it, and then lifts it to a safe distance of 100 mm above the object. Note that this function assumes that the robot arm is initially in a safe starting position.

import time

def pick_up_object(object_name):
    # get the position of the object
    object_pos = get_position(object_name)
    # move the robot arm to a safe position above the object
    safe_pos = [object_pos[0], object_pos[1], object_pos[2]+100,
                object_pos[3], object_pos[4], object_pos[5]]
    move_to(safe_pos)
    # move the robot arm to the object's position
    move_to(object_pos)
    # grab the object
    grab()
    time.sleep(1)  # wait for the object to be firmly attached
    # move the robot arm back to the safe position above the object
    move_to(safe_pos)


To use this function, simply call pick_up_object with the name of the object to be picked up as a string, e.g. pick_up_object("red block").

user:

Great! Now I want you to learn how to place an object in a position. Considering you're holding a generic object, you first need to move to a safe location above the target position, bring the object down, and then release it.

chat:

Understood. Here's a Python function that takes the position where the object should be placed, moves the robotic arm above that position, brings the object down, and releases it. Note that this function assumes that the robot arm is already holding an object.

def place_object(position):
    # move the robot arm to a safe position above the target location
    safe_pos = [position[0], position[1], position[2]+100,
                position[3], position[4], position[5]]
    move_to(safe_pos)
    # move the robot arm to the target location
    move_to(position)
    # release the object
    release()
    # move the robot arm back to the safe position above the target location
    move_to(safe_pos)


To use this function, simply call place_object with the target position as a list of 6 numbers, e.g. place_object([100, 200, 50, 0, 0, 0]). Note that positions are in millimeters and orientations are in degrees.

user:

Great! Now I want you to use these skills to pick up the green block and place it on the white pad. All blocks have a height of 40 mm.

chat:

Sure! Here's the Python code to pick up the green block and place it on the white pad:

# get the positions of the green block and white pad
green_pos = get_position("green block")
white_pos = get_position("white pad")
# pick up the green block
pick_up_object("green block")
# calculate the position to place the block on the white pad
place_pos = [white_pos[0], white_pos[1], white_pos[2]+40, 0, 0, 0]
# place the block on the white pad
place_object(place_pos)


This code uses the get_position function to get the positions of the green block and the white pad, picks up the green block using the pick_up_object function we defined earlier, calculates where to place the block on the white pad, and places it there using the place_object function we defined earlier. Note that we add the height of the block (40 mm) to the height of the white pad to ensure that the block is placed on top of the pad.

user:

great! Now I want you to stack the two brown blocks on top of the green block.

chat:

Sure! Here is the Python code to stack two brown blocks on top of the green block:

# get the positions of the green block and both brown blocks
green_pos = get_position("green block")
brown1_pos = get_position("brown block 1")
brown2_pos = get_position("brown block 2")
# pick up the first brown block
pick_up_object("brown block 1")
# calculate the position to place the block on top of the green block
place_pos = [green_pos[0], green_pos[1], green_pos[2]+40, 0, 0, 0]
# place the first brown block on top of the green block
place_object(place_pos)
# pick up the second brown block
pick_up_object("brown block 2")
# calculate the position to place the block on top of the first brown block
place_pos = [green_pos[0], green_pos[1], green_pos[2]+80, 0, 0, 0]
# place the second brown block on top of the first brown block
place_object(place_pos)


This code gets the positions of the green block and the two brown blocks using the get_position function, picks up the first brown block using the pick_up_object function we defined earlier, calculates where to place it on top of the green block, and places it there using the place_object function we defined earlier. It then picks up the second brown block, calculates where to place it on top of the first brown block, and places it there using place_object. Note that we add the height of each block (40 mm) to the height of the block below to ensure that each block is placed on top of the previous one.

user:

Great! I now want to use these blocks to build Microsoft's logo on a white pad. It consists of four colors forming a square with blue in the lower left corner, yellow in the lower right corner, red in the upper left corner, and green in the upper right corner.

chat:

Sure! Here's the Python code to build the Microsoft logo on a white pad using blocks:

# get the positions of the blocks and the white pad
white_pad_pos = get_position("white pad")
blue_pos = get_position("blue block")
yellow_pos = get_position("yellow block")
red_pos = get_position("red block")
green_pos = get_position("green block")
# pick up the blue block
pick_up_object("blue block")
# calculate the position to place the blue block
place_pos = [white_pad_pos[0]-20, white_pad_pos[1]-20, white_pad_pos[2]+40, 0, 0, 0]
# place the blue block on the white pad
place_object(place_pos)
# pick up the yellow block
pick_up_object("yellow block")
# calculate the position to place the yellow block
place_pos = [white_pad_pos[0]+20, white_pad_pos[1]-20, white_pad_pos[2]+40, 0, 0, 0]
# place the yellow block on the white pad
place_object(place_pos)
# pick up the red block
pick_up_object("red block")
# calculate the position to place the red block
place_pos = [white_pad_pos[0]-20, white_pad_pos[1]+20, white_pad_pos[2]+40, 0, 0, 0]
# place the red block on the white pad
place_object(place_pos)
# pick up the green block
pick_up_object("green block")
# calculate the position to place the green block
place_pos = [white_pad_pos[0]+20, white_pad_pos[1]+20, white_pad_pos[2]+40, 0, 0, 0]
# place the green block on the white pad
place_object(place_pos)

This code uses the get_position function to get the positions of the white pad and all four blocks, uses the pick_up_object function we defined earlier to pick up each block in turn, calculates where to place each block on the white pad, and uses the place_object function we defined earlier to place each block there. We place the blue block at the bottom-left position, the yellow block at the bottom-right position, the red block at the top-left position, and the green block at the top-right position, as specified in the prompt.

Taking robotics out of the lab and into the world

We are excited to publish these technologies with the aim of bringing robotics to a wider audience. We believe that language-based robot control will be fundamental to bringing robotics out of the science lab and into the hands of everyday users.

That said, we do emphasize that the output of ChatGPT is not meant to be directly deployed on robots without careful analysis. Users are encouraged to take advantage of the power of simulation to evaluate these algorithms prior to potential real-world deployment, always taking necessary safety precautions. Our work represents only a small part of the larger intersection of language models in robotics, and we hope to inspire much of the future work.
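As one illustration of that simulation-first workflow (the stubs and the check_workspace helper below are hypothetical, not part of our actual tooling), generated code can be dry-run against logging stubs of the same API, and the resulting command trace inspected before anything touches hardware:

```python
# Hypothetical dry-run harness: replace the robot API with stubs that log
# every command, then check the trace for unsafe motions before deployment.
trace = []

def move_to(position):
    trace.append(("move_to", list(position)))

def grab():
    trace.append(("grab",))

def release():
    trace.append(("release",))

def get_position(name):
    # Stub perception: a fixed pose for any object (illustrative only).
    return [100.0, 200.0, 40.0, 0.0, 0.0, 0.0]

def check_workspace(trace, z_min=0.0):
    """Reject any commanded pose below the table surface (z < z_min)."""
    return all(cmd[1][2] >= z_min for cmd in trace if cmd[0] == "move_to")

# Dry-run a generated pick sequence against the stubs.
pos = get_position("red block")
move_to([pos[0], pos[1], pos[2] + 100] + pos[3:])  # safe approach pose
move_to(pos)                                        # descend to the object
grab()
assert check_workspace(trace)
```

Workspace limits here stand in for whatever safety constraints a real deployment would enforce (joint limits, collision checks, velocity bounds).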


Origin blog.csdn.net/m0_71627844/article/details/131517207