NVIDIA Omniverse Combines with GPT-4 to Generate 3D Content

The demand for 3D worlds and virtual environments is growing exponentially across industries around the world. 3D workflows are at the heart of industrial digitalization, developing real-time simulations to test and validate autonomous vehicles and robots, operating digital twins to optimize industrial manufacturing, and paving new paths for scientific discovery.

Today, 3D design and world building are still highly manual. While 2D artists and designers already have assistive tools, 3D workflows are still full of repetitive, tedious tasks.

Creating or finding objects for a scene is a time-consuming process that requires long-honed professional 3D skills such as modeling and texturing. Placing objects correctly and refining the 3D environment art to perfection requires hours of fine-tuning.

To reduce manual, repetitive tasks and help creators and designers focus on the creative and fun aspects of their work, NVIDIA has launched numerous AI projects, such as generative AI tools for virtual worlds.

The artificial intelligence revolution

With ChatGPT, we are now experiencing a transformation where individuals of all skill levels can use everyday language to interact with advanced computing platforms. Large language models (LLMs) are getting more and more complex, and when ChatGPT's user-friendly interface made them available to everyone, it became the fastest-growing app in history, surpassing 100 million users just two months after launch. Every industry now plans to harness the power of artificial intelligence for a wide range of applications, such as drug discovery, autonomous machines, and virtual assistants.

Recently, we experimented with OpenAI's viral ChatGPT and the new GPT-4 large multimodal model to show how easy it is to develop custom tools that can quickly generate 3D objects for virtual worlds in NVIDIA Omniverse. OpenAI co-founder Ilya Sutskever said in a fireside chat with NVIDIA founder and CEO Jensen Huang at GTC 2023 that GPT-4 marks "a considerable improvement in many ways" compared to ChatGPT.

By combining GPT-4 with Omniverse DeepSearch, an intelligent AI application capable of searching massive databases of unlabeled 3D assets, we were able to quickly develop a custom extension that retrieves 3D objects from simple text-based prompts and automatically adds them to the 3D scene.

AI generated 3D content

This interesting experiment in NVIDIA Omniverse, a 3D application development platform, shows developers and technical artists how easy it is to rapidly develop custom tools that leverage generative AI to populate real-world environments. End users simply enter text-based prompts to automatically generate and place high-fidelity objects, saving the hours of time typically required to create complex scenes.

Objects generated from the extension are based on Universal Scene Description (USD) SimReady assets. SimReady assets are physically accurate 3D objects that can be used in any simulation and behave just as they would in the real world.

Get information about the 3D scene

It all starts with a scene in Omniverse. Users can easily circle an area using the pencil tool in Omniverse, enter the kind of room or environment they want to generate (e.g., a warehouse or a reception room), and create the area with a single click.
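Under the hood, what the extension needs from this step is the area's size and center point. Here is a minimal sketch of how that information could be read from the selected area prim using the standard USD Python bindings; the function and prim path are illustrative assumptions, not the extension's actual code.

import omni.usd
from pxr import Usd, UsdGeom

def get_area_info(area_prim_path):
    # Get the current USD stage from the Omniverse USD context
    stage = omni.usd.get_context().get_stage()
    prim = stage.GetPrimAtPath(area_prim_path)

    # Compute the world-space bounding box of the drawn area
    bbox_cache = UsdGeom.BBoxCache(Usd.TimeCode.Default(), [UsdGeom.Tokens.default_])
    aligned_range = bbox_cache.ComputeWorldBound(prim).ComputeAlignedRange()

    size = aligned_range.GetSize()        # X, Y, Z extents of the area
    origin = aligned_range.GetMidpoint()  # center point of the area
    return size, origin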

Create the ChatGPT prompt

The ChatGPT prompt consists of four parts: system input, example user input, example assistant output, and the user prompt.

Let's start with the parts of the prompt that are tailored to the user's scenario. This includes text entered by the user as well as data from the scene.

For example, if a user wanted to create a reception room, they would specify something like "This is the room where we meet our clients. Make sure there is a comfortable set of armchairs, sofas, and a coffee table." Or, if they wanted to add a certain number of items, they could add "make sure there are at least 10 items."

This text is combined with scene information, such as the size and name of the area where we will place the items, to form the user prompt.

"Reception room, 7x10 meters, origin at (0.0, 0.0, 0.0). This is the room where we meet our clients. Make sure there is a comfortable set of armchairs, a sofa, and a coffee table."

The idea of combining the user's text with scene details is very powerful. It is much simpler to select an object in the scene and access its details programmatically than to ask the user to write a prompt describing all of those details. I suspect we'll see a lot of Omniverse extensions that take advantage of this text + scene pattern.
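As a rough illustration of that pattern (the helper name and the centimeter-to-meter conversion are assumptions for this sketch, not code from the extension), the combined user prompt could be assembled like this:

def build_user_prompt(area_name, size_cm, origin, user_text):
    # Combine scene data (name, dimensions, origin) with the user's free text,
    # matching the format of the example prompt above
    return (
        f"{area_name}, {size_cm[0] / 100:.0f}x{size_cm[2] / 100:.0f} meters, "
        f"origin at ({origin[0]:.1f}, {origin[1]:.1f}, {origin[2]:.1f}). "
        f"{user_text}"
    )

my_prompt = build_user_prompt(
    "Reception room", (700, 0, 1000), (0.0, 0.0, 0.0),
    "This is the room where we meet our clients. Make sure there is a "
    "comfortable set of armchairs, a sofa, and a coffee table.")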

In addition to the user prompt, we need to prime ChatGPT with a system prompt and an example exchange or two.

To create predictable, deterministic results, the system prompt and examples instruct the AI to return only JSON, with all information formatted in a clearly defined way so it can be used within Omniverse.

Here's the four-part prompt we'll be sending.

System prompt

This sets constraints and instructions for the AI.

You are an area generation expert. Given an area of a certain size, you can generate a list of items that fit in that area and place them in the correct location.

You operate in a three-dimensional space, using an X, Y, Z coordinate system, where X represents width, Y represents height, and Z represents depth. The coordinates (0.0, 0.0, 0.0) represent the default origin of the space.

You receive from the user the name of the area, its dimensions in centimeters along the X and Z axes, and the origin of the area (that is, the center point of the area).

Your answer simply generates a JSON file containing the following information:

  • area_name: area name
  • X: the coordinates of the area on the X axis
  • Y: coordinates of the area on the Y axis
  • Z: The coordinates of the area on the Z axis
  • area_size_X: the size of the area on the X axis (cm)
  • area_size_Z: the size of the area on the Z axis (cm)
  • area_objects_list: list of all objects in the area

For each item, you need to store the following information:

  • object_name: item name
  • X: The coordinates of the item on the X axis
  • Y: The coordinates of the item on the Y axis
  • Z: The coordinates of the item on the Z axis

The name of each item should include an appropriate adjective.

Remember that items should be placed within the area to create the most meaningful layout, and they should not overlap. All items must lie within the bounds of the area's dimensions; never place items more than half the area's length or depth from the origin. Also note that items should be distributed throughout the area relative to the area's origin; you can also use negative values to place items correctly, since the area's origin is always at the center of the area.

Remember, you only need to generate JSON code, nothing else. This is very important.

Example user input

This is an example of what a user might submit. Note that it is a combination of scene data and the text prompt.

"Reception room, 7x10m, with origin at (0.0, 0.0, 0.0). This is the room where we meet clients
. Make sure to have a comfortable set of armchairs, sofas and a coffee table"

Example assistant output

This provides the template the AI must use. Note how we describe our expected JSON.

{
    "area_name": "Reception",
    "X": 0.0,
    "Y": 0.0,
    "Z": 0.0,
    "area_size_X": 700,
    "area_size_Z": 1000,
    "area_objects_list": [
        {
            "object_name": "White_Round_Coffee_Table",
            "X": -120,
            "Y": 0.0,
            "Z": 130
        },
        {
            "object_name": "Leather_Sofa",
            "X": 250,
            "Y": 0.0,
            "Z": -90
        },
        {
            "object_name": "Comfortable_Armchair_1",
            "X": -150,
            "Y": 0.0,
            "Z": 50
        },
        {
            "object_name": "Comfortable_Armchair_2",
            "X": -150,
            "Y": 0.0,
            "Z": -50
        }  ]
}
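With all four parts in hand, they can simply be stored in Python strings for the API call in the next section. The variable names below match that call; the strings are truncated here for brevity and would contain the full texts shown above.

# Illustration only: the three fixed prompt parts, truncated for brevity
system_input = "You are an area generation expert. Given an area of a certain size, ..."
user_input = ("Reception room, 7x10m, with origin at (0.0, 0.0, 0.0). "
              "This is the room where we meet clients. ...")
assistant_input = '{ "area_name": "Reception", "X": 0.0, ... }'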

Connect to OpenAI

The prompt is sent from the extension to the AI via Python code. This is very easy in the Omniverse Kit, just a few commands using the latest OpenAI Python library. Note that we are passing the system input, example user input, and example expected assistant output we just outlined to the OpenAI API. The variable "response" will contain the expected response from ChatGPT.

import openai

# Create a completion using the ChatGPT model
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    # if you have access, you can swap to model="gpt-4"
    messages=[
        {"role": "system", "content": system_input},
        {"role": "user", "content": user_input},
        {"role": "assistant", "content": assistant_input},
        {"role": "user", "content": my_prompt},
    ]
)

# Parse the response and extract the text
text = response["choices"][0]["message"]["content"]
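Because the system prompt instructs the model to return only JSON, the content string can be parsed directly. Here is a minimal sketch of that parsing step; the error handling is our own addition, not part of the original extension.

import json

try:
    # The system prompt asks for pure JSON, so the content should parse directly
    gpt_results = json.loads(text)
    objects_to_place = gpt_results["area_objects_list"]
except (json.JSONDecodeError, KeyError) as err:
    # The model occasionally wraps the JSON in extra text; surface that clearly
    print(f"Could not parse the ChatGPT response: {err}")
    objects_to_place = []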

Pass ChatGPT results to Omniverse DeepSearch API and generate scenes

The extension then parses the items in the ChatGPT JSON response and passes them to the Omniverse DeepSearch API. DeepSearch allows users to search 3D models stored in Omniverse Nucleus servers using natural language queries.

This means, for example, that even if we don't know the exact filename of a sofa model, we can retrieve it by searching for "Comfortable Sofa", which is exactly what we got from ChatGPT.

DeepSearch understands natural language, so by asking it for "comfortable sofas," we get back a list of items that our helpful AI librarian has identified as the best fit from our current asset library. It's surprisingly good at this, so we can often use the first item it returns, but of course we build in a selection option in case the user wants to pick something else from the list.

From there, we simply add the object to the stage.

Add items from DeepSearch to Omniverse stage

Now that DeepSearch has returned results, we just need to put the object into Omniverse. In our extension, we create a function called place_deepsearch_results() that processes all items and places them in the scene.

import omni.usd
from pxr import Usd, UsdGeom, Sdf

def place_deepsearch_results(gpt_results, query_result, root_prim_path):
    index = 0
    for item in query_result:
        # Define prim
        stage = omni.usd.get_context().get_stage()

        prim_parent_path = root_prim_path + item['object_name'].replace(" ", "_")
        parent_xForm = UsdGeom.Xform.Define(stage, prim_parent_path)

        prim_path = prim_parent_path + "/" + item['object_name'].replace(" ", "_")
        next_prim = stage.DefinePrim(prim_path, 'Xform')

        # Add reference to USD asset
        references: Usd.References = next_prim.GetReferences()
        references.AddReference(
            assetPath="your_server://your_asset_folder" + item['asset_path'])

        # Store the DeepSearch query for future search refinement
        config = next_prim.CreateAttribute("DeepSearch:Query", Sdf.ValueTypeNames.String)
        config.Set(item['object_name'])

        # Translate prim
        next_object = gpt_results[index]
        index = index + 1
        x = next_object['X']
        y = next_object['Y']
        z = next_object['Z']
This method, which we use to place the items, iterates over the query_result items we get from DeepSearch, creates and defines new prims using the USD API, and sets their transform and attributes based on the data in gpt_results. We also save the DeepSearch query in an attribute on the prim so we can use it later if we want to run DeepSearch again. Note that the assetPath "your_server://your_asset_folder" is a placeholder and should be replaced with the real path of the folder against which DeepSearch is performed.
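The loop above stops after reading the X, Y, and Z values. Here is a minimal sketch of how the translation might then be applied inside that loop, assuming the standard pxr bindings; this is our illustration, not the extension's exact code.

# Assumes: from pxr import Gf, UsdGeom
# Inside the for-loop above, after reading x, y and z:
xform_api = UsdGeom.XformCommonAPI(parent_xForm.GetPrim())
xform_api.SetTranslate(Gf.Vec3d(x, y, z))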

Swap items with DeepSearch

However, we may not like all the items retrieved the first time. So we built a small companion extension that allows users to browse for similar objects and swap them with a single click. With Omniverse, it's super easy to build in a modular way, so you can easily extend your workflow with additional extensions.

This companion extension is very simple. It takes an object generated by DeepSearch as a parameter and provides two buttons to get the next or previous object from the related DeepSearch query. For example, if the USD file contains the attribute "DeepSearch:Query = Modern Sofa", it will run this search again with DeepSearch and get the next best result. Of course, you can extend this to a visual UI with pictures of all search results, similar to the window we use for general DeepSearch queries. To keep this example simple, we just chose two simple buttons.

See the code below, which shows the function that increments the index and the replace_reference() function that actually performs the object swap based on that index.

def increment_prim_index(self):
    if self._query_results is None:
        return

    self._index = self._index + 1

    if self._index >= len(self._query_results.paths):
        self._index = 0

    self.replace_reference()


def replace_reference(self):
    references: Usd.References = self._selected_prim.GetReferences()
    references.ClearReferences()
    references.AddReference(
        assetPath="your_server://your_asset_folder" + self._query_results.paths[self._index].uri)

Note that, as mentioned above, the path "your_server://your_asset_folder" is just a placeholder and you should replace it with the Nucleus folder where the DeepSearch query was executed.
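For the "previous" button, the companion extension only needs a symmetric counterpart to increment_prim_index(); here is a minimal sketch, our own addition mirroring the code above.

def decrement_prim_index(self):
    if self._query_results is None:
        return

    self._index = self._index - 1

    # Wrap around to the last result when stepping back past the first one
    if self._index < 0:
        self._index = len(self._query_results.paths) - 1

    self.replace_reference()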

Replace gray sofa with brown sofa using DeepSearch

This shows how, by combining the power of LLMs and the Omniverse API, you can create tools that enhance creativity and speed up processes.

From ChatGPT to GPT-4

One of the major advances in OpenAI's new GPT-4 is its enhanced spatial awareness compared to earlier large language models.

We initially used the ChatGPT API, which is based on GPT-3.5-turbo. While it offers good spatial awareness, GPT-4 gives better results. The version you see in the video above uses GPT-4.

GPT-4 has made great improvements over GPT-3.5 in solving complex tasks and understanding complex instructions, so we can be more descriptive and use natural language when designing the text prompts that guide the AI.

We can give AI very clear instructions, such as:

  • "Every object name should include an appropriate adjective."
  • "Remember that objects should be placed in this area to create the most meaningful layout possible, and they should not overlap."
  • "All objects must be within the bounds of the region size; never place objects further than 1/2 the length or 1/2 the depth of the region from the origin."
  • "Also remember that objects should be placed over the entire area relative to the area's origin, and you can also use negative values ​​to display items correctly because the area's origin is always at the center of the area."

The fact that the AI correctly followed these system prompts when generating responses was particularly impressive, as the AI demonstrated a good understanding of spatial awareness and how to properly place items. One of the challenges of performing this task with GPT-3.5 is that sometimes objects spawn outside the room or in odd positions.

Not only does GPT-4 place objects within the correct boundaries of a room, but it also places objects logically: a bedside table will actually appear to the side of a bed, a coffee table will be placed between two couches, and so on.

Build your own ChatGPT-powered extension

While this is just a small demonstration of what AI can do when connected to a 3D space, we believe it will open the door to a variety of tools beyond scene building. Developers can build AI-powered extensions within Omniverse for lighting, cameras, animations, character dialogue, and other elements that optimize a creator's workflow. They can even develop tools to attach physics to scenes and run entire simulations.

Reprinted from: blog.csdn.net/qq_41929396/article/details/132317009