OpenAI has open-sourced another new project...

Hello everyone, my name is Jack.

OpenAI has made another new move: the open-source release of Shap-E.

Today, I'll once again walk you through it step by step.

Algorithm principles, environment setup, and hands-on testing, all covered below, one-stop service!

1. Shap-E results

Simply put, the Shap-E algorithm generates a corresponding 3D model from a text description. Let's look at a few sets of results together.

Enter text:

A chair that looks like an avocado


The corresponding 3D model output by Shap-E:

Enter text:

A spaceship


The corresponding 3D model output by Shap-E:

Enter text:

An airplane that looks like a banana


More generated results:

OpenAI has now open-sourced the Shap-E code.

2. Algorithm principles

Shap-E again uses a latent-space diffusion model (latent diffusion).

Friends familiar with Stable Diffusion will know this concept: high-dimensional information is reduced into a specific feature (latent) space, and generation is then performed on those features.

The overall structure of Shap-E is a similar encoder-decoder architecture.

However, the inputs and outputs have changed. For example, the Shap-E encoder looks like this:

The input is a point-cloud model; after dimensionality reduction, cross-attention layers, a Transformer, and other components, the output is an implicit MLP.

As for the decoder, STF rendering is used, with the CLIP text embedding added as conditioning.

Shap-E supports multiple modalities: the input can be either text or an image.
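
To make the two-stage idea above concrete, here is a toy sketch of my own: an encoder pools point-cloud features into a single latent, and a denoiser conditioned on a CLIP text embedding learns to generate such latents. This is purely illustrative and is not the actual Shap-E code; all dimensions and layers are made up.

import torch
import torch.nn as nn

# Toy illustration of the two-stage idea, NOT the real Shap-E code:
# 1) an encoder maps a 3D asset (here a point cloud) to a latent vector
#    that stands in for the parameters of an implicit MLP;
# 2) a diffusion denoiser, conditioned on a CLIP text embedding,
#    learns to generate such latents directly from text.

class ToyEncoder(nn.Module):
    def __init__(self, latent_dim=1024):
        super().__init__()
        self.proj = nn.Linear(6, 256)                      # xyz + rgb per point
        self.attn = nn.MultiheadAttention(embed_dim=256, num_heads=8)
        self.to_latent = nn.Linear(256, latent_dim)

    def forward(self, points):                             # points: (B, N, 6)
        h = self.proj(points).transpose(0, 1)              # (N, B, 256), sequence-first
        h, _ = self.attn(h, h, h)                          # self-attention over points
        return self.to_latent(h.mean(dim=0))               # pooled latent, (B, latent_dim)

class ToyLatentDenoiser(nn.Module):
    def __init__(self, latent_dim=1024, text_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + text_dim + 1, 2048),
            nn.SiLU(),
            nn.Linear(2048, latent_dim),
        )

    def forward(self, noisy_latent, t, text_emb):
        # concatenate the noisy latent, the CLIP text embedding and the timestep
        x = torch.cat([noisy_latent, text_emb, t[:, None]], dim=-1)
        return self.net(x)                                  # predicted noise

# At sampling time the denoiser is run over many noise levels to turn pure noise
# into a latent, which the decoder then renders (NeRF / STF) into a 3D asset.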

3. Algorithm deployment

Project address:

https://github.com/openai/shap-e

Deploying the algorithm is not complicated; Shap-E's main external dependency is CLIP.

You can create a separate virtual environment named shape:

conda create -n shape python=3
conda activate shape

Then install CLIP's dependencies:

conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
pip install ftfy regex tqdm

Then enter the root directory of the Shap-E project and install it directly with pip:

pip install -e .

pip will install according to setup.py.

My internet connection isn't great, so building the environment locally took about an hour.

shap_e/examples/sample_text_to_3d.ipynb is the code for generating 3D models from text.
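
For reference, the core of that notebook boils down to roughly the following (condensed from the official example as I recall it; parameters such as guidance_scale and karras_steps are the notebook's defaults, so check the notebook itself for the authoritative version):

import torch
from IPython.display import display
from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
from shap_e.util.notebooks import create_pan_cameras, decode_latent_images, gif_widget

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the decoder ("transmitter"), the text-conditional model, and the diffusion config
xm = load_model('transmitter', device=device)
model = load_model('text300M', device=device)
diffusion = diffusion_from_config(load_config('diffusion'))

prompt = "A chair that looks like an avocado"
batch_size = 4

latents = sample_latents(
    batch_size=batch_size,
    model=model,
    diffusion=diffusion,
    guidance_scale=15.0,
    model_kwargs=dict(texts=[prompt] * batch_size),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)

# Render each latent into a rotating preview (run inside Jupyter)
cameras = create_pan_cameras(64, device)
for latent in latents:
    images = decode_latent_images(xm, latent, cameras, rendering_mode='nerf')  # or 'stf'
    display(gif_widget(images))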

shap_e/examples/sample_image_to_3d.ipynb is the code for generating 3D models from images.
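
The image-conditioned notebook follows the same pattern; only the model and the conditioning change. A condensed sketch of the differences (the example image path and the guidance scale of 3.0 are what I remember from the official notebook, so double-check against the repo):

from shap_e.models.download import load_model
from shap_e.util.image_util import load_image

# Use the image-conditional model instead of the text-conditional one
model = load_model('image300M', device=device)
image = load_image('example_data/corgi.png')   # replace with your own white- or transparent-background image

latents = sample_latents(
    batch_size=4,
    model=model,
    diffusion=diffusion,
    guidance_scale=3.0,
    model_kwargs=dict(images=[image] * 4),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)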

On an A10 machine, generating one 3D model takes about 25 seconds.

4. Finally

Of course, due to the dataset and other limitations, the quality of some generated 3D models is still quite poor.

For example, I tested "A dog" and got something like this:

I typed "A cat" and got:

Generating a 3D model from an image places high demands on the input image: for good results it needs a white background, or better yet, a transparent one.
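
If your source image has a transparent background, a small PIL helper like the one below can flatten it onto white before feeding it to the model. This is a hypothetical helper of my own, not part of Shap-E; the file names are placeholders.

from PIL import Image

def flatten_to_white(path_in, path_out):
    # Composite an RGBA image onto a plain white background
    img = Image.open(path_in).convert('RGBA')
    background = Image.new('RGBA', img.size, (255, 255, 255, 255))
    Image.alpha_composite(background, img).convert('RGB').save(path_out)

flatten_to_white('my_object.png', 'my_object_white.png')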

Someone has also set up this service on Hugging Face; it's an unofficial Space, but it uses the official code:

https://huggingface.co/spaces/hysts/Shap-E

After I publish this, there will probably be quite a few people queuing, so try to use it at off-peak times.

After testing it, my feeling is this:

If you are a graduate student working in this area, the paper is worth reading and the algorithm is worth running; some of the ideas are worth borrowing, and maybe the next best paper will be yours.

But if you are just a casual onlooker, don't waste time running it. The results are not directly usable: the image-conditioned generation is not stunning, so it can't be used directly to produce assets.

Well, that’s all for today, I’m Jack, see you next time~
