Artificial Intelligence Depth Estimation Technology

Even "artificial stupidity" can get the job done!

Here are the basic operations:

Find the Depth Estimation model page on the Hugging Face website:

Hugging Face – The AI community building the future.

(Hugging Face may not be reachable from mainland China without a VPN. Whether you use one is up to you.)

The following is excerpted from Hugging Face:

Monocular depth estimation

Monocular depth estimation is a computer vision task that involves predicting the depth information of a scene from a single image. In other words, it is the process of estimating the distance of objects in a scene from a single camera viewpoint.

Monocular depth estimation has various applications, including 3D reconstruction, augmented reality, autonomous driving, and robotics. It is a challenging task as it requires the model to understand the complex relationships between objects in the scene and the corresponding depth information, which can be affected by factors such as lighting conditions, occlusion, and texture.

The task illustrated in this tutorial is supported by the following model architectures:

DPT, GLPN

In this guide you’ll learn how to:

  • create a depth estimation pipeline
  • run depth estimation inference by hand

Before you begin, make sure you have all the necessary libraries installed:

pip install -q transformers

Depth estimation pipeline

The simplest way to try out inference with a model supporting depth estimation is to use the corresponding pipeline(). Instantiate a pipeline from a checkpoint on the Hugging Face Hub:

from transformers import pipeline

checkpoint = "vinvino02/glpn-nyu"
depth_estimator = pipeline("depth-estimation", model=checkpoint)

Next, choose an image to analyze:

from PIL import Image
import requests

url = "https://unsplash.com/photos/HwBAsSbPBDU/download?ixid=MnwxMjA3fDB8MXxzZWFyY2h8MzR8fGNhciUyMGluJTIwdGhlJTIwc3RyZWV0fGVufDB8MHx8fDE2Nzg5MDEwODg&force=true&w=640"
image = Image.open(requests.get(url, stream=True).raw)
image

Photo of a busy street

Pass the image to the pipeline:

predictions = depth_estimator(image)

The pipeline returns a dictionary with two entries. The first, predicted_depth, is a tensor whose values give the depth of each pixel, expressed in meters. The second, depth, is a PIL image that visualizes the depth estimation result.

Let’s take a look at the visualized result:

predictions["depth"]

Depth estimation visualization
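
The predicted_depth entry is easier to reason about once you treat it as a plain 2-D grid of distances. The sketch below is only an illustration: summarize_depth is a hypothetical helper (not part of transformers), and the small hand-written array stands in for a real prediction that you would first squeeze to 2-D and convert to NumPy. It also assumes that smaller values mean closer objects, which matches a depth expressed in meters.

```python
import numpy as np

def summarize_depth(depth_map):
    """Return basic statistics and the closest pixel of a 2-D depth map.

    Hypothetical helper for illustration; assumes smaller values are closer.
    """
    depth = np.asarray(depth_map, dtype=np.float64)
    if depth.ndim != 2:
        raise ValueError("expected a 2-D (height, width) depth map")
    # argmin works on the flattened array, so map it back to (row, col)
    row, col = np.unravel_index(np.argmin(depth), depth.shape)
    return {
        "min": float(depth.min()),
        "max": float(depth.max()),
        "mean": float(depth.mean()),
        "nearest_pixel": (int(row), int(col)),
    }

# Synthetic 2x3 depth map (in meters) standing in for a real prediction
fake_depth = np.array([[2.0, 3.5, 4.0],
                       [1.5, 2.5, 6.0]])
stats = summarize_depth(fake_depth)
print(stats)
```

With a real prediction, the same helper would tell you roughly how far the nearest and farthest points in the scene are estimated to be.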

Depth estimation inference by hand

Now that you’ve seen how to use the depth estimation pipeline, let’s see how we can replicate the same result by hand.

Start by loading the model and associated processor from a checkpoint on the Hugging Face Hub. Here we’ll use the same checkpoint as before:

from transformers import AutoImageProcessor, AutoModelForDepthEstimation

checkpoint = "vinvino02/glpn-nyu"

image_processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModelForDepthEstimation.from_pretrained(checkpoint)

Prepare the image input for the model using the image_processor that will take care of the necessary image transformations such as resizing and normalization:

pixel_values = image_processor(image, return_tensors="pt").pixel_values
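
To get a feel for what "resizing and normalization" can mean here, the following NumPy sketch mimics that kind of preprocessing on a dummy image. It is not the transformers implementation: sketch_preprocess is a hypothetical function, the size_divisor of 32 and the nearest-neighbour resize are assumptions chosen for illustration, and the real processor takes its settings from the checkpoint's configuration.

```python
import numpy as np

def sketch_preprocess(image_array, size_divisor=32):
    """Illustrative stand-in for an image processor (NOT the library's code).

    image_array: uint8 array of shape (height, width, channels).
    Returns a float32 array of shape (1, channels, new_height, new_width).
    """
    height, width, _ = image_array.shape
    # Round each side down to a multiple of size_divisor (assumed value)
    new_h = max(size_divisor, (height // size_divisor) * size_divisor)
    new_w = max(size_divisor, (width // size_divisor) * size_divisor)
    # Nearest-neighbour resize: pick a source row/column for each target one
    rows = np.arange(new_h) * height // new_h
    cols = np.arange(new_w) * width // new_w
    resized = image_array[rows][:, cols]
    # Rescale 0..255 integers to 0..1 floats
    scaled = resized.astype(np.float32) / 255.0
    # Channels-first layout plus a leading batch dimension
    return np.transpose(scaled, (2, 0, 1))[np.newaxis, ...]

# Dummy 70x100 RGB image in place of a real photo
dummy = np.random.default_rng(0).integers(0, 256, size=(70, 100, 3), dtype=np.uint8)
pixel_values_sketch = sketch_preprocess(dummy)
print(pixel_values_sketch.shape)  # (1, 3, 64, 96)
```

The point is only the shape of the result: a batch of one, channels first, with pixel values scaled to floats, which is the form the model expects.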

Pass the prepared inputs through the model:

import torch

with torch.no_grad():
    outputs = model(pixel_values)
    predicted_depth = outputs.predicted_depth

Visualize the results:

import numpy as np

# interpolate to original size
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
).squeeze()
output = prediction.numpy()

formatted = (output * 255 / np.max(output)).astype("uint8")
depth = Image.fromarray(formatted)
depth
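
The last two steps above scale the depth values so that the largest one becomes 255, then truncate to 8-bit integers for display. Here is that arithmetic on a tiny array so you can check it by hand. The helper name depth_to_uint8 is hypothetical, and the clipping and the all-zero guard are my own additions that the code above does not include.

```python
import numpy as np

def depth_to_uint8(depth):
    """Scale a depth map so its maximum maps to 255, then cast to uint8.

    Hypothetical helper; the clip and the zero guard are extra safety steps.
    """
    depth = np.asarray(depth, dtype=np.float64)
    peak = depth.max()
    if peak <= 0:
        # Avoid dividing by zero on an empty or all-zero map
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = depth * 255 / peak
    # Interpolation can overshoot slightly, so keep values inside 0..255
    return np.clip(scaled, 0, 255).astype("uint8")

toy_depth = np.array([[0.0, 1.0],
                      [2.0, 4.0]])
gray = depth_to_uint8(toy_depth)
print(gray)
# [[  0  63]
#  [127 255]]
```

Note that the cast truncates rather than rounds (63.75 becomes 63), and that with this scaling brighter pixels correspond to larger depth values.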

Statement: A Chinese version of this article is available under the same title from CSDN blogger "XBL0430".

Regarding copyright: their article was published with my consent.

Article link: Artificial Intelligence Depth Estimation Technology (Chinese Translation Version)_XBL0430’s Blog-CSDN Blog

(By the way, they put a lot of effort into translating my article. Please show your support!)


I compiled the content of this article myself. It may contain errors or omissions, and suggestions are welcome!

Remember to like and follow~

Origin blog.csdn.net/zyl_coder/article/details/132429177