Hello everyone, I am Wei Xue AI. Today I will introduce a large collection of 124 artificial intelligence tasks. The collection mainly covers four categories: natural language processing (NLP), computer vision (CV), speech recognition, and multimodal tasks.
I have compiled these 124 application-scenario tasks below, each with a short description:
- Sentence Embedding: Maps a sentence to a fixed-dimensional vector representation.
- Text Ranking: Rank a set of texts by their relevance to a given query.
- Word Segmentation: Divide continuous text into words or chunks.
- Part-of-Speech: Tag each word in a sentence with its part of speech.
- Token Classification: Classify each token in an input text sequence into a predefined category.
- Named Entity Recognition: Identify named entities with specific meaning in text, such as the names of people, places, and organizations.
- Relation Extraction: Extract the relationship or connection between entities from the text.
- Information Extraction: Extract structured information, such as entities, relationships, and attributes, from unstructured text.
- Sentence Similarity: Measures the semantic similarity or relatedness between two sentences.
- Text Translation: Translate text from one language into another.
- Natural Language Inference (NLI): Judge the logical relationship between a given premise and hypothesis: entailment, contradiction, or neutral.
- Sentiment Classification: Classify text into sentiment categories such as positive, negative, or neutral.
- Portrait Matting: Accurately separate the portrait subject from the image background.
- Universal Matting: Accurately separate a target object from the image background; not limited to portraits.
- Human Detection: Detects the location of a human body in an image or video.
- Image Object Detection: Detect and localize multiple target objects in an image.
- Image Denoising: Reduces the noise level in an image and improves image quality.
- Image Deblurring: Restores clarity and detail to blurred images.
- Video Stabilization: Correct camera shake in a video so that it plays back stable and smooth.
- Video Super-Resolution: Increases the resolution of a video by increasing its pixel-level detail.
- Text Classification: Classify text into predefined categories or labels.
- Text Generation: Generate continuous text from a given input.
- Zero-Shot Classification: Classify data into categories that the model has never seen during the training phase.
- Task-Oriented Conversation: Conduct dialogue and question answering aimed at completing a specific task.
- Dialog State Tracking: Track user intent and system state changes across multiple rounds of dialog.
- Table Question Answering: Answer relevant questions based on tabular data.
- Document-Grounded Dialog Generation: Generate relevant dialogue responses based on document content.
- Document-Grounded Dialog Rerank: Ranks the generated dialog replies to select the best one.
- Document-Grounded Dialog Retrieval: Retrieve the best dialog related to a document from candidate dialogs.
- Text Error Correction: Automatically corrects spelling or grammatical errors in text.
- Image Captioning: Generate descriptive text for images based on image content.
- Video Captioning: Generate descriptive text for a video based on its content.
- Image Portrait Stylization: Apply an artistic style transfer to the human subject in the image.
- OCR Detection (Optical Character Recognition, detection stage): Detect and localize text regions in images.
- Table Recognition: Automatically recognize table structure and content from images.
- Lineless Table Recognition: Automatically identify table structure and content from lineless table images.
- Document-VL Embedding: Map a document to a vector representation in a shared visual-language semantic space.
- License Plate Detection: Detect and localize the license plate area of a vehicle in an image.
- Fill-Mask: Predict the masked tokens in a text from the surrounding context.
- Feature Extraction: Extract meaningful feature representations from input data.
- Action Recognition: Recognize actions or behaviors in videos.
- Action Detection: Detect and locate specific actions or behaviors in videos.
- Live Category: Classify live-stream videos into categories such as sports, news, and games.
- Video Category: Classify videos into categories such as movies, music, and sports.
- Multi-Modal Embedding: Map data of multiple different modalities into a shared vector space.
- Generative Multi-Modal Embedding: Maps multimodal data to vector representations and is able to generate data related to them.
- Multi-Modal Similarity: Measures the similarity or correlation between multimodal data such as images and text.
- Visual Question Answering: Answer a question about a given image.
- Video Question Answering: Answer a question about a given video.
- Video Embedding: Maps a video sequence to a fixed-dimensional vector representation.
- Text-to-Image Synthesis: Synthesize an image that matches a given text description.
- Text-to-Video Synthesis: Synthesize the corresponding video according to the given text description.
- Body 2D Keypoints: Detect and track body keypoints in images.
- Body 3D Keypoints: Detect and track body keypoints in 3D space.
- Hand 2D Keypoints: Detect and track hand keypoints in images.
- Card Detection: Detect and locate specific types of cards in an image.
- Content Check: Checks for inappropriate, sensitive, or illegal content in text or images.
- Face Detection: Detects the location of a face in an image or video.
- Face Liveness: Determine whether a face in an image or video belongs to a live person rather than a photo or video replay.
- Face Recognition: Identify faces in images or videos and match them with known identities.
- Facial Expression Recognition: Recognize the expression state of human faces in images or videos, such as happiness, sadness, anger, etc.
- Face Attribute Recognition: Identify attributes of faces in images or videos, such as age, gender, and race.
- Face 2D Keypoints: Detect and track facial keypoints in images.
- Face Quality Assessment: Evaluate the quality of the faces found in images or videos.
- Video Multi-Modal Embedding: Map videos and associated data from other modalities (such as text) into a shared vector space.
- Image Color Enhancement: Enhance the color saturation, contrast and brightness of the image.
- Virtual Try-On: Render virtual clothing onto images of real people to simulate trying the clothes on online.
- Image Colorization: Convert a grayscale image into a color image.
- Video Colorization: Convert black-and-white video into color video.
- Image Segmentation: Divide an image into multiple distinct regions or objects.
- Image Driving Perception: Extract driving-related information from images, such as lane lines and traffic signs.
- Image Depth Estimation: Estimate the depth or distance of objects in a scene based on monocular or binocular images.
- Indoor Layout Estimation: Estimate the layout structure of a room based on indoor images.
- Video Depth Estimation: Estimate the depth or distance of objects in the scene based on the inter-frame information in the video.
- Panorama Depth Estimation: Estimate the depth or distance of objects in a scene from a panoramic image.
- Image Style Transfer: Apply the style of one image to another to generate an image with a new style.
- Face Image Generation: Generate realistic facial images, which can be used for face data augmentation, synthetic data generation, and other applications.
- Image Super-Resolution: Improves the resolution of an image by increasing its pixel-level details.
- Image Deblocking: Reduces blocking artifacts or streak noise in images caused by compression.
- Image Portrait Enhancement: Improve the appearance, skin color and other characteristics of the human subject in the image.
- Product Retrieval Embedding: Map items to vector representations to support item relevance retrieval.
- Image-to-Image Generation: Generates a corresponding output image from a given input image.
- Image Classification: Classify images into predefined categories or labels.
- OCR Recognition (Optical Character Recognition, recognition stage): Recognize printed or handwritten text in images.
- Skin Retouching: Beautify face images by removing skin blemishes, smoothing skin, and so on.
- FAQ Question Answering: Answer users' questions by matching them against a set of frequently asked questions.
- Crowd Counting: Estimate the number of people based on crowd density in an image or video.
- Video Single Object Tracking: Tracks a single target object in a video sequence.
- Image ReID - Person: Re-identify a person across images based on the appearance of the person in each image.
- Text-Driven Segmentation: Segment objects in images or videos based on a given textual description.
- Movie Scene Segmentation: Divide a movie or video into different scenes, each scene representing an independent plot or event.
- Shop Segmentation: Segment objects or areas in the store from images or videos for product display, intelligent monitoring and other applications.
- Image Inpainting: Fill in the missing or damaged parts of an image based on its existing content, restoring the integrity of the original image.
- Image Paint-By-Example: Based on a given example image, modify other images to have a similar painting style or effect.
- Controllable Image Generation: Generate images with specific attributes, styles, or characteristics by controlling input parameters or vectors.
- Video Inpainting: Fill in the missing or damaged frames or areas of a video based on its existing content, restoring the integrity of the original video.
- Video Human Matting: Separate the people in a video from the background for subsequent editing or special-effects processing.
- Human Reconstruction: Based on a given image, video or sensor data, reconstruct the 3D model or pose information of the human body.
- Video Frame Interpolation: Generate intermediate frames between two given video frames to increase the frame rate or smooth the transitions of a video.
- Video Deinterlace: Convert interlaced video to progressive scan to improve the quality and smoothness of playback.
- Human Wholebody Keypoint Detection: Detect and locate key points of the human body in images or videos, such as head, hands, feet, etc.
- Hand Static: Recognize static gestures in images or videos by analyzing information such as palm shape and finger posture.
- Face-Human-Hand Detection: Detect and locate face, body, and hand regions in images or videos.
- Face Emotion: Judge the emotional state expressed by a face in an image or video by analyzing its facial expression.
- Product Segmentation: Segment commodities or products in images or videos from the background for applications such as product recognition and advertisement recommendation.
- Referring Video Object Segmentation: Segment the object in a video that a given referring expression (typically a natural-language description) points to.
- Video Summarization: Generate a summary or overview of a video from its content and characteristics, making the video easier to browse and retrieve.
- Image Sky Change: Replace the sky part in the image with different sky backgrounds to change the atmosphere and environment of the image.
- Translation Evaluation: Based on a given translation result, evaluate its quality, accuracy, and consistency with the original text.
- Video Object Segmentation: Segment objects in the video from the background for subsequent editing or special effects processing.
- Video Multi-Object Tracking: Track multiple moving targets in a video simultaneously, locating each target's position in real time.
- Multi-View Depth Estimation: Estimate the three-dimensional depth information of objects in the scene through multiple views or images.
- Few-Shot Detection: Perform object detection when only a small number of labeled samples are available, which demands strong generalization from the model.
- Body Reshaping: Adjust the shape, posture, or proportions of the human body regions in an image or video to change the person's appearance.
- Face Fusion: Fuse one person's facial features or expressions onto another person's face image to generate a composite with characteristics of both.
- Image Matching: In an image library or database, find the most similar or matching image to a given image.
- Image Quality Assessment - MOS: Predict a subjective quality score (mean opinion score, MOS) for an image, reflecting how the human eye perceives it.
- Image Quality Assessment - Degradation: Objectively measure the quality of an image under different transformation or compression conditions.
- Vision Efficient Tuning: Adapt pretrained vision models to new tasks by tuning only a small number of parameters, reducing compute cost while maintaining accuracy.
- 3D Object Detection (Object Detection 3D): Detect the position, size, and orientation of target objects in 3D space.
- Bad Image Detection: Detect bad or low-quality images, such as those affected by noise, blur, or distortion.
- NeRF Reconstruction Accuracy Evaluation (NeRF Reconstruction Accuracy): Evaluate the accuracy and quality of 3D scene reconstructions produced by a Neural Radiance Field (NeRF) model.
- Siamese UIE: Use a Siamese network for universal information extraction (UIE), extracting entities, relations, and other structured information from text.
- Mathematical Formula Recognition (LatexOCR): Recognize mathematical formulas in images and output them as LaTeX.
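
Several entries in this list build on one another. As a rough illustration, sentence embedding, sentence similarity, and text ranking can be chained together in a few lines of Python. This is only a minimal sketch: it assumes a toy bag-of-words vector in place of a trained embedding model, and every function name (`tokenize`, `build_vocab`, `embed`, `similarity`, `rank`) is hypothetical rather than taken from any library.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(texts):
    """Assign each distinct word a fixed index; this fixes the vector dimension."""
    words = sorted({word for text in texts for word in tokenize(text)})
    return {word: index for index, word in enumerate(words)}


def embed(sentence, vocab):
    """Toy sentence embedding: L2-normalized word counts (NOT a trained model)."""
    vec = [0.0] * len(vocab)
    for word in tokenize(sentence):
        if word in vocab:
            vec[vocab[word]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def similarity(a, b):
    """Cosine similarity; a plain dot product because embed() normalizes."""
    return sum(x * y for x, y in zip(a, b))


def rank(query, candidates, vocab):
    """Text ranking: sort candidates by similarity to the query, best first."""
    q = embed(query, vocab)
    scored = [(similarity(q, embed(text, vocab)), text) for text in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    docs = [
        "a cat sat on the mat",
        "stock prices fell sharply",
        "the dog sat on the rug",
    ]
    question = "cat on the mat"
    vocabulary = build_vocab(docs + [question])
    for score, text in rank(question, docs, vocabulary):
        print(f"{score:.3f}  {text}")
```

A real system would replace `embed` with a trained sentence-embedding model; the similarity and ranking steps would stay essentially the same, which is why the three tasks are often served by one model family.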