[A large collection of 124 artificial intelligence tasks] - a collection of tasks covering natural language processing (NLP), computer vision (CV), speech recognition, and multimodal tasks

Hello everyone, I am Wei Xue AI. Today I will introduce a large collection of 124 artificial intelligence tasks. The collection mainly covers 4 categories: natural language processing (NLP), computer vision (CV), speech recognition, and multimodal tasks.

I have compiled 124 application-scenario tasks here. The task list is as follows:

  1. Sentence Embedding: Maps a sentence to a fixed-dimensional vector representation.
  2. Text Ranking: Rank a set of texts by their relevance to a given query.
  3. Word Segmentation: The process of dividing continuous text into words or chunks.
  4. Part-of-Speech Tagging: Tag each word in a sentence with its corresponding part of speech.
  5. Token Classification: Classify each token in an input text sequence into a predefined category.
  6. Named Entity Recognition: Identify named entities with specific meaning in text, such as names of people, places, and organizations.
  7. Relation Extraction: Extract the relationship or connection between entities from the text.
  8. Information Extraction: Extract structured information, such as entities, relationships, and attributes, from unstructured text.
  9. Sentence Similarity: Measures the semantic similarity or relatedness between two sentences.
  10. Text Translation: The process of converting text in one language into another.
  11. Natural Language Inference (NLI): Judge the logical relationship between a given premise and hypothesis: entailment, contradiction, or neutral.
  12. Sentiment Classification: Classify text into sentiment categories such as positive, negative, or neutral.
  13. Portrait Matting: Accurately separate the human subject from the background in an image.
  14. Universal Matting: Accurately separate the target object from the background in an image, not limited to portraits.
  15. Human Detection: Detects the location of a human body in an image or video.
  16. Image Object Detection: Detect and localize multiple target objects in an image.
  17. Image Denoising: Reduces the noise level in an image and improves image quality.
  18. Image Deblurring: Restores clarity and detail to blurred images.
  19. Video Stabilization: Correct camera shake in a video to make it stable and smooth.
  20. Video Super-Resolution: Increases the resolution of a video by increasing its pixel-level detail.
  21. Text Classification: Classify text into predefined categories or labels.
  22. Text Generation: The process of generating continuous text from a given input.
  23. Zero-Shot Classification: Classify data into categories that the model has never seen during the training phase.
  24. Task-Oriented Conversation: Conduct conversations and question-and-answers related to specific tasks.
  25. Dialog State Tracking: Track user intent and system state changes across multiple rounds of dialog.
  26. Table Question Answering: Answer relevant questions based on tabular data.
  27. Document-Grounded Dialog Generation: Generate relevant dialogue responses based on document content.
  28. Document-Grounded Dialog Rerank: Ranks the generated dialog replies to select the best one.
  29. Document-Grounded Dialog Retrieval: Retrieve the best dialog related to a document from candidate dialogs.
  30. Text Error Correction: Automatically corrects spelling or grammatical errors in text.
  31. Image Captioning: Generate descriptive text for images based on image content.
  32. Video Captioning: Generate descriptive text for a video based on the video content.
  33. Image Portrait Stylization: Apply an artistic style transfer to the human subject in the image.
  34. Optical Character Recognition (OCR Detection): Detect and localize text regions in images.
  35. Table Recognition: Automatically recognize table structure and content from images.
  36. Lineless Table Recognition: Automatically identify table structure and content from lineless table images.
  37. Document-VL Embedding: A vector representation that maps a document to a visual-semantic space.
  38. License Plate Detection: Detect and localize the license plate area of a vehicle in an image.
  39. Fill-Mask: Predict the masked tokens in a text based on the surrounding context.
  40. Feature Extraction: Extract meaningful feature representations from input data.
  41. Action Recognition: Recognize actions or behaviors in videos.
  42. Action Detection: Detect and locate specific actions or behaviors in videos.
  43. Live Category: Classify live videos, such as sports, news, games, etc.
  44. Video Category: Classify videos into categories such as movies, music, sports, etc.
  45. Multi-Modal Embedding: Map data of multiple different modalities into a shared vector space.
  46. Generative Multi-Modal Embedding: Maps multimodal data to vector representations and is able to generate data related to them.
  47. Multi-Modal Similarity: Measures the similarity or correlation between multimodal data such as images and text.
  48. Visual Question Answering: Answers relevant questions given images and questions.
  49. Video Question Answering: Answers relevant questions based on a given video and question.
  50. Video Embedding: Maps a video sequence to a fixed-dimensional vector representation.
  51. Text-to-Image Synthesis: Synthesize the corresponding image according to the given text description.
  52. Text-to-Video Synthesis: Synthesize the corresponding video according to the given text description.
  53. Body 2D Keypoints: Detect and track body keypoints in images.
  54. Body 3D Keypoints: Detect and track body keypoints in 3D space.
  55. Hand 2D Keypoints: Detect and track hand keypoints in images.
  56. Card Detection: Detect and locate specific types of cards in an image.
  57. Content Check: Checks for inappropriate, sensitive, or illegal content in text or images.
  58. Face Detection: Detects the location of a face in an image or video.
  59. Face Liveness: Determine whether the face in the image or video is a real live body, not a photo or video.
  60. Face Recognition: Identifying faces in images or videos and matching them with known identities.
  61. Facial Expression Recognition: Recognize the expression state of human faces in images or videos, such as happiness, sadness, anger, etc.
  62. Face Attribute Recognition: Identify the attributes of faces in images or videos, such as age, gender, race, etc.
  63. Face 2D Keypoints: Detect and track facial keypoints in images.
  64. Face Quality Assessment: Evaluates the quality of face images in images or videos.
  65. Video Multi-Modal Embedding: Map videos and data from other modalities (such as text) into a shared vector space.
  66. Image Color Enhancement: Enhance the color saturation, contrast and brightness of the image.
  67. Virtual Try-On: Through computer-generated technology, virtual clothing is applied to real human body images to achieve online try-on effects.
  68. Image Colorization: The process of restoring a grayscale image to a color image.
  69. Video Colorization: The process of restoring black and white video to color video.
  70. Image Segmentation: Divide an image into multiple distinct regions or objects.
  71. Image Driving Perception: Using computer vision technology to extract driving-related information in images, such as lane lines, traffic signs, etc.
  72. Image Depth Estimation: Estimate the depth or distance of objects in a scene based on monocular or binocular images.
  73. Indoor Layout Estimation: Estimate the layout structure of a room based on indoor images.
  74. Video Depth Estimation: Estimate the depth or distance of objects in the scene based on the inter-frame information in the video.
  75. Panorama Depth Estimation: Estimating the depth or distance of objects in a scene in a panoramic image.
  76. Image Style Transfer: Applying the style of one image to another image to generate an image with a new style.
  77. Face Image Generation: Generate realistic facial images, which can be used for face data augmentation, data generation, and other applications.
  78. Image Super-Resolution: Improves the resolution of an image by increasing its pixel-level details.
  79. Image Deblocking: Reduces blocking artifacts or streak noise in images caused by compression.
  80. Image Portrait Enhancement: Improve the appearance, skin color and other characteristics of the human subject in the image.
  81. Product Retrieval Embedding: Map items to vector representations to support item relevance retrieval.
  82. Image-to-Image Generation: Generates a corresponding output image from a given input image.
  83. Image Classification: Classify images into predefined categories or labels.
  84. Optical Character Recognition (OCR Recognition): Recognize printed or handwritten text in images and convert it to machine-readable text.
  85. Skin Retouching: beautify the face image, remove skin blemishes, smooth skin, etc.
  86. FAQ Question Answering: Answer users' questions by matching them against a set of frequently asked questions and their answers.
  87. Crowd Counting: Estimate the number of people based on crowd density in an image or video.
  88. Video Single Object Tracking: Tracks a single target object in a video sequence.
  89. Person Re-Identification (Image ReID - Person): Re-identify a person across images based on the appearance characteristics of the person in each image.
  90. Text-Driven Segmentation: Segment objects in images or videos based on a given textual description.
  91. Movie Scene Segmentation: Divide a movie or video into different scenes, each scene representing an independent plot or event.
  92. Shop Segmentation: Segment objects or areas in the store from images or videos for product display, intelligent monitoring and other applications.
  93. Image Inpainting: According to the existing image content, fill in the missing or damaged parts and restore the integrity of the original image.
  94. Image Paint-By-Example: Based on a given example image, modify other images to have a similar painting style or effect.
  95. Controllable Image Generation: By controlling input parameters or vectors, images with specific attributes, styles, or characteristics are generated.
  96. Video Inpainting: According to the existing video content, fill in the missing or damaged frames or areas, and restore the integrity of the original video.
  97. Video Human Matting: Separate the characters in the video from the background for subsequent editing or special effects processing.
  98. Human Reconstruction: Based on a given image, video or sensor data, reconstruct the 3D model or pose information of the human body.
  99. Video Frame Interpolation: Generate intermediate frames between two given video frames to increase the frame rate or smooth transitions in the video.
  100. Video Deinterlace: Convert interlaced video to progressive scan to improve the quality and smoothness of video playback.
  101. Human Wholebody Keypoint Detection: Detect and locate key points of the human body in images or videos, such as head, hands, feet, etc.
  102. Hand Static: Recognize static gestures in images or videos by analyzing information such as palm shape and finger posture.
  103. Face-Human-Hand Detection: Detect and locate human face, human and hand regions in images or videos.
  104. Face Emotion Analysis (Face Emotion): Infer the emotional state expressed by a face in an image or video by analyzing its facial expression.
  105. Product Segmentation: Segment commodities or products in images or videos from the background for applications such as product recognition and advertisement recommendation.
  106. Referring Video Object Segmentation: Segment the object in a video that a given natural-language expression refers to.
  107. Video Summarization: Generate a summary or overview of a video based on its content and characteristics, making the video easier to browse and retrieve.
  108. Image Sky Change: Replace the sky part in the image with different sky backgrounds to change the atmosphere and environment of the image.
  109. Translation Evaluation: Based on a given translation result, evaluate its quality, accuracy, and consistency with the original text.
  110. Video Object Segmentation: Segment objects in the video from the background for subsequent editing or special effects processing.
  111. Video Multi-Object Tracking: Simultaneously track multiple moving targets in a video, locating each target's position over time.
  112. Multi-View Depth Estimation: Estimate the three-dimensional depth information of objects in the scene through multiple views or images.
  113. Few-Shot Detection: Perform object detection when only a small number of labeled samples are available, which requires the model to generalize well.
  114. Body Reshaping: According to the human body area in the image or video, adjust the shape, posture or proportion of the human body to change the appearance of the human body.
  115. Face Fusion: Fusing one person's facial features or expressions onto another person's avatar to generate a composite image with characteristics of both.
  116. Image Matching: In an image library or database, find the most similar or matching image to a given image.
  117. Image Quality Assessment - Subjective Scoring (Image Quality Assessment - MOS): Predict the subjective quality score (mean opinion score, MOS) of an image, reflecting how the human eye perceives it.
  118. Image Quality Assessment - Degradation: Evaluate the quality of an image under different transformation or compression conditions using objective measures.
  119. Vision Efficient Tuning: Adapt pretrained vision models to new tasks by tuning only a small number of parameters, reducing training cost while maintaining accuracy.
  120. 3D Object Detection (Object Detection 3D): Detect the target object in 3D space and estimate its position, size, and orientation.
  121. Bad Image Detection: Identify bad or low-quality images, such as those affected by noise, blur, or distortion.
  122. NeRF Reconstruction Accuracy Evaluation (NeRF Reconstruction Accuracy): Evaluate the accuracy and quality of 3D scene reconstructions produced by a Neural Radiance Field (NeRF) model.
  123. Siamese UIE: Use a Siamese network for universal information extraction (UIE), extracting items such as entities, relations, and events from text.
  124. Mathematical Formula Recognition (LatexOCR): Recognize mathematical formulas in images and convert them to LaTeX.
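
To make a few of the definitions above concrete, here is a minimal Python sketch that ties together Sentence Embedding (task 1), Text Ranking (task 2), and Sentence Similarity (task 9). It is only a toy under stated assumptions: the "embedding" is a plain word-count vector over a shared vocabulary, whereas real systems use learned neural encoders, and the function names `tokenize`, `embed`, `cosine_similarity`, and `rank_texts` are hypothetical names chosen for this illustration, not part of any library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(sentence, vocabulary):
    """Toy sentence embedding (an assumption of this sketch):
    one word count per vocabulary entry, so the vector length is fixed."""
    counts = Counter(tokenize(sentence))
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors.
    Returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_texts(query, texts):
    """Return (score, text) pairs sorted from most to least similar to the query."""
    # Build one shared vocabulary so every vector has the same dimensions.
    vocabulary = sorted({word for text in [query, *texts] for word in tokenize(text)})
    query_vector = embed(query, vocabulary)
    scored = [
        (cosine_similarity(query_vector, embed(text, vocabulary)), text)
        for text in texts
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    candidates = [
        "reduce noise in an image",
        "translate text into another language",
        "image denoising improves image quality",
    ]
    for score, text in rank_texts("image denoising", candidates):
        print(f"{score:.3f}  {text}")
```

The same three-step shape (embed, compare, sort) carries over when the word-count vectors are swapped for vectors from a trained embedding model. Only `embed` would need to change.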

Origin blog.csdn.net/weixin_42878111/article/details/132262605