Summary of papers on few-shot / one-shot / zero-shot object counting

2021

Learning To Count Everything

code: https://paperswithcode.com/paper/learning-to-count-everything
Abstract: Existing visual counting work mainly focuses on a specific category, such as people, animals, or cells. In this paper, we are interested in counting everything, i.e. counting objects from any category given only a few annotated instances of that category. To this end, we formulate counting as a few-shot regression task. We propose a novel method that takes a query image together with a few exemplar objects from that image and predicts a density map over all objects of interest in the query image. We also propose a novel adaptation strategy to adapt the network to any new visual category at test time, using only a few exemplar objects from the new category. In addition, we introduce a dataset of 147 object categories with over 6,000 images suitable for the few-shot counting task. The images carry two types of annotation, points and bounding boxes, which can be used to develop few-shot counting models. Experiments on this dataset show that our method outperforms several state-of-the-art object detectors and few-shot counting approaches.
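
To make the recipe concrete, here is a minimal sketch of the core operation: correlating pooled exemplar features against the query feature map, from which a density map is regressed and summed into a count. This is an illustration only, not the authors' FamNet code; shapes and names are assumptions.

```python
import torch
import torch.nn.functional as F

def correlation_map(query_feat, exemplar_feat):
    """Correlate pooled exemplar features with the query feature map.

    query_feat:    (C, H, W) backbone features of the query image
    exemplar_feat: (C, h, w) backbone features cropped around one exemplar box
                   (h, w assumed odd so the output keeps the input size)
    Returns a (1, H, W) similarity map.
    """
    kernel = exemplar_feat.unsqueeze(0)   # (1, C, h, w): one conv filter
    x = query_feat.unsqueeze(0)           # (1, C, H, W)
    sim = F.conv2d(x, kernel, padding=(kernel.shape[-2] // 2, kernel.shape[-1] // 2))
    return sim.squeeze(0)                 # (1, H, W)

# A density head would map stacked similarity maps to a density map,
# and the predicted count is simply the sum over that map:
# count = density_map.sum()
```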

Object Counting: You Only Need to Look at One

Abstract: This paper aims to address the challenging task of one-shot object counting. Given an image containing objects of a novel, previously unseen class, the goal is to count all instances of the desired class using only one supporting bounding-box example. To this end, we propose a counting model with which you only need to look at one instance (LaoNet). First, a feature correlation module combines self-attention and correlation-attention modules to learn both intra-image and inter-image relationships, making the network robust to inconsistencies in rotation and size between different instances. Second, a scale aggregation mechanism is designed to help extract features with different scale information. Compared with existing few-shot counting methods, LaoNet achieves state-of-the-art results while converging faster.
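
A rough sketch of the feature correlation idea, using standard PyTorch attention layers as stand-ins; the dimensions and layer layout are assumptions, not LaoNet's actual code.

```python
import torch
import torch.nn as nn

class FeatureCorrelationBlock(nn.Module):
    """Self-attention models relations within the query features;
    correlation-attention relates them to the single exemplar's features."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, query_tokens, exemplar_tokens):
        # query_tokens:    (B, N, dim) flattened query-image features
        # exemplar_tokens: (B, M, dim) features from the one support box
        x, _ = self.self_attn(query_tokens, query_tokens, query_tokens)
        x = self.norm1(query_tokens + x)
        y, _ = self.cross_attn(x, exemplar_tokens, exemplar_tokens)
        return self.norm2(x + y)
```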

2022

Learning to Count Anything: Reference-less Class-agnostic Counting with Weak Supervision

code: https://paperswithcode.com/paper/learning-to-count-anything-reference-less

Abstract: Current class-agnostic counting methods can generalize to unseen classes, but usually require reference images to define the type of object to be counted, as well as instance annotations during training. Reference-less class-agnostic counting is an emerging field that treats counting as, at its core, a repetition-recognition task. Such methods are useful for counting over an ever-changing set of object compositions. We show that a general feature space with global context can enumerate instances in an image without a prior on the object type present. Specifically, we demonstrate that regression from vision-transformer features, without point-level supervision or reference images, outperforms other reference-less methods and is competitive with methods that use reference images. We demonstrate this on the current standard few-shot counting dataset, FSC-147. We also present an improved dataset, FSC-133, which removes errors, ambiguities, and duplicate images from FSC-147, and demonstrate similar performance on it. To the best of our knowledge, ours is the first weakly supervised class-agnostic counting method.
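
Since the method is weakly supervised, training needs only image-level counts. A minimal sketch of that setup, assuming a pretrained ViT from timm as a stand-in backbone (the regression head below is illustrative, not the paper's):

```python
import torch
import torch.nn as nn
import timm  # assumption: a timm ViT stands in for the paper's backbone

class WeakCountRegressor(nn.Module):
    """Reference-less, weakly supervised counting: regress a scalar count
    from vision-transformer features using only image-level count labels
    (no point annotations, no reference images)."""
    def __init__(self):
        super().__init__()
        self.backbone = timm.create_model(
            "vit_base_patch16_224", pretrained=True, num_classes=0)
        self.head = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, images):          # images: (B, 3, 224, 224)
        feats = self.backbone(images)   # (B, 768) pooled token features
        return self.head(feats).squeeze(-1)

# Training uses only the ground-truth counts, e.g.:
# loss = torch.nn.functional.l1_loss(model(images), gt_counts)
```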

Scale-Prior Deformable Convolution for Exemplar-Guided Class-Agnostic Counting

Abstract: Class-agnostic counting has recently emerged as a more practical counting task, which aims to predict the number and distribution of objects specified by exemplars, rather than counting specific classes such as pedestrians or cars. However, recent approaches design similarity-matching rules between exemplars and query images while ignoring the robustness of the extracted features. To address this issue, we propose a scale-prior deformable convolution that integrates exemplar information, e.g. scale, into the backbone of the counting network. The results show that the proposed counting network can extract semantic features of objects similar to a given exemplar and effectively filter out irrelevant background. Furthermore, we find that traditional L2 and generalized losses are not suitable for class-agnostic counting because of the variation in object scale across exemplars. We therefore propose a scale-sensitive generalized loss to address this problem: it adapts the cost function to a given exemplar, making the difference between prediction and ground truth more prominent. Extensive experiments show that our model achieves significant improvements and state-of-the-art performance on a public class-agnostic counting benchmark.
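
One way to picture the scale prior, sketched with torchvision's DeformConv2d. How the exemplar scale modulates the sampling offsets here (a simple multiplication) is our assumption, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class ScalePriorDeformConv(nn.Module):
    """Let the exemplar's scale bias the sampling offsets of a
    deformable convolution over the backbone features."""
    def __init__(self, in_ch=256, out_ch=256, k=3):
        super().__init__()
        # predicts 2 offsets (dx, dy) per kernel position
        self.offset_pred = nn.Conv2d(in_ch, 2 * k * k, kernel_size=3, padding=1)
        self.deform = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)

    def forward(self, feat, exemplar_scale):
        # feat: (B, C, H, W); exemplar_scale: (B,) relative exemplar size
        offset = self.offset_pred(feat)                     # (B, 2*k*k, H, W)
        offset = offset * exemplar_scale.view(-1, 1, 1, 1)  # scale-prior bias
        return self.deform(feat, offset)
```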

CounTR: Transformer-based Generalised Visual Counting

code: https://paperswithcode.com/paper/countr-transformer-based-generalised-visual

Abstract: In this paper, we consider the generalized visual object counting problem: developing a computational model that counts objects of arbitrary semantic categories using an arbitrary number of "exemplars", i.e. zero-shot or few-shot counting. To this end, we make the following four contributions: (1) we introduce a novel transformer-based architecture for generalized visual object counting, named Counting Transformer (CounTR), which explicitly captures the similarity between image patches, or with the given "exemplars", via an attention mechanism; (2) we adopt a two-stage training regime, first pre-training the model with self-supervised learning, followed by supervised fine-tuning; (3) we propose a simple, scalable pipeline for synthesizing training images with a large number of instances or from different semantic categories, explicitly forcing the model to make use of the given "exemplars"; (4) we conduct a thorough ablation study on the large-scale counting benchmark FSC-147 and demonstrate state-of-the-art performance in both the zero-shot and few-shot settings.
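
The two-stage training regime can be sketched as follows; the objects and loss choices are placeholders, not the authors' implementation.

```python
import torch

# Stage 1 pre-trains the image encoder with a self-supervised
# reconstruction objective; stage 2 fine-tunes the full counting model
# with density-map supervision.

def pretrain_step(encoder_decoder, images, optimizer):
    # hypothetical module: any masked-image reconstruction loss works here
    loss = encoder_decoder.reconstruction_loss(images)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

def finetune_step(counter, images, exemplars, gt_density, optimizer):
    pred = counter(images, exemplars)   # (B, 1, H, W) predicted density map
    loss = torch.nn.functional.mse_loss(pred, gt_density)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```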

Few-shot Object Counting with Similarity-Aware Feature Enhancement

code: https://github.com/zhiyuanyou/SAFECount

Abstract: This work studies the problem of few-shot object counting, which counts the number of exemplar objects (i.e., those described by one or several support images) occurring in a query image. The main challenge is that the target objects can be densely packed in the query image, making it hard to recognize every single one. To tackle this obstacle, we propose a novel learning block, equipped with a similarity comparison module and a feature enhancement module. Concretely, given a support image and a query image, we first derive a score map by comparing their projected features at every spatial position. The score maps over all support images are collected together and normalized across both the exemplar dimension and the spatial dimensions, producing a reliable similarity map. We then utilize this point-wise similarity as a weighting coefficient to enhance the query features with the support features. Such a design encourages the model to examine the query image by focusing more on the regions akin to the support images, resulting in sharper boundaries between different objects. Extensive experiments on various benchmarks and training setups show that we surpass state-of-the-art methods by a sufficiently large margin. For example, on the recent large-scale FSC-147 dataset, we surpass the state of the art by improving the mean absolute error from 22.08 to 14.32 (35% ↑).
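
The two steps, similarity comparison followed by feature enhancement, can be sketched as follows. This is a simplified stand-in: the learned projections are omitted and the paper's exemplar-and-spatial normalization is reduced to a single softmax.

```python
import torch

def enhance_query_features(query_feat, support_feats):
    """query_feat:    (C, H, W) projected query features
    support_feats: (K, C) one pooled feature vector per support exemplar
    Returns query features enhanced with similarity-weighted support features.
    """
    C, H, W = query_feat.shape
    q = query_feat.flatten(1).t()            # (H*W, C)
    scores = q @ support_feats.t()           # (H*W, K) score maps
    sim = scores.softmax(dim=1)              # simplified normalization
    fused = sim @ support_feats              # (H*W, C) weighted support features
    enhanced = q + fused                     # feature enhancement
    return enhanced.t().view(C, H, W)
```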

2023

Can SAM Count Anything? An Empirical Study on SAM Counting

code: https://github.com/Vision-Intelligence-and-Robots-Group/count-anything

Abstract: Meta AI recently released the Segment Anything Model (SAM), which has gained attention for its impressive performance on class-agnostic segmentation. In this study, we explore the use of SAM for the challenging task of few-shot object counting: counting objects of an unseen category given only a few bounding boxes. We compare SAM's performance with other few-shot counting methods and find that, without further fine-tuning, it is currently unsatisfactory, especially for small and crowded objects.
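
A hedged sketch of how counting with SAM could look, using Meta's official segment-anything package (pip install segment-anything). The checkpoint path is a placeholder, and the size-based mask filter is our simplification: the paper instead compares mask features against exemplar features.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")  # placeholder path
predictor = SamPredictor(sam)

def count_with_sam(image, exemplar_boxes, candidate_points):
    """image: HxWx3 uint8 RGB array; exemplar_boxes: [(x1, y1, x2, y2), ...]."""
    predictor.set_image(image)
    ref_area = np.mean([(x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in exemplar_boxes])
    count = 0
    for point in candidate_points:
        masks, scores, _ = predictor.predict(
            point_coords=np.array([point]), point_labels=np.array([1]))
        area = masks[scores.argmax()].sum()            # area of the best mask
        if 0.25 * ref_area <= area <= 4.0 * ref_area:  # crude exemplar-size filter
            count += 1
    return count
```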

Zero-Shot Object Counting

code: https://github.com/cvlab-stonybrook/zero-shot-counting

Abstract: Class-agnostic object counting aims to count object instances of an arbitrary class at test time. Current methods for this challenging problem require human-annotated exemplars as input, which are often unavailable for novel categories, especially for autonomous systems. We therefore propose zero-shot object counting (ZSC), a new setting where only the class name is available at test time. Such a counting system does not require human annotators in the loop and can operate automatically. Starting from a class name, we propose a method that accurately identifies the optimal patches, which can then be used as counting exemplars. Specifically, we first construct a class prototype to select the patches that are likely to contain the objects of interest, namely class-relevant patches. Furthermore, we introduce a model that quantitatively measures how suitable an arbitrary patch is as a counting exemplar. By applying this model to all the candidate patches, we can select the most suitable ones as exemplars for counting. Experimental results on the recent class-agnostic counting dataset FSC-147 validate the effectiveness of our method.
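
The patch-selection stage can be sketched with OpenAI's CLIP as a stand-in for the class prototype. Note the paper builds its prototype differently and additionally learns an error predictor to rank patches; both are simplified away here.

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def select_exemplar_patches(pil_image, patch_boxes, class_name, top_k=3):
    """Score candidate patches against a text prototype for class_name
    and keep the best ones as counting exemplars."""
    with torch.no_grad():
        text = clip.tokenize([f"a photo of a {class_name}"]).to(device)
        proto = model.encode_text(text)
        proto = proto / proto.norm(dim=-1, keepdim=True)
        sims = []
        for (x1, y1, x2, y2) in patch_boxes:
            crop = preprocess(pil_image.crop((x1, y1, x2, y2))).unsqueeze(0)
            feat = model.encode_image(crop.to(device))
            feat = feat / feat.norm(dim=-1, keepdim=True)
            sims.append((feat @ proto.t()).item())   # cosine similarity
        order = sorted(range(len(sims)), key=lambda i: -sims[i])
    return [patch_boxes[i] for i in order[:top_k]]
```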

CLIP-Count: Towards Text-Guided Zero-Shot Object Counting

code: https://paperswithcode.com/paper/clip-count-towards-text-guided-zero-shot

Abstract: Recent advances in visual-language models have shown remarkable zero-shot text-image matching ability that is transferable to downstream tasks such as object detection and segmentation. Adapting these models for object counting, however, which involves estimating the number of objects in an image, remains a formidable challenge. In this study, we explore transferring visual-language models for class-agnostic object counting for the first time. Specifically, we propose CLIP-Count, a novel pipeline that estimates density maps for open-vocabulary objects in a zero-shot, text-guided manner, without any fine-tuning on specific object classes. To align the text embedding with dense image features, we introduce a patch-text contrastive loss that guides the model to learn informative patch-level image representations for dense prediction. Moreover, we design a hierarchical patch-text interaction module to propagate semantic information across image features of different resolutions. Benefiting from the fully exploited image-text alignment knowledge of pre-trained visual-language models, our method effectively generates high-quality density maps for objects of interest. Extensive experiments on the FSC-147, CARPK, and ShanghaiTech crowd counting datasets demonstrate that the proposed method achieves state-of-the-art accuracy and generalizability for zero-shot object counting.

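A minimal sketch of what a patch-text contrastive loss could look like; the exact formulation below is our assumption, not the paper's.

```python
import torch
import torch.nn.functional as F

def patch_text_contrastive_loss(patch_feats, text_feat, fg_mask, tau=0.07):
    """Pull patch embeddings inside object regions toward the class text
    embedding and push background patches away.

    patch_feats: (N, D) L2-normalized patch embeddings
    text_feat:   (D,)   L2-normalized text embedding
    fg_mask:     (N,)   1 for patches overlapping annotated objects, else 0
    """
    logits = patch_feats @ text_feat / tau   # (N,) patch-text similarity
    return F.binary_cross_entropy_with_logits(logits, fg_mask.float())
```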

Source: blog.csdn.net/weixin_42990464/article/details/131210448