Segment and Track Anything is an open-source project focused on automatic and interactive methods for segmenting and tracking arbitrary objects in video. The main algorithms are SAM (Segment Anything Model) for automatic/interactive keyframe segmentation and DeAOT (Decoupling features in Associating Objects with Transformers, NeurIPS 2022) for efficient multi-object tracking and propagation. The SAM-Track pipeline lets SAM dynamically and automatically detect and segment new objects, while DeAOT tracks all identified objects.
Source code location: https://github.com/z-x-yang/Segment-and-Track-Anything
1. Download the code
git clone https://github.com/z-x-yang/Segment-and-Track-Anything.git
cd Segment-and-Track-Anything
2. Installation
Requirements: python>=3.9, pytorch>=1.10, torchvision>=0.11
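The version requirements above can be checked before installing anything else. The following is a minimal sketch, not part of the project: the helper names parse_version and check_environment are hypothetical, and it assumes the packages are installed under the distribution names torch and torchvision.

```python
import re
import sys
from importlib import metadata

# Minimum versions copied from the requirements line above.
MINIMUMS = {"torch": (1, 10), "torchvision": (0, 11)}


def parse_version(text):
    """Return (major, minor) from a version string such as '1.13.1+cu117'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def check_environment():
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < (3, 9):
        problems.append("python >= 3.9 is required")
    for package, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if parse_version(installed) < minimum:
            wanted = ".".join(str(part) for part in minimum)
            problems.append(f"{package} {installed} is older than {wanted}")
    return problems


for problem in check_environment():
    print(problem)
```

If nothing is printed, the interpreter and both packages satisfy the stated minimums.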
1. Install dependencies
bash script/install.sh
2. Download the model
Create a new ckpt folder and put the model files into it
SAM (ViT-B) checkpoint: https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
DeAOT checkpoint: https://drive.google.com/file/d/1QoChMkTVxdYZ_eBlZhK2acq9KMQZccPJ/view
Or download the default weights with the provided script
bash script/download_ckpt.sh
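For a manual download, the checkpoint step can also be scripted. This is a rough sketch using only the standard library, not the project's own tooling: checkpoint_path and fetch_checkpoint are hypothetical helpers, and the assumption is that the file keeps the name it has in the URL. It covers only the direct SAM link; the Google Drive link is a viewer page and would need to be downloaded through the browser or a dedicated tool.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

# Direct download link for the SAM checkpoint listed above.
SAM_URL = "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth"


def checkpoint_path(url, folder="ckpt"):
    """Local path for a checkpoint: <folder>/<file name taken from the URL>."""
    return Path(folder) / Path(urlparse(url).path).name


def fetch_checkpoint(url, folder="ckpt"):
    """Download the file into the folder unless it is already present."""
    target = checkpoint_path(url, folder)
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        urlretrieve(url, str(target))
    return target
```

Calling fetch_checkpoint(SAM_URL) from the repository root would create ./ckpt if needed and skip the download when the file already exists.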
3. Run Demo
Put the video to be processed into the ./assets folder
Run demo.ipynb with Jupyter Notebook
The generated files are also saved to the ./assets folder
The parameters of SAM-Track, DeAOT and SAM can be manually modified in model_args.py
Or launch the web UI
python app.py
Open the link printed in the terminal in a browser
Click input-video to upload video
Adjust SAM-Track parameters
Click Seg and Track to get the result
Parameters:
- aot_model : Selects the DeAOT/AOT version used for tracking and propagation
- sam_gap : The frame interval at which SAM adds newly appearing objects. Increasing it reduces how often new objects are discovered, but significantly speeds up inference
- points_per_side : The number of points per side of the grid sampled over the image to generate masks. Increasing it improves detection of small objects, but larger objects may be split into finer segments
- max_obj_num : The maximum number of objects that SAM-Track can detect and track. More objects require more memory; about 16GB of memory can handle up to 255 objects
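To get a feel for how sam_gap and points_per_side trade coverage against cost, the sketch below counts the frames on which SAM would run and the grid points it would sample. It is an illustration only, not the project's code: it assumes SAM runs on frame 0 and on every sam_gap-th frame after it, and that the prompt grid is uniform with points centered in their cells. The helper names and the values in example_args are hypothetical.

```python
def sam_frames(num_frames, sam_gap):
    """Frame indices at which SAM would look for new objects (assumed schedule)."""
    if sam_gap < 1:
        raise ValueError("sam_gap must be at least 1")
    return list(range(0, num_frames, sam_gap))


def grid_points(points_per_side):
    """Normalized (x, y) points of a uniform grid, one per cell center."""
    step = 1.0 / points_per_side
    coords = [(i + 0.5) * step for i in range(points_per_side)]
    return [(x, y) for y in coords for x in coords]


# Hypothetical settings, chosen only for this example.
example_args = {"sam_gap": 5, "points_per_side": 16, "max_obj_num": 255}

print(sam_frames(20, example_args["sam_gap"]))            # [0, 5, 10, 15]
print(len(grid_points(example_args["points_per_side"])))  # 256
```

Doubling sam_gap halves the number of SAM passes, while doubling points_per_side quadruples the number of prompt points per pass, which is why the first mainly affects speed and the second mainly affects granularity.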