New work from Danqi Chen's team: a single A100 GPU can train a 30-billion-parameter model!

Xi Xiaoyao Tech Says | Original
Authors | IQ dropped all over the place, ZenMoore

In recent years, with the emergence of large models, fine-tuned language models have demonstrated superior performance on various downstream tasks. However, these models often have billions or even tens of billions of parameters. Training a model of this size requires a large amount of memory, and optimization with traditional backpropagation is slow.

The authors of this paper propose MeZO, a memory-efficient zeroth-order optimizer. By adapting the classic ZO-SGD method to operate in place, MeZO fine-tunes language models with the same memory footprint as inference. On a single A100 80GB GPU, for example, MeZO can train a model with 30 billion parameters, whereas traditional backpropagation can only fine-tune a 2.7-billion-parameter model under the same memory budget.

As shown in Figure 1, the authors ran experiments on the OPT-13B model and compared the results. Despite using only 1/12 of the memory, MeZO outperforms zero-shot and in-context learning (ICL) on 7 tasks.

Figure 1: Zero-shot, in-context learning (ICL), MeZO, and Adam fine-tuning

This means that researchers and developers using MeZO, a memory-efficient zeroth-order optimizer, may be able to overcome the memory constraints and compute bottlenecks of traditional methods. It allows freer exploration, training, and optimization of language models at very large parameter scales, bringing more accurate and efficient solutions to natural language processing.

Paper Title:
Fine-Tuning Language Models with Just Forward Passes

Paper Link:
https://arxiv.org/abs/2305.17333

Code:
https://github.com/princeton-nlp/MeZO

Paper Quick Facts

Background setting: consider a labeled dataset $\mathcal{D}$ and a mini-batch $\mathcal{B} \subset \mathcal{D}$, and let $\mathcal{L}(\theta; \mathcal{B})$ denote the loss on the mini-batch. Under this setting, the classical zeroth-order (ZO) gradient estimator is introduced.
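
For reference, the classical SPSA estimator that ZO-SGD builds on (written here in the notation above, following the standard two-point SPSA form) perturbs $\theta$ with a Gaussian vector $z$ and takes a finite difference of two forward passes:

$$
\hat{\nabla} \mathcal{L}(\theta; \mathcal{B}) \;=\; \frac{\mathcal{L}(\theta + \epsilon z; \mathcal{B}) - \mathcal{L}(\theta - \epsilon z; \mathcal{B})}{2\epsilon}\, z, \qquad z \sim \mathcal{N}(0, I_d),
$$

and ZO-SGD then updates $\theta \leftarrow \theta - \eta\, \hat{\nabla}\mathcal{L}(\theta; \mathcal{B})$ with learning rate $\eta$.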

Memory-Efficient ZO-SGD (MeZO)

The classic ZO-SGD algorithm needs to store the perturbation vector $z \in \mathbb{R}^d$, so its memory overhead is twice that of inference. Memory-efficient ZO-SGD (MeZO) is proposed to solve this problem, as shown in Algorithm 1.

Algorithm 1: MeZO

At each step, a seed s is first drawn at random, and for each of the four usages of z in Algorithm 1, the random number generator is reset with s and the relevant entries of z are resampled. With this in-place implementation, the memory footprint of MeZO is comparable to the memory cost of the inference stage.

Note that Algorithm 1 describes perturbing each parameter separately, which may be time-consuming for large models. In practice, time can be saved by perturbing an entire weight matrix at once rather than each scalar independently. This incurs an additional memory overhead comparable in size to the largest weight matrix, which is typically the word embedding matrix.
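
As an illustration, here is a minimal PyTorch sketch of the in-place trick (not the authors' official implementation, which lives in the linked repo; `loss_fn`, the hyperparameters, and the seed handling are simplifying assumptions):

```python
import torch

def zo_step(model, loss_fn, batch, eps=1e-3, lr=1e-6):
    """One MeZO-style step: perturb the parameters in place with a seeded
    Gaussian z, evaluate the loss with two forward passes, restore the
    parameters, then apply the update -- z itself is never stored."""
    seed = torch.randint(0, 2**31 - 1, (1,)).item()

    def perturb(scale):
        # Resetting the RNG with the same seed regenerates the same z.
        torch.manual_seed(seed)
        for p in model.parameters():
            z = torch.randn_like(p)
            p.data.add_(scale * eps * z)

    perturb(+1)                                   # theta + eps * z
    with torch.no_grad():
        loss_plus = loss_fn(model, batch)
    perturb(-2)                                   # theta - eps * z
    with torch.no_grad():
        loss_minus = loss_fn(model, batch)
    perturb(+1)                                   # restore theta

    grad_scale = (loss_plus - loss_minus) / (2 * eps)  # projected gradient
    torch.manual_seed(seed)                       # fourth use of z: the update
    for p in model.parameters():
        z = torch.randn_like(p)
        p.data.add_(-lr * grad_scale * z)
    return loss_plus
```

Each `torch.randn_like(p)` draws a whole weight tensor at once, which is exactly the matrix-level perturbation mentioned above; only one such tensor is alive at a time, so the extra memory is bounded by the largest weight matrix.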

MeZO Extensions

MeZO can be combined with other gradient-based optimizers, such as SGD with momentum or the Adam optimizer.

While a naive implementation would require additional memory to store the gradient moment estimates, MeZO-momentum and MeZO-Adam reduce this overhead by recomputing the moving averages of gradients from the saved forward-pass losses and the regenerated z vectors.

Furthermore, all coordinates in the SPSA gradient estimate share the same scale, but different layers of a deep Transformer may have gradients of different scales. The authors therefore borrow the idea of layer-wise adaptive optimizers and design several variants of MeZO.

Experiments

MeZO is compared against the following baselines:

  • ICL: in-context learning
  • LP: linear probing
  • FT: full fine-tuning with Adam

Memory Usage

MeZO performs on par with FT on many tasks and outperforms memory-equivalent methods, while substantially reducing memory cost. Figures 2 and 3 compare the memory consumption of ICL, FT, LP, and MeZO.

Figure 2 GPU memory consumption of different OPT models and tuning methods on MultiRC

Figure 3. The largest OPT model that can be tuned with specific hardware and algorithms

Medium-Scale Masked Language Models

Figure 4 Experiments on RoBERTa-large

The experimental results show:

  • MeZO performs significantly better than zero-shot, LP, and other memory-equivalent methods.
  • With enough data, MeZO achieves performance comparable to FT (up to a 5% gap).
  • MeZO works well for both full-parameter tuning and PEFT (parameter-efficient fine-tuning).

Large Autoregressive Language Models

MeZO exhibits strong performance in classification, multiple choice, and generation tasks.

Table 1 Experiments on OPT-13B

MeZO scales to models with 66 billion parameters.

Table 2 Experiments performed on OPT-30B and OPT-66B

Training with non-differentiable objectives

MeZO can optimize non-differentiable objectives such as accuracy and F1 score.

Table 3 Using MeZO under non-differentiable objectives
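
To make this concrete, a hypothetical non-differentiable objective can be plugged into the `zo_step` sketch above unchanged, since MeZO only ever consumes the scalar loss value (this is an illustrative assumption, not code from the paper):

```python
import torch

def accuracy_loss(model, batch):
    """Return negative accuracy as a plain scalar. It has no useful
    gradient, but MeZO never needs one -- only the loss value."""
    inputs, labels = batch
    with torch.no_grad():
        preds = model(inputs).argmax(dim=-1)
    return -(preds == labels).float().mean().item()

# One MeZO step optimizing accuracy directly, via the earlier zo_step sketch:
# loss = zo_step(model, accuracy_loss, batch)
```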

Summary

The importance of this method is self-evident. It provides an effective path for researchers and developers to train larger-scale language models with limited resources.

  • First, the memory efficiency of MeZO makes it possible to use similar hardware configurations for training as for inference, which matters greatly for practical applications and deployment. Traditional backpropagation requires more memory and compute, limiting its use for large-scale models. With MeZO, researchers and developers can train at relatively low cost while obtaining higher model capacity and expressive power.
  • Second, MeZO's in-place operation avoids unnecessary memory overhead and data transfer, further improving training efficiency. Traditional backpropagation must store and move a large number of intermediate results, whereas MeZO updates model parameters without additional memory overhead, reducing memory usage and data-transfer requirements. This is especially important for training large-scale language models, making the process faster and more efficient.
  • Therefore, MeZO is significant for advancing large-scale language models. It gives researchers and developers an innovative optimization method that lets them design and train models with large numbers of parameters more flexibly. By improving training efficiency and reducing resource costs, MeZO opens up new possibilities for further research and application of language models and is expected to bring further breakthroughs in natural language processing. In addition, this memory-efficient optimizer helps lower the barrier to training large-scale models, enabling more people to participate in the research and application of language models and thereby promoting innovation and broader adoption.


Source: blog.csdn.net/xixiaoyaoww/article/details/131118363