MosaicML launches a 30-billion-parameter model with a training cost of $700,000

AI startup MosaicML recently released its language model MPT-30B. On parameter count alone, its 30 billion parameters are unremarkable next to models with hundreds of billions of parameters. But the new model costs only a fraction as much to train as those larger models, which is expected to open it up to a much wider range of applications.

Naveen Rao, CEO and co-founder of MosaicML, said MPT-30B cost $700,000 to train, far less than the tens of millions of dollars needed to train GPT-3. He added that MPT-30B's quality exceeds that of the original GPT-3 released by OpenAI in 2020. Because of its lower cost and smaller size, MPT-30B can also be trained more quickly and deployed on local hardware.

MosaicML optimized the model with ALiBi and FlashAttention, which allow longer context lengths and higher GPU utilization. MosaicML is also one of the few labs with access to Nvidia H100 GPUs, which raised per-GPU throughput by more than 2.4x over its previous setup and shortened training time.
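As a rough illustration of the ALiBi idea (a sketch only, not MosaicML's implementation), the snippet below adds linearly decaying position biases to attention scores, which is what lets a model trained on shorter sequences extrapolate to longer context lengths. The slope schedule assumes the geometric scheme from the ALiBi paper and a power-of-two head count.

```python
# Hedged sketch: ALiBi-style additive attention bias (illustration only).
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Return a (num_heads, seq_len, seq_len) bias added to causal attention logits."""
    # Geometric slope per head, as in the ALiBi paper (assumes num_heads is a power of 2).
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    pos = torch.arange(seq_len)
    # Signed distance from each query position i to each key position j: (j - i).
    # Clamp at 0 so future positions (handled by the causal mask) get no extra bias.
    dist = (pos[None, :] - pos[:, None]).clamp(max=0)          # (seq_len, seq_len), values <= 0
    return slopes[:, None, None] * dist[None, :, :]            # farther keys get a larger penalty

# Usage, given raw attention logits of shape (batch, heads, seq, seq):
# scores = scores + alibi_bias(num_heads, seq_len)  # then apply the causal mask and softmax
```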

Thirty billion parameters is a figure that comes up often in the large-model field. Why is it so special? Jonathan Frankle, chief scientist of MosaicML, explained that, first, 30 billion parameters is small enough for the model to run easily on local hardware while matching, or slightly exceeding, GPT-3's quality.

Second, any model much beyond the 30-billion-parameter threshold has to be split into multiple parallel segments, which usually also requires a more expensive multi-GPU setup.
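A back-of-the-envelope calculation makes the hardware argument concrete. The GPU memory figures below are illustrative assumptions, not numbers from MosaicML: they simply show that 30 billion parameters fit on a single high-memory accelerator at reduced precision, while much larger models must be sharded.

```python
# Rough memory arithmetic for the weights of a 30B-parameter model (illustrative only).
PARAMS = 30e9

for name, bytes_per_param in [("fp16/bf16", 2), ("int8", 1), ("int4", 0.5)]:
    weights_gb = PARAMS * bytes_per_param / 1e9
    print(f"{name:>9}: ~{weights_gb:.0f} GB of weights")

# fp16/bf16: ~60 GB  -> fits on a single 80 GB A100/H100, no model parallelism needed
#      int8: ~30 GB  -> fits on a 40-48 GB card
#      int4: ~15 GB  -> fits on a 24 GB consumer GPU
# A model much larger than this has to be split across several GPUs,
# which is the "multiple parallel segments" the article refers to.
```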

Beyond making AI technology more accessible, MosaicML is also focused on improving data quality to boost model performance. The company is developing tools to help users layer domain-specific data into pre-training, ensuring a diverse, high-quality data mix. Scaling to 30 billion parameters is only the first step; MosaicML plans to follow up with larger, higher-quality models while continuing to push costs down.

Developers can download the open-source MPT-30B base model from Hugging Face and fine-tune it on their own hardware with their own data.
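As a minimal, hedged example of that workflow (exact arguments may vary across transformers versions), loading the mosaicml/mpt-30b checkpoint from Hugging Face looks roughly like this:

```python
# Hedged sketch: loading the open-source MPT-30B base model with Hugging Face transformers.
# Assumes the "mosaicml/mpt-30b" checkpoint and enough GPU memory for ~60 GB of bf16 weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mosaicml/mpt-30b"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.bfloat16,   # halve memory relative to fp32
    trust_remote_code=True,       # the MPT architecture ships as custom code in the repo
    device_map="auto",            # spread weights across available GPUs (needs accelerate)
)

inputs = tokenizer("MosaicML's MPT-30B is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```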
