The world's first fully open-source instruction-following large model; the most comprehensive LLM roundup, from T5 to GPT-4


1. Dolly 2.0: The world's first fully open-source instruction-following LLM

Two weeks ago, Databricks released Dolly, a ChatGPT-like large language model (LLM) that cost less than $30 to train. Today they released Dolly 2.0, the industry's first open-source instruction-following LLM fine-tuned on a high-quality, human-generated instruction dataset (15,000 prompt/response pairs). Dolly 2.0 is a 12B-parameter language model based on the EleutherAI Pythia model family.

They are fully open-sourcing Dolly 2.0, including the training code, the dataset, and the model weights, all of which are licensed for commercial use. This means any organization can create, own, and customize a powerful LLM without paying for API access or sharing its data with third parties.

Links:
1. https://huggingface.co/databricks;
2. https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm
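
For readers who want to try the released weights, a minimal sketch of loading them with Hugging Face transformers might look like the following; the model id `databricks/dolly-v2-12b` and the custom generation pipeline are assumptions based on the Hugging Face organization page, so check the model card for the exact usage.

```python
# Minimal sketch: loading Dolly 2.0 from the Hugging Face hub (assumed model id).
# Requires `transformers`, `accelerate`, and enough GPU memory for a 12B model.
import torch
from transformers import pipeline

generate = pipeline(
    "text-generation",
    model="databricks/dolly-v2-12b",   # assumed hub id; see the model card
    torch_dtype=torch.bfloat16,        # bf16 roughly halves memory vs fp32
    trust_remote_code=True,            # Dolly ships a custom instruction pipeline
    device_map="auto",                 # spread layers across available devices
)

print(generate("Explain instruction tuning in two sentences.")[0]["generated_text"])
```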

2. A new survey of large language models: the most comprehensive roundup, from T5 to GPT-4

Given the rapid technical progress of LLMs, more than two dozen researchers from Renmin University of China reviewed the latest advances, covering background knowledge, key findings, and mainstream techniques, with a particular focus on pre-training, adaptation tuning, utilization, and capability evaluation. They also summarize the publicly available LLM resources and discuss future research directions. The survey is an extremely useful learning resource for researchers and engineers in the field.

Link:

https://mp.weixin.qq.com/s/7HRr55Md2Wl6EHQMGioumw

3. OpenAI co-founder: the research origins of GPT-4 and how it was built

The achievements of the GPT models are enviable, but they rest on OpenAI's years of technical exploration and conviction. As the main "behind-the-scenes driving force" who was deeply involved in taking the GPT models from 0 to 1 and who pushed forward both the research and the engineering implementation, Greg Brockman understands this well: "It was not about trying to get rich quickly, but about slowly accumulating value; only then came the huge returns brought by exponential growth."

Link:

https://mp.weixin.qq.com/s/hO1ZdqgOjpA328luobQ9eg

4. John Schulman, co-creator of ChatGPT: our secret weapon for success

The new dialogue data matters because it makes it easier for ChatGPT to infer users' intentions, but the root cause of the qualitative leap is Reinforcement Learning from Human Feedback (RLHF), the technique already used in InstructGPT. OpenAI co-founder and research scientist John Schulman believes RLHF is ChatGPT's secret sauce. This article traces the evolution of the technology behind ChatGPT, covers details not described in the paper, and outlines the OpenAI team's next research directions.

Link:

https://mp.weixin.qq.com/s/sDeBYMvAwbJr5_tj7Q20-w
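
For intuition, the core of RLHF can be summarized as optimizing a learned reward while penalizing divergence from the supervised (reference) model. The sketch below is only a conceptual illustration of that KL-penalized reward, not OpenAI's implementation; the function names and the scalar `beta` are placeholders.

```python
# Conceptual sketch of the KL-penalized reward used in RLHF-style training.
# Not OpenAI's code: all inputs and the beta coefficient are placeholders.
import torch

def rlhf_reward(reward_score: torch.Tensor,
                policy_logprobs: torch.Tensor,
                ref_logprobs: torch.Tensor,
                beta: float = 0.1) -> torch.Tensor:
    """Reward the policy is trained (e.g. with PPO) to maximize.

    reward_score:    score from the learned reward model, shape (batch,)
    policy_logprobs: log-prob of the sampled response under the policy, shape (batch,)
    ref_logprobs:    same quantity under the frozen reference (SFT) model, shape (batch,)
    beta:            strength of the KL penalty keeping the policy near the reference
    """
    kl_penalty = policy_logprobs - ref_logprobs  # per-sample estimate of KL(policy || ref)
    return reward_score - beta * kl_penalty
```

The KL term keeps the fine-tuned policy from drifting too far from the reference model, while the reward pushes generations toward human preferences.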

5. The technology behind BLOOM, an open-source large model with hundreds of billions of parameters

In recent years it has become the norm for language models to keep getting bigger. People often criticize that information about these large models is not disclosed for research, but far less attention is paid to the engineering knowledge behind training them. Using the 176-billion-parameter language model BLOOM as an example, this article lays out the software and hardware engineering and the technical details behind training such models, in order to promote discussion of large-model training techniques.

Link: 

https://zhuanlan.zhihu.com/p/615839149
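
To see why such engineering is needed, a back-of-the-envelope memory estimate already rules out single-GPU training. The sketch below assumes the usual mixed-precision Adam accounting of roughly 16 bytes per parameter and ignores activations; it is a rough illustration, not a figure from the article.

```python
# Rough memory estimate for training a 176B-parameter model with mixed-precision Adam.
# Assumes ~16 bytes/parameter (bf16 weights + bf16 grads + fp32 master weights
# + fp32 Adam momentum and variance), ignoring activations and buffers.
params = 176e9
bytes_per_param = 2 + 2 + 4 + 4 + 4          # weights, grads, master copy, Adam m, Adam v
total_gib = params * bytes_per_param / 1024**3

print(f"~{total_gib:,.0f} GiB just for model states")      # ≈ 2,623 GiB
print(f"~{total_gib / 80:.0f} x 80 GiB A100s at minimum")   # before activations
```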

6. Top 10 Common Mistakes and Solutions for Distributed Training

In the era of large language models (LLMs), distributed training is unavoidable, because the data and the model weights rarely fit on a single GPU. However, distributed training in ML is complex and error-prone, with many hidden pitfalls that can cause serious problems during training. This article describes ten of the most common mistakes in distributed model training and suggests solutions for each.

Link:
https://neptune.ai/blog/distributed-training-errors
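
As an example of the kind of pitfall the article refers to (this particular one is a well-known PyTorch DDP issue, not necessarily on the article's list): forgetting to call `set_epoch` on a `DistributedSampler`, which silently reuses the same shuffling order on every rank in every epoch.

```python
# A classic distributed-training pitfall in PyTorch DDP (illustrative, not from the article):
# without sampler.set_epoch(epoch), every epoch reuses the same shuffle on every rank.
import torch
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

dataset = TensorDataset(torch.arange(1000).float())
sampler = DistributedSampler(dataset, shuffle=True)  # requires torch.distributed to be initialized
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

for epoch in range(10):
    sampler.set_epoch(epoch)   # <-- the easily forgotten line: reseeds the shuffle each epoch
    for (batch,) in loader:
        pass  # forward / backward / optimizer step
```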

6.5. AutoGPT goes viral: completing tasks autonomously, without human intervention


Recently, there seems to be a new trend in the AI world: autonomous artificial intelligence. This is not without basis: a recent project called AutoGPT has entered the public eye. Andrej Karpathy, Tesla's former AI director who recently returned to OpenAI, promoted and praised it on Twitter: "AutoGPT is the next frontier of prompt engineering."


Link:

https://mp.weixin.qq.com/s/bV1tPc7hNn2z06YOpzyanw
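
Conceptually, "autonomous" systems like AutoGPT wrap an LLM in a plan-act-observe loop. The toy sketch below only illustrates that loop shape and is not AutoGPT's actual code; `llm` and `run_tool` are placeholder callables.

```python
# Toy sketch of an AutoGPT-style autonomy loop (not AutoGPT's real implementation).
# `llm` and `run_tool` are placeholders for a model API and a tool executor.
def autonomous_agent(goal: str, llm, run_tool, max_steps: int = 10):
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        # Ask the model to choose the next action based on the goal and prior observations.
        action = llm("\n".join(history) + "\nNext action (or FINISH):")
        if action.strip() == "FINISH":
            break
        observation = run_tool(action)          # e.g. web search, file write, code execution
        history.append(f"Action: {action}\nObservation: {observation}")
    return history
```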

7. Understanding Large Language Models (Introductory Reading List)

Because the Transformer has had such an impact on everyone's research, the author compiled a reading list to help machine learning researchers and practitioners get started with LLMs.

Link:
https://sebastianraschka.com/blog/2023/llm-reading-list.html?

8. Summary of large models (over 1 billion parameters)

Large language models (LLMs) are one of the most important directions in current AI/NLP research and industry. This article summarizes the current mainstream large models; any model with more than 1B parameters is considered large here.

Link:
https://zhuanlan.zhihu.com/p/611403556

9. A curated collection of introductory ML systems materials (TVM, MLIR, LLVM)

For developers who want to get started with machine learning systems (MLSys) or to study one of these compilers in depth, I hope this collection can serve as a good starting point.

Link:
https://zhuanlan.zhihu.com/p/618229430

10. Some thoughts on OpenAI Triton

Among the MLIR-based compiler efforts the author has seen, Triton is an implementation whose performance and functionality meet real needs and have been validated in production. It is also the first open-source work to address the development needs of compute-intensive operators on mainstream AI accelerators.

Link:
https://zhuanlan.zhihu.com/p/613244988
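
For readers unfamiliar with Triton, the kernel below is a minimal vector-add sketch in its Python-embedded style, adapted from the pattern used in Triton's public tutorials; the block size and names are illustrative.

```python
# Minimal Triton kernel sketch: elementwise vector addition.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)                    # each program instance handles one block
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements                    # guard the tail of the array
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
add_kernel[(triton.cdiv(x.numel(), 1024),)](x, y, out, x.numel(), BLOCK=1024)
```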

11. mperf: A powerful tool for operator performance tuning on mobile/embedded platforms

On mobile/embedded platforms, squeezing the most out of the hardware makes pursuing peak operator performance unavoidable. Unlike desktop/server platforms, however, mobile/embedded platforms offer few tools for operator performance tuning. mperf is a micro-architecture-level operator performance tuning toolbox, aimed mainly at the CPU/GPU cores of mobile/embedded platforms, whose goal is to provide a set of basic tools for "building a tighter closed loop of operator-tuning feedback".

Link:
https://zhuanlan.zhihu.com/p/610346564
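
The kind of reasoning such a toolbox supports can be illustrated with a simple roofline estimate. This is a generic sketch, not mperf's API, and the peak numbers below are made-up placeholders for a hypothetical core.

```python
# Generic roofline-style estimate (not mperf's API; peak numbers are placeholders).
peak_gflops = 100.0        # hypothetical peak compute of the core, GFLOP/s
peak_bw_gbs = 20.0         # hypothetical peak memory bandwidth, GB/s

def attainable_gflops(flops: float, bytes_moved: float) -> float:
    intensity = flops / bytes_moved                      # arithmetic intensity, FLOP/byte
    return min(peak_gflops, peak_bw_gbs * intensity)     # roofline: compute- or bandwidth-bound

# fp32 vector add: 1 FLOP per element, 12 bytes moved (2 loads + 1 store)
print(attainable_gflops(1.0, 12.0))   # far below peak -> bandwidth-bound, so tune memory access
```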

12. A small Python compiler project for beginners

It is suitable for students interested in compiler optimization, high-performance computing, and GPU programming. No prior compiler background is required, but familiarity with Python programming is.

The compiler and the tests are written entirely in Python, while the operator part uses CuPy's RawKernel to compile CUDA code into callable Python functions. The first module is now complete; it is split into five days, and the code for each day adds up to no more than 100 lines, so it is easy to follow.

Link:
https://zhuanlan.zhihu.com/p/603352525
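
The CuPy mechanism the course relies on looks roughly like this; a trivial kernel stands in here for the operator code the project generates.

```python
# Minimal sketch of compiling CUDA source into a callable via cupy.RawKernel,
# the mechanism the course uses for its operator backend (kernel body is just an example).
import cupy as cp
import numpy as np

add_kernel = cp.RawKernel(r'''
extern "C" __global__
void add(const float* x, const float* y, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = x[i] + y[i];
}
''', "add")

n = 1 << 20
x = cp.random.rand(n, dtype=cp.float32)
y = cp.random.rand(n, dtype=cp.float32)
out = cp.empty_like(x)
add_kernel(((n + 255) // 256,), (256,), (x, y, out, np.int32(n)))   # (grid, block, args)
```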

13. CUDA programming: common techniques/methods

Whether you are learning CUDA or optimizing operators, mastering some CUDA programming techniques can improve your efficiency and may even lead you to better solutions. This article introduces some commonly used techniques and methods, accompanied by practical code, and will hopefully be helpful to readers.

Link:
https://zhuanlan.zhihu.com/p/584501634
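
One technique of the kind the article covers, shown here independently via CuPy's CUDA bindings rather than raw CUDA C++: timing GPU work with CUDA events instead of host-side wall clocks, so that asynchronous kernel launches are measured correctly.

```python
# Common CUDA technique: time GPU work with CUDA events rather than host timers,
# since kernel launches are asynchronous. Shown via CuPy's CUDA bindings.
import cupy as cp

x = cp.random.rand(1 << 24, dtype=cp.float32)

start, stop = cp.cuda.Event(), cp.cuda.Event()
start.record()
y = cp.sqrt(x) * 2.0 + 1.0           # stand-in for the kernel(s) being tuned
stop.record()
stop.synchronize()                    # wait until the recorded work has finished

# In real tuning, warm up first so JIT/compilation time is excluded from the measurement.
print(f"{cp.cuda.get_elapsed_time(start, stop):.3f} ms")
```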

14. NCCL source code analysis ①: initialization and generation of ncclUniqueId

NCCL is NVIDIA's open source GPU communication library, which supports collective communication and point-to-point communication.

Link:

https://mp.weixin.qq.com/s/_SOmkGoo9DblXb8ddyEeaQ
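
For a sense of what NCCL is used for in practice, here is an illustrative sketch via PyTorch's NCCL backend, not the raw NCCL C API the source-code series analyzes; during communicator setup PyTorch broadcasts an ncclUniqueId through its rendezvous store.

```python
# What NCCL does in practice: collectives across GPUs, via PyTorch's NCCL backend.
# Launch with: torchrun --nproc_per_node=<num_gpus> this_script.py
import os
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")          # rendezvous; communicator setup shares an ncclUniqueId
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)
rank = dist.get_rank()

t = torch.ones(4, device="cuda") * (rank + 1)
dist.all_reduce(t, op=dist.ReduceOp.SUM)         # collective communication across all ranks
print(f"rank {rank}: {t}")

dist.destroy_process_group()
```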

15. By adapting PyTorch FX, OneFlow makes quantization-aware training easier

Following PyTorch, OneFlow has added its own fx module, One-fx. After installing One-fx, users can call oneflow.fx directly, or use it via import onefx as fx.

Link:

https://mp.weixin.qq.com/s/O8yGUuTL-o_gHQV4xez_nQ
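
Assuming One-fx mirrors torch.fx's interface, which the article implies but which should be checked against the One-fx documentation, a symbolic trace might look like this:

```python
# Hypothetical usage sketch, assuming One-fx mirrors torch.fx's symbolic_trace interface;
# check the One-fx documentation for the actual API.
import oneflow as flow
import oneflow.fx as fx   # or: import onefx as fx

class TinyModel(flow.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = flow.nn.Linear(8, 4)

    def forward(self, x):
        return flow.relu(self.linear(x))

traced = fx.symbolic_trace(TinyModel())   # build a graph representation of the forward pass
print(traced.graph)                       # inspect nodes, e.g. to insert fake-quant ops
```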

16. One-YOLOv5 v1.2.0 release: support for classification, detection, and instance segmentation

The new version synchronizes with the upstream Ultralytics YOLOv5 v7.0 branch and supports classification, object detection, and instance segmentation tasks. It also adds flask_rest_api support; supports experiment tracking and visualization with wandb; adds oneflow_hub_support_pilimage; reduces the h2d copies and CPU slice_update operations in the compute_loss part of each batch; and optimizes the bbox_iou function and the model's moving-average update, greatly improving training performance.

It is also compatible with FlowFlops, so the model's FLOPs can be displayed during training.


Link:

https://mp.weixin.qq.com/s/bkEkInaF7Ht7KsdXUFkw-Q

Try OneFlow: github.com/Oneflow-Inc/oneflow/
