[Paper notes] ChatGPT series 2.6: DeepSpeed-Chat datasets

1. Fine-tuning (SFT) & reward model datasets

Datasets used in the DeepSpeed-Chat source code:

  1. Dahoas/rm-static: A human-preference dataset derived from Anthropic's HH-RLHF data, with a fixed ("static") train/test split. Each sample contains a dialogue prompt plus a chosen (preferred) and a rejected response, making it suitable for supervised fine-tuning and reward model training.

  2. Dahoas/full-hh-rlhf: The full Anthropic Helpful & Harmless (HH-RLHF) preference dataset: multi-turn human-assistant dialogues, each paired with a chosen and a rejected assistant response, used for reward modeling and RLHF.

  3. Dahoas/synthetic-instruct-gptj-pairwise: A synthetic instruction-following dataset of prompts with pairwise chosen/rejected responses, generated with GPT-J. Used for instruction tuning and for evaluating reward models on instruction-response preference data.

  4. yitingxie/rlhf-reward-datasets: An RLHF preference dataset with prompt/chosen/rejected fields, intended for training and evaluating reward models.
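All four datasets share one record layout: a prompt plus a preferred (chosen) and a dispreferred (rejected) response, which is why DeepSpeed-Chat can mix them. A minimal sketch of how a reward-model pairwise loss consumes such a record (the sample text and the toy scoring function are made up for illustration; only the field names match the real datasets):

```python
import math

# A toy record in the shared schema of these preference datasets:
# one prompt, one human-preferred response, one rejected response.
record = {
    "prompt": "Human: How do I boil an egg? Assistant:",
    "chosen": " Place the egg in boiling water for about 8 minutes.",
    "rejected": " I don't know.",
}

def toy_reward(prompt: str, response: str) -> float:
    # Stand-in for a learned reward model: here, longer answers simply
    # score higher. A real reward model is a fine-tuned transformer.
    return float(len(response.split()))

def pairwise_loss(rec: dict) -> float:
    # Standard RLHF reward-model objective:
    #   loss = -log(sigmoid(r_chosen - r_rejected))
    r_c = toy_reward(rec["prompt"], rec["chosen"])
    r_r = toy_reward(rec["prompt"], rec["rejected"])
    return -math.log(1.0 / (1.0 + math.exp(-(r_c - r_r))))

print(pairwise_loss(record))  # small, since "chosen" outscores "rejected"
```

The loss is near zero when the model already prefers the chosen response and grows as the ranking flips, which is what drives reward-model training.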

2. Replacing with a custom dataset

Common alternatives: wikitext-2, PTB (Penn Treebank), C4.
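To plug a custom corpus into DeepSpeed-Chat, the repo expects each raw dataset to be wrapped in an adapter class exposing prompt/chosen/rejected accessors (the `PromptRawDataset` convention in `raw_datasets.py`). The method names below follow that convention, but the class name, the in-memory samples, and the absence of the real base class are illustrative assumptions, not the actual repo code:

```python
# Hedged sketch of a custom dataset adapter in the style of DeepSpeed-Chat's
# PromptRawDataset interface. The dataset name and in-memory samples are
# hypothetical; a real adapter would call datasets.load_dataset(...).
class MyCustomRawDataset:
    dataset_name = "my_custom_dataset"  # hypothetical name

    def __init__(self):
        self._train = [
            {"prompt": "Human: What is WikiText-2? Assistant:",
             "chosen": " A language-modeling corpus built from Wikipedia.",
             "rejected": " No idea."},
        ]
        self._eval = list(self._train)

    def get_train_data(self):
        return self._train

    def get_eval_data(self):
        return self._eval

    # Per-sample accessors consumed by the three training stages.
    def get_prompt(self, sample):
        return sample["prompt"]

    def get_chosen(self, sample):
        return sample["chosen"]

    def get_rejected(self, sample):
        return sample["rejected"]

    def get_prompt_and_chosen(self, sample):    # used for SFT (stage 1)
        return sample["prompt"] + sample["chosen"]

    def get_prompt_and_rejected(self, sample):  # used for RM (stage 2)
        return sample["prompt"] + sample["rejected"]

ds = MyCustomRawDataset()
sample = ds.get_train_data()[0]
print(ds.get_prompt_and_chosen(sample))
```

As long as the adapter returns the same prompt/chosen/rejected views, the downstream SFT, reward-model, and RLHF stages do not need to know where the data came from.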


Origin: blog.csdn.net/Trance95/article/details/132043708