1. Problem Description
As the title says, the error is: `"topk_cpu" not implemented for 'Half'`. It occurred while loading a model locally with the transformers library. The full traceback:
File "/Users/guomiansheng/anaconda3/envs/ep1/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/guomiansheng/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 1028, in chat
outputs = self.generate(**inputs, **gen_kwargs)
File "/Users/guomiansheng/anaconda3/envs/ep1/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/guomiansheng/anaconda3/envs/ep1/lib/python3.8/site-packages/transformers/generation/utils.py", line 1485, in generate
return self.sample(
File "/Users/guomiansheng/anaconda3/envs/ep1/lib/python3.8/site-packages/transformers/generation/utils.py", line 2538, in sample
next_token_scores = logits_warper(input_ids, next_token_scores)
File "/Users/guomiansheng/anaconda3/envs/ep1/lib/python3.8/site-packages/transformers/generation/logits_process.py", line 92, in __call__
scores = processor(input_ids, scores)
File "/Users/guomiansheng/anaconda3/envs/ep1/lib/python3.8/site-packages/transformers/generation/logits_process.py", line 302, in __call__
indices_to_remove = scores < torch.topk(scores, top_k)[0][..., -1, None]
RuntimeError: "topk_cpu" not implemented for 'Half'
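The failing call can be reproduced in isolation with just a half-precision tensor on CPU (a minimal sketch; note that newer PyTorch builds may have since added an fp16 `topk` kernel on CPU, in which case no error is raised):

```python
import torch

# fp16 logits on CPU, as produced by a half-precision model
scores = torch.randn(1, 8, dtype=torch.float16)

try:
    torch.topk(scores, k=3)  # older PyTorch: no fp16 topk kernel on CPU
except RuntimeError as e:
    print(e)  # "topk_cpu" not implemented for 'Half'
```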
2. Solution
If the model weights are stored in half precision (fp16), as with the fp16 chatglm2-6b model, they take about 13 GB of memory, so a 16 GB MacBook Pro can become very sluggish once memory runs short. The error above occurs because torch.topk has no fp16 implementation on CPU: convert the model to fp32 with float() and then move it to the Apple GPU backend with to("mps"). Of course, unless there is no alternative, running on CUDA is still the better choice.
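The fp32 workaround at the tensor level looks like this (a minimal sketch of the cast; in practice you would apply `.float()` to the whole model at load time rather than to individual tensors):

```python
import torch

# fp16 logits such as a half-precision model would emit
scores = torch.randn(1, 8, dtype=torch.float16)

# Cast to fp32 before calling topk; fp32 CPU kernels always exist
values, indices = torch.topk(scores.float(), k=3)
assert values.dtype == torch.float32
```

At the model level, the same idea means loading the checkpoint, calling `.float()` on the model object, and then `.to("mps")` so the Metal backend handles the fp32 computation.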