Implementing a code interpreter with ChatGLM

Background

Before diving in, it is worth defining what a code interpreter is. In practical terms, a code interpreter lets the LLM execute the code it generates immediately and feed the execution results back as material for the next round of generation. There are two key phrases here: "execute the code immediately" and "results as material". If the LLM cannot drive the computer to obtain execution results and use them as input for the next round, it can only produce static code, and it remains a static language-generation model. But if the LLM can run the code it writes and observe the results, it becomes a controller and a simulator: it can chain dynamically generated code as needed, fill in the materials it is missing, and run predictions and inferences over possible choices, which expands its capabilities almost without limit. This is the value of the code interpreter: it gives the LLM the ability to dynamically assemble itself, adjust possible combinations, and run both precise and fuzzy simulations to support decisions. In other words, the LLM gains the ability to solve practical problems. Instead of theorizing in the abstract and handing back guiding boilerplate, it can try things itself and connect abstract ideas all the way down to concrete actions.

This is why the code interpreter, although it only adds one feature (making the generated code executable), has been greeted with so much enthusiasm: a seemingly small change with an outsized effect. Of course, given the current quality of LLM code generation and its execution pass rate (for example, whether the environment actually has the required packages), this is still far from a robust commercial system. Closing that gap requires improving the LLM across the board, and even building a surrounding system to compensate; that takes time to settle and polish, and it is also where the opportunity lies.

Examples:

1. Image generation: pass in an original image and instruct the model to generate code that cuts out the portrait; the code interpreter runs it and processes the result, which is then used for image-to-image generation or background filling; the generated image can then be fed into line-draft generation.

2. Text generation: the text generated by the LLM contains non-compliant runs of repeated punctuation; the model is instructed to generate regex cleanup code, the code interpreter runs it, and the cleaned text then goes on to text extraction or text rewriting.

3. Combining text and images: generate text and images and compute their similarity; if they do not match, let the LLM keep steering the image generation until it meets expectations. You can even run a style check on the images generated before and after to ensure their styles stay consistent.

Technical points

To execute generated code immediately, the LLM's output must be interpreted or compiled into something the machine can actually run. Every language already has its own compiler or interpreter, so if we can hand the generated code to the corresponding compiler or interpreter, the code can be executed and its results collected. Taking Python as an example, there are at least four ways to achieve this:

1. Use Python's exec function and expose this interpreter as a Flask service. The code generated by the LLM is passed to the service as a parameter, and the execution result is returned to the LLM server.

2. Use Python's interactive interpreter. Start a server that receives the code generated by the LLM, executes it, and returns the result to the LLM server.

3. Save the code generated by the LLM as a .py file and have the LLM server execute it through Python's os module.

4. Use IPython (a Jupyter kernel) as the Python execution server: the LLM generates code and sends it to the kernel, and the execution result is returned to the LLM server.

exec method

prog = '''# Import the required libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Generate sample data: students' names, scores, ages and genders
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "score": [90, 80, 70, 60, 50],
    "age": [18, 19, 20, 21, 22],
    "gender": ["F", "M", "M", "M", "F"]
})

# Inspect basic information: number of rows/columns, dtypes, missing values, etc.
df.info()

# Statistical summary: mean, standard deviation, max, min, etc.
df.describe()

# Select the columns to analyse: score, age, gender
cols = ["score", "age", "gender"]

# Histograms showing the distribution of each column
df[cols].hist(figsize=(10, 8))
plt.show()

# Box plots showing outliers in each column
df[cols].boxplot(figsize=(10, 8))
plt.show()

# Scatter-plot matrix showing pairwise correlations
sns.pairplot(df[cols])
plt.show()'''
# exec runs the string in place; it returns None, so any results have to be
# collected from stdout or the execution namespace
exec(prog)
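
Note that exec itself returns None, so to hand the result back to the LLM you normally capture what the executed code prints. A minimal sketch using only the standard library (the run_code helper name is illustrative):

import io
import contextlib

def run_code(code_str: str) -> str:
    """Run a code string with exec() and capture whatever it prints,
    so the text can be fed back to the LLM as material for the next turn."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code_str, {})  # execute in an isolated global namespace
    except Exception as e:
        # return the error text instead of crashing the service
        return f"{type(e).__name__}: {e}"
    return buffer.getvalue()

print(run_code("print(1 + 1)"))  # -> 2

To implement approach 1 above, a function like this would sit behind a Flask route that receives the LLM-generated code as a request parameter and returns the captured output.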

python interpreter method

For this approach, refer to the following article:

https://mp.weixin.qq.com/s/_6E_yZ6g2X28tT2WAHeT7Q
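
The linked article is not reproduced here. As a rough illustration of the same idea (not the article's implementation), the standard library's code module exposes InteractiveInterpreter, which can be driven programmatically like a REPL:

# A minimal sketch using the standard library `code` module: drive an
# interactive interpreter programmatically and capture what it prints.
import code
import contextlib
import io

interp = code.InteractiveInterpreter()
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    # runsource compiles and runs one complete statement, as if typed into the
    # REPL; state (here the variable x) persists between calls
    interp.runsource("x = 21 * 2")
    interp.runsource("print(x)")
print(buffer.getvalue())  # -> 42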

python py file execution method

# So the .py code can be executed via os.system
import os
os.system('python file_name.py')
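
A slightly fuller sketch of the same idea: write the generated code to a temporary file and run it with subprocess, so stdout/stderr can be captured and returned to the LLM (the helper name and timeout are illustrative):

import subprocess
import tempfile

def run_generated_code(code_str: str, timeout: int = 30) -> str:
    # Save the LLM-generated code to a temporary .py file
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code_str)
        path = f.name
    # Run it in a separate process and capture the output
    result = subprocess.run(
        ["python", path], capture_output=True, text=True, timeout=timeout
    )
    return result.stdout if result.returncode == 0 else result.stderr

print(run_generated_code("print('hello from the generated script')"))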

ipython (jupyter kernel) method

from jupyter_client import KernelManager
import re 

class JupyterNotebook:
    def __init__(self):
        self.km = KernelManager()
        self.km.start_kernel()
        self.kc = self.km.client()

    def clean_output(self,outputs):
        outputs_only_str = list()
        for i in outputs:
            if type(i)==dict:
                if ('text/plain' in list(i.keys())):
                    outputs_only_str.append(i['text/plain'])
            elif type(i)==str:
                outputs_only_str.append(i)
            elif type(i) == list:
                error_msg = '\n'.join(i)
                error_msg = re.sub(r'\x1b\[.*?m', '', error_msg)
                outputs_only_str.append(error_msg)
        
        return '\n'.join(outputs_only_str).strip()

    def add_and_run(self, code_string):
        # Send the code to the kernel for execution; execute() returns the message id
        msg_id = self.kc.execute(code_string)
        
        # Wait for and return the outputs
        outputs = []
        error_flag = False
        while True:
            try:
                msg = self.kc.get_iopub_msg(timeout=10)
                
                msg_type = msg['header']['msg_type']
                content = msg['content']
                
                if msg_type == 'execute_result':
                    outputs.append(content['data'])
                elif msg_type == 'stream':
                    outputs.append(content['text'])
                elif msg_type == 'error':
                    error_flag = True
                    outputs.append(content['traceback'])

                # If the execution state of the kernel is idle, it means the cell finished executing
                if msg_type == 'status' and content['execution_state'] == 'idle':
                    break
            except:
                # no further messages arrived within the timeout window
                break
        
        #print(outputs)
        return self.clean_output(outputs), error_flag
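
As an example, the executor can be exercised on its own before wiring it to an LLM (kernel state persists across calls because the same kernel stays alive):

nb = JupyterNotebook()
out, err = nb.add_and_run("a = 10\nprint(a * 2)")
print(out, err)   # -> 20 False
out, err = nb.add_and_run("a + 5")   # the kernel still remembers `a`
print(out, err)   # -> 15 False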

You can also use jupyter_client as the executor and deploy it behind a Flask service. For a full example, see: https://github.com/ricklamers/gpt-code-ui.git
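
A minimal sketch of that deployment (Flask is assumed to be installed; the route name and payload format are illustrative, not taken from gpt-code-ui):

from flask import Flask, request, jsonify

app = Flask(__name__)
nb = JupyterNotebook()   # the JupyterNotebook class defined above

@app.route("/execute", methods=["POST"])
def execute():
    # Expect a JSON body like {"code": "print(1 + 1)"}
    code = request.json.get("code", "")
    output, error_flag = nb.add_and_run(code)
    return jsonify({"output": output, "error": error_flag})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5010)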

LLM code generation key points

class BaseCodeInterpreter:

    def __init__(self):
        
        self.dialog = [
            {"role": "system", "content": CODE_INTERPRETER_SYSTEM_PROMPT,},
            #{"role": "user", "content": "How can I use BeautifulSoup to scrape a website and extract all the URLs on a page?"},
            #{"role": "assistant", "content": "I think I need to use beatifulsoup to find current korean president,"}
        ]

        self.nb = JupyterNotebook()
    # Extract the code blocks from the LLM-generated text
    @staticmethod
    def extract_code_blocks(text : str):
        pattern = r'```(?:python\n)?(.*?)```' # Match optional 'python\n' but don't capture it
        code_blocks = re.findall(pattern, text, re.DOTALL)
        return [block.strip() for block in code_blocks]

    @staticmethod
    def parse_last_answer(text: str) -> str:
        return text.split(E_INST)[-1]

    # Feed the extracted code to the Jupyter executor and return the result to the user
    def execute_code_and_return_output(self, code_str: str) -> str:
        outputs, error_flag = self.nb.add_and_run(code_str)
        return outputs, error_flag
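
The snippet assumes that CODE_INTERPRETER_SYSTEM_PROMPT and the Llama-style prompt markers (B_INST, E_INST, B_SYS, E_SYS, DEFAULT_EOS_TOKEN) are defined elsewhere in the project. The extraction step can be sanity-checked on its own, for example:

reply = """Sure, here is the code:
```python
print('hello')
```
Hope that helps."""
print(BaseCodeInterpreter.extract_code_blocks(reply))   # -> ["print('hello')"]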

With the LLM's code generation wrapped around the executor above, the code interpreter capability is complete. In the following code, model_path can be replaced with chatglm or codegeex2-6b.

class LlamaCodeInterpreter(BaseCodeInterpreter):

    def __init__(self, model_path: str, load_in_8bit : bool = False, load_in_4bit : bool = False):
        #self.model = LlamaForCausalLM.from_pretrained(model_path, device_map="auto", load_in_4bit = load_in_4bit,load_in_8bit=load_in_8bit, torch_dtype=torch.float16,use_safetensors=True)
        #self.tokenizer = LlamaTokenizer.from_pretrained(model_path)

        self.tokenizer = AutoTokenizer.from_pretrained(model_path,trust_remote_code=True)
        self.model = AutoModel.from_pretrained(model_path,trust_remote_code=True).cuda()
        '''
        # Add special token
        special_tokens_dict = dict()
        if self.tokenizer.pad_token is None:
            special_tokens_dict["pad_token"] = DEFAULT_PAD_TOKEN
        if self.tokenizer.eos_token is None:
            special_tokens_dict["eos_token"] = DEFAULT_EOS_TOKEN
        if self.tokenizer.bos_token is None:
            special_tokens_dict["bos_token"] = DEFAULT_BOS_TOKEN
        if self.tokenizer.unk_token is None:
            special_tokens_dict["unk_token"] = DEFAULT_UNK_TOKEN
        
        smart_tokenizer_and_embedding_resize(
            special_tokens_dict=special_tokens_dict,
            tokenizer=self.tokenizer,
            model=self.model,
        )
        '''

        self.dialog = [
            {"role": "system", "content": CODE_INTERPRETER_SYSTEM_PROMPT + "\nUse code to answer",},
            #{"role": "user", "content": "How can I use BeautifulSoup to scrape a website and extract all the URLs on a page?"},
            #{"role": "assistant", "content": "I think I need to use beatifulsoup to find current korean president,"}
        ]

        self.nb = JupyterNotebook()

    def dialog_to_prompt(self, dialog: List[Dialog], SYS_PROMPT: str = '') -> torch.Tensor:
    
        """
            code borrowed from : https://github.com/facebookresearch/llama/blob/main/llama/generation.py
        """
        if dialog[0]["role"] != "system":
            dialog = [
                {
                    "role": "system",
                    "content": SYS_PROMPT,
                }
            ] + dialog
        dialog = [
            {
                "role": dialog[1]["role"],
                "content": B_SYS + dialog[0]["content"] + E_SYS + dialog[1]["content"],
            }
        ] + dialog[2:]

        assert all([msg["role"] == "user" for msg in dialog[::2]]) and all(
            [msg["role"] == "assistant" for msg in dialog[1::2]]
        ), (
            "model only supports 'system', 'user' and 'assistant' roles, "
            "starting with 'system', then 'user' and alternating (u/a/u/a/u...)"
        )

        #print(dialog[::2], dialog[1::2],)

        dialog_tokens: List[int] = sum(
            [
                self.tokenizer.encode(
                    f"{B_INST} {(prompt['content']).strip()} {E_INST} {(answer['content']).strip()} ",
                )
                for prompt, answer in zip(
                    dialog[::2],
                    dialog[1::2],
                )
            ],
            [],
        )
        #assert (
        #    dialog[-1]["role"] == "user"
        #), f"Last message must be from user, got {dialog[-1]['role']}"
        dialog_tokens += self.tokenizer.encode(
            f"{B_INST} {(dialog[-1]['content']).strip()} {E_INST}",
        )

        return torch.tensor(dialog_tokens).unsqueeze(0)

    def hard_coded_eos_splitter(self):
        self.dialog[-1]['content'] = self.dialog[-1]['content'].split(DEFAULT_EOS_TOKEN)[0]

    def chat(self, user_message: str, VERBOSE :bool = False):
        self.dialog.append({"role": "user", "content": user_message})

        code_block_output = ""
        attempt = 0 
        img_data = None

        if VERBOSE:
            print('###User : ' + Fore.BLUE + Style.BRIGHT + user_message + Style.RESET_ALL)
            print('\n###Assistant : ')
        while True:
            if attempt > 3:
                break
            dialog_tokens = self.dialog_to_prompt(dialog=self.dialog)

            gen_tokens = self.model.generate(dialog_tokens.cuda(),
                                            max_new_tokens=4096,
                                            top_p=0.8,
                                            temperature=0.95,
                                            do_sample=True,
                                            use_cache=True)

            generated_text_all = self.tokenizer.batch_decode(gen_tokens)[0]
            generated_text = self.tokenizer.batch_decode(gen_tokens[:, dialog_tokens.shape[1]:])[0]

            last_answer = self.parse_last_answer(generated_text_all)
            
            generated_code_blocks = self.extract_code_blocks(generated_text)

            if len(generated_code_blocks) > 0:
                # Find the position of the first code block in the last answer
                first_code_block_pos = generated_text.find(generated_code_blocks[0]) if generated_code_blocks else -1
                text_before_first_code_block = generated_text if first_code_block_pos == -1 else generated_text[:first_code_block_pos]
                if VERBOSE:
                    print(Fore.GREEN + text_before_first_code_block + Style.RESET_ALL)
                if VERBOSE:
                    print(Fore.YELLOW + generated_code_blocks[0]+ '\n```\n' + Style.RESET_ALL)
                code_block_output, error_flag = self.execute_code_and_return_output(generated_code_blocks[0])

                code_block_output = f'{code_block_output}'

                if code_block_output is not None:
                    code_block_output = code_block_output.strip()

                code_block_output_str = f'\n```RESULTS\n{code_block_output}\n```\n'
                if VERBOSE:
                    print(Fore.LIGHTBLACK_EX + code_block_output_str + Style.RESET_ALL)
                    #markdown = Markdown(code_block_output_str)print(markdown)

                gen_final = f'{text_before_first_code_block}{generated_code_blocks[0]}\n```{code_block_output_str}'

                if self.dialog[-1]['role'] == 'user':
                    self.dialog.append({"role": "assistant", "content": gen_final})
                elif self.dialog[-1]['role'] == 'assistant':
                    self.dialog[-1]['content'] += gen_final
            else:
                if self.dialog[-1]['role'] == 'user':
                    self.dialog.append({"role": "assistant", "content": generated_text})
                else:
                    self.dialog[-1]['content'] += generated_text
                # no code found break
                if VERBOSE:
                    print(Fore.GREEN + generated_text + Style.RESET_ALL)
                break

            # early stop 
            if DEFAULT_EOS_TOKEN in self.dialog[-1]['content']:
                self.hard_coded_eos_splitter()
                if img_data is not None:
                    return f'{self.dialog[-1]}\n![plot](data:image/png;base64,{img_data})'
                return self.dialog[-1]
            
            self.hard_coded_eos_splitter()

            attempt += 1
            #print(f"====Attempt[{attempt}]====\n{self.dialog[-1]['content']}")

        #print(self.dialog)
        if img_data is not None:
            return f'{self.dialog[-1]}\n![plot](data:image/png;base64,{img_data})'
        return self.dialog[-1]
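
A usage sketch (the model path and question are illustrative; the class also relies on torch, transformers' AutoTokenizer/AutoModel, and colorama's Fore/Style for coloured logging, and with chatglm the Llama-style prompt markers in dialog_to_prompt may need to be adapted to ChatGLM's own dialogue format):

interpreter = LlamaCodeInterpreter(model_path="THUDM/chatglm2-6b")
answer = interpreter.chat(
    "Implement bubble sort, use it to sort [5, 2, 9, 1], and print the result",
    VERBOSE=True,
)
print(answer)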

The execution results are as follows: the blue part is the user's input question, the yellow part is the code generated by the LLM, and the grey part is the output of the Python interpreter executing that code.

Summary

1. The article introduces the value of the code interpreter and the directions it opens up, from a technology-trend perspective.

2. It explains the core implementation issue: how to hand the code generated by the LLM to a compiler or interpreter for execution.

3. Taking Python as an example, it lists four feasible ways to execute the generated code and gives concrete implementation code.

4. It shows how to wrap the LLM and the code executor together so that the interpreter's execution results feed back into generation, with code.

5. It presents an integrated project implementation, including a simple bubble-sort example.

Source: https://blog.csdn.net/liangwqi/article/details/132074482