Using OpenVINO™ to run the GPT-2 model

The hottest topics in artificial intelligence recently are ChatGPT and the newly released GPT-4 model. The capabilities of these two generative AI models in question answering, search, and text generation regularly astonish everyone who tries them. As is widely known, their "superpowers" come from their enormous model size, and every inference run requires a huge amount of compute to support it. Clearly, running a model of that scale on a local device is unrealistic.

However, did you know that GPT-2, a member of the same GPT model family, can be optimized and accelerated with the OpenVINO™ open-source toolkit so that it runs text-generation inference on an AI development board costing less than a thousand yuan, equipped with an 11th-generation Intel N5105 processor (Jasper Lake, 2.0-2.9 GHz)?

The AIxBoard development board

The AIxBoard development board is designed to support entry-level edge AI applications and devices, and is well suited to scenarios such as AI learning, development, and training.

The board is a Raspberry Pi-like x86 host that supports Ubuntu Linux and the full Windows operating system. It carries a quad-core Intel processor with a maximum frequency of 2.9 GHz, integrated graphics (iGPU), 64 GB of onboard eMMC storage, LPDDR4x-2933 memory (4 GB/6 GB/8 GB), and built-in Bluetooth and Wi-Fi modules, and it offers USB 3.0, HDMI video output, a 3.5 mm audio jack, and a 1000 Mbps Ethernet port. It can be regarded as a mini computer; it also integrates an Arduino Leonardo microcontroller and can be extended with various sensor modules.

In addition, its interface is compatible with the Jetson Nano carrier board and its GPIO is compatible with the Raspberry Pi, so it can reuse ecosystem resources from both platforms to the greatest extent, running workloads such as camera-based object recognition, 3D printing, and real-time CNC interpolation control stably. It can serve as an edge computing engine for validating and developing AI products, or as the domain-control core for robotics development.

Thanks to the x86 architecture and full Windows support, you get powerful software such as Visual Studio, OpenVINO, and OpenCV out of the box without any special porting work, along with the most mature development ecosystem and millions of open-source projects to fuel your creativity. Whether you are a DIY enthusiast, an interaction designer, or a robotics expert, the AIxBoard development board is an ideal partner.

Next, let's walk through the step-by-step tutorial below to see how this inference runs on the edge AI development board.

How to install

First, the OpenVINO Notebooks environment needs to be installed on the AI development board. Since the board ships with Windows, you can follow the Windows installation guide below:

https://github.com/openvinotoolkit/openvino_notebooks/wiki/Windows

Of course, the board can also easily be flashed with a Linux system. If you are a Linux user, follow this guide instead: https://github.com/openvinotoolkit/openvino_notebooks/wiki/Ubuntu

In general, the whole installation can be completed with the following few steps, after which all the notebook code samples can be loaded.

  • Install Python 3.10.x (or Python 3.7, 3.8, or 3.9) and create a virtual environment:

python3 -m venv openvino_env
openvino_env\Scripts\activate
  • Clone the repository with Git (if Git is not installed, install it first):

git clone --depth=1 https://github.com/openvinotoolkit/openvino_notebooks.git
cd openvino_notebooks
  • Install all libraries and dependencies:

pip install -r requirements.txt
  • Run Jupyter Notebook:

jupyter lab notebooks

Using OpenVINO to optimize and deploy GPT-2 on the AIxBoard development board

Next, let's take a look at the main steps of running GPT-2 for text generation on the AI development board.

Note: all the code in the following steps comes from the 223-gpt2-text-prediction notebook in the OpenVINO Notebooks open-source repository. You can access the source code directly via the link below: https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/223-gpt2-text-prediction/223-gpt2-text-prediction.ipynb

The overall flow of the code example is as follows:

  • Preprocessing: define the tokenization. The input to this text-generation task is a natural-language sequence, from which the GPT-2 model generates a longer text of related content. Since natural language processing models typically take a list of tokens as their standard input, and a token is usually an integer ID that a word (or word piece) maps to, we need a vocabulary to handle that mapping. The GPT-2 tokenizer (loaded together with the model in the step below) does this for us. First, let's define a helper function that converts text into token IDs:

# this function converts text to tokens
def tokenize(text):
    """
    tokenize input text using GPT2 tokenizer

    Parameters:
      text, str - input text
    Returns:
      input_ids - np.array with input token ids
      attention_mask - np.array with 0 where padding should be and 1 where the original tokens are located; represents the attention mask for the model
    """

    inputs = tokenizer(text, return_tensors="np")
    return inputs["input_ids"], inputs["attention_mask"]
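
To make the word-to-ID mapping concrete, here is a small illustrative check (a sketch, assuming the GPT-2 tokenizer has already been loaded as shown in the model-download step below; the example string and printed IDs are only indicative):

# quick sanity check of the tokenizer mapping (illustrative example)
ids, mask = tokenize("Hello world")
print(ids)   # something like [[15496, 995]] - one integer ID per token
print(mask)  # [[1, 1]] - all ones, since no padding was added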

Among them, eos_token is a special token indicating that generation is complete. We store the index of this token so that it can be used for padding at a later stage.

eos_token_id = tokenizer.eos_token_id

Define the softmax layer. Since the GPT-2 model's inference results come in the form of logits, we need a softmax function to convert the top-k logits into a probability distribution, so that the most probable inference result can be selected when choosing the final text prediction.

import numpy as np


def softmax(x):
    e_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    summation = e_x.sum(axis=-1, keepdims=True)
    return e_x / summation

Determine the minimum sequence length. If the minimum sequence length has not been reached yet, the following code reduces the probability of the eos token so that generation of the next word continues.

def process_logits(cur_length, scores, eos_token_id, min_length=0):
    """
    reduce probability for padded indices

    Parameters:
      cur_length - current length of input sequence
      scores - model output logits
      eos_token_id - index of end of string token in model vocab
      min_length - minimum length for applying postprocessing
    """
    if cur_length < min_length:
        scores[:, eos_token_id] = -float("inf")
    return scores

Top-K sampling. In Top-K sampling, we keep only the K most likely next words and redistribute the probability mass among those K words only.

def get_top_k_logits(scores, top_k):
    """
    perform top-k sampling

    Parameters:
      scores - model output logits
      top_k - number of elements with highest probability to select
    """
    filter_value = -float("inf")
    top_k = min(max(top_k, 1), scores.shape[-1])
    top_k_scores = -np.sort(-scores)[:, :top_k]
    indices_to_remove = scores < np.min(top_k_scores)
    filtered_scores = np.ma.array(scores, mask=indices_to_remove,
                                  fill_value=filter_value).filled()
    return filtered_scores
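
To see what this filtering does, here is a tiny illustration with made-up logits (hypothetical values, not real model output): after combining get_top_k_logits with softmax, only the two highest-scoring entries keep non-zero probability.

# illustrative only: 4 fake logits, keep the top 2
demo_logits = np.array([[1.0, 3.0, 0.5, 2.0]])
filtered = get_top_k_logits(demo_logits, top_k=2)
print(softmax(filtered))  # roughly [[0.0, 0.73, 0.0, 0.27]] - mass split between the top 2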
  • Load the model and convert to OpenVINO IR format

 

Download the model.

from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
pt_model = GPT2LMHeadModel.from_pretrained('gpt2')

The running result is shown in the figure below

Here we use the open-source GPT-2 model from Hugging Face. We first convert the original PyTorch model to ONNX so that it can then be optimized and its inference accelerated with OpenVINO. We use the Hugging Face Transformers export functionality to produce the ONNX model; see the Hugging Face documentation for more details on exporting Transformers models to ONNX. The ONNX model file is then converted into an OpenVINO IR model file by OpenVINO's model optimizer (MO).

from pathlib import Path
from openvino.runtime import serialize
from openvino.tools import mo
from transformers.onnx import export, FeaturesManager


# define path for saving onnx model
onnx_path = Path("model/gpt2.onnx")
onnx_path.parent.mkdir(exist_ok=True)

# define path for saving openvino model
model_path = onnx_path.with_suffix(".xml")

# get model onnx config function for output feature format causal-lm
model_kind, model_onnx_config = FeaturesManager.check_supported_model_or_raise(pt_model, feature='causal-lm')

# fill onnx config based on pytorch model config
onnx_config = model_onnx_config(pt_model.config)

# convert model to onnx
onnx_inputs, onnx_outputs = export(tokenizer, pt_model, onnx_config, onnx_config.default_onnx_opset, onnx_path)

# convert model to openvino
ov_model = mo.convert_model(onnx_path, compress_to_fp16=True, input="input_ids[1,1..128],attention_mask[1,1..128]")

# serialize openvino model
serialize(ov_model, str(model_path))

Then load the OpenVINO IR model onto the CPU for inference:

from openvino.runtime import Core

# initialize openvino core
core = Core()

# read the model and corresponding weights from file
model = core.read_model(model_path)

# compile the model for CPU devices
compiled_model = core.compile_model(model=model, device_name="CPU")

# get output tensors
output_key = compiled_model.output(0)

In the case of GPT-2, the model input has the shape [batch size, sequence length] and the output has the shape [batch size, sequence length, vocab size].
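
If you want to verify this, a minimal check of the compiled model's input and output shapes could look like the sketch below (the 1..128 bounds follow from the input specification passed to mo.convert_model above; the exact printed form may vary between OpenVINO versions):

# print input/output names and (partial) shapes of the compiled model
for model_input in compiled_model.inputs:
    print(model_input.any_name, model_input.partial_shape)   # input_ids / attention_mask: [1,1..128]
print(compiled_model.output(0).partial_shape)                # [1,1..128,50257] - 50257 is the GPT-2 vocab size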

  • Define the text generation main function

Next, we define the main function for text generation:

def generate_sequence(input_ids, attention_mask, max_sequence_length=128,
                      eos_token_id=eos_token_id, dynamic_shapes=True):
    """
    text prediction cycle.

    Parameters:
      input_ids: tokenized input ids for model
      attention_mask: attention mask for model
      max_sequence_length: maximum sequence length for stop iteration
      eos_token_id: end of sequence index from vocab
      dynamic_shapes: use dynamic shapes for inference or pad model input to max_sequence_length
    Returns:
      predicted token ids sequence
    """
    while True:
        cur_input_len = len(input_ids[0])
        if not dynamic_shapes:
            pad_len = max_sequence_length - cur_input_len
            model_input_ids = np.concatenate((input_ids, [[eos_token_id] * pad_len]), axis=-1)
            model_input_attention_mask = np.concatenate((attention_mask, [[0] * pad_len]), axis=-1)
        else:
            model_input_ids = input_ids
            model_input_attention_mask = attention_mask
        outputs = compiled_model({"input_ids": model_input_ids, "attention_mask": model_input_attention_mask})[output_key]
        next_token_logits = outputs[:, cur_input_len - 1, :]
        # pre-process distribution
        next_token_scores = process_logits(cur_input_len,
                                           next_token_logits, eos_token_id)
        top_k = 20
        next_token_scores = get_top_k_logits(next_token_scores, top_k)
        # get next token id
        probs = softmax(next_token_scores)
        next_tokens = np.random.choice(probs.shape[-1], 1,
                                       p=probs[0], replace=True)
        # break the loop if max length or end of text token is reached
        if cur_input_len == max_sequence_length or next_tokens == eos_token_id:
            break
        else:
            input_ids = np.concatenate((input_ids, [next_tokens]), axis=-1)
            attention_mask = np.concatenate((attention_mask, [[1] * len(next_tokens)]), axis=-1)
    return input_ids
  • Run the main function to generate text

Finally, let's enter a sentence and then let the GPT-2 model generate a passage of text based on it:

import time
text = "Deep learning is a type of machine learning that uses neural networks"
input_ids, attention_mask = tokenize(text)

start = time.perf_counter()
output_ids = generate_sequence(input_ids, attention_mask)
end = time.perf_counter()
output_text = " "
# Convert IDs to words and make the sentence from it
for i in output_ids[0]:
    output_text += tokenizer.convert_tokens_to_string(tokenizer._convert_id_to_token(i))
print(f"Generation took {end - start:.3f} s")
print("Input Text: ", text)
print()
print(f"Predicted Sequence:{output_text}")
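
As a side note, an equivalent and arguably simpler way to turn the generated IDs back into text (a sketch, assuming the same tokenizer object) is to use the tokenizer's public decode method instead of the per-token loop above:

# decode the whole sequence of generated IDs in one call
output_text = tokenizer.decode(output_ids[0])
print(f"Predicted Sequence: {output_text}")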

The running result is shown in the figure below:

As you can see, even our AI development board, which costs less than 1,000 yuan, fully supports running the GPT-2 model for text generation. This also shows that running generative workloads at the edge (such as the text generation introduced here) is an entirely achievable task, and staying as close as possible to where the data is generated is key to the scalability and real-world deployment of artificial intelligence.

Summary

That's it for the whole process! Now follow the code and steps we have provided and try optimizing and accelerating GPT-2 with OpenVINO on your own edge device.

For more information about the Intel OpenVINO™ open-source toolkit, including the more than 300 validated and optimized pre-trained models we provide, please visit https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/overview.html

In addition, to help everyone understand and quickly master OpenVINO™, we also provide a series of open-source Jupyter notebook demos. By running these notebooks, you can quickly learn how to use OpenVINO™ in different scenarios to accomplish a range of tasks, including computer vision, speech, and natural language processing. The OpenVINO™ notebooks can be downloaded and installed from GitHub here: https://github.com/openvinotoolkit/openvino_notebooks.


Origin blog.csdn.net/qq_29788741/article/details/130072298