Using Prompt Engineering to Improve LLM Accuracy in Converting Natural Language to SQL

Large language models (LLMs) have demonstrated an exceptional ability to understand natural language prompts and generate coherent responses. This opens up new possibilities for translating natural language into structured query languages such as SQL. In the past, writing SQL queries required technical expertise, but LLMs now allow anyone to describe what they want in plain English and have the corresponding SQL code generated automatically.

Prompts are crucial when leveraging an LLM to convert natural language into SQL queries. I evaluated SQL query accuracy (execution accuracy) on the Spider dataset using two Anthropic models (Claude Instant 1.1 and Claude 2) with different prompt engineering techniques. At a high level, there are several important considerations for prompt engineering when doing natural language to SQL conversion:

  • Use clear instructions – Simple, unambiguous natural language prompts are easier for the LLM to understand and translate.

  • Provide sufficient context – The LLM needs to understand the semantics of the user's request as well as details of the database schema, such as table and column names.

  • Include examples – Providing a few examples of natural language questions and their corresponding SQL queries can help guide the LLM to generate queries with the correct syntax.

  • Increase accuracy with RAG (Retrieval-Augmented Generation) – Retrieve natural language and SQL examples that are relevant to the user's query and include them in the prompt.

    In this blog, I'll use my benchmark results to help you understand how each of the above prompting strategies affects the accuracy of natural language to SQL conversion.

    Instructions

    When using an LLM to generate SQL from natural language, providing clear instructions in the prompt is critical to controlling the model's output. In my experiments with the Claude models, one strategy that worked well was to use XML tags in the prompts to annotate the different components. XML tags act like instructions that tell the model exactly how to format the SQL. For example, instructing the model to write queries between <SQL></SQL> tags can reduce verbose output. Without this instruction, the Claude models can be very talkative: they tend to explain the SQL structure, which increases post-processing complexity and unnecessarily consumes more output tokens. Similarly, wrapping the schema between <table_schema></table_schema> tags tells the model where the context begins and ends.

"""
Given an input question, use sqlite syntax to generate a sql query by choosing 
one or multiple of the following tables. Write query in between <SQL></SQL>.

For this Problem you can use the following table Schema:
<table_schema>
{table_info}
</table_schema>
            
Please provide the SQL query for this question: 
Question:{input}
Query: 
"""

Database Schema

You need to include the database schema as context for the LLM to generate SQL queries. Typically, a database schema includes table names, column names, column types, primary keys, and foreign keys that indicate table relationships. In my experiments, I tried column names only, column names + foreign keys, and column names + foreign keys + column types. The results show that column names + foreign keys performs best. Foreign keys are particularly useful when queries require joining tables. Adding column types doesn't help and can even produce worse results. This is a bit surprising, and I suspect this behavior may not apply to all models.
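As a sketch of how the best-performing variant (column names + foreign keys) might be rendered into the {table_info} context, here is one way to extract that information from a SQLite database using its PRAGMA statements. The serialization format is my own assumption, not the one used in the benchmarks:

```python
import sqlite3

def schema_context(conn: sqlite3.Connection) -> str:
    """Render each table as its column names plus foreign keys --
    the combination that performed best in the experiments above."""
    lines = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    for table in tables:
        cols = [r[1] for r in conn.execute(f"PRAGMA table_info('{table}')")]
        lines.append(f"{table}({', '.join(cols)})")
        for fk in conn.execute(f"PRAGMA foreign_key_list('{table}')"):
            # Row layout: (id, seq, ref_table, from_col, to_col, ...)
            lines.append(f"  foreign key: {table}.{fk[3]} -> {fk[2]}.{fk[4]}")
    return "\n".join(lines)

# Tiny illustrative database (table names invented, not from Spider).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer(singer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE concert(concert_id INTEGER PRIMARY KEY,
                     singer_id INTEGER REFERENCES singer(singer_id));
""")
print(schema_context(conn))
```

The resulting string drops column types entirely, matching the finding that including them did not help.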

Few-Shot Examples

Few-shot learning involves including a small number of examples in the prompt to demonstrate the desired mapping. For example, a prompt can contain 2–3 pairs of natural language questions and their corresponding SQL statements. When adding these examples, you should also take advantage of XML tags for clear instructions.

"""Given an input question, use sqlite syntax to generate a sql query by choosing 
one or multiple of the following tables. Write query in between <SQL></SQL>.

For this Problem you can use the following table Schema:
<table_schema>
{table_info}
</table_schema>

Below are three example Questions and the corresponding Queries. 
<example>
Question:{question_1}
Query:<SQL>{query_1}</SQL>
</example>
<example>
Question:{question_2}
Query:<SQL>{query_2}</SQL>
</example>
<example>
Question:{question_3}
Query:<SQL>{query_3}</SQL>
</example>
            
Please provide the SQL query for this question: 
Question:{input}
Query: """

My benchmark results show that including 3 examples from the same database schema context yields significant improvements. I encourage you to try more examples to improve the results further.

RAG for Dynamic Few-Shot Examples

While a fixed set of few-shot examples can improve model performance, selecting the most relevant examples to include in the prompt can improve it further. With RAG, you can dynamically select a small number of examples and inject them into the prompt. For example, given the natural language question "Show the transaction type code with the fewest occurrences", the retrieval algorithm could surface the most relevant examples, such as "Show the transaction type description and date if the share count is less than 10" or "Show the description of the transaction type whose code is 'PUR'". My results show that this dynamic few-shot method can significantly improve model performance. Notably, dynamic few-shot learning can close the gap between Claude Instant 1.1 and its more powerful cousin Claude 2.
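The retrieval step can be sketched as follows. A production system would typically embed the questions with a vector model and rank by cosine similarity; to keep this example self-contained, I substitute simple word-overlap (Jaccard) similarity, and the example pool, including its SQL, is invented for illustration rather than taken from Spider:

```python
def select_examples(question, pool, k=3):
    """Pick the k most relevant (question, sql) pairs from the pool
    for dynamic few-shot prompting, ranked by Jaccard word overlap
    (a stand-in for embedding similarity)."""
    q_words = set(question.lower().split())

    def score(pair):
        words = set(pair[0].lower().split())
        return len(q_words & words) / len(q_words | words)

    return sorted(pool, key=score, reverse=True)[:k]

# Hypothetical example pool; questions and SQL are illustrative only.
pool = [
    ("How many singers are there?",
     "SELECT count(*) FROM singer"),
    ("Show the transaction type code with the fewest occurrences",
     "SELECT type_code FROM transactions GROUP BY type_code "
     "ORDER BY count(*) LIMIT 1"),
    ("List concert names ordered by year",
     "SELECT name FROM concert ORDER BY year"),
    ("Show the description of the transaction type whose code is 'PUR'",
     "SELECT description FROM transaction_types WHERE type_code = 'PUR'"),
]

question = "Show the transaction type code with the most occurrences"
chosen = select_examples(question, pool, k=2)
print(chosen[0][0])  # Show the transaction type code with the fewest occurrences
```

The selected pairs would then be formatted into the `<example>` blocks of the few-shot prompt, so each user question gets its own tailored demonstrations.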

Conclusion

In conclusion, prompt engineering is critical when using an LLM to convert natural language into accurate SQL queries. The results of my benchmarks using Claude on the Spider dataset show tangible improvements from techniques such as including schema details, giving clear instructions, and adding a handful of examples. Additionally, retrieval-augmented prompts dynamically select the most relevant few-shot examples to maximize relevance. Through thoughtful prompt design, we can guide LLMs to better understand our natural language intent and unleash the power of SQL for everyone.


Origin blog.csdn.net/rkjava/article/details/135349743