Models - OpenAI API Quick Start

Overview

The OpenAI API is powered by a range of models with different capabilities and price points. You can also customize our base model for your specific use case with fine-tuning.

The main model families are:

  • GPT-3: A set of models that can understand and generate natural language.
  • Codex (limited beta): A set of models that can understand and generate code, including translating natural language into code.
  • Content filter: A fine-tuned model that can detect whether text may be sensitive or unsafe.


GPT-3

GPT-3 models can understand and generate natural language. We offer four main models with different levels of capability suitable for different tasks. Davinci is the most capable model, and Ada is the fastest.

  • text-davinci-003: The most capable GPT-3 model. Can do any task the other models can do, often with higher quality, longer output, and better instruction-following. Also supports inserting completions within text. Max request: 4,000 tokens. Training data: up to June 2021.
  • text-curie-001: Very capable, but faster and lower cost than Davinci. Max request: 2,048 tokens. Training data: up to October 2019.
  • text-babbage-001: Capable of straightforward tasks, very fast, and lower cost. Max request: 2,048 tokens. Training data: up to October 2019.
  • text-ada-001: Capable of very simple tasks, usually the fastest model in the GPT-3 series, and lowest cost. Max request: 2,048 tokens. Training data: up to October 2019.

While Davinci is generally the most capable model, the other models can perform certain tasks extremely well, with significant speed or cost advantages. For example, Curie can perform many of the same tasks as Davinci, but faster and at one-tenth the cost.

We recommend using Davinci while experimenting, since it produces the best results. Once you have things working, we encourage trying the other models to see if you can get the same results with lower latency. You can also improve the other models' performance by fine-tuning them on a specific task.
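One quick way to run such an experiment is to send the same prompt to each model and compare the outputs by eye. A minimal sketch with the openai Python client (the prompt and settings here are only illustrative):

import openai

# Run the same prompt through each main GPT-3 model to compare quality,
# latency, and cost. Assumes OPENAI_API_KEY is set in the environment.
prompt = "Summarize in one sentence: The API offers several models at different price points."

for model in ["text-davinci-003", "text-curie-001", "text-babbage-001", "text-ada-001"]:
    response = openai.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=60,
        temperature=0,  # near-deterministic output makes comparisons fairer
    )
    print(model, "->", response["choices"][0]["text"].strip())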

Feature-specific models

The main GPT-3 models are intended to be used with the text completion endpoint. We also offer models specifically intended for other endpoints.

Older versions of our GPT-3 models are available as davinci, curie, babbage, and ada. These are intended to be used with our fine-tuning endpoints. Learn more.

Our endpoints for creating embeddings and editing text use their own dedicated sets of models.
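For example, a call to the embeddings endpoint uses an embedding-specific model rather than one of the main GPT-3 completion models. A minimal sketch (text-embedding-ada-002 is one such dedicated model; substitute whichever embedding model you are using):

import openai

# The embeddings endpoint has its own dedicated models.
response = openai.Embedding.create(
    model="text-embedding-ada-002",
    input="The food was delicious and the waiter was friendly.",
)
embedding = response["data"][0]["embedding"]  # a list of floats
print(len(embedding))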


Davinci

Davinci is the most capable model family and can perform any task the other models can perform, often with fewer instructions. For applications requiring a deep understanding of the content, such as summarization for a specific audience and creative content generation, Davinci produces the best results. These increased capabilities require more compute resources, so Davinci costs more per API call and is not as fast as the other models.

Another area where Davinci shines is in understanding the intent of text. Davinci is quite good at solving many kinds of logic problems and explaining the motives of characters. Davinci has been able to solve some of the most challenging AI problems involving cause and effect.

Good at: complex intent, cause and effect, summarization for an audience

Curie

Curie is extremely powerful, yet very fast. While Davinci is stronger when it comes to analyzing complicated text, Curie is quite capable of many nuanced tasks such as sentiment classification and summarization. Curie is also quite good at answering questions, performing Q&A, and serving as a general-purpose service chatbot.

Good at: language translation, complex classification, text sentiment, summarization
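For instance, a sentiment classification task can be phrased so Curie only has to emit a short label; a sketch (the prompt wording is illustrative):

import openai

# Sentiment classification: a nuanced-but-simple task that suits Curie.
response = openai.Completion.create(
    model="text-curie-001",
    prompt='Classify the sentiment of this tweet as Positive, Neutral, or Negative.\n\n'
           'Tweet: "I loved the new Batman movie!"\n'
           'Sentiment:',
    max_tokens=3,  # the label is only a token or two
    temperature=0,
)
print(response["choices"][0]["text"].strip())  # e.g. "Positive"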

Babbage

Babbage can perform straightforward tasks such as simple classification. It is also quite capable at semantic search, ranking how well documents match up with search queries.

Good at: moderate classification, semantic search classification

Ada

Ada is generally the fastest model and can perform tasks such as parsing text, address correction, and certain classification tasks that don't require much nuance. Ada's performance can often be improved by providing more context.

Good at: parsing text, simple classification, address correction, keywords

Note: Any task that can be performed by a faster model like Ada can also be performed by a more capable model like Curie or Davinci.

OpenAI models are non-deterministic, meaning that identical inputs can yield different outputs. Setting temperature to 0 will make the outputs mostly deterministic, but a small amount of variability may remain.
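For example, a completion request pinned to temperature 0 (the model and prompt here are illustrative) will usually, though not always, return the same text on repeated calls:

import openai

# With temperature=0 the model greedily picks the most likely token,
# so repeated calls with the same prompt usually return identical text.
for _ in range(2):
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt="Q: What is the capital of France?\nA:",
        max_tokens=5,
        temperature=0,
    )
    print(response["choices"][0]["text"].strip())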

Find the right model

Experimenting with Davinci is a great way to find out what the API is capable of. After you have an idea of what you want to accomplish, you can stay with Davinci if you're not concerned about cost and speed, or move to Curie or another model and try to optimize around its capabilities.

You can use the GPT comparison tool, which lets you run different models side by side to compare outputs, settings, and response times, and then download the data as a .xls Excel spreadsheet.


Consider Semantic Search

For tasks involving classification, where you're trying to find which label best fits a selection of text, you can often get great performance out of the different models with semantic search. Semantic search uses a model to score how closely different blocks of text relate to a query. By narrowing the model's focus to evaluating the query's relationship to different chunks of text, rather than presenting the task as a generative one, faster models can often perform beyond their usual level.
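One way to sketch this idea is with embeddings: score each candidate label against the query by cosine similarity and pick the best match. This is an illustrative reimplementation, not the dedicated search endpoint; the model name and label set are assumptions:

import openai
import numpy as np

def embed(text):
    # Embed text with a dedicated embeddings model (name may vary over time).
    response = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(response["data"][0]["embedding"])

def classify(query, labels):
    # Score each label by cosine similarity to the query; return the best match.
    q = embed(query)
    def score(label):
        v = embed(label)
        return float(q @ v) / (np.linalg.norm(q) * np.linalg.norm(v))
    return max(labels, key=score)

print(classify("My package never arrived", ["shipping issue", "billing issue", "product question"]))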


Codex

Limited beta

The Codex models are descendants of our GPT-3 models that can understand and generate code. Their training data contains both natural language and billions of lines of public code from GitHub. Learn more.

They are most capable in Python and proficient in over a dozen languages, including JavaScript, Go, Perl, PHP, Ruby, Swift, TypeScript, SQL, and even Shell.

We currently offer two Codex models:

  • code-davinci-002: The most capable Codex model. Particularly good at translating natural language into code. In addition to completing code, it also supports inserting completions within code. Max request: 8,000 tokens. Training data: up to <month> 2021.
  • code-cushman-001: Almost as capable as Davinci Codex, but slightly faster. This speed advantage may make it preferable for real-time applications. Max request: up to 2,048 tokens.
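As a quick illustration, a Codex request looks just like a GPT-3 completion request, only with a code model and a code-oriented prompt (the prompt here is illustrative):

import openai

# Ask Codex to translate a natural-language description into code.
response = openai.Completion.create(
    model="code-davinci-002",
    prompt='"""\nWrite a Python function that returns the nth Fibonacci number.\n"""\n',
    max_tokens=100,
    temperature=0,
)
print(response["choices"][0]["text"])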

For more information, visit our Codex usage guide.


Content filter

We recommend using the new moderation endpoint instead of the content filter model.

The filter aims to detect generated text from the API that could be sensitive or unsafe. It's currently in beta mode and has three ways of classifying text: as safe, sensitive, or unsafe. The filter will make mistakes, and we have currently built it to err on the side of caution, which results in a higher rate of false positives.

Label descriptions:

  • 0 - Text is safe.
  • 1 - This text is sensitive. This means that the text could be talking about a sensitive topic, something political, religious, or involving a protected class such as race or nationality.
  • 2 - This text is unsafe. This means that the text contains profane language, prejudiced or hateful language, something that could be NSFW, or text that portrays certain groups or people in a harmful manner.

How do you use the filter?

You need to use the completions endpoint with the content-filter-alpha model and the following settings:

  1. max_tokens set to 1
  2. temperature set to 0.0
  3. top_p set to 0
  4. logprobs set to 10
  5. Wrap your prompt in the following way:

"<|endoftext|>[prompt]\n--\nLabel:"

In other words, if you were executing a curl command, it would look like the following (substituting the variables in []):

curl https://api.openai.com/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
  "model": "content-filter-alpha",
  "prompt": "<|endoftext|>[prompt]\n--\nLabel:",
  "max_tokens": 1,
  "temperature": 0.0,
  "top_p": 0,
  "logprobs": 10,
  "user": "1"
}'

Or via our openai Python client:

import openai

content_to_classify = "Your content here"

response = openai.Completion.create(
    model="content-filter-alpha",
    prompt="<|endoftext|>" + content_to_classify + "\n--\nLabel:",
    temperature=0,
    max_tokens=1,
    top_p=0,
    logprobs=10,
)

Importantly, you need to check not only the label the filter returns (0, 1, or 2), but sometimes also the logprobs associated with these labels.

If the filter returns 0 or 1, you should accept that as the filter's outcome. If the filter returns 2, you should accept this outcome only if its logprob is greater than -0.355.

If the logprob for 2 is beneath -0.355 (for example, -0.4), then you should use as output whichever of 0 or 1 has a logprob closer to 0.

Here is Python code that accomplishes this, working from the response returned above:

output_label = response["choices"][0]["text"]

# This is the probability at which we evaluate that a "2" is likely real
# vs. should be discarded as a false positive
toxic_threshold = -0.355

if output_label == "2":
    # If the model returns "2", check its confidence in 2 vs. other output labels
    logprobs = response["choices"][0]["logprobs"]["top_logprobs"][0]

    # If the model is not sufficiently confident in "2",
    # choose the most probable of "0" or "1".
    # Guaranteed to have a confidence for 2 since this was the selected token.
    if logprobs["2"] < toxic_threshold:
        logprob_0 = logprobs.get("0", None)
        logprob_1 = logprobs.get("1", None)

        # If both "0" and "1" have probabilities, set the output label
        # to whichever is most probable
        if logprob_0 is not None and logprob_1 is not None:
            if logprob_0 >= logprob_1:
                output_label = "0"
            else:
                output_label = "1"

        # If only one of them is found, set output label to that one
        elif logprob_0 is not None:
            output_label = "0"
        elif logprob_1 is not None:
            output_label = "1"

        # If neither "0" nor "1" is available, stick with "2"
        # by leaving output_label unchanged.

# If the most probable token is none of "0", "1", or "2",
# this should be treated as unsafe
if output_label not in ["0", "1", "2"]:
    output_label = "2"

We generally recommend not returning to end users any completion that the content filter has flagged with an output of 2. One approach is to regenerate from the initial prompt that caused the 2-completion and hope the next output is safer. Another approach is to alert the end user that you are unable to return this completion and steer them toward suggesting a different input.
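A minimal sketch of the regenerate-and-recheck approach; classify_content is a hypothetical helper wrapping the filter logic shown above, and the model, settings, and retry budget are illustrative:

import openai

MAX_ATTEMPTS = 3  # illustrative retry budget

def safe_completion(prompt):
    # Regenerate from the original prompt until the filter stops returning "2".
    for _ in range(MAX_ATTEMPTS):
        completion = openai.Completion.create(
            model="text-davinci-003",
            prompt=prompt,
            max_tokens=60,
        )["choices"][0]["text"]
        # classify_content is a hypothetical wrapper around the filter code above.
        if classify_content(completion) != "2":
            return completion
    # Give up: tell the user this completion can't be returned and
    # invite a different input, rather than showing unsafe output.
    return None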

Are there any costs associated with using the content filter?

No, the content filter is free to use.

How can I adjust the certainty threshold?

You can adjust the filter's threshold by only accepting its classification for labels whose certainty level (logprob) is above a threshold you determine. However, this is generally not recommended.

If you want a more conservative implementation of the content filter, you can return 2 for anything with an output_label of "2", rather than accepting only certain logprob values.
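In code, the conservative variant simply skips the logprob threshold check, so every "2" stays "2" (a sketch against the response shape used above):

output_label = response["choices"][0]["text"]

# Conservative variant: treat any "2" as unsafe outright; no logprob
# threshold, no falling back to "0" or "1".
if output_label not in ["0", "1", "2"]:
    output_label = "2"  # unexpected tokens are also treated as unsafe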

How can I personalize the filter?

For now, we do not support fine-tuning individual filters. However, we're still looking for data to improve the filter, and we would appreciate it if you sent us data that triggers the filter in unexpected ways.

On which prompts should I expect performance degradation?

The filter currently has a harder time parsing prompts with strange formatting. So if a prompt has lots of line breaks, unusual formatting, repeated words, etc., the model may misclassify it more often. It also has lower performance on certain kinds of text, such as fiction, poetry, code, etc.

Additionally, similar to the API, the filter does not have a knowledge base of events after 2019. As a result, it performs poorly at identifying political, sensitive, and harmful content that requires knowledge of things that happened after 2019.

Prompt engineering tips

If you're concerned about unsafe or sensitive outputs, in addition to deciding at what level to use the content filter, you should try designing your prompts to elicit responses appropriate for your use case. Here we'll explore how to get the model to be polite when acting as a customer service representative. We'll use both examples of polite responses and the keyword "polite" to try to elicit the kind of responses we want. We generated three completions for each prompt to see the impact. Both the examples and the keyword "polite" seem to have a meaningful effect.

(Note: we chose the word "polite" for our experiments because it gave us good results for our customer-service-bot use case. We recommend keeping your intended use case and audience in mind when choosing words to steer the model, and experimenting with the results different steering words might give you.)

With no engineering, an impolite customer is met with vitriol.

Three example generations with this prompt; the worst of the three is shown below.

Customer: I need my

Response: My manager asked me to leave at my normal hour because this was too much

Adding the word "polite" helps somewhat

Three example generations adding the word "polite". There is sarcasm in responses 2 and 3, but it is no longer a string of vitriol. There are no examples in the context for any of these generations.

Customer: I need my

Polite Response: I know the Internet is important to you. I'll have someone take care of this within the next 24 hours.

'''

Customer: I need my

Polite Response: Sorry for the wait. We're all on pins and needles until you have cat GIFs again. Would an hour be too long to bear?

'''

Customer: I need my

Polite Response: It would be a pleasure to fix your internet. Can I have a technician call you when he arrives so you can visit the internet factory?

Adding three examples of polite responses helps a lot

With a context that includes three examples of polite responses, three completions were generated. The responses aren't exactly helpful, but they are all polite.

Customer: me

'''

Customer:

'''

Customer: You sent me the wrong color shirt. I ordered a black one, and this one is clearly white. What do you want me to do with this? Send it back, burn it in effigy, use it to strangle endangered animals?

Polite Response: I apologize for the inconvenience. I can confirm on record that you ordered a black shirt. We will ship the correct shirt to you by the end of the day via UPS Ground; it should arrive in 2-3 days. I have included a FedEx return label for the shirt you received. Please return it within the next 30 days so that we won't have to charge you for a second shirt.

'''

Customer: I need my

Polite Response: Sorry for the delay. The technician is on their way and should arrive in 10-15 minutes. I have canceled the order and credited your account to make up for the inconvenience. You will also receive a free 3-month subscription to our streaming service for the trouble.

'''

(Context not shown for brevity, but same as above)

Polite Response: We apologize for the inconvenience. The technician is behind schedule due to heavy traffic today, but he is expected to be at your location within 30 minutes.

'''

(Context not shown for brevity, but same as above)

Polite Response: Our team is aware of an issue in your area, and our engineers are troubleshooting it now. I've created a new ticket for you with an estimated time to resolution. You can check the status within 24 hours by going to {link}

Removing the word "polite"

When we include the polite examples but remove the word "polite", things go completely off the rails in the first response. Again, it's as if the customer service rep is a troll, and a particularly funny one at that.

(The rest of the context is the same as above, except "Polite Response" has been replaced with "Response"; not shown for brevity)

Customer: I need my

Response: Our field technicians report that all their trucks were taken by a low-level drug cartel affiliated with the adjacent jail. As a gesture of good faith and an apology, our CEO has asked us to pay for an extraction team to be airlifted in and flown to your home. The charge will be put on your card on file; I just need you to verify the number for me.

'''

(Context not shown for brevity, but same as above)

Response: I apologize for the delay; we are experiencing a network outage in your area. I have communicated your situation to our engineering team, and they are working on it now.

'''

(Context not shown for brevity, but same as above)

Response: Sorry for the wait. I have canceled your appointment with Tech Support and issued a full refund in the form of an Amazon gift card.

Conclusion

Here we see some evidence that both examples and keywords like "polite" can have a large impact on the unsafety/sensitivity of model outputs when the model is exposed to unsafe/sensitive inputs. Anecdotally, we also found that the model has a fairly solid concept of "reasonable", and we think that keyword might also be useful in contexts like this.

Appendix: Notes

  1. We used the suggested settings for Chat: Temperature = 0.9, response length = 150, Top P = 1
  2. The actual swear word is replaced with
  3. "Polite" has its own cultural biases and baggage. Certain forms of expressing yourself that might be prevalent in certain vernaculars of English could be considered impolite. It is useful to think of this as prompt engineering to steer the model away from being impolite.
