Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Content-Type: application/json
Bearer authentication: Bearer {{API key}}
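Assembled in code, these two headers might look like the following sketch (the key value is a placeholder, not a real token):

```python
# API_KEY is a placeholder; substitute your real auth token.
API_KEY = "sk-example-token"

headers = {
    # Bearer authentication: the token goes after the word "Bearer".
    "Authorization": f"Bearer {API_KEY}",
    # Request and response bodies are JSON.
    "Content-Type": "application/json",
}
```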
Model ID
"gpt-4"
Conversation messages
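A minimal request body pairing a model ID with a list of conversation messages could be sketched like this (the field names follow the common chat-completions convention and are assumptions here):

```python
import json

# Minimal request body: a model ID plus the conversation so far.
payload = {
    "model": "gpt-4",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}

# Serialized as the application/json request body.
body = json.dumps(payload)
```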
Sampling temperature between 0 and 2. Higher values (e.g. 0.8) make output more random; lower values (e.g. 0.2) make it more focused and deterministic.
We generally recommend changing either this or top_p, not both.
Required range: 0 <= x <= 2
An alternative to temperature, called nucleus sampling: the model considers only the tokens whose cumulative probability mass is top_p. So 0.1 means only the tokens comprising the top 10% of probability mass are considered. We generally recommend changing either this or temperature, not both.
Required range: 0 < x <= 1
How many completions to generate for each prompt. Note: this can consume your token quota quickly. Use with care, and set max_tokens and stop to reasonable values.
Required range: 1 <= x <= 128
Whether to stream partial progress. When enabled, tokens are sent as data-only server-sent events (SSE) as they become available; the stream is terminated by a data: [DONE] message.
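A stream of this shape can be consumed by reading the data-only SSE lines until the [DONE] sentinel. The sketch below parses sample event lines; the chunk payload format is illustrative, not this API's exact schema:

```python
import json

def parse_sse(lines):
    """Yield JSON payloads from data-only SSE lines, stopping at [DONE]."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip comments and keep-alive lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        yield json.loads(data)

# Illustrative stream: two content chunks, then the [DONE] terminator.
sample = [
    'data: {"delta": "Hel"}',
    'data: {"delta": "lo"}',
    "data: [DONE]",
]
chunks = list(parse_sse(sample))
```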
Options for streaming responses. Only set when stream is true.
Up to 4 sequences where the API will stop generating further tokens. Returned text includes the stop sequence.
Maximum number of tokens to generate
If your prompt (prior messages) plus max_tokens would exceed the model context length, behavior depends on context_length_exceeded_behavior. By default, max_tokens is reduced to fit the context window instead of returning an error.
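The default fit-to-context behavior can be sketched as a simple clamp (the token counts are illustrative):

```python
def fit_max_tokens(requested_max, prompt_tokens, context_length):
    """Reduce max_tokens so prompt + completion fits the context window."""
    available = context_length - prompt_tokens
    return max(0, min(requested_max, available))

# A 4096-token context with a 4000-token prompt leaves room for
# at most 96 completion tokens, so a request for 512 is reduced.
effective = fit_max_tokens(512, 4000, 4096)  # 96
```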
Maximum completion tokens
Positive values penalize new tokens based on whether they appear in the text so far, increasing the chance the model talks about new topics.
For mild repetition reduction, try roughly 0.1 to 1. For strong suppression, you can increase toward 2, but quality may drop. Negative values encourage repetition.
See also frequency_penalty for penalizing tokens by how often they appear.
Required range: -2 <= x <= 2
Positive values penalize new tokens based on their existing frequency in the text so far, reducing the chance of the model repeating the same line verbatim.
For mild repetition reduction, try roughly 0.1 to 1. For strong suppression, you can increase toward 2, but quality may drop. Negative values encourage repetition.
See also presence_penalty for a fixed penalty on tokens that appear at least once.
Required range: -2 <= x <= 2
Modifies the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps token IDs to bias values between -100 and 100. Mathematically, the bias is added to the logits before sampling. Exact behavior varies by model. For example, "logit_bias": {"1024": 6} increases the likelihood of token ID 1024.
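Because the bias is added to the logits before sampling, its effect can be checked with a small softmax computation (a sketch; the logit values are invented):

```python
import math

def softmax(logits):
    """Convert a {token: logit} map into {token: probability}."""
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

logits = {"1024": 0.0, "2048": 1.0, "4096": 2.0}
bias = {"1024": 6}  # as in "logit_bias": {"1024": 6}

p_before = softmax(logits)
# Add the bias to the logits before sampling.
biased = {t: v + bias.get(t, 0) for t, v in logits.items()}
p_after = softmax(biased)
# Token 1024 is now far more likely than before.
```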
Tools the model may call. Currently only functions are supported. Provide a list of functions the model can generate JSON arguments for.
See the function calling guide for more.
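A function tool is typically declared with a name, a description, and a JSON Schema for its arguments. A minimal sketch (the function name and schema are invented for illustration):

```python
# One function tool the model may call; the model generates JSON
# arguments matching the "parameters" schema. get_weather is hypothetical.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]
```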
Available options: none, auto, required
If specified, the system will try to sample deterministically, so that repeated requests with the same seed and parameters return the same result.
Reasoning effort (for models that support reasoning)
Available options: low, medium, high
Output modalities
Available options: text, audio
Penalty on repeated tokens, to discourage or encourage repetition. A value of 1.0 means no penalty: repetition is unpenalized. Values above 1.0 penalize repetition; values between 0.0 and 1.0 reward it. A balanced value is often around 1.2. The penalty applies to generated output and, in decoder-only models, to the prompt as well.
Required range: 0 < x < 2
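One common convention for applying such a penalty (used by several inference libraries; not necessarily this API's exact formula) divides positive logits of already-seen tokens by the penalty and multiplies negative ones by it, so values above 1.0 suppress repeats:

```python
def apply_repetition_penalty(logits, seen_tokens, penalty):
    """Penalize tokens that already appeared (CTRL-style convention)."""
    out = dict(logits)
    for tok in seen_tokens:
        if tok in out:
            v = out[tok]
            # Shrinking a positive logit, or scaling up a negative one,
            # lowers the token's probability when penalty > 1.0.
            out[tok] = v / penalty if v > 0 else v * penalty
    return out

logits = {"the": 2.0, "cat": 1.0, "sat": -0.5}
penalized = apply_repetition_penalty(logits, {"the", "sat"}, 1.2)
# "the" drops to 2.0 / 1.2, "sat" to -0.5 * 1.2; "cat" is unchanged.
```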
Top-k sampling keeps only the k most likely next tokens and redistributes probability mass among them. k controls how many candidates are considered per step.
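The top-k step just described can be sketched as keeping the k highest logits and renormalizing (illustrative values):

```python
import math

def top_k_filter(logits, k):
    """Keep the k most likely tokens and renormalize their probabilities."""
    kept = dict(sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k])
    m = max(kept.values())
    exps = {t: math.exp(v - m) for t, v in kept.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

# With k=2, only the two highest-logit tokens survive; their
# probabilities are renormalized to sum to 1.
probs = top_k_filter({"a": 3.0, "b": 2.0, "c": 0.1, "d": -1.0}, k=2)
```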
Required range: 0 < x < 128
Minimum relative probability for a token to be considered, compared to the most likely token.
Required range: 0 <= x <= 1
Whether to return log probabilities of the output tokens. If true, returns the log probability of each output token in the message content.
An integer between 0 and 20 specifying how many of the most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.
Required range: 0 <= x <= 20