Model name to use. See the model catalog on TokensMind AI for available names.
Prompt used to generate the completion (may be a string, array of strings, array of tokens, or array of token arrays).
Defaults to 1. Generates best_of completions on the server and returns the "best" one (highest per-token log probability). Cannot be used with streaming.
When used with n, best_of is the number of candidate completions and n is how many to return. best_of must be greater than n.
Note: This can consume your token quota quickly. Use with care and set max_tokens and stop reasonably.
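As a rough sketch (not the service's actual implementation), server-side best_of selection can be pictured as ranking candidates by mean per-token log probability and returning the top n; the function and data below are hypothetical:

```python
def pick_best(candidates, n=1):
    """Rank candidate completions by mean per-token log probability
    (higher is better) and return the top n, best first.
    Each candidate is a (text, token_logprobs) pair."""
    ranked = sorted(candidates,
                    key=lambda c: sum(c[1]) / len(c[1]),
                    reverse=True)
    return [text for text, _ in ranked[:n]]

# best_of = 3 candidates generated server-side, n = 1 returned to the caller
candidates = [
    ("completion A", [-0.1, -0.3, -0.2]),  # mean logprob -0.2
    ("completion B", [-1.0, -0.8, -1.2]),  # mean logprob -1.0
    ("completion C", [-0.4, -0.5, -0.3]),  # mean logprob -0.4
]
print(pick_best(candidates, n=1))  # ['completion A']
```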
Maximum number of tokens to generate. The sum of prompt tokens and max_tokens must not exceed the model's context length.
Sampling temperature, between 0 and 2; defaults to 1. Higher values (e.g. 0.8) make output more random; lower values (e.g. 0.2) make it more focused and deterministic.
We generally recommend changing either this or top_p, not both.
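To see why temperature behaves this way, here is a minimal sketch of temperature-scaled softmax (the standard formulation; the specific logits are illustrative): dividing logits by a low temperature sharpens the distribution, while a high temperature flattens it.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by temperature.
    Low temperature -> sharper (more deterministic) distribution,
    high temperature -> flatter (more random) distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.2)  # top token dominates
hot = softmax_with_temperature(logits, 1.5)   # probability spread out
```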
An alternative to temperature, called nucleus sampling: the model considers only the tokens comprising the top_p cumulative probability mass, so 0.1 means only the top 10% probability mass is considered. We generally recommend changing either this or temperature, not both. Valid range: 0 < x <= 1.
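A minimal sketch of nucleus (top_p) filtering, assuming token probabilities are already computed (the function is illustrative, not the service's code): sort tokens by probability, keep the smallest set whose cumulative mass reaches top_p, and renormalize.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p; return {token_index: renormalized_prob}."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

# 0.5 alone is below 0.7; adding 0.3 reaches it, so tokens 0 and 1 survive
print(top_p_filter([0.5, 0.3, 0.15, 0.05], top_p=0.7))
```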
How many completions to generate for each prompt. Note: This parameter can generate many completions and may consume your token quota quickly. Use with care and set max_tokens and stop to reasonable values. Valid range: 0 < x < 128.
Whether to stream tokens. Defaults to false. When enabled, tokens are sent as data-only server-sent events (SSE), ending with a data: [DONE] message.
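A simplified sketch of consuming such a stream (real event payloads are JSON objects with more structure; the plain-string payloads here are a stand-in):

```python
def parse_sse_stream(lines):
    """Collect the payloads of data-only SSE events,
    stopping at the terminating [DONE] sentinel."""
    chunks = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # ignore comments, blank keep-alives, etc.
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunks.append(payload)
    return chunks

stream = ["data: Hel", "data: lo", "data: [DONE]"]
print(parse_sse_stream(stream))  # ['Hel', 'lo']
```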
Up to 4 sequences where the API will stop generating further tokens. Returned text includes the stop sequence.
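The truncation behavior described above can be sketched as follows (a hypothetical helper that keeps the stop sequence in the returned text, matching the behavior stated here):

```python
def apply_stop(text, stop_sequences):
    """Truncate text at the earliest-ending stop sequence match.
    Per the behavior described above, the stop sequence itself is kept."""
    cut = len(text)
    for s in stop_sequences:
        pos = text.find(s)
        if pos != -1:
            cut = min(cut, pos + len(s))
    return text[:cut]

print(apply_stop("Answer: 42\nNext question", ["\n", "END"]))  # 'Answer: 42\n'
```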
Streaming options. Only set when stream is true.
If specified, the system will try to sample deterministically so repeated requests with the same seed and parameters return the same result.
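The idea can be illustrated with a toy seeded sampler (purely illustrative; real inference determinism also depends on hardware and backend details, which is why the system only "tries" to be deterministic):

```python
import random

def toy_sample(seed, probs, steps):
    """Toy sampler: with the same seed and parameters,
    repeated runs pick the same token sequence."""
    rng = random.Random(seed)
    return [rng.choices(range(len(probs)), weights=probs)[0]
            for _ in range(steps)]

probs = [0.6, 0.3, 0.1]
first = toy_sample(42, probs, 5)
second = toy_sample(42, probs, 5)  # identical to the first run
```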
Defaults to 0. Positive values penalize new tokens based on their frequency in the text so far, reducing repetition. Valid range: -2 < x < 2.
For mild repetition reduction, try roughly 0.1 to 1. For strong suppression, you can increase toward 2, but quality may drop. Negative values encourage repetition.
See also presence_penalty for a fixed penalty on tokens that appear at least once.
Defaults to 0. Positive values penalize new tokens based on whether they appear in the text so far, encouraging new topics. Valid range: -2 < x < 2.
For mild repetition reduction, try roughly 0.1 to 1. For strong suppression, you can increase toward 2, but quality may drop significantly. Negative values encourage repetition.
See also frequency_penalty for frequency-based penalization.
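The difference between the two penalties can be sketched as a logit adjustment (this mirrors the commonly documented formulation; the function and numbers are illustrative): frequency_penalty scales with how often a token has appeared, while presence_penalty is a flat deduction for any token seen at least once.

```python
from collections import Counter

def penalize(logits, generated_tokens, frequency_penalty=0.0, presence_penalty=0.0):
    """Adjust next-token logits: subtract count * frequency_penalty and,
    for tokens seen at least once, a flat presence_penalty."""
    counts = Counter(generated_tokens)
    return [
        logit
        - counts[tok] * frequency_penalty
        - (counts[tok] > 0) * presence_penalty
        for tok, logit in enumerate(logits)
    ]

logits = [2.0, 2.0, 2.0]
# token 0 appeared 3 times, token 1 once, token 2 never
adjusted = penalize(logits, [0, 0, 0, 1],
                    frequency_penalty=0.5, presence_penalty=0.4)
# token 0 is penalized most; token 2 is untouched
```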
Penalty applied to repeated tokens to discourage or encourage repetition. 1.0 means no penalty and free repetition; values above 1.0 penalize repetition, while values between 0.0 and 1.0 reward it. A balanced value is often around 1.2. On decoder-only models, the penalty applies to both the prompt and the generated output. Valid range: 0 < x < 2.
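One common implementation of this penalty (the scheme from the CTRL paper, also used by Hugging Face transformers; whether this service uses exactly this formula is an assumption) divides positive logits of already-seen tokens by the penalty and multiplies negative ones:

```python
def apply_repetition_penalty(logits, seen_tokens, penalty=1.2):
    """CTRL-style repetition penalty: for tokens already seen (prompt or
    output), divide positive logits by the penalty and multiply negative
    ones, so penalty > 1.0 discourages repetition and < 1.0 encourages it."""
    out = list(logits)
    for tok in set(seen_tokens):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

print(apply_repetition_penalty([2.0, -1.0, 0.5], seen_tokens=[0, 1], penalty=2.0))
# token 0: 2.0 -> 1.0, token 1: -1.0 -> -2.0, token 2 unchanged
```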
Top-k sampling keeps only the k most likely next tokens and redistributes probability mass among them. k controls how many candidates are considered per step. Valid range: 1 < x < 128.
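A minimal sketch of top-k filtering over precomputed probabilities (illustrative helper, not the service's code):

```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize their probabilities;
    return {token_index: renormalized_prob}."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in order)
    return {i: probs[i] / total for i in order}

# keeps only the two most likely tokens and renormalizes
print(top_k_filter([0.5, 0.3, 0.15, 0.05], k=2))
```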
Minimum relative probability for a token to be considered, compared to the most likely token.
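In other words, a token survives only if its probability is at least min_p times the top token's probability. A sketch, with an illustrative helper and renormalization assumed:

```python
def min_p_filter(probs, min_p):
    """Keep tokens whose probability is at least min_p times the
    probability of the single most likely token, then renormalize."""
    threshold = min_p * max(probs)
    kept = [i for i, p in enumerate(probs) if p >= threshold]
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

# threshold = 0.2 * 0.5 = 0.1, so tokens with p >= 0.1 survive
print(min_p_filter([0.5, 0.3, 0.15, 0.05], min_p=0.2))
```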
Defaults to null. Modifies likelihood of specified tokens in the completion. Accepts a JSON object mapping token IDs to bias values from -100 to 100.
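Mechanically, the bias is added to the model's logits before sampling; a sketch (illustrative helper, assuming logits indexed by token ID):

```python
def apply_logit_bias(logits, logit_bias):
    """Add a per-token bias to the logits before sampling. In the API,
    values near -100 effectively ban a token and values near 100
    effectively force its selection."""
    return [logit + logit_bias.get(tok, 0.0) for tok, logit in enumerate(logits)]

print(apply_logit_bias([1.0, 2.0, 3.0], {2: -100.0}))  # [1.0, 2.0, -97.0]
```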
Returns log probabilities for the top logprobs output tokens at each step, including the probability of the selected token. For example, if logprobs is 5, the API returns the top 5 tokens' logprobs per step.
The maximum value for logprobs is 5.
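The shape of this data can be sketched as follows for a single generation step (a toy illustration of the concept, not the API's response schema):

```python
import math

def top_logprobs(probs, selected, n=5):
    """For one generation step, return the logprobs of the n most likely
    tokens (n <= 5) plus the logprob of the token actually selected."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:n]
    top = {i: math.log(probs[i]) for i in order}
    return top, math.log(probs[selected])

step_probs = [0.5, 0.3, 0.2]
top, chosen = top_logprobs(step_probs, selected=2, n=2)
```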