The temperature parameter in large language models (LLMs) controls the randomness of predictions by scaling the logits before the softmax function converts them into probabilities. Mathematically, it influences how the model samples from the probability distribution over possible next tokens.
Given a sequence of tokens, the model predicts the next token by calculating a probability distribution over the vocabulary. It does this by first computing the logits (raw scores) for each token in the vocabulary. These logits are then passed through the softmax function to convert them into a probability distribution.
The softmax function is defined as:

P(x_i) = exp(z_i) / Σ_j exp(z_j)

where z_i is the logit for token i, and P(x_i) is the probability of selecting token i.
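The definition above can be written in a few lines of Python (a minimal sketch using only the standard library; the logit values are illustrative, not from any particular model):

```python
import math

def softmax(logits):
    # Subtract the max logit before exponentiating for numerical stability;
    # this shifts the logits but leaves the resulting probabilities unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
# The probabilities sum to 1, and the largest logit receives the largest share.
```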
The temperature parameter T modifies the logits before they are passed to the softmax function. Mathematically, this is expressed as:

P(x_i) = exp(z_i / T) / Σ_j exp(z_j / T)
Low Temperature:
When the temperature is lower, the logits are divided by a small number, making the differences between them more pronounced. This causes the softmax to become "sharper," meaning the model's predictions become more deterministic, favoring tokens with higher logits more strongly. As T approaches 0, the model's output converges to the most probable token, effectively making it behave like a greedy model.
High Temperature:
When the temperature is higher, the logits are divided by a larger number, reducing the differences between them. This results in a "flatter" softmax distribution, where probabilities are more evenly spread out across the tokens. As a consequence, the model becomes more random and is more likely to pick less probable tokens, increasing creativity and variability in the output.
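The sharpening and flattening described above can be made concrete with a small self-contained sketch (the logits are illustrative; any set of unequal values shows the same effect):

```python
import math

def softmax_with_temperature(logits, T):
    # Divide each logit by T, then apply a numerically stable softmax.
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 0.5))  # sharper: the top token dominates
print(softmax_with_temperature(logits, 2.0))  # flatter: probabilities more even
```

Running this, the probability of the highest-logit token is larger at T = 0.5 than at T = 1, and smaller at T = 2, matching the low- and high-temperature behavior described above.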
The temperature parameter thus acts as a control for the model's exploration-exploitation trade-off:
Lower temperatures lead to less exploration (more exploitation), with the model focusing on high-probability predictions.
Higher temperatures encourage more exploration, allowing for a wider range of potential outputs, which can be useful in creative tasks where diversity is valued.
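This trade-off can be illustrated with a toy sampling loop (a sketch, not a real decoder; the three-word vocabulary and logits are invented for the example):

```python
import math
import random

def sample_token(logits, vocab, T, rng):
    # Temperature-scaled softmax, then draw one token from the distribution.
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(vocab, weights=probs, k=1)[0]

rng = random.Random(0)          # fixed seed so the sketch is repeatable
vocab = ["the", "a", "zebra"]
logits = [3.0, 1.5, 0.2]
low = [sample_token(logits, vocab, 0.2, rng) for _ in range(20)]
high = [sample_token(logits, vocab, 2.0, rng) for _ in range(20)]
# At T = 0.2 nearly every draw is "the"; at T = 2.0 the rarer tokens appear too.
```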
By adjusting the temperature, users can fine-tune the model's behavior to suit the specific requirements of their application, whether they need consistent, reliable answers or varied, innovative responses.