Understanding Model Temperature: A Practical Metamorfon Guide

Temperature, in a Nutshell

When a language model generates text, it doesn’t necessarily pick the single most likely next token. At each step, it computes a probability distribution over the possible next tokens and samples from it. Temperature is the parameter that flattens or sharpens that distribution.

At a low temperature (close to 0), the model systematically favors the most likely tokens. It becomes predictable, rigorous, reluctant to improvise. It’s the mindset of a lawyer weighing every word.
At a high temperature (toward 1, and up to 2 for some models), the model embraces less likely alternatives. It becomes more exploratory and more creative, but also less stable. It’s the mindset of a brainstormer throwing out ideas before sorting them.

There is no “right” temperature in the absolute. There is a temperature suited to the task.
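
To make the flattening and sharpening concrete, here is a minimal Python sketch of temperature-scaled softmax. The function name and the toy scores are illustrative assumptions, not part of Metamorfon or of any provider's API; real inference stacks apply the same idea inside the decoder.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores (logits) into probabilities, scaled by temperature.

    Dividing the logits by a small temperature sharpens the distribution;
    dividing by a large one flattens it.
    """
    if temperature <= 0:
        # Temperature 0 is usually treated as greedy (argmax) decoding;
        # this sketch only covers strictly positive values.
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

At 0.2, nearly all of the probability mass sits on the top-scoring token; at 2.0, the three candidates end up much closer to one another.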

Why Metamorfon treats it as a first-class parameter

Metamorfon doesn’t orchestrate a conversation; it orchestrates an epistemic architecture: several models debate, contradict, complement one another, and a third-party model produces an analysis of the exchange. Depending on the moment of the debate — and the role assigned at each step — we don’t expect the same thing from the models.

When a model is in Refutational mode, we want it to deconstruct. So it must weigh, be precise, avoid rhetorical drift. → Low temperature.
When a model is in Convergent mode, we want it to build common ground, propose conceptual bridges, take some synthetic liberties. → High temperature.
For a Tension mapping analysis, we want a surgical, almost taxonomic reading. → Low temperature.
For a Horizon of possibilities analysis, we want the model to take risks, project, extrapolate. → High temperature.

This logic — logical rigor at the bottom, creative exploration at the top — underpins every default value in Metamorfon.

Two families of settings, two logics

Metamorfon clearly distinguishes two moments where temperature comes into play, and applies a different grid to each.

1. Temperature in debate mode (model-to-model dialogue)

This is the temperature used by each debating model, adjusted according to the active debate mode (Refutational, Critical, Balanced, Constructive or Convergent). In the adaptive strategies, it changes dynamically whenever the user switches mode between two turns.

See Table 1 below for default values per model and per debate mode.
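
As a rough illustration of how a per-model, per-mode grid can be looked up, the sketch below copies a few rows from Table 1 into a dictionary. The data layout and the function name are assumptions made for this example; they are not Metamorfon's actual internals.

```python
DEBATE_MODES = ("Refutational", "Critical", "Balanced", "Constructive", "Convergent")

# A few rows copied from Table 1, in DEBATE_MODES order.
# The dictionary layout itself is an assumption for illustration.
DEBATE_DEFAULTS = {
    "gpt-4o": (0.20, 0.30, 0.50, 0.70, 0.85),
    "claude-haiku-4-5": (0.15, 0.30, 0.50, 0.70, 0.85),
    "mistral-large-latest": (0.15, 0.30, 0.42, 0.55, 0.68),
    "jamba-large": (0.20, 0.35, 0.50, 0.70, 0.90),
}

# "Other models (fallback)" row from Table 1.
FALLBACK_DEFAULTS = (0.20, 0.30, 0.50, 0.70, 0.85)


def default_debate_temperature(model, mode):
    """Return the default temperature for a model in a given debate mode."""
    row = DEBATE_DEFAULTS.get(model, FALLBACK_DEFAULTS)
    return row[DEBATE_MODES.index(mode)]  # raises ValueError on an unknown mode


print(default_debate_temperature("mistral-large-latest", "Convergent"))
print(default_debate_temperature("some-unlisted-model", "Refutational"))
```

Calling the function again after each mode switch is one simple way to get the dynamic behaviour described above.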

2. Temperature in analysis mode (third-party syntheses)

This is the temperature used by the model that produces the analysis of the exchange. It depends on the analysis mode selected — Tension mapping, Argumentative evaluation, Integrative synthesis, Meta-analysis, Critical archaeology, Emergence analysis, Horizon of possibilities. The logic is the same: the more an analysis requires descriptive rigor, the lower we go; the more it invites projection, the higher.

See Table 2 below for default values per analysis mode.
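
The same idea applies to analyses. The sketch below copies the Table 2 defaults into a dictionary and checks that they never decrease as the modes move from description toward projection. The names are hypothetical and only serve the illustration.

```python
# Defaults copied from Table 2, ordered from most descriptive to most projective.
ANALYSIS_DEFAULTS = {
    "Tension mapping": 0.30,
    "Argumentative evaluation": 0.35,
    "Integrative synthesis": 0.40,
    "Meta-analysis": 0.40,
    "Critical archaeology": 0.50,
    "Emergence analysis": 0.55,
    "Horizon of possibilities": 0.65,
}

# The two exceptions listed in Table 2 (locked at 1.0).
LOCKED_MODELS = frozenset({"gemini-3-flash", "gpt-5.5"})


def default_analysis_temperature(mode, model=None):
    """Return the Table 2 value for an analysis mode (illustrative helper)."""
    if mode not in ANALYSIS_DEFAULTS:
        raise KeyError(f"unknown analysis mode: {mode}")
    if model in LOCKED_MODELS:
        return 1.0
    return ANALYSIS_DEFAULTS[mode]


# Sanity check: the defaults never decrease along the list.
values = list(ANALYSIS_DEFAULTS.values())
assert values == sorted(values)
```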

Why our defaults aren’t the same across models

The values in the table aren’t round numbers pulled out of thin air. They reflect three realities specific to each provider:

1. Operating ranges differ. Mistral models, for instance, give their best results in the 0.0–0.7 range; beyond that, their outputs degrade quickly. On OpenAI, Anthropic or Google, the useful range is wider (0–1, sometimes more). Metamorfon’s defaults respect these ranges: 0.68 in Convergent mode on Mistral Large is meant to produce roughly the same subjective effect as 0.85 on Claude or GPT-4o.

2. Some models refuse any customization. Two models currently have a locked temperature in Metamorfon:

gemini-3-flash — Google explicitly recommends 1.0 across all modes for this model. Any other value only hurts the coherence of the outputs.
gpt-5.5 — In production, this model returns an HTTP 400 error as soon as a custom temperature is passed. The server default (1.0) is the only accepted value.

For both of these models, Metamorfon omits the parameter from its API calls. Any manual value entered in the interface will be ignored (with a warning in the server-side logs). This isn’t a quirk: it’s a non-negotiable provider constraint.

Worth noting: the GPT-5.x family isn’t homogeneous on this point. gpt-5.1 happily accepts custom temperatures (tested in production at 0.40). You can’t generalize by prefix.
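
One way to picture the omission is a small helper that builds the sampling parameters for a call. This is a sketch under stated assumptions: the function name, the logger name and the shape of the returned dictionary are hypothetical, and no real provider SDK is involved.

```python
import logging

logger = logging.getLogger("metamorfon.sketch")  # hypothetical logger name

# Exact model identifiers, not prefixes: gpt-5.1 accepts custom values
# while gpt-5.5 does not, so matching on "gpt-5" would be wrong.
TEMPERATURE_LOCKED = frozenset({"gemini-3-flash", "gpt-5.5"})


def build_sampling_params(model, temperature=None):
    """Return request parameters, leaving temperature out for locked models."""
    params = {"model": model}
    if model in TEMPERATURE_LOCKED:
        if temperature is not None:
            # The value is dropped, and the drop is logged server-side.
            logger.warning(
                "Ignoring temperature %.2f for locked model %s", temperature, model
            )
        return params
    if temperature is not None:
        params["temperature"] = temperature
    return params


print(build_sampling_params("gpt-5.5", 0.40))
print(build_sampling_params("gpt-5.1", 0.40))
```

Omitting the key entirely, rather than sending 1.0, lets the provider apply its own server default and avoids the rejection described for gpt-5.5.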

3. Provider defaults themselves differ. AI21 (Jamba) ships a server default of 0.4 on a 0–2.0 range; OpenAI typically targets 0.7; Mistral 0.7 as well, but on a narrower range. Our per-mode defaults take these conventions as starting points and modulate from there.

When should you override the defaults?

The default values are calibrated for the majority of use cases. But there are at least four situations where tuning them is worth the effort.

1. When you’re working on a topic where factual precision matters. In legal, medical or data-heavy debates, lowering the temperature by 0.05 to 0.10 across all modes generally improves output stability. You lose a bit of fluency; you gain a lot of traceability.

2. When your debates start spinning. Typical symptom: after 3 or 4 turns, the models rephrase without progressing. A small temperature increase in Constructive and Convergent modes (e.g., +0.05 to +0.10) can be enough to unstick the conversation. Conversely, if the outputs scatter, lower it.

3. When you’re stress-testing an argument. Push Critical and Refutational modes to the floor (0.10–0.15) to enforce maximum rigor. It’s uncomfortable to read but analytically devastating.

4. When you want a genuinely exploratory prospective analysis. For the Horizon of possibilities analysis mode, raising the temperature to 0.75 or 0.80 (instead of the default 0.65) opens up bolder projections. Use it on open-ended questions, not on description.
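
The first two situations amount to shifting a set of defaults by a small offset. Here is a minimal sketch, assuming a plain dictionary of per-mode values (the gpt-4o row of Table 1) and a hypothetical helper name; it clamps results to the 0 to 2 range.

```python
def shift_temperatures(defaults, modes, delta, low=0.0, high=2.0):
    """Return a copy of defaults with delta added to the chosen modes."""
    adjusted = dict(defaults)  # leave the original defaults untouched
    for mode in modes:
        shifted = adjusted[mode] + delta
        # Clamp to the allowed range, then round away float noise.
        adjusted[mode] = round(min(high, max(low, shifted)), 2)
    return adjusted


# gpt-4o row from Table 1.
gpt4o = {
    "Refutational": 0.20,
    "Critical": 0.30,
    "Balanced": 0.50,
    "Constructive": 0.70,
    "Convergent": 0.85,
}

# Situation 1: precision-sensitive topic, lower every mode by 0.10.
cautious = shift_temperatures(gpt4o, list(gpt4o), -0.10)

# Situation 2: the debate is spinning, raise the two most open modes by 0.05.
unstuck = shift_temperatures(gpt4o, ["Constructive", "Convergent"], 0.05)

print(cautious)
print(unstuck)
```

Working on a copy keeps the defaults available as a reference point, which makes it easy to revert an adjustment.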

How to adjust these values in Metamorfon

In every Metamorfon strategy that offers an adaptive mode (Adaptive Alternating Dialogue, Adaptive Cross Dialogue, Adaptive Cross Trilogue), two collapsible blocks are available in Advanced settings:

“Temperature configuration per model”: a grid covering the five debate modes (Refutational → Convergent) for each selected model.
“Temperature configuration (analyses)”: a grid covering every analysis mode for the synthesis model.

In both blocks you’ll find:

The default value displayed beneath each field, so you never lose your reference point.
An input range from 0 to 2, in steps of 0.05.

Your overrides are stored in the session parameters and apply dynamically when you switch debate mode between two turns.
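
The input rules and the override logic can be sketched as follows. Both function names and the dictionary-based session store are assumptions for illustration, not Metamorfon's actual implementation.

```python
def validate_override(value, low=0.0, high=2.0, step=0.05):
    """Accept only values between 0 and 2 that fall on the 0.05 grid."""
    if not low <= value <= high:
        raise ValueError(f"temperature must be between {low} and {high}")
    steps = round((value - low) / step)
    snapped = round(low + steps * step, 2)
    if abs(snapped - value) > 1e-9:
        raise ValueError(f"temperature must be a multiple of {step}")
    return snapped


def effective_temperature(session_overrides, defaults, mode):
    """Prefer the user's override for this mode, else fall back to the default."""
    return session_overrides.get(mode, defaults[mode])


defaults = {"Balanced": 0.50, "Convergent": 0.85}
session_overrides = {"Convergent": validate_override(0.90)}

# Resolving again after each mode switch picks up the right value per turn.
print(effective_temperature(session_overrides, defaults, "Convergent"))
print(effective_temperature(session_overrides, defaults, "Balanced"))
```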

If the model you’re configuring is one of the two temperature-locked models (gemini-3-flash, gpt-5.5), your input will be displayed but ignored at runtime. This isn’t a bug: it’s strict adherence to the provider’s constraints.

Defaults at a glance

Table 1 — Temperatures per debate mode

Model	Provider	Refutational	Critical	Balanced	Constructive	Convergent	Notes
gemini-3-flash	Google	1.0 🔒	1.0 🔒	1.0 🔒	1.0 🔒	1.0 🔒	Locked temperature (provider constraint) — topP: 0.95 on all modes
gemini-3-pro	Google	0.20	0.30	0.50	0.70	0.85	topP: 0.95 on all modes
gemini-2.5-pro	Google	0.20	0.30	0.50	0.70	0.85	topP: 0.95 on all modes
gpt-4o	OpenAI	0.20	0.30	0.50	0.70	0.85
gpt-4o-mini	OpenAI	0.15	0.30	0.50	0.70	0.85
gpt-4-turbo	OpenAI	0.20	0.30	0.50	0.70	0.85
gpt-5.5	OpenAI	1.0 🔒	1.0 🔒	1.0 🔒	1.0 🔒	1.0 🔒	Locked temperature — HTTP 400 rejection observed in production
claude-opus-4-5	Anthropic	0.20	0.30	0.50	0.70	0.85
claude-sonnet-4-6	Anthropic	0.20	0.30	0.50	0.70	0.85
claude-sonnet-4-5	Anthropic	0.20	0.30	0.50	0.70	0.85
claude-haiku-4-5	Anthropic	0.15	0.30	0.50	0.70	0.85
mistral-tiny / tiny-latest	Mistral AI	0.12	0.22	0.35	0.50	0.62	Optimal range 0.0–0.7
mistral-small-latest	Mistral AI	0.15	0.25	0.40	0.55	0.67
mixtral-8x7b / 8x7b-latest	Mistral AI	0.15	0.28	0.42	0.55	0.68
mistral-medium / medium-3-5 / medium-latest	Mistral AI	0.15	0.28	0.42	0.55	0.68
mistral-large-latest	Mistral AI	0.15	0.30	0.42	0.55	0.68
magistral-small-latest	Mistral AI	0.15	0.28	0.40	0.53	0.65
magistral-medium-latest	Mistral AI	0.15	0.28	0.42	0.55	0.68
kimi-k2.6	Moonshot AI	0.15	0.30	0.50	0.70	0.85	topP: 0.95 on all modes
kimi-k2.5	Moonshot AI	0.15	0.30	0.50	0.70	0.85	topP: 0.95 on all modes
jamba-large	AI21 Labs	0.20	0.35	0.50	0.70	0.90	AI21 API default: 0.4 — range: 0–2.0 — topP: 1.0 on all modes
jamba-mini	AI21 Labs	0.20	0.35	0.50	0.70	0.90	AI21 API default: 0.4 — range: 0–2.0 — topP: 1.0 on all modes
Other models (fallback)	—	0.20	0.30	0.50	0.70	0.85	Generic defaults

Table 2 — Temperatures per analysis mode

Default values apply to all models except the two temperature-locked models, which are listed in the last two columns.

Analysis mode	Semantic intent	Default temperature	gemini-3-flash	gpt-5.5
Tension mapping	Maximum rigor, analytical precision	0.30	1.0 🔒	1.0 🔒
Argumentative evaluation	Structured analysis of arguments	0.35	1.0 🔒	1.0 🔒
Integrative synthesis	Balance between coherence and nuance	0.40	1.0 🔒	1.0 🔒
Meta-analysis	Analytical step-back on the debate	0.40	1.0 🔒	1.0 🔒
Critical archaeology	Exploration of presuppositions	0.50	1.0 🔒	1.0 🔒
Emergence analysis	New ideas, unexpected connections	0.55	1.0 🔒	1.0 🔒
Horizon of possibilities	Creativity, projection, exploration	0.65	1.0 🔒	1.0 🔒

In short

Temperature isn’t a cosmetic slider: it’s an epistemic setting. It decides whether a model approaches a question with the caution of an archivist or the freedom of an essayist. Metamorfon’s choice has been to align this value with the intent of the moment — the debate mode or the analysis mode — rather than with a global constant. The defaults are sensible for 95% of use cases. The remaining 5% is yours to tune — and we’ve made sure it’s legible, traceable, and reversible.