The promise of 'mathematically guaranteed' — what it actually means here

Marketing copy makes the phrase do a lot of work. Inside the pipeline, the guarantee has a precise scope, a precise boundary, and an honest gap. This is where each one lives.

formal-methodssoundnessguarantees

The tagline says “mathematically guaranteed solutions”. It is a load-bearing phrase, and in a field where every other product page promises the moon, the right thing to do is be explicit about what is and is not guaranteed. This post unpacks the claim. There is a real guarantee at the centre of Savanty’s pipeline; it does not cover everything, and the parts it doesn’t cover are exactly the ones the repair loop and the suitability check exist to handle.

What is actually guaranteed

The guarantee is this: if Savanty returns an assignment, that assignment satisfies every integrity constraint in the ASP encoding that was given to Clingo. Equivalently, no returned assign(Var, Value) mapping can violate any of the :- rules that the LLM emitted. If the program contained :- assign(n1, C), assign(n2, C). (no two adjacent regions share a colour), then in any assignment Savanty returns, n1 and n2 will not share a colour. This is not a probability. This is not a usually. This is a property of Clingo’s stable model semantics.

The technical statement is that Clingo is sound for Answer Set Programming under the stable model semantics. Every model it returns is an answer set; every answer set, by definition, satisfies every rule in the program, including every integrity constraint. The proof is in the Potassco theory papers and it is not novel work; it is the property that makes ASP a usable formalism in the first place. The reason we are pointing at it explicitly is that “mathematically guaranteed” is sometimes used to mean “high probability” or “we tested it” or “the LLM says it should work”, none of which would be honest. The guarantee here is the textbook soundness property of the underlying solver.

Clingo is also complete for finite-domain programs within the search budget you give it. If a model exists and the timeout is not hit, Clingo finds one. If it returns UNSAT, no model exists. There is no third option in which a model exists and Clingo missed it. This matters for the infeasibility-reporting case: when Savanty tells you “no valid schedule exists given these requirements”, that statement is not “we tried for a while and gave up”. It is a positive claim with a proof structure behind it.

What is not guaranteed: the modelling gap

The honest part of this is the part the marketing copy elides. The guarantee covers everything from the ASP encoding onwards. It says nothing about whether the ASP encoding faithfully represents what you actually said in English. That step — the translation from English to constraints — is performed by an LLM, and LLMs make mistakes.

There are a few species of mistake that can happen at this step.

The LLM can omit a constraint. You said “nurses cannot work three nights in a row”, the model heard it but didn’t emit the corresponding :- assign(N, D1, night), assign(N, D2, night), assign(N, D3, night), consecutive(D1, D2, D3). rule. Clingo will still return a valid model with respect to the rules it received; that model is a valid schedule for a fictional version of your problem where three-night runs are allowed. The guarantee holds for the encoding. It does not hold for your actual intent.

The LLM can over-constrain. You said “each shift needs one nurse”, the model emitted “each shift needs exactly one nurse and at most one nurse” and “each nurse needs at least one shift”, and now the program is UNSAT because there are fewer shifts than nurses and your stated requirement was that excess nurses sit idle. The repair loop catches some of this — the unsatisfiable core will point at the spurious constraint — but only if the model recognises which constraint is hallucinated. If it thinks the hallucinated constraint is correct, it will weaken something genuine instead and produce a valid model of a different problem.

The LLM can mistranslate values or quantifiers. “Up to four shifts per week” can come back as :- 4 < #count{ S : assign(N, S, _) }. (correct) or as :- 4 != #count{ S : assign(N, S, _) }. (wrong: now everyone must work exactly four). The first encoding has a guarantee. The second encoding has a guarantee. The two encodings differ. Only one is what you asked for.

These are all real failure modes. The repair loop addresses some of them — unsatisfiable cores localise contradictions, and the not_suitable branch routes continuous problems away before they get an encoding at all. None of these mechanisms can fix a quiet omission. If the LLM doesn’t translate a constraint, nothing downstream knows to look for it.

Where the guarantee starts and stops

In practice this gives you a clean mental model. Everything downstream of the JSON {facts, rules, optimize} that the LLM emits is covered. Everything upstream is best-effort, instrumented with a typed feedback loop, and benefits from inspectable outputs that let you verify by hand.

The “inspectable outputs” point matters. Savanty’s ProblemSolverResult returns the generated asp_code field next to the solution. The intent is that you can read the encoding. Not in production for every solve, but during development, during onboarding, and any time a result surprises you. The encoding is small, the syntax is regular, the integrity constraints are literal translations of your stated requirements. If a constraint you mentioned doesn’t appear in the rules list, you found a translation bug — and you can fix it by editing the description and re-solving, often with no second LLM round needed if the description was the source of ambiguity.

What this means in practice

If you are evaluating Savanty against a workflow where a human currently translates business rules into a CP-SAT or OR-Tools model and verifies them by hand, the relevant question is whether the LLM + repair loop produces encodings that match the human’s encoding often enough, and surfaces its failures visibly enough, that the time saved on the easy cases is not eaten by the hard ones. That is an empirical question and Savanty ships a benchmark harness (benchmark/) precisely so you can answer it on your own problem family. Graph colouring, scheduling, knapsack-style allocation — the included fixtures cover the canonical cases. Your domain is probably more constrained than any of them, so run your own.

If you are evaluating Savanty against a workflow where someone is currently asking ChatGPT to “schedule four nurses” and pasting the answer into a spreadsheet, the question is different. There the comparison is between a hand-verified ASP encoding (Savanty’s output, with a soundness guarantee from there) and an unverified LLM-emitted schedule (the chatbot’s output, with no soundness property at any layer). The relevant guarantee is the difference between “a schedule that looks right” and “a schedule that provably satisfies the encoded constraints”. Both still depend on the constraints being right, but only one of them gives you a place to check.

The honest summary

The guarantee is real, narrow, and load-bearing. It is the soundness and completeness of Clingo applied to the ASP program the LLM produced. It is not a guarantee that the LLM produced the right program. The repair loop is the mechanism for catching the cases where it did not — under the three failure types Savanty can detect — and the encoding is exposed in the API so the cases the loop can’t catch are still humanly auditable.

That is what “mathematically guaranteed” means here. Not a slogan. A property of a specific algorithm with a specific scope, deployed where the property is useful, and surrounded by tooling that makes the boundary of the guarantee visible. It is the boundary, not the slogan, that does the work.