The Great Token Test — nowable.tech

// tl;dr

It costs fifty percent more in token usage for a Dane to have the same conversation with the same AI assistant as it does for an American. And if you use Claude, you are probably paying the highest token tax of all.

While the world is adapting to token economics as the fundamental billing model for new AI systems, very few people are aware of the structural inequalities built into the very architecture and design of AI as we know and use it today. This deserves attention, and for European companies investing heavily in new AI-driven architectures, it is simply too expensive to ignore.

That is why I set out to examine precisely what this looks like across the AI landscape here in June 2026. I measured the tax across seven models and eleven languages. I share my findings below, which support results from the most recent research on the language tax published in May 2026.

What an AI conversation actually costs

An AI model does not see words. It sees tokens - small text fragments that your text is split into - and you pay per token. AI models translate words into tokens using a large index called a tokenizer. Language models use tokenizers to look up and convert linguistic fragments into the mathematics they use to calculate the next word. English is the language tokenizers are built most efficiently for, because the models are trained predominantly on English data.

Looking at the economics, token billing today works a bit like a border crossing where you pay to import and export goods. Instead of cargo, it is tokens you send back and forth. You pay when you send text in. You pay again when the model responds. Input is the cheaper side, while output costs three to five times as much. The more tokens you send and receive, the more expensive it becomes.

This is where what researchers have called the language tax enters the picture. Danish differs quite markedly from English in linguistic family and structure. Because it appears less frequently in the models' training data, it requires more tokens than English for the same content. So when you chat away in Danish, you consume roughly 50% more tokens than, say, an American would.

The language tax has been scientifically demonstrated across 25 languages as recently as May 2026, but that research has primarily measured the input side - the cheaper side. The reasonable question, then, is whether the output side looks better or worse when you study full conversations with both input and output. Perhaps the model reads Danish inefficiently but writes it back efficiently? Furthermore, the research does not include the most widely used models from OpenAI and Anthropic, which dominate usage today. Both of these angles were ones I wanted to examine.

Try the Nowable token tax calculator

The experiment

In my experiment I measured seven models across 11 languages. The models were selected to ensure spread across several dimensions: size, nationality, open and closed. My sample is by no means representative of all categories and model sizes, but was chosen to give an indication of whether any categories stood out markedly. Within languages, I chose Scandinavian and Western European ones to see whether there were significant tax differences between minority languages like Danish and more widely spoken languages like German and French, which can be expected to be better represented in training data.

In my input experiment I sent the Nowable manifesto in 11 language variants to all 8 models. In experiment 2 on output I used the forced translation method, sending a text for translation with instructions to return only the translation as output and nothing else, in order to ensure a reasonably uniform basis for comparison. Output is generally less precise to measure since the models produce output differently, making it necessary to be quite rigid here.

Insight 1: The tax exists on both input and output

The results show that models consistently produce output at almost the same inflated rate at which they read it. The tax is levied on both sides, and because output is already the expensive half, that is also where it hurts most.

For Danish, that means roughly 49 percent extra on input and 55 percent extra on output. Spanish escapes most cheaply at 33 percent, Faroese most expensively at nearly double. The ranking follows the structure of the languages: Romance languages cheapest, Germanic next, Slavic higher, and the agglutinative languages Finnish and Faroese at the top, where many word elements are packed together into long words.

It is also striking that the difference between a language like German, spoken by 130 million people worldwide, and Danish, spoken by only 6 million, is just 5 percentage points. On the whole, the conclusion is clear and is supported by the research: for all non-English languages, a significant token surcharge applies. For Asian and Arabic languages, which are not part of this study, it is even worse and has been observed at anywhere between 4 and 10 times the token consumption.

Language	Input tax	Output tax
Faroese	+94%	+92%
Finnish	+73%	+78%
Polish	+63%	+66%
Danish	+49%	+55%
Norwegian	+48%	+51%
Swedish	+46%	+49%
German	+44%	+49%
French	+41%	+46%
Italian	+39%	+42%
Dutch	+38%	+42%
Spanish	+31%	+33%

Insight 2: Model choice is also a factor

The biggest surprise came from the choice of model. The language you write in determines only half the bill. The model you choose determines the other half. Claude and DeepSeek tokenize European languages considerably more heavily than GPT-5 and Llama.

For a European user, this is a concrete choice with a concrete price: the same response costs on average around thirty percent more tokens on Opus 4.6 than on GPT-5.5.

Although Claude and DeepSeek sit at opposite ends of the pricing scale, their token overhead on non-English languages is striking. A response in DeepSeek that costs $1.50 for an American will cost a Dane $2.50. At Anthropic, the same calculation comes out to $25 USD for an American versus $39 for a Dane.

Research can explain some of the difference. Models trained predominantly on English and Chinese allocate fewer vocabulary slots to the word fragments of other languages, regardless of how large the vocabulary otherwise is. DeepSeek, a Chinese model, therefore fragments European languages heavily. The tax is, to a large degree, a reflection of where a model comes from and which language it was built to serve first. That does not, however, explain the gap from Opus 4.6 down to other English-language models.

Model	Output tax
Claude Opus 4.6	+72%
Claude Haiku 4.5	+68%
DeepSeek V4 Pro	+66%
Mistral Large 3	+51%
GPT-5.4 Nano	+43%
GPT-5.5	+43%
Llama 4 Maverick	+40%

Insight 3: Costs accumulate the longer your conversation runs

Let us follow a short three-turn exchange - the kind most people have daily with an assistant - and calculate it in two versions with the same content, one in English and one in Danish.

Turn	Input (EN)	Output (EN)	Input (DA)	Output (DA)
1	50	250	75	375
2	350	300	525	450
3	700	250	1,050	375
Total	1,100	800	1,650	1,200

The first thing most people overlook is that a chat API has no memory. Every time you send a new message, the entire conversation so far is sent along again as input so the model can see what you have already discussed. By the third turn you are paying for the first and second turns all over again. The conversation's own history becomes its largest expense - and that history is Danish all the way through.

This does not mean the percentage grows. A Danish conversation costs roughly 50 percent more than the English one, and that figure stays constant regardless of the number of turns. What grows is the amount. The longer the conversation, the larger the base to which the 50 percent applies, because the recent history accumulates. By twenty turns you will have paid for the conversation's content more than ten times over.

I have tried to chart how this develops over time as a conversation grows. It is a consequence of being rebilled for the entire history each time.

By the third turn, more tokens have been sent as input (1,100) than have ever come back as output (800), solely because the history is recent each time. The last piece of the token equation falls into place here. The expensive half, output, cannot be cached. Caching can reduce the cost of the resent history, but it can never reach the price of a response being generated for the first time. A language like Danish produces roughly 50 percent more output tokens, each turn, at full price. That is a cost that cannot be optimised away.

The accidental tax that nobody planned for

So there we have it. Danish and European AI users have, quite quietly, had an AI tariff imposed on them that hardly anyone had factored in. This is not something anyone set out to do with malicious intent or self-interest. It simply fell out this way, because those who built the tokenizers optimised for English and are a long way from Europe.

The result is a cost that hits all European AI users - largest for the smallest languages, hardest precisely where it cannot be avoided, and varying depending on which model they happened to choose.

It cannot be negotiated away. But we can know it is there, measure it, and let it inform the choices we make. Which model, for which language, at which price. That is the difference between a dependency you have mapped and one you only discover when the bill arrives.

About the Study

The seven models were GPT-5.5 and GPT-5.4 Nano (OpenAI), Claude Opus 4.6 and Claude Haiku 4.5 (Anthropic), Mistral Large 3 (Mistral), Llama 4 Maverick (Meta), and DeepSeek V4 Pro (DeepSeek). All were accessed through OpenRouter so they ran under the same setup and the same billing terms.

The input side was measured by sending the manifesto in each language version and reading the number of prompt tokens directly. The output side used forced translation: each model translated the English source text into each language, and I counted the generated completion tokens. Temperature was set to zero to make runs reproducible, and reasoning was disabled so the measured tokens covered the translation itself without the model's internal thinking.

Overhead is calculated as the ratio between the number of tokens in a given language and the model's own English baseline, for exactly the same content. Gemini 3.5 Flash is excluded because OpenRouter does not allow reasoning to be disabled for that model. DeepSeek's Polish run was discarded because the model ignored the reasoning setting and hit the token limit. Total cost for the entire experiment: $1.68.

The figures have been cross-checked against Ovcharov (2026), who measures tokenizer fertility across 25 European languages using a different method and a different set of models. The ranking among languages is the same, and for Danish my figures fall within four percentage points of his.

Appendix: full overhead across model and language

Each cell is the extra tokens used versus that model's own English baseline, for the same content. Languages are sorted by output overhead, models likewise

Output overhead

Language	Opus 4.6	Haiku 4.5	DeepSeek	Mistral	Nano	GPT-5.5	Llama	Avg.
Faroese	+105%	+96%	+116%	+90%	+76%	+74%	+86%	+92%
Finnish	+103%	+96%	+97%	+71%	+62%	+60%	+56%	+78%
Polish	+82%	+79%	–	+70%	+67%	+63%	+36%	+66%
Danish	+57%	+54%	+72%	+58%	+46%	+47%	+48%	+55%
Norwegian	+61%	+55%	+66%	+51%	+38%	+41%	+42%	+51%
Swedish	+57%	+54%	+67%	+52%	+38%	+42%	+32%	+49%
German	+87%	+82%	+52%	+31%	+32%	+29%	+28%	+49%
French	+69%	+67%	+55%	+31%	+32%	+36%	+34%	+46%
Italian	+58%	+53%	+49%	+34%	+40%	+37%	+26%	+42%
Dutch	+61%	+56%	+51%	+48%	+23%	+24%	+32%	+42%
Spanish	+54%	+52%	+38%	+25%	+20%	+21%	+19%	+33%
Avg.	+72%	+68%	+66%	+51%	+43%	+43%	+40%	+55%

Input overhead

Language	Opus 4.6	Haiku 4.5	DeepSeek	Mistral	Nano	GPT-5.5	Llama	Avg.
Faroese	+108%	+108%	+115%	+100%	+68%	+68%	+92%	+94%
Finnish	+94%	+94%	+92%	+69%	+52%	+52%	+57%	+73%
Polish	+78%	+78%	–	+68%	+59%	+59%	+35%	+63%
Danish	+50%	+50%	+65%	+52%	+40%	+40%	+43%	+49%
Norwegian	+54%	+54%	+62%	+48%	+38%	+38%	+39%	+48%
Swedish	+53%	+53%	+63%	+49%	+37%	+37%	+32%	+46%
German	+77%	+77%	+48%	+32%	+25%	+25%	+22%	+44%
French	+62%	+62%	+48%	+27%	+28%	+28%	+28%	+41%
Italian	+52%	+52%	+44%	+31%	+36%	+36%	+23%	+39%
Dutch	+54%	+54%	+47%	+43%	+19%	+19%	+29%	+38%
Spanish	+52%	+52%	+39%	+23%	+19%	+19%	+17%	+31%
Avg.	+67%	+67%	+62%	+49%	+38%	+38%	+38%	+51%

Sources:

Volodymyr Ovcharov: The Tokenizer Tax Across 25 European Languages: Domain Invariance, Cross-Lingual Few-Shot Effects, and the Ukrainian Penalty

Share this essayTwitter / X LinkedIn Copy link

More essays

Lars Harder

Writing on sovereign AI, digital identity, and what it means to remain human in an era of algorithmic culture.

About·All essays

// more reading

Sovereign AI

The Anatomy of the AI Cold War

Jul 6, 202611 minutes

Sovereign AI

A Moment of Truth for Europe's AI Strategy

Apr 22, 202610 min read