How Tokenization Works
Before an LLM processes any text, a tokenizer splits it into tokens. Tokens are not words. They are chunks of text drawn from the model’s vocabulary. Common words like “the” or “function” are usually single tokens. Uncommon words get split into pieces: “Tokenization” becomes something like [“Token”, “ization”]. Whitespace and punctuation consume tokens too.
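To make the splitting concrete, here is a minimal sketch of a toy tokenizer. It is not how any production tokenizer works: real ones learn their vocabulary from data (byte-pair encoding is a common approach), while this one uses a tiny hand-written vocabulary and greedy longest-match. The names TOY_VOCAB and toy_tokenize are made up for this illustration.

```python
# Hypothetical vocabulary, chosen only to reproduce the examples in the text.
TOY_VOCAB = {"the", "function", "Token", "ization", " ", ".", "(", ")"}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match split; unknown characters become one-character tokens."""
    max_len = max(len(piece) for piece in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


print(toy_tokenize("Tokenization"))  # ['Token', 'ization']
print(toy_tokenize("the function"))  # ['the', ' ', 'function']
print(toy_tokenize("xq"))            # ['x', 'q']
```

The last call shows why unfamiliar text is expensive: nothing in the vocabulary matches, so every character becomes its own token.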
Different models use different tokenizers, so the same text produces different token counts depending on the model. A 100-word paragraph might be 130 tokens on one model and 145 on another. Most API providers offer a tokenizer tool or endpoint that gives exact counts for their models.
Code is generally more token-dense than English prose. Variable names, brackets, indentation, and special characters often become separate tokens instead of merging into longer ones. Comments count as well: a heavily commented Python file uses more tokens than the same logic written without comments.
Why Token Counts Matter for Developers
Tokens set three hard constraints on every AI coding tool.
First, the context window. A 128K token window is the total budget for input and output combined. If your prompt and referenced files consume 100K tokens, the model has 28K tokens left for its response. Large codebases can exhaust the window before the model starts generating.
Second, output caps. Models have maximum output token limits per response. If the model hits this cap mid-function, the generated code is truncated. Some tools handle this with automatic continuation. Others just stop.
Third, cost. API-based tools charge per token consumed. Input tokens (your prompt and code) are typically cheaper than output tokens (the model’s response), but they add up. A refactoring session that reads 50 files costs more in input tokens than a single-file edit, even if the output is the same length.
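The window and cost constraints above are simple arithmetic, sketched below. The function names and the per-million-token prices are placeholders invented for this example, not any provider's actual rates, and the token counts for the two sessions are made up.

```python
def remaining_output_budget(context_window, input_tokens):
    """Tokens left for the response once the input is counted; never negative."""
    return max(context_window - input_tokens, 0)


def estimate_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in dollars, with both prices given in dollars per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# The example from the text: a 128K window with 100K tokens of input.
print(remaining_output_budget(128_000, 100_000))  # 28000

# Hypothetical prices: $3 per million input tokens, $15 per million output tokens.
single_file = estimate_cost(2_000, 1_000, input_price=3.0, output_price=15.0)
fifty_files = estimate_cost(100_000, 1_000, input_price=3.0, output_price=15.0)
print(fifty_files > single_file)  # True
```

Both sessions produce the same 1,000 output tokens. The difference in cost comes entirely from the input side.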
Counting Tokens in Practice
You rarely need exact token counts, but rough estimates help you predict costs and avoid hitting context limits. The rule of thumb for English text is that one token is roughly 0.75 words, so 100 words come to about 133 tokens. For code, expect 1 token per 3 to 4 characters depending on the language.
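Those two rules of thumb translate directly into a pair of estimators. This is a rough sketch, not a substitute for a provider's tokenizer, and the function names are invented for the example.

```python
def estimate_tokens_from_words(word_count, words_per_token=0.75):
    """English prose: one token is roughly 0.75 words."""
    return round(word_count / words_per_token)


def estimate_code_token_range(char_count):
    """Code: 1 token per 3 to 4 characters, returned as a (low, high) range."""
    return round(char_count / 4), round(char_count / 3)


print(estimate_tokens_from_words(100))  # 133
print(estimate_code_token_range(120))   # (30, 40)
```

Returning a range for code is a deliberate choice: the characters-per-token ratio varies by language and style, so a single number would suggest more precision than the heuristic has.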
Claude Code displays token consumption in the status bar. Cursor shows it in the usage panel. Most API dashboards break down token usage by request. If your tool does not surface token counts, the API provider’s billing page will.
Tokens in Different Programming Languages
Not all languages tokenize equally. Python, with its significant whitespace and relatively short keywords, tends to be more token-efficient than languages with more syntactic overhead. A 100-line Python function might tokenize to 600 tokens. The equivalent logic in Java, with its explicit type declarations, class wrappers, and getter/setter boilerplate, might consume 1,200 tokens or more.
Even comments cost tokens. A heavily documented file can use 30-40% of its token budget on comments alone. This creates a practical tradeoff. Well-commented code gives the model better context for generating accurate suggestions, but it also fills the context window faster. There is no universal right answer. The value of the comments depends on whether the model needs them to understand the code’s intent.
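One way to get a feel for how much of a file is comments is to measure their share of the characters. The sketch below does this for Python source using the standard library's tokenize module, which is the Python language lexer and has nothing to do with an LLM tokenizer. Character share is only a proxy for token share, docstrings are not counted, and the function name is made up for this example.

```python
import io
import tokenize


def comment_char_share(source):
    """Fraction of characters in Python source that belong to # comments."""
    if not source:
        return 0.0
    comment_chars = sum(
        len(tok.string)
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.COMMENT
    )
    return comment_chars / len(source)


sample = "# add two numbers\ndef add(a, b):\n    return a + b  # sum\n"
print(f"{comment_char_share(sample):.0%} of characters are comments")
```

A high share is not a problem in itself. It is a prompt to ask whether the model needs those comments to understand the code's intent.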
Minified or obfuscated code produces high token counts for its size. The tokenizer cannot match compressed identifiers and densely packed punctuation against the longer entries in its vocabulary, so the text breaks into many short tokens, often one per single-character variable name. This rarely matters in practice since developers do not typically feed minified code to AI tools, but it explains why you should exclude build output and vendor bundles from the tool’s indexing scope.
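If your tool has no built-in way to exclude such files, a small filter can decide what to leave out before anything reaches the model. This is a sketch under stated assumptions: the directory names, file suffixes, and the average-line-length threshold are guesses to adjust per project, and should_skip and looks_minified are hypothetical helpers, not part of any tool.

```python
from pathlib import PurePosixPath

SKIP_DIRS = {"dist", "build", "node_modules", "vendor"}  # assumed directory names
MINIFIED_SUFFIXES = (".min.js", ".min.css")              # assumed naming convention
MAX_AVG_LINE_LENGTH = 200                                # assumed "looks minified" threshold


def looks_minified(text, max_avg_line_length=MAX_AVG_LINE_LENGTH):
    """Heuristic: minified files pack everything into a few very long lines."""
    lines = text.splitlines()
    if not lines:
        return False
    return sum(len(line) for line in lines) / len(lines) > max_avg_line_length


def should_skip(path, text=""):
    """True if the file sits in a build/vendor directory or appears minified."""
    p = PurePosixPath(path)
    if SKIP_DIRS.intersection(p.parts[:-1]):
        return True
    if p.name.endswith(MINIFIED_SUFFIXES):
        return True
    return looks_minified(text)


print(should_skip("dist/app.js"))                              # True
print(should_skip("src/app.min.js"))                           # True
print(should_skip("src/app.py", "def f():\n    return 1\n"))   # False
```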
