Part 2: Why Buying a T4 Became the Easiest Option.
- Pure Math Editorial
- May 6
- 6 min read
A candid “conversation” between our CEO, Sean Douglas, and Pure Math Editorial. Pure Math Editorial is an all-purpose virtual writer we created to document and showcase the ways we are leveraging generative AI within our organization and with our clients. Designed for case studies, thought-leadership articles, white papers, blog content, industry reports, and investor communications, it is prompted to produce clear, compelling, structured writing that highlights the impact of AI across projects and industries.
Most people interact with large language models through consumer-facing tools like ChatGPT, Claude, Perplexity, or Gemini. You write a prompt, it gives you an answer.
What many first-time users don’t realize is that those chat-based tools are just one kind of interface—an application layer—sitting on top of the underlying large language model (LLM). More typically, the application is connected to several LLMs at once.
When you type in your prompt and the model responds—that’s an API call. It includes both your prompt and the model’s response. And every API call has a cost.
But these systems don’t charge by the prompt or the word—they charge by the token.
A token is a chunk of text. Sometimes it’s a whole word, sometimes it’s just a few letters or punctuation. For example, the word “investment” might be two tokens ("invest" + "ment"), but “internationalization” might break into four ("inter" + "national" + "iz" + "ation"). A sentence like “What is your outlook on the private markets?” might be roughly ten tokens.
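To make that concrete, here is a minimal sketch using OpenAI’s open-source tiktoken tokenizer to see how a given string actually splits; the splits in the paragraph above are illustrative, and the exact pieces and counts vary by tokenizer.

```python
# Rough token-counting sketch using OpenAI's open-source tiktoken library.
# The exact splits depend on which encoding/model you pick.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by GPT-4-era models

for text in ["investment", "internationalization",
             "What is your outlook on the private markets?"]:
    token_ids = enc.encode(text)
    pieces = [enc.decode([t]) for t in token_ids]
    print(f"{text!r}: {len(token_ids)} tokens -> {pieces}")
```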
Every model charges a different amount for those tokens. Rates range from fractions of a cent per thousand tokens to significantly more, depending on the model. And input (prompt) tokens and output (response) tokens are priced separately, with output tokens typically costing more.
OpenAI’s GPT-4, for instance, might cost $30 to $60 per million tokens, depending on the configuration. Cheaper models like GPT-3.5 might be closer to $1 to $5 per million tokens. But the real cost depends on volume.
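As a back-of-envelope illustration, the cost of a single call is just token counts times rates. The rates below are placeholders in the same ballpark as the figures above, not anyone’s current price list.

```python
# Back-of-envelope API cost estimate. The rates are illustrative placeholders
# in USD per million tokens, not a real provider's price list.
PRICE_PER_MILLION = {
    "bigger-model":  {"input": 30.00, "output": 60.00},
    "smaller-model": {"input": 1.00,  "output": 2.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    rates = PRICE_PER_MILLION[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000

# e.g. a 2,000-token prompt that gets a 500-token answer
print(estimate_cost("bigger-model", 2_000, 500))   # ~$0.09 per call
print(estimate_cost("smaller-model", 2_000, 500))  # ~$0.003 per call
```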
Consumer-grade users never see this because they’re on flat-rate plans—$20/month for ChatGPT Plus, for example—which give you access to whichever models OpenAI makes available. But what you’re really paying for is a prepackaged token allowance. Once you hit certain thresholds (which OpenAI doesn’t clearly publish), you may see slower response times or usage caps.
That’s fine for individuals writing basic content or code. But in enterprise contexts—where teams are building tools and automated workflows—they’re often making direct API calls to the models, not using the chat interfaces at all.
And that’s when the pricing model changes. You’re no longer ‘chatting’—you’re developing automated systems that do things like:
Break down large PDFs or structured reports into smaller chunks optimized for model input,
Generate embeddings for each chunk to enable fast search and retrieval,
Run summarization or classification tasks across each section of a document,
Extract structured data or features for downstream workflows,
And often reprocess the same content multiple times as models are tuned or updated.
Each of those steps consumes tokens—and when you’re doing it across tens or hundreds of thousands of documents, those token counts add up quickly. You’re now in a world where the cost of experimentation alone can reach tens of thousands of dollars.
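For a sense of how those steps translate into token spend, here is a hedged sketch of the chunk-and-count portion of such a pipeline. The chunk size, document counts, and embedding rate are assumptions for illustration, not measured numbers from our project.

```python
# Hedged sketch: estimating the token footprint of a chunk-and-embed pass
# over a document set. Chunk size, corpus size, and the per-million-token
# embedding rate below are illustrative assumptions.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
CHUNK_TOKENS = 500            # assumed chunk size fed to the embedding model
EMBED_RATE_PER_M = 0.10       # assumed USD per million embedding tokens

def chunk(text: str, size: int = CHUNK_TOKENS) -> list[str]:
    """Split text into chunks of roughly `size` tokens."""
    ids = enc.encode(text)
    return [enc.decode(ids[i:i + size]) for i in range(0, len(ids), size)]

def embedding_cost(documents: list[str]) -> float:
    """Estimated cost of embedding every chunk of every document once."""
    total_tokens = sum(len(enc.encode(doc)) for doc in documents)
    return total_tokens * EMBED_RATE_PER_M / 1_000_000

# 100,000 documents at ~10,000 tokens each is a billion tokens,
# or roughly $100 per embedding pass at the assumed rate.
```

The embedding pass is the cheap part. What pushes experimentation into the tens-of-thousands-of-dollars range is running generation models (summarization, classification, extraction) over the same corpus again and again, at per-token rates that can be ten to a hundred times higher, while also paying for the output tokens they produce.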
That’s the point where teams start asking serious questions about infrastructure tradeoffs, cost control, and whether it’s time to start hosting their own models.
Pure Math Editorial:
And that’s when buying a T4 became the easiest option?
Sean:
Yeah. I did the math and realized I could just buy the same GPU we were trying to rent. T4s were going for around $1,200 in Japan. One-time cost. No queue. No usage caps. No region limitations. And for what we were doing—RAG pipelines, document processing, generating embeddings—it would be more than enough.
I checked Amazon Japan and found plenty in stock. In the U.S., it was a totally different story—almost no availability, and where they were available, prices were ridiculous. $4,000, $5,000, even higher in some cases.
But in Japan, I could get one delivered within a couple of weeks. So I ordered it.

Pure Math Editorial:
Once the GPU arrived, what was the first step?
Sean:
I was planning to install the T4 into my personal machine—a consumer-grade gaming and work PC—so I already knew it wasn’t exactly designed for this kind of hardware. But I figured it was close enough, and I was prepared to make a few upgrades to get it there.
While I waited for it to show up, I dealt with the RAM. I didn’t have enough to support the kind of processing we wanted to do, so I ordered 64 gigabytes and installed that first.
I was also thinking about the cooling. The T4 isn’t a massive card, but it’s built for rack-mounted servers. It assumes things like directional airflow and consistent cooling, which a desktop just doesn’t provide by default.
I decided I’d install the T4 and sort out the cooling once I knew for sure how hot it would run.
Pure Math Editorial:
What did the software setup look like?
Sean:
I installed NixOS because that’s what I’m most comfortable with, and I wanted a clean environment I could configure exactly how I needed. But setting up the T4 wasn’t plug-and-play. I had to deal with driver conflicts right away.
The main issue was that my system already had a GPU driving my displays, and now I was adding a second one specifically for compute. But the NVIDIA drivers don’t always handle that split cleanly, especially on Linux.
So I had to configure everything—make sure the display GPU and the compute GPU weren’t stepping on each other, isolate the CUDA environment, and test that the T4 was actually being used for processing and not just sitting idle.
It took a few tries, some research, and some trial and error, but eventually I got the system stable and CUDA running correctly.
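For anyone checking the same thing, a sanity check along these lines (a generic PyTorch sketch, not the exact script used here) confirms the card is visible to CUDA and is actually doing the compute:

```python
# Sanity check: confirm the T4 is visible to CUDA and route work to it
# explicitly, so the display GPU isn't picked up by accident.
import torch

for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))

# Assumption for illustration: the T4 showed up at index 1, with the
# display GPU at index 0. Pin compute to it rather than trusting defaults.
t4 = torch.device("cuda:1")
x = torch.randn(4096, 4096, device=t4)
y = x @ x  # if this runs, the T4 is doing real work
print(y.device, torch.cuda.memory_allocated(t4) / 1e6, "MB allocated")

# Alternatively, launching with CUDA_VISIBLE_DEVICES=1 hides the display
# GPU from CUDA entirely, so cuda:0 inside the process is the T4.
```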
Pure Math Editorial:
Once you had the T4 installed and CUDA running, did everything just work?
Sean:
Hahaha. Not quite. As soon as I started using the GPU for actual processing, it started overheating. Like, within minutes.
At first I thought maybe I’d misconfigured something in the drivers, but then I checked the thermals and saw the temperature climbing fast—into the danger zone almost immediately under load.
Like I said, the T4 doesn’t have its own fan. It’s a passively cooled card, which works fine in a server rack where you’ve got directional airflow moving front-to-back across the board. But in a consumer desktop case, with more diffuse airflow, it just didn’t work.
My case already had decent cooling—multiple fans, open space, a standard layout—but that didn’t matter. The T4 expects forced air straight through the card, and without that, it basically just bakes itself.
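For reference, a minimal way to watch thermals under load is to poll NVML, the interface behind nvidia-smi. This is a generic sketch, assuming the nvidia-ml-py package is installed, not the exact monitoring we ran.

```python
# Poll GPU temperature and utilization once a second via NVML
# (the same interface nvidia-smi uses). Requires the nvidia-ml-py package.
import time
from pynvml import (nvmlInit, nvmlDeviceGetCount, nvmlDeviceGetHandleByIndex,
                    nvmlDeviceGetTemperature, nvmlDeviceGetUtilizationRates,
                    NVML_TEMPERATURE_GPU)

nvmlInit()
handles = [nvmlDeviceGetHandleByIndex(i) for i in range(nvmlDeviceGetCount())]

while True:
    readings = []
    for i, h in enumerate(handles):
        temp = nvmlDeviceGetTemperature(h, NVML_TEMPERATURE_GPU)  # degrees C
        util = nvmlDeviceGetUtilizationRates(h).gpu               # percent
        readings.append(f"gpu{i}: {temp}C {util}%")
    print("  ".join(readings))
    time.sleep(1)
```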


Pure Math Editorial:
So once you realized the card was overheating, what did you do?
Sean:
I started researching how people were handling this, because I knew I wasn’t the first person to try putting a server-grade GPU into a desktop case.
I figured it was a common problem for anyone trying to run a T4 this way: if you don’t create that airflow yourself, the card just won’t work.
There were a bunch of DIY solutions floating around—everything from CPU liquid cooling to fully custom 3D-printed mounts. That’s when I started going down the rabbit hole to figure out how I could build a cooling setup that would actually work in my system.
The Takeaways
Buying a GPU would give us what the cloud couldn’t for this project: guaranteed access, predictable cost, and full control over our infrastructure.
But that decision came with tradeoffs of its own.
The T4 wasn’t designed for consumer desktops. It doesn’t have onboard cooling. It expects to live in a server rack, not under a desk.
So while the economics made sense, the physical setup became its own engineering problem.
In the next post, we’ll cover how we got the cooling situation handled—a few trips to the 3D printer, a USB-powered blower fan, and a little trial and error to keep our GPU from melting down.
Contact Us to learn how we can help you build LLMs into your organization's day-to-day.