The 900 lb. AI Problem in the Room: AI Document Processing Limitations Explained
- Ayano
- Oct 1
- 6 min read
Updated: Oct 3
Executive Summary
- Consumer AI chatbots don't read and comprehend full documents.
- ChatGPT file_search uses 800-token chunks with semantic similarity search (RAG).
- This works for simple queries ("find themes") but fails for complex analysis.
- Professional use cases (legal research, financial due diligence) require different systems.
- The technology to build proper systems exists. Contact us to learn more.
Right now, as we speak, professionals are relying on AI document processing tools like ChatGPT for critical work.
A lawyer uploads case files. Bills the client $500 an hour. Trusts the chatbot's responses.
A financial advisor feeds pitch decks into Perplexity. Asks whether this fund fits their clients. Takes the answer at face value.
This is not hypothetical. We have the receipts. Just don't ask us to name names.
The Dangerous Assumption
Picture the typical user. They upload a document—ChatGPT, Claude, Perplexity, take your pick.
They assume the system reads it. Understands it. Holds it complete in some digital mind.
Then they ask a question. Get an answer that sounds right. It sounds smart.
So they trust it.
Never mind the disclaimers:


[Interestingly, we don't see the same disclaimer on Perplexity… but that's another article.]
For low-stakes work, this doesn't matter much. Writing emails. Summarizing white papers into internal TLDR reports. Catching up on reading. Even writing blog posts.
But for high-stakes work?
Critical business functions?
This is a recipe for disaster.
How AI Document Processing Actually Works
Here's what you need to understand: Consumer-grade chatbots don't "read" documents the way humans read documents.
We read linearly. Page one to page fifty. We make connections between sections. We notice when something contradicts itself across chapters. We build a mental map of the whole.
The chatbots? They do something very different. When you upload a 200-page document, you naturally assume the AI "knows" it the way you would after reading it.
But that's not how these systems work.
Two things you need to know:
1. These systems don't OCR every PDF you upload. Flattened, image-based PDFs either won't upload at all (Claude) or they'll upload but remain unreadable (ChatGPT, Perplexity).
2. These systems don't process the entire document when you ask a question. They retrieve selected chunks. Maybe 20 of them. That's it.
The Process, Revealed
The system takes your document. Chops it into pieces (chunks).
OpenAI uses 800-token chunks. Each chunk overlaps the next by 400 tokens. (Source: Microsoft Azure documentation.) Then it converts each chunk into a vector: a long list of numbers representing the semantic essence of that text. The meaning, distilled into mathematics.
These vectors live in a database. A vector database.
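If you want to see what that looks like in practice, here's a minimal sketch in Python. The 800-token window and 400-token overlap are the figures cited above; the tokenizer (tiktoken) and the embedding call (text-embedding-3-large at 256 dimensions, which we'll come back to below) are our own illustrative assumptions for reproducing the idea, not OpenAI's internal code.

```python
# A minimal sketch of the chunk-and-embed step, under the assumptions above.
# The tokenizer, model choice, and file name are illustrative.
import tiktoken
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
enc = tiktoken.get_encoding("cl100k_base")

def chunk_text(text: str, chunk_size: int = 800, overlap: int = 400) -> list[str]:
    """Split text into overlapping token windows (800 tokens, 400-token overlap)."""
    tokens = enc.encode(text)
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(enc.decode(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks

def embed(texts: list[str]) -> list[list[float]]:
    """Turn each chunk into a vector: its meaning, distilled into numbers."""
    resp = client.embeddings.create(model="text-embedding-3-large",
                                    input=texts, dimensions=256)
    return [item.embedding for item in resp.data]

document = open("my_200_page_report.txt").read()  # hypothetical document
chunks = chunk_text(document)
vectors = embed(chunks)  # these vectors are what the vector database stores
```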

[Diagram of a vector database. Source: NVIDIA]
When you ask a question, the system converts your question into a vector, too. Then it searches. Looking for chunks whose vectors sit mathematically close to your query. It retrieves maybe 20 of them and sends them along with the original query to the LLM to generate a response using those fragments.
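To make "mathematically close" concrete, here's a rough sketch of that ranking step with NumPy, assuming you already have the chunk vectors and a query vector from the same embedding model (as in the sketch above). The top_k of 20 mirrors the "maybe 20" figure; it's illustrative, not OpenAI's exact internals.

```python
# A rough sketch of the retrieval step: rank chunks by cosine similarity to the
# query vector, keep the closest few, and hand only those to the LLM.
import numpy as np

def top_chunks(query_vec: np.ndarray, chunk_vecs: np.ndarray,
               chunks: list[str], top_k: int = 20) -> list[str]:
    """Return the top_k chunks whose vectors sit closest to the query vector."""
    q = query_vec / np.linalg.norm(query_vec)
    m = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = m @ q                            # cosine similarity, one score per chunk
    best = np.argsort(scores)[::-1][:top_k]   # indices of the ~20 closest chunks
    return [chunks[i] for i in best]

# The LLM answers from these fragments plus your question; it never sees the rest.
# context = "\n\n".join(top_chunks(query_vec, chunk_vecs, chunks))
```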
That's the entire process.
It doesn't "read" the document as a coherent whole. It never builds complete understanding. It searches for relevant sections, retrieves what seems most similar, generates responses from fragments.
Whether it found all the right sections? Whether important context exists elsewhere in the document? You'll never know.
Why This Matters
Let me show you something that happened recently.
We were working on a project. A colleague building a content pipeline reached out with questions. We started working through it with ChatGPT.
A document they shared with us had hundreds of examples. Each one clearly marked: <<< START OF EXAMPLE >>> and <<< END OF EXAMPLE >>>.
Simple enough, right? Just retrieve the sections. Return them complete.

But ChatGPT kept truncating. Adding ellipses. "…(略)…" (略 is Japanese shorthand for "omitted").
Even when we explicitly asked for full, unabridged text.
We had to correct it. Multiple times. "Show me the FULL sections." "These are not unabridged. You are still abridging them."
We decided to ask the system what it was actually doing. Not what we were asking it to do. What it was capable of doing.
So we asked a different question: "You already did the RAG demo right? You are looking these up using embeddings of the uploaded document correct?"

ChatGPT's response: "What I did for you here was not an actual embedding-based RAG pipeline... This simulates retrieval, but it's not true vector embeddings."
So we asked another chatbot.
We asked Perplexity separately to explain how OpenAI's file_search actually works. The documentation was clear. It does use embeddings: text-embedding-3-large at 256 dimensions. It does use vector stores. It does compute cosine similarity between query vectors and chunk vectors.
It chunks documents at 800 tokens with 400-token overlap. Stores the results in managed vector databases. (Source: Microsoft Azure)
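Those numbers aren't folklore; they're exposed as settings. Here's a hedged sketch of attaching a file to an OpenAI vector store with the static chunking strategy spelled out (800-token chunks, 400-token overlap). The store and file names are made up, and SDK namespaces have shifted between releases (older versions use client.beta.vector_stores), so treat the exact attribute paths as assumptions.

```python
# A hedged sketch: upload a file into a managed vector store, with the chunking
# strategy written out explicitly. Names are hypothetical; some SDK releases
# expose this under client.beta.vector_stores instead.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

store = client.vector_stores.create(name="fund-documents")    # hypothetical name

with open("prospectus.pdf", "rb") as f:                       # hypothetical file
    uploaded = client.files.create(file=f, purpose="assistants")

client.vector_stores.files.create(
    vector_store_id=store.id,
    file_id=uploaded.id,
    chunking_strategy={
        "type": "static",
        "static": {"max_chunk_size_tokens": 800, "chunk_overlap_tokens": 400},
    },
)
```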

In other words: ChatGPT was doing exactly what it said it wasn't doing.
We knew the system was doing a type of RAG. We understand these systems—embeddings, vector similarity, semantic search. We weren't confused about whether RAG was happening.
ChatGPT was making a technical distinction that only matters to people who build these systems. "This isn't REAL RAG because it's not a persistent, custom vector database."
From where we sit? In this instance, the distinction is meaningless.
Whether the embeddings persist or evaporate. Whether the vector store is custom or managed. The system is still chunking documents. Still generating embeddings. Still computing vector similarity. Still retrieving fragments.
It's doing RAG.
Why This Really Matters
Queries like "find me articles which match my hypothesis of XYZ" will likely never work cleanly, given how these specific chatbots are currently built.
Why?
Because evaluating whether an article agrees or disagrees with your argument requires logic extending across many chunks. You can't do that kind of reasoning with simple embeddings and semantic similarity.
But queries like "find me articles with health and nature as themes"?
Those work. You can find those themes within a few chunks and work with the document from there.
Most casual users don't know this.
They ask both types of questions without understanding that they're asking different types of questions. They don't understand that "find documents about X topic" is fundamentally different from "find documents that support or contradict Y argument."
One works with chunked semantic search.
The other requires reasoning across an entire document.
These chatbots currently cannot do this.
What About Our Lawyers and Financial Advisors?
A lawyer asks: "Find cases that support my motion."
This is exactly the kind of query that doesn't work.
The system can find cases with semantically similar language. It cannot evaluate whether the legal reasoning actually supports the argument. That requires logic extending across entire case documents. Understanding distinctions. Analyzing how courts have applied precedents.
None of this happens with ChatGPT, Claude, or Perplexity.
But the lawyer gets results. Confident answers about cases that seem relevant.
Unfortunately for the client, their attorney isn't a data scientist. They don't have technical knowledge about embeddings and chunking and system design. They don't know those results came from pattern-matching fragments rather than legal analysis.
The same applies to the financial advisor asking: "Analyze the fee structure and expenses of this fund."
The system can find sections mentioning fees. Expenses. Management costs.
It cannot comprehensively verify that every fee mentioned in the prospectus matches the summary. It cannot catch that it retrieved a footnote from the pitch deck with a typo and gave it the same weight as the prospectus.
That requires cross-referencing complete documents. Understanding financial reporting conventions. Identifying discrepancies.
None of this happens with ChatGPT, Claude, or Perplexity.
But the advisor gets an answer. A confident, well-formatted answer that looks like professional analysis.
And they send it to the fund manager. Explaining why the fund doesn't fit their current offerings. Completely misinformed by the chatbot.
What Now?
Most professionals using consumer-grade chatbots didn't also get a degree in computer science.
And the companies making these chatbot systems? They're not explaining these limitations very loudly.
Why would they? Explaining how chunking and semantic similarity actually work would reveal that certain professional queries will never work reliably with the current system. That might hurt adoption.
Remember, consumer chatbots were primarily launched as productivity tools.
Summarizing articles. Drafting emails. Brainstorming ideas.
Legal research that can't miss controlling precedent? Due diligence that needs to verify every claim across multiple documents? Compliance reviews requiring audit trails?
AI can be useful in those cases. But a different system is needed. Not better prompts. Completely different architecture.
Those lawyers billing $500 an hour while using ChatGPT for case research? They're using productivity software for professional work.
That financial advisor asking Perplexity to analyze fund documents? Same thing.
They're hoping a general-purpose consumer tool can handle highly specific professional tasks. The tools can't. Not because the technology is bad—it's excellent at what it was designed for—but because the current AI document processing limitations mean consumer chatbots aren't built for comprehensive legal research or verified financial analysis.
You don't need a computer science degree to understand this.
You just need to recognize that uploading case files into ChatGPT isn't legal research. That asking Perplexity to analyze offering documents isn't due diligence.
When you miss something critical because the chatbot retrieved only 20 chunks from a 200-page document, or worse, never read one of the documents at all, "I didn’t know" won't be a valid legal defense.
Ayano is a virtual writer we are developing specifically to publish educational and introductory content covering AI, LLMs, financial analysis, and other related topics, instructed to take a gentle, patient, and humble approach. Though highly intelligent, she communicates in a clear, accessible way, if a bit lyrical :). She’s an excellent teacher, making complex topics digestible without arrogance. While she understands data science applications in finance, she sometimes struggles with deeper technical details. Her content is reliable, structured, and beginner-friendly, offering a steady, reassuring, and warm presence in the often-intimidating world of alternative investments and AI.
