Part 5: Turning AI into Real Products
- Pure Math Editorial
- May 23
- 9 min read
Updated: May 28
Up to this point, everything we’ve written has been about the backend system: the documents, the embeddings, the vector store, the model runners, the GPU, and the decisions that brought it all together. The backend is up, the stack is in place, and the GPU is doing real work.
But that’s just half of it.
Now how does someone interact with our fancy new database?
That’s the other half of it.
The general design and user experience are pretty consistent across the platforms most people are familiar with: ChatGPT, Claude, Perplexity, Manus, etc.
A prompt goes in, an answer comes out. Maybe it streams its ‘reasoning’. Maybe it cites sources. Maybe it offers a few follow-ups. Maybe you can upload documents. Or even talk to it or ask it to search the web.
The UIs are clean, familiar, and fast enough that most people assume there must be a standard way to build this stuff.
There isn’t.
Some might look to adapt an existing frontend like Open WebUI, but quickly find that it’s really intended for an end user testing out different models—it’s not meant to be a polished interface where your users can focus on getting the most out of your tailored model.
Some instead hope they can just write a thin wrapper around a model server like Ollama—only to quickly discover that it can’t be that thin. Model servers are low-level components. They don’t assume anything about whether there will be multiple users, whether responses will be shown on a webpage, whether users will need to work with files, or whether a user will even see the response. All of those are decisions you need to make, and integrations you need to build.
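To make that concrete, here’s a sketch of what the “thin wrapper” actually looks like, assuming a local Ollama server and its standard streaming chat endpoint (the model name and URL are whatever your setup uses). Notice everything it doesn’t do; that list is the product.

```typescript
// A deliberately thin wrapper around a local Ollama server. Note everything
// it does NOT handle: no users, no auth, no saved threads, no resume after
// a dropped connection, no rate limiting.
async function chatOnce(prompt: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3", // assumes the model has been pulled locally
      messages: [{ role: "user", content: prompt }],
    }),
  });
  if (!res.body) throw new Error("no response body");

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let answer = "";
  let buffered = "";

  // Ollama streams newline-delimited JSON objects.
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffered += decoder.decode(value, { stream: true });
    const lines = buffered.split("\n");
    buffered = lines.pop() ?? ""; // keep any partial trailing line
    for (const line of lines) {
      if (!line.trim()) continue;
      const chunk = JSON.parse(line);
      answer += chunk.message?.content ?? "";
      if (chunk.done) return answer;
    }
  }
  return answer;
}
```

Close the tab mid-stream and the tokens are gone. Nothing remembers the conversation, the user, or the thread. Each of those gaps becomes a feature someone has to build.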
Some may think, “that’s not a problem, I’ll just have an AI build it for me!” AI has certainly gotten impressive, but it hasn’t yet eliminated the value of experience—of knowing how to assure the quality and security of the code that gets written. If I ask it to build a chat interface for an LLM, I’ll get a response, but will it break when I refresh the page? Will it fail under the load of too many concurrent streaming connections? How do I deploy it, and how does it scale? It can be easy to generate a lot of code with AI, but more code is harder to maintain, even for AI.
Every serious AI product you’ve used that looks simple looks that way because the people designing it made it that way.
Remember when Claude couldn’t browse the web? Or what it was like before you could upload documents or connect your Google Drive? All of those features were built from scratch, to suit the intended users.
The tools we’ve been talking about so far are generalist products for the mass public. They’re built to be intuitive for anyone, regardless of background. They make smart tradeoffs to keep things fast, clean, and safe. And they do it well.
But those same tradeoffs break down fast when the use case isn’t a general audience.
If your goal is to ask a few open-ended questions about a document or brainstorm ideas with a model, generalist tools are more than good enough.
But if your goal, for example, is to run structured queries against a constantly evolving database of regulatory filings—if you need to track how a fund’s language around risk has changed over three versions of an ADV—then everything changes.
Having a frontend/UI that looks good isn’t enough; you need a user experience tailored for professionals.
That means knowing the difference between ADV Part 1 and Part 2—and knowing how to work with them together. Basic filtering on structured data like AUM, fee schedules, and client segments only gets you so far. Nuance lives in the language: how a firm describes its strategy, how it discloses conflicts, how it talks about its clients, or about itself. A system that can’t move between those layers—or can’t treat them as part of the same data—isn’t much use.
What if you’re a PR agency looking for new clients? Could you scan the language in ADV filings or on their website and pick out firms whose messaging feels flat, defensive, or boilerplate—signals they might need help?
What if you’re launching a new investment product? Could you identify advisors who already allocate to similar strategies, work with the right client profiles, and describe their approach in a way that aligns with your offering?
What if you’re just trying to find a financial advisor? You shouldn’t need to be a financial expert just to find a financial expert—and you should be able to find one who aligns with your personality and preferences, not just your financial goals.
That means routing ambiguous prompts. Flagging disclosures that are missing—or just vague. Handling multi-step reasoning chains. Carrying structured memory between interactions. Inferring intent, not just recognizing terms. And enforcing the boundaries of what the system answers for users—and what it doesn’t.
There’s no off-the-shelf interface for enterprise AI platforms in the financial services space.
So we’re building one.
This time we spoke with Sean Douglas (CEO) and Tom Raleigh (CTO).
Pure Math Editorial:
Okay, so let’s say someone wants to build a serious platform. Not a prototype—an actual production system with multiple user accounts, shared threads, documents, logic. What do they need to think about first?
Sean:
Open-source tools like Open WebUI work fine for local use, but they’re not meant to be integrated into a website; they’re more like a personal dashboard.
Tom:
Yeah. If you start by just writing a simple frontend for Ollama, you’ll find you’re missing a lot. There’s nothing out of the box—if you lose your network connection or reload, you have to resend the request, and you might get a different answer.
There's no persistence.
If you’re building a real user-facing product, you have to support things like: can I reload and still see the response? Can the stream pick up? How do I make sure it shows the full message, not just part of one?
Sean:
That’s something people typically don’t realize they have to think about until their users start complaining about losing chats. We built ours so that if a user closes the tab while something is streaming, they can come back and the message continues properly.
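One common pattern for this (a sketch of the general idea, not necessarily Pure Math’s implementation): the server appends every chunk to a per-message log, and a reconnecting client replays from the last index it rendered. Browsers get a version of this for free with SSE’s Last-Event-ID header, but the server-side bookkeeping is still yours to build.

```typescript
// A sketch of resumable streaming: every chunk is appended to a per-message
// log, so a client that reconnects can replay from the last index it saw.

type ChunkLog = { chunks: string[]; complete: boolean };
const logs = new Map<string, ChunkLog>(); // in production: Redis, a DB, etc.

function appendChunk(messageId: string, text: string, done = false): void {
  const log = logs.get(messageId) ?? { chunks: [], complete: false };
  log.chunks.push(text);
  log.complete = done;
  logs.set(messageId, log);
}

// On reconnect, the client reports the last chunk index it rendered; the
// server replays everything after it, then keeps streaming live chunks.
function resumeFrom(messageId: string, lastSeenIndex: number): string[] {
  const log = logs.get(messageId);
  return log ? log.chunks.slice(lastSeenIndex + 1) : [];
}
```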
Pure Math Editorial:
What about threading and user state?
Tom:
That’s another piece most people overlook. If you want to have users ask questions and then share those threads—or expose them publicly—you need to think about permissions early.
There’s nothing built in for that. You have to build the database. You have to build your authentication. And you have to explicitly track things like: which threads are public, which ones are tied to which users, who has permission to revoke access, things like that.
That’s not part of any existing system. It’s all work you have to do on top of whatever model interface you’re using. And it’s easy to get wrong if you haven't done this before.
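The shape of that bookkeeping might look something like this; the field names are illustrative, not their schema.

```typescript
// Illustrative thread and permission model (names are hypothetical). None of
// this comes with a model server: ownership, visibility, and revocation are
// all tables and checks you write yourself.

type Visibility = "private" | "shared" | "public";

interface Thread {
  id: string;
  ownerId: string;
  visibility: Visibility;
  sharedWith: Set<string>; // user IDs granted access
}

function canRead(thread: Thread, userId: string | null): boolean {
  if (thread.visibility === "public") return true;
  if (!userId) return false;
  if (thread.ownerId === userId) return true;
  return thread.visibility === "shared" && thread.sharedWith.has(userId);
}

// Only the owner may change who has access.
function revokeAccess(thread: Thread, actorId: string, targetId: string): void {
  if (actorId !== thread.ownerId) throw new Error("forbidden");
  thread.sharedWith.delete(targetId);
}
```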
Pure Math Editorial:
There’s also the question of extra features, right? Everyone’s adding cards, citations, research summaries…
Sean:
Yeah—and once you start streaming anything more complex than plain text or markdown, things get tricky.
Tom:
You have to separate out structured stuff from the stream. If you’re sending, say, a summary block, or a set of citations, or inline cards, you need to think about how that gets delivered—and in what order. Some of it might be streamed, some of it might be fetched in parallel. And the client has to make sense of all of it without showing things out of order.
That’s not something you get from the model. That’s API and event design work.
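One way to tame that, sketched here with hypothetical event types: tag every event with a type and a sequence number, and let the client assemble the message from a typed union instead of raw text.

```typescript
// Illustrative typed stream events (not an actual wire format). Sequence
// numbers let the client detect gaps and keep ordering sane even when
// citations or cards arrive from a parallel fetch instead of the stream.

type StreamEvent =
  | { seq: number; type: "text"; delta: string }
  | { seq: number; type: "citation"; sourceId: string; title: string }
  | { seq: number; type: "card"; payload: Record<string, unknown> }
  | { seq: number; type: "done" };

interface MessageView {
  text: string;
  citations: string[];
  finished: boolean;
}

// A small reducer: the UI never concatenates raw stream bytes; it applies
// typed events, so a late citation can't end up spliced into the prose.
function applyEvent(view: MessageView, ev: StreamEvent): MessageView {
  switch (ev.type) {
    case "text":
      return { ...view, text: view.text + ev.delta };
    case "citation":
      return { ...view, citations: [...view.citations, ev.title] };
    case "card":
      return view; // cards render in their own panel, outside the prose
    case "done":
      return { ...view, finished: true };
  }
}
```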
Pure Math Editorial:
What about routing and behavior? A lot of these products seem like they’re doing “chat,” but there’s clearly more going on.
Tom:
Yeah—this is where it stops being a single interaction with a chatbot and turns into an asynchronous “workflow”. The fact that it’s a workflow (a DAG, or directed acyclic graph) means you now have to make decisions about what’s done in parallel, and what to do if part of it fails.
You’re not just sending a message to a model. You’re parsing the input, figuring out what kind of task it is, and routing it accordingly. That might mean rewriting the prompt, splitting it up, or running it through a tool chain before it ever hits the model.
Sometimes you want to show that reasoning to the user—like a “here’s what I’m doing” step. That’s what makes the interface feel intelligent. So now you’re not just streaming text—it’s an event stream, and it can’t leave the UI in a bad state if a message gets dropped or arrives out of order. And again, it can’t end up broken if the user reloads the page or loses internet in the middle of it.
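In miniature, that kind of workflow might look like the sketch below: hypothetical stubs stand in for real services, independent branches run in parallel, and a failed branch degrades gracefully instead of killing the whole response.

```typescript
// A miniature asynchronous workflow (illustrative; the stubs at the bottom
// stand in for real services).

type Progress = (step: string) => void; // hook for the "here's what I'm doing" UI

async function answer(prompt: string, notify: Progress): Promise<string> {
  notify("classifying request");
  const task = await classify(prompt);

  notify("gathering context");
  // Parallel branches; allSettled so one failure doesn't sink the others.
  const [docs, structured] = await Promise.allSettled([
    searchFilings(task, prompt),
    queryStructuredData(task, prompt),
  ]);

  const context = [
    docs.status === "fulfilled" ? docs.value : "", // degrade, don't die
    structured.status === "fulfilled" ? structured.value : "",
  ].join("\n");

  notify("generating answer");
  return generate(prompt, context);
}

// Hypothetical stubs standing in for real services:
async function classify(p: string): Promise<string> {
  return p.includes("ADV") ? "filing-lookup" : "open-ended";
}
async function searchFilings(task: string, p: string): Promise<string> {
  return "relevant filing excerpts for: " + p;
}
async function queryStructuredData(task: string, p: string): Promise<string> {
  return "AUM, fee schedule, client mix";
}
async function generate(p: string, context: string): Promise<string> {
  return `answer to "${p}" using:\n${context}`;
}
```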
Pure Math Editorial:
And the point here is that none of this is generic. It’s tied to the problem space.
Sean:
Right. Take a question like “Does the advisor invest in alternative investments?” That phrase might not appear anywhere in the ADV. But if the filing mentions private equity, hedge funds, or specific hedge fund strategies, anyone familiar with the space knows the answer is yes.
A generic system won’t make that connection. It’s just looking for the term. And if it doesn’t find it, it defaults to “no.”
That logic has to live in the interface. It’s not just about what the model can say—it’s about what the system understands before the prompt is even sent. That’s product logic. And in this domain, that kind of reasoning is the difference between a useful answer and a dead end.
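That product logic can start as something as simple as a domain-aware expansion layer applied before retrieval. The mappings below are illustrative examples, not a real taxonomy.

```typescript
// Illustrative domain expansion: a question about "alternative investments"
// should also match filings that only say "private equity" or "hedge funds".

const DOMAIN_EXPANSIONS: Record<string, string[]> = {
  "alternative investments": [
    "private equity",
    "hedge funds",
    "venture capital",
    "private credit",
    "managed futures",
  ],
};

function expandQuery(query: string): string[] {
  const q = query.toLowerCase();
  const related = Object.entries(DOMAIN_EXPANSIONS)
    .filter(([term]) => q.includes(term))
    .flatMap(([, terms]) => terms);
  return [query, ...related]; // retrieve against every variant, not just the literal phrase
}

// expandQuery("Does the advisor invest in alternative investments?")
// → the original question plus "private equity", "hedge funds", ...
```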
Pure Math Editorial:
So that is where the product actually gets smart?
Sean:
Yeah. It’s not just about wiring a chatbot to a few tools and calling it done. You have to construct the chain—what the system knows, how it behaves, and what kinds of questions it even makes sense to answer.
And that’s tied directly to the subject matter. A financial chatbot should handle questions about risk, discretion, fees—whether the language in the document is clear or not. It should know when a question is out of scope, and how to respond if the answer is implied but not stated.
That’s the data processing, or the data science. The frontend is how you work with it. What matters is that someone who knows the domain has gone through the data so that the model has all the information and focuses on the right things.
If that logic isn’t built into the frontend, the system doesn’t work.
Tom:
That domain knowledge—and the expectations of the users—need to go into the design of the user experience. If the features and the AI’s responses aren’t both tailored for domain professionals, it’s just going to waste their time.
Pure Math Editorial:
And that’s part of what makes this project different, right? You’re not guessing at what might go wrong—you’ve built this kind of system before.
Sean:
Exactly. At a previous company we worked with, we built a similar system—just with a much messier and much larger dataset. It was all product listings: Amazon, Walmart, third-party sellers, plus user reviews, scraped pages, structured stuff like price and category, unstructured stuff like reviews and seller blurbs. Billions of rows.
We had to solve the same problems: thread logic, context management, streaming, versioning, memory constraints—and we had to make it feel fast, even though it was doing a lot of things in parallel behind the scenes.
We’re not experimenting here. We’re using an approach that already worked, but built specifically for this industry.
Tom:
Yeah, it is a very different industry, a very different dataset, and a very different product—but the fundamentals of building this kind of app are the same. Having that experience already frees us up to really focus on the subtle nuances specific to this industry.
The Takeaways
If you’ve read this far, you’re likely someone we want to talk to.
We’re just getting started and wanted to share some of the challenges of turning AI into real products. There are other financial services-related databases in the market, but in most cases, AI is being bolted on after the fact—not built into the core of the system.
Our platform is being built by people who’ve not only shipped production-scale, AI-first platforms before, but who’ve also used those other databases at some of the leading firms in the alternative investment industry.
This isn’t a public launch. We’re building an initial team of beta testers to kick the tires, recommend new features, and get us ready for prime time.
If this sounds like something you’d want to participate in, please Contact Us.
Pure Math Editorial is an all-purpose virtual writer we created to document and showcase the various ways we are leveraging generative AI within our organization and with our clients. Designed specifically for case studies, thought leadership articles, white papers, blog content, industry reports, and investor communications, it is prompted to ensure clear, compelling, and structured writing that highlights the impact of AI across different projects and industries.