Ask experienced AI teams what really determines success and most will say the same thing: the data, not the model. Here’s how to get your data ready in 2026 — without the “boil the ocean” trap. (dgm implements osFoundry as an independent partner.)
Data readiness is the real constraint
The model is rarely the bottleneck — messy, scattered or inaccessible data is. But the answer isn’t “clean everything first” (a project that never ends). It’s to prepare the data the specific use case needs, to a good-enough standard.
The steps
- Find the relevant data. What does this use case actually need — which documents, records, systems?
- Assess quality and structure. Is it accurate, reasonably consistent, and accessible? Note gaps.
- Handle sensitive/personal data. Identify personal and special-category data; minimise what the AI uses; confirm a lawful basis.
- Make it accessible safely. Connect the AI to the data — often via retrieval over your documents rather than dumping everything into a model.
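To make the steps above concrete, here is a minimal sketch of a per-use-case readiness check. It is an illustration under stated assumptions, not a prescribed method: the `Record` fields, the source names ("crm", "contracts", "helpdesk") and the `readiness_report` function are all hypothetical, and the personal-data flag is assumed to come from a review you already carry out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    source: str                  # hypothetical system name, e.g. "crm"
    text: str                    # extracted document or record text
    last_updated: Optional[str]  # ISO date string, or None if unknown
    has_personal_data: bool      # assumed to be set by an existing review


def readiness_report(records, required_sources):
    """Summarise gaps for one use case. All field names are illustrative."""
    total = len(records)
    found = {r.source for r in records}
    empty = sum(1 for r in records if not r.text.strip())
    undated = sum(1 for r in records if r.last_updated is None)
    personal = sum(1 for r in records if r.has_personal_data)
    return {
        # Step 1: which systems the use case needs but we have not found yet
        "missing_sources": sorted(set(required_sources) - found),
        # Step 2: rough quality signals (blank text, unknown age)
        "empty_share": empty / total if total else 0.0,
        "undated_share": undated / total if total else 0.0,
        # Step 3: how many records need personal-data handling
        "personal_data_records": personal,
    }


# Made-up sample data for the sketch
records = [
    Record("crm", "Customer asked about renewal terms.", "2025-11-02", True),
    Record("contracts", "Standard renewal clause, 12 months.", None, False),
    Record("contracts", "   ", "2024-03-15", False),
    Record("crm", "Price list for 2025.", "2025-01-10", False),
]
report = readiness_report(records, ["crm", "contracts", "helpdesk"])
print(report)
```

The point of a report like this is scope: it covers only the sources one use case needs, so "good enough" can be judged against that use case rather than the whole business.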
You don’t need perfect data
Aiming for perfect data across the whole business is a classic way to stall AI forever. Prepare what the use case needs, accept “good enough”, and improve iteratively. Modern retrieval techniques work with imperfect, real-world documents.
RAG reduces the burden
Retrieval-augmented generation (RAG) lets an AI answer using your own documents at query time, rather than being retrained on them. That often reduces data prep: the job is to make documents findable, not to build training datasets. (See RAG explained and RAG vs fine-tuning.)
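A minimal sketch of the retrieve-then-ask pattern, assuming nothing beyond the Python standard library. To stay self-contained it ranks documents by simple word overlap, which is a stand-in for the embedding or hybrid search a production setup would more likely use. The function names and sample documents are hypothetical, and no model is called: the sketch stops at building the prompt.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into alphanumeric words
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, doc):
    # Count the words the query and the document share
    q = Counter(tokenize(query))
    d = Counter(tokenize(doc))
    return sum(min(q[t], d[t]) for t in q)


def retrieve(query, documents, k=2):
    # Rank by overlap, keep the top k, drop documents with no overlap
    ranked = sorted(documents, key=lambda d: score(query, d), reverse=True)
    return [d for d in ranked[:k] if score(query, d) > 0]


def build_prompt(query, documents, k=2):
    # Only the retrieved passages go into the prompt, not the whole corpus
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Made-up sample documents for the sketch
docs = [
    "Refunds are processed within 14 days.",
    "Our office is in Leeds.",
    "Holiday requests need manager approval.",
]
print(build_prompt("How long do refunds take?", docs))
```

Nothing here is trained on the documents. They only need to be stored somewhere searchable, which is why the preparation burden tends to be lower than for fine-tuning.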
UK GDPR considerations
Whenever personal data is involved:
- minimise what the AI uses;
- confirm a lawful basis;
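- keep special-category data (such as health or biometric data) tightly controlled, and confirm the separate Article 9 condition that processing it requires; and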
- prefer setups where sensitive data stays in your environment (self-hosting or an EU region) rather than going to third-party tools.
Run a data protection impact assessment (DPIA) where processing is likely to result in a high risk to individuals.
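As one illustration of "minimise what the AI uses", the sketch below masks email addresses and UK-style phone numbers before text is indexed or sent to a model. It is a rough example under clear assumptions: the patterns and the `minimise` function are hypothetical, they catch only obvious formats, and they will not detect names, addresses or special-category data. Treat it as a first filter, not as evidence of compliance.

```python
import re

# Illustrative patterns only; real data will contain formats these miss
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
UK_PHONE = re.compile(r"(?:\+44\s?|\b0)\d{2,4}[\s-]?\d{3,4}[\s-]?\d{3,4}\b")


def minimise(text):
    """Replace obvious emails and UK phone numbers with placeholders."""
    text = EMAIL.sub("[email]", text)
    text = UK_PHONE.sub("[phone]", text)
    return text


# Made-up sample text for the sketch
sample = "Contact jo.bloggs@example.co.uk or 020 7946 0958."
print(minimise(sample))
```

Running a step like this before documents reach the retrieval index means the AI never sees the masked values at all, which is usually easier to reason about than filtering its output afterwards.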
Where osFoundry and dgm fit
dgm assesses what data a use case needs, helps prepare and structure it, and connects it to osFoundry safely. In practice that means retrieval (RAG-class) over your own data and a knowledge base; self-hosting or an EU region for sensitive information (osFoundry publishes US/EU/JP regions, not a UK one); and bring-your-own-key, so prompts go to providers you choose. That keeps data prep proportionate and data protection intact.
dgm is an independent integration partner with zero integrations so far. To assess and prepare your data for a use case, book a consultation with dgm. General information, not specific advice.