What Karpathy’s Second Brain Looks Like Inside a Real Business

Every business, in one way or another, is utilising AI in its day-to-day operations — whether they realise it or not. When Andrej Karpathy, one of the leading thinkers in AI, posted on X about his LLM-maintained personal wiki, it blew up: thousands of reposts, dozens of personal wiki projects spinning up overnight. Over the Easter break we implemented the same thinking across the TCD business, and I believe we’ve built something very powerful for the AI-enhanced age.

What Karpathy Built

Credit where it’s due. Karpathy’s system is elegant in its simplicity. Drop source documents into a raw folder. An LLM “compiles” them into interlinked markdown files — summaries, backlinks, cross-references. Query the wiki and the answers feed back in. Knowledge compounds over time. The LLM runs health checks for inconsistencies and surfaces new connections.
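The compile step can be sketched in a few lines. This is a minimal illustration, not Karpathy’s actual code: the real system would call an LLM to write each summary (a placeholder truncation stands in here), and the linking convention (`[[Page Title]]` wiki-links with a Backlinks section) is an assumption.

```python
import re

def compile_wiki(raw_docs: dict[str, str]) -> dict[str, str]:
    """Compile raw documents into interlinked markdown wiki pages.

    raw_docs maps a page title to its raw source text. An LLM would
    normally produce the summary; here we truncate as a stand-in.
    """
    titles = list(raw_docs)
    pages = {}
    for title, text in raw_docs.items():
        summary = text[:200]  # stand-in for an LLM-generated summary
        # Cross-link any mention of another page's title: FMG -> [[FMG]]
        for other in titles:
            if other != title:
                summary = re.sub(rf"\b{re.escape(other)}\b", f"[[{other}]]", summary)
        pages[title] = f"# {title}\n\n{summary}\n"
    # Append backlinks from a snapshot, so added sections don't feed back in
    body = dict(pages)
    for title in titles:
        links_here = [t for t in titles if t != title and f"[[{title}]]" in body[t]]
        if links_here:
            pages[title] += "\n## Backlinks\n" + "\n".join(f"- [[{t}]]" for t in links_here) + "\n"
    return pages
```

Each compile run regenerates the link graph from scratch, which is what lets knowledge compound: a new document that mentions an existing page automatically shows up in that page’s backlinks.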

No fancy RAG pipeline. No vector databases. At moderate scale — around 400,000 words — you don’t need them. The key insight, and the one that hit home for us: synthesis creates intelligence, not just retrieval.

That distinction matters.

What We Were Already Building

Before we came across Karpathy’s wiki structure, we were already building a knowledge base: trawling through our project and tender folders, creating markdown files and project summaries that were far more AI-friendly, at operational scale, than gigabytes of PDF documents. We had all this data, and we wanted it in a form where we could work with AI more efficiently when reviewing and creating work based on previous project deliveries.

So we had a knowledge system already being developed. But the wiki idea took that work and made it 100x more powerful.

What Business Scale Actually Requires

Karpathy’s wiki handles one person’s knowledge. Ours handles a 40-year-old mining services business, and the scale is daunting: over 1,800 projects and more than 500,000 files within them. Each project now has its own wiki page. Clients, key people, suppliers — all have one too. We’re now bringing our IMS (Integrated Management System) into the wiki as well, with every page backlinked to the relevant documents in our system and to the other pages that reference it, creating a valuable knowledge graph.

Everything runs locally — no data leaves the building, no API cost per inference. For a business handling sensitive commercial contracts, that matters.

That required something Karpathy’s personal setup doesn’t address: layers. We built four.

Layer 1 — Raw files. Source documents extracted and classified. PDFs, spreadsheets, contracts, safety reports — pulled apart and tagged.

Layer 2 — Domain summaries per project. Each project gets structured markdown files: CONTRACT.md, SAFETY.md, ESTIMATE.md, and so on. The LLM reads dozens of source documents and distils them into domain-specific summaries.

Layer 3 — Per-project wiki pages. One comprehensive page per project that pulls from all domain summaries. A single place that tells you everything about that job.

Layer 4 — Cross-project intelligence. This is where it gets powerful. Client pages that aggregate every project we’ve done for that client. Supplier pages built from performance data across jobs. Commercial intelligence that spans years and hundreds of contracts.
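The roll-up from Layer 2 to Layers 3 and 4 can be sketched as a simple aggregation. The record shapes below (`id`, `client`, `domains`) are illustrative, not our production schema, and the page templates are simplified.

```python
from collections import defaultdict

def build_layers(projects: list[dict]) -> tuple[dict, dict]:
    """Roll Layer 2 domain summaries up into Layer 3 project pages
    and Layer 4 client pages.

    Each entry in `projects` looks like:
    {"id": "P-1001", "client": "FMG", "domains": {"CONTRACT": "...", "SAFETY": "..."}}
    """
    project_pages = {}
    by_client = defaultdict(list)
    for p in projects:
        # Layer 3: one page per project, stitched from its domain summaries
        sections = "\n\n".join(f"## {d}\n{text}" for d, text in p["domains"].items())
        project_pages[p["id"]] = f"# {p['id']}\nClient: [[{p['client']}]]\n\n{sections}"
        by_client[p["client"]].append(p["id"])
    # Layer 4: client pages aggregate every project done for that client
    client_pages = {
        client: f"# {client}\n\n## Projects\n" + "\n".join(f"- [[{pid}]]" for pid in pids)
        for client, pids in by_client.items()
    }
    return project_pages, client_pages
```

The same aggregation pattern extends to supplier and key-people pages: any entity that appears across projects gets a Layer 4 page built from the Layer 3 pages that mention it.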

Five Things This Actually Does

Theory is cheap. Here’s a small snapshot of what we can do in practice.

1. A New Estimator Can Price Like a 20-Year Veteran

When we tender a new FMG pipeline project, the wiki already knows every FMG pipeline job we’ve done. Contract values, actual BOQ rates, which variations got approved and which didn’t, what equipment worked on similar ground conditions, which subcontractors performed well. That institutional knowledge used to walk out the door when a senior estimator retired. Now it compounds.

2. Safety Intelligence That Actually Prevents Incidents

We have thousands of hazard reports and tracked incidents in the wiki, cross-referenced to the specific projects and sites where they occurred. The system can tell us that manual handling injuries spike in the first hour of shift and on day one of swing — which means our pre-start briefings need to hit harder on those days. It can tell us which SWMS are referenced by projects with zero incidents versus projects with repeat incidents. That’s not a report someone runs quarterly — it’s knowledge that’s always there, ready to inform every safety plan we write.

3. A Project Manager Prepares for a Client Meeting in Minutes, Not Days

Before a monthly client meeting, a PM used to spend half a day pulling together manhours, incident stats, inspection completion rates, and progress data from three different systems. Now the project site page has monthly tables with all of it — workforce numbers, safety performance, leading indicators, hazard closure rates — synthesised with a narrative summary. One place, always current.

4. Document Compliance Becomes Automatic, Not Annual

We have 1,110 controlled documents in our IMS, and today 22 of them are overdue for review. The wiki flags that instantly. But the real power is the cross-reference — we can see that TCD-HSE-MP-xxx is referenced by 12 active projects. If that document gets revised, we know exactly which project safety plans need updating. Previously that connection existed only in a document controller’s head. Now it’s a backlink in the wiki.
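Both checks fall out of the wiki structure almost for free. The sketch below assumes review dates are tracked per document and that project pages reference controlled documents with `[[...]]` wiki-links; the document IDs in use here are hypothetical.

```python
from datetime import date

def overdue(docs: dict[str, date], today: date) -> list[str]:
    """IDs of controlled documents past their review-by date."""
    return sorted(doc_id for doc_id, review_by in docs.items() if review_by < today)

def revision_impact(doc_id: str, wiki_pages: dict[str, str]) -> list[str]:
    """Every wiki page that backlinks a controlled document — i.e. the
    project safety plans that need updating if that document is revised."""
    return sorted(page for page, text in wiki_pages.items() if f"[[{doc_id}]]" in text)
```

Because the links are regenerated on every compile, the impact list stays current without anyone maintaining a register by hand.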

5. Supplier Selection Backed by Actual Performance Data

We have 784 supplier pages built from project documents, now enriched with scored performance reviews: time management, quality, safety, environmental performance, and whether PMs would recommend them again. When shortlisting subcontractors for a new tender, the wiki surfaces every supplier who’s done similar work, ranked by actual performance, with Indigenous ownership status for client reporting. That used to be a phone call to three different PMs hoping they remembered.

What We Got Wrong

No point pretending this was clean.

Entity extraction isn’t perfect. Some of our “people” pages are actually equipment names. The system needs human review at the edges, and we’re still cleaning it up.

Supplier deduplication is a real problem. The same subcontractor appears under three different ABN variations across ten years of projects. Solving that properly is ongoing work.
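One direction we’re exploring: normalise ABNs to digits only, then flag candidate pairs by matching ABN or near-identical names. This is a sketch of the idea, not the finished solution — the record shape and the similarity threshold are assumptions, and real dedup still needs human sign-off.

```python
from difflib import SequenceMatcher

def normalise_abn(abn: str) -> str:
    """Collapse formatting variants like '51 824 753 556' vs '51-824-753-556'."""
    return "".join(ch for ch in abn if ch.isdigit())

def likely_duplicates(suppliers: list[dict], threshold: float = 0.85) -> list[tuple[str, str]]:
    """Flag supplier record pairs that share a normalised ABN or whose
    names are near-identical. Entries look like {"name": ..., "abn": ...}.
    Output is a candidate list for human review, not an auto-merge."""
    pairs = []
    for i, a in enumerate(suppliers):
        for b in suppliers[i + 1:]:
            same_abn = normalise_abn(a["abn"]) == normalise_abn(b["abn"])
            name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
            if same_abn or name_sim >= threshold:
                pairs.append((a["name"], b["name"]))
    return pairs
```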

What did work — the design decision that mattered most: folder structure before AI. We classify documents by folder prefix first, and only use the LLM as a fallback. That eliminated 95% of AI classification calls. If your file system is a mess, fix that before you throw a language model at it.
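The pattern is simple: deterministic rules first, model only as a fallback. The prefix mapping below is illustrative, not our production list, and `llm_classify` is a hypothetical callable standing in for the expensive model call.

```python
from pathlib import Path

# Illustrative mapping from folder prefix to document class —
# not the production list.
PREFIX_CLASSES = {
    "03_Contracts": "CONTRACT",
    "05_Estimating": "ESTIMATE",
    "07_HSE": "SAFETY",
}

def classify(path: Path, llm_classify=None) -> str:
    """Classify a file by its folder prefix first; only fall back to
    an LLM when no prefix in the path matches."""
    for part in path.parts:
        if part in PREFIX_CLASSES:
            return PREFIX_CLASSES[part]
    if llm_classify is not None:
        return llm_classify(path)  # expensive fallback, rarely taken
    return "UNCLASSIFIED"
```

Because most files sit under a recognisable prefix, the model only ever sees the residue — which is how the folder-first rule cut out roughly 95% of AI classification calls.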

And extraction rate doesn’t need to be perfect. We prioritised high-value documents over completeness. Getting 80% of the important stuff is better than getting 100% of everything.

The Wiki Is the Moat

We now have infrastructure that opens up far more from human and AI collaboration across our business. We’ve all seen how powerful AI can be with a prompt and a few PDFs of input — the outputs can be amazing in a short period of time. Now consider what’s possible at this scale.

The model will be replaced by something better. The API landscape will shift. None of that matters if your knowledge is structured, interlinked, and compounding. The wiki is the moat, not the model.

If you’re running a business with decades of project history locked in folders nobody opens — that’s not an archive. That’s untapped intelligence. The tools to unlock it exist today, and they run on a single GPU sitting under a desk.

Start with what you have. Structure it. Let it compound.
