A compliant AI layer that sits on top of a customer's Office 365 / Google Workspace — we observe how a team works, surface friction, and deliver custom business-AI on top of it as a SOC 2-ready service.
Lead: Manish
Sponsor: Gopal
Phase: Build · ~55%
Milestone: Sell first app · Dec 2026
The idea in one line
Customers already live in Office 365 / Google Workspace. CheckLLM plugs in as a trusted layer, reads their working context (with permission), and ships custom AI workflows on top — packaged, observable, and compliant — rather than another app they have to adopt.
Architecture at a glance
End-to-end: a tenant's users sign in through their own workspace; requests hit the Next.js app behind Traefik; the AI layer answers using Claude grounded in that tenant's data; everything runs on our Dokploy/Docker VM, with Postgres isolating tenants via Row-Level Security (RLS) and encrypted backups to GCS.
Workspace (the customer)
Office 365 + Google Workspace — where the customer's data and identity already live.
▲ connects via
Integration layer
Microsoft Graph API + Google Workspace APIs — read context (read-only first), respecting workspace permissions.
▲
CheckLLM app
Next.js (UI + API routes). Sign-in is the same workspace identity — that's the compliance hook.
▲
AI layer
Anthropic Claude (Opus 4.8 / Sonnet) + retrieval (pgvector) — answers grounded in the customer's own documents.
▲
Data layer
Postgres (our own) or Supabase + pgvector — app data, embeddings, audit log. See decision below.
▲ runs on
Infrastructure
Dokploy + Docker + Traefik + Let's Encrypt on our VM — same pipeline we already run.
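The integration layer's "read-only first" stance can be made mechanical rather than a convention. The sketch below is one possible shape, not the project's actual code: the function and constant names are hypothetical, the scope strings are real Microsoft Graph and Google identifiers, and the write-detection rule is a simple naming heuristic that a real review should replace with an explicit allowlist.

```typescript
// Hypothetical helper: names are illustrative; only the scope strings are real.
type Provider = "microsoft" | "google";

// Read-only delegated scopes for a first integration pass (assumed selection).
const READ_ONLY_SCOPES: Record<Provider, readonly string[]> = {
  microsoft: ["User.Read", "Files.Read.All", "Sites.Read.All"],
  google: ["openid", "email", "https://www.googleapis.com/auth/drive.readonly"],
};

const GOOGLE_IDENTITY_SCOPES = ["openid", "email", "profile"];

// Heuristic only: Graph write permissions carry words like "Write" or "Send";
// Google read-only API scopes end in ".readonly".
function isReadOnlyScope(provider: Provider, scope: string): boolean {
  if (provider === "microsoft") {
    return !/Write|Send|Manage|FullControl/.test(scope);
  }
  return GOOGLE_IDENTITY_SCOPES.indexOf(scope) !== -1 || scope.endsWith(".readonly");
}

// Returns the scopes to request, refusing anything that looks write-capable.
function requestedScopes(provider: Provider, extra: string[] = []): string[] {
  const scopes = [...READ_ONLY_SCOPES[provider], ...extra];
  const offending = scopes.filter((s) => !isReadOnlyScope(provider, s));
  if (offending.length > 0) {
    throw new Error(`write-capable scopes rejected: ${offending.join(", ")}`);
  }
  return scopes;
}
```

Calling `requestedScopes("microsoft", ["Files.ReadWrite.All"])` throws, so a connector cannot quietly widen its access without the guard being changed in review.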
The stack & why
| Layer | Tool | Why this |
| --- | --- | --- |
| App framework | Next.js + TypeScript | One framework for the UI and the backend API routes: fewer moving parts. |
| UI | Tailwind CSS + shadcn/ui | Fast, consistent, accessible components; looks polished without a design lift. |
| State / data | Zustand + React Query | Zustand for client state, React Query for server data & caching; our standard pattern. |
| Auth | NextAuth + Entra ID + Google | Users sign in with their own Microsoft / Google Workspace identity: no new account, and it's the basis of the compliance story. |
| AI | Anthropic Claude | The reasoning + assisted layer. Opus 4.8 for hard tasks, Sonnet for fast/cheap. |
| Retrieval (RAG) | pgvector | Embed and search the customer's docs so answers are grounded in their data, not generic. |
| Database | Postgres (our own) or Supabase | Decision below; both are Postgres, so the choice stays reversible. |
| ORM | Prisma | Type-safe DB access; we already use it, and it keeps us portable across the two DB options. |
| Runtime / PM | Bun | Our standard across Stringify repos. |
| Deploy | Dokploy · Docker · Traefik | Existing VM pipeline: auto SSL, no new infra to learn. |
| Compliance | Audit log · RBAC · encryption | Built in from day one so SOC 2 is groundwork, not a retrofit. |
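To make the retrieval row concrete, here is a minimal sketch of a tenant-scoped similarity query against pgvector. The table and column names (`document_chunks`, `tenant_id`, `content`, `embedding`) and the function names are assumptions for illustration; `<=>` is pgvector's cosine-distance operator. The function only builds a parameterised query, so whichever client runs it (Prisma raw query, node-postgres, or another) is left open.

```typescript
// Hypothetical query builder; schema names are assumptions, not the real model.
interface RetrievalQuery {
  text: string;
  values: unknown[];
}

// pgvector accepts vectors as a text literal such as "[0.1,0.2,0.3]".
function toVectorLiteral(embedding: number[]): string {
  if (embedding.length === 0 || embedding.some((x) => !Number.isFinite(x))) {
    throw new Error("embedding must be a non-empty array of finite numbers");
  }
  return `[${embedding.join(",")}]`;
}

// Nearest-k chunks for one tenant, ordered by cosine distance (smaller is closer).
function buildRetrievalQuery(tenantId: string, embedding: number[], k = 5): RetrievalQuery {
  if (!Number.isInteger(k) || k < 1) {
    throw new Error("k must be a positive integer");
  }
  return {
    text:
      "SELECT id, content, embedding <=> $2::vector AS distance " +
      "FROM document_chunks " +
      "WHERE tenant_id = $1 " +
      "ORDER BY embedding <=> $2::vector " +
      "LIMIT $3",
    values: [tenantId, toVectorLiteral(embedding), k],
  };
}
```

The explicit `tenant_id` filter is deliberate even where Row-Level Security is in place: it keeps the query correct on its own and treats RLS as a second line of defence.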
Database — our own vs Supabase
Both options are Postgres underneath, so the data model and our Prisma code don't change. The real choice is how much we build & run ourselves vs how much we rent ready-made — i.e. one-time effort now against ongoing happiness later.
A · Our own deployed Postgres
Dokploy-managed on our VM, the way we run every other DB.
Pros:
- Fastest at runtime: sits next to the app, no network hop
- Full control & data residency: cleanest SOC 2 story
- We already operate it: Dokploy, daily GCS backups, pgvector
- Near-zero marginal cost: runs on a VM we already pay for
- No lock-in: plain Postgres, fully portable
Cons:
- We build auth, storage, APIs, realtime ourselves
- We own the ops forever: patching, scaling, monitoring, on-call
Best when control, speed & cost come first.
B · Supabase
Managed Postgres with batteries included.
Pros:
- Auth + Storage + Realtime + auto REST/GraphQL + Edge Functions + pgvector, out of the box
- Fastest to a first version: far less to build
- Managed infra: they patch, scale, back up, keep it up
- Platform is itself SOC 2 Type II
Cons:
- Recurring cost that grows with usage
- External subprocessor: customer data leaves our infra (residency review)
- Some lock-in: auth/storage/edge-functions aren't trivially portable
Best when speed to ship & low ops come first.
Side by side
| What matters | Our own Postgres | Supabase |
| --- | --- | --- |
| Setup (one-time) | Higher: provision, wire auth/storage/backups | Lower: usable in minutes |
| What you get | A database; we build the rest | DB + auth + storage + realtime + APIs + functions |
| Runtime speed | Lowest latency: co-located with the app | Network hop + pooler unless co-located |
| Ongoing ops | Ours: patch, scale, monitor, on-call | Theirs: managed |
| Cost | ~Free: on a VM we already pay for | Free tier → Pro $25/mo/project → Team $599/mo + usage |
| Data control / SOC 2 | Full control, stays on our infra | External subprocessor; vendor review needed |
| Lock-in | None: portable Postgres | Some: managed features are sticky |
| Scaling | Manual: bigger VM, read replicas | Plan-based, less effort |
| Dev velocity | Slower first features | Faster MVP |
One-time effort vs long-term happiness
A · Our own Postgres
One-time effort — higher
Stand up auth (NextAuth), file storage (GCS), connection pooling, and our own dashboards/scripts. A few extra weeks of plumbing up front.
Long-term — happier if we have ops bandwidth
No bills that grow, no vendor in the critical path, fastest runtime, clean compliance. The cost is that every patch, scale and 2am page is ours.
B · Supabase
One-time effort — lower
Connect and go — auth, storage, realtime and APIs already exist. We ship the first use case noticeably sooner.
Long-term — happier if we'd rather not run infra
Fewer 2am pages, automatic scaling. The cost is a recurring bill that grows, an external party holding customer data, and some lock-in to unwind later.
Recommendation: Go with our own Postgres as the strategic base. We already have the operational muscle (Dokploy, GCS backups, pgvector), it's the fastest at runtime, costs almost nothing on top of the VM, and gives the cleanest data-residency story, which matters because SOC 2 and customer workspace data are core to CheckLLM. We trade a few weeks of one-time setup for no recurring bill, no vendor in the data path, and zero lock-in.
The honest hybrid: if speed-to-first-demo is the priority, we could prototype on Supabase and migrate before the SOC 2 audit — but only cleanly if we use it purely as Postgres (our own NextAuth + GCS), since its auth/storage are the parts that don't port. If we lean on those, the migration stops being free. So: own Postgres unless the meeting decides demo speed beats everything.
Read O365 / Workspace context via Graph + Google APIs — read-only first.
Gate: real customer context in
03
AI layer
Claude + RAG over workspace docs; ship the first observational use case.
Gate: one use case end-to-end
04
Delivery-ready
Package one app, polish, sign-off (the 2-week push).
Gate: Jul 2026 sign-off
05
SOC 2 / QA
Audit logging, RBAC, controls review.
Gate: Aug 2026 controls pass
Building for many clients
The bet: build the hard plumbing once, spin up each new client mostly from config. First client is slow because we're building the platform; every client after is fast because we're just configuring it. The model is one platform, many tenants — never a forked codebase per client.
Every client plugs in their own workspace; one shared engine serves all of them; each tenant's data stays walled off by Row-Level Security.
One codebase, per-client config
Auth, the O365 / Google connectors, the AI + RAG layer, and the SOC 2 controls are written once
A new client = a tenant record + their workspace connection + which use-cases are switched on
Per-tenant feature flags & small plugin modules tailor behaviour — no fork, no N codebases to patch
SOC 2 covers every tenant at once, because it's one platform
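As an illustration of "a new client = a tenant record + workspace connection + switched-on use-cases", the sketch below models a tenant as plain configuration. Every name here (`TenantConfig`, the use-case and flag names, the defaults) is hypothetical; the point is only that adding a client changes data, not code.

```typescript
// Hypothetical tenant record: onboarding a client means adding one of these.
interface TenantConfig {
  id: string;
  workspace: "microsoft" | "google";
  useCases: string[];               // which workflows are switched on
  flags: Record<string, boolean>;   // per-tenant overrides of platform defaults
}

// Platform-wide defaults (illustrative flag names).
const DEFAULT_FLAGS: Record<string, boolean> = {
  citations: true,
  "weekly-digest": false,
};

function isUseCaseEnabled(tenant: TenantConfig, useCase: string): boolean {
  return tenant.useCases.indexOf(useCase) !== -1;
}

// A tenant override wins; otherwise fall back to the platform default,
// and treat unknown flags as off.
function flagValue(tenant: TenantConfig, name: string): boolean {
  const override = tenant.flags[name];
  if (override !== undefined) return override;
  const fallback = DEFAULT_FLAGS[name];
  return fallback === undefined ? false : fallback;
}

const exampleTenant: TenantConfig = {
  id: "tenant-a",
  workspace: "microsoft",
  useCases: ["doc-search"],
  flags: { "weekly-digest": true },
};
```

With this shape, `flagValue(exampleTenant, "weekly-digest")` is `true` because of the override, while `flagValue(exampleTenant, "citations")` falls through to the platform default.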
Data isolation — the real decision
Fastest to grow
Shared DB · row-level
Every row tagged tenant_id, enforced by Postgres Row-Level Security so a query can never leak across clients.
Lowest ops · onboard a client in minutes · relies on strict RLS
Middle ground
Schema-per-tenant
Each client gets their own Postgres schema inside one database — stronger separation, shared engine.
Moderate ops · clearer boundaries
Strongest isolation
DB / deploy-per-tenant
Full physical separation per client — best for compliance & data residency.
Highest ops · for big or sensitive clients
Recommendation: Start shared-DB + RLS for speed, and let a big or especially sensitive client graduate to a dedicated DB. Same code path, just a different connection — so isolation becomes a per-client setting, not a rewrite.
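"Same code path, just a different connection" could look like the sketch below, where isolation is a field on the tenant record and the only thing that varies is which database URL the request uses. The type and function names are hypothetical, and real code would also need connection pooling and secret handling, which are left out here.

```typescript
// Hypothetical: isolation level stored per tenant, resolved at request time.
type Isolation =
  | { mode: "pooled" }                          // shared DB, protected by RLS
  | { mode: "dedicated"; databaseUrl: string }; // graduated to its own DB

interface TenantRecord {
  id: string;
  isolation: Isolation;
}

// Pick the connection string for a tenant; everything downstream is identical.
function resolveDatabaseUrl(tenant: TenantRecord, sharedDatabaseUrl: string): string {
  const isolation = tenant.isolation;
  return isolation.mode === "dedicated" ? isolation.databaseUrl : sharedDatabaseUrl;
}
```

Graduating a client then means migrating their rows and flipping `isolation` to `dedicated`, with no branch in application logic beyond this one function.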
Onboarding a new client — the fast loop
01
Connect workspace
Client OAuths their own O365 / Google — tokens scoped per tenant.
02
Ingest their data
Their docs flow into per-tenant RAG — never mixed with another client's.
03
Toggle use-cases
Switch on the workflows they need; flag any custom bits.
04
Ship
Live for that client — days, not months.
The tension to name: shared-DB multi-tenancy is the fastest way to grow, but it puts the strictness of our Row-Level Security between us and a cross-client data leak. Getting RLS right is a hard requirement — and it's exactly what our SOC 2 Confidentiality criterion is there to prove.
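Since so much rests on RLS, a sketch of what "strict" might mean in practice is useful. The helper below generates the Postgres statements for one table and the per-transaction statement that tells Postgres which tenant is active. The setting name `app.tenant_id`, the policy name, and the assumption that `tenant_id` is a `uuid` column are all illustrative choices; the SQL features themselves (`ENABLE` / `FORCE ROW LEVEL SECURITY`, `CREATE POLICY`, `current_setting`, `set_config`) are standard Postgres.

```typescript
// Hypothetical migration helper: emits RLS DDL for a tenant-scoped table.
function rlsStatements(table: string): string[] {
  // Identifiers cannot be parameterised, so restrict them to a safe pattern.
  if (!/^[a-z_][a-z0-9_]*$/.test(table)) {
    throw new Error(`unsafe table name: ${table}`);
  }
  const predicate = "tenant_id = current_setting('app.tenant_id')::uuid";
  return [
    `ALTER TABLE ${table} ENABLE ROW LEVEL SECURITY`,
    // FORCE applies the policy to the table owner too; superusers and
    // roles with BYPASSRLS still skip it, so the app must not run as one.
    `ALTER TABLE ${table} FORCE ROW LEVEL SECURITY`,
    `CREATE POLICY tenant_isolation ON ${table} ` +
      `USING (${predicate}) WITH CHECK (${predicate})`,
  ];
}

// Run first inside each request's transaction. The third argument (true)
// makes the setting transaction-local, so it cannot leak across pooled
// connections once the transaction ends.
function setTenantStatement(tenantId: string): { text: string; values: string[] } {
  return { text: "SELECT set_config('app.tenant_id', $1, true)", values: [tenantId] };
}
```

If the setting is missing, `current_setting('app.tenant_id')` raises an error rather than returning rows, which is the fail-closed behaviour one would want here.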
Backend model
Same backend codebase for every client — always. What we vary is the deployment and the data boundary, never the code. Different code per client is the agency death-spiral (N repos, N audits) — we don't do it.
Same code and the same auth layer for everyone. Most clients share one instance; a big or regulated client graduates to a dedicated one — a config + deploy change, not a rewrite.
| Model | Code | Compute | Data | Use when |
| --- | --- | --- | --- | --- |
| Pooled default | Shared | Shared instance | Shared DB + RLS | Most clients: fastest, cheapest |
| Siloed premium | Shared | Dedicated instance | Dedicated DB | Big / regulated / data-residency |
Auth: one shared auth layer in both rows. Each client's users sign in through their own Office 365 / Google Workspace, but the auth system is one piece of code, configured per tenant. Isolate at the data and deployment layer — never fork at the code layer.
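One way "one piece of code, configured per tenant" might work is to map the workspace identity in a verified sign-in token to a tenant record. The sketch assumes the token has already been signature-verified upstream (for example by NextAuth); it relies on the `tid` claim that Entra ID tokens carry for the directory ID and the `hd` claim Google sets for Workspace accounts. The interface and function names are hypothetical.

```typescript
// Hypothetical tenant lookup from already-verified identity claims.
type IdP = "microsoft" | "google";

interface VerifiedClaims {
  provider: IdP;
  email: string;
  tid?: string; // Entra ID directory (tenant) ID
  hd?: string;  // Google Workspace hosted domain
}

interface TenantAuthConfig {
  tenantId: string;     // our internal tenant id
  provider: IdP;
  directoryId: string;  // Entra directory GUID or Google hosted domain
}

// Returns our tenant id, or null when the identity belongs to no known client.
function resolveTenant(claims: VerifiedClaims, configs: TenantAuthConfig[]): string | null {
  const directoryId = claims.provider === "microsoft" ? claims.tid : claims.hd;
  if (!directoryId) return null; // e.g. a personal Google account has no hd claim
  const wanted = directoryId.toLowerCase();
  const match = configs.find(
    (c) => c.provider === claims.provider && c.directoryId.toLowerCase() === wanted,
  );
  return match ? match.tenantId : null;
}
```

Matching on the directory claim rather than the email domain is a design choice worth keeping: email domains can be aliased, while the directory identifier is what the identity provider itself asserts.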
Handling per-client requirements
When one client needs an API or feature the others don't, it becomes a module or a flag on the shared core — never a fork. We go down this ladder, cheapest first.
Rising effort left to right — and falling frequency. The vast majority of "custom" asks resolve at the two cheapest rungs without touching the codebase.
Config flag: The difference is "turn X on" or a parameter. Client A's flag is on, everyone else's is off. Zero code divergence.
AI-layer config: Often a "custom API" is really a new assistant task: a new tool + RAG connector + workflow. That's configuration of the shared engine, with no backend code at all. The CheckLLM shortcut.
Plugin module: A genuinely new endpoint or niche integration ships as an optional module in the same codebase, activated only for entitled tenants. Core untouched; same deploy; same audit.
Siloed deploy: Too heavy or proprietary to ship to everyone → their own instance running the same base code + their module. Rare, reserved for big/regulated clients.
The rule that keeps the platform healthy: build the one-off for the first client; generalize it on the second ask. A recurring need graduates from "Client A's module" to a toggle everyone can use — the platform gets richer over time instead of bloating with dead custom code.
Guardrails: entitlements gate visibility (a client's custom API simply isn't exposed to others) · a bespoke module must never degrade the shared path · the default answer to "can you build us X" is "yes, as a module," not "yes, in a branch." This is the open/closed principle — core closed for modification, open for extension.
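The "module, not branch" guardrail can be sketched as a small registry in which every module ships in the same codebase and entitlements decide what a tenant can see. Module names and route paths below are invented for illustration; a real version would hook into the Next.js route handlers rather than return string lists.

```typescript
// Hypothetical plugin registry: all modules live in one codebase,
// entitlements decide which ones a tenant is exposed to.
interface PluginModule {
  name: string;
  routes: string[];
}

const CORE_ROUTES = ["/api/assistant", "/api/search"]; // illustrative core surface

const MODULES: PluginModule[] = [
  { name: "erp-export", routes: ["/api/modules/erp-export"] },
  { name: "custom-reports", routes: ["/api/modules/custom-reports"] },
];

// Core routes are always present; module routes only for entitled tenants.
function routesForTenant(entitlements: string[]): string[] {
  const extra = MODULES
    .filter((m) => entitlements.indexOf(m.name) !== -1)
    .reduce<string[]>((acc, m) => acc.concat(m.routes), []);
  return [...CORE_ROUTES, ...extra];
}

// Gate used before dispatching a request: unentitled routes simply do not exist.
function canAccess(entitlements: string[], route: string): boolean {
  return routesForTenant(entitlements).indexOf(route) !== -1;
}
```

Adding a module extends `MODULES` without editing the core route list, which is the open/closed shape the guardrail describes.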
What SOC 2 requires
SOC 2 is built on the AICPA Trust Services Criteria (2017, revised points of focus 2022). We pick which criteria are in scope — Security is always required — then prove our controls actually operated over a 6–12 month window. That's a Type II report.
Trust Services Criteria — what we'll claim
Security: Required
Confidentiality: In scope
Availability: In scope
Processing Integrity: Later
Privacy: Later
We start with Security + Confidentiality + Availability — Security is mandatory, and the other two matter because we hold customer workspace data and run a delivery service. Processing Integrity and Privacy can be added in a later audit cycle.
The controls we have to build (Common Criteria CC1–CC9)
Head start: we already run a few of these — an append-only audit trail and daily encrypted backups to GCS on our existing apps. CheckLLM inherits those patterns from day one, so SOC 2 is groundwork, not a retrofit.
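The source does not say how the existing append-only audit trail is implemented. As one common way to make such a trail tamper-evident, the sketch below chains each entry to the previous one with a SHA-256 hash; this is an assumed technique for illustration, not a description of the current system. It uses Node's built-in `node:crypto` module, keeps entries in memory for brevity, and all type and field names are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hypothetical audit entry: each record commits to the one before it.
interface AuditEntry {
  seq: number;
  tenantId: string;
  actor: string;
  action: string;
  at: string;       // ISO timestamp
  prevHash: string; // hash of the previous entry, or "GENESIS"
  hash: string;
}

function hashEntry(e: Omit<AuditEntry, "hash">): string {
  const payload = JSON.stringify([e.seq, e.tenantId, e.actor, e.action, e.at, e.prevHash]);
  return createHash("sha256").update(payload).digest("hex");
}

class AuditLog {
  private entries: AuditEntry[] = [];

  // The only write operation: there is no update or delete.
  append(tenantId: string, actor: string, action: string, at = new Date().toISOString()): AuditEntry {
    const last = this.entries[this.entries.length - 1];
    const prevHash = last ? last.hash : "GENESIS";
    const base = { seq: this.entries.length, tenantId, actor, action, at, prevHash };
    const entry: AuditEntry = { ...base, hash: hashEntry(base) };
    this.entries.push(entry);
    return entry;
  }

  // Recompute every hash and link; any edited or removed entry breaks the chain.
  verify(): boolean {
    return this.entries.every((e, i) => {
      const expectedPrev = i === 0 ? "GENESIS" : this.entries[i - 1].hash;
      const { hash, ...rest } = e;
      return e.prevHash === expectedPrev && hash === hashEntry(rest);
    });
  }

  list(): readonly AuditEntry[] {
    return this.entries;
  }
}
```

In a database-backed version the same idea would sit alongside revoking `UPDATE` and `DELETE` on the audit table from the application role, so the chain detects tampering and the permissions prevent it.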
The path: choose Type II → set the testing window (6–12 months) → readiness / gap assessment → run controls & collect evidence (automate with Vanta/Drata) → independent CPA firm performs the audit. Budget realistically ~$25k–$80k and 9–18 months end to end, so the SOC 2 work runs in parallel with the build, not after it.