If you’re running a site with Claude Code on Linux, the most useful document you can write is not a design spec or a feature list. It’s a Claude Code hosting runbook: a short, practical guide that explains what to do when the site is healthy, what to check when it isn’t, and who is responsible for each step.
A good runbook does not replace automation. It complements it. When something breaks at 11:47 p.m., you do not want to reinvent the system from memory. You want a clear sequence: where to look, what to verify, how to roll back, and when to stop guessing. That matters even more if your site is managed by an AI engineer in a sandboxed Linux environment, because the agent can move quickly but still needs guardrails.
This guide shows you how to build a Claude Code hosting runbook that actually gets used. The goal is not documentation for documentation’s sake. The goal is fewer bad deploys, faster incident response, and less time spent asking, “What changed?”
What a Claude Code hosting runbook should cover
Think of the runbook as the operating manual for your site. If someone new had to take over your server today, what would they need first?
For most AI-built Linux sites, the runbook should cover five areas:
- Service overview — what the site does, where it runs, and how it is deployed
- Routine checks — how to confirm the app, database, nginx, and background jobs are healthy
- Deploy steps — how a release moves from code to production
- Incident steps — what to do for 500s, broken assets, SSL issues, or a failed deploy
- Recovery steps — backups, rollback, restore, and escalation contacts
If you use a hosting platform like Vibesies, some of the infrastructure details are already standardized inside the container. That makes the runbook easier to write, because you can focus on your app instead of inventing server policy from scratch.
A Claude Code hosting runbook for Linux sites
Framing the work as a Claude Code hosting runbook for Linux sites keeps it specific enough to be useful and broad enough to apply to blogs, SaaS apps, marketing sites, and internal tools.
A lot of teams already have fragments of this information scattered across notes, Slack, and commit messages. The runbook pulls it into one place. It should be short enough that you will actually open it during an incident, but detailed enough that it stops you from making a dumb mistake under pressure.
Start with a one-page service summary
Before you write troubleshooting steps, write the basics. This section should fit on one screen.
Include:
- App name and primary domain
- Hosting location and environment name
- Primary stack — for example Flask, Django, Node, PostgreSQL, Redis
- How deployments happen — Claude Code, git push, CI, or a mix
- Critical dependencies — payment provider, email provider, object storage, third-party APIs
- Owner and escalation contact
Example:
- Site: docs.example.com
- App: Flask 3 + Gunicorn + Nginx
- Database: PostgreSQL
- Deploy method: Claude Code edits in sandbox, then reloads app service
- Backups: nightly snapshot, 7-day retention
This sounds basic, but it saves time when you are tired. If your future self has to ask where the app lives or whether backups are enabled, the runbook failed at page one.
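If you keep the summary as structured data, a few lines of Python can flag missing fields before an incident does. This is a minimal sketch: the field names in `REQUIRED_FIELDS` and the `missing_fields` helper are illustrative assumptions, not part of any Claude Code or hosting API.

```python
# Hypothetical field names; adapt them to whatever your summary actually records.
REQUIRED_FIELDS = ("site", "app", "database", "deploy_method", "backups", "owner")


def missing_fields(summary: dict) -> list:
    """Return the required fields that are absent or blank in the summary."""
    return [
        field for field in REQUIRED_FIELDS
        if not str(summary.get(field, "")).strip()
    ]


# The example summary from this section, expressed as a dict.
summary = {
    "site": "docs.example.com",
    "app": "Flask 3 + Gunicorn + Nginx",
    "database": "PostgreSQL",
    "deploy_method": "Claude Code edits in sandbox, then reloads app service",
    "backups": "nightly snapshot, 7-day retention",
}
print(missing_fields(summary))
```

Run against the example above, the check reports that the owner is missing, which is exactly the kind of gap that is cheap to fix now and expensive to discover later.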
Document the normal path before the failure path
Many runbooks jump straight to incidents. That is backward. You should first document what “normal” looks like, because most debugging starts with comparing reality to the expected state.
For a site hosted with Claude Code, write down the standard checks for:
- Web process — is the app server running and responding?
- Reverse proxy — is nginx serving traffic cleanly?
- Database — can the app connect and run a simple query?
- Static files — are CSS, JS, and images loading?
- Background jobs — if you use queues, are workers healthy?
- Logs — do recent entries show errors, retries, or timeouts?
Keep these checks concrete. Avoid phrases like “verify everything is okay.” Instead, write commands or exact observations the agent should use. For example: “Confirm the homepage returns 200,” or “Check that the latest deploy timestamp matches the current commit.”
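A check like “Confirm the homepage returns 200” can be written down as a small function rather than a sentence. The sketch below uses only the Python standard library; `CheckResult`, `http_status`, and `check_status` are hypothetical names, and the URL you pass in is your own.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    name: str
    ok: bool
    detail: str


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code


def check_status(name: str, url: str, expected: int = 200,
                 fetch: Callable[[str], int] = http_status) -> CheckResult:
    """Compare the observed status code with the one the runbook expects."""
    try:
        status = fetch(url)
    except OSError as err:  # connection refused, DNS failure, timeout
        return CheckResult(name, False, f"request failed: {err}")
    return CheckResult(name, status == expected,
                       f"expected {expected}, got {status}")
```

The `fetch` parameter exists so the comparison logic can be exercised without network access. In normal use you would call something like `check_status("homepage", "https://docs.example.com/")` and record the result.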
Build incident playbooks for the most common failures
A useful Claude Code hosting runbook for Linux sites is really a set of small playbooks. Each playbook should answer three questions:
- What does this failure look like?
- What is the fastest safe fix?
- What do we do if the first fix does not work?
1. 500 Internal Server Error
Write down the first three checks:
- Does the app process start correctly?
- Did the last deploy introduce a bad config or syntax error?
- Is the database reachable?
Then add the rollback rule. For example: “If the error began immediately after deploy and logs point to app startup failure, revert to the previous working release.”
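The rollback rule quoted above can be made explicit so nobody has to interpret “immediately” at midnight. This is one possible reading, sketched in Python: the ten-minute window and the `should_roll_back` name are assumptions you should replace with your own values.

```python
from datetime import datetime, timedelta


def should_roll_back(deploy_time: datetime,
                     first_error_time: datetime,
                     startup_failed: bool,
                     window: timedelta = timedelta(minutes=10)) -> bool:
    """Apply the rule: errors began right after deploy AND startup failed.

    `window` is an assumed definition of "immediately after deploy".
    """
    delay = first_error_time - deploy_time
    began_after_deploy = timedelta(0) <= delay <= window
    return began_after_deploy and startup_failed
```

How you determine `startup_failed` depends on your stack. In many setups it means the app server logs show an import or configuration error on boot.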
2. Site loads, but styling is broken
This is usually a static asset or cache issue. Your playbook should include:
- Check whether assets were rebuilt
- Confirm static files are being served from the expected path
- Verify browser cache or CDN cache is not masking the fix
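To confirm that static files are served from the expected path, it helps to know which assets a page actually references. The following sketch collects stylesheet, script, and image URLs from a page’s HTML using the standard library parser. `AssetCollector` and `asset_urls` are hypothetical names, and the sample markup is invented for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetCollector(HTMLParser):
    """Collect absolute URLs for stylesheets, scripts, and images."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        ref = None
        if tag == "link":
            # Only stylesheet links count as assets for this check.
            rel = (attr.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                ref = attr.get("href")
        elif tag in ("script", "img"):
            ref = attr.get("src")
        if ref:
            self.assets.append(urljoin(self.base_url, ref))


def asset_urls(html: str, base_url: str) -> list:
    """Return the asset URLs referenced by the given HTML, in document order."""
    collector = AssetCollector(base_url)
    collector.feed(html)
    return collector.assets


page = ('<link rel="stylesheet" href="/static/app.css">'
        '<script src="/static/app.js"></script>'
        '<img src="logo.png">')
print(asset_urls(page, "https://docs.example.com/guide/"))
```

Each URL in the result can then be requested to confirm it returns 200 and comes from the path you expect.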
3. SSL or domain problems
Document how to confirm the certificate status, DNS record, and canonical hostname. If you run multiple environments, note which domain is primary and which ones should redirect.
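Certificate status is one of the easier checks to script. The sketch below reads the certificate’s expiry date over TLS and converts it to days remaining, using Python’s `ssl` and `socket` modules. The function names are illustrative, and the hostname you pass is your own primary domain.

```python
import socket
import ssl
import time


def fetch_not_after(hostname: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the certificate's notAfter field, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: float = None) -> float:
    """Convert a notAfter string to days remaining (negative if expired)."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400
```

A runbook entry might then read: “If `days_until_expiry(fetch_not_after("docs.example.com"))` is below 14, check certificate renewal.” The 14-day threshold is an example, not a recommendation from any provider.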
4. Email stopped sending
For many small sites, this is business-critical. Capture the provider, the sender domain, SPF/DKIM/DMARC status, and any rate limits or suppression lists to check.
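Once you have the TXT records for your sender domain (for example from `dig TXT` output pasted into the runbook), a short script can sanity-check them. This sketch assumes you already hold the record strings. It does not perform DNS lookups, and `spf_status` and `dmarc_policy` are hypothetical helpers.

```python
from typing import List, Optional


def spf_status(txt_records: List[str]) -> str:
    """Classify SPF setup as 'missing', 'ok', or 'multiple'.

    A domain should publish exactly one record starting with "v=spf1".
    """
    spf = [
        r for r in txt_records
        if r.lower() == "v=spf1" or r.lower().startswith("v=spf1 ")
    ]
    if not spf:
        return "missing"
    return "ok" if len(spf) == 1 else "multiple"


def dmarc_policy(txt_records: List[str]) -> Optional[str]:
    """Return the p= policy from the first DMARC record, or None if absent.

    Pass the TXT records found at _dmarc.<your sender domain>.
    """
    for record in txt_records:
        tags = {}
        for part in record.split(";"):
            if "=" in part:
                key, value = part.split("=", 1)
                tags[key.strip().lower()] = value.strip()
        if tags.get("v") == "DMARC1":
            return tags.get("p")
    return None
```

DKIM is left out here because verifying it requires the selector your email provider assigns, which varies by provider.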
5. Backups or restores fail
Backups are only useful if you have tested a restore. Your runbook should state:
- Where backups are stored
- How often they run
- How to restore a single file vs. the full app
- How long a restore should take
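“How often they run” is easy to verify mechanically: look at the newest backup file and check its age. The sketch below assumes backups land as `*.tar.gz` files in a single directory and that anything older than 26 hours means a nightly run was missed. Both are assumptions to adjust, as are the function names.

```python
import time
from pathlib import Path
from typing import Iterable, Optional


def newest_age_hours(mtimes: Iterable[float], now: float) -> Optional[float]:
    """Return the age in hours of the most recent timestamp, or None if empty."""
    newest = max(mtimes, default=None)
    if newest is None:
        return None
    return (now - newest) / 3600


def backup_is_fresh(backup_dir: str, pattern: str = "*.tar.gz",
                    max_age_hours: float = 26.0) -> bool:
    """True if the newest matching file is no older than max_age_hours."""
    mtimes = [p.stat().st_mtime for p in Path(backup_dir).glob(pattern)]
    age = newest_age_hours(mtimes, time.time())
    return age is not None and age <= max_age_hours
```

A fresh file only proves the backup job ran. It says nothing about whether the archive can be restored, so this check complements a restore test rather than replacing it.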
Add a release checklist to every runbook
A release checklist is one of the most underrated parts of a hosting runbook. It prevents “works on my machine” releases from reaching production and gives Claude Code a predictable sequence to follow.
Here is a practical checklist you can adapt:
- Confirm the branch or commit to deploy
- Review recent changes for config, secrets, or dependency updates
- Run tests or smoke checks
- Verify environment variables are present
- Confirm database migrations are safe and reversible
- Take a backup or snapshot if the change is risky
- Deploy to production
- Check homepage, login, and one critical user flow
- Watch logs for the first 10–15 minutes
If your site is mission-critical, add a “go/no-go” line. Example: “Do not deploy if the checkout provider is experiencing issues,” or “Do not deploy within one hour of a product launch email.”
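A checklist is most useful when it stops at the first failed step instead of pressing on. Here is a minimal sketch of that idea, with an environment variable check as one of the steps. The step names, `run_checklist`, and `missing_env_vars` are all illustrative, and the variable names are placeholders.

```python
import os
from typing import Callable, List, Mapping, Sequence, Tuple


def missing_env_vars(required: Sequence[str],
                     environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return required variable names that are unset or empty."""
    return [name for name in required if not environ.get(name)]


def run_checklist(steps: List[Tuple[str, Callable[[], bool]]]) -> Tuple[bool, List[str]]:
    """Run steps in order and stop at the first one that fails."""
    log = []
    for name, step in steps:
        if step():
            log.append(f"PASS {name}")
        else:
            log.append(f"FAIL {name}")
            return False, log  # later steps, including deploy, never run
    return True, log


# Example wiring with placeholder variable names.
steps = [
    ("env vars present",
     lambda: not missing_env_vars(["DATABASE_URL", "SECRET_KEY"])),
]
```

Putting the deploy itself late in the list means any earlier failure acts as an automatic no-go.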
Write rollback instructions while you still remember them
Rollback is where many runbooks get vague. They say “revert if needed,” which is not enough when traffic is failing.
Instead, specify:
- What counts as a rollback trigger
- How to identify the last known good release
- Whether database changes can be rolled back safely
- How to restore static assets or config files
- How to confirm the rollback worked
If a deploy includes both code and schema changes, note whether the schema is backward-compatible. That’s the difference between a five-minute recovery and a messy restore.
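If you keep a simple record of releases, finding the last known good one and deciding whether a code-only rollback is safe can both be written down precisely. This sketch assumes a release log with a health flag and a schema version. The `Release` type and both functions are hypothetical, not part of any deploy tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Release:
    commit: str
    deployed_at: int      # Unix timestamp of the deploy
    healthy: bool         # did post-deploy checks pass?
    schema_version: int   # database schema the release expects


def last_known_good(releases: List[Release], current_commit: str) -> Optional[Release]:
    """Return the most recent healthy release deployed before the current one.

    Raises ValueError if current_commit is not in the release log.
    """
    ordered = sorted(releases, key=lambda r: r.deployed_at)
    index = [r.commit for r in ordered].index(current_commit)
    for release in reversed(ordered[:index]):
        if release.healthy:
            return release
    return None


def code_only_rollback_is_safe(current: Release, target: Release,
                               schema_backward_compatible: bool) -> bool:
    """Safe if the schema did not change, or the change is backward-compatible."""
    return (current.schema_version == target.schema_version
            or schema_backward_compatible)
```

When `code_only_rollback_is_safe` returns False, the rollback plan also needs a database step, which is the slower and riskier recovery path.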
Decide what Claude Code should do automatically
One of the advantages of working with Claude Code in a hosted Linux environment is that the agent can perform many routine tasks without hand-holding. But a runbook still needs boundaries.
A good rule is to split actions into three buckets:
- Safe to automate — health checks, log review, restart a failed service, run a backup verification
- Confirm first — deploys, migrations, DNS changes, cache purges
- Human only — billing changes, credential rotation, production data deletion, compliance-related edits
This keeps the agent useful without turning it loose on tasks that should still require a second set of eyes.
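The three buckets can be recorded as a lookup table so the policy is unambiguous. In this sketch the action names are invented labels, and treating any unlisted action as human only is a deliberately conservative assumption rather than a rule from this guide.

```python
from enum import Enum


class Bucket(Enum):
    AUTOMATE = "safe to automate"
    CONFIRM = "confirm first"
    HUMAN_ONLY = "human only"


# Hypothetical action labels mapped to the buckets described above.
POLICY = {
    "health_check": Bucket.AUTOMATE,
    "log_review": Bucket.AUTOMATE,
    "restart_failed_service": Bucket.AUTOMATE,
    "backup_verification": Bucket.AUTOMATE,
    "deploy": Bucket.CONFIRM,
    "migration": Bucket.CONFIRM,
    "dns_change": Bucket.CONFIRM,
    "cache_purge": Bucket.CONFIRM,
    "billing_change": Bucket.HUMAN_ONLY,
    "credential_rotation": Bucket.HUMAN_ONLY,
    "production_data_deletion": Bucket.HUMAN_ONLY,
}


def bucket_for(action: str) -> Bucket:
    """Look up an action; anything unlisted defaults to the strictest bucket."""
    return POLICY.get(action, Bucket.HUMAN_ONLY)
```

Defaulting to the strictest bucket means a new kind of task needs an explicit decision before the agent handles it alone.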
A simple runbook template you can copy
If you want a starting point, use this structure:
1. Service summary
2. Critical contacts
3. Normal health checks
4. Deploy procedure
5. Rollback procedure
6. Incident playbooks
7. Backup and restore steps
8. Escalation criteria
If you prefer keeping operational docs inside the project itself, store the runbook in the repo as RUNBOOK.md or ops/runbook.md. If your team uses a hosted workspace or container platform, that file can live right alongside the app code so Claude Code can reference it while working.
Keep the runbook alive with post-incident updates
The fastest way for a runbook to become useless is to write it once and never touch it. Every incident should end with a short update to the document.
After a fix, ask:
- What did we learn?
- Which step was missing?
- Which check should happen earlier next time?
- Did any assumptions turn out to be wrong?
Even a small note helps. For example: “Add database credential check before app restart,” or “Document that image uploads fail when disk usage exceeds 90%.” Over time, this turns the runbook into a real operational asset instead of stale documentation.
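A lesson like the disk usage note is a good candidate for a new routine check. As a rough sketch, Python’s `shutil.disk_usage` reports enough to compute it. The 90% threshold comes from the example note above, while the function names and the default path are assumptions.

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Return the percentage of disk space used on the filesystem at path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def over_threshold(percent: float, threshold: float = 90.0) -> bool:
    """True when usage exceeds the threshold noted in the runbook."""
    return percent > threshold
```

If uploads are stored on a separate volume, pass that mount point instead of `/`.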
Runbook review checklist
Before you call it done, review your Claude Code hosting runbook against this checklist:
- It fits on a few pages, not a wiki labyrinth
- It includes exact checks, not vague advice
- It covers deploys, rollbacks, backups, and incidents
- It names owners and escalation paths
- It reflects the current app, not last quarter’s architecture
- It is stored where you and Claude Code can find it quickly
If you use Vibesies or another AI hosting setup where each site has its own persistent Linux environment, this is the document that makes the whole arrangement easier to operate. The AI can help execute tasks, but the runbook defines the rules.
Final thought
The best Claude Code hosting runbook for Linux sites is boring in the right way. It removes guesswork. It shortens recovery time. It keeps deploys calm. And it makes your AI engineer more effective because the agent knows what “good” looks like before anything goes wrong.
If you write nothing else this week, write the runbook first. Your future self will thank you the first time production gets weird.