In a previous post, I argued that centralised agent skill definitions are the key to scaling AI across an engineering org. In the follow-up, I walked through how to build a context layer that connects your knowledge base to AI tooling.
This post is the next step: what does an actual skill definition look like, why is it opinionated, and how do you enforce it across your entire engineering org?
Why a skill definition matters
Give an AI coding assistant a Go codebase with no context and ask it to add a new endpoint. You'll get something that compiles. It might even work. But it won't match your architecture. It won't follow your error handling patterns. It won't use your shared libraries. It won't write the tests the way your team expects.
The AI doesn't know:
- That your transport layer is deliberately dumb — just request mapping, no business logic
- That you use ULIDs, not UUIDs
- That all dependencies must be passed explicitly — no globals, no init()
- That new libraries need approval before being added to go.mod
- That integration tests must use testcontainers against a real database, not mocks
Without these rules, every AI-generated PR becomes a code review battle. The reviewer catches the violations, requests changes, the engineer fixes them — and the AI makes the same mistakes next time. You're paying for AI tooling but not getting the leverage.
A skill definition fixes this. It's a configuration file that tells the AI how your team works — the patterns, the conventions, the guardrails. Once it's in place, AI output matches your standards from the first generation.
What we built
We maintain a Go backend service template — a golden path that every backend API service in the organisation is built from. It's opinionated by design. Every API service follows the same layered architecture, the same testing patterns, the same deployment pipeline.
A note on scope: This skill definition covers API services — request/response workloads served over REST, gRPC, or GraphQL. Background workers, event consumers, cron jobs, and data pipelines have different concerns (concurrency patterns, retry semantics, idempotency, backpressure) and deserve their own skill definitions. Don't force a worker into an API template.
The template already encodes our standards in code. But code shows what — it doesn't explain why, and it doesn't tell the AI what's off-limits. That's what the skill definition does.
Here's what we considered when writing it.
The architecture section: encoding the "why"
The first thing the skill definition establishes is the architecture:
transport (REST/gRPC/GraphQL) → usecase → storage
But it doesn't just state the layers — it explains the rules:
- Transport layer is dumb. It maps requests to DTOs, calls the usecase, maps the response back. No business logic here. Ever.
- Business logic lives in internal/usecase/. This is transport-agnostic. It uses plain Go structs and context.Context.
- Storage is behind interfaces. Never use GORM directly in handlers or usecases.
- Dependencies are explicit. All handler dependencies are passed as function arguments or struct fields. No globals. No init().
Why so explicit? Because without this, an AI will happily put business logic in a handler, call GORM directly from a resolver, or create a package-level database variable. It doesn't know these are violations unless you tell it.
Adding a new entity: the golden workflow
One of the most common tasks is adding a new domain entity. Without guidance, an AI might start with the endpoint and work backwards. That's wrong in our architecture. The skill definition specifies the exact sequence:
- Define the entity and storage interface in internal/storage/database/
- Define the usecase (interfaces, DTOs, implementations) in internal/usecase/<entity>/
- Wire the transport handlers in internal/transport/{rest,grpc,graphql}/
- Register everything in internal/bootstrap/bootstrap.go
- Write migrations via make scaffold name=<entity>
Model → usecase → transport → bootstrap. Not the other way around. This is the kind of workflow knowledge that lives in senior engineers' heads. Encoding it means every engineer — and every AI tool — follows the same path.
The library gate: governing dependencies
This is one of the most important sections, and one that most AI configurations miss entirely:
Before reaching for an external library, check if the company shared libs already provide the functionality. This is a hard gate.
We've seen what happens without this rule. Teams independently adopt three different HTTP client libraries, two logging frameworks, and four ways to handle configuration. Each one is individually reasonable. Collectively, they're a maintenance nightmare.
The skill definition makes the approved dependency set explicit — Chi for HTTP routing, GORM for database access, testify for assertions, testcontainers for integration tests. If a library isn't in the template's go.mod, it needs a discussion before it gets added.
This is the kind of organisational guardrail that AI tools will never infer from the code alone.
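The gate can also be enforced mechanically. Below is a rough sketch of a check that scans a go.mod require block against an allowlist; the allowlist contents and the helper name are assumptions for illustration, and the parser only handles the block form of require, not single-line requires.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// approved is a hypothetical allowlist mirroring the template's go.mod.
var approved = map[string]bool{
	"github.com/go-chi/chi/v5":                    true,
	"gorm.io/gorm":                                true,
	"github.com/stretchr/testify":                 true,
	"github.com/testcontainers/testcontainers-go": true,
}

// unapproved returns module paths in a go.mod require block that are
// not on the allowlist. Indirect dependencies are skipped, and only
// the parenthesised require block form is parsed in this sketch.
func unapproved(gomod string) []string {
	var bad []string
	inRequire := false
	sc := bufio.NewScanner(strings.NewReader(gomod))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "require ("):
			inRequire = true
		case line == ")":
			inRequire = false
		case inRequire && line != "" && !strings.Contains(line, "// indirect"):
			path := strings.Fields(line)[0]
			if !approved[path] {
				bad = append(bad, path)
			}
		}
	}
	return bad
}

func main() {
	gomod := `module example.com/payment-service

require (
	github.com/go-chi/chi/v5 v5.0.12
	github.com/some/unvetted-lib v1.2.3
)`
	fmt.Println(unapproved(gomod)) // flags only the unvetted library
}
```

Wired into CI, a check like this turns "needs a discussion" from a convention into a failing build.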
Testing: non-negotiable
The skill definition doesn't suggest tests. It mandates them:
Tests are mandatory. No exceptions.
And it's specific about what "tested" means:
- Unit tests with gomock and testify, colocated with source files, run in parallel
- Integration tests with testcontainers against real Postgres — never mock the database in integration tests
- Coverage enforced by CI via SonarQube
The distinction between unit and integration tests matters. We got burned in the past when mocked database tests passed but production migrations failed. The skill definition encodes that lesson so no one has to learn it again.
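On the unit-test side, the style the definition asks for is table-driven and behaviour-focused. The sketch below shows the shape with a hypothetical normalizeTitle function; to stay self-contained it uses plain assertions in a check function, whereas the real template would put this in a colocated _test.go file with testify assertions and t.Parallel().

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeTitle is a hypothetical piece of usecase logic under test.
func normalizeTitle(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

// checkNormalizeTitle runs table-driven cases and reports the first
// failure. Each case names a behaviour, not an implementation detail.
func checkNormalizeTitle() error {
	cases := []struct{ name, in, want string }{
		{"collapses inner spaces", "buy   milk", "buy milk"},
		{"trims edges", "  hello  ", "hello"},
		{"empty stays empty", "", ""},
	}
	for _, c := range cases {
		if got := normalizeTitle(c.in); got != c.want {
			return fmt.Errorf("%s: got %q, want %q", c.name, got, c.want)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkNormalizeTitle()) // <nil>
}
```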
The "never do this" list
Every opinionated system needs explicit anti-patterns. Ours has eight:
- Never overhaul the template — the structure is the standard
- Never modify .github/ — workflows are synced from the template
- Never copy-paste without understanding the structure
- Never commit generated code
- Never bypass the shared library gate
- Never write a handler without tests
- Never use raw GORM outside the storage layer
- Never hardcode secrets, URLs, or environment-specific values
These aren't aspirational guidelines. They're hard rules. And they're the rules most likely to be violated by AI tools without explicit instruction — because an AI optimises for "working code," not "code that belongs in your system."
PR process: contract first, stack small
The skill definition also encodes how work gets reviewed:
Contract first. Before writing code, agree on the API contract — protobuf definition, GraphQL schema, or OpenAPI spec. The contract is the handshake between teams. Code comes after.
Stack PRs for large features. One concern per PR. If a PR touches usecase, storage, transport, and config simultaneously, it's too big. Break it down: model → usecase → transport → wiring. Each one is a reviewable, mergeable unit.
This is the kind of process knowledge that usually lives in onboarding docs that nobody reads. Putting it in the skill definition means the AI actively follows it — suggesting stacked PRs when the scope gets large, asking about contracts before generating code.
Keeping it centralised: engineers can't change it
The skill definition is only useful if it's consistent across every service. If individual teams can modify their copy, you end up with drift — and drift is worse than no standard at all.
We solve this the same way we handle CI workflows: template sync.
Our service template repository contains the CLAUDE.md file alongside the .github/ workflows. When changes are pushed to the template, they automatically sync to all downstream service repositories. Engineers can't modify the file in their repos — it gets overwritten on the next sync.
engineering-standards/ # Template repo (write access: platform team only)
├── .github/workflows/ # CI/CD pipelines
├── CLAUDE.md # AI skill definition (backend-go.md)
└── ... # Template code
payment-service/ # Downstream service repo
├── .github/workflows/ # ← synced from template
├── CLAUDE.md # ← synced from template
└── src/ # Team's business logic
The platform team owns the skill definition. They iterate on it based on code review patterns — if reviewers keep catching the same AI-generated mistakes, the fix goes into the definition, not into individual PRs.
This creates a feedback loop:
AI generates code → reviewer catches a pattern violation →
platform team updates CLAUDE.md → AI stops making that mistake →
across every service, immediately
That's the difference between fixing a problem once and fixing it everywhere.
The full skill definition
Here's the complete CLAUDE.md we use for Go backend services. It's opinionated, specific, and it works.
Architecture
This codebase follows a clean layered architecture:
transport (REST/gRPC/GraphQL) → usecase → storage
- Transport layer is dumb. It maps requests to DTOs, calls the usecase, maps the response back. No business logic here. Ever.
- Business logic lives in internal/usecase/. This is transport-agnostic. It uses plain Go structs and context.Context. It never imports transport-specific packages.
- Storage is behind interfaces. Never use GORM directly in handlers or usecases. Always go through the repository interface in internal/storage/database/.
- Dependencies are explicit. All handler dependencies are passed as function arguments (REST) or struct fields (gRPC/GraphQL). Looking at a handler must tell you exactly what it depends on. No globals. No init().
Adding a new entity/domain
Think in usecases first, not endpoints.
- Define the entity in internal/storage/database/ — model struct with the base Model embed (ULID generation), plus the storage interface.
- Define the usecase in internal/usecase/<entity>/ — interfaces (Creator, Fetcher, Manager), DTOs, and implementations.
- Wire the transport in internal/transport/{rest,grpc,graphql}/ — handlers that call the usecase.
- Register in bootstrap — add the new storage and usecase manager to internal/bootstrap/bootstrap.go.
- Write migrations — use make scaffold name=<entity> to generate migration files.
The flow is always: model → usecase → transport → bootstrap. Not the other way around.
Code conventions
Structure: Follow the existing package structure exactly. Do not create new top-level packages. Use internal/ for all application code; reserve pkg/ for truly reusable utilities. Generated code (mocks, swagger, GraphQL) is never committed.
Naming: Handlers use <Entity><Action> (e.g., CreateTodo). Interfaces use role-based names (Creator, Fetcher, Manager) — not ITodoService. Errors are descriptive (ErrNotFound, ErrInvalidArgument).
Error handling: Errors are transport-specific. Usecases return plain Go errors. Transport layers map them to the appropriate format (HTTP status codes, gRPC AppError, GraphQL gqlerr). Never swallow errors.
Configuration: All config via environment variables with struct tags and sensible defaults. Never hardcode values. Config structs live in internal/config/.
IDs: Use ULIDs, not UUIDs.
Dependencies and libraries
Use shared company libraries first. Before reaching for an external library, check if the company shared libs already provide the functionality. This is a hard gate.
Do not add new libraries without approval. The template defines the approved dependency set.
Testing
Tests are mandatory. No exceptions.
Unit tests: Colocated with source files. Use gomock for mocking, testify for assertions. Run with t.Parallel() where possible. Test behaviour, not implementation.
Integration tests: Use build tag //go:build integration. Use testcontainers for real Postgres and Redis — never mock the database in integration tests. Containers auto-cleanup via tb.Cleanup().
Observability
Observability is wired into the template from the start. Structured logging with context and correlation IDs. Prometheus metrics pre-configured. Health checks at /internal/health and /internal/metrics. Panic recovery built into all transport layers.
Do not add custom observability infrastructure. Use what the template provides.
Performance
Performance is baked in, not an afterthought. But also: KISS. No N+1 queries. Use DataLoaders for GraphQL. Profile before optimising. Keep handlers thin.
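The N+1 point is easiest to see with a batch fetch. The sketch below uses an in-memory store and a query counter standing in for database round trips; the types are hypothetical, and a real GraphQL resolver would get the same effect through a DataLoader.

```go
package main

import "fmt"

// AuthorStore is an in-memory lookup; queryCount stands in for
// database round trips.
type AuthorStore struct {
	authors    map[string]string
	queryCount int
}

// GetMany fetches all requested authors in one round trip: the
// batched access pattern that avoids N+1 queries.
func (s *AuthorStore) GetMany(ids []string) map[string]string {
	s.queryCount++
	out := make(map[string]string, len(ids))
	for _, id := range ids {
		if name, ok := s.authors[id]; ok {
			out[id] = name
		}
	}
	return out
}

func main() {
	store := &AuthorStore{authors: map[string]string{"a1": "Ada", "a2": "Linus"}}

	// Naive N+1 would issue one query per post. Batching collects the
	// IDs first and resolves them with a single query.
	postAuthorIDs := []string{"a1", "a2", "a1"}
	names := store.GetMany(postAuthorIDs)
	fmt.Println(names["a1"], names["a2"], "queries:", store.queryCount) // Ada Linus queries: 1
}
```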
What you must never do
- Never overhaul the template
- Never modify .github/
- Never copy-paste without understanding the structure
- Never commit generated code
- Never bypass the shared library gate
- Never write a handler without tests
- Never use raw GORM outside the storage layer
- Never hardcode secrets, URLs, or environment-specific values
PR process
Contract first. Agree on the API contract before writing code.
Keep PRs small. One concern per PR. Stack PRs for large features: model → usecase → transport → wiring.
CI is the gatekeeper. All tests, linting, secret scanning, and coverage must pass. If CI fails, fix it.
What changed after we shipped it
The most immediate impact was on code review. Reviewers stopped catching the same structural violations. AI-generated PRs started following the architecture — usecases in the right place, dependencies explicit, tests included.
The subtler impact was on junior engineers. A junior with this skill definition gets AI output shaped by 15 years of backend engineering experience. They don't need to know why we use ULIDs or why the transport layer is dumb — the AI already knows, and the code it generates reflects that.
That's the real leverage of centralised skill definitions: you encode your best thinking once, and every engineer benefits from it every day.
This is part of a series on AI adoption for engineering teams. Start with the strategy, then read about building the context layer. If you're building this for your org and want help, get in touch.
The views expressed in this article are my own and do not reflect the opinions or positions of my employer.