5 mistakes we made building an MCP server (so you don't have to)
We connected Notion, Linear, GitHub, Slack, and emails to Claude via MCP. It was supposed to save hours. Instead, it taught us how badly we understood what LLMs actually need.
I got tired of being the guy who knows where things are.
Every week Robin, my co-founder at Moqa Studio, pings me: “Where’s the API spec?” It’s in Notion. “Did the client reply about the deadline?” Check Gmail. “What’s the status on that bug?” Linear. “Did you mention something about the auth flow?” Probably Slack, maybe two weeks ago.
I was the human search engine across five tools. And I hated it.
So I did what any developer would do: I over-engineered a solution. I built a Python MCP server that connects Claude directly to our entire knowledge base: Notion, Linear, GitHub, Slack, and Gmail. One interface. Ask a question, get an answer from wherever the information lives.
It took a weekend to get it running. It took three more weeks to make it actually work. Because the prototype was terrible, and I didn’t realize it until we started using it.
Here’s every mistake I made.
1: We wrapped every API endpoint as a tool
Our first instinct: expose every useful endpoint as its own MCP tool. Small, composable, RESTful. We’d been building APIs for years; this felt natural.
# ❌ Our first attempt — 23 tools
@server.tool()
async def notion_search_pages(query: str) -> str:
"""Search Notion pages."""
...
@server.tool()
async def notion_get_page(page_id: str) -> str:
"""Get a Notion page by ID."""
...
@server.tool()
async def linear_list_issues(team_id: str) -> str:
"""List Linear issues for a team."""
...
@server.tool()
async def slack_search_messages(query: str) -> str:
"""Search Slack messages."""
...
# ... 19 more tools like this
We had 23 tools. It was a disaster.
When a user asked “What’s the latest on this project?”, Claude had to read all 23 tool descriptions (~10,000 tokens just for the catalog), pick which to call, orchestrate multiple round-trips, and keep all intermediate results in memory.
What actually happened? Claude picked 2-3 tools, ignored the rest, and hallucinated the gaps. Every conversation was inconsistent.
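The catalog cost is easy to estimate before you ship. A rough sketch using the common ~4-characters-per-token heuristic (the descriptions below are illustrative placeholders, not our real ones):

```python
def estimate_catalog_tokens(descriptions: list[str]) -> int:
    """Rough token count for a tool catalog, at ~4 characters per token."""
    return sum(len(d) for d in descriptions) // 4

# 23 tools, each with a ~440-character description, costs thousands of
# tokens on every single request -- before the user has asked anything.
catalog = ["Search Notion pages by query string..." + "x" * 400] * 23
print(estimate_catalog_tokens(catalog))
```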
The Fix
We collapsed 23 tools into 7 outcome-oriented ones.
# ✅ After: one tool replaces five API calls
@server.tool()
async def get_project_status(project_name: str) -> str:
"""Get a complete status overview of a project.
Use when the user asks about progress, status, or updates.
Returns: open tickets, recent commits, latest docs,
and recent discussions — all in one call.
"""
    return json.dumps({
        "linear": await _get_project_tickets(project_name),
        "github": await _get_recent_commits(project_name),
        "notion": await _get_project_docs(project_name),
        "slack": await _get_recent_discussions(project_name),
    })
The orchestration happens in our code, not in the LLM’s context window. One call instead of five.
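Since the four fetches are independent, the wrapper can also run them concurrently. A minimal sketch with asyncio.gather, using stand-in stubs for the per-source helpers (the real ones call the Linear, GitHub, Notion, and Slack APIs):

```python
import asyncio
import json

# Stand-ins for the real per-source fetchers (hypothetical names).
async def _get_project_tickets(name: str) -> list[str]:
    return [f"{name}: 3 open tickets"]

async def _get_recent_commits(name: str) -> list[str]:
    return [f"{name}: 5 commits this week"]

async def _get_project_docs(name: str) -> list[str]:
    return [f"{name}: spec updated"]

async def _get_recent_discussions(name: str) -> list[str]:
    return [f"{name}: deadline thread"]

async def get_project_status(project_name: str) -> str:
    # Fan out to all four sources at once: latency is the slowest
    # call, not the sum of all four.
    linear, github, notion, slack = await asyncio.gather(
        _get_project_tickets(project_name),
        _get_recent_commits(project_name),
        _get_project_docs(project_name),
        _get_recent_discussions(project_name),
    )
    return json.dumps({
        "linear": linear, "github": github,
        "notion": notion, "slack": slack,
    })
```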
Rule: if the agent needs to chain 3+ tools to answer a common question, you need a higher-level tool.
2: Our tool descriptions were written for engineers
We wrote descriptions the way we’d write code comments:
@server.tool()
async def search_knowledge_base(query: str) -> str:
"""Search the knowledge base."""
...
Claude kept ignoring this tool. A user asked “Find the API documentation for the payment module” and Claude answered from memory instead of searching. Confidently wrong.
The Fix
@server.tool()
async def search_knowledge_base(
query: str,
sources: list[str] = ["notion", "linear", "github", "slack", "gmail"],
limit: int = 10
) -> str:
"""Search across the company knowledge base for any information
about projects, clients, technical decisions, or internal discussions.
Use this as your FIRST tool when the user asks about any topic.
Searches Notion docs, Linear tickets, GitHub repos,
Slack conversations, and email threads simultaneously.
Returns a ranked list with: source, title, snippet, date,
relevance score, and doc_id for fetching full content.
Examples of when to use this:
- "What did we decide about the authentication approach?"
- "Find everything related to the latest project"
- "Any emails from the client about the deadline?"
"""
...
Three things: when to use it (“FIRST tool when...”), domain-specific terms (projects, clients, Notion, Linear), and what comes back (the exact return shape).
Tool selection accuracy went from ~60% to ~95%.
Treat tool descriptions like product copy. You’re selling the tool to an LLM.
3: Raw API responses murdered the context window
Our tools returned whatever the API gave us. A Notion page? Full JSON with blocks, properties, metadata, permissions, audit history. A single page was returning 8,000+ tokens.
# ❌ Returning raw API response
@server.tool()
async def get_document(source: str, doc_id: str) -> str:
if source == "notion":
page = await notion.pages.retrieve(doc_id)
blocks = await notion.blocks.children.list(doc_id)
return json.dumps({"page": page, "blocks": blocks}) # 8,000+ tokens 💀
Three pages = 24,000 tokens of raw JSON. Claude would read the first document and hallucinate the rest. Research confirms this: when the context window saturates, the model doesn’t degrade gracefully; it selectively forgets earlier tokens.
The Fix
Extract only what Claude needs:
# ✅ Curated, token-efficient responses
@server.tool()
async def get_document(source: str, doc_id: str) -> str:
if source == "notion":
page = await notion.pages.retrieve(doc_id)
blocks = await notion.blocks.children.list(doc_id)
return json.dumps({
"title": _extract_title(page),
"last_edited": page["last_edited_time"],
"content": _blocks_to_text(blocks), # Plain text, no JSON nesting
"tags": _extract_tags(page),
}) # ~800 tokens ✅
Same for Linear — full API response was ~3,000 tokens. After curation: ~300 tokens. 90% reduction. Claude could now handle 10+ documents in a single conversation.
Never return raw API payloads. Curate the output like you’re designing a dashboard.
4: The agent made random tool calls without workflow instructions
Even with good tools and descriptions, Claude’s behavior was a coin flip. Sometimes it searched first, then fetched details. Other times it skipped search entirely and guessed a doc_id. No predictable pattern.
We assumed Claude would figure out the right workflow. It didn’t. LLMs don’t have a mental model of your system.
The Fix
Explicit workflow instructions, embedded directly in descriptions:
@server.tool()
async def get_document(source: str, doc_id: str) -> str:
"""Fetch the full content of a specific document, ticket, or thread.
Use AFTER search_knowledge_base when you need complete content
of a specific result. The doc_id comes from search results.
Do NOT call this tool without first searching — you need a valid doc_id.
"""
That last line, "Do NOT call this tool without first searching", reduced hallucinated doc_id calls by ~80%.
We also added a system-level workflow:
SYSTEM_INSTRUCTIONS = """
WORKFLOW for answering questions about the company:
1. ALWAYS start with search_knowledge_base
2. Use get_document to fetch full content of top 2-3 results
3. For project overviews, use get_project_status instead
4. NEVER guess or use training data for company-specific questions
"""
5: We almost shipped without auth on a multi-client server
This one still makes me uncomfortable.
Our first deployment ran on a shared server. One MCP instance, connected to all our tools, accessible by anyone on the team. Anyone asking about a project could theoretically pull up confidential client emails or private Slack channels from any workspace.
We didn’t think about it because during development, everything was local. stdio transport, our own laptops, our own credentials. Worked perfectly.
The Fix
Three changes:
Per-user credentials. Each user connects with their own OAuth tokens. No shared service accounts.
Scoped access. Tools only return data from workspaces the user already has access to.
Read-only by default. No write operations. No creating tickets, no sending emails. If we add writes later, every one requires explicit user confirmation.
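Together, the three rules reduce to a check that runs before every tool call. A minimal sketch (the tool and workspace names are illustrative):

```python
# Only the read-only tools are callable at all; writes are rejected outright.
READ_ONLY_TOOLS = {"search_knowledge_base", "get_document", "get_project_status"}

def authorize(tool: str, user_workspaces: set[str], doc_workspace: str) -> bool:
    """Allow a call only for read-only tools, and only when the document
    lives in a workspace the caller already belongs to."""
    return tool in READ_ONLY_TOOLS and doc_workspace in user_workspaces
```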
The moment you move from local to remote deployment, you need auth, scoping, and audit logging. Day one.
The Checklist
If I were starting over:
Map the 5-7 most common questions your team asks across tools, then design tools around those
Start with 5 tools max. Add more only when you see a clear gap
Write descriptions first (400+ chars, domain-specific), implement handlers second
Return curated responses, never raw API payloads
Embed workflow instructions in descriptions
Per-user auth from day one, read-only by default
Test by asking real questions, not by checking if API calls succeed
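That last point is measurable with a small golden set: real questions paired with the tool you expect the model to pick first, scored as selection accuracy. A sketch with hypothetical questions:

```python
GOLDEN_SET = [  # (user question, tool the model should call first)
    ("What's the latest on the billing project?", "get_project_status"),
    ("Did the client reply about the deadline?", "search_knowledge_base"),
    ("What did we decide about the auth approach?", "search_knowledge_base"),
]

def selection_accuracy(picks: dict[str, str]) -> float:
    """picks maps each question to the tool the model actually chose first."""
    hits = sum(picks.get(q) == tool for q, tool in GOLDEN_SET)
    return hits / len(GOLDEN_SET)
```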
Our 7 tools now cover ~90% of the questions the team was asking. The remaining 10% can wait.
Final Thought
Building an MCP server isn’t an infrastructure project. It’s a product design project for a non-human user.
The protocol works fine. Your server probably doesn’t. Ours didn’t either, until we stopped thinking like API engineers and started thinking like designers building for a user that reads every word you write, but doesn’t understand any of it the way you do.
We've since deployed variations of this setup for client projects at Moqa Studio: same patterns, different tool stacks. The checklist holds up every time.


