How to Build an MCP Server for AI Image Processing with PixelPanda

You know the ritual. Open Photoshop. Export. Upload to your product photography tool. Wait. Download. Re-upload somewhere else. Rename files. Repeat forty-seven times before lunch.

What if you could just say what you need?

“Remove the background from this product shot.” “Generate a lifestyle photo of my sunglasses with the beach avatar.” “Show me all my saved AI models.”

That’s not a fantasy workflow — it’s what happens when you wire up a PixelPanda MCP server to Claude Code (or any MCP-compatible client). Your AI assistant becomes a direct line to background removal, AI product photography, and virtual try-on — no browser tabs, no manual uploads, no context-switching.

In this guide, we’ll build one together from scratch in TypeScript, step by step.


What Is MCP (and Why Should You Care)?

The Model Context Protocol (MCP) is an open standard from Anthropic that defines how AI applications talk to external tools and data sources. Think of it as USB-C for AI — one universal plug instead of a hundred bespoke adapters.

MCP exists because LLMs have a frustrating limitation: they can describe how to remove a background from an image, but they can’t actually do it. MCP gives them hands.

Here’s how the pieces fit together:

  • An MCP client (Claude Code, Claude Desktop, Cursor) connects to an MCP server
  • The server advertises its capabilities — tools, resources, and prompts
  • The AI model decides when and how to call those capabilities based on your conversation

No hardcoded workflows. Just natural language driving real API calls.

Panda Tip: You don’t need to understand the full MCP spec to build a server. You just define your tools; the SDK handles all the protocol negotiation.


What Can an MCP Server Expose?

An MCP server advertises three types of capabilities:

| Capability | What It Is | Example |
| --- | --- | --- |
| Tools | Functions the AI can call | Remove a background, generate a product photo |
| Resources | Data the AI can browse | Your avatar library, product catalog |
| Prompts | Reusable workflow templates | “Run a full product shoot for item X” |

For our PixelPanda server, we’ll focus on tools — they’re the most impactful and the easiest to understand. Here’s what we’re building:

| Tool | What It Does | Credits |
| --- | --- | --- |
| list_avatars | Browse your saved AI avatar library | Free |
| remove_background | Strip backgrounds to transparent PNG | 1/image |
| generate_product_photo | AI marketing photography with avatar + product | 1/image |

Once you see the pattern for these three, adding more tools (virtual try-on, scene generation, batch processing) is copy-paste simple.

Panda Tip: We’re building a local STDIO server — it runs on your machine as a subprocess. No cloud deployment, no network config. The MCP client spawns the server process and they talk over stdin/stdout. It’s the fastest path to a working integration.


Step 1: Project Setup

Let’s get the scaffolding in place. Create a new directory and install dependencies:

mkdir pixelpanda-mcp && cd pixelpanda-mcp
npm init -y
npm install @modelcontextprotocol/sdk zod
npm install -D typescript @types/node
npx tsc --init

Open tsconfig.json and set these four values (leave everything else as default):

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "Node16",
    "moduleResolution": "Node16",
    "outDir": "./dist"
  }
}

Now create a src/index.ts file. This single file will hold our entire server — MCP servers are intentionally compact.

What you should see: Your project directory should have node_modules/, package.json, tsconfig.json, and an empty src/index.ts.
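Optionally, you can also add a build script to package.json so compilation is a single `npm run build` (a small convenience; the rest of this guide uses `npx tsc` directly, which works either way):

```json
{
  "scripts": {
    "build": "tsc"
  }
}
```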


Step 2: Imports and Configuration

Every request to PixelPanda’s API v2 needs a Bearer token. Let’s set up the foundation — imports, the base URL, and a check that the token actually exists:

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// PixelPanda API v2 base URL
const PIXELPANDA_BASE_URL = "https://pixelpanda.ai/api/v2";

// Pull your API token from the environment (never hardcode it)
const API_TOKEN = process.env.PIXELPANDA_API_TOKEN;

if (!API_TOKEN) {
  console.error("PIXELPANDA_API_TOKEN environment variable is required");
  process.exit(1);
}

Three imports, two constants, one guard clause. That’s the entire config layer.

Panda Tip: Your API token looks like pk_live_xxxxxxxxxxxx. Grab it from Dashboard Settings. You need a paid plan (the $5 Try It pack works) — API v2 requires at least Starter-tier access.
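If you want to run the server by hand outside an MCP client, export the token in your shell first (the value below is a placeholder; substitute your real token):

```shell
# Placeholder value — replace with your real token from Dashboard Settings
export PIXELPANDA_API_TOKEN="pk_live_your_token_here"
```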


Step 3: The API Helper

We’ll make a lot of authenticated HTTP calls, so let’s write a small wrapper that handles the Bearer token and error checking in one place:

async function ppFetch(path: string, options: RequestInit = {}): Promise<Response> {
  const url = `${PIXELPANDA_BASE_URL}${path}`;

  const response = await fetch(url, {
    ...options,
    headers: {
      Authorization: `Bearer ${API_TOKEN}`,
      "Content-Type": "application/json",
      ...options.headers,
    },
  });

  // Surface API errors clearly so the AI can report them
  if (!response.ok) {
    const body = await response.text();
    throw new Error(`PixelPanda API error ${response.status}: ${body}`);
  }

  return response;
}

Every tool we build will use ppFetch("/some-endpoint") instead of raw fetch(). The auth header, error handling, and base URL are all baked in.


Step 4: The Job Poller

Here’s something important about PixelPanda’s generation endpoints: they’re asynchronous. When you request a product photo, the API immediately returns a job_id, and you poll /jobs/{id} until it’s done.

Let’s write a reusable poller so our tool handlers stay clean:

async function pollJobStatus(
  jobId: string,
  maxAttempts = 60,       // Give up after 60 checks
  intervalMs = 3000       // Check every 3 seconds
): Promise<Record<string, unknown>> {
  for (let i = 0; i < maxAttempts; i++) {
    const res = await ppFetch(`/jobs/${jobId}`);
    const job = (await res.json()) as Record<string, unknown>;

    // Terminal states — either we're done or something broke
    if (job.status === "completed" || job.status === "failed") {
      return job;
    }

    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }

  throw new Error(`Job ${jobId} timed out after ${maxAttempts} attempts`);
}

With 60 attempts at 3-second intervals, this gives jobs up to 3 minutes to complete. Most PixelPanda generations finish in 15-30 seconds.
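If you expect heavier jobs, a common tweak is exponential backoff: poll quickly at first, then slow down. Here’s a sketch of the idea (this `backoffDelay` helper is our own, not part of the SDK or the PixelPanda API):

```typescript
// Hypothetical helper: doubles the wait on each attempt, capped so we
// never sleep longer than `capMs` between status checks.
function backoffDelay(attempt: number, baseMs = 1000, capMs = 15000): number {
  return Math.min(baseMs * 2 ** attempt, capMs);
}
```

Swapping the fixed `intervalMs` for `backoffDelay(i)` inside the polling loop keeps fast jobs snappy without hammering the API on long renders.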

Panda Tip: Why console.error and not console.log? STDIO transport uses stdout for MCP protocol messages. If you log to stdout, you’ll corrupt the protocol stream. Always use stderr for logging in MCP servers.


Step 5: Create the MCP Server Instance

This is the simplest part. One call to the SDK:

const server = new McpServer({
  name: "pixelpanda",
  version: "1.0.0",
  description:
    "AI image processing — background removal, product photography, " +
    "virtual try-on, and scene generation via PixelPanda",
});

The SDK handles all protocol negotiation and capability advertisement behind the scenes. From here on, we just register tools.


Step 6: Build the list_avatars Tool

Let’s start simple. This tool is read-only — it fetches your saved AI avatars and returns their names and UUIDs. No credits consumed, no async polling. A perfect first tool.

server.tool(
  "list_avatars",                                 // Tool name
  "List all saved AI avatars available for "      // Description (shown to the AI model)
    + "product photography and try-on. Returns "
    + "UUIDs needed for generate_product_photo.",
  {},                                             // No input parameters
  async () => {
    try {
      const response = await ppFetch("/avatars");
      const avatars = await response.json() as Array<{
        uuid: string; name: string; style?: string;
      }>;

That’s the first half — make the API call and parse the response. Now let’s format the results for the AI to read:

      // Format each avatar as a readable line with its UUID
      const summary = avatars
        .map((a) => `- **${a.name}** (${a.uuid}) — Style: ${a.style || "custom"}`)
        .join("\n");

      return {
        content: [{
          type: "text" as const,
          text: avatars.length === 0
            ? "No avatars found. Create one in the PixelPanda Dashboard first."
            : `Found ${avatars.length} avatar(s):\n\n${summary}`,
        }],
      };
    } catch (error) {
      return {
        content: [{ type: "text" as const, text: `Failed: ${error}` }],
        isError: true,
      };
    }
  }
);

The structure of every tool response is the same: a content array with one or more text (or image) items, plus an optional isError flag. The AI reads the text and incorporates it into the conversation.
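Since every handler returns that same shape, you may want to factor it into a tiny helper. This is a convenience function of our own, not an SDK export:

```typescript
// Shape of a minimal MCP tool result: text content plus an optional error flag
type ToolResult = {
  content: { type: "text"; text: string }[];
  isError?: boolean;
};

// Build a text-only tool result; pass isError = true for failure responses
function textResult(text: string, isError = false): ToolResult {
  const result: ToolResult = { content: [{ type: "text", text }] };
  if (isError) result.isError = true;
  return result;
}
```

With it, every catch block in this guide shrinks to `return textResult(`Failed: ${error}`, true);`.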

What you should see: When the AI calls this tool, it gets back something like:

Found 3 avatar(s):

- **Beach Model** (a1b2c3d4-...) — Style: lifestyle
- **Studio Pro** (e5f6g7h8-...) — Style: professional
- **Fitness Avatar** (i9j0k1l2-...) — Style: athletic

Panda Tip: The tool description matters more than you’d think. MCP clients show it to the AI model so it knows when to use the tool. A vague description (“does avatar stuff”) means the model guesses wrong. Be specific about what the tool returns and why you’d use it.


Step 7: Build the remove_background Tool

Now let’s do something destructive — well, destructive to backgrounds. This tool takes an image URL, submits it for processing, polls until it’s done, and returns the transparent PNG.

First, we define the tool with its input schema:

server.tool(
  "remove_background",
  "Remove the background from an image. Accepts a public image URL, "
    + "returns a transparent PNG. Costs 1 credit.",
  {
    // Zod schema — the AI sees this as the tool's input contract
    image_url: z.string().url().describe("Public URL of the image to process"),
  },
  async ({ image_url }) => {
    try {
      // Step 1: Fetch the image and convert to base64
      const imageResponse = await fetch(image_url);
      const imageBuffer = await imageResponse.arrayBuffer();
      const base64Image = Buffer.from(imageBuffer).toString("base64");

We need the image as base64 because that’s what the jobs endpoint expects. Next, submit the job:

      // Step 2: Submit to PixelPanda as a background removal job
      const jobResponse = await ppFetch("/jobs", {
        method: "POST",
        body: JSON.stringify({
          product_image: base64Image,
          product_filename: "image.png",
          category: "other",
          images_to_generate: 1,
          custom_prompt: "Remove background, transparent output",
        }),
      });
      const job = await jobResponse.json() as { job_id: string };

And finally, poll for the result:

      // Step 3: Wait for processing to complete
      const result = await pollJobStatus(job.job_id);

      if (result.status === "failed") {
        throw new Error(`Background removal failed: ${result.error}`);
      }

      const results = result.results as Array<{ url: string }>;

      return {
        content: [{
          type: "text" as const,
          text: `Background removed!\n\n**Result:** ${results[0]?.url}`,
        }],
      };
    } catch (error) {
      return {
        content: [{ type: "text" as const, text: `Failed: ${error}` }],
        isError: true,
      };
    }
  }
);

The pattern is always the same: fetch image, base64-encode, submit job, poll, return URL. Every generation tool you add to this server will follow this exact rhythm.

What you should see: The AI returns a CDN URL to your processed image with a transparent background. Paste it in a browser and you’ll see a clean cutout — no background, ready for compositing.


Step 8: Build the generate_product_photo Tool

This is the showpiece. It combines a saved avatar with a saved product to create AI marketing photography — the avatar holding, using, or interacting with your product in a photorealistic scene.

The input schema is richer here because the AI needs to know about UUIDs:

server.tool(
  "generate_product_photo",
  "Generate AI product marketing photos. An avatar holds or uses "
    + "your product in a professional scene. Requires avatar_uuid and "
    + "product_uuid (use list_avatars and list_products to find them). "
    + "Costs 1 credit per image.",
  {
    avatar_uuid: z.string().describe("UUID of a saved avatar"),
    product_uuid: z.string().describe("UUID of a saved product"),
    num_outputs: z.number().min(1).max(30).default(6)
      .describe("Number of photos to generate (default: 6)"),
    prompt: z.string().optional()
      .describe("Style direction, e.g. 'outdoor summer' or 'minimalist studio'"),
  },

Notice how the descriptions reference the other tools (“use list_avatars to find them”). This teaches the AI to chain tools together — it’ll call list_avatars first if it doesn’t have a UUID handy.

The handler submits the request and kicks off polling — same rhythm as remove_background:

  async ({ avatar_uuid, product_uuid, num_outputs, prompt }) => {
    try {
      // Submit the generation request
      const response = await ppFetch("/generate/product-photo", {
        method: "POST",
        body: JSON.stringify({
          avatar_uuid,
          product_uuid,
          num_outputs: num_outputs ?? 6,
          prompt: prompt ?? undefined,
        }),
      });

      const job = await response.json() as {
        job_id: string;
        credits_reserved: number;
        credits_remaining: number;
      };

      // Poll until all images are ready
      const result = await pollJobStatus(job.job_id);

Once polling completes, we format the results as a numbered list and report credit usage:

      if (result.status === "failed") {
        throw new Error(`Generation failed: ${result.error}`);
      }

      // Format each image as a numbered line with its scene label
      const images = result.results as Array<{ url: string; scene?: string }>;
      const list = images
        .map((r, i) => `${i + 1}. ${r.scene || "Photo"}: ${r.url}`)
        .join("\n");

      return {
        content: [{
          type: "text" as const,
          text: `Generated ${images.length} photo(s):\n\n${list}\n\n`
            + `Credits used: ${job.credits_reserved}`,
        }],
      };
    } catch (error) {
      return {
        content: [{ type: "text" as const, text: `Failed: ${error}` }],
        isError: true,
      };
    }
  }
);

What you should see: The AI returns a numbered list of CDN URLs, each a photorealistic marketing shot of your avatar with your product. Something like:

Generated 6 photo(s):

1. Studio lighting: https://cdn.pixelpanda.ai/gen/abc123.png
2. Outdoor setting: https://cdn.pixelpanda.ai/gen/def456.png
3. Lifestyle scene: https://cdn.pixelpanda.ai/gen/ghi789.png
...

Credits used: 6

Panda Tip: The prompt field is where you steer the creative direction. “Scandinavian minimalist” and “vibrant tropical beach” produce dramatically different results from the same avatar and product. Experiment!


Step 9: Wire Up the Transport

The last piece. Connect the server to STDIO so MCP clients can spawn it:

async function main() {
  const transport = new StdioServerTransport();
  await server.connect(transport);
  console.error("PixelPanda MCP server running on STDIO");
}

main().catch((error) => {
  console.error("Fatal error:", error);
  process.exit(1);
});

Build the project:

npx tsc

What you should see: A dist/index.js file with no compilation errors. The server is ready to connect.
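Before wiring the server into Claude Code, you can smoke-test it with the official MCP Inspector, which spawns your server over STDIO and gives you a web UI for calling tools by hand (token value is a placeholder):

```shell
# Spawns dist/index.js as an MCP server and opens an interactive inspector UI
PIXELPANDA_API_TOKEN="pk_live_your_token_here" \
  npx @modelcontextprotocol/inspector node dist/index.js
```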


Testing Your Server

Configure Claude Code

Add the server to ~/.claude/claude_code_config.json:

{
  "mcpServers": {
    "pixelpanda": {
      "command": "node",
      "args": ["/absolute/path/to/pixelpanda-mcp/dist/index.js"],
      "env": {
        "PIXELPANDA_API_TOKEN": "pk_live_your_token_here"
      }
    }
  }
}

Restart Claude Code. Your three tools should appear in the available tools list.

Panda Tip: Use an absolute path in the args field. Relative paths break because MCP clients don’t necessarily launch from your project directory.

Try These Conversations

Here’s where the magic happens. Open Claude Code and talk to it naturally:

Browse your avatar library:

“What avatars do I have in PixelPanda?”

The AI calls list_avatars and shows you names, styles, and UUIDs.

Generate marketing photography:

“Generate 4 marketing photos of my ceramic vase with the studio-lighting avatar. Go for a warm, minimalist Scandinavian look.”

The AI finds the right UUIDs (calling list_avatars and list_products if needed), then fires generate_product_photo with your prompt. A short wait later — four pixel-perfect product shots.

Remove a background:

“Remove the background from https://example.com/sneakers.jpg”

One call to remove_background, a short wait, and you get a clean transparent PNG.

Chain operations together:

“Take this product photo, remove the background, then generate 6 marketing shots with my fitness avatar.”

This is where MCP really shines. The AI orchestrates multi-step workflows — calling tools in sequence, passing results forward — all driven by your conversational intent.


Extending the Server

You’ve built a working server with three tools. The pattern is established, and adding more is straightforward. Here are the natural next tools to wire up:

| Tool | Endpoint | What It Does |
| --- | --- | --- |
| generate_tryon | POST /generate/tryon | Virtual try-on for clothing (avatar wearing the product) |
| generate_scenes | POST /generate/scenes | Product scenes without an avatar — great for furniture, food, electronics |
| list_products | GET /products | Browse your saved product library |
| upload_product | POST /products | Upload a new product (base64 image) |
| batch_process | POST /batches | Bulk catalog processing for high-volume workflows |

Each one follows the exact same structure: define a name, write a description, declare a Zod input schema, and implement the async handler using ppFetch and pollJobStatus.

Beyond tools, you could add Resources (so the AI can passively browse your product catalog) or Prompts (reusable templates like “weekly product shoot” or “social media asset pack”).

The full PixelPanda API v2 documentation is at pixelpanda.ai.


Wrapping Up

The gap between “I need a marketing photo” and “here’s your marketing photo” just collapsed from fifteen clicks to one sentence.

That’s the real promise of MCP — not replacing your creative eye, but removing every tedious step between vision and pixel. Build the server, connect it to your AI client, and let the conversation drive the workflow.

Your images are waiting.

