This feature of MCP is not yet supported in the Claude Desktop client.
## How sampling works

The sampling flow follows these steps:

1. Server sends a `sampling/createMessage` request to the client
2. Client reviews the request and can modify it
3. Client samples from an LLM
4. Client reviews the completion
5. Client returns the result to the server
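The steps above can be sketched as a client-side handler. Note that `callLLM` and `reviewWithUser` are hypothetical stand-ins for the client's own model call and approval UI, not real MCP SDK functions:

```typescript
// Minimal sketch of the client-side sampling flow.
type SamplingParams = { messages: unknown[]; maxTokens: number };

async function callLLM(params: SamplingParams): Promise<string> {
  return "stub completion"; // a real client would call its model provider here
}

async function reviewWithUser<T>(value: T): Promise<T> {
  return value; // a real client would let the user approve or modify this
}

// Steps 2-5: review the request, sample, review the completion, return it.
async function handleCreateMessage(params: SamplingParams): Promise<string> {
  const approved = await reviewWithUser(params);
  const completion = await callLLM(approved);
  return reviewWithUser(completion);
}
```

The two `reviewWithUser` calls are where the human-in-the-loop controls described later fit in.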
## Message format

Sampling requests use a standardized message format:

### Request parameters
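The request parameters described in the sections below can be sketched as a TypeScript type. Treat this as an approximation of the spec's shape, not a normative definition:

```typescript
// Sketch of the sampling/createMessage request parameters.
interface SamplingMessage {
  role: "user" | "assistant";
  content:
    | { type: "text"; text: string }
    | { type: "image"; data: string; mimeType: string }; // data is base64
}

interface ModelPreferences {
  hints?: { name?: string }[]; // partial model-name suggestions, in preference order
  costPriority?: number;       // 0-1: importance of minimizing cost
  speedPriority?: number;      // 0-1: importance of low latency
  intelligencePriority?: number; // 0-1: importance of model capability
}

interface CreateMessageParams {
  messages: SamplingMessage[];
  modelPreferences?: ModelPreferences;
  systemPrompt?: string;
  includeContext?: "none" | "thisServer" | "allServers";
  temperature?: number;
  maxTokens: number;
  stopSequences?: string[];
  metadata?: Record<string, unknown>;
}

// A minimal valid request body under this sketch:
const example: CreateMessageParams = {
  messages: [{ role: "user", content: { type: "text", text: "Hello" } }],
  maxTokens: 100,
};
```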
### Messages

The `messages` array contains the conversation history to send to the LLM. Each message has:

- `role`: Either “user” or “assistant”
- `content`: The message content, which can be:
  - Text content with a `text` field
  - Image content with `data` (base64) and `mimeType` fields
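For illustration, a `messages` array carrying one text message and one image message might look like this (the base64 data is a truncated placeholder, not real image bytes):

```typescript
// Hypothetical conversation history mixing text and image content.
const messages = [
  {
    role: "user",
    content: { type: "text", text: "What is in this image?" },
  },
  {
    role: "user",
    content: {
      type: "image",
      data: "iVBORw0KGgo...", // base64-encoded image bytes (placeholder)
      mimeType: "image/png",
    },
  },
];
```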
### Model preferences

The `modelPreferences` object allows servers to specify their model selection preferences:

- `hints`: Array of model name suggestions that clients can use to select an appropriate model:
  - `name`: String that can match full or partial model names (e.g. “claude-3”, “sonnet”)
  - Clients may map hints to equivalent models from different providers
  - Multiple hints are evaluated in preference order
- Priority values (0-1 normalized):
  - `costPriority`: Importance of minimizing costs
  - `speedPriority`: Importance of low-latency responses
  - `intelligencePriority`: Importance of advanced model capabilities
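As an example, a server that mainly wants a cheap, fast model from the Claude family might send preferences like these (all values are illustrative):

```typescript
// Hypothetical preferences: cost and speed matter more than raw capability.
const modelPreferences = {
  hints: [
    { name: "claude-3-haiku" }, // preferred: partial match on a fast model
    { name: "claude" },         // fallback: any Claude-family model
  ],
  costPriority: 0.8,
  speedPriority: 0.6,
  intelligencePriority: 0.3,
};
```

A client receiving this is free to map the hints to an equivalent model from another provider while still honoring the priority weights.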
### System prompt

An optional `systemPrompt` field allows servers to request a specific system prompt. The client may modify or ignore this.
### Context inclusion

The `includeContext` parameter specifies what MCP context to include:

- `"none"`: No additional context
- `"thisServer"`: Include context from the requesting server
- `"allServers"`: Include context from all connected MCP servers
### Sampling parameters

Fine-tune the LLM sampling with:

- `temperature`: Controls randomness (0.0 to 1.0)
- `maxTokens`: Maximum number of tokens to generate
- `stopSequences`: Array of sequences that stop generation
- `metadata`: Additional provider-specific parameters
## Response format

The client returns a completion result containing the model that was used, an optional stop reason, and the generated message (role and content).

## Example request
Here’s an example of requesting sampling from a client:
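The exchange might look like the following, written here as TypeScript literals; the model identifier, prompt text, and completion text are all hypothetical:

```typescript
// Illustrative sampling/createMessage request.
const request = {
  method: "sampling/createMessage",
  params: {
    messages: [
      {
        role: "user",
        content: {
          type: "text",
          text: "What files are in the current directory?",
        },
      },
    ],
    systemPrompt: "You are a helpful file system assistant.",
    includeContext: "thisServer",
    maxTokens: 100,
  },
};

// One possible completion result returned by the client.
const result = {
  model: "claude-3-sonnet-20240307", // hypothetical: whichever model the client chose
  stopReason: "endTurn",
  role: "assistant",
  content: {
    type: "text",
    text: "I see the following files: ...",
  },
};
```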
## Best practices

When implementing sampling:

- Always provide clear, well-structured prompts
- Handle both text and image content appropriately
- Set reasonable token limits
- Include relevant context through `includeContext`
- Validate responses before using them
- Handle errors gracefully
- Consider rate limiting sampling requests
- Document expected sampling behavior
- Test with various model parameters
- Monitor sampling costs
## Human in the loop controls

Sampling is designed with human oversight in mind:

### For prompts
- Clients should show users the proposed prompt
- Users should be able to modify or reject prompts
- System prompts can be filtered or modified
- Context inclusion is controlled by the client
### For completions
- Clients should show users the completion
- Users should be able to modify or reject completions
- Clients can filter or modify completions
- Users control which model is used
## Security considerations

When implementing sampling:

- Validate all message content
- Sanitize sensitive information
- Implement appropriate rate limits
- Monitor sampling usage
- Encrypt data in transit
- Handle user data privacy
- Audit sampling requests
- Control cost exposure
- Implement timeouts
- Handle model errors gracefully
## Common patterns

### Agentic workflows

Sampling enables agentic patterns like:

- Reading and analyzing resources
- Making decisions based on context
- Generating structured data
- Handling multi-step tasks
- Providing interactive assistance
### Context management

Best practices for context:

- Request minimal necessary context
- Structure context clearly
- Handle context size limits
- Update context as needed
- Clean up stale context
### Error handling

Robust error handling should:

- Catch sampling failures
- Handle timeout errors
- Manage rate limits
- Validate responses
- Provide fallback behaviors
- Log errors appropriately
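Several of these points (timeouts, catching failures, fallback behavior) can be combined in one small wrapper. The helper below is an illustrative sketch, not part of any MCP SDK:

```typescript
// Wrap a sampling call with a timeout and a fallback value.
async function sampleWithFallback(
  doSample: () => Promise<string>,
  timeoutMs: number,
  fallback: string,
): Promise<string> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("sampling timed out")), timeoutMs);
  });
  try {
    // Whichever settles first wins: the sample or the timeout.
    return await Promise.race([doSample(), timeout]);
  } catch (err) {
    console.error("sampling failed:", err); // log errors appropriately
    return fallback;                        // provide a fallback behavior
  } finally {
    clearTimeout(timer); // avoid a dangling timer after the race settles
  }
}
```

A caller might use it as `sampleWithFallback(() => client.sample(params), 30_000, "Sorry, I could not generate a response.")`, where `client.sample` is whatever sampling call the application exposes.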
## Limitations

Be aware of these limitations:

- Sampling depends on client capabilities
- Users control sampling behavior
- Context size has limits
- Rate limits may apply
- Costs should be considered
- Model availability varies
- Response times vary
- Not all content types are supported