The Program Integrity Alliance (PIA) aims to make working with U.S. Government datasets easier and AI-friendly. We have ingested hundreds of thousands of documents and articles across a range of sources, and this list is growing. This MCP server lets AI clients search this data at a finer granularity than most source websites allow, for example by searching within PDF reports to find the exact pages where text and images appear.
Full attribution is given to the amazing open federal data sources, and all links in the data provided by PIA will always direct back to the original source.
Currently, the list of datasets includes:
- U.S. Government Accountability Office (GAO) - 10k Federal Reports since 2010 and 5.5k Open Oversight Recommendations
- Oversight.gov - 28k OIG Federal Reports since 2010, and 29k Open Oversight Recommendations
- U.S. Congress - Bill texts for sessions 118 and 119
- Department of Justice (DOJ) - 195k Press Releases since 2000
- Federal Agency annual reports - Congressional Justification, Financial Report, Performance Report - 139 reports across 10 priority agencies, with best coverage in 2024.
This data is updated weekly, and we will be adding more datasets and tools soon.
If you have any questions, or requests for other datasets, we look forward to hearing from you by raising an issue here.
Contribute • Report Bugs or Questions
- Document Search: Query the PIA database with comprehensive OData filtering options
- Faceted Search: Discover available filter fields and values
- AI Instruction Prompts: Prompts that instruct LLMs on how to summarize search results and use search tools
- Go to https://mcp.programintegrity.org/register
- Enter your email and a few quick details
- You should automatically receive your key
Note: Acceptance into the catalog is pending PR review.
- Download and run the latest version of Docker Desktop
- Navigate to 'MCP Toolkit'
- Search for 'Program Integrity Alliance'
- Add as a server by clicking '+'
- Under 'Configuration' enter your key
- In 'MCP Toolkit' navigate to 'Clients'
- Choose one, e.g., 'Claude Desktop'
- Start your Client
- You should now see 'pia_search_content' and other tools
To install PIA Server for Claude Desktop automatically via Smithery:
npx -y @smithery/cli install pia-mcp-server --client claude
Install using uv:
uv tool install pia-mcp-server
For development:
# Clone and set up development environment
git clone https://github.com/Program-Integrity-Alliance/pia-mcp-local.git
cd pia-mcp-local
# Create and activate virtual environment
uv venv
source .venv/bin/activate
# Install with test dependencies
uv pip install -e ".[test]"
For Docker:
# Build the Docker image if you want to use a local image
git clone https://github.com/Program-Integrity-Alliance/pia-mcp-local.git
cd pia-mcp-local
docker build -t pia-mcp-server:latest .
Add this configuration to your MCP client config file:
{
  "mcpServers": {
    "pia-mcp-server": {
      "command": "uv",
      "args": [
        "tool",
        "run",
        "pia-mcp-server",
        "--api-key", "YOUR_API_KEY"
      ]
    }
  }
}
For Docker:
{
  "mcpServers": {
    "pia-mcp-server": {
      "command": "docker",
      "args": [
        "run",
        "--rm",
        "-i",
        "pia-mcp-server:latest",
        "--api-key", "YOUR_API_KEY"
      ]
    }
  }
}
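If your client already has a config file with other servers in it, the entry can be merged in rather than pasted over the file. The following is a minimal sketch of that merge, not part of the PIA package: `add_pia_server` is a hypothetical helper, and the config file path depends on your MCP client, so it is passed in as an argument.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def add_pia_server(config_path: Path, command: str, args: List[str]) -> Dict[str, Any]:
    """Hypothetical helper: add or update the pia-mcp-server entry in a client config.

    Other entries under "mcpServers" are left untouched.
    """
    # Start from the existing config if there is one, otherwise from an empty object.
    config: Dict[str, Any] = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))

    # Insert or overwrite only the PIA entry.
    servers = config.setdefault("mcpServers", {})
    servers["pia-mcp-server"] = {"command": command, "args": args}

    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For the uv setup above, the call would pass "uv" as the command and ["tool", "run", "pia-mcp-server", "--api-key", "YOUR_API_KEY"] as the arguments; for Docker, pass "docker" and the Docker argument list instead.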
The server provides four main tools for searching the Program Integrity Alliance (PIA) database:
Tool Name: pia_search_content
Purpose: Comprehensive search tool for querying document content and recommendations in the PIA database.
Description: Returns comprehensive results with full citation information and clickable links for proper attribution. Each result includes corresponding citations with data source attribution (GAO, OIG, etc.). Supports complex OData filtering with boolean logic, operators, and grouping.
Parameters:
- query (required): Search query text
- filter (optional): OData filter expression supporting complex boolean logic
- page (optional): Page number (1-based, default: 1)
- page_size (optional): Number of results per page (max 50, default: 10)
- search_mode (optional): Search mode, either "content" for full-text search or "titles" for title-only search (default: "content")
- limit (optional): Alternative name for page_size (for compatibility)
- include_facets (optional): Whether to include facets in the response (default: false, to reduce token usage)
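The parameter constraints above can be checked before a request is sent. This is a rough sketch only: `build_search_arguments` is a hypothetical helper, not part of the server, and it simply encodes the documented defaults and limits (1-based pages, page_size up to 50, two search modes).

```python
from typing import Any, Dict, Optional

MAX_PAGE_SIZE = 50                     # documented maximum for page_size
SEARCH_MODES = {"content", "titles"}   # documented search_mode values


def build_search_arguments(
    query: str,
    filter_expr: Optional[str] = None,
    page: int = 1,
    page_size: int = 10,
    search_mode: str = "content",
    include_facets: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: validate and assemble arguments for pia_search_content."""
    if not query.strip():
        raise ValueError("query is required")
    if page < 1:
        raise ValueError("page is 1-based and must be >= 1")
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError(f"page_size must be between 1 and {MAX_PAGE_SIZE}")
    if search_mode not in SEARCH_MODES:
        raise ValueError(f"search_mode must be one of {sorted(SEARCH_MODES)}")

    arguments: Dict[str, Any] = {
        "query": query,
        "page": page,
        "page_size": page_size,
        "search_mode": search_mode,
        "include_facets": include_facets,
    }
    # The filter is optional, so it is only sent when provided.
    if filter_expr:
        arguments["filter"] = filter_expr
    return arguments


if __name__ == "__main__":
    print(build_search_arguments("Medicare fraud",
                                 filter_expr="SourceDocumentDataSource eq 'GAO'"))
```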
Tool Name: pia_search_content_facets
Purpose: Get available facets (filter values) for the PIA database content search.
Description: This can help understand what filter values are available before performing content searches. Supports complex OData filtering with boolean logic, operators, and grouping.
Parameters:
- query (optional): Query to get facets for (if empty, gets all facets; default: "")
- filter (optional): OData filter expression
Tool Name: pia_search_titles
Purpose: Search the Program Integrity Alliance (PIA) database for document titles only.
Description: Returns document titles and metadata without searching the full content. Useful for finding specific documents by title or discovering available documents. Supports complex OData filtering with boolean logic, operators, and grouping.
Parameters:
- query (required): Search query text (searches document titles only)
- filter (optional): OData filter expression supporting complex boolean logic
- page (optional): Page number (1-based, default: 1)
- page_size (optional): Number of results per page (max 50, default: 10)
- limit (optional): Alternative name for page_size (for compatibility)
- include_facets (optional): Whether to include facets in the response (default: false, to reduce token usage)
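Because results are paged (1-based page, at most 50 per page), collecting more than one page means looping until a short page comes back. Here is a minimal sketch of that loop. `fetch_page` is a hypothetical callable standing in for however your client invokes the search tool, and stopping on a short page is an assumption about the API rather than documented behavior.

```python
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical signature: (page, page_size) -> list of result dicts.
FetchPage = Callable[[int, int], List[Dict[str, Any]]]


def iter_results(
    fetch_page: FetchPage,
    page_size: int = 10,
    max_pages: int = 20,
) -> Iterator[Dict[str, Any]]:
    """Yield results page by page, starting at page 1.

    Stops when a page has fewer than page_size results (assumed to be the
    last page) or when max_pages has been reached.
    """
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, page_size)
        yield from batch
        if len(batch) < page_size:
            break
```

The max_pages cap is there so a misbehaving backend cannot cause an unbounded loop.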
Tool Name: pia_search_titles_facets
Purpose: Get available facets (filter values) for the PIA database title search.
Description: This can help understand what filter values are available before performing title searches. Supports complex OData filtering with boolean logic, operators, and grouping.
Parameters:
- query (optional): Query to get facets for (if empty, gets all facets; default: "")
- filter (optional): OData filter expression
Comprehensive search with OData filtering and faceting. The filter parameter uses standard OData query syntax.
- Content Search (pia_search_content): Searches within document content and recommendations for comprehensive results
- Title Search (pia_search_titles): Searches document titles only; faster and useful for document discovery
Example Filter Expressions:
- Basic filter: "SourceDocumentDataSource eq 'GAO'"
- Multiple conditions: "SourceDocumentDataSource eq 'GAO' or SourceDocumentDataSource eq 'OIG'"
- Combined conditions: "SourceDocumentDataSource eq 'GAO' and RecStatus ne 'Closed'"
- Negation: "SourceDocumentDataSource ne 'Department of Justice' and not (RecStatus eq 'Closed')"
- List membership: "IsIntegrityRelated eq 'Yes' and RecPriorityFlag in ('High', 'Critical')"
- Date ranges: "SourceDocumentPublishDate ge '2020-01-01' and SourceDocumentPublishDate le '2024-12-31'"
- Boolean grouping: "(SourceDocumentDataSource eq 'GAO' or SourceDocumentDataSource eq 'OIG') and RecStatus eq 'Open'"
OData Filter Operators:
- eq (equals): field eq 'value'
- ne (not equals): field ne 'value'
- gt (greater than): amount gt 1000
- ge (greater than or equal): date ge '2023-01-01'
- lt (less than): amount lt 5000
- le (less than or equal): date le '2023-12-31'
- in (value in list): status in ('Active', 'Pending')
OData Logical Operators:
- and (logical AND): field1 eq 'value' and field2 gt 100
- or (logical OR): status eq 'Active' or status eq 'Pending'
- not (logical NOT): not (status eq 'Inactive')
- () (grouping): (field1 eq 'A' or field1 eq 'B') and field2 gt 0
OData String Functions:
- contains(field, 'text'): field contains text
- startswith(field, 'prefix'): field starts with prefix
- endswith(field, 'suffix'): field ends with suffix
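The operators above compose mechanically, so filter strings can be built from small helpers instead of hand-concatenated. The following is an illustrative sketch: every function name is hypothetical, and doubling embedded single quotes follows the general OData convention, which this API is assumed (not confirmed) to honor.

```python
from typing import Iterable


def quote(value: str) -> str:
    """Single-quote a string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def eq(field: str, value: str) -> str:
    return f"{field} eq {quote(value)}"


def ne(field: str, value: str) -> str:
    return f"{field} ne {quote(value)}"


def ge(field: str, value: str) -> str:
    return f"{field} ge {quote(value)}"


def in_list(field: str, values: Iterable[str]) -> str:
    return f"{field} in ({', '.join(quote(v) for v in values)})"


def any_of(*clauses: str) -> str:
    """Join clauses with 'or' and wrap them in parentheses for grouping."""
    return "(" + " or ".join(clauses) + ")"


def all_of(*clauses: str) -> str:
    """Join clauses with 'and'."""
    return " and ".join(clauses)


if __name__ == "__main__":
    # Rebuilds the "Boolean grouping" example from the list above.
    print(all_of(
        any_of(eq("SourceDocumentDataSource", "GAO"),
               eq("SourceDocumentDataSource", "OIG")),
        eq("RecStatus", "Open"),
    ))
```

Wrapping every `or` group in parentheses avoids relying on operator precedence when groups are combined with `and`.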
Discover available field names and values for filtering.
Tool Names: pia_search_content_facets, pia_search_titles_facets
Parameters:
- query (optional): Query to get facets for (default: "")
Purpose:
- Discover available field names (e.g., data_source, document_type, agency)
- Find possible field values (e.g., "OIG", "GAO", "audit_report")
- Understand data types for each field (string, date, number)
This information helps you construct proper filter expressions for the search tools.
To effectively use OData filters, follow this workflow:
Use the pia_search_content_facets or pia_search_titles_facets tool to explore which fields are available for filtering. You can provide a query to get facets relevant to your search topic, or omit the query to see all available fields.
The facets response will show available fields and their possible values:
{
  "SourceDocumentDataSource": ["OIG", "GAO", "CMS", "FBI"],
  "RecStatus": ["Open", "Closed", "In Progress"],
  "RecPriorityFlag": ["High", "Medium", "Low", "Critical"],
  "IsIntegrityRelated": ["Yes", "No"],
  "SourceDocumentPublishDate": "2020-01-01 to 2024-12-31"
}
Use the pia_search_content or pia_search_titles tool with the discovered fields to create precise OData filters:
Basic Example:
Query: "Medicare fraud"
Filter: "SourceDocumentDataSource eq 'GAO' and SourceDocumentPublishDate ge '2023-01-01' and IsIntegrityRelated eq 'Yes'"
Complex Example:
Query: "healthcare violations"
Filter: "(SourceDocumentDataSource eq 'OIG' or SourceDocumentDataSource eq 'CMS') and RecPriorityFlag in ('High', 'Critical') and SourceDocumentPublishDate ge '2023-01-01'"
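Before sending a filter, it can be checked against the facets response so that a misspelled field name is caught locally. The sketch below is an approximation under stated assumptions: `unknown_filter_fields` is a hypothetical helper, it only recognizes fields used with the comparison operators and `in` (not the string functions), and it assumes the facets response is a mapping keyed by field name as in the example above.

```python
import re
from typing import Dict, List

# Quoted literals are blanked first so words inside values are never mistaken for fields.
QUOTED = re.compile(r"'(?:[^']|'')*'")
# An identifier immediately followed by a comparison operator or 'in'.
FIELD_BEFORE_OPERATOR = re.compile(
    r"\b([A-Za-z_][A-Za-z0-9_]*)\s+(?:eq|ne|gt|ge|lt|le|in)\b"
)


def unknown_filter_fields(filter_expr: str, facets: Dict[str, object]) -> List[str]:
    """Return field names used in filter_expr that are missing from facets."""
    without_literals = QUOTED.sub("''", filter_expr)
    fields = FIELD_BEFORE_OPERATOR.findall(without_literals)
    return sorted({name for name in fields if name not in facets})


if __name__ == "__main__":
    facets = {
        "SourceDocumentDataSource": ["OIG", "GAO", "CMS", "FBI"],
        "RecPriorityFlag": ["High", "Medium", "Low", "Critical"],
        "SourceDocumentPublishDate": "2020-01-01 to 2024-12-31",
    }
    expr = "SourceDocumentDataSource eq 'OIG' and agency eq 'HHS'"
    print(unknown_filter_fields(expr, facets))
```

An empty list means every field in the filter appears in the facets; anything else is a candidate typo or a field the dataset does not expose.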
The server provides prompts that instruct the calling LLM on how to effectively use PIA tools and format responses:
Provides guidance on how to summarize information from PIA search results with proper citations.
Prompt Name: summarization_guidance
Purpose: Ensures LLM creates fact-based summaries with inline citations and proper reference formatting
Arguments: None (reusable guidance)
Returns: Comprehensive instructions that guide the LLM to:
- Only include facts that appear in the provided search results (no prior knowledge)
- Use proper inline citation format [n] for every factual statement
- Create a References section with format: [n] Document Title – Page X – Source Name – URL
- Follow objective, factual style guidelines without speculation or filler
- Include all necessary attribution elements exactly as provided in search results
- Organize information logically and ensure every fact has supporting citations
Provides guidance on how to perform PIA searches with or without filters.
Prompt Name: search_guidance
Purpose: Guides LLM through proper search workflow including filter discovery and OData syntax for all four search tools
Arguments: None (reusable guidance)
Returns: Comprehensive instructions that guide the LLM to:
- Run unfiltered searches by default unless filter criteria are mentioned
- Choose between content search (comprehensive) and title search (fast discovery)
- Use pia_search_content_facets or pia_search_titles_facets to discover available filter fields and values
- Build valid OData filter expressions with correct syntax and actual field names
- Apply proper OData operators: eq, ne, gt, ge, lt, le, and, or
- Fall back to unfiltered search when filtered search returns no results
- Validate all filter fields against available facets before use
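The "unfiltered by default, fall back when a filter returns nothing" behavior described in this guidance can be expressed as a small wrapper. This is a sketch of that control flow only: `search` is a hypothetical callable standing in for a call to pia_search_content or pia_search_titles, and the result shape (a list of dicts) is assumed.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical signature: (query, filter or None) -> list of result dicts.
SearchFn = Callable[[str, Optional[str]], List[Dict[str, Any]]]


def search_with_fallback(
    search: SearchFn,
    query: str,
    filter_expr: Optional[str] = None,
) -> Tuple[List[Dict[str, Any]], Optional[str]]:
    """Run a filtered search, retrying without the filter if it returns nothing.

    Returns the results together with the filter that actually produced them
    (None when the search ran unfiltered), so callers can report the fallback.
    """
    if filter_expr:
        results = search(query, filter_expr)
        if results:
            return results, filter_expr
    # No filter given, or the filtered search came back empty.
    return search(query, None), None
```

Returning the filter that was actually used keeps the fallback visible, which matters when a summary needs to say that results were not restricted as requested.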
The API key is always provided via the MCP server configuration. Additional settings can be configured through environment variables:
| Variable | Purpose | Default |
|---|---|---|
| PIA_API_URL | PIA API endpoint | https://mcp.programintegrity.org/ |
| REQUEST_TIMEOUT | API request timeout (seconds) | 60 |
| MAX_RESULTS | Maximum results per query | 50 |
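As an illustration of how these variables and defaults fit together, the sketch below reads them into a typed settings object. It is not the server's actual configuration code: `Settings` and `load_settings` are hypothetical names, and the endpoint default is the PIA API URL given in the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    api_url: str
    request_timeout: int  # seconds
    max_results: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Hypothetical loader: read the documented variables, falling back to the defaults."""
    return Settings(
        api_url=env.get("PIA_API_URL", "https://mcp.programintegrity.org/"),
        request_timeout=int(env.get("REQUEST_TIMEOUT", "60")),
        max_results=int(env.get("MAX_RESULTS", "50")),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing the environment in as a mapping makes the defaults easy to test without modifying the real process environment.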
The API key must be provided in your MCP client configuration using the --api-key argument. Contact the Program Integrity Alliance to obtain your API key.
{
  "mcpServers": {
    "pia-mcp-server": {
      "command": "pia-mcp-server",
      "args": ["--api-key", "YOUR_API_KEY"]
    }
  }
}
Replace YOUR_API_KEY with your actual PIA API key.
Run the test suite:
python -m pytest
Run with coverage:
python -m pytest --cov=pia_mcp_server
Released under the MIT License. See the LICENSE file for details.
Made with ❤️ for Government Transparency and Accountability