scrape-ai@payload ~ making websites AI-friendly.

Content that AI understands.

Auto-generates llms.txt, structured markdown, JSON-LD, and a context search API from any Payload CMS website.

[ 6 endpoints ]
[ 3 stages ]
[ 0 config needed ]
[ v3 Payload ]
$ syncing pages collection OK
$ generating llms.txt OK
$ extracting rich text to markdown OK
$ building JSON-LD structured data OK
$ running AI enrichment PASS
$ rebuilding sitemap.json OK
$ context query scoring OK
$ initial sync 41/41 docs COMPLETE
Features

SEO, but for AI.

AI agents can't parse your HTML efficiently. This plugin auto-generates clean, structured content in every format they understand.

llms.txt Standard

Generates /llms.txt and /llms-full.txt following the official spec.
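
As a rough illustration of the output format, the sketch below assembles an llms.txt body (H1 title, blockquote summary, one H2 section per group of links). The PageEntry type and buildLlmsTxt function are hypothetical names for this example and are not the plugin's actual internals.

```typescript
// Hypothetical shape for a page entry; not the plugin's real types.
interface PageEntry {
  title: string
  url: string
  section: string
  description?: string
}

// Build an llms.txt body: H1 title, blockquote summary, then one H2 per
// section containing a markdown link list.
function buildLlmsTxt(siteName: string, summary: string, pages: PageEntry[]): string {
  // Group pages by section, preserving first-seen order.
  const sections = new Map<string, PageEntry[]>()
  for (const page of pages) {
    const bucket = sections.get(page.section) ?? []
    bucket.push(page)
    sections.set(page.section, bucket)
  }

  const lines: string[] = [`# ${siteName}`, '', `> ${summary}`]
  sections.forEach((entries, section) => {
    lines.push('', `## ${section}`, '')
    for (const entry of entries) {
      const suffix = entry.description ? `: ${entry.description}` : ''
      lines.push(`- [${entry.title}](${entry.url})${suffix}`)
    }
  })
  return lines.join('\n') + '\n'
}

const llmsTxt = buildLlmsTxt('My Website', 'What this site is about', [
  { title: 'About', url: 'https://your-website.com/ai/pages/about.md', section: 'Pages' },
  {
    title: 'Hello World',
    url: 'https://your-website.com/ai/posts/hello-world.md',
    section: 'Posts',
    description: 'First post',
  },
])
console.log(llmsTxt)
```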

Auto-Sync

Every content change triggers instant regeneration. Hybrid event + queue architecture handles bulk edits.
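
One way to picture the event + queue idea: change events only record which document is stale, and a later flush regenerates each document once, so a bulk edit does not trigger duplicate work. This is a minimal sketch under that assumption; RegenerationQueue is a hypothetical class, not the plugin's implementation.

```typescript
// Hypothetical sketch: coalesce change events so each document is
// regenerated once per flush, however many times it was edited.
class RegenerationQueue {
  private pending = new Set<string>()

  // Called from a change event (for example, a CMS "after change" hook).
  enqueue(collection: string, id: string): void {
    this.pending.add(`${collection}/${id}`)
  }

  // Drain the queue, calling `regenerate` once per distinct document.
  // Returns the keys that were processed.
  flush(regenerate: (key: string) => void): string[] {
    const keys = Array.from(this.pending)
    this.pending.clear()
    for (const key of keys) {
      regenerate(key)
    }
    return keys
  }
}

const queue = new RegenerationQueue()
queue.enqueue('pages', 'home')
queue.enqueue('pages', 'home') // duplicate event from a bulk edit
queue.enqueue('posts', 'hello-world')

const regenerated: string[] = []
queue.flush((key) => regenerated.push(key))
console.log(regenerated)
```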

AI Enrichment

Optional AI summaries, entity extraction, topic classification, and semantic chunking for RAG pipelines.
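
To give a feel for what chunking for a RAG pipeline produces, here is a deliberately simple heading-based splitter. It is a stand-in for illustration only: true semantic chunking would use the AI model, and chunkByHeading is a hypothetical helper, not part of the plugin.

```typescript
interface Chunk {
  heading: string
  content: string
}

// Split markdown into one chunk per heading. This is plain heading-based
// splitting, a simpler substitute for model-driven semantic chunking.
function chunkByHeading(markdown: string): Chunk[] {
  const chunks: Chunk[] = []
  let current: Chunk = { heading: '', content: '' }

  for (const line of markdown.split('\n')) {
    const match = /^#{1,6}\s+(.*)$/.exec(line)
    if (match) {
      // A new heading closes the previous chunk if it holds anything.
      if (current.heading || current.content.trim()) chunks.push(current)
      current = { heading: match[1], content: '' }
    } else {
      current.content += line + '\n'
    }
  }
  if (current.heading || current.content.trim()) chunks.push(current)

  return chunks.map((c) => ({ heading: c.heading, content: c.content.trim() }))
}

const chunks = chunkByHeading(
  '# Pricing\n\nPlans start small.\n\n## Enterprise\n\nContact sales.\n',
)
console.log(chunks)
```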

JSON-LD

Schema.org structured data for every page. Products, articles, and services are mapped automatically.
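
A minimal sketch of what such a mapping could look like, assuming a collection-to-type lookup with WebPage as the fallback. The CmsDoc shape, the TYPE_BY_COLLECTION table, and the URL pattern are all assumptions made for this example; the plugin's real mapping may differ.

```typescript
// Hypothetical document shape; not the plugin's real types.
interface CmsDoc {
  title: string
  slug: string
  collection: string
  updatedAt: string
  description?: string
}

// Assumed mapping from collection slug to a Schema.org type.
const TYPE_BY_COLLECTION: Record<string, string> = {
  posts: 'Article',
  products: 'Product',
  services: 'Service',
}

function toJsonLd(doc: CmsDoc, siteUrl: string): Record<string, unknown> {
  const type = TYPE_BY_COLLECTION[doc.collection] || 'WebPage'
  // Schema.org articles use "headline"; other types use "name".
  const nameKey = type === 'Article' ? 'headline' : 'name'
  const jsonLd: Record<string, unknown> = {
    '@context': 'https://schema.org',
    '@type': type,
    [nameKey]: doc.title,
    url: `${siteUrl}/${doc.collection}/${doc.slug}`, // assumed URL pattern
  }
  if (doc.description) jsonLd.description = doc.description
  // dateModified applies to creative works such as Article and WebPage.
  if (type === 'Article' || type === 'WebPage') jsonLd.dateModified = doc.updatedAt
  return jsonLd
}

const jsonLd = toJsonLd(
  { title: 'Hello World', slug: 'hello-world', collection: 'posts', updatedAt: '2024-01-01T00:00:00.000Z' },
  'https://your-website.com',
)
console.log(JSON.stringify(jsonLd, null, 2))
```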

Smart Detection

Auto-discovers content collections such as pages, posts, and products. Toggle each one from the dashboard.

Token Estimator

Estimates total AI cost before you enable enrichment. Recommends the cheapest model that fits.
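
The idea can be sketched with a common rule of thumb of roughly 4 characters per token for English text. Everything here is illustrative: the heuristic is approximate, and the model names, context windows, and prices are made-up placeholders, not real pricing or the plugin's logic.

```typescript
// Rough heuristic: about 4 characters per token. Real tokenizers differ.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Hypothetical model descriptor with placeholder numbers.
interface ModelOption {
  name: string
  contextWindow: number
  usdPerMillionInputTokens: number
}

// Pick the cheapest model whose context window fits the largest document.
function cheapestModelThatFits(docs: string[], models: ModelOption[]) {
  const perDoc = docs.map(estimateTokens)
  const totalTokens = perDoc.reduce((sum, n) => sum + n, 0)
  const largest = Math.max(0, ...perDoc)

  const candidates = models
    .filter((m) => m.contextWindow >= largest)
    .sort((a, b) => a.usdPerMillionInputTokens - b.usdPerMillionInputTokens)
  const model = candidates.length > 0 ? candidates[0] : null

  return {
    totalTokens,
    model,
    estimatedUsd: model ? (totalTokens / 1_000_000) * model.usdPerMillionInputTokens : null,
  }
}

const estimate = cheapestModelThatFits(
  ['a'.repeat(400), 'b'.repeat(8000)],
  [
    { name: 'small-model', contextWindow: 1000, usdPerMillionInputTokens: 0.5 },
    { name: 'large-model', contextWindow: 100000, usdPerMillionInputTokens: 3 },
  ],
)
console.log(estimate)
```

In this example the second document needs about 2,000 tokens, so the cheaper placeholder model is ruled out by its 1,000-token window.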

Architecture

Three-Stage Pipeline

STAGE 01

Extract

Lexical & Slate → Markdown

FREE
STAGE 02

Structure

Frontmatter, JSON-LD, hierarchy

FREE
STAGE 03

Enrich

AI summaries, topics, chunks

OPTIONAL
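
The first two stages can be sketched as two small functions: one that flattens a rich text tree to markdown, and one that prepends frontmatter. The RichNode shape is a simplified assumption loosely modelled on serialized Lexical JSON, and both function names are hypothetical.

```typescript
// Simplified node shape (an assumption), loosely modelled on serialized
// Lexical JSON. Real editor output carries many more fields.
interface RichNode {
  type: string
  tag?: string
  text?: string
  children?: RichNode[]
}

// Stage 1 (Extract): flatten a rich text tree into markdown.
function toMarkdown(node: RichNode): string {
  const inner = (node.children ?? []).map((child) => toMarkdown(child)).join('')
  switch (node.type) {
    case 'text':
      return node.text ?? ''
    case 'heading': {
      const level = Number((node.tag ?? 'h2').slice(1)) || 2
      return `${'#'.repeat(level)} ${inner}\n\n`
    }
    case 'paragraph':
      return `${inner}\n\n`
    default:
      return inner // root and unknown nodes: pass children through
  }
}

// Stage 2 (Structure): prepend frontmatter built from document metadata.
function withFrontmatter(markdown: string, meta: Record<string, string>): string {
  const yaml = Object.keys(meta)
    .map((key) => `${key}: ${JSON.stringify(meta[key])}`)
    .join('\n')
  return `---\n${yaml}\n---\n\n${markdown.trim()}\n`
}

const root: RichNode = {
  type: 'root',
  children: [
    { type: 'heading', tag: 'h1', children: [{ type: 'text', text: 'About us' }] },
    { type: 'paragraph', children: [{ type: 'text', text: 'We build things.' }] },
  ],
}

const page = withFrontmatter(toMarkdown(root), { title: 'About us', collection: 'pages' })
console.log(page)
```

Stage 3 would take this output as input, which is why it can stay optional: the markdown and frontmatter are complete without it.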
Admin Panel

Full Control Center

A dedicated dashboard inside your Payload admin panel.

Dashboard Overview
Content Preview
Token Estimation
API

Public Endpoints

All endpoints are public by design. Rate-limited per IP.

Endpoint                                    Type              Description
GET /llms.txt                               text/markdown     Curated content index (start here)
GET /llms-full.txt                          text/markdown     Complete content listing
GET /ai/:collection/:slug.md                text/markdown     Individual page markdown
GET /ai/sitemap.json                        application/json  Content graph & hierarchy
GET /ai/structured/:collection/:slug.json   application/json  JSON-LD structured data
GET /ai/context?query=...&limit=5           application/json  Relevance-scored search
GET /.well-known/ai-plugin.json             application/json  Discovery manifest
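
For example, a client could query the context search endpoint as sketched below. The path and the query and limit parameters come from the table above; the response shape is not documented here, so the sketch returns it as unknown. The function names are hypothetical, and a runtime with global fetch (such as Node 18+) is assumed.

```typescript
// Build the context search URL from the documented path and parameters.
function buildContextUrl(siteUrl: string, query: string, limit = 5): string {
  const url = new URL('/ai/context', siteUrl)
  url.searchParams.set('query', query)
  url.searchParams.set('limit', String(limit))
  return url.toString()
}

// Fetch results. The JSON shape is an unknown here, so callers should
// validate it before use.
async function searchContext(siteUrl: string, query: string, limit = 5): Promise<unknown> {
  const res = await fetch(buildContextUrl(siteUrl, query, limit))
  if (!res.ok) {
    // Endpoints are rate-limited per IP, so non-2xx responses are expected at times.
    throw new Error(`Context search failed with status ${res.status}`)
  }
  return res.json()
}

const contextUrl = buildContextUrl('https://your-website.com', 'pricing plans', 3)
console.log(contextUrl)
```

Calling searchContext('https://your-website.com', 'pricing plans') would then issue the request against a live site.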
Get Started

Installation

1. Install the package

    # From GitHub (current)
    npm install github:thorzambo/payload-plugin-scrape-ai

    # From npm (coming soon)
    # npm install @thorzambo/payload-plugin-scrape-ai

2. Add to your plugins array in payload.config.ts

    // payload.config.ts (or src/plugins/index.ts if using a separate file)
    import { buildConfig } from 'payload'
    import { scrapeAiPlugin } from 'payload-plugin-scrape-ai'

    export default buildConfig({
      plugins: [
        scrapeAiPlugin({
          siteUrl: 'https://your-website.com', // required
          siteName: 'My Website',
          siteDescription: 'What this site is about',
          collections: ['pages', 'posts', 'products'], // optional
          exclude: ['users', 'media'], // optional
        }),
        // ... your other plugins
      ],
      // ... the rest of your Payload config
    })

3. Enable root-level AI discoverability

    // next.config.mjs
    import { withScrapeAi } from 'payload-plugin-scrape-ai/next'

    /** @type {import('next').NextConfig} */
    const nextConfig = {
      // your existing Next.js options
    }

    export default withScrapeAi(nextConfig)

Maps /llms.txt and /ai/* to the plugin endpoints. Without this, AI agents can't find your content.

4. Add in-page discovery to your layout (10/10 score)

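    // app/layout.tsx
    import type { ReactNode } from 'react'
    import { ScrapeAiMeta, ScrapeAiFooterTag } from 'payload-plugin-scrape-ai/discovery'

    export default function RootLayout({ children }: { children: ReactNode }) {
      return (
        <html lang="en">
          <head>
            <ScrapeAiMeta siteUrl="https://your-site.com" />
          </head>
          <body>
            {children}
            <ScrapeAiFooterTag siteUrl="https://your-site.com" />
          </body>
        </html>
      )
    }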

Invisible to human visitors, visible to AI text extractors. This helps tools such as ChatGPT and Perplexity discover your content.

5. Restart and visit the dashboard

Go to /admin/scrape-ai in your Payload admin panel.

NOTE

Like all Payload plugins, this does not auto-register on install. You must manually add scrapeAiPlugin() to your plugins array in payload.config.ts and restart.

Open Source

Contributing

Fork it, improve it, submit a PR. All changes go through review and approval.

How to Contribute

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a Pull Request
  5. Wait for review & approval

Good First Issues

  • [enhancement] Better Slate editor support
  • [enhancement] Custom JSON-LD type mappings
  • [docs] More usage examples
  • [feature] MCP server integration

Built with care.
Maintained with coffee.

If this plugin saves you time, consider supporting its development.