scrape-ai@payload ~ making websites AI-friendly.

Content that AI understands.

Auto-generates llms.txt, structured markdown, JSON-LD, and a context search API from any Payload CMS website.

$ syncing pages collection OK
$ generating llms.txt OK
$ extracting rich text to markdown OK
$ building JSON-LD structured data OK
$ running AI enrichment PASS
$ rebuilding sitemap.json OK
$ context query scoring OK
$ initial sync 41/41 docs COMPLETE

Features

SEO, but for AI.

AI agents can't parse your HTML efficiently. This plugin auto-generates clean, structured content in every format they understand.

llms.txt Standard

Generates /llms.txt and /llms-full.txt following the official spec.
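
For orientation, an llms.txt file is plain markdown: an H1 title, a blockquote summary, then H2 sections of links. The sketch below shows one way such a file could be assembled. `PageEntry` and `renderLlmsTxt` are hypothetical names for illustration, not part of the plugin's API.

```typescript
// Hypothetical shape for illustration; the plugin's real types may differ.
interface PageEntry {
  title: string
  url: string
  section: string // for example the collection label
  summary?: string
}

function renderLlmsTxt(siteName: string, siteDescription: string, pages: PageEntry[]): string {
  const lines: string[] = [`# ${siteName}`, '', `> ${siteDescription}`]

  // Group pages by section, keeping the order in which sections first appear.
  const order: string[] = []
  const grouped: Record<string, PageEntry[]> = {}
  for (const page of pages) {
    if (!grouped[page.section]) {
      grouped[page.section] = []
      order.push(page.section)
    }
    grouped[page.section].push(page)
  }

  // One H2 heading per section, one markdown link per page.
  for (const section of order) {
    lines.push('', `## ${section}`, '')
    for (const entry of grouped[section]) {
      const note = entry.summary ? `: ${entry.summary}` : ''
      lines.push(`- [${entry.title}](${entry.url})${note}`)
    }
  }
  return lines.join('\n') + '\n'
}
```

Calling `renderLlmsTxt('My Website', 'What this site is about', pages)` returns the file body as a string, ready to serve as text/markdown.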

Auto-Sync

Every content change triggers instant regeneration. Hybrid event + queue architecture handles bulk edits.
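
One way to picture the event + queue idea: each change event enqueues a document ID, and a worker drains the queue in a single pass, so a bulk edit that touches the same document several times regenerates it once. This is a minimal sketch under that assumption. `CoalescingQueue` is a hypothetical name, not the plugin's implementation.

```typescript
// Illustrative only: a de-duplicating queue, not the plugin's actual code.
class CoalescingQueue {
  private pending: string[] = []

  // Called from a change event (for example a collection afterChange hook).
  enqueue(docId: string): void {
    if (this.pending.indexOf(docId) === -1) {
      this.pending.push(docId)
    }
  }

  // Drain everything queued so far; each document is regenerated once.
  // Returns the number of documents processed.
  flush(regenerate: (docId: string) => void): number {
    const batch = this.pending
    this.pending = []
    for (const docId of batch) {
      regenerate(docId)
    }
    return batch.length
  }
}
```

A caller would invoke `enqueue` on every save and `flush` on a timer or at the end of a bulk operation.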

AI Enrichment

Optional AI summaries, entity extraction, topic classification, and semantic chunking for RAG pipelines.
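
To make "chunking" concrete, here is a deliberately simple stand-in: splitting markdown at headings so each chunk carries its own heading as context. Real semantic chunking would use a model or embeddings. `chunkByHeading` and the `Chunk` shape are hypothetical and not taken from the plugin.

```typescript
interface Chunk {
  heading: string
  text: string
}

// Heading-based splitting: a simple stand-in for semantic chunking.
function chunkByHeading(markdown: string): Chunk[] {
  const chunks: Chunk[] = []
  let heading = ''
  let buffer: string[] = []

  // Emit the buffered lines as one chunk, skipping empty sections.
  const push = (): void => {
    const text = buffer.join('\n').trim()
    if (text.length > 0) {
      chunks.push({ heading, text })
    }
    buffer = []
  }

  for (const line of markdown.split('\n')) {
    const match = /^#{1,6}\s+(.*)$/.exec(line)
    if (match) {
      push() // close the previous section before starting a new one
      heading = match[1].trim()
    } else {
      buffer.push(line)
    }
  }
  push()
  return chunks
}
```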

JSON-LD

Schema.org structured data for every page. Products, articles, services — automatically mapped.
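
As a rough illustration of the mapping, a blog post could become a Schema.org Article like this. The `ArticleDoc` fields, the `/posts/` URL pattern, and `toArticleJsonLd` are assumptions made for the example. The plugin's actual field mapping may differ.

```typescript
// Hypothetical document shape; real collections will have their own fields.
interface ArticleDoc {
  title: string
  slug: string
  publishedAt: string // ISO 8601 date
  excerpt?: string
}

function toArticleJsonLd(siteUrl: string, doc: ArticleDoc): Record<string, string> {
  const base = siteUrl.replace(/\/+$/, '') // drop trailing slashes
  const jsonLd: Record<string, string> = {
    '@context': 'https://schema.org',
    '@type': 'Article',
    headline: doc.title,
    url: `${base}/posts/${doc.slug}`, // assumed URL pattern
    datePublished: doc.publishedAt,
  }
  if (doc.excerpt) {
    jsonLd['description'] = doc.excerpt
  }
  return jsonLd
}
```

`JSON.stringify(toArticleJsonLd(siteUrl, doc), null, 2)` yields a body suitable for a `<script type="application/ld+json">` tag.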

Smart Detection

Auto-discovers content collections such as pages, posts, and products. Toggle each one from the dashboard.

Token Estimator

Estimates total AI cost before you enable enrichment. Recommends the cheapest model that fits.
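
The arithmetic behind an estimate like this is simple. The sketch below uses the common rule of thumb of about 4 characters per token and picks the cheapest model whose context window fits the largest document. Every name here is hypothetical, and any model names or prices you pass in are your own inputs, not values from the plugin.

```typescript
interface ModelOption {
  name: string
  contextWindow: number // maximum tokens per request
  usdPerMillionTokens: number
}

// Rough heuristic: about 4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Cheapest model whose context window can hold the largest document.
function cheapestFittingModel(largestDocTokens: number, models: ModelOption[]): ModelOption | undefined {
  let best: ModelOption | undefined
  for (const model of models) {
    if (model.contextWindow < largestDocTokens) continue
    if (!best || model.usdPerMillionTokens < best.usdPerMillionTokens) {
      best = model
    }
  }
  return best
}

function estimateCostUsd(totalTokens: number, model: ModelOption): number {
  return (totalTokens / 1_000_000) * model.usdPerMillionTokens
}
```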

Architecture

Three-Stage Pipeline

Stage	Name	What it does	Cost
01	Extract	Lexical & Slate → Markdown	Free
02	Structure	Frontmatter, JSON-LD, hierarchy	Free
03	Enrich	AI summaries, topics, chunks	Optional
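
The three stages can be sketched as plain functions. This is an illustration under stated assumptions, not the plugin's code: the node shapes are a simplified subset of a serialized rich-text tree, and `extract`, `structure`, and `runPipeline` are hypothetical names.

```typescript
// Simplified, assumed node shapes; real editor output has more node types.
interface TextNode {
  type: 'text'
  text: string
}
interface BlockNode {
  type: 'heading' | 'paragraph'
  tag?: string // 'h1' to 'h6' for headings
  children: TextNode[]
}

// Stage 01 (Extract): rich-text blocks to markdown.
function extract(blocks: BlockNode[]): string {
  return blocks
    .map((block) => {
      const text = block.children.map((child) => child.text).join('')
      if (block.type === 'heading') {
        const level = Number((block.tag ?? 'h2').slice(1))
        return `${new Array(level + 1).join('#')} ${text}`
      }
      return text
    })
    .join('\n\n')
}

// Stage 02 (Structure): prepend frontmatter. JSON.stringify quotes the values.
function structure(meta: { title: string; url: string }, markdown: string): string {
  return [
    '---',
    `title: ${JSON.stringify(meta.title)}`,
    `url: ${JSON.stringify(meta.url)}`,
    '---',
    '',
    markdown,
  ].join('\n')
}

// Stage 03 (Enrich) is optional, so it is passed in as a function.
function runPipeline(
  meta: { title: string; url: string },
  blocks: BlockNode[],
  enrich?: (markdown: string) => string,
): string {
  const structured = structure(meta, extract(blocks))
  return enrich ? enrich(structured) : structured
}
```

Keeping the third stage as an optional callback mirrors the page's point that the first two stages cost nothing and the AI step is opt-in.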

Admin Panel

Full Control Center

A dedicated dashboard inside your Payload admin panel.

API

Public Endpoints

All endpoints are public by design. Rate-limited per IP.

Endpoint	Type	Description
`GET /llms.txt`	text/markdown	Curated content index — start here
`GET /llms-full.txt`	text/markdown	Complete content listing
`GET /ai/:collection/:slug.md`	text/markdown	Individual page markdown
`GET /ai/sitemap.json`	application/json	Content graph & hierarchy
`GET /ai/structured/:collection/:slug.json`	application/json	JSON-LD structured data
`GET /ai/context?query=...&limit=5`	application/json	Relevance-scored search
`GET /.well-known/ai-plugin.json`	application/json	Discovery manifest
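
A client for the context endpoint might look like the sketch below. The path and the `query` and `limit` parameters come from the table above. The helper names are hypothetical, the response is typed as `unknown` because its shape is not documented here, and a runtime with a global `fetch` (for example Node 18+) is assumed.

```typescript
// Builds the request URL for the context search endpoint.
function buildContextUrl(siteUrl: string, query: string, limit = 5): string {
  const base = siteUrl.replace(/\/+$/, '') // drop trailing slashes
  return `${base}/ai/context?query=${encodeURIComponent(query)}&limit=${limit}`
}

// Fetches results; inspect the returned JSON before relying on its fields.
async function queryContext(siteUrl: string, query: string, limit = 5): Promise<unknown> {
  const response = await fetch(buildContextUrl(siteUrl, query, limit), {
    headers: { Accept: 'application/json' },
  })
  if (!response.ok) {
    // The endpoints are rate-limited per IP, so a failed request may simply
    // need to be retried later.
    throw new Error(`Context query failed with HTTP ${response.status}`)
  }
  return response.json()
}
```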

Get Started

Installation

1. Install the package

# From GitHub (current)
npm install github:thorzambo/payload-plugin-scrape-ai

# From npm (coming soon)
# npm install @thorzambo/payload-plugin-scrape-ai

2. Add to your plugins array in payload.config.ts

// payload.config.ts (or src/plugins/index.ts if using a separate file)
import { buildConfig } from 'payload' // Payload 3; on Payload 2, import from 'payload/config'
import { scrapeAiPlugin } from 'payload-plugin-scrape-ai'

export default buildConfig({
  plugins: [
    scrapeAiPlugin({
      siteUrl: 'https://your-website.com',  // required
      siteName: 'My Website',
      siteDescription: 'What this site is about',
      collections: ['pages', 'posts', 'products'], // optional
      exclude: ['users', 'media'],                  // optional
    }),
    // ... your other plugins
  ],
})

3. Enable root-level AI discoverability

// next.config.mjs
import { withScrapeAi } from 'payload-plugin-scrape-ai/next'

// Your existing Next.js options go here.
const nextConfig = {}

export default withScrapeAi(nextConfig)

Maps /llms.txt and /ai/* to the plugin endpoints. Without this, AI agents can't find your content.
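
Conceptually, this mapping resembles Next.js rewrite rules. The sketch below is a guess at the general shape, not the wrapper's real output: `sketchRewrites` is a hypothetical helper, and `apiPrefix` is a placeholder for wherever the plugin actually mounts its endpoints.

```typescript
interface Rewrite {
  source: string
  destination: string
}

// `apiPrefix` is a placeholder; the plugin's real endpoint prefix may differ.
function sketchRewrites(apiPrefix: string): Rewrite[] {
  return [
    { source: '/llms.txt', destination: `${apiPrefix}/llms.txt` },
    // `:path*` is Next.js syntax for "match any number of path segments".
    { source: '/ai/:path*', destination: `${apiPrefix}/ai/:path*` },
  ]
}
```

In a hand-written config, rules like these would be returned from the `rewrites()` function in next.config.mjs. The wrapper spares you from maintaining them yourself.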

4. Add in-page discovery to your layout (10/10 score)

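// app/layout.tsx
import type { ReactNode } from 'react'
import { ScrapeAiMeta, ScrapeAiFooterTag } from 'payload-plugin-scrape-ai/discovery'

export default function RootLayout({ children }: { children: ReactNode }) {
  return (
    <html lang="en">
      <head>
        <ScrapeAiMeta siteUrl="https://your-site.com" />
      </head>
      <body>
        {children}
        <ScrapeAiFooterTag siteUrl="https://your-site.com" />
      </body>
    </html>
  )
}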

Invisible to humans, visible to AI text extractors. These tags help assistants such as ChatGPT and Perplexity discover your content.

5. Restart and visit the dashboard

Go to /admin/scrape-ai in your Payload admin panel.

NOTE

Like all Payload plugins, this does not auto-register on install. You must manually add scrapeAiPlugin() to your plugins array in payload.config.ts and restart.

Open Source

Contributing

Fork it, improve it, submit a PR. All changes go through review and approval.

How to Contribute

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a Pull Request
5. Wait for review & approval

Good First Issues

enhancement: Better Slate editor support
enhancement: Custom JSON-LD type mappings
docs: More usage examples
feature: MCP server integration

Built with care.
Maintained with coffee.

If this plugin saves you time, consider supporting its development.
