Quick Start

The Web Scraping Toolkit is built around one flexible endpoint: POST /v1/scrape. In a single request you can fetch a page, optionally render JavaScript, choose an output format, and include structured sections like links, images, metadata, selectors, and SEO signals.

If you are coming from tools like ScrapingBee, this page should feel familiar. The main differences are more structured JSON, cleaner LLM-ready content, and fewer follow-up requests for common metadata tasks.

Why teams like this toolkit

One call, richer output — content, links, metadata, and selector extraction can come back in the same response
Built for AI pipelines — Markdown, text, clean content, chunking, and schema-driven extraction are first-class features
Less infrastructure to maintain — use server-side rendering and crawl jobs without managing your own browser farm
SEO and scrape in one product — page speed, keyword density, broken links, and audits are already included

1. Get your API key

2. Make your first request

Start with the unified endpoint and request clean Markdown or text.

curl

curl -X POST "https://scrape.toolkitapi.io/v1/scrape" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY" \
  -d '{
    "url": "https://toolkitapi.io",
    "output": "markdown"
  }'

Python

import httpx

response = httpx.post(
    "https://scrape.toolkitapi.io/v1/scrape",
    headers={"X-API-Key": "YOUR_KEY"},
    json={
        "url": "https://toolkitapi.io",
        "output": "markdown",
    },
    timeout=60,
)
response.raise_for_status()  # raise on 4xx/5xx before parsing the body
print(response.json())

JavaScript

const response = await fetch("https://scrape.toolkitapi.io/v1/scrape", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "X-API-Key": "YOUR_KEY"
  },
  body: JSON.stringify({
    url: "https://toolkitapi.io",
    output: "markdown"
  })
});

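// Stop early on HTTP errors instead of parsing an error body as a result.
if (!response.ok) {
  throw new Error(`Scrape request failed with HTTP ${response.status}`);
}

const data = await response.json();
console.log(data);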

3. Understand the output formats

Use the output field to control the main content returned:

Output	Best for	What you get
`html`	raw scraping pipelines	Original HTML body/source for parsing
`markdown`	LLMs, RAG, docs ingestion	Clean Markdown with structure preserved
`text`	search, NLP, lightweight extraction	Human-readable plain text
`clean`	article-like content extraction	Boilerplate-reduced readable content
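
A typo in the output field is easy to catch on the client before a request is sent. The sketch below assumes the four values in the table are the complete set of accepted formats; the helper name build_scrape_payload is hypothetical and not part of the toolkit.

```python
# Output formats listed in the table above; assumed here to be the full set.
VALID_OUTPUTS = ("html", "markdown", "text", "clean")


def build_scrape_payload(url: str, output: str = "markdown") -> dict:
    """Return a JSON-serialisable request body for POST /v1/scrape.

    Hypothetical helper: it only validates the `output` value locally
    and does not contact the API.
    """
    if output not in VALID_OUTPUTS:
        raise ValueError(
            f"unsupported output {output!r}; expected one of {VALID_OUTPUTS}"
        )
    return {"url": url, "output": output}


print(build_scrape_payload("https://toolkitapi.io", "text"))
```

The returned dictionary can be passed as the json argument of the Python request shown in step 2.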

4. Add structured extraction in the same call

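The big advantage of the unified endpoint is that you do not need separate requests for common metadata. In the request below, links, images, meta_tags, and link_preview are switched on with booleans, while selectors maps names of your choice to CSS selectors, written either as a plain string or as an object with selector, attr, and multiple.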

curl -X POST "https://scrape.toolkitapi.io/v1/scrape" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY" \
  -d '{
    "url": "https://www.python.org",
    "output": "clean",
    "extract": {
      "links": true,
      "images": true,
      "meta_tags": true,
      "link_preview": true,
      "selectors": {
        "headline": "h1",
        "nav_links": {"selector": "nav a", "attr": "href", "multiple": true}
      }
    }
  }'

Example response shape:

{
  "url": "https://www.python.org/",
  "status_code": 200,
  "content": "Welcome to Python.org...",
  "output_format": "clean",
  "word_count": 220,
  "links": { "total": 130, "internal": 123, "external": 7 },
  "meta_tags": { "title": "Welcome to Python.org" },
  "selectors": {
    "headline": "Welcome to Python.org",
    "nav_links": ["/downloads/", "/doc/"]
  }
}
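
As a rough illustration of consuming this shape in Python, the sketch below reads a few fields from the sample body above. It assumes that sections you did not request may be missing from the response, which the example does not confirm, so every lookup uses a default. The function name summarise is hypothetical.

```python
import json

# Response body copied from the example shape above.
SAMPLE_RESPONSE = json.loads("""
{
  "url": "https://www.python.org/",
  "status_code": 200,
  "content": "Welcome to Python.org...",
  "output_format": "clean",
  "word_count": 220,
  "links": { "total": 130, "internal": 123, "external": 7 },
  "meta_tags": { "title": "Welcome to Python.org" },
  "selectors": {
    "headline": "Welcome to Python.org",
    "nav_links": ["/downloads/", "/doc/"]
  }
}
""")


def summarise(result: dict) -> dict:
    """Pull a few fields out of a scrape response.

    Assumption: optional sections may be absent, so each lookup falls
    back to a default instead of raising KeyError.
    """
    selectors = result.get("selectors", {})
    links = result.get("links", {})
    return {
        "ok": result.get("status_code") == 200,
        "title": result.get("meta_tags", {}).get("title"),
        "headline": selectors.get("headline"),
        "nav_links": selectors.get("nav_links", []),
        "external_links": links.get("external", 0),
    }


print(summarise(SAMPLE_RESPONSE))
```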

5. Render JavaScript when needed

For SPAs or other client-side rendered pages, set render_js to true. You can optionally tune rendering with wait_for, wait_until, scroll, or block_resources; the example below also enables stealth.

{
  "url": "https://quotes.toscrape.com/js/",
  "render_js": true,
  "wait_until": "networkidle",
  "output": "text",
  "block_resources": ["image", "font"],
  "stealth": true,
  "extract": {
    "selectors": {
      "quotes": { "selector": ".quote", "multiple": true }
    }
  }
}
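
One way to send the body above from Python is sketched here. It uses only the standard library so it runs without extra packages; httpx, as in step 2, works equally well. The endpoint URL and header names come from the earlier examples, while build_request and send are hypothetical helper names.

```python
import json
import urllib.request

API_URL = "https://scrape.toolkitapi.io/v1/scrape"

# Request body copied from the JSON example above.
RENDER_BODY = {
    "url": "https://quotes.toscrape.com/js/",
    "render_js": True,
    "wait_until": "networkidle",
    "output": "text",
    "block_resources": ["image", "font"],
    "stealth": True,
    "extract": {
        "selectors": {"quotes": {"selector": ".quote", "multiple": True}}
    },
}


def build_request(body: dict, api_key: str) -> urllib.request.Request:
    """Prepare, but do not send, a POST request for the unified endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Calling send(build_request(RENDER_BODY, "YOUR_KEY")) performs the request. Keeping construction separate from sending makes the body easy to inspect or log before any network call.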

6. Use specialised endpoints for deeper workflows

The unified scrape endpoint covers most cases, but the toolkit also exposes targeted endpoints:

GET /v1/scrape/sitemap — parse sitemap XML or sitemap indexes
GET /v1/scrape/robots — inspect robots.txt rules and sitemaps
POST /v1/scrape/pdf — extract text from remote PDFs
POST /v1/scrape/crawl — launch async same-domain crawls
GET /v1/scrape/audit — full SEO audits
GET /v1/scrape/pagespeed — response size, compression, and TTFB checks
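
The list above can be kept as a small lookup table in client code. The sketch below makes two assumptions that this page does not confirm: that these endpoints live on the same host as the unified endpoint, and that the GET endpoints accept the target page as a url query parameter. Check the API documentation for the actual parameters. The name endpoint_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: same host as POST /v1/scrape.
BASE_URL = "https://scrape.toolkitapi.io"

# HTTP method and path for each targeted endpoint, as listed above.
ENDPOINTS = {
    "sitemap": ("GET", "/v1/scrape/sitemap"),
    "robots": ("GET", "/v1/scrape/robots"),
    "pdf": ("POST", "/v1/scrape/pdf"),
    "crawl": ("POST", "/v1/scrape/crawl"),
    "audit": ("GET", "/v1/scrape/audit"),
    "pagespeed": ("GET", "/v1/scrape/pagespeed"),
}


def endpoint_url(name: str, **params: str) -> tuple[str, str]:
    """Return (method, full URL) for a named endpoint.

    For GET endpoints, keyword arguments become query parameters.
    Parameter names such as `url` are assumptions, not confirmed here.
    POST endpoints take their parameters in the request body instead.
    """
    method, path = ENDPOINTS[name]
    url = BASE_URL + path
    if method == "GET" and params:
        url += "?" + urlencode(params)
    return method, url


print(endpoint_url("robots", url="https://www.python.org"))
```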

7. Deep dives

If this page feels too broad, use the guide pages:

8. Next steps

Browse the full API documentation
Explore individual tool pages
Review pricing