Quick Start

The Web Scraping Toolkit is built around one flexible endpoint: POST /v1/scrape. In a single request you can fetch a page, optionally render JavaScript, choose an output format, and include structured sections like links, images, metadata, selectors, and SEO signals.

If you are coming from tools like ScrapingBee, the request model should feel familiar. The main differences are more structured JSON, cleaner LLM-ready content, and fewer follow-up requests for common metadata tasks.

Why teams like this toolkit

  • One call, richer output — content, links, metadata, and selector extraction can come back in the same response
  • Built for AI pipelines — Markdown, text, clean content, chunking, and schema-driven extraction are first-class features
  • Less infrastructure to maintain — use server-side rendering and crawl jobs without managing your own browser farm
  • SEO and scrape in one product — page speed, keyword density, broken links, and audits are already included

1. Get your API key

Sign up and subscribe via RapidAPI, then pass your key in the X-API-Key header.
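
To keep the key out of source code, one option is to read it from the environment and build the header once. This is a minimal sketch: the variable name TOOLKIT_API_KEY and the helper auth_headers are illustrative assumptions, not part of the toolkit.

```python
import os


def auth_headers(env_var: str = "TOOLKIT_API_KEY") -> dict[str, str]:
    """Build the X-API-Key header from an environment variable.

    The environment variable name is a hypothetical choice for this sketch.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"set {env_var} to your API key before making requests")
    return {"X-API-Key": key}
```

The returned dictionary can be passed as the headers argument of whichever HTTP client you use.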

2. Make your first request

Start with the unified endpoint and request clean Markdown or text.

curl

curl -X POST "https://scrape.toolkitapi.io/v1/scrape" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY" \
  -d '{
    "url": "https://toolkitapi.io",
    "output": "markdown"
  }'

Python

import httpx

response = httpx.post(
    "https://scrape.toolkitapi.io/v1/scrape",
    headers={"X-API-Key": "YOUR_KEY"},
    json={
        "url": "https://toolkitapi.io",
        "output": "markdown",
    },
    timeout=60,
)
response.raise_for_status()  # raise on 4xx/5xx instead of printing an error body
print(response.json())

JavaScript

const response = await fetch("https://scrape.toolkitapi.io/v1/scrape", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "X-API-Key": "YOUR_KEY"
  },
  body: JSON.stringify({
    url: "https://toolkitapi.io",
    output: "markdown"
  })
});

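// Stop early on HTTP errors instead of parsing an error body as a result.
if (!response.ok) {
  throw new Error(`Scrape request failed with status ${response.status}`);
}

const data = await response.json();
console.log(data);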

3. Understand the output formats

Use the output field to control the main content returned:

Output    | Best for                                | What you get
html      | Raw scraping pipelines                  | Original HTML body/source for parsing
markdown  | LLMs, RAG, docs ingestion               | Clean Markdown with structure preserved
text      | Search, NLP, lightweight extraction     | Human-readable plain text
clean     | Article-like content extraction         | Boilerplate-reduced readable content
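
Because output takes one of a small set of values, it can help to validate it on the client before sending anything. The sketch below uses only the four formats listed above; the helper name build_scrape_body is an assumption made for illustration.

```python
# The four output formats listed in the table above.
OUTPUT_FORMATS = ("html", "markdown", "text", "clean")


def build_scrape_body(url: str, output: str = "markdown") -> dict:
    """Return a JSON-serialisable body for POST /v1/scrape.

    Hypothetical helper: it rejects unknown formats locally so a typo
    does not cost a request.
    """
    if output not in OUTPUT_FORMATS:
        raise ValueError(
            f"unknown output format {output!r}; expected one of {OUTPUT_FORMATS}"
        )
    return {"url": url, "output": output}


body = build_scrape_body("https://toolkitapi.io", "clean")
```

The resulting dictionary can be passed as the json argument in the Python request shown in step 2.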

4. Add structured extraction in the same call

The big advantage of the unified endpoint is that you do not need separate requests for common metadata.

curl -X POST "https://scrape.toolkitapi.io/v1/scrape" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY" \
  -d '{
    "url": "https://www.python.org",
    "output": "clean",
    "extract": {
      "links": true,
      "images": true,
      "meta_tags": true,
      "link_preview": true,
      "selectors": {
        "headline": "h1",
        "nav_links": {"selector": "nav a", "attr": "href", "multiple": true}
      }
    }
  }'

Example response shape:

{
  "url": "https://www.python.org/",
  "status_code": 200,
  "content": "Welcome to Python.org...",
  "output_format": "clean",
  "word_count": 220,
  "links": { "total": 130, "internal": 123, "external": 7 },
  "meta_tags": { "title": "Welcome to Python.org" },
  "selectors": {
    "headline": "Welcome to Python.org",
    "nav_links": ["/downloads/", "/doc/"]
  }
}
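
Once the JSON is decoded, the structured sections can be read like any other dictionary. The sketch below pulls a few fields out of the example response shape above. It assumes that sections you did not request may be absent from the response, which this page does not confirm, so every lookup is defensive. The helper name summarise_scrape is illustrative.

```python
from typing import Any


def summarise_scrape(data: dict[str, Any]) -> dict[str, Any]:
    """Pick a few fields out of a decoded /v1/scrape response.

    Assumption: unrequested sections may be missing, so use .get() throughout.
    """
    links = data.get("links") or {}
    selectors = data.get("selectors") or {}
    meta = data.get("meta_tags") or {}
    return {
        "ok": data.get("status_code") == 200,
        "title": meta.get("title"),
        "headline": selectors.get("headline"),
        "nav_links": selectors.get("nav_links", []),
        "internal_links": links.get("internal", 0),
        "external_links": links.get("external", 0),
    }


# The example response shape from this page, as a Python dictionary.
example = {
    "url": "https://www.python.org/",
    "status_code": 200,
    "content": "Welcome to Python.org...",
    "output_format": "clean",
    "word_count": 220,
    "links": {"total": 130, "internal": 123, "external": 7},
    "meta_tags": {"title": "Welcome to Python.org"},
    "selectors": {
        "headline": "Welcome to Python.org",
        "nav_links": ["/downloads/", "/doc/"],
    },
}

summary = summarise_scrape(example)
```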

5. Render JavaScript when needed

For SPAs or client-side rendered pages, turn on render_js and optionally use wait_for, wait_until, scroll, or block_resources.

{
  "url": "https://quotes.toscrape.com/js/",
  "render_js": true,
  "wait_until": "networkidle",
  "output": "text",
  "block_resources": ["image", "font"],
  "stealth": true,
  "extract": {
    "selectors": {
      "quotes": { "selector": ".quote", "multiple": true }
    }
  }
}
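
The same body can be sent from Python. The sketch below builds the request with the standard library so it runs without extra dependencies; it only constructs the request object and does not send it. The function name build_render_request is an assumption, and the body simply mirrors the JSON example above.

```python
import json
import urllib.request

API_URL = "https://scrape.toolkitapi.io/v1/scrape"


def build_render_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/scrape request with JS rendering on."""
    body = {
        "url": "https://quotes.toscrape.com/js/",
        "render_js": True,
        "wait_until": "networkidle",
        "output": "text",
        "block_resources": ["image", "font"],
        "stealth": True,
        "extract": {
            "selectors": {
                "quotes": {"selector": ".quote", "multiple": True},
            }
        },
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


request = build_render_request("YOUR_KEY")
# To send it: urllib.request.urlopen(request, timeout=60)
```

Rendered requests are likely to take longer than plain fetches, so a generous client timeout such as the 60 seconds used earlier on this page is a reasonable starting point.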

6. Use specialised endpoints for deeper workflows

The unified scrape endpoint covers most cases, but the toolkit also exposes targeted endpoints:

  • GET /v1/scrape/sitemap — parse sitemap XML or sitemap indexes
  • GET /v1/scrape/robots — inspect robots.txt rules and sitemaps
  • POST /v1/scrape/pdf — extract text from remote PDFs
  • POST /v1/scrape/crawl — launch async same-domain crawls
  • GET /v1/scrape/audit — full SEO audits
  • GET /v1/scrape/pagespeed — response size, compression, and TTFB checks
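
For the GET endpoints, a request URL might be assembled as in the sketch below. This page does not document their query parameters, so passing the target page as a url query parameter is an assumption; check the endpoint reference before relying on it. The helper name endpoint_url is also illustrative.

```python
from urllib.parse import urlencode

BASE_URL = "https://scrape.toolkitapi.io"

# GET endpoints listed above, keyed by a short name.
GET_ENDPOINTS = {
    "sitemap": "/v1/scrape/sitemap",
    "robots": "/v1/scrape/robots",
    "audit": "/v1/scrape/audit",
    "pagespeed": "/v1/scrape/pagespeed",
}


def endpoint_url(name: str, target: str) -> str:
    """Return the full URL for one of the GET endpoints.

    ASSUMPTION: the target page is passed as a `url` query parameter.
    """
    query = urlencode({"url": target})
    return f"{BASE_URL}{GET_ENDPOINTS[name]}?{query}"


robots_url = endpoint_url("robots", "https://www.python.org")
```

The key still goes in the X-API-Key header, as in the POST examples.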

7. Deep dives

If this page feels too broad, use the guide pages:

8. Next steps