Quick Start
The Web Scraping Toolkit is built around one flexible endpoint: POST /v1/scrape. In a single request you can fetch a page, optionally render JavaScript, choose an output format, and include structured sections like links, images, metadata, selectors, and SEO signals.
If you are coming from tools like ScrapingBee, this page should feel familiar. The main differences are more structured JSON, cleaner LLM-ready content, and fewer follow-up requests for common metadata tasks.
Why teams like this toolkit
- One call, richer output — content, links, metadata, and selector extraction can come back in the same response
- Built for AI pipelines — Markdown, text, clean content, chunking, and schema-driven extraction are first-class features
- Less infrastructure to maintain — use server-side rendering and crawl jobs without managing your own browser farm
- SEO and scrape in one product — page speed, keyword density, broken links, and audits are already included
1. Get your API key
Sign up and subscribe via RapidAPI, then pass your key in the X-API-Key header.
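To keep the key out of source control, it can be read from the environment before the headers are built. The sketch below is one way to do that; the environment variable name `TOOLKIT_API_KEY` and the helper `build_headers` are illustrative assumptions, not part of the toolkit.

```python
import os


def build_headers(env_var: str = "TOOLKIT_API_KEY") -> dict[str, str]:
    """Return request headers with the API key read from the environment.

    The variable name is a hypothetical choice for this example.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return {
        "Content-Type": "application/json",
        "X-API-Key": api_key,  # header name as documented above
    }
```

The returned dictionary can be passed as the `headers` argument of whichever HTTP client you use.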
2. Make your first request
Start with the unified endpoint and request clean Markdown or text.
curl
curl -X POST "https://scrape.toolkitapi.io/v1/scrape" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY" \
  -d '{
    "url": "https://toolkitapi.io",
    "output": "markdown"
  }'
Python
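import httpx

# Send the scrape request with a 60-second timeout.
response = httpx.post(
    "https://scrape.toolkitapi.io/v1/scrape",
    headers={"X-API-Key": "YOUR_KEY"},
    json={
        "url": "https://toolkitapi.io",
        "output": "markdown",
    },
    timeout=60,
)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
print(response.json())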
JavaScript
const response = await fetch("https://scrape.toolkitapi.io/v1/scrape", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "X-API-Key": "YOUR_KEY"
  },
  body: JSON.stringify({
    url: "https://toolkitapi.io",
    output: "markdown"
  })
});
const data = await response.json();
console.log(data);
3. Understand the output formats
Use the output field to control the main content returned:
| Output | Best for | What you get |
|---|---|---|
| html | raw scraping pipelines | Original HTML body/source for parsing |
| markdown | LLMs, RAG, docs ingestion | Clean Markdown with structure preserved |
| text | search, NLP, lightweight extraction | Human-readable plain text |
| clean | article-like content extraction | Boilerplate-reduced readable content |
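Since the output field accepts a small set of values, it can help to validate it on the client before sending a request. The following is a minimal sketch; `VALID_OUTPUTS` and `build_scrape_payload` are hypothetical names for this example, and the set of values is taken only from the table above.

```python
# Output formats listed in the table above.
VALID_OUTPUTS = {"html", "markdown", "text", "clean"}


def build_scrape_payload(url: str, output: str = "markdown") -> dict:
    """Build the JSON body for POST /v1/scrape, rejecting unknown formats early."""
    if output not in VALID_OUTPUTS:
        allowed = ", ".join(sorted(VALID_OUTPUTS))
        raise ValueError(f"Unknown output {output!r}; expected one of: {allowed}")
    return {"url": url, "output": output}


# Example: request boilerplate-reduced content for an article page.
payload = build_scrape_payload("https://toolkitapi.io", output="clean")
```

Failing locally on a mistyped format avoids spending a request on a value the table does not list.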
4. Add structured extraction in the same call
The big advantage of the unified endpoint is that you do not need separate requests for common metadata.
curl -X POST "https://scrape.toolkitapi.io/v1/scrape" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY" \
  -d '{
    "url": "https://www.python.org",
    "output": "clean",
    "extract": {
      "links": true,
      "images": true,
      "meta_tags": true,
      "link_preview": true,
      "selectors": {
        "headline": "h1",
        "nav_links": {"selector": "nav a", "attr": "href", "multiple": true}
      }
    }
  }'
Example response shape:
{
  "url": "https://www.python.org/",
  "status_code": 200,
  "content": "Welcome to Python.org...",
  "output_format": "clean",
  "word_count": 220,
  "links": { "total": 130, "internal": 123, "external": 7 },
  "meta_tags": { "title": "Welcome to Python.org" },
  "selectors": {
    "headline": "Welcome to Python.org",
    "nav_links": ["/downloads/", "/doc/"]
  }
}
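A response shaped like the example above can be reduced to the fields you care about with a few dictionary lookups. The sketch below assumes that sections you did not request may be absent, so it reads them defensively; `summarize_scrape` is a hypothetical helper, and the sample data simply mirrors the example response.

```python
from urllib.parse import urljoin


def summarize_scrape(data: dict) -> dict:
    """Pick a few fields out of a /v1/scrape response.

    Assumes optional sections (links, meta_tags, selectors) may be missing.
    """
    page_url = data["url"]
    selectors = data.get("selectors") or {}
    meta = data.get("meta_tags") or {}
    links = data.get("links") or {}
    return {
        "title": meta.get("title"),
        "headline": selectors.get("headline"),
        # Resolve relative hrefs against the page URL so they are usable as-is.
        "nav_links": [urljoin(page_url, href) for href in selectors.get("nav_links", [])],
        "external_links": links.get("external", 0),
    }


example = {
    "url": "https://www.python.org/",
    "links": {"total": 130, "internal": 123, "external": 7},
    "meta_tags": {"title": "Welcome to Python.org"},
    "selectors": {
        "headline": "Welcome to Python.org",
        "nav_links": ["/downloads/", "/doc/"],
    },
}
summary = summarize_scrape(example)
print(summary["nav_links"])
# ['https://www.python.org/downloads/', 'https://www.python.org/doc/']
```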
5. Render JavaScript when needed
For SPAs or client-side rendered pages, turn on render_js and optionally use wait_for, wait_until, scroll, or block_resources.
{
  "url": "https://quotes.toscrape.com/js/",
  "render_js": true,
  "wait_until": "networkidle",
  "output": "text",
  "block_resources": ["image", "font"],
  "stealth": true,
  "extract": {
    "selectors": {
      "quotes": { "selector": ".quote", "multiple": true }
    }
  }
}
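The same kind of rendering request can be assembled in Python. This is a sketch using only the standard library and only the fields shown in the JSON above; the helper names `build_render_payload` and `build_request` are illustrative, and the defaults are copied from the example rather than from a documented list of accepted values.

```python
import json
import urllib.request

SCRAPE_URL = "https://scrape.toolkitapi.io/v1/scrape"


def build_render_payload(url, selectors, wait_until="networkidle",
                         block_resources=("image", "font")):
    """Build a JS-rendering request body that returns text plus selector matches."""
    payload = {
        "url": url,
        "render_js": True,
        "wait_until": wait_until,
        "output": "text",
        "extract": {"selectors": selectors},
    }
    # Only include block_resources when there is something to block.
    if block_resources:
        payload["block_resources"] = list(block_resources)
    return payload


def build_request(payload, api_key):
    """Wrap the payload in a POST request object; nothing is sent yet."""
    return urllib.request.Request(
        SCRAPE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


payload = build_render_payload(
    "https://quotes.toscrape.com/js/",
    {"quotes": {"selector": ".quote", "multiple": True}},
)
request = build_request(payload, "YOUR_KEY")
```

To send it, pass `request` to `urllib.request.urlopen` with a timeout of your choosing and decode the JSON body of the response.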
6. Use specialised endpoints for deeper workflows
The unified scrape endpoint covers most cases, but the toolkit also exposes targeted endpoints:
- GET /v1/scrape/sitemap: parse sitemap XML or sitemap indexes
- GET /v1/scrape/robots: inspect robots.txt rules and sitemaps
- POST /v1/scrape/pdf: extract text from remote PDFs
- POST /v1/scrape/crawl: launch async same-domain crawls
- GET /v1/scrape/audit: full SEO audits
- GET /v1/scrape/pagespeed: response size, compression, and TTFB checks
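A small lookup table can keep the methods and paths listed above in one place. In the sketch below the paths and methods come from that list, but the query parameter name `url` for the GET endpoints is an assumption made for illustration; check the API documentation for the parameters each endpoint actually accepts.

```python
from urllib.parse import urlencode

BASE_URL = "https://scrape.toolkitapi.io"

# Method and path for each specialised endpoint, as listed above.
ENDPOINTS = {
    "sitemap": ("GET", "/v1/scrape/sitemap"),
    "robots": ("GET", "/v1/scrape/robots"),
    "pdf": ("POST", "/v1/scrape/pdf"),
    "crawl": ("POST", "/v1/scrape/crawl"),
    "audit": ("GET", "/v1/scrape/audit"),
    "pagespeed": ("GET", "/v1/scrape/pagespeed"),
}


def endpoint_url(name: str, **params: str) -> tuple[str, str]:
    """Return (method, url) for an endpoint.

    Keyword arguments become query parameters for GET endpoints only;
    the parameter names are hypothetical and not confirmed by this page.
    """
    method, path = ENDPOINTS[name]
    url = BASE_URL + path
    if method == "GET" and params:
        url += "?" + urlencode(params)
    return method, url


# Assumes the target page is passed as a "url" query parameter.
method, url = endpoint_url("robots", url="https://www.python.org")
```

POST endpoints return only the bare URL here, since their inputs would go in a JSON body as with the unified endpoint.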
7. Deep dives
If this page feels too broad, use the guide pages:
8. Next steps
- Browse the full API documentation
- Explore individual tool pages
- Review pricing