Output Formats
The unified scrape endpoint lets you choose the format of the main response body with the output field in the request JSON.
Available values
| Output | Best for | Notes |
|---|---|---|
| html | raw parsers, archival, custom post-processing | Returns the richest raw content |
| markdown | RAG, LLM prompts, docs ingestion | Preserves headings, links, and basic structure |
| text | search indexing, classification, NLP | Flattens markup to plain readable text |
| clean | quick article extraction | Removes common page chrome and boilerplate |
Example
curl -X POST "https://scrape.toolkitapi.io/v1/scrape" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY" \
  -d '{
    "url": "https://www.python.org",
    "output": "markdown"
  }'
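The same request can be sketched in Python using only the standard library. The endpoint, headers, and JSON fields are taken from the curl example above; the function names build_scrape_request and scrape are hypothetical, and because this page does not describe the response schema, the sketch returns the raw response bytes without parsing them.

```python
import json
import urllib.request

# Endpoint taken from the curl example on this page.
API_URL = "https://scrape.toolkitapi.io/v1/scrape"


def build_scrape_request(url: str, output: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    body = json.dumps({"url": url, "output": output}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def scrape(url: str, output: str, api_key: str) -> bytes:
    """Send the request and return the raw response bytes.

    Assumption: the response schema is not documented here,
    so no decoding or JSON parsing is attempted.
    """
    request = build_scrape_request(url, output, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

Calling scrape("https://www.python.org", "markdown", "YOUR_KEY") would send the same request as the curl command. Keeping request construction separate from sending makes it easy to inspect the payload without touching the network.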
Choosing the right format
Use html when
- You want to run your own parser or BeautifulSoup pipeline
- You need attributes, embedded markup, or full DOM fidelity
- You are debugging the page before deciding on extraction rules
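As an illustration of running your own parser over html output, here is a minimal sketch that collects links together with their href attributes. It uses the standard library's html.parser rather than BeautifulSoup so that it has no dependencies; the LinkCollector class and extract_links function are hypothetical names, and the input is assumed to be the HTML string you have already pulled out of the response.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from raw HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None  # href of the <a> currently open, if any
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only gather text while inside an <a> that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html: str) -> List[Tuple[str, str]]:
    """Return every (href, text) pair found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

This kind of attribute-level access is only possible with html output, since the other formats discard or rewrite the original markup.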
Use markdown when
- You are feeding content into LLMs or RAG systems
- You want cleaner structure without raw HTML noise
- You need links and headings preserved in a compact format
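For RAG ingestion, preserved headings give natural chunk boundaries. The sketch below splits a Markdown string into (heading, body) pairs. It assumes the markdown output uses ATX-style headings (lines starting with #), which this page does not confirm, and split_by_heading is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Assumption: headings are ATX style, e.g. "## Section title".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_heading(markdown: str) -> List[Tuple[str, str]]:
    """Split Markdown into (heading, body) chunks at each heading line.

    Text before the first heading is returned with an empty heading.
    Note: '#' lines inside fenced code blocks are not special-cased.
    """
    chunks: List[Tuple[str, str]] = []
    title = ""
    body: List[str] = []

    def flush() -> None:
        # Skip chunks that have neither a heading nor any visible text.
        if title or any(line.strip() for line in body):
            chunks.append((title, "\n".join(body).strip()))

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            title, body = match.group(2).strip(), []
        else:
            body.append(line)
    flush()
    return chunks
```

Each chunk keeps its links inline, so a retriever can surface both the section text and the URLs it cites.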
Use text when
- You only care about readable body text
- You are doing keyword extraction, search, or classification
- You want the smallest payload for simple analysis
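A simple keyword count shows the kind of analysis that text output suits. This is a rough sketch, not a recommended NLP pipeline: the stopword list is illustrative only, and top_keywords is a hypothetical helper that takes the plain text you have extracted from the response.

```python
import re
from collections import Counter
from typing import List, Tuple

# Illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for"}


def top_keywords(text: str, n: int = 5) -> List[Tuple[str, int]]:
    """Return the n most frequent non-stopword tokens longer than two characters."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(n)
```

Because the markup is already flattened, no tag stripping is needed before tokenizing.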
Use clean when
- You want a readable article-style body quickly
- You are stripping navigation, footer content, or other repetitive chrome
- You want something in between raw HTML and plain text
Tip
If you are unsure, start with markdown. It is usually the most practical default for developer and AI workflows.
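That default can be encoded on the client side. The sketch below validates a requested format against the four values listed on this page and falls back to markdown when none is given. The resolve_output helper is hypothetical, and the lowercasing is a client-side convenience: whether the API itself applies a default or accepts other casings is not stated here.

```python
from typing import Optional

# The four output values listed on this page.
OUTPUT_FORMATS = ("html", "markdown", "text", "clean")
DEFAULT_OUTPUT = "markdown"


def resolve_output(requested: Optional[str] = None) -> str:
    """Return a valid output value, defaulting to markdown when none is given."""
    if requested is None:
        return DEFAULT_OUTPUT
    value = requested.strip().lower()
    if value not in OUTPUT_FORMATS:
        raise ValueError(
            f"Unknown output {requested!r}; expected one of {', '.join(OUTPUT_FORMATS)}"
        )
    return value
```

Validating before sending catches typos such as "mardown" locally instead of spending a request on them.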