Output Formats

The unified scrape endpoint lets you choose the format of the main response body with the `output` field.

Available values

Output	Best for	Notes
`html`	raw parsers, archival, custom post-processing	Returns the richest raw content
`markdown`	RAG, LLM prompts, docs ingestion	Preserves headings, links, and basic structure
`text`	search indexing, classification, NLP	Flattens markup to plain readable text
`clean`	quick article extraction	Removes common page chrome and boilerplate

Example

curl -X POST "https://scrape.toolkitapi.io/v1/scrape" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY" \
  -d '{
    "url": "https://www.python.org",
    "output": "markdown"
  }'
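
The same request can be sketched in Python with only the standard library. The endpoint, headers, and body fields are taken from the curl example above; the function names `build_scrape_request` and `scrape` are hypothetical helpers, not part of any official client. The response schema is not described in this section, so the sketch returns the raw body instead of assuming particular JSON keys.

```python
import json
import urllib.request

SCRAPE_URL = "https://scrape.toolkitapi.io/v1/scrape"  # endpoint from the curl example
VALID_OUTPUTS = ("html", "markdown", "text", "clean")  # values listed under "Available values"


def build_scrape_request(url: str, output: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    if output not in VALID_OUTPUTS:
        raise ValueError(f"unsupported output format: {output!r}")
    body = json.dumps({"url": url, "output": output}).encode("utf-8")
    return urllib.request.Request(
        SCRAPE_URL,
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def scrape(url: str, output: str, api_key: str, timeout: float = 30.0) -> str:
    """Send the request and return the raw response body as text.

    Assumption: the body is UTF-8 text. No JSON keys are assumed because
    the response schema is not documented in this section.
    """
    request = build_scrape_request(url, output, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Inspect the request without making a network call.
    req = build_scrape_request("https://www.python.org", "markdown", "YOUR_KEY")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Validating `output` on the client side is a design choice in this sketch: it catches typos before a request is spent.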

Choosing the right format

Use `html` when

You want to run your own parser or BeautifulSoup pipeline
You need attributes, embedded markup, or full DOM fidelity
You are debugging the page before deciding on extraction rules
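
As an illustration of why attributes matter, here is a minimal sketch that pulls `href` and `rel` values out of an `html` response. It uses the standard library's `html.parser` to stay dependency-free; a BeautifulSoup pipeline would follow the same idea. The sample markup and the names `LinkCollector` and `extract_links` are made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, rel) pairs from every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attr_map = dict(attrs)
            href = attr_map.get("href")
            if href:  # skip anchors without a target, e.g. <a name="top">
                self.links.append((href, attr_map.get("rel")))


def extract_links(html: str):
    """Return a list of (href, rel) tuples found in the given HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    # Hypothetical stand-in for the body returned with "output": "html".
    sample = (
        '<nav><a href="/docs" rel="nofollow">Docs</a></nav>'
        '<p>See <a href="https://example.com">this</a>.</p>'
        '<a name="top">x</a>'
    )
    for href, rel in extract_links(sample):
        print(href, rel)
```

Attribute data such as `rel` is one example of detail that the flatter formats are not described as preserving.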

Use `markdown` when

You are feeding content into LLMs or RAG systems
You want cleaner structure without raw HTML noise
You need links and headings preserved in a compact format
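
Preserved headings make it easy to split a page into sections before embedding or prompting. The sketch below is one possible way to do that, assuming the `markdown` output uses ATX-style headings (`#`, `##`, and so on), which this section does not confirm. `chunk_by_heading` is a hypothetical helper, and it does not special-case `#` lines inside fenced code blocks.

```python
import re

# Matches ATX headings such as "# Title" or "### Details".
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def chunk_by_heading(markdown: str):
    """Split markdown into chunks, one per heading.

    Text before the first heading becomes a chunk whose title is None.
    """
    chunks = []
    title, lines = None, []

    def flush():
        # Emit the current chunk unless it is an empty, untitled preamble.
        if title is not None or any(line.strip() for line in lines):
            chunks.append({"title": title, "body": "\n".join(lines).strip()})

    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush()
            title, lines = match.group(2), []
        else:
            lines.append(line)
    flush()
    return chunks


if __name__ == "__main__":
    # Hypothetical stand-in for the body returned with "output": "markdown".
    sample = "Intro text\n\n# Title\n\nBody with [link](https://example.com).\n\n## Sub\n\nMore.\n"
    for chunk in chunk_by_heading(sample):
        print(chunk)
```

Links stay inline in each chunk body, so downstream prompts can still cite their sources.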

Use `text` when

You only care about readable body text
You are doing keyword extraction, search, or classification
You want the smallest payload for simple analysis
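
For simple analysis, plain text can be fed straight into a word counter. This is a minimal keyword-frequency sketch; the stopword list, the minimum word length, and the name `top_keywords` are illustrative assumptions rather than anything the API provides.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = frozenset({"the", "a", "an", "and", "or", "of", "to", "in", "is", "for"})


def top_keywords(text: str, limit: int = 5):
    """Return the `limit` most frequent non-stopword tokens as (word, count) pairs."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(limit)


if __name__ == "__main__":
    # Hypothetical stand-in for the body returned with "output": "text".
    sample = (
        "Python is a language. Python is used for scripting and the web. "
        "Web scripting in Python."
    )
    print(top_keywords(sample, 3))
```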

Use `clean` when

You want a readable article-style body quickly
You are stripping navigation, footer content, or other repetitive chrome
You want something in between raw HTML and plain text

Tip

If you are unsure, start with `markdown`. It is usually the most practical default for developer and AI workflows.
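
The guidance on this page can be condensed into a small lookup that falls back to `markdown`. This is only a sketch: the use-case labels and the `pick_output` helper are hypothetical names chosen for the example, loosely following the "Best for" column above.

```python
DEFAULT_OUTPUT = "markdown"  # the suggested starting point when unsure

# Hypothetical use-case labels mapped to the documented output values.
OUTPUT_BY_USE_CASE = {
    "custom_parsing": "html",
    "archival": "html",
    "rag": "markdown",
    "llm_prompt": "markdown",
    "search_indexing": "text",
    "classification": "text",
    "article_extraction": "clean",
}


def pick_output(use_case=None) -> str:
    """Return an output format for a use case, defaulting to markdown."""
    if use_case is None:
        return DEFAULT_OUTPUT
    return OUTPUT_BY_USE_CASE.get(use_case.strip().lower(), DEFAULT_OUTPUT)


if __name__ == "__main__":
    print(pick_output("archival"))
    print(pick_output("something_else"))
```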
