Output Formats

The unified scrape endpoint lets you choose the format of the main response body with the output field in the request.

Available values

| Output   | Best for                                      | Notes                                          |
|----------|-----------------------------------------------|------------------------------------------------|
| html     | raw parsers, archival, custom post-processing | Returns the richest raw content                |
| markdown | RAG, LLM prompts, docs ingestion              | Preserves headings, links, and basic structure |
| text     | search indexing, classification, NLP          | Flattens markup to plain readable text         |
| clean    | quick article extraction                      | Removes common page chrome and boilerplate     |

Example

curl -X POST "https://scrape.toolkitapi.io/v1/scrape" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY" \
  -d '{
    "url": "https://www.python.org",
    "output": "markdown"
  }'
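
The same request can be assembled from Python. The sketch below uses only the standard library and builds the request without sending it. The helper name build_scrape_request and the ALLOWED_OUTPUTS set are illustrative names, not part of the API. The response shape is not described on this page, so the sketch makes no assumptions about it.

```python
import json
import urllib.request

# Endpoint and output values as listed on this page.
SCRAPE_URL = "https://scrape.toolkitapi.io/v1/scrape"
ALLOWED_OUTPUTS = {"html", "markdown", "text", "clean"}


def build_scrape_request(url: str, output: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    # Hypothetical client-side check; the API may validate this itself.
    if output not in ALLOWED_OUTPUTS:
        raise ValueError(f"unsupported output format: {output!r}")
    body = json.dumps({"url": url, "output": output}).encode("utf-8")
    return urllib.request.Request(
        SCRAPE_URL,
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


request = build_scrape_request("https://www.python.org", "markdown", "YOUR_KEY")
# To send it, pass the object to urllib.request.urlopen(request) and read the body.
```

Checking the output value before sending is a convenience that catches typos locally; it may duplicate validation the endpoint already performs.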

Choosing the right format

Use html when

  • You want to run your own parser, such as a BeautifulSoup pipeline
  • You need attributes, embedded markup, or full DOM fidelity
  • You are debugging the page before deciding on extraction rules
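
As a rough illustration of why attribute-level access matters, the sketch below collects link targets and their rel attributes from an HTML string using Python's built-in html.parser. The sample_html string is a stand-in for the body the endpoint would return, and LinkCollector is an illustrative name. A BeautifulSoup pipeline would do the same job with less code.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, rel) pairs from anchor tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href:
            # rel is None when the anchor does not declare it.
            self.links.append((href, attributes.get("rel")))


# Stand-in for HTML returned with "output": "html".
sample_html = (
    '<nav><a href="/docs" rel="nofollow">Docs</a></nav>'
    '<p>See <a href="/about">About</a>.</p>'
)

collector = LinkCollector()
collector.feed(sample_html)
```

Attributes such as rel are typically lost in flattened formats like text, which is the main reason to keep the raw markup for this kind of processing.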

Use markdown when

  • You are feeding content into LLMs or RAG systems
  • You want cleaner structure without raw HTML noise
  • You need links and headings preserved in a compact format
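
One common way to prepare Markdown for a RAG index is to split it into one chunk per heading. The sketch below assumes the returned Markdown uses ATX-style headings (lines starting with #), which this page does not confirm. The function name split_by_heading and the sample input are illustrative.

```python
import re

# Matches ATX headings such as "# Title" or "## Section".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_heading(markdown: str) -> list:
    """Split Markdown into chunks, one per heading, keeping links intact."""
    chunks = []
    heading, lines = None, []

    def flush():
        text = "\n".join(lines).strip()
        # Skip the empty preamble before the first heading.
        if heading is not None or text:
            chunks.append({"heading": heading, "text": text})

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            heading, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return chunks


# Stand-in for a body returned with "output": "markdown".
sample_markdown = (
    "# Welcome\n\nIntro with a [link](https://www.python.org).\n\n"
    "## Downloads\n\nGet the latest release.\n"
)
chunks = split_by_heading(sample_markdown)
```

This simple splitter treats any line starting with # as a heading, so it would mis-split comment lines inside fenced code blocks; a production pipeline would need to track fences.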

Use text when

  • You only care about readable body text
  • You are doing keyword extraction, search, or classification
  • You want the smallest payload for simple analysis
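
For a sense of how little processing plain text needs, here is a minimal keyword-count sketch. The stop-word list is deliberately tiny, and sample_text is a stand-in for a body returned with the text format; top_keywords is an illustrative name.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "for", "it"}


def top_keywords(text: str, limit: int = 3) -> list:
    """Return the most frequent non-stop-word tokens with their counts."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(token for token in tokens if token not in STOP_WORDS)
    return counts.most_common(limit)


# Stand-in for a body returned with "output": "text".
sample_text = (
    "Python is a programming language. "
    "Python is easy to learn and the language is popular."
)
keywords = top_keywords(sample_text, limit=2)
```

Because the input is already free of markup, the tokenizer can be a single regular expression with no HTML handling.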

Use clean when

  • You want a readable article-style body quickly
  • You want navigation, footer content, and other repetitive chrome stripped for you
  • You want something in between raw HTML and plain text

Tip

If you are unsure, start with markdown. It keeps headings and links while dropping raw HTML noise, which usually makes it the most practical default for developer and AI workflows.