Page Text Extractor
Return readable page text or scoped content via the unified scrape endpoint
POST /v1/scrape
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| x-api-key | string | optional | |
How to Use
1. Send `url` and `output: "text"` to `/v1/scrape`.
2. Optionally set `selector` to narrow the extraction target.
3. Use `include_links` if you want hyperlinks preserved in text mode.
4. Read the extracted text from the `content` field.
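The steps above can be sketched in Python as follows. This is a minimal illustration, not an official client. It makes several assumptions: the host `https://api.example.com` is a placeholder, the request and response bodies are JSON, `x-api-key` is sent as an HTTP header, and `include_links` takes a boolean. The helper names `build_text_request`, `read_content`, and `fetch_text` are hypothetical.

```python
import json
import urllib.request

# Assumption: placeholder host. Replace with the real API base URL.
BASE_URL = "https://api.example.com"


def build_text_request(page_url, api_key=None, selector=None, include_links=None):
    """Build (but do not send) a POST request to /v1/scrape in text mode."""
    payload = {"url": page_url, "output": "text"}
    if selector is not None:
        payload["selector"] = selector            # step 2: narrow the target
    if include_links is not None:
        payload["include_links"] = include_links  # step 3: assumed boolean
    headers = {"Content-Type": "application/json"}  # assumed JSON body
    if api_key:
        headers["x-api-key"] = api_key            # assumed to be a header
    return urllib.request.Request(
        BASE_URL + "/v1/scrape",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def read_content(response_body):
    """Step 4: return the `content` field from a JSON response body."""
    return json.loads(response_body)["content"]


def fetch_text(page_url, **options):
    """Send the request and return the extracted text (performs network I/O)."""
    request = build_text_request(page_url, **options)
    with urllib.request.urlopen(request) as response:
        return read_content(response.read())
```

A call such as `fetch_text("https://example.com/", api_key="YOUR_API_KEY", selector="main")` would then return the extracted text, provided the assumptions above match the real service.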
About This Tool
Page Text Extractor is now handled through the unified scrape endpoint. Set `output` to `text` to return readable plain text from a page, or combine it with `selector` to scope the extraction to a particular region such as `main`, `article`, or `#content`.
This is useful when you want search- or NLP-friendly content without Markdown or raw HTML.
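When it is not known in advance which region a page uses, one option is to try the selectors mentioned above (`main`, `article`, `#content`) in order and keep the first non-empty result. The sketch below is illustrative only. The function `first_scoped_text` is hypothetical, and `scrape` stands for any callable that posts a payload to `/v1/scrape` and returns the parsed response as a dictionary. It also assumes that a selector matching nothing yields an empty `content` field; the real endpoint may report an error instead.

```python
def first_scoped_text(scrape, page_url, selectors=("main", "article", "#content")):
    """Try each selector in order; return (selector, text) for the first hit.

    `scrape` is a caller-supplied function: payload dict -> response dict.
    Returns (None, "") if no selector produced text.
    """
    for selector in selectors:
        payload = {"url": page_url, "output": "text", "selector": selector}
        result = scrape(payload)
        # Assumption: an unmatched selector gives empty or missing `content`.
        text = (result.get("content") or "").strip()
        if text:
            return selector, text
    return None, ""
```

Passing the transport in as a callable keeps the fallback logic separate from the HTTP details, so it can be exercised with a stub before it is pointed at the live endpoint.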
Why Use This Tool
- Full-page indexing — Build search indexes from plain text
- NLP workflows — Feed readable content into classification or sentiment pipelines
- Scoped extraction — Focus on a page section rather than the whole body
- Change monitoring — Compare text output between site versions
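For the change-monitoring case, two text extractions of the same page can be compared with Python's standard `difflib` module. This is a minimal sketch that assumes both snapshots have already been fetched and stored as strings; the function name `text_changes` is hypothetical.

```python
import difflib


def text_changes(old_text, new_text):
    """Return unified-diff lines between two text extractions.

    An empty list means the two snapshots are identical.
    """
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )
```

Lines beginning with `-` were removed since the previous snapshot, and lines beginning with `+` were added.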
Frequently Asked Questions
How is this different from Markdown mode?
Text mode removes most formatting and is flatter. Markdown mode preserves more structure.
How is this different from clean mode?
`clean` aims for readable content with some structure preserved, while `text` is better for plain-text analysis.
Can I limit it to one area of the page?
Yes — use the `selector` field to scope extraction to a specific CSS selector.