Rendering, Waits, and Proxies

Many modern sites load content after the initial HTML response. The Scrape toolkit includes built-in browser rendering and timing controls so you can fetch fully populated pages when plain HTTP is not enough.

Core options

Field              Purpose
render_js          Use a headless browser to execute JavaScript
wait_for           Wait for a CSS selector to appear before extracting
wait_until         Choose a load state: load, domcontentloaded, or networkidle
scroll             Scroll the page to trigger lazy-loading or infinite scroll
block_resources    Skip images, fonts, or other resources for faster responses
stealth            Enable more browser-like anti-bot behavior
proxy              Route requests through a proxy, with optional geo-targeting
headers / cookies  Forward request context to the target page
session_name       Reuse a named session so cookies persist across requests
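
For instance, a listing page that lazy-loads items as you scroll might combine scroll with wait_for. This is a sketch: the URL and selector are placeholders, and the exact value type accepted by scroll (boolean vs. a pixel count or step config) may vary by deployment, so check your API reference.

{
  "url": "https://example.com/products",
  "render_js": true,
  "scroll": true,
  "wait_for": ".product-card",
  "output": "text"
}

Pairing scroll with wait_for gives the lazy-loaded elements a chance to attach before extraction runs.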

JS-rendered example

{
  "url": "https://quotes.toscrape.com/js/",
  "render_js": true,
  "wait_until": "networkidle",
  "block_resources": ["image", "font"],
  "stealth": true,
  "output": "text",
  "extract": {
    "selectors": {
      "quotes": { "selector": ".quote", "multiple": true }
    }
  }
}

When to turn rendering on

Enable render_js when:

  • the response is empty or missing expected content
  • the page is a React/Vue/Angular SPA
  • key fields appear only after browser execution
  • the site needs scrolling, delays, or browser-like behavior

Leave render_js off when:

  • the data is already present in server-rendered HTML
  • you want lower latency and lower resource use
  • you are scraping simple content pages, docs, or blogs
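
For a server-rendered blog or docs page, a minimal request with rendering left off is usually all you need (the URL below is a placeholder):

{
  "url": "https://example.com/blog/post-1",
  "render_js": false,
  "output": "text"
}

This path skips the headless browser entirely, so it is both faster and cheaper than a rendered request.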

Practical advice

  1. Start with render_js: false and check whether the content you need is present
  2. If content is missing, enable render_js
  3. Add wait_for only when a specific selector is slow to appear
  4. Use block_resources to cut page weight and speed up rendered requests
  5. Add proxy or stealth only for sites that block simpler requests
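
Putting that escalation path together, a request for a tougher site might look like the sketch below. The URL, selector, proxy value, and session name are all placeholders; in particular, the accepted proxy values (e.g. pool names or country codes) depend on your configuration.

{
  "url": "https://example.com/protected",
  "render_js": true,
  "wait_for": "#content",
  "stealth": true,
  "proxy": "datacenter",
  "session_name": "example-session"
}

Reusing the same session_name on subsequent requests keeps cookies from the first visit, which can reduce repeat anti-bot challenges.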