Python SDK examples for ScrapingBee users

If you are used to ScrapingBee's Python snippets, this guide shows the nearest Toolkit API equivalent for each one, using our Python SDK.

Install the SDK:

pip install toolkitapi

1. Basic page fetch

from toolkitapi import Scrape

with Scrape(api_key="tk_...") as scrape:
    result = scrape.fetch(url="https://toolkitapi.io", output="html")
    print(result["status_code"])
    print(result["content"][:500])

2. JavaScript rendering

ScrapingBee often enables browser rendering by default. In Toolkit API, turn it on explicitly when you need it.

from toolkitapi import Scrape

with Scrape(api_key="tk_...") as scrape:
    result = scrape.render_page(
        url="https://toolkitapi.io/app",
        wait_until="networkidle",
        output="html",
    )
    print(result["js_rendered"])
    print(result["content"][:500])

3. Wait for a selector

from toolkitapi import Scrape

with Scrape(api_key="tk_...") as scrape:
    result = scrape.render_page(
        url="https://toolkitapi.io/product/123",
        wait_for=".price",
        wait_timeout=15000,
        output="clean",
    )
    print(result["content"])

4. Wait for browser state

from toolkitapi import Scrape

with Scrape(api_key="tk_...") as scrape:
    result = scrape.render_page(
        url="https://toolkitapi.io/dashboard",
        wait_until="load",
        output="text",
    )
    print(result["content"])

wait_until accepts one of:

  • load — the page's load event has fired
  • domcontentloaded — the DOM has been parsed, though subresources may still be loading
  • networkidle — the network has been quiet for a short idle window

5. Wait for a fixed amount of time

For scrape responses, the preferred pattern is waiting for a selector or browser state rather than sleeping blindly. If you need an actual render delay for a visual capture, use the Screenshot SDK:

from toolkitapi import Screenshot

with Screenshot(api_key="tk_...") as shot:
    png = shot.capture(
        url="https://toolkitapi.io",
        delay=3000,
        format="png",
    )

6. Block images, fonts, or stylesheets

from toolkitapi import Scrape

with Scrape(api_key="tk_...") as scrape:
    result = scrape.render_page(
        url="https://toolkitapi.io/news",
        output="markdown",
        block_resources=["image", "stylesheet", "font"],
    )
    print(result["content"])

7. Remove clutter and ad-like noise

Toolkit API does not expose a separate ad-block toggle in the scrape SDK. The closest equivalent is clean extraction plus resource blocking.

from toolkitapi import Scrape

with Scrape(api_key="tk_...") as scrape:
    result = scrape.clean_content(
        url="https://toolkitapi.io/article",
        output="clean",
        remove=[".promo", ".newsletter-box", ".sticky-banner"],
    )
    print(result["content"])

8. Return Markdown content

from toolkitapi import Scrape

with Scrape(api_key="tk_...") as scrape:
    result = scrape.extract_markdown(url="https://toolkitapi.io/blog/post")
    print(result["content"])

9. Return plain text content

from toolkitapi import Scrape

with Scrape(api_key="tk_...") as scrape:
    result = scrape.page_text(url="https://toolkitapi.io/blog/post")
    print(result["content"])

10. JSON response by default

Unlike services that need a special JSON wrapper flag, Toolkit API returns structured JSON on every call.

from toolkitapi import Scrape

with Scrape(api_key="tk_...") as scrape:
    result = scrape.fetch(
        url="https://toolkitapi.io",
        output="markdown",
        extract={"meta_tags": True, "links": True},
    )

    print(result.keys())
    print(result.get("meta_tags"))
    print(result.get("links"))
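
Extracted links often include relative hrefs. A small pure helper can normalize them before you store or crawl them; this sketch assumes links come back as a flat list of href strings, so check the actual response shape against your results:

```python
from urllib.parse import urljoin

def absolutize(base_url, hrefs):
    """Resolve relative hrefs against the page URL, dropping duplicates
    while preserving order."""
    seen = set()
    result = []
    for href in hrefs:
        absolute = urljoin(base_url, href)
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result

# Illustrative hrefs; in practice, pass result.get("links") from above.
links = absolutize(
    "https://toolkitapi.io/blog/",
    ["/pricing", "post-1", "https://example.com/x", "/pricing"],
)
```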

11. Return source HTML

from toolkitapi import Scrape

with Scrape(api_key="tk_...") as scrape:
    result = scrape.fetch(
        url="https://toolkitapi.io",
        output="html",
        render_js=False,
    )
    print(result["content"])

If you need both rendered and unrendered views, make one request with render_js=False and a second one with render_js=True.
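
A minimal sketch of that two-request pattern: a pure helper that summarizes how much content rendering added, given the two result dicts. The helper and its output fields are illustrative, not part of the SDK; in practice the inputs come from scrape.fetch(..., render_js=False) and scrape.render_page(...):

```python
def compare_views(source_result, rendered_result):
    """Summarize how much content JavaScript rendering added, given the
    result dicts from an unrendered fetch and a rendered fetch."""
    src = source_result["content"]
    ren = rendered_result["content"]
    return {
        "source_bytes": len(src),
        "rendered_bytes": len(ren),
        "added_by_js": len(ren) - len(src),
    }

# Illustrative result dicts standing in for real SDK responses.
summary = compare_views(
    {"content": "<html>static</html>"},
    {"content": "<html>static + hydrated app markup</html>"},
)
```

Because the helper is pure, it is easy to unit-test without making any API calls.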
12. CSS selector extraction

from toolkitapi import Scrape

with Scrape(api_key="tk_...") as scrape:
    result = scrape.css_extract(
        url="https://toolkitapi.io/product/123",
        render_js=True,
        selectors={
            "title": "h1",
            "price": ".price",
            "buy_link": {
                "selector": ".buy-now",
                "attr": "href",
            },
        },
    )

    print(result.get("selectors"))

13. AI extraction with a schema

from toolkitapi import Scrape

with Scrape(api_key="tk_...") as scrape:
    result = scrape.ai_extract(
        url="https://toolkitapi.io/product/123",
        render_js=True,
        prompt="Extract the product details shown on the page.",
        schema={
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "price": {"type": "string"},
                "availability": {"type": "string"},
            },
        },
    )

    print(result.get("ai_extract"))

14. Article extraction

from toolkitapi import Scrape

with Scrape(api_key="tk_...") as scrape:
    result = scrape.extract_article(url="https://toolkitapi.io/blog/post")
    print(result.get("article"))
    print(result["content"])

15. Metadata extraction

from toolkitapi import Scrape

with Scrape(api_key="tk_...") as scrape:
    meta = scrape.get_meta_tags("https://toolkitapi.io")
    preview = scrape.link_preview("https://toolkitapi.io")
    links = scrape.get_links("https://toolkitapi.io")
    images = scrape.get_images("https://toolkitapi.io")

16. Headers and custom cookies

from toolkitapi import Scrape

with Scrape(api_key="tk_...") as scrape:
    result = scrape.fetch(
        url="https://toolkitapi.io/account",
        render_js=True,
        headers={"Accept-Language": "en-GB,en;q=0.9"},
        cookies=[
            {
                "name": "sessionid",
                "value": "abc123",
                "domain": ".toolkitapi.io",
            }
        ],
        extract={"headers": True},
    )

    print(result.get("headers"))

17. Session reuse

from toolkitapi import Scrape

with Scrape(api_key="tk_...") as scrape:
    scrape.render_page(
        url="https://toolkitapi.io/login",
        session_name="shop-session",
    )

    result = scrape.render_page(
        url="https://toolkitapi.io/cart",
        session_name="shop-session",
        output="text",
    )

    print(result["content"])

18. Proxy and geolocation-style usage

from toolkitapi import Scrape

with Scrape(api_key="tk_...") as scrape:
    us_result = scrape.fetch(
        url="https://toolkitapi.io",
        proxy="US",
        output="html",
    )

    dc_result = scrape.fetch(
        url="https://toolkitapi.io",
        proxy="datacenter",
        output="html",
    )

19. Sitemap, robots, and crawl

from toolkitapi import Scrape

with Scrape(api_key="tk_...") as scrape:
    sitemap = scrape.parse_sitemap(
        "https://toolkitapi.io/sitemap.xml",
        limit=100,
        discover_links=True,
    )

    robots = scrape.parse_robots_txt("https://toolkitapi.io")

    crawl_job = scrape.crawl(
        start_url="https://toolkitapi.io/docs",
        max_pages=20,
        max_depth=2,
        output="markdown",
    )

    crawl_result = scrape.get_crawl_job(crawl_job["job_id"])
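
Crawl jobs run asynchronously, so a single get_crawl_job call may return before every page has been fetched. A polling sketch is shown below; the "status" field and its values are assumptions, so adjust them to the real job payload (only crawl and get_crawl_job come from the example above):

```python
import time

def wait_for_crawl(scrape, job_id, poll_seconds=2.0, timeout=120.0):
    """Poll get_crawl_job until the job reports a terminal status.
    The 'status' field and its values are assumed, not confirmed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = scrape.get_crawl_job(job_id)
        if job.get("status") in ("completed", "failed"):
            return job
        time.sleep(poll_seconds)
    raise TimeoutError(f"crawl job {job_id} still running after {timeout}s")

# Tiny stub standing in for the Scrape client, just to exercise the loop.
class _StubScrape:
    def __init__(self):
        self.calls = 0

    def get_crawl_job(self, job_id):
        self.calls += 1
        status = "completed" if self.calls >= 3 else "running"
        return {"job_id": job_id, "status": status}

job = wait_for_crawl(_StubScrape(), "job-1", poll_seconds=0.01)
```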

20. PDF extraction

from toolkitapi import Scrape

with Scrape(api_key="tk_...") as scrape:
    pdf = scrape.pdf_extract(
        url="https://toolkitapi.io/report.pdf",
        pages="1-3",
    )
    print(pdf["text"])

21. Screenshot equivalents

Some ScrapingBee examples are really visual-browser tasks. In Toolkit API, those are better served by the Screenshot SDK.

Full-page screenshot

from toolkitapi import Screenshot

with Screenshot(api_key="tk_...") as shot:
    png = shot.capture(
        url="https://toolkitapi.io",
        full_page=True,
        format="png",
    )

Screenshot a specific selector

from toolkitapi import Screenshot

with Screenshot(api_key="tk_...") as shot:
    image = shot.capture_element(
        url="https://toolkitapi.io",
        selector=".pricing-table",
        format="png",
    )

HTML to PDF

from toolkitapi import Screenshot

with Screenshot(api_key="tk_...") as shot:
    pdf_bytes = shot.capture_pdf(
        url="https://toolkitapi.io/invoice/123",
        page_format="A4",
        print_background=True,
    )

22. Common mapping table

ScrapingBee idea                 Toolkit API SDK
Basic HTML fetch                 Scrape.fetch(..., output="html")
JavaScript rendering             Scrape.render_page(...)
Wait for selector                wait_for=".selector"
Wait for browser load            wait_until="load" or wait_until="networkidle"
Markdown output                  Scrape.extract_markdown(...)
Text output                      Scrape.page_text(...)
CSS extraction                   Scrape.css_extract(...)
AI extraction                    Scrape.ai_extract(...)
Link, image, meta extraction     get_links, get_images, get_meta_tags, link_preview
Sitemap / robots                 parse_sitemap, parse_robots_txt
Crawl                            crawl and get_crawl_job
Screenshot / PDF rendering       Screenshot.capture, capture_element, capture_pdf

23. Features that are not one-to-one today

A few ScrapingBee-specific switches do not have a direct public scrape SDK equivalent yet:

  • pure header forwarding mode
  • custom upstream proxy passthrough
  • target-site POST or PUT forwarding
  • special Google-only scrape toggle
  • explicit transparent-status toggle
  • dedicated scrape usage endpoint in the SDK
  • viewport width and height controls in the Scrape class itself

When you need visual browser configuration such as viewport size, element capture, or PDF rendering, use the Screenshot SDK alongside the Scrape SDK.