Bright Data CLI: real web evidence, always

# A page that blocks robots: the Web Unlocker gets through.
# --country sets the exit geography. A naïve curl here would get a 403 or a CAPTCHA page.
brightdata scrape "https://shop.example/item/42" \
  --country us \
  -f markdown

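const { execFileSync } = require("node:child_process");

// Run the CLI with an argument array (no shell involved) and return its stdout.
const sh = (...args) => execFileSync("brightdata", args, { encoding: "utf8" });

function answer(question) {
  // Assumption carried over from the sketch: the search JSON exposes the best hit as `topUrl`.
  const hit = JSON.parse(sh("search", question, "--json"));
  const page = sh("scrape", hit.topUrl);
  return quote(page); // quote() is your own helper: the exact live line + its source
}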

# 0 · discover the live catalog of dataset types
brightdata pipelines list

# 1–4 · name a dataset + a target → a clean record
brightdata pipelines reddit_posts "https://reddit.com/r/…/comments/…" --pretty

# write many records straight to a file
brightdata pipelines amazon_product_search "wireless earbuds" \
  --format csv -o earbuds.csv
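The invocations above share one shape: a dataset name, an optional target, then output flags. A minimal sketch of assembling that argument list in JavaScript, using only the flags shown above (`--pretty`, `--format`, `-o`). `pipelinesArgs` and its option names are hypothetical helpers for illustration, not part of the CLI:

```javascript
// Hypothetical helper: builds the argv for `brightdata pipelines ...`.
// Only flags that appear in the examples above are emitted.
function pipelinesArgs(dataset, target, opts = {}) {
  const args = ["pipelines", dataset];
  if (target !== undefined) args.push(target); // URL or search phrase
  if (opts.pretty) args.push("--pretty");
  if (opts.format) args.push("--format", opts.format);
  if (opts.output) args.push("-o", opts.output);
  return args;
}

console.log(
  pipelinesArgs("amazon_product_search", "wireless earbuds", {
    format: "csv",
    output: "earbuds.csv",
  }).join(" ")
);
```

Handing the resulting array to something like `execFileSync("brightdata", args)` keeps the target string out of the shell, so quotes or spaces in a search phrase need no extra escaping.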

{
  "title": "Anyone benchmarked the new CLI?",
  "author": "u/bench_nerd",
  "upvotes": 1284,
  "num_comments": 73,
  "created_utc": "2026-06-12T08:41:00Z",
  "subreddit": "r/commandline",
  "url": "https://reddit.com/r/…/abc",
  "flair": "Discussion"
}
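Because the record arrives as named fields, downstream code can read values directly instead of parsing page text. A small sketch, assuming the record shape shown above (the field names come from that sample; `summarize` is a hypothetical helper):

```javascript
// Sample record copied from the example above.
const record = {
  title: "Anyone benchmarked the new CLI?",
  author: "u/bench_nerd",
  upvotes: 1284,
  num_comments: 73,
  created_utc: "2026-06-12T08:41:00Z",
  subreddit: "r/commandline",
  url: "https://reddit.com/r/…/abc",
  flair: "Discussion",
};

// Hypothetical helper: one citation-ready line built from the typed fields.
function summarize(rec) {
  const day = new Date(rec.created_utc).toISOString().slice(0, 10);
  return `"${rec.title}" by ${rec.author} in ${rec.subreddit} (${rec.upvotes} upvotes, ${rec.num_comments} comments, ${day})`;
}

console.log(summarize(record));
```

Numbers stay numbers and the timestamp stays a parseable ISO string, so sorting by `upvotes` or filtering by date needs no text scraping.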

search

You don't know the page yet. Run a query, get back candidate links, then go fetch the best one.

brightdata search "loop engineering CLI" \
  --json --pretty

Pros

+ Finds sources when you have none.
+ Cheap, fast, the natural first step.

Cons

– Returns links, not the page content.
– Usually needs a follow-up scrape.

Pick this when: you need to discover which page holds the answer.

scrape

You know the exact URL. Read it now — past any bot-wall — as clean markdown.

brightdata scrape "https://site/pricing" \
  -f markdown

Pros

+ Real content from any page.
+ Beats CAPTCHAs and 403s.

Cons

– You must already know the URL.
– Prose, not named fields.

Pick this when: you have the page and just need what it says.

browser

The content only appears after you click, scroll, or wait. Drive a real session step by step.

brightdata browser "https://app/feed" \
  --interactive --full-page

Pros

+ Handles multi-step / logged-in flows.
+ Snapshots the live page tree.

Cons

– Heaviest and slowest mode.
– Overkill for a static page.

Pick this when: a one-shot scrape can't reach the content because it sits behind interaction.

pipelines

The source is a known platform. Get back named, typed fields instead of page soup.

brightdata pipelines x_posts \
  "https://x.com/…/status/…" --pretty

Pros

+ Clean fields, no parsing.
+ 40+ platforms; bulk-friendly.

Cons

– Only for supported platforms.
– Useless for an arbitrary page.

Pick this when: it's X / Reddit / YouTube / Amazon / LinkedIn / …
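The four cards above reduce to a short decision order. One way to sketch it, with hypothetical inputs: `supportedPlatform`, `knownUrl`, and `needsInteraction` are illustrative flags the caller sets, not CLI options.

```javascript
// Hypothetical decision helper mirroring the "Pick this when" lines above.
function chooseMode({ supportedPlatform, knownUrl, needsInteraction } = {}) {
  if (supportedPlatform) return "pipelines"; // known platform: named, typed fields
  if (!knownUrl) return "search";            // no page yet: discover candidate links
  if (needsInteraction) return "browser";    // content appears only after click/scroll/wait
  return "scrape";                           // exact URL, readable in one shot
}

console.log(chooseMode({ knownUrl: true })); // a plain known URL
```

The order is a judgment call: `pipelines` is checked first because it gives the cleanest output when it applies, and `browser` comes last among the fetch modes because the card above calls it the heaviest and slowest.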

## NEVER
- WebSearch/WebFetch -> use the brightdata CLI   # (NEVER the MCP mcp__Bright_Data__*)

## Tools
- Web search/scrape: ALWAYS the brightdata CLI (search / scrape / browser / pipelines). NEVER WebSearch/WebFetch, NEVER the MCP mcp__Bright_Data__*
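The rule above is simple enough to check mechanically. A sketch of such a check, assuming tool names arrive as plain strings; `isAllowedTool` is a hypothetical guard, and the blocked names are exactly those listed in the rule:

```javascript
// Hypothetical guard: rejects the tools the rule forbids.
const BLOCKED_EXACT = new Set(["WebSearch", "WebFetch"]);
const BLOCKED_PREFIX = "mcp__Bright_Data__";

function isAllowedTool(name) {
  return !BLOCKED_EXACT.has(name) && !name.startsWith(BLOCKED_PREFIX);
}

console.log(isAllowedTool("WebFetch"));
```

Anything not named in the rule passes through, so the guard narrows web access to the CLI without constraining unrelated tools.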

# the doubt: is 3.2 really the latest? → don't guess, fetch.

# 1 · find the canonical source (SERP)
brightdata search "acme-lib releases" --json --pretty
# → top hit: the package's release page

# 2 · read it past any bot-wall (Web Unlocker)
brightdata scrape "https://acme.dev/releases" -f markdown
# → "Latest: 4.0.1 - released 2026-06-09"

# 3 · cache the grounded fact into research.md, with its source
# acme-lib latest = 4.0.1 (acme.dev/releases, pulled 2026-06-14)

# the claim is now evidence, not memory, and the loop quotes 4.0.1, not 3.2.
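Step 3 stores the fact together with where and when it was fetched. A minimal sketch of producing that line, assuming the `name = value (source, pulled date)` layout shown in the example; `formatFact` is a hypothetical helper:

```javascript
// Hypothetical helper: renders one grounded fact in the research.md layout above.
function formatFact({ name, value, source, pulled }) {
  return `${name} = ${value} (${source}, pulled ${pulled})`;
}

const line = formatFact({
  name: "acme-lib latest",
  value: "4.0.1",
  source: "acme.dev/releases",
  pulled: "2026-06-14",
});
console.log(line);
```

Appending `line` to `research.md` (for example with Node's `fs.appendFileSync`) keeps the source and pull date next to the value, so a later reader can tell how stale the fact is.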

The big idea

Why a CLI, and why this one

The one rule

The four modes of the toolbox

Mode deep-dive: scrape, end to end

What scrape does

Output formats and async

The command

FAQ

Try it: guess, or get real evidence?

Guessing agent

Evidence agent

Guessing — no fetch, no source

Grounded — fetch first, then quote

A few terms, in plain words

Where it plugs into the loop

Anatomy of a `pipelines` call

What happens on a pipelines call

Under the hood

In one picture

In the code

Access it yourself

Inside a dataset record

A report from scraped & structured data

One model, two views

Why a report, not raw output

Choosing which mode to reach for

search

scrape

browser

pipelines

In the code

Access it yourself

Worked example: one doubt, grounded

Quick check: did the model land?
