If you are using an LLM to automate browser tasks, you have probably watched it squint at screenshots and click buttons one at a time. Open a site, take a snapshot, reason about what is on screen, click something, take another snapshot, reason again. 40 search results means 40 rounds of this. It is slow, it burns tokens, and it breaks constantly because popups, loading spinners, unexpected modals, and infinite scrolls all throw it off.
I was doing exactly this with OpenClaw until I realized the agent wasn’t the problem. The whole approach was wrong.
Look at the data layer instead
Every web app is just a frontend calling API endpoints. LinkedIn has Voyager. Kijiji has Apollo GraphQL. Most Next.js sites embed their entire data model in a __NEXT_DATA__ script tag.
So instead of having the agent navigate pages and interpret what it sees, I started writing small JS adapters that call the site’s data layer directly and injecting them into the browser tab via evaluate().
Here’s what the LinkedIn one looks like:
var csrf = document.cookie.match(/JSESSIONID="?([^";]+)/)[1];
var data = await fetch('/voyager/api/voyagerJobsDashJobCards?count=25&q=jobSearch&query=...', {
  headers: {
    'csrf-token': csrf,
    'x-restli-protocol-version': '2.0.0'
  }
}).then(r => r.json());
// data.included → 25 structured job results, instantly
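The `data.included` array mixes entity types, so the adapter still needs a small parsing step. Here is a sketch of that step against a stubbed payload; the `$type` string and field names are assumptions based on what Voyager responses tend to look like, so inspect your own response in DevTools before relying on them:

```javascript
// Sketch: pulling job cards out of a Voyager-style response.
// $type and field names are assumptions -- LinkedIn's internal
// schema shifts, so verify against a live response.
function extractJobs(data) {
  return (data.included || [])
    .filter(function (e) {
      return e.$type && e.$type.indexOf('JobPosting') !== -1;
    })
    .map(function (e) {
      return { id: e.entityUrn, title: e.title };
    });
}

// Stubbed payload standing in for the real fetch result:
var sample = {
  included: [
    { $type: 'com.linkedin.voyager.dash.jobs.JobPosting', entityUrn: 'urn:li:job:1', title: 'Backend Dev' },
    { $type: 'com.linkedin.voyager.dash.common.Geo', entityUrn: 'urn:li:geo:2' }
  ]
};
var jobs = extractJobs(sample);
// jobs -> [{ id: 'urn:li:job:1', title: 'Backend Dev' }]
```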
Because the code runs inside the browser tab, it inherits the user's session cookies. If you are logged into LinkedIn, the adapter is already authenticated. No OAuth setup, no credential storage: auth is solved for free.
I went from 20+ minutes and hundreds of thousands of tokens clicking through UI to 80+ jobs with full descriptions in under a minute.
Same idea, different site
Kijiji (Canada’s Craigslist) is a Next.js app. There aren’t obvious API calls to intercept, but Next.js embeds all its data into the page for client hydration:
var el = document.getElementById('__NEXT_DATA__');
var d = JSON.parse(el.textContent);
var apollo = d.props.pageProps.__APOLLO_STATE__;
Object.keys(apollo).forEach(function(k) {
  if (k.indexOf('StandardListing:') === 0) {
    var listing = apollo[k];
    // title, price, location, seller type, images - all structured
  }
});
One evaluate call, 40 listings, fully structured. The data was sitting right there in the page and the agent was spending tokens trying to read it off the screen.
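Turning that loop into structured output is a few more lines. This sketch flattens the cache entries into plain objects; the `title` and `price` field names are assumptions, so check the real cache keys on your page first:

```javascript
// Sketch: flattening Apollo-cache listings into a plain array.
// Field names (title, price) are assumptions -- inspect the
// actual __APOLLO_STATE__ entries before relying on them.
function flattenListings(apollo) {
  var out = [];
  Object.keys(apollo).forEach(function (k) {
    if (k.indexOf('StandardListing:') === 0) {
      var l = apollo[k];
      out.push({ id: k.split(':')[1], title: l.title, price: l.price });
    }
  });
  return out;
}

// Stubbed cache standing in for the real __APOLLO_STATE__:
var sampleState = {
  'StandardListing:123': { title: 'Used bike', price: 150 },
  'ROOT_QUERY': {}
};
var listings = flattenListings(sampleState);
// listings -> [{ id: '123', title: 'Used bike', price: 150 }]
```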
So there are basically two patterns here: API adapters (like LinkedIn) where you call internal endpoints directly, and SSR adapters (like Kijiji) where you parse the data the server already embedded. The recon process is the same for both. Check for embedded data first, then look at network requests for API calls. DOM scraping is the last resort.
The agent writes the script, then gets out of the way
Here’s the thing I kept overcomplicating: you don’t need an AI agent to extract data. You need a script. The agent is useful for the investigation phase, stuff like “look at the network requests on this page, find the API endpoints, check for __NEXT_DATA__.” And then it writes a plain script. After that, the agent just runs it.
Each adapter ends up as a self-contained JS function:
(function(){
  window.KJ = {
    search: function() { /* parse data, return structured results */ },
    details: function(id) { /* full details for one item */ },
    results: function() { /* export everything accumulated */ }
  };
  return 'KJ ready';
})()
Inject it into the browser tab, call methods, get JSON back. No LLM in the loop during extraction.
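The driver side of that flow is just two evaluate calls: one to inject the adapter string, one per method call. Here is a browser-free sketch of the sequence, with evaluate passed in as a parameter so the fake below (which evals against a plain object standing in for window) can stand in for Playwright's page.evaluate or a raw CDP call:

```javascript
// Sketch of the driver flow: inject once, then call adapter methods
// through whatever evaluate() your tooling exposes. The adapter
// body here is a trivial stand-in, not the real KJ object.
async function runAdapter(evaluate) {
  await evaluate('window.KJ = { search: function () { return [1, 2, 3]; } }; "KJ ready"');
  return evaluate('window.KJ.search()');
}

// Fake page.evaluate(): runs the string against a shared fake window.
var fakeWindow = {};
function fakeEvaluate(src) {
  return Promise.resolve(Function('window', 'return eval(' + JSON.stringify(src) + ');')(fakeWindow));
}
// runAdapter(fakeEvaluate) resolves to [1, 2, 3]
```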
I documented the process as a checklist that the agent follows:
- Check for embedded data (__NEXT_DATA__, __INITIAL_STATE__) first
- If not found, intercept network calls for API endpoints
- Write a JS adapter that extracts the data directly
- Build a pipeline (dedup, filter, store)
The agent follows these steps, writes the adapter, and from then on it is just running code.
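Step one of the checklist is easy to mechanize. A sketch of an embedded-data probe the agent could run over the page's HTML (the marker list and return labels are my own naming, not a standard):

```javascript
// Sketch: given a page's HTML, report which embedded-data marker is
// present so the agent knows which extraction pattern to try first.
// Run it over document.documentElement.outerHTML in the tab.
function detectDataLayer(html) {
  if (html.indexOf('__NEXT_DATA__') !== -1) return 'next-ssr';
  if (html.indexOf('__INITIAL_STATE__') !== -1) return 'redux-ssr';
  return 'unknown'; // fall back to watching network requests
}

detectDataLayer('<script id="__NEXT_DATA__">{}</script>'); // -> 'next-ssr'
```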
Running it from a sandbox
If your agent runs in a Docker sandbox but the browser is on the host, you’ll hit a problem. CORS blocks direct communication from the page to the sandbox, and downloads go to the host filesystem, not the container.
What worked for me was connecting directly to Chrome’s DevTools Protocol via WebSocket from inside the sandbox. A Bun script connects to ws://host.docker.internal:{port}, navigates pages, runs evaluate, and pulls results back directly. No agent needed during extraction.
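A minimal sketch of that bridge, assuming Chrome was started on the host with --remote-debugging-port=9222 and that you grab the per-tab WebSocket URL from GET /json on that port (the target ID below is a placeholder):

```javascript
// Sketch of the sandbox->host bridge: speak CDP over a raw WebSocket
// instead of routing through an agent. Runtime.evaluate with
// returnByValue hands back plain JSON.
function cdpMessage(id, expression) {
  return JSON.stringify({
    id: id,
    method: 'Runtime.evaluate',
    params: { expression: expression, returnByValue: true }
  });
}

function connect(url) {
  var ws = new WebSocket(url); // global in Bun and recent Node
  var nextId = 1;
  return {
    evaluate: function (expression) {
      return new Promise(function (resolve) {
        var id = nextId++;
        ws.addEventListener('message', function onMsg(ev) {
          var msg = JSON.parse(ev.data);
          if (msg.id === id) {
            ws.removeEventListener('message', onMsg);
            resolve(msg.result.result.value);
          }
        });
        ws.send(cdpMessage(id, expression));
      });
    }
  };
}
// usage: connect('ws://host.docker.internal:9222/devtools/page/<target-id>')
//          .evaluate('window.KJ.search()')
```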
The whole thing is a single Bun script. One command to search, dedup, filter, fetch details, and save structured JSON files. There’s a pipeline behind it that tracks what’s already been processed, so re-runs only touch new listings. I’ll write more about the full workflow in a future post.
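The dedup step at the heart of that pipeline is simple: persist a set of processed IDs and only let new listings through. A sketch (in-memory here; the real version would load and save the seen set from disk):

```javascript
// Sketch: re-run-safe dedup. Seeds from previously processed IDs
// and filters out anything already seen, including duplicates
// within the same batch.
function makePipeline(seenIds) {
  var seen = new Set(seenIds);
  return function (listings) {
    var fresh = [];
    listings.forEach(function (l) {
      if (!seen.has(l.id)) {
        seen.add(l.id);
        fresh.push(l);
      }
    });
    return fresh;
  };
}

var step = makePipeline(['a']);
step([{ id: 'a' }, { id: 'b' }]); // -> only { id: 'b' }
step([{ id: 'b' }, { id: 'c' }]); // -> only { id: 'c' }
```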
What is still messy
Sites mix patterns. Kijiji SSR-renders the first page into __NEXT_DATA__ but paginates via client-side GraphQL. So the embedded data is useful for understanding the data model, but the real extraction sometimes needs to go deeper into the client-side API calls.
Self-reported metadata is unreliable. Marketplace sites let sellers categorize themselves, and half the “private owners” are actually businesses. Real filtering needs to look at the actual text, not the checkboxes.
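A crude but serviceable version of that text check: scan the description for phrases private sellers rarely use. The phrase list below is illustrative, not exhaustive:

```javascript
// Sketch: ignore the self-reported "owner type" checkbox and look
// for business tells in the listing text itself.
var BUSINESS_TELLS = [
  'financing available',
  'certified pre-owned',
  'visit our showroom',
  'warranty included'
];

function looksLikeBusiness(description) {
  var text = description.toLowerCase();
  return BUSINESS_TELLS.some(function (t) { return text.indexOf(t) !== -1; });
}

looksLikeBusiness('One owner, moving sale, cash only');        // -> false
looksLikeBusiness('Financing available! Visit our showroom.'); // -> true
```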
And adapters break when sites change. The rebuild is fast when you already know the pattern, but it is still maintenance.
Wrapping up
The main idea: don’t put the LLM in a loop clicking through pages to extract data. Use it to investigate the site, figure out how the data layer works, and write an adapter script. Then run that script directly. The agent does the hard thinking once, and after that it is just code.