Show HN: Index – New Open Source browser agent

github.com

55 points by skull8888888 12 hours ago

Hey HN, Robert from Laminar (lmnr.ai) here.

We built Index, a new SOTA open-source browser agent.

It reached 92% on WebVoyager with Claude 3.7 (extended thinking). o1 was used as the judge, and we also manually double-checked the judge's verdicts.

At the core is the same old idea: run a simple JS script in the browser to identify interactable elements -> draw bounding boxes around them on a screenshot of the browser window -> feed it to the LLM.
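
A minimal sketch of that pipeline, for readers who have not seen the pattern before. This is not Index's actual detection script: the selector list in DETECT_JS, the Element fields, and the helper names are all assumptions made for illustration, and the drawing step (boxes on the screenshot) is left out so the example runs without a browser.

```python
from dataclasses import dataclass

# Hypothetical detection script (NOT Index's real one). With Playwright it
# could be passed to page.evaluate(DETECT_JS) to get a list of dicts.
DETECT_JS = """
() => Array.from(
  document.querySelectorAll('a, button, input, select, textarea, [role="button"]')
).map(el => {
  const r = el.getBoundingClientRect();
  return {tag: el.tagName.toLowerCase(), x: r.x, y: r.y, width: r.width, height: r.height};
})
"""


@dataclass
class Element:
    """One candidate element and its bounding box in viewport pixels."""
    tag: str
    x: float
    y: float
    width: float
    height: float


def visible_in_viewport(el: Element, vw: float, vh: float) -> bool:
    """Keep elements with a non-empty box that overlaps the viewport."""
    has_area = el.width > 0 and el.height > 0
    overlaps = el.x < vw and el.y < vh and el.x + el.width > 0 and el.y + el.height > 0
    return has_area and overlaps


def label_elements(elements: list[Element], vw: float, vh: float) -> dict[int, Element]:
    """Assign a stable integer label to every visible element."""
    visible = [el for el in elements if visible_in_viewport(el, vw, vh)]
    return dict(enumerate(visible))


def describe_for_llm(labeled: dict[int, Element]) -> str:
    """Text companion to the annotated screenshot: one line per labeled box."""
    return "\n".join(
        f"[{i}] <{el.tag}> at ({el.x:.0f}, {el.y:.0f}) size {el.width:.0f}x{el.height:.0f}"
        for i, el in labeled.items()
    )


if __name__ == "__main__":
    found = [
        Element("button", 10, 10, 100, 30),  # visible
        Element("a", 0, 5000, 50, 20),       # far below the viewport
        Element("input", 20, 60, 0, 0),      # zero-size, e.g. hidden
    ]
    print(describe_for_llm(label_elements(found, vw=1280, vh=720)))
```

The integer labels are what lets the model answer with something like "click element 0" instead of guessing pixel coordinates.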

What made Index so good:

1. We essentially created browser agent observability. We patched Playwright to record the entire browser session while the agent operates, simultaneously tracing all agent steps and LLM calls. Then we synchronized everything in the UI, creating an unparalleled debugging experience. This allowed us to pinpoint exactly where the agent fails by seeing what it "sees" in session replay alongside execution traces.

2. Our detection script is simple but extremely good. It's carefully crafted via trial and error. We also employed CV and OCR.

3. The agent is very simple, literally just a while loop. All the power comes from a carefully crafted prompt and a ton of eval runs.

Index is a simple Python package. It also comes with a beautiful CLI.
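
To make "just a while loop" concrete, here is a hedged sketch of such a loop. It is not Index's code: the observe/decide/act callables, the action dictionary shape, and the max_steps cap are assumptions chosen for illustration. In a real agent, observe would return the annotated screenshot, decide would call the LLM, and act would drive the browser.

```python
from typing import Any, Callable, Optional

Action = dict[str, Any]  # hypothetical shape, e.g. {"type": "click", "index": 3}


def run_agent(
    task: str,
    observe: Callable[[], Any],
    decide: Callable[[str, Any, list[Action]], Action],
    act: Callable[[Action], None],
    max_steps: int = 20,
) -> Optional[Any]:
    """Loop: observe the page, ask the model for an action, execute it.

    Returns the model's result when it emits a "done" action, or None if
    the step budget runs out first.
    """
    history: list[Action] = []
    step = 0
    while step < max_steps:
        observation = observe()                       # current browser state
        action = decide(task, observation, history)   # LLM call in a real agent
        history.append(action)
        if action["type"] == "done":
            return action.get("result")
        act(action)                                   # click, type, scroll, ...
        step += 1
    return None


if __name__ == "__main__":
    # Stub model: click once, then finish.
    def fake_decide(task: str, obs: Any, history: list[Action]) -> Action:
        if history:
            return {"type": "done", "result": "clicked the button"}
        return {"type": "click", "index": 0}

    print(run_agent("press the button", lambda: "page state", fake_decide, print))
```

Keeping the loop this small means the prompt and the eval runs carry the behavior, which matches the point made in item 3.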

pip install lmnr-index

playwright install chromium

index run

We've recently added o4-mini, Gemini 2.5 Pro and Flash. Pro is extremely good and fast. Give it a try via the CLI.

You can also use Index via a serverless API (https://docs.lmnr.ai/index-agent/api/getting-started).

Or via the chat UI - https://lmnr.ai/chat.

To learn more about browser agent observability and evals, check out our open-source repo (https://github.com/lmnr-ai/lmnr) and our docs (https://docs.lmnr.ai/tracing/browser-agent-observability).

androng 5 hours ago

Can it actually do something difficult like apply for jobs? So far I know of five or so websites that claim they can apply to jobs for you like sonara.ai and usemassive.com and Skyvern AI but when you try to actually use them all they can do is the one-page job applications and not the much more common Workday 10-page job applications with annoying "create an account" and annoying questions like "Do you have any relatives that work at Sony" and annoying "fill out all your work experience" where you have to click 50 times for one application. That's like half of all job applications. https://jobs.spectrum.com/job/-/-/4673/76746020384?utm_sourc...

  • skull8888888 5 hours ago

    I'm pretty confident it can do it. Try it out and see for yourself. Just install the package, run the CLI and give it your prompt.

    pip install lmnr-index

    playwright install chromium

    index run

    Also try experimenting with different models. So far, Gemini 2.5 Pro is the best in terms of quality/speed. Claude 3.7 is also pretty good.

    • naim08 5 hours ago

      shit, let me try it out

  • rushils 5 hours ago

    While we don't auto-apply to jobs for you, our browser extension, Simplify Copilot, makes it easier to apply to those multi-step application forms (Workday, Taleo, SAP, etc.)

    https://simplify.jobs/install

    • skull8888888 5 hours ago

      any need for browser agent observability?

  • esafak 5 hours ago

    I consider using bad hiring software like that a red flag, and suggestive of other things the company must be doing wrong too. I noped out whenever I saw Taleo.

    • globalnode an hour ago

      All big successful companies do "something" wrong, that's how they make money. Steal your OSS, not pay taxes, avoid overtime payments, low wages, outsource slavery, destroy the environment, gaslight while they steal your data and subject millions to dark patterns of advertising and marketing, screw over suppliers, intentionally sow discord as a distraction. The list goes on. To me the bigger the company the bigger the red flag.

noleary 6 hours ago

> Index is the SOTA open-source browser agent for autonomously executing complex tasks on the web.

I've written a handful of pretty hacky Python scripts that just pull down all of the HTML content from a page and toss it over to OpenAI. As you can imagine, these were all extremely simple tasks, e.g., "find out if there's a login button"

What's a good example of a complex task that Index is well-suited for? What's the threshold of minimal complexity where you guys are a really good fit?

  • skull8888888 5 hours ago

    - research tasks: the agent is smart enough to understand which links to click next, without the need to hardcode the parsing and navigation logic

    - any task that requires UI interaction: button clicking, filter selection, form filling and so on. Just prompt it; it's surprisingly robust and self-healing.

    - complex long-running tasks that require extensive context, e.g. researching a topic and then creating a spreadsheet, creating a presentation on a topic and so on.

    Essentially, any task that can be done within a browser environment and previously required flaky, hardcoded, predefined scripts. Also, website testing is a great example.

    • nico 3 hours ago

      Would love to see it doing some work on a Google spreadsheet (including doing formulas, vlookups, data import and cleanup) and then creating a decent Slides presentation with some charts from the spreadsheet

shekhar101 5 hours ago

Can you open up the option to use other models/versions, especially the Gemini 2.5 Pro experimental models available through AI Studio? Would love to try this, but Gemini Flash fails even for simple tasks. Example: I asked it to extract all the links from the comment section of a Hacker News thread and it just scrolled all the way to the end and then did nothing. Maybe the Pro models can do it better.

  • Yiling-J 4 hours ago

    "Gemini Flash fails even for simple tasks." On the Gemini Flash page (https://deepmind.google/technologies/gemini/flash/), it claims to be "best for fast performance on complex tasks". I always use Gemini Flash in my project for demos and testing, and it performs very well. If a project requires a large, expensive model to handle simple tasks, that could be an issue for users.

  • skull8888888 5 hours ago

    Gemini 2.5 Pro is available. Is it missing on your side? Are you running Index via the CLI?

    • shekhar101 5 hours ago

      Yes it is; however, API keys from AI Studio only allow the Pro Experimental model. So if I select gemini-pro, I see this: "Gemini 2.5 Pro Preview doesn't have a free quota tier. Please use Gemini 2.5 Pro Experimental (models/gemini-2.5-pro-exp-03-25) instead". Can I choose the exact model somewhere in the CLI?

      • skull8888888 5 hours ago

        Oh I see, I didn't know about that. The fastest and easiest thing you can do is play around with Pro via our chat UI https://lmnr.ai/chat - it's free for up to 10 messages.

        For the CLI and custom models, you can clone the repo, then go to cli.py and manually add your model there. I will work on proper support for custom models.

        • naim08 5 hours ago

          extremely slow

          • skull8888888 5 hours ago

            which model are you using? try gemini pro/flash, they are very fast

xena 5 hours ago

How do I block it from my services? Does it obey robots.txt?
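
For context on the robots.txt part of the question: the thread does not say whether Index checks robots.txt, and robots.txt is advisory, so it only works when the client chooses to honor it. The sketch below shows what such a client-side check could look like with Python's standard urllib.robotparser. The user-agent name "ExampleBrowserAgent" and the is_allowed helper are hypothetical and are not taken from Index.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network call
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical rules a site owner might publish.
    rules = "User-agent: ExampleBrowserAgent\nDisallow: /private/\n"
    for url in ("https://example.com/private/report", "https://example.com/public/page"):
        print(url, is_allowed(rules, "ExampleBrowserAgent", url))
```

A check like this would have to run inside the agent before each navigation; a site owner cannot enforce it from the server side.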

  • CaptainFever an hour ago

    "That's the neat part, you don't."

    If I wanted to use this to do my personal browsing for me, like checking for updates on websites where RSS does not exist, you shouldn't be able to stop me.