
BrowserAutomation

Natural-language browser automation built on Playwright. Navigate, click, type, and take screenshots by describing what you want, with no CSS selectors needed.

Log in once and the session persists in ~/.co/browser_profile/. A vision LLM locates elements from your description.

Installation

code
pip install playwright
playwright install chromium

Quick Start

With an agent

main.py
from connectonion import Agent
from connectonion.useful_tools.browser_tools import BrowserAutomation

browser = BrowserAutomation()
agent = Agent("web", tools=[browser], model="co/gemini-2.5-pro")
agent.input("go to news.ycombinator.com and get the top 5 story titles")

Direct usage

main.py
from connectonion.useful_tools.browser_tools import BrowserAutomation

with BrowserAutomation() as browser:
    browser.go_to("https://example.com")
    browser.click("the contact button")
    browser.keyboard_type("hello@example.com")
    browser.keyboard_press("Enter")
    browser.take_screenshot("result.png")

API Reference

Navigation

  • go_to(url)
  • get_current_url()
  • get_text()
  • get_links_from_page(filter?)

Interaction

  • click(description)
  • hover(description)
  • mouse_click(x, y)
  • right_click(description)
  • double_click(description)
  • keyboard_type(text)
  • keyboard_press(key)
  • scroll(times?, description?)

Screenshot

  • take_screenshot(path?, full_page?)
  • set_viewport(width, height)

Waiting

  • wait(seconds)
  • wait_for_element(description)
  • wait_for_text(text)
  • wait_for_manual_login(site)

Forms

  • select_option(field, option)
  • check_checkbox(description, checked?)
  • upload_file_by_selector(selector, file_path)
  • upload_file_after_click_by_selector(selector, file_path)

Persistent Sessions

Log in once. Cookies and session data persist to ~/.co/browser_profile/ automatically:

main.py
# First run: log in manually
browser = BrowserAutomation()
browser.go_to("https://x.com")
browser.wait_for_manual_login("X.com")  # You handle 2FA/CAPTCHA
# Session saved automatically

# Every run after: already logged in
browser = BrowserAutomation()
browser.go_to("https://x.com")  # Session restored

Screenshots

main.py
# Returns base64 image (saved to .tmp/ automatically)
browser.take_screenshot()

# Custom filename
browser.take_screenshot("login_page.png")

# Full page capture
browser.take_screenshot(full_page=True)

# Headless vs visible
BrowserAutomation(headless=False)  # Default: opens visible window
BrowserAutomation(headless=True)   # Runs in background (faster, no window)
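
Since take_screenshot() is described as returning a base64 image, it can be useful to turn that string back into raw bytes. The following is a minimal sketch, assuming the return value is either a bare base64 string or a data URL; the helper name decode_screenshot is hypothetical and not part of BrowserAutomation.

```python
import base64


def decode_screenshot(data: str) -> bytes:
    """Decode a base64 image string into raw bytes (hypothetical helper)."""
    # Assumption: the string may carry a data-URL prefix such as
    # "data:image/png;base64,". Strip it if present.
    if data.startswith("data:") and "," in data:
        data = data.split(",", 1)[1]
    return base64.b64decode(data)


# Example with a stand-in payload: the 8-byte PNG file signature
png_signature = b"\x89PNG\r\n\x1a\n"
encoded = base64.b64encode(png_signature).decode("ascii")
raw = decode_screenshot(encoded)
```

The decoded bytes can then be written to disk or handed to an image library of your choice.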

Hover & Advanced Mouse

Reveal hover menus, click exact pixel coordinates, or open context menus:

main.py
browser.hover("the Like button")        # Hover to reveal menus/tooltips
browser.take_screenshot()               # See what appeared
browser.mouse_click(x, y)               # Click exact coordinates (for hover menus)
browser.right_click("the file icon")    # Open context menu
browser.double_click("the file name")   # Double-click to open/select

mouse_click(x, y) is useful after hover(): clicking by description would re-scan the DOM and dismiss the hover menu.
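
One way to arrive at coordinates for mouse_click(x, y) is to take the center of an element's bounding box. The sketch below assumes a box shaped like the dict Playwright's bounding_box() returns (keys x, y, width, height). The helper center_of is hypothetical, and how you obtain the box is outside the scope of this example.

```python
from typing import Dict, Tuple


def center_of(box: Dict[str, float]) -> Tuple[float, float]:
    """Return the center point of a box with x, y, width, height keys."""
    return (box["x"] + box["width"] / 2, box["y"] + box["height"] / 2)


# Illustrative values only; a real box would come from the page.
menu_item_box = {"x": 100, "y": 40, "width": 80, "height": 20}
x, y = center_of(menu_item_box)
# x and y could then be passed to browser.mouse_click(x, y)
```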

System Info

Call get_system_info() before using keyboard shortcuts to get the correct modifier key for the current OS:

main.py
info = browser.get_system_info()
# → "OS: macOS. Use Meta for shortcuts (Meta+a select all, Meta+c copy...)"
# → "OS: Windows. Use Control for shortcuts..."
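
The Meta-versus-Control rule in the output above can also be applied in your own scripts. This is a minimal sketch using the standard library, not the library's implementation; shortcut_modifier is a hypothetical helper name.

```python
import platform
from typing import Optional


def shortcut_modifier(system: Optional[str] = None) -> str:
    """Return "Meta" on macOS and "Control" on other systems."""
    # platform.system() reports "Darwin" on macOS.
    system = system or platform.system()
    return "Meta" if system == "Darwin" else "Control"


# Build a select-all shortcut string for the current OS
select_all = f"{shortcut_modifier()}+a"
# The result could be passed to browser.keyboard_press(select_all)
```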

Typing

main.py
browser.click("the email input")
browser.keyboard_type("user@example.com")
browser.keyboard_press("Enter")
browser.keyboard_press("Control+Enter")
browser.keyboard_press("Escape")
browser.keyboard_press("Tab")

After keyboard_type(), call take_screenshot() to verify the text landed in the right field.

Scrolling

main.py
browser.scroll()                                     # 5 scrolls on main content
browser.scroll(times=3, description="the sidebar")   # Scroll a specific area

Uses AI to pick the best scroll strategy (element scroll, page scroll, or mouse wheel).

Reading Page Content

main.py
browser.get_text()                          # All visible text from the page
browser.get_links_from_page()               # All unique URLs
browser.get_links_from_page("github.com")   # URLs containing "github.com"
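
The filter behaviour described in the comments above (unique URLs, optionally restricted to those containing a substring) can be pictured with a small standalone function. This is only a sketch of the idea under those stated assumptions, not the library's code; filter_links is a hypothetical name.

```python
from typing import Iterable, List, Optional


def filter_links(urls: Iterable[str], needle: Optional[str] = None) -> List[str]:
    """Deduplicate URLs in order, keeping those that contain `needle`."""
    seen = set()
    result = []
    for url in urls:
        if url in seen:
            continue  # skip duplicates, keep first occurrence
        seen.add(url)
        if needle is None or needle in url:
            result.append(url)
    return result


links = [
    "https://github.com/a",
    "https://example.com/b",
    "https://github.com/a",
]
github_only = filter_links(links, "github.com")
```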

Forms

main.py
browser.select_option("country dropdown", "Australia")
browser.check_checkbox("I agree to terms")
browser.check_checkbox("newsletter", checked=False)  # Uncheck

File Uploads

main.py
# Upload to an existing file input. Hidden inputs are supported.
browser.upload_file_by_selector('input[type="file"]', "cover.png")

# Click an upload button that opens the OS file picker, then attach the file.
browser.upload_file_after_click_by_selector(
    "button",
    "cover.png",
    text="Upload from computer",
)

Both upload helpers accept frame_url_contains and frame_name for upload controls inside iframes. Pass index when a selector matches multiple controls.
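
Before calling either upload helper, it may help to confirm that the file exists so a bad path fails early. The sketch below is a hedged pre-flight check using only the standard library. resolve_upload is a hypothetical helper, and whether the upload helpers require absolute paths is not stated in this document.

```python
from pathlib import Path


def resolve_upload(file_path: str) -> str:
    """Return an absolute path to the file, or raise if it is missing."""
    path = Path(file_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Upload file not found: {path}")
    return str(path)


# Usage idea (not executed here because it needs a live browser):
# browser.upload_file_by_selector('input[type="file"]', resolve_upload("cover.png"))
```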

Waiting

main.py
browser.wait(2)                               # Wait 2 seconds
browser.wait_for_element("the save button")   # Wait for element to appear
browser.wait_for_text("Payment successful")   # Wait for text on page
browser.wait_for_manual_login("Gmail")        # Pause for 2FA/CAPTCHA
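
For conditions the built-in wait helpers do not cover, a generic polling loop with a timeout is a common pattern. The following is a minimal sketch and does not reflect how the library implements its own waits; wait_until and its parameters are hypothetical.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout: float = 10.0,
    interval: float = 0.5,
) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage idea (assumes get_text() returns the page text as a string):
# wait_until(lambda: "Payment successful" in browser.get_text(), timeout=30)
```

Returning a boolean rather than raising leaves the caller free to decide whether a timeout is an error.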

Viewport

main.py
browser.set_viewport(1920, 1080)
browser.set_viewport(375, 812)  # iPhone

Use with Agent

main.py
from connectonion import Agent
from connectonion.useful_tools.browser_tools import BrowserAutomation

browser = BrowserAutomation(headless=False)  # Visible for debugging
agent = Agent("scraper", tools=[browser], model="co/gemini-2.5-pro")

agent.input("Go to news.ycombinator.com, get the top 5 story titles")
agent.input("Navigate to github.com/trending and screenshot the page")
agent.input("Fill in the contact form on example.com with test data")

Common Patterns

Login once, reuse session

main.py
browser = BrowserAutomation()
browser.go_to("https://app.example.com/login")
browser.wait_for_manual_login("example.com")  # Log in once
# Every run after: session is restored from ~/.co/browser_profile/
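
A script can guess whether a first-run login is likely needed by checking for a saved profile. This sketch only tests that the profile directory named above exists and is non-empty. It cannot tell whether the stored session is still valid, and profile_exists is a hypothetical helper.

```python
from pathlib import Path

# Profile location as stated in this document
PROFILE_DIR = Path.home() / ".co" / "browser_profile"


def profile_exists(profile_dir: Path = PROFILE_DIR) -> bool:
    """True if the profile directory exists and has at least one entry."""
    return profile_dir.is_dir() and any(profile_dir.iterdir())


if not profile_exists():
    print("No saved browser profile found; expect a manual login on first run.")
```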

Screenshot workflow

main.py
browser.go_to("https://example.com")
browser.click("Login")
browser.keyboard_type("user@example.com")
browser.keyboard_press("Tab")
browser.keyboard_type("password123")
browser.take_screenshot("before_submit.png")
browser.keyboard_press("Enter")
browser.wait(2)
browser.take_screenshot("after_login.png")

Data extraction

main.py
browser.go_to("https://example.com/products")
text = browser.get_text()
links = browser.get_links_from_page("/product/")

Notes

  • Uses Google Chrome if installed (better site compatibility), otherwise falls back to Chromium
  • Viewport defaults to 1920×1200 for maximum content visibility
  • Output is truncated when used as an agent tool to prevent token overflow
  • Windows is not supported
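
The truncation mentioned in the notes above can be pictured as a simple length cap. This is an illustrative sketch only: the limit of 2000 characters and the marker text are assumptions, and truncate_output is a hypothetical name rather than the library's function.

```python
def truncate_output(text: str, limit: int = 2000) -> str:
    """Cap `text` at `limit` characters and note how much was dropped."""
    if len(text) <= limit:
        return text
    omitted = len(text) - limit
    return f"{text[:limit]}\n... [truncated {omitted} characters]"


page_text = "x" * 5000  # stand-in for a long page
short = truncate_output(page_text)
```

If you need the full page text, calling get_text() directly rather than through an agent is likely the simpler route.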
