BrowserAutomation

Natural-language browser automation built on Playwright. Navigate, click, type, and take screenshots by describing what you want; no CSS selectors needed.

Installation

bash
pip install playwright
playwright install chromium

Quick Start

With an agent

main.py
python
from connectonion import Agent
from connectonion.useful_tools.browser_tools import BrowserAutomation

browser = BrowserAutomation()
agent = Agent("web", tools=[browser], model="co/gemini-2.5-pro")

agent.input("go to news.ycombinator.com and get the top 5 story titles")

Direct usage

main.py
python
from connectonion.useful_tools.browser_tools import BrowserAutomation

with BrowserAutomation() as browser:
    browser.go_to("https://example.com")
    browser.click("the contact button")
    browser.keyboard_type("hello@example.com")
    browser.keyboard_press("Enter")
    browser.take_screenshot("result.png")
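
Because the calls above are plain method calls, a workflow can be wrapped in a function that accepts any object exposing the same methods. The sketch below uses that to dry-run the steps without launching a browser. `submit_contact_email` and `RecordingBrowser` are hypothetical names for illustration, not part of connectonion.

```python
class RecordingBrowser:
    """Stand-in that records calls instead of driving a real browser."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Any unknown method name returns a function that logs the call.
        def record(*args, **kwargs):
            self.calls.append((name, args, kwargs))
        return record


def submit_contact_email(browser, url, email, screenshot_path="result.png"):
    """Run the Direct usage steps against any browser-like object."""
    browser.go_to(url)
    browser.click("the contact button")
    browser.keyboard_type(email)
    browser.keyboard_press("Enter")
    browser.take_screenshot(screenshot_path)


# Dry run: inspect the sequence of calls without opening a window.
dry_run = RecordingBrowser()
submit_contact_email(dry_run, "https://example.com", "hello@example.com")
print([name for name, _, _ in dry_run.calls])
```

Passing a real BrowserAutomation instance instead of the stand-in should run the same steps in the browser.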

API Reference

Navigation

go_to(url)
get_current_url()
get_text()
get_links_from_page(filter?)

Interaction

click(description)
hover(description)
mouse_click(x, y)
right_click(description)
double_click(description)
keyboard_type(text)
keyboard_press(key)
scroll(times?, description?)

Screenshot

take_screenshot(path?, full_page?)
set_viewport(width, height)

Waiting

wait(seconds)
wait_for_element(description)
wait_for_text(text)
wait_for_manual_login(site)

Forms

select_option(field, option)
check_checkbox(description, checked?)
upload_file_by_selector(selector, file_path)
upload_file_after_click_by_selector(selector, file_path)

Persistent Sessions

main.py
python
from connectonion.useful_tools.browser_tools import BrowserAutomation

# First run: log in manually
browser = BrowserAutomation()
browser.go_to("https://x.com")
browser.wait_for_manual_login("X.com")  # You handle 2FA/CAPTCHA
# Session saved automatically

# Every run after: already logged in
browser = BrowserAutomation()
browser.go_to("https://x.com")  # Session restored

Screenshots

main.py
python
# Returns base64 image (saved to .tmp/ automatically)
browser.take_screenshot()

# Custom filename
browser.take_screenshot("login_page.png")

# Full page capture
browser.take_screenshot(full_page=True)

# Headless vs visible
BrowserAutomation(headless=False)  # Default — opens visible window
BrowserAutomation(headless=True)   # Runs in background (faster, no window)
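
If the base64 return value needs to be handled in Python, it can be decoded with the standard library. This is a minimal sketch. It assumes the string is either raw base64 or a `data:` URL, and that the image is a PNG; neither is confirmed above. `decode_screenshot` and `looks_like_png` are hypothetical helpers.

```python
import base64

# First eight bytes of every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(data: str) -> bytes:
    """Decode a base64 image string to raw bytes.

    Accepts raw base64 or a 'data:image/png;base64,...' URL.
    """
    if data.startswith("data:") and "," in data:
        data = data.split(",", 1)[1]
    return base64.b64decode(data)


def looks_like_png(raw: bytes) -> bool:
    """Check the PNG signature (an assumption about the image format)."""
    return raw.startswith(PNG_SIGNATURE)
```

A typical use would be `raw = decode_screenshot(browser.take_screenshot())`, followed by writing `raw` to a file of your choice.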

Hover & Advanced Mouse

Reveal hover menus, click exact pixel coordinates, or open context menus:

main.py
python
browser.hover("the Like button")         # Hover to reveal menus/tooltips
browser.take_screenshot()                # See what appeared
browser.mouse_click(640, 360)            # Click exact coordinates (example values; for hover menus)

browser.right_click("the file icon")     # Open context menu
browser.double_click("the file name")    # Double-click to open/select

mouse_click(x, y) is useful after hover(): clicking by description would re-scan the DOM and dismiss the hover menu.

System Info

Call get_system_info() before using keyboard shortcuts to get the correct modifier key for the current OS:

main.py
python
info = browser.get_system_info()
# → "OS: macOS. Use Meta for shortcuts (Meta+a select all, Meta+c copy...)"
# → "OS: Windows. Use Control for shortcuts..."
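
To use the result in code, the modifier name can be pulled out of the returned string. The sketch below assumes the string always follows the "Use <Modifier> for shortcuts" wording shown above. `modifier_from_info` and `shortcut` are hypothetical helpers, not part of BrowserAutomation.

```python
import re


def modifier_from_info(info: str, default: str = "Control") -> str:
    """Extract the modifier key name from a get_system_info()-style string.

    Assumes the 'Use <Modifier> for shortcuts' wording; falls back to
    `default` when the wording differs.
    """
    match = re.search(r"Use (\w+) for shortcuts", info)
    return match.group(1) if match else default


def shortcut(info: str, key: str) -> str:
    """Build a key combination such as 'Meta+a' for keyboard_press()."""
    return f"{modifier_from_info(info)}+{key}"
```

With these, `browser.keyboard_press(shortcut(browser.get_system_info(), "a"))` would send select-all with whichever modifier the current OS reports.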

Typing

main.py
python
browser.click("the email input")
browser.keyboard_type("user@example.com")

browser.keyboard_press("Enter")
browser.keyboard_press("Control+Enter")
browser.keyboard_press("Escape")
browser.keyboard_press("Tab")

After keyboard_type(), call take_screenshot() to verify the text landed in the right field.

Scrolling

main.py
python
browser.scroll()                                     # 5 scrolls on main content
browser.scroll(times=3, description="the sidebar")  # Scroll a specific area

scroll() uses AI to pick the best scroll strategy (element scroll, page scroll, or mouse wheel).

Reading Page Content

main.py
python
browser.get_text()                           # All visible text from the page
browser.get_links_from_page()                # All unique URLs
browser.get_links_from_page("github.com")   # URLs containing "github.com"
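
Judging by the comments above, the filter behaves like a substring match over de-duplicated URLs. The sketch below illustrates that behavior on a plain list of strings. It is not the library's implementation, and `filter_links` is a hypothetical name.

```python
from typing import Iterable, List, Optional


def filter_links(urls: Iterable[str], needle: Optional[str] = None) -> List[str]:
    """Return unique URLs in first-seen order.

    When `needle` is given, keep only URLs that contain it as a substring.
    """
    seen = set()
    result = []
    for url in urls:
        if url in seen:
            continue
        seen.add(url)
        if needle is None or needle in url:
            result.append(url)
    return result
```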

Forms

main.py
python
browser.select_option("country dropdown", "Australia")
browser.check_checkbox("I agree to terms")
browser.check_checkbox("newsletter", checked=False)  # Uncheck

File Uploads

main.py
python
# Upload to an existing file input. Hidden inputs are supported.
browser.upload_file_by_selector('input[type="file"]', "cover.png")

# Click an upload button that opens the OS file picker, then attach the file.
browser.upload_file_after_click_by_selector(
    "button",
    "cover.png",
    text="Upload from computer",
)

Both upload helpers accept frame_url_contains and frame_name for upload controls inside iframes. Pass index when a selector matches multiple controls.

Waiting

main.py
python
browser.wait(2)                              # Wait 2 seconds
browser.wait_for_element("the save button") # Wait for element to appear
browser.wait_for_text("Payment successful") # Wait for text on page
browser.wait_for_manual_login("Gmail")      # Pause for 2FA/CAPTCHA
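
For conditions the built-in waits do not cover, a generic polling loop is one option. The sketch below is a hypothetical helper, not part of BrowserAutomation. The timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 10.0,
               interval: float = 0.5) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_until(lambda: "/dashboard" in browser.get_current_url())` would wait for a redirect, assuming get_current_url() returns a string containing the URL.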

Viewport

main.py
python
browser.set_viewport(1920, 1080)
browser.set_viewport(375, 812)   # iPhone

Use with Agent

main.py
python
from connectonion import Agent
from connectonion.useful_tools.browser_tools import BrowserAutomation

browser = BrowserAutomation(headless=False)  # Visible for debugging
agent = Agent("scraper", tools=[browser], model="co/gemini-2.5-pro")

agent.input("Go to news.ycombinator.com, get the top 5 story titles")
agent.input("Navigate to github.com/trending and screenshot the page")
agent.input("Fill in the contact form on example.com with test data")

Common Patterns

Login once, reuse session

main.py
python
from connectonion.useful_tools.browser_tools import BrowserAutomation

browser = BrowserAutomation()
browser.go_to("https://app.example.com/login")
browser.wait_for_manual_login("example.com")  # Log in once

# Every run after: session is restored from ~/.co/browser_profile/
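
A script can check for a saved profile before deciding whether to prompt for a manual login. This is a rough sketch. A non-empty profile directory only suggests an earlier run, not a valid login. `has_saved_profile` is a hypothetical helper.

```python
from pathlib import Path


def has_saved_profile(profile_dir: str = "~/.co/browser_profile") -> bool:
    """Return True if the profile directory exists and is not empty."""
    path = Path(profile_dir).expanduser()
    return path.is_dir() and any(path.iterdir())
```

One way to use it: call `wait_for_manual_login()` only when `has_saved_profile()` returns False.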

Screenshot workflow

main.py
python
browser.go_to("https://example.com")
browser.click("Login")
browser.keyboard_type("user@example.com")
browser.keyboard_press("Tab")
browser.keyboard_type("password123")
browser.take_screenshot("before_submit.png")
browser.keyboard_press("Enter")
browser.wait(2)
browser.take_screenshot("after_login.png")

Data extraction

main.py
python
browser.go_to("https://example.com/products")
text = browser.get_text()
links = browser.get_links_from_page("/product/")
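
Extracted links may be relative to the page they came from. The sketch below resolves them against the page URL with the standard library. It assumes the links are available as a list of strings, which the example above does not confirm. `absolutize` is a hypothetical helper.

```python
from typing import Iterable, List
from urllib.parse import urljoin


def absolutize(base_url: str, links: Iterable[str]) -> List[str]:
    """Resolve each link against `base_url`; absolute links pass through."""
    return [urljoin(base_url, link) for link in links]
```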

Notes

• Uses Google Chrome if installed (better site compatibility), otherwise falls back to Chromium
• Viewport defaults to 1920×1200 for maximum content visibility
• Output is truncated when used as an agent tool to prevent token overflow
• Windows is not supported
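
The truncation mentioned above can be pictured as a simple character cap. The sketch below illustrates the idea only. The actual limit and message used by the library are not stated here, and `truncate_output` and its 4000-character default are hypothetical.

```python
def truncate_output(text: str, max_chars: int = 4000) -> str:
    """Cap `text` at `max_chars` and note how much was dropped."""
    if len(text) <= max_chars:
        return text
    dropped = len(text) - max_chars
    return f"{text[:max_chars]}\n... [truncated {dropped} characters]"
```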
