BrowserAutomation
Natural language browser automation via Playwright. Navigate, click, type, and take screenshots by describing what you want; no CSS selectors needed.
Log in once and your session persists in ~/.co/browser_profile/. A vision LLM locates elements from your descriptions.
Installation
Quick Start
With an agent
Direct usage
API Reference
Navigation
- go_to(url)
- get_current_url()
- get_text()
- get_links_from_page(filter?)
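get_links_from_page() takes an optional filter, but its exact matching rule isn't documented here. The sketch below assumes a case-insensitive substring match against either the link text or the URL. `Link` and `filter_links` are hypothetical names for illustration, not part of the tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Link:
    # Hypothetical shape of an extracted link: visible text plus href
    text: str
    href: str


def filter_links(links: List[Link], pattern: Optional[str] = None) -> List[Link]:
    """Keep links whose text or href contains `pattern`, ignoring case.

    Assumed stand-in for the optional filter of get_links_from_page();
    the real tool may match differently (for example on href only).
    """
    if not pattern:
        return list(links)  # no filter given: return every link
    needle = pattern.lower()
    return [
        link
        for link in links
        if needle in link.text.lower() or needle in link.href.lower()
    ]


# Example data standing in for links extracted from a page
links = [
    Link("Pricing", "https://example.com/pricing"),
    Link("Docs", "https://example.com/docs"),
    Link("Blog", "https://blog.example.com/"),
]
print([link.text for link in filter_links(links, "docs")])  # ['Docs']
```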
Interaction
- click(description)
- hover(description)
- mouse_click(x, y)
- right_click(description)
- double_click(description)
- keyboard_type(text)
- keyboard_press(key)
- scroll(times?, description?)
Screenshot
- take_screenshot(path?, full_page?)
- set_viewport(width, height)
Waiting
- wait(seconds)
- wait_for_element(description)
- wait_for_text(text)
- wait_for_manual_login(site)
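wait_for_element() and wait_for_text() presumably block until something shows up on the page. A common way to build that kind of wait is a polling loop with a deadline. This is a generic sketch, not the tool's implementation: the `timeout` and `interval` defaults are assumptions, and `check` stands in for whatever page query the tool actually runs.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 10.0, interval: float = 0.5) -> bool:
    """Poll `check` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)  # back off before querying the page again


# Fake condition that becomes true on the third poll, e.g. text finally rendered
calls = {"n": 0}


def ready_on_third_poll() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until(ready_on_third_poll, timeout=1.0, interval=0.01))  # True
```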
Forms
- select_option(field, option)
- check_checkbox(description, checked?)
- upload_file_by_selector(selector, file_path)
- upload_file_after_click_by_selector(selector, file_path)
Persistent Sessions
Log in once; cookies and session data persist in ~/.co/browser_profile/ automatically, so later runs start already logged in.
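Because the profile is an ordinary directory, you can inspect it or wipe it to force a fresh login. A minimal sketch with pathlib; the helper names are hypothetical, and the assumption that deleting the directory logs you out (with the tool recreating it on the next launch) should be checked against the tool's actual behavior. Close the browser first, since Chrome keeps files in an open profile in use.

```python
import shutil
from pathlib import Path
from typing import Optional


def browser_profile_dir(home: Optional[Path] = None) -> Path:
    """Return the persistent profile path, ~/.co/browser_profile/."""
    base = home if home is not None else Path.home()
    return base / ".co" / "browser_profile"


def reset_browser_profile(home: Optional[Path] = None) -> bool:
    """Delete the saved profile so the next launch starts logged out.

    Assumption: the tool recreates the directory when it next starts.
    Returns True if a profile directory was removed.
    """
    profile = browser_profile_dir(home)
    if profile.exists():
        shutil.rmtree(profile)
        return True
    return False


print(browser_profile_dir())
```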
Screenshots
Hover & Advanced Mouse
Reveal hover menus, click exact pixel coordinates, or open context menus.
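Clicking an item inside a menu that hover() revealed requires its pixel coordinates. If you know the item's bounding box (for example from Playwright's `bounding_box()`, which returns a dict with `x`, `y`, `width` and `height`, or `None` when the element is not visible), its center is a safe target. `click_point` is a hypothetical helper, and the sketch assumes mouse_click(x, y) takes viewport coordinates in CSS pixels.

```python
from typing import Optional, Tuple, TypedDict


class Box(TypedDict):
    # Same keys as the dict Playwright's bounding_box() returns
    x: float
    y: float
    width: float
    height: float


def click_point(box: Optional[Box]) -> Tuple[int, int]:
    """Return the integer (x, y) center of a box, for mouse_click(x, y)."""
    if box is None:
        # Playwright returns None for elements that are not visible
        raise ValueError("element has no bounding box (not visible?)")
    return round(box["x"] + box["width"] / 2), round(box["y"] + box["height"] / 2)


menu_item: Box = {"x": 100.0, "y": 40.0, "width": 80.0, "height": 20.0}
print(click_point(menu_item))  # (140, 50)
```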
mouse_click(x, y) is useful after hover(): clicking by description would re-scan the DOM and dismiss the hover menu.
System Info
Call get_system_info() before using keyboard shortcuts to get the correct modifier key for the current OS (Cmd on macOS, Ctrl on Linux).
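The decision get_system_info() supports can be sketched with Python's `platform` module. This assumes keyboard_press() accepts Playwright-style combinations such as "Meta+a" (Playwright's own key syntax); `modifier_key` and `shortcut` are hypothetical helpers, and the tool's get_system_info() may report the OS in a different form.

```python
import platform
from typing import Optional


def modifier_key(system: Optional[str] = None) -> str:
    """Return the modifier for common shortcuts (copy, paste, select-all).

    `system` uses platform.system() names: "Darwin" is macOS, "Linux" is Linux.
    """
    system = system or platform.system()
    return "Meta" if system == "Darwin" else "Control"


def shortcut(key: str, system: Optional[str] = None) -> str:
    """Build a Playwright-style key combination such as "Meta+a"."""
    return f"{modifier_key(system)}+{key}"


print(shortcut("a", "Darwin"))  # Meta+a
print(shortcut("a", "Linux"))   # Control+a
```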
Typing
After keyboard_type(), call take_screenshot() to verify the text landed in the right field.
Scrolling
scroll() uses an AI model to pick the best scroll strategy (element scroll, page scroll, or mouse wheel).
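The AI-driven choice itself isn't reproduced here. As a rule-based stand-in, you can try the same three strategies in a fixed order and keep the first one that actually changes the scroll position. The order, the helper names and the fake page state are assumptions for illustration; against a real page the position could be read with Playwright's `page.evaluate(...)`, for example `window.scrollY` for page-level scrolling.

```python
from typing import Callable, Dict, List, Optional, Tuple

Strategy = Tuple[str, Callable[[], None]]


def scroll_with_fallback(
    strategies: List[Strategy], read_position: Callable[[], float]
) -> Optional[str]:
    """Try strategies in order; return the name of the first that moved the page."""
    for name, attempt in strategies:
        before = read_position()
        attempt()
        if read_position() != before:
            return name  # this strategy actually scrolled
    return None  # nothing moved, e.g. already at the bottom


# Fake page state so the sketch runs without a browser
state: Dict[str, float] = {"scroll_y": 0.0}


def element_scroll() -> None:
    pass  # pretend no scrollable container matched the description


def page_scroll() -> None:
    state["scroll_y"] += 600


def mouse_wheel() -> None:
    state["scroll_y"] += 100


used = scroll_with_fallback(
    [("element", element_scroll), ("page", page_scroll), ("wheel", mouse_wheel)],
    lambda: state["scroll_y"],
)
print(used, state["scroll_y"])  # page 600.0
```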
Reading Page Content
Forms
File Uploads
Both upload helpers accept frame_url_contains and frame_name to target upload controls inside iframes. Pass index when a selector matches more than one control.
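How frame_name and frame_url_contains narrow the search can be sketched as a filter over the page's frames. Playwright `Frame` objects expose `name` and `url`, so the same logic could run over `page.frames`; here a small dataclass stands in so the sketch runs without a browser. `FrameInfo` and `pick_frame` are hypothetical, and the rule that both filters must match when both are given is an assumption. index would then presumably pick among several matching controls, much as Playwright's zero-based `locator.nth(index)` does.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FrameInfo:
    # Mirrors the `name` and `url` properties of a Playwright Frame
    name: str
    url: str


def pick_frame(
    frames: List[FrameInfo],
    frame_name: Optional[str] = None,
    frame_url_contains: Optional[str] = None,
) -> Optional[FrameInfo]:
    """Return the first frame matching every filter that was given.

    With no filters, the first frame in the list is returned.
    """
    for frame in frames:
        if frame_name is not None and frame.name != frame_name:
            continue
        if frame_url_contains is not None and frame_url_contains not in frame.url:
            continue
        return frame
    return None


frames = [
    FrameInfo(name="", url="https://app.example.com/settings"),
    FrameInfo(name="uploader", url="https://files.example.net/widget?mode=upload"),
]
print(pick_frame(frames, frame_url_contains="files.example.net").name)  # uploader
```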
Waiting
Viewport
Use with Agent
Common Patterns
Login once, reuse session
Screenshot workflow
Data extraction
Notes
- Uses Google Chrome if installed (better site compatibility), otherwise falls back to Chromium
- Viewport defaults to 1920×1200 for maximum content visibility
- Output is truncated when used as an agent tool, to avoid overflowing the model's token budget
- Windows is not supported
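The truncation limit and any marker the tool appends are not documented here. The sketch below assumes a plain character cap (10,000 is an arbitrary placeholder) and appends a note so the agent knows content was cut; `truncate_for_agent` is a hypothetical name.

```python
def truncate_for_agent(text: str, max_chars: int = 10_000) -> str:
    """Cap tool output at max_chars and note how many characters were cut."""
    if len(text) <= max_chars:
        return text  # short enough: pass through unchanged
    omitted = len(text) - max_chars
    return f"{text[:max_chars]}\n... [truncated {omitted} characters]"


print(truncate_for_agent("abcdefgh", max_chars=5))
```

A character cap is only a rough proxy for tokens; a token-based cap would track the model's limit more closely but needs a tokenizer.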
ConnectOnion