
WebFetch

Give your agents web scraping powers. Fetch, parse, and analyze web pages.

Quick Start

quickstart.py
python
from connectonion import Agent, WebFetch

web = WebFetch()
agent = Agent("researcher", tools=[web])

agent.input("What does stripe.com do?")
agent.input("Get contact info from acme.com")

Usage

Option 1: Import directly

import_direct.py
python
from connectonion import Agent, WebFetch

agent = Agent("researcher", tools=[WebFetch()])

Option 2: Copy and customize

Terminal
bash

$ co copy web_fetch

import_local.py
python
from tools.web_fetch import WebFetch  # Your local copy

Installation

install.py
python
from connectonion import WebFetch

web = WebFetch()

Low-Level Methods

Direct HTTP and parsing operations

fetch(url)

HTTP GET request, returns raw HTML

fetch.py
python
html = web.fetch("example.com")
# Returns: "<!DOCTYPE html>..."
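The example passes a bare domain with no scheme. The sketch below shows what such a call could involve using only the standard library. It assumes, without confirmation from this page, that a missing scheme defaults to `https://`. `normalize_url` and `fetch_html` are hypothetical names and are not part of WebFetch. The 15-second default mirrors the timeout documented under Configuration.

```python
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def normalize_url(url: str) -> str:
    """Prepend https:// when the caller passes a bare domain (assumed behavior)."""
    url = url.strip()
    if not urlparse(url).scheme:
        url = "https://" + url
    return url


def fetch_html(url: str, timeout: float = 15) -> str:
    """HTTP GET returning the decoded response body (hypothetical helper)."""
    request = Request(normalize_url(url), headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```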

strip_tags(html)

Strip HTML tags, returns body text only

strip_tags.py
python
text = web.strip_tags(html)
# Returns clean text without HTML
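"Body text only" implies that anything that is not visible content gets discarded. A minimal stdlib sketch of that idea follows. It is an approximation, not WebFetch's actual code. It walks the document with `html.parser.HTMLParser` and skips text inside `<head>`, `<script>`, `<style>` and `<noscript>`. The choice of skipped tags is an assumption.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text nodes, skipping non-content elements."""

    SKIP = {"head", "script", "style", "noscript"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True unescapes &amp; etc.
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when we are outside every skipped element.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def strip_tags(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```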

get_title(html)

Get page title

get_title.py
python
title = web.get_title(html)
# Returns: "Example Domain"
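Title extraction can be as small as a single regular expression. The sketch below is hypothetical and is not the library's implementation. It unescapes HTML entities and collapses the whitespace that often surrounds `<title>` text. Returning an empty string when no title exists is an assumption, because this page does not say what WebFetch returns in that case.

```python
import html as html_lib
import re

# Non-greedy match across newlines; tag attributes and case are tolerated.
_TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def get_title(page: str) -> str:
    match = _TITLE_RE.search(page)
    if not match:
        return ""  # assumed fallback when the page has no <title>
    # Decode entities such as &amp; and squeeze runs of whitespace.
    return " ".join(html_lib.unescape(match.group(1)).split())
```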

get_links(html)

Extract all links from HTML

get_links.py
python
links = web.get_links(html)
# Returns: [{'text': 'Home', 'href': '/'}, {'text': 'About', 'href': '/about'}]
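The return shape above is a list of `{'text', 'href'}` dicts. The stdlib sketch below reproduces that shape and is not taken from WebFetch's source. Anchors without an `href` are skipped, and text inside nested tags such as `<b>` is joined into one label. Both of these choices are assumptions.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.links.append({"text": text, "href": self._href})
            self._href = None


def get_links(html: str) -> list[dict]:
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```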

get_emails(html)

Extract email addresses from HTML

get_emails.py
python
emails = web.get_emails(html)
# Returns: ['support@example.com', 'sales@company.org']
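Email extraction is usually pattern matching over the raw HTML, which means addresses inside `mailto:` links are caught as well. The sketch below is a hypothetical regex approach, not WebFetch's implementation. It removes duplicates while preserving order and filters out retina-image filenames such as `logo@2x.png`, which look like addresses to a naive pattern. The suffix list is an assumption.

```python
import re

_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
_ASSET_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


def get_emails(html: str) -> list[str]:
    found = []
    for candidate in _EMAIL_RE.findall(html):
        if candidate.lower().endswith(_ASSET_SUFFIXES):
            continue  # e.g. "logo@2x.png" in an <img srcset>
        if candidate not in found:  # de-duplicate, keep first-seen order
            found.append(candidate)
    return found
```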

get_social_links(html)

Extract social media links

get_social_links.py
python
html = web.fetch("openai.com")
social = web.get_social_links(html)
# Returns: {'twitter': 'https://x.com/OpenAI', 'youtube': '...', 'github': '...'}
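Social links can be recognized by hostname. The sketch below maps known domains to platform keys and keeps the first link found for each platform. The domain table is hypothetical, and so is the choice to file both `x.com` and `twitter.com` under `'twitter'`, although that choice matches the example output above. WebFetch's real list of platforms is not documented on this page.

```python
import re
from urllib.parse import urlparse

# Hypothetical domain-to-platform table; extend as needed.
_PLATFORMS = {
    "twitter.com": "twitter",
    "x.com": "twitter",
    "linkedin.com": "linkedin",
    "github.com": "github",
    "youtube.com": "youtube",
    "facebook.com": "facebook",
    "instagram.com": "instagram",
}

_HREF_RE = re.compile(r'href=["\']([^"\']+)["\']', re.IGNORECASE)


def get_social_links(html: str) -> dict[str, str]:
    social = {}
    for href in _HREF_RE.findall(html):
        host = urlparse(href).hostname or ""  # relative links have no host
        if host.startswith("www."):
            host = host[4:]
        platform = _PLATFORMS.get(host)
        if platform and platform not in social:
            social[platform] = href  # keep the first link per platform
    return social
```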

High-Level Methods (LLM-Powered)

AI-powered analysis of web pages

analyze_page(url)

Use an LLM to summarize what a page or company does

analyze_page.py
python
result = web.analyze_page("stripe.com")
# Returns: "Stripe is a technology company that builds economic infrastructure..."
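This page does not document how `analyze_page` builds its prompt. One plausible shape is to reduce the page to its title plus a truncated excerpt of its text before asking the model for a summary. The sketch assumes a generic `llm(prompt) -> str` callable, which is a hypothetical stand-in for whatever model client you use. The 4,000-character limit and the prompt wording are also assumptions. Truncating the excerpt keeps the prompt within the model's context window.

```python
from typing import Callable

MAX_CHARS = 4000  # assumed excerpt length


def build_analysis_prompt(title: str, text: str, max_chars: int = MAX_CHARS) -> str:
    excerpt = text[:max_chars]  # keep the prompt bounded
    return (
        "Explain in two or three sentences what this company or page does.\n\n"
        f"Title: {title}\n\n"
        f"Page text:\n{excerpt}"
    )


def analyze_page_text(title: str, text: str, llm: Callable[[str], str]) -> str:
    """Hypothetical helper: hand the prepared prompt to any text-in/text-out model."""
    return llm(build_analysis_prompt(title, text))
```

Passing `lambda p: p` as `llm` returns the prompt itself, which is a convenient way to inspect what would be sent.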

get_contact_info(url)

Extract contact information using an LLM

get_contact_info.py
python
result = web.get_contact_info("stripe.com/contact")
# Returns: "Email: support@stripe.com, Phone: ..."
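A common pattern for LLM-based contact extraction is a hybrid approach. Deterministic candidates are gathered first and passed to the model as hints, which makes exact values less likely to be paraphrased or invented. The sketch below is an illustration of that pattern, not WebFetch's documented behavior. `contact_candidates`, `get_contact_info_sketch` and the `llm` callable are all hypothetical. Phone numbers are taken only from `tel:` links, which is a simplifying assumption.

```python
import re
from typing import Callable

_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
_TEL_RE = re.compile(r'href=["\']tel:([^"\']+)["\']', re.IGNORECASE)


def contact_candidates(html: str) -> dict[str, list[str]]:
    """Pattern-matched hints; dict.fromkeys de-duplicates in first-seen order."""
    emails = list(dict.fromkeys(_EMAIL_RE.findall(html)))
    phones = list(dict.fromkeys(p.strip() for p in _TEL_RE.findall(html)))
    return {"emails": emails, "phones": phones}


def get_contact_info_sketch(html: str, text: str, llm: Callable[[str], str]) -> str:
    hints = contact_candidates(html)
    emails = ", ".join(hints["emails"]) or "none"
    phones = ", ".join(hints["phones"]) or "none"
    prompt = (
        "Extract contact information (email, phone, address, contact form) "
        "from this page. Candidates found by pattern matching:\n"
        f"Emails: {emails}\n"
        f"Phones: {phones}\n\n"
        f"Page text:\n{text[:4000]}"
    )
    return llm(prompt)
```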

Composing Functions

Chain low-level methods together for custom workflows:

compose.py
python
# Get clean text from a URL
text = web.strip_tags(web.fetch("example.com"))

# Get title and text
html = web.fetch("example.com")
title = web.get_title(html)
text = web.strip_tags(html)
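Each low-level method takes and returns plain values, so ordinary function composition applies to them. The two helpers below are hypothetical conveniences and are not part of WebFetch. `pipe` chains single-argument functions from left to right, so `pipe(web.fetch, web.strip_tags)("example.com")` would be equivalent to the first line above. `extract_all` runs several parsers over one HTML string, so the page is fetched only once.

```python
from functools import reduce
from typing import Any, Callable


def pipe(*funcs: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Chain single-argument functions left to right: pipe(f, g)(x) == g(f(x))."""
    return lambda value: reduce(lambda acc, fn: fn(acc), funcs, value)


def extract_all(html: str, extractors: dict[str, Callable[[str], Any]]) -> dict[str, Any]:
    """Apply every extractor to the same HTML string and collect results by name."""
    return {name: fn(html) for name, fn in extractors.items()}
```

For example, `extract_all(web.fetch("example.com"), {"title": web.get_title, "emails": web.get_emails})` would issue a single request and return both results.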

Research Agent Example

research_agent.py
python
from connectonion import Agent, WebFetch, Memory

web = WebFetch()
memory = Memory()

agent = Agent(
    name="researcher",
    tools=[web, memory],
    system_prompt="""You are a web researcher. You can:
    - Fetch and analyze websites
    - Extract contact information
    - Find social media profiles
    - Remember findings for later"""
)

# Research a company
agent.input("Research stripe.com and tell me what they do")

# Find contact info
agent.input("Get contact information from acme.com")

# Build a lead list
agent.input("Find all email addresses on techstartup.io and save them to memory")

# Competitive analysis
agent.input("Compare what stripe.com and square.com offer")

API Reference

Method	Type	Description
fetch(url)	Low-level	HTTP GET, returns raw HTML
strip_tags(html)	Low-level	Remove HTML tags, return text
get_title(html)	Low-level	Extract page title
get_links(html)	Low-level	Extract all links
get_emails(html)	Low-level	Extract email addresses
get_social_links(html)	Low-level	Extract social media links
analyze_page(url)	LLM	AI analysis of what the page or company does
get_contact_info(url)	LLM	AI extraction of contact information

Configuration

config.py
python
# Custom timeout (default: 15 seconds)
web = WebFetch(timeout=30)

# Use with agent
agent = Agent("researcher", tools=[web])

Customizing

Need to modify WebFetch's behavior? Copy the source into your project and import from there: