
Audio Transcription

Convert audio files to text using Gemini's multimodal capabilities. A single function, `transcribe()`, handles the whole job.

Quick Start

main.py
python
from connectonion import transcribe

# Simple transcription (uses OpenOnion managed keys)
text = transcribe("meeting.mp3")
print(text)

# With your own Gemini API key (set GEMINI_API_KEY and drop the co/ prefix)
text = transcribe("meeting.mp3", model="gemini-3-flash-preview")
output
>>> text = transcribe("meeting.mp3")
>>> print(text)
All right, so here we are in front of the elephants...

That's it! One function for audio-to-text.

With Context Hints

Improve accuracy for domain-specific terms:

main.py
python
# Technical meeting with specific names
text = transcribe(
    "standup.mp3",
    prompt="Technical AI discussion. Names: Aaron, Lisa. Terms: ConnectOnion, OpenOnion"
)

# Medical transcription
text = transcribe(
    "consultation.mp3",
    prompt="Medical consultation. Terms: hypertension, metformin, CBC"
)
output
>>> text = transcribe("standup.mp3", prompt="Technical AI discussion...")
>>> print(text)
Aaron mentioned that ConnectOnion's new feature is ready for review...
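
Hint strings like the ones above can be assembled from lists of names and terms. The helper below is a minimal sketch, not part of ConnectOnion: `build_hint` is a hypothetical name, and the "Names: ... Terms: ..." layout simply mirrors the examples above rather than any required format.

```python
from typing import Sequence


def build_hint(context: str, names: Sequence[str] = (), terms: Sequence[str] = ()) -> str:
    """Join a context sentence, speaker names, and domain terms into one hint string.

    Hypothetical helper: the layout copies the example prompts in this section.
    """
    # Drop a trailing period so sections can be joined uniformly with ". "
    sections = [context.strip().rstrip(".")]
    if names:
        sections.append("Names: " + ", ".join(names))
    if terms:
        sections.append("Terms: " + ", ".join(terms))
    return ". ".join(sections)


hint = build_hint(
    "Technical AI discussion",
    names=["Aaron", "Lisa"],
    terms=["ConnectOnion", "OpenOnion"],
)
print(hint)
```

The result can then be passed as `prompt=hint` to `transcribe()`.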

With Timestamps

main.py
python
text = transcribe("podcast.mp3", timestamps=True)
print(text)
output
>>> text = transcribe("podcast.mp3", timestamps=True)
>>> print(text)
[00:00] Welcome to the show...
[00:15] Today we're discussing AI agents...
[01:30] Let's dive into the first topic...
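
If you need the timestamps as numbers (for example, to jump to a position in a player), the text can be split into segments. This is a sketch that assumes the output follows the `[MM:SS] text` layout shown above, with an optional hours field. The model's actual formatting may vary, so treat `parse_timestamped` as a hypothetical helper and check it against your own output.

```python
import re

# Matches "[MM:SS] text" or "[H:MM:SS] text" at the start of a line
_STAMP = re.compile(r"^\[(?:(\d+):)?(\d{1,2}):(\d{2})\]\s*(.*)$")


def parse_timestamped(text: str) -> list[tuple[int, str]]:
    """Return (offset_in_seconds, segment_text) pairs from a timestamped transcript."""
    segments: list[tuple[int, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = _STAMP.match(line)
        if match:
            hours, minutes, seconds, content = match.groups()
            offset = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
            segments.append((offset, content))
        elif segments:
            # A line without a timestamp continues the previous segment
            offset, content = segments[-1]
            segments[-1] = (offset, f"{content} {line}")
    return segments


sample = "[00:00] Welcome to the show...\n[01:30] Let's dive into the first topic..."
for offset, content in parse_timestamped(sample):
    print(offset, content)
```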

Real Examples

Meeting Minutes

main.py
python
def get_meeting_minutes(audio_path: str) -> str:
    """Transcribe and summarize a meeting."""
    from connectonion import transcribe, llm_do

    # Step 1: Transcribe
    transcript = transcribe(audio_path, prompt="Business meeting")

    # Step 2: Summarize
    summary = llm_do(
        transcript,
        system_prompt="Extract action items and key decisions as bullet points."
    )
    return summary
output
>>> summary = get_meeting_minutes("standup.mp3")
>>> print(summary)
**Action Items:**
- Aaron to review PR #123
- Lisa to update documentation
 
**Key Decisions:**
- Launch date set for Friday

Voice Notes Processing

main.py
python
from pathlib import Path

def process_voice_notes(folder: str) -> list[str]:
    """Transcribe all voice notes in a folder."""
    from connectonion import transcribe

    results = []
    for audio in Path(folder).glob("*.mp3"):
        text = transcribe(str(audio))
        results.append(f"# {audio.stem}\n{text}")
    return results
output
>>> notes = process_voice_notes("voice_notes/")
>>> print(notes[0])
# idea_2024_01
Remember to add the new transcribe feature to the docs...
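
The example above only picks up `*.mp3` files. To collect every format listed under Supported Formats, one option is to filter by file extension. This is a sketch: the extension set is an assumption derived from that list, and `find_audio_files` is a hypothetical helper, not part of ConnectOnion.

```python
from pathlib import Path

# Assumed extensions for the formats listed under "Supported Formats"
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".aiff", ".aac", ".ogg", ".flac", ".m4a", ".webm"}


def find_audio_files(folder: str) -> list[Path]:
    """Return the audio files in a folder, sorted by path, matched case-insensitively."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

In `process_voice_notes`, the loop could then iterate over `find_audio_files(folder)` instead of `Path(folder).glob("*.mp3")`.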

Use as Agent Tool

main.py
python
from connectonion import Agent, transcribe

def transcribe_audio(file_path: str) -> str:
    """Transcribe an audio file to text."""
    return transcribe(file_path)

agent = Agent("assistant", tools=[transcribe_audio])
result = agent.input("Transcribe the file meeting.mp3 and summarize it")
output
>>> result = agent.input("Transcribe the file meeting.mp3 and summarize it")
>>> print(result)
I've transcribed the meeting. Here's a summary:
The team discussed the Q4 roadmap and agreed to...

Parameters

Parameter	Type	Default	Description
`audio`	str	required	Path to audio file
`prompt`	str	None	Context hints for accuracy
`model`	str	"co/gemini-3-flash-preview"	Model to use
`timestamps`	bool	False	Include timestamps in output

Supported Formats

WAV, MP3, AIFF, AAC, OGG, FLAC, M4A, WebM

Token cost: 32 tokens per second of audio (1 minute = 1,920 tokens)
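
The rate above makes it easy to estimate cost before sending a file. The sketch below uses only the standard library. `estimate_audio_tokens` and `wav_duration_seconds` are hypothetical helpers, rounding up to a whole token is an assumption, and the duration reader handles WAV files only.

```python
import math
import wave

TOKENS_PER_SECOND = 32  # rate stated above


def estimate_audio_tokens(duration_seconds: float) -> int:
    """Estimate audio tokens for a clip, rounding up to a whole token (assumption)."""
    return math.ceil(duration_seconds * TOKENS_PER_SECOND)


def wav_duration_seconds(path: str) -> float:
    """Read the duration of a WAV file from its header (other formats need another tool)."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


print(estimate_audio_tokens(60))    # one minute of audio
print(estimate_audio_tokens(3600))  # one hour of audio
```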

Models

main.py
python
# OpenOnion managed keys (default - no API key needed)
transcribe("audio.mp3", model="co/gemini-3-flash-preview")
transcribe("audio.mp3", model="co/gemini-2.5-flash")

# Your own Gemini API key (set GEMINI_API_KEY)
transcribe("audio.mp3", model="gemini-3-flash-preview")
transcribe("audio.mp3", model="gemini-2.5-flash")
output
>>> transcribe("audio.mp3", model="co/gemini-3-flash-preview")
'This is the transcribed text from your audio file...'
 
>>> transcribe("audio.mp3", model="gemini-2.5-flash")
'This is the transcribed text using your own API key...'
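
If the same script runs both with and without a personal key, the model name can be chosen at runtime. This is a sketch, assuming the only difference between the two modes is the `co/` prefix shown above. `pick_model` is a hypothetical helper, not a ConnectOnion function.

```python
import os


def pick_model(base: str = "gemini-3-flash-preview") -> str:
    """Use your own key when GEMINI_API_KEY is set, otherwise fall back to managed keys."""
    if os.environ.get("GEMINI_API_KEY"):
        return base
    # No personal key found: use the OpenOnion-managed variant
    return f"co/{base}"


print(pick_model())
```

The result can be passed straight through, for example `transcribe("audio.mp3", model=pick_model())`.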

What You Get

Simple API - One function for all transcription needs

Context hints - Improve accuracy with domain terms

Multiple formats - WAV, MP3, FLAC, and more

Timestamps - Optional time markers in output

Managed keys - Works out of the box with co/ models

Comparison with Agent

Feature	transcribe()	Agent()
Purpose	Audio to text	Multi-step workflows
Input	Audio files	Text prompts
Output	Plain text	Agent responses
Best for	Transcription	Complex tasks

main.py
python
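from connectonion import Agent, transcribe

# Use transcribe() for audio-to-text
text = transcribe("meeting.mp3")

# Use Agent for complex workflows with multiple tools
# (search and calculate stand in for tool functions you define yourself)
agent = Agent("assistant", tools=[search, calculate])
result = agent.input("Research and analyze...")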
output
>>> text = transcribe("meeting.mp3")
>>> print(text[:50])
All right, so here we are in front of the elephant
 
>>> result = agent.input("Research and analyze...")
>>> print(result)
I'll help you research and analyze...

Error Handling

main.py
python
from connectonion import transcribe

try:
    text = transcribe("nonexistent.mp3")
except FileNotFoundError:
    print("Audio file not found")
except ValueError as e:
    print(f"API error: {e}")
output
>>> try:
...     text = transcribe("nonexistent.mp3")
... except FileNotFoundError:
...     print("Audio file not found")
Audio file not found
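
When processing many files, it can help to collect failures instead of stopping at the first one. The wrapper below is a sketch built on the two exception types shown above. `try_transcribe` is a hypothetical helper, and it takes the transcription function as an argument so it can be exercised without network access.

```python
from typing import Callable, Optional


def try_transcribe(
    transcribe_fn: Callable[[str], str], path: str
) -> tuple[Optional[str], Optional[str]]:
    """Return (text, None) on success or (None, error_message) on failure."""
    try:
        return transcribe_fn(path), None
    except FileNotFoundError:
        return None, f"Audio file not found: {path}"
    except ValueError as error:
        return None, f"API error: {error}"


def fake_transcribe(path: str) -> str:
    """Stand-in for connectonion.transcribe, used only to demonstrate the wrapper."""
    if path == "missing.mp3":
        raise FileNotFoundError(path)
    return f"transcript of {path}"


for name in ["meeting.mp3", "missing.mp3"]:
    text, error = try_transcribe(fake_transcribe, name)
    print(name, "->", text if error is None else error)
```

In real use, pass `transcribe` from `connectonion` in place of `fake_transcribe`.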