
Audio Transcription

Convert audio files to text using Gemini's multimodal capabilities. Simple one-function interface for transcription.

Quick Start

main.py
from connectonion import transcribe

# Simple transcription (uses OpenOnion managed keys)
text = transcribe("meeting.mp3")
print(text)

# With your own Gemini API key
text = transcribe("meeting.mp3", model="gemini-3-flash-preview")
output
>>> text = transcribe("meeting.mp3")
>>> print(text)
All right, so here we are in front of the elephants...

That's it! One function for audio-to-text.

With Context Hints

Improve accuracy for domain-specific terms:

main.py
# Technical meeting with specific names
text = transcribe(
    "standup.mp3",
    prompt="Technical AI discussion. Names: Aaron, Lisa. Terms: ConnectOnion, OpenOnion"
)

# Medical transcription
text = transcribe(
    "consultation.mp3",
    prompt="Medical consultation. Terms: hypertension, metformin, CBC"
)
output
>>> text = transcribe("standup.mp3", prompt="Technical AI discussion...")
>>> print(text)
Aaron mentioned that ConnectOnion's new feature is ready for review...

With Timestamps

main.py
text = transcribe("podcast.mp3", timestamps=True)
print(text)
output
>>> text = transcribe("podcast.mp3", timestamps=True)
>>> print(text)
[00:00] Welcome to the show...
[00:15] Today we're discussing AI agents...
[01:30] Let's dive into the first topic...

Real Examples

Meeting Minutes

main.py
def get_meeting_minutes(audio_path: str) -> str:
    """Transcribe and summarize a meeting."""
    from connectonion import transcribe, llm_do

    # Step 1: Transcribe
    transcript = transcribe(audio_path, prompt="Business meeting")

    # Step 2: Summarize
    summary = llm_do(
        transcript,
        system_prompt="Extract action items and key decisions as bullet points."
    )
    return summary
output
>>> summary = get_meeting_minutes("standup.mp3")
>>> print(summary)
**Action Items:**
- Aaron to review PR #123
- Lisa to update documentation
 
**Key Decisions:**
- Launch date set for Friday

Voice Notes Processing

main.py
from pathlib import Path

def process_voice_notes(folder: str) -> list[str]:
    """Transcribe all voice notes in a folder."""
    from connectonion import transcribe

    results = []
    for audio in Path(folder).glob("*.mp3"):
        text = transcribe(str(audio))
        results.append(f"# {audio.stem}\n{text}")
    return results
output
>>> notes = process_voice_notes("voice_notes/")
>>> print(notes[0])
# idea_2024_01
Remember to add the new transcribe feature to the docs...

Use as Agent Tool

main.py
from connectonion import Agent, transcribe

def transcribe_audio(file_path: str) -> str:
    """Transcribe an audio file to text."""
    return transcribe(file_path)

agent = Agent("assistant", tools=[transcribe_audio])
result = agent.input("Transcribe the file meeting.mp3 and summarize it")
output
>>> result = agent.input("Transcribe meeting.mp3 and summarize it")
>>> print(result)
I've transcribed the meeting. Here's a summary:
The team discussed the Q4 roadmap and agreed to...

Parameters

| Parameter  | Type | Default                     | Description                  |
|------------|------|-----------------------------|------------------------------|
| audio      | str  | required                    | Path to audio file           |
| prompt     | str  | None                        | Context hints for accuracy   |
| model      | str  | "co/gemini-3-flash-preview" | Model to use                 |
| timestamps | bool | False                       | Include timestamps in output |

Supported Formats

WAV, MP3, AIFF, AAC, OGG, FLAC, M4A, WebM
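If you want to skip files that would be rejected anyway, a simple extension check against the list above works as a pre-flight guard. This helper is illustrative, not part of the connectonion API:

```python
from pathlib import Path

# Extensions corresponding to the supported formats listed above
SUPPORTED_FORMATS = {".wav", ".mp3", ".aiff", ".aac", ".ogg", ".flac", ".m4a", ".webm"}

def is_supported_audio(path: str) -> bool:
    """Return True if the file extension matches a supported format."""
    return Path(path).suffix.lower() in SUPPORTED_FORMATS

print(is_supported_audio("meeting.mp3"))  # True
print(is_supported_audio("slides.pdf"))   # False
```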

Token cost: 32 tokens per second of audio (1 minute = 1,920 tokens)
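Since the cost rule is linear, estimating a clip's token usage is a one-line calculation. `estimate_audio_tokens` is a hypothetical helper for budgeting, not part of connectonion:

```python
# Estimate audio token usage from the documented rate of 32 tokens/second
TOKENS_PER_SECOND = 32

def estimate_audio_tokens(duration_seconds: float) -> int:
    """Estimate how many tokens an audio clip of this length consumes."""
    return int(duration_seconds * TOKENS_PER_SECOND)

print(estimate_audio_tokens(60))    # 1920 (one minute)
print(estimate_audio_tokens(1800))  # 57600 (a 30-minute meeting)
```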

Models

main.py
# OpenOnion managed keys (default - no API key needed)
transcribe("audio.mp3", model="co/gemini-3-flash-preview")
transcribe("audio.mp3", model="co/gemini-2.5-flash")

# Your own Gemini API key (set GEMINI_API_KEY)
transcribe("audio.mp3", model="gemini-3-flash-preview")
transcribe("audio.mp3", model="gemini-2.5-flash")
output
>>> transcribe("audio.mp3", model="co/gemini-3-flash-preview")
'This is the transcribed text from your audio file...'
 
>>> transcribe("audio.mp3", model="gemini-2.5-flash")
'This is the transcribed text using your own API key...'

What You Get

Simple API - One function for all transcription needs
Context hints - Improve accuracy with domain terms
Multiple formats - WAV, MP3, FLAC, and more
Timestamps - Optional time markers in output
Managed keys - Works out of the box with co/ models

Comparison with Agent

| Feature  | transcribe()  | Agent()              |
|----------|---------------|----------------------|
| Purpose  | Audio to text | Multi-step workflows |
| Input    | Audio files   | Text prompts         |
| Output   | Plain text    | Agent responses      |
| Best for | Transcription | Complex tasks        |
main.py
# Use transcribe() for audio-to-text
text = transcribe("meeting.mp3")

# Use Agent for complex workflows with multiple tools
agent = Agent("assistant", tools=[search, calculate])
result = agent.input("Research and analyze...")
output
>>> text = transcribe("meeting.mp3")
>>> print(text[:50])
'All right, so here we are in front of the elepha...'
 
>>> result = agent.input("Research and analyze...")
>>> print(result)
I'll help you research and analyze...

Error Handling

main.py
from connectonion import transcribe

try:
    text = transcribe("nonexistent.mp3")
except FileNotFoundError:
    print("Audio file not found")
except ValueError as e:
    print(f"API error: {e}")
output
>>> try:
...     text = transcribe("nonexistent.mp3")
... except FileNotFoundError:
...     print("Audio file not found")
Audio file not found

Next Steps

Star us on GitHub

If ConnectOnion saves you time, a ⭐ goes a long way — and earns you a coffee chat with our founder.