Audio Transcription
Convert audio files to text using Gemini's multimodal capabilities. Simple one-function interface for transcription.
Quick Start
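A minimal sketch of the one-call usage. The import path `from connectonion import transcribe` is an assumption, and the wrapper `audio_to_text` is a hypothetical name. The optional `transcribe_fn` argument exists only so the sketch can be exercised without the package or network access.

```python
from typing import Callable, Optional


def audio_to_text(audio_path: str,
                  transcribe_fn: Optional[Callable[..., str]] = None) -> str:
    """Return the transcript of one audio file as plain text."""
    if transcribe_fn is None:
        # Assumption: transcribe is exported at the package top level.
        from connectonion import transcribe as transcribe_fn
    # Only the required `audio` argument is passed; the rest use defaults.
    return transcribe_fn(audio_path)
```

With the package installed, `print(audio_to_text("meeting.mp3"))` would print the transcript; `meeting.mp3` is a placeholder file name.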
That's it! One function for audio-to-text.
With Context Hints
Improve accuracy for domain-specific terms:
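One way to supply hints is to build a short string from a topic and a list of domain terms, then pass it through the `prompt` parameter. The helpers `build_hint` and `transcribe_with_hints` are hypothetical, and the hint wording is only an illustration; `transcribe_fn` stands in for the library's transcription function.

```python
from typing import Callable, Sequence


def build_hint(topic: str, terms: Sequence[str]) -> str:
    """Combine a topic and a list of domain terms into one hint string."""
    if not terms:
        return topic
    return f"{topic}. Key terms: {', '.join(terms)}"


def transcribe_with_hints(audio_path: str, topic: str, terms: Sequence[str],
                          transcribe_fn: Callable[..., str]) -> str:
    """Pass the hint through the documented `prompt` parameter."""
    return transcribe_fn(audio_path, prompt=build_hint(topic, terms))
```

Keeping the hint short and limited to names and jargon that appear in the recording is a reasonable starting point; how strongly the model weighs the hint is not specified here.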
With Timestamps
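A sketch of requesting timestamps and splitting the result into segments. Only the `timestamps` flag comes from the parameters table; the `[MM:SS]` line format is an assumption about the output, and `timestamped_segments` is a hypothetical helper. Adjust the pattern to whatever the model actually returns.

```python
import re
from typing import Callable, List, Tuple

# Assumption: each timestamped line starts with a [MM:SS] marker.
_STAMP = re.compile(r"^\[(\d{1,3}):(\d{2})\]\s*(.*)$")


def timestamped_segments(audio_path: str,
                         transcribe_fn: Callable[..., str]) -> List[Tuple[int, str]]:
    """Return (offset_in_seconds, text) pairs from a timestamped transcript."""
    raw = transcribe_fn(audio_path, timestamps=True)
    segments = []
    for line in raw.splitlines():
        match = _STAMP.match(line.strip())
        if match:  # lines without a marker are skipped
            minutes, seconds, text = match.groups()
            segments.append((int(minutes) * 60 + int(seconds), text))
    return segments
```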
Real Examples
Meeting Minutes
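A possible shape for a minutes workflow: pass the meeting title and speaker names as a hint, request timestamps, and wrap the transcript in a small header. The function `meeting_minutes` and the layout are hypothetical; only the `prompt` and `timestamps` parameters come from the parameters table.

```python
from typing import Callable, Sequence


def meeting_minutes(audio_path: str, title: str, attendees: Sequence[str],
                    transcribe_fn: Callable[..., str]) -> str:
    """Transcribe a recording and wrap the text in a simple minutes layout."""
    names = ", ".join(attendees)
    # Speaker names go into the hint so they are more likely spelled correctly.
    hint = f"{title}. Speakers: {names}"
    transcript = transcribe_fn(audio_path, prompt=hint, timestamps=True)
    header = [title, f"Attendees: {names}", ""]
    return "\n".join(header + [transcript.strip()])
```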
Voice Notes Processing
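A sketch of batch-processing a set of voice notes: filter the paths by extension, transcribe each file, and collect the results by file name. The mapping from format names to file extensions is an assumption, and `transcribe_notes` is a hypothetical helper.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable

# Assumption: the supported formats correspond to these file extensions.
SUPPORTED_SUFFIXES = {".wav", ".mp3", ".aiff", ".aac", ".ogg", ".flac", ".m4a", ".webm"}


def transcribe_notes(paths: Iterable[str],
                     transcribe_fn: Callable[..., str]) -> Dict[str, str]:
    """Transcribe each supported file and return {file name: transcript}."""
    results = {}
    for raw_path in paths:
        path = Path(raw_path)
        if path.suffix.lower() not in SUPPORTED_SUFFIXES:
            continue  # skip non-audio files such as notes.txt
        results[path.name] = transcribe_fn(str(path))
    return results
```

To process a folder, the paths could come from something like `sorted(str(p) for p in Path("voice_notes").iterdir())`, where `voice_notes` is a placeholder directory.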
Use as Agent Tool
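One way to expose transcription to an agent is to wrap it in a plain function with type hints and a docstring. The factory `make_transcribe_tool` is hypothetical, and the registration shown in the trailing comments (`Agent(..., tools=[...])` and `agent.input(...)`) is an assumption about the ConnectOnion API that is not executed here.

```python
from typing import Callable


def make_transcribe_tool(transcribe_fn: Callable[..., str]) -> Callable[[str], str]:
    """Build a plain function that an agent can call as a tool."""

    def transcribe_audio(file_path: str) -> str:
        """Transcribe the audio file at file_path and return its text."""
        return transcribe_fn(file_path)

    return transcribe_audio


# Assumed registration (needs the package and credentials, so it is left as a comment):
#   from connectonion import Agent, transcribe
#   agent = Agent("assistant", tools=[make_transcribe_tool(transcribe)])
#   agent.input("Summarize the recording in meeting.mp3")
```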
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| audio | str | required | Path to audio file |
| prompt | str | None | Context hints for accuracy |
| model | str | "co/gemini-3-flash-preview" | Model to use |
| timestamps | bool | False | Include timestamps in output |
Supported Formats
WAV, MP3, AIFF, AAC, OGG, FLAC, M4A, WebM
Token cost: 32 tokens per second of audio (1 minute = 1,920 tokens)
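The rate above makes it easy to estimate cost before sending a file. A small sketch; `estimate_audio_tokens` is a hypothetical helper, and rounding partial seconds up is an assumption about how a fractional duration would be billed.

```python
import math

TOKENS_PER_SECOND = 32  # rate stated above


def estimate_audio_tokens(duration_seconds: float) -> int:
    """Estimate the token cost of a clip, rounding partial seconds up."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    return math.ceil(duration_seconds) * TOKENS_PER_SECOND
```

For example, `estimate_audio_tokens(60)` gives 1,920, matching the one-minute figure, and a one-hour recording comes to 115,200 tokens.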
Models
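A sketch of selecting a model explicitly through the `model` parameter. Only the default name from the parameters table is used; no other model names are assumed, and `transcribe_with_model` is a hypothetical wrapper.

```python
from typing import Callable

DEFAULT_MODEL = "co/gemini-3-flash-preview"  # default from the parameters table


def transcribe_with_model(audio_path: str, transcribe_fn: Callable[..., str],
                          model: str = DEFAULT_MODEL) -> str:
    """Forward an explicit model name to the transcription call."""
    return transcribe_fn(audio_path, model=model)
```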
What You Get
Simple API - One function for all transcription needs
Context hints - Improve accuracy with domain terms
Multiple formats - WAV, MP3, FLAC, and more
Timestamps - Optional time markers in output
Managed keys - Works out of the box with co/ models
Comparison with Agent
| Feature | transcribe() | Agent() |
|---|---|---|
| Purpose | Audio to text | Multi-step workflows |
| Input | Audio files | Text prompts |
| Output | Plain text | Agent responses |
| Best for | Transcription | Complex tasks |
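The two can be combined: transcribe first, then hand the text to an agent for a multi-step task. A minimal sketch in which `ask_about_audio` is a hypothetical helper, `transcribe_fn` stands in for `transcribe()`, and `ask_fn` stands in for an agent's text entry point (assumed to be something like `agent.input`).

```python
from typing import Callable


def ask_about_audio(audio_path: str, question: str,
                    transcribe_fn: Callable[..., str],
                    ask_fn: Callable[[str], str]) -> str:
    """Transcribe first, then pass the text to an agent-style callable."""
    transcript = transcribe_fn(audio_path)
    prompt = f"{question}\n\nTranscript:\n{transcript}"
    return ask_fn(prompt)
```

Keeping transcription as a separate step means the plain-text transcript can be cached or inspected before any agent work starts.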
Error Handling
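A defensive sketch, assuming nothing about which exceptions the library raises: check that the file exists before calling, and convert any failure into an error message. `safe_transcribe` is a hypothetical helper, and the broad `except Exception` is a placeholder to narrow once the actual exception types are known.

```python
from pathlib import Path
from typing import Callable, Optional, Tuple


def safe_transcribe(audio_path: str,
                    transcribe_fn: Callable[..., str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (text, None) on success or (None, error message) on failure."""
    if not Path(audio_path).is_file():
        return None, f"file not found: {audio_path}"
    try:
        return transcribe_fn(audio_path), None
    except Exception as exc:  # exact exception types are not documented here
        return None, f"transcription failed: {exc}"
```

Callers can then branch on the second element, for example `text, error = safe_transcribe("meeting.mp3", transcribe)`, instead of wrapping every call in its own `try` block.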
ConnectOnion