Audio Transcription
Convert audio files to text using Gemini's multimodal capabilities. Transcription is exposed through a single function, transcribe().
Quick Start
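A minimal sketch of the basic call. The import path for `transcribe` is not shown on this page, so the function is passed in as an argument here; `quick_start` and the file name `meeting.mp3` are placeholders for illustration.

```python
from typing import Callable


def quick_start(transcribe: Callable[..., str], audio_path: str = "meeting.mp3") -> str:
    """Transcribe one audio file with the default settings."""
    # `transcribe` is the function documented on this page, injected so the
    # sketch does not have to assume where it is imported from.
    text = transcribe(audio_path)
    print(text)
    return text
```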
That's it! One function for audio-to-text.
With Context Hints
Improve accuracy for domain-specific terms:
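One way this might look, assuming the hint is passed through the `prompt` parameter listed in the Parameters table. The helper name `transcribe_with_terms` and the wording of the hint are illustrative, not a required format, and `transcribe` is passed in because its import path is not shown here.

```python
from typing import Callable, Iterable


def transcribe_with_terms(
    transcribe: Callable[..., str],
    audio_path: str,
    terms: Iterable[str],
) -> str:
    """Transcribe audio, passing domain terms as a context hint."""
    # Listing the expected vocabulary gives the model a spelling reference
    # for names and jargon it might otherwise mishear.
    hint = "Domain-specific terms that may appear: " + ", ".join(terms)
    return transcribe(audio_path, prompt=hint)
```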
With Timestamps
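A short sketch, assuming timestamps are requested with the `timestamps` flag from the Parameters table. The layout of the time markers in the returned text is not specified on this page, so the hypothetical helper below returns the result unparsed.

```python
from typing import Callable


def transcribe_timestamped(transcribe: Callable[..., str], audio_path: str) -> str:
    """Request a transcript that includes time markers."""
    # `transcribe` is injected; its import path is not assumed here.
    return transcribe(audio_path, timestamps=True)
```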
Real Examples
Meeting Minutes
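A possible shape for this workflow, combining a context hint with timestamps and saving the result next to other notes. `save_meeting_transcript`, the hint wording, and the output naming are assumptions made for illustration; `transcribe` is passed in rather than imported.

```python
from pathlib import Path
from typing import Callable, List


def save_meeting_transcript(
    transcribe: Callable[..., str],
    audio_path: str,
    attendees: List[str],
    out_dir: str = ".",
) -> Path:
    """Transcribe a meeting recording and write the text to `<stem>.txt`."""
    # Attendee names go into the hint so they are more likely to be spelled correctly.
    hint = "Meeting recording. Attendees: " + ", ".join(attendees)
    text = transcribe(audio_path, prompt=hint, timestamps=True)

    out_path = Path(out_dir) / (Path(audio_path).stem + ".txt")
    out_path.write_text(text, encoding="utf-8")
    return out_path
```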
Voice Notes Processing
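A sketch of batch processing a folder of voice notes. The extension set is derived from the Supported Formats list on this page and is an assumption about file naming (for example, AIFF files may also use `.aif`); `transcribe_notes` is a hypothetical helper and `transcribe` is passed in.

```python
from pathlib import Path
from typing import Callable, Dict

# Assumed extensions for the formats listed under "Supported Formats".
SUPPORTED = {".wav", ".mp3", ".aiff", ".aac", ".ogg", ".flac", ".m4a", ".webm"}


def transcribe_notes(transcribe: Callable[..., str], folder: str) -> Dict[str, str]:
    """Transcribe every supported audio file in `folder`, keyed by file name."""
    results: Dict[str, str] = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip subdirectories and files that are not recognised audio formats.
        if path.is_file() and path.suffix.lower() in SUPPORTED:
            results[path.name] = transcribe(str(path))
    return results
```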
Use as Agent Tool
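One way to expose transcription as a tool is to wrap it in a plain function with a clear name and docstring. This is only a sketch: `make_transcribe_tool` and `transcribe_audio` are hypothetical names, and how the resulting function is registered with an agent is not shown on this page.

```python
from typing import Callable


def make_transcribe_tool(transcribe: Callable[..., str]) -> Callable[[str], str]:
    """Wrap `transcribe` as a single-argument function suitable for use as a tool."""

    def transcribe_audio(file_path: str) -> str:
        """Transcribe the audio file at `file_path` and return its text."""
        return transcribe(file_path)

    return transcribe_audio
```

The returned function can then be handed to an agent in whatever way your agent setup accepts tools.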
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| audio | str | required | Path to audio file |
| prompt | str | None | Context hints for accuracy |
| model | str | "co/gemini-3-flash-preview" | Model to use |
| timestamps | bool | False | Include timestamps in output |
Supported Formats
WAV, MP3, AIFF, AAC, OGG, FLAC, M4A, WebM
Token cost: 32 tokens per second of audio (1 minute = 1,920 tokens)
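The rate above makes it easy to estimate token usage before sending a file. A small sketch; the function name and the choice to round partial seconds up are assumptions.

```python
import math

TOKENS_PER_SECOND = 32  # rate stated above


def audio_tokens(duration_seconds: float) -> int:
    """Estimate the token count for a clip of the given length."""
    # Rounding up is an assumption; the page does not say how partial
    # seconds are counted.
    return math.ceil(duration_seconds * TOKENS_PER_SECOND)


print(audio_tokens(60))    # 1920, matching the one-minute figure above
print(audio_tokens(3600))  # 115200 for a one-hour recording
```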
Models
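A sketch of selecting a model explicitly through the `model` parameter. Only the default listed in the Parameters table is used here; other model names are not given on this page, so none are assumed. `transcribe_with_model` is a hypothetical helper and `transcribe` is passed in.

```python
from typing import Callable

# Default value listed in the Parameters table.
DEFAULT_MODEL = "co/gemini-3-flash-preview"


def transcribe_with_model(
    transcribe: Callable[..., str],
    audio_path: str,
    model: str = DEFAULT_MODEL,
) -> str:
    """Transcribe audio with an explicitly chosen model."""
    return transcribe(audio_path, model=model)
```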
What You Get
Simple API - One function for all transcription needs
Context hints - Improve accuracy with domain terms
Multiple formats - WAV, MP3, FLAC, and more
Timestamps - Optional time markers in output
Managed keys - Works out of the box with co/ models
Comparison with Agent
| Feature | transcribe() | Agent() |
|---|---|---|
| Purpose | Audio to text | Multi-step workflows |
| Input | Audio files | Text prompts |
| Output | Plain text | Agent responses |
| Best for | Transcription | Complex tasks |
Error Handling
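A defensive sketch that validates the input before calling `transcribe` and catches failures from the call itself. The exception types raised by the library are not documented on this page, so a broad `except Exception` is used as a stand-in; `safe_transcribe` and the extension set are assumptions.

```python
from pathlib import Path
from typing import Callable, Optional

# Assumed extensions for the formats listed under "Supported Formats".
SUPPORTED = {".wav", ".mp3", ".aiff", ".aac", ".ogg", ".flac", ".m4a", ".webm"}


def safe_transcribe(transcribe: Callable[..., str], audio_path: str) -> Optional[str]:
    """Return the transcript, or None if the input is invalid or the call fails."""
    path = Path(audio_path)
    if not path.is_file():
        print(f"File not found: {path}")
        return None
    if path.suffix.lower() not in SUPPORTED:
        print(f"Unsupported format: {path.suffix}")
        return None
    try:
        return transcribe(str(path))
    except Exception as exc:  # specific exception types are not documented here
        print(f"Transcription failed: {exc}")
        return None
```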
