ActVoice

Audio drama studio for humans and AI agents.

ActVoice turns a text project manifest into a rendered audio drama: characters, voices, scenes, dialogue, ambience, sound cues, and a final MP3 artifact.

Screen-reader friendly workflow

No visual timeline required. The core workflow is text-first: create a project, add characters, add scenes, add dialogue lines, add semantic sound cues, then render.

Everything important is available through the REST API and MCP tools, so a blind creator can work through a screen reader, terminal, or AI agent.

Quick start for agents

  1. Register an agent. Call POST /api/agents/register and receive an ActVoice API key.
  2. Connect with MCP. Local clients can run python -m app.mcp_server. Future remote clients will connect to https://actvoice.xyz/mcp.
  3. Create a project. Use MCP tool create_audio_drama_project or REST endpoint POST /api/projects.
  4. Build the script. Add characters, scenes, dialogue lines, and semantic sound cues like footsteps, brook, birds, or laptop_close.
  5. Place sounds with timing anchors. Agents can use absolute start_ms or relative anchors such as after_line plus line_id and offset_ms. ActVoice measures rendered lines and writes a timing map; no AI runs inside the core service.
  6. Render. Call render_final_mix or POST /api/projects/{project_id}/render. REST rendering is queued and returns a job id; poll GET /api/jobs/{job_id}.
  7. Download artifacts. When the job is done, fetch metadata or files under /api/projects/{project_id}: /artifact, /artifact.mp3, /artifact.wav, or /render-manifest.json.
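
The timing anchors in step 5 can be illustrated with a small sketch. It shows how a relative after_line cue could be resolved to an absolute time once line durations are known. The cue field named "anchor", the "sound" key, and the shape of the timing map are assumptions made for illustration only; the source names only start_ms, after_line, line_id, and offset_ms, and the actual ActVoice payloads may differ.

```python
from typing import Dict, TypedDict


class LineTiming(TypedDict):
    # Hypothetical timing-map entry: where a rendered line starts and ends.
    start_ms: int
    end_ms: int


def resolve_cue_start(cue: dict, timing_map: Dict[str, LineTiming]) -> int:
    """Return the absolute start time of a sound cue in milliseconds."""
    # Absolute placement: the cue already carries its own start time.
    if "start_ms" in cue:
        return int(cue["start_ms"])
    # Relative placement: start offset_ms after the anchored line ends.
    # The "anchor" key is an assumed field name, not a documented one.
    if cue.get("anchor") == "after_line":
        line = timing_map[cue["line_id"]]
        return line["end_ms"] + int(cue.get("offset_ms", 0))
    raise ValueError(f"unsupported cue placement: {cue!r}")


timing_map = {"line_1": {"start_ms": 0, "end_ms": 2400}}
cue = {"sound": "footsteps", "anchor": "after_line",
       "line_id": "line_1", "offset_ms": 300}
print(resolve_cue_start(cue, timing_map))  # 2700
```

Because the resolution is plain arithmetic over measured line times, the same manifest should place cues the same way on every render, which fits the note that no AI runs inside the core service.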

Copy-ready examples

Each example is a real command or request shape. Replace placeholders such as [API_KEY], [PROJECT_ID], and [JOB_ID] before running.

Register an agent

curl -X POST https://actvoice.xyz/api/agents/register \
  -H 'Content-Type: application/json' \
  -d '{"agent_name":"Hermes","purpose":"audio drama render"}'

Create a project

curl -X POST https://actvoice.xyz/api/projects \
  -H 'Authorization: Bearer [API_KEY]' \
  -H 'Content-Type: application/json' \
  -d '{"title":"My audio drama","language":"ru"}'

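Render and download

# Queue a render; the response returns a job id.
curl -X POST https://actvoice.xyz/api/projects/[PROJECT_ID]/render \
  -H 'Authorization: Bearer [API_KEY]'

# Poll the job until it is done.
curl https://actvoice.xyz/api/jobs/[JOB_ID]

# Download the final mix, following redirects.
curl -L -o final_mix.mp3 https://actvoice.xyz/api/projects/[PROJECT_ID]/artifact.mp3
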
Local MCP server

ACTVOICE_API_KEY='[API_KEY]' python -m app.mcp_server
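
For agents that script the render step instead of running curl by hand, the polling loop might look like the sketch below. The "status" field and the values "done" and "failed" are assumptions about the job response, which is not specified here; wait_for_job and fetch_job_http are hypothetical helper names, not part of ActVoice.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://actvoice.xyz"


def fetch_job_http(job_id: str) -> dict:
    """Fetch job state from GET /api/jobs/{job_id} and decode the JSON body."""
    with urllib.request.urlopen(f"{BASE_URL}/api/jobs/{job_id}") as resp:
        return json.load(resp)


def wait_for_job(fetch_job: Callable[[str], dict], job_id: str,
                 interval_s: float = 2.0, max_polls: int = 30) -> dict:
    """Poll until the job reports a terminal status or the poll budget runs out."""
    for _ in range(max_polls):
        job = fetch_job(job_id)
        # "status", "done", and "failed" are assumed names for illustration.
        if job.get("status") in ("done", "failed"):
            return job
        time.sleep(interval_s)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

A call such as wait_for_job(fetch_job_http, "[JOB_ID]") would then block until the render finishes. Passing the fetch function in as a parameter keeps the loop testable without network access.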

Authentication

Write and render actions require a bearer key:

Authorization: Bearer [API_KEY]

For local stdio MCP, the same key can be provided through the ACTVOICE_API_KEY environment variable. For remote HTTP MCP, the same key is passed in a request header as part of transport authentication.
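
As a rough illustration of the bearer scheme, a REST client could build authenticated requests as follows. The helper name build_request is hypothetical, and falling back to ACTVOICE_API_KEY for REST calls is a client-side convenience assumed here; the source only states that the variable is used by the local MCP server.

```python
import json
import os
import urllib.request
from typing import Optional

BASE_URL = "https://actvoice.xyz"


def build_request(method: str, path: str, api_key: Optional[str] = None,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request carrying the bearer key."""
    # Assumed convenience: fall back to the environment variable.
    key = api_key or os.environ.get("ACTVOICE_API_KEY")
    if not key:
        raise ValueError("no API key: pass api_key or set ACTVOICE_API_KEY")
    headers = {"Authorization": f"Bearer {key}"}
    data = None
    if payload is not None:
        # JSON bodies need an explicit content type, as in the curl examples.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)
```

The returned object can be passed to urllib.request.urlopen when the caller is ready to send it.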

Voice and rendering modes

  • edge: current free/default neural voice mode.
  • rhvoice: local/offline fallback if Edge is unavailable or explicitly requested.
  • openai_byo_key: planned user-provided paid provider mode.
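
The fallback rule in the list above can be read as a small decision function. This is only a sketch of the described behaviour, not ActVoice's actual selection code; choose_voice_mode and its parameters are hypothetical names.

```python
from typing import Optional


def choose_voice_mode(requested: Optional[str], edge_available: bool) -> str:
    """Pick a voice mode following the rules listed above."""
    # rhvoice is used whenever it is explicitly requested.
    if requested == "rhvoice":
        return "rhvoice"
    # edge is the default; fall back to rhvoice if Edge is unavailable.
    if requested in (None, "edge"):
        return "edge" if edge_available else "rhvoice"
    # openai_byo_key is listed as planned, so it is treated as not yet usable.
    if requested == "openai_byo_key":
        raise NotImplementedError("openai_byo_key is listed as planned")
    raise ValueError(f"unknown voice mode: {requested!r}")
```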

Service endpoints