This is a submission for the AssemblyAI Voice Agents Challenge: Domain Expert Voice Agent.


🧠 What I Built

As a philosophy graduate, I’ve always enjoyed discussing ideas that help make life more meaningful. So, for this challenge, I built a Philosophy Voice AI Agent using Flask, AssemblyAI, and the Gemini API.

This voice-based web app allows users to ask philosophical questions and receive thoughtful spoken responses, making it feel like you're having a conversation with Socrates himself.

Tech Stack Used:

  • Flask: Core backend framework
  • Gemini API: To generate thoughtful philosophical replies
  • AssemblyAI: For transcribing voice to text asynchronously
  • JavaScript: To handle voice recording and speech output
  • AWS EC2 & Nginx: For secure deployment and hosting

Application Workflow

  1. User clicks Start Recording and speaks a question
  2. The recorded audio is sent to AssemblyAI for transcription
  3. The transcribed text is passed to the Gemini API, which generates a philosophical reply
  4. The response is rendered on the screen and also spoken aloud using JavaScript’s Speech Synthesis API
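
The four steps above can be sketched as a single function. This is a minimal sketch, not the code in app.py: `handle_question`, `transcribe`, `generate_reply`, and `AgentReply` are hypothetical names, and the two service calls are passed in as plain callables so the flow can be read and exercised without API keys.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AgentReply:
    question: str  # text transcribed from the user's recording
    answer: str    # reply text to render and hand to the browser for speech


def handle_question(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    generate_reply: Callable[[str], str],
) -> AgentReply:
    """Run steps 2-4 of the workflow for one recording."""
    if not audio:
        raise ValueError("no audio received")

    # Step 2: speech to text (AssemblyAI in this project).
    question = transcribe(audio).strip()
    if not question:
        raise ValueError("nothing was transcribed")

    # Step 3: text to philosophical reply (Gemini in this project).
    answer = generate_reply(question).strip()

    # Step 4: return the text; the frontend displays it and speaks it.
    return AgentReply(question=question, answer=answer)


if __name__ == "__main__":
    # Stand-ins for the real services, for illustration only.
    reply = handle_question(
        b"fake-wav-bytes",
        transcribe=lambda _audio: "What is virtue?",
        generate_reply=lambda q: f"Let us examine the question: {q}",
    )
    print(reply.answer)
```

Passing the services in as arguments keeps the route handler thin: the Flask view only has to read the uploaded audio, call a function like this, and return the result as JSON.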

💻 Demo

The application is live at:
👉 https://philosophy.praveshsudha.com
It’s hosted on an AWS EC2 instance with Nginx as a reverse proxy.


Watch the full video walkthrough here 👇

Note: the video was recorded earlier, so it doesn't cover AssemblyAI's Universal-Streaming 😅


📁 GitHub Repository

Pravesh-Sudha / dev-to-challenges

Registry to Store all my code related to Dev.TO Challenges

🏗️ Dev.to Challenges – by Pravesh Sudha

This repository contains my submissions for various Dev.to Challenges. Each folder in this repo includes a hands-on project built around specific tools, APIs, or themes, from infrastructure to frontend and AI voice agents.


📁 Projects

⚙️ pulumi-challenge/

An infrastructure-as-code project built using Pulumi.
It automates cloud infrastructure setup using Python and TypeScript across AWS services.

🎨 frontend-challenge/

A UI/UX-focused project that demonstrates creative frontend solutions using HTML, CSS, and JavaScript, optimized for responsiveness and accessibility.

📩 postmark-challenge/

A transactional email solution built with the Postmark API, showcasing email templates, delivery tracking, and webhook handling.

🧠 philo-agent/

A voice-based AI Philosopher built with AssemblyAI + Gemini, part of the World’s Largest Hackathon.


🗂️ Project Structure

dev-to-challenges/
│
├── pulumi-challenge/
├── frontend-challenge/
├── postmark-challenge/
├── philo-agent/
└── README.md

🙌 Why This Repo?

This repo is my playground to:

  • …

Navigate to the philo-agent directory for all project files.

Folder & File Structure

  • app.py: Flask app entry point
  • services/transcription.py: Uses AssemblyAI's Universal-Streaming with a domain-specific vocabulary for more accurate recognition of philosophical terms
  • services/gemini.py: Fetches philosophical responses
  • static/: Contains frontend assets (JS, favicon, background image)
  • templates/index.html: HTML template with embedded CSS
  • venv/: Virtual environment
  • requirements.txt: All Python dependencies

🚀 Deployment with EC2 & Nginx

To make deployment easier, I wrote a simple bash script that:

  • Installs required packages
  • Sets up a Python virtual environment
  • Configures Gunicorn and Systemd
  • Creates an Nginx config
  • Secures the site using Let’s Encrypt SSL

Here's the full script:

#!/bin/bash

# Update system

sudo apt update -y
sudo apt upgrade -y

# Install Python, pip, venv, nginx, git

sudo apt install -y python3 python3-pip python3-venv nginx git

# Clone your GitHub project (REPLACE with your repo)

cd /home/ubuntu
git clone https://github.com/Pravesh-Sudha/dev-to-challenges.git
cd dev-to-challenges/philo-agent

# Set up Python virtual environment

python3 -m venv venv
source venv/bin/activate

# Install requirements

pip install -r requirements.txt
pip install gunicorn

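# Optional manual test: run Gunicorn once and stop it with Ctrl+C after checking.
# Left commented out because it would block the rest of the script.

# gunicorn -w 4 app:app --bind 0.0.0.0:8000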

# Set up systemd service for gunicorn

sudo tee /etc/systemd/system/voiceapp.service > /dev/null <<EOF
[Unit]
Description=Gunicorn instance to serve Philosophy Voice App
After=network.target

[Service]
User=ubuntu
Group=www-data
WorkingDirectory=/home/ubuntu/dev-to-challenges/philo-agent
Environment="PATH=/home/ubuntu/dev-to-challenges/philo-agent/venv/bin"
ExecStart=/home/ubuntu/dev-to-challenges/philo-agent/venv/bin/gunicorn --workers 4 --bind 127.0.0.1:8000 app:app

[Install]
WantedBy=multi-user.target
EOF

# Enable and start the Gunicorn service

sudo systemctl daemon-reexec
sudo systemctl daemon-reload
sudo systemctl start voiceapp
sudo systemctl enable voiceapp

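# Configure Nginx
# The quoted 'EOF' stops the shell from expanding Nginx variables such as $host.
# The ssl_* lines assume Certbot has already issued a certificate for this domain.

sudo tee /etc/nginx/sites-available/voiceapp > /dev/null <<'EOF'
server {
    server_name philosophy.praveshsudha.com;

    location / {
        proxy_pass http://127.0.0.1:8000;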
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_cache_bypass $http_upgrade;
    }

    location /static/ {
        alias /home/ubuntu/dev-to-challenges/philo-agent/static/;
    }

    client_max_body_size 20M;

    access_log /var/log/nginx/voiceapp_access.log;
    error_log /var/log/nginx/voiceapp_error.log;


    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/philosophy.praveshsudha.com/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/philosophy.praveshsudha.com/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot

}
server {
    if ($host = philosophy.praveshsudha.com) {
        return 301 https://$host$request_uri;
    } # managed by Certbot


    listen 80;
    server_name philosophy.praveshsudha.com;
    return 404; # managed by Certbot


}
EOF

# Set correct permissions for all files
sudo chmod -R 755 /home/ubuntu/dev-to-challenges/philo-agent/static

# Make sure all files are owned by the same user running the app (usually ubuntu)
sudo chown -R ubuntu:ubuntu /home/ubuntu/dev-to-challenges/philo-agent/static
sudo chmod +x /home/ubuntu
sudo chmod +x /home/ubuntu/dev-to-challenges
sudo chmod +x /home/ubuntu/dev-to-challenges/philo-agent


# Enable Nginx config

sudo ln -sf /etc/nginx/sites-available/voiceapp /etc/nginx/sites-enabled/
sudo rm -f /etc/nginx/sites-enabled/default
sudo nginx -t && sudo systemctl restart nginx

echo "✅ Deployment complete. Access your app via EC2 public IP!"

This setup helps run the Flask app efficiently behind a secure HTTPS connection.
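
Once the services are up, a quick request from any machine can confirm that Nginx is forwarding traffic to Gunicorn. The following is a minimal sketch using only the Python standard library; `smoke_check` is a hypothetical helper that is not part of the repository, and it assumes the app answers a plain GET on `/`.

```python
import urllib.error
import urllib.request


def smoke_check(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET on url, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status
        # (for example 502 from Nginx when Gunicorn is down).
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, or TLS error.
        return 0


if __name__ == "__main__":
    status = smoke_check("https://philosophy.praveshsudha.com/")
    print("OK" if status == 200 else f"Problem: status {status}")
```

A 502 usually points at the Gunicorn service (check `systemctl status voiceapp`), while a 0 points at DNS, the security group, or the certificate.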


🧠 AssemblyAI Integration

The transcription.py file streams audio from a WAV file and transcribes it in real time using AssemblyAI’s Universal-Streaming model. It is optimized for philosophical conversations by including a custom vocabulary of domain-specific terms (e.g., "Nietzsche", "epistemology").

Here’s a short snippet:

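import asyncio
import wave

import assemblyai as aai

# PHILOSOPHY_PHRASES (the custom vocabulary list) is defined elsewhere in transcription.py.


async def simulate_audio_stream(file_path, chunk_size=3200):
    """Yield the WAV file in small chunks, pausing between them to mimic a live stream."""
    with wave.open(file_path, 'rb') as wf:
        while True:
            # readframes() counts frames, not bytes.
            data = wf.readframes(chunk_size)
            if not data:
                break
            yield data
            await asyncio.sleep(0.08)


async def transcribe_audio_stream(file_path):
    """Stream the file to AssemblyAI and return the concatenated final transcripts."""
    config = aai.RealtimeConfig(
        language_code="en_us",
        custom_vocabulary=PHILOSOPHY_PHRASES,
        speech_model="universal-v2",
        disfluencies=False,
        punctuate=True
    )

    transcriber = aai.RealtimeTranscriber(config=config)
    transcript_text = ""

    async def on_data(transcript: aai.RealtimeTranscript):
        nonlocal transcript_text
        # Keep only finalized segments; partial results are skipped.
        if isinstance(transcript, aai.RealtimeFinalTranscript):
            transcript_text += transcript.text + " "

    await transcriber.connect()
    transcriber.on("transcript", on_data)

    # Send the audio chunk by chunk, then close the session.
    async for chunk in simulate_audio_stream(file_path):
        await transcriber.send(chunk)

    await transcriber.close()
    return transcript_text.strip()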
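
The pacing in `simulate_audio_stream` depends on the file's sample rate: `readframes(chunk_size)` counts frames, not bytes, so each chunk covers `chunk_size / framerate` seconds of audio. The sketch below reads that value from the WAV header instead of hard-coding the sleep. `chunk_duration_seconds` and `make_silent_wav` are hypothetical helpers, and the 16 kHz mono file in the usage example is an assumption, not a requirement stated by the project.

```python
import io
import wave


def chunk_duration_seconds(wav_file: wave.Wave_read, chunk_frames: int) -> float:
    """Seconds of audio covered by one readframes(chunk_frames) call."""
    return chunk_frames / wav_file.getframerate()


def make_silent_wav(seconds: float, framerate: int = 16000) -> bytes:
    """Build a mono 16-bit WAV of silence in memory, for demonstration."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(framerate)
        wf.writeframes(b"\x00\x00" * int(seconds * framerate))
    return buf.getvalue()


if __name__ == "__main__":
    with wave.open(io.BytesIO(make_silent_wav(1.0)), "rb") as wf:
        # 3200 frames at 16 kHz cover 0.2 s of audio per chunk.
        print(chunk_duration_seconds(wf, 3200))
```

Under that 16 kHz assumption, the snippet's defaults (3200 frames per chunk, 0.08 s sleep) would send audio about 2.5 times faster than real time; sleeping for the computed duration instead keeps the stream at playback speed.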

I was genuinely impressed with how smoothly AssemblyAI worked; everything just clicked on the first try.


🧘🏻‍♂️ Conclusion

Thanks to Dev.to and AssemblyAI for hosting this challenge. It gave me the perfect reason to build a project that aligns with both my technical and philosophical interests.

With this project, I now have a digital buddy to discuss life, existence, and purpose.

If you found this useful, leave a reaction, share your thoughts in the comments, and follow me!


🔗 Connect with Me