TL;DR: I built and open-sourced a production-ready AI platform that combines chat, image analysis, video analysis, and website generation. It uses free models where possible and costs roughly $0–5/month to run, depending on vision API usage. Live demo: hocks.app | GitHub: github.com/x-tahosin/hocks-ai
Why I Built This
Every AI tool I tried was either:
- Too expensive — GPT-4 API bills adding up fast
- Single-purpose — chat OR image analysis, never both
- Closed source — no way to learn from the architecture
I wanted a single platform that handles multiple AI modalities, uses the best free models available, and is fully open-source so other developers can learn from it.
The result is HOCKS AI — a multi-modal AI assistant platform.
🔗 Live: hocks.app
📦 Source: github.com/x-tahosin/hocks-ai
What It Does
| Feature | AI Model | Monthly Cost |
|---|---|---|
| 💬 Streaming Chat | OpenRouter GPT-OSS-120B (free) | $0 |
| 🌐 Website Generator | OpenRouter Nemotron-3 120B (free) | $0 |
| 🖼️ Image Analysis | Google Gemini 2.0 Flash | ~$0.002/call |
| 🎬 Video Analysis | Google Gemini 2.0 Flash | ~$0.003/call |
| 🧠 Memory System | Firebase Firestore | $0 (free tier) |
| 🔐 Auth + Admin | Firebase Auth | $0 |
Total monthly cost: ~$0–5 depending on vision API usage.
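To make that range concrete, here is a rough sketch of the arithmetic, using the approximate per-call prices from the table. The helper name `estimateMonthlyCost` and the sample call volumes are illustrative assumptions, not measurements from HOCKS AI.

```javascript
// Approximate per-call prices in USD, copied from the feature table.
const PRICE_PER_CALL = {
  chat: 0,      // free OpenRouter model
  website: 0,   // free OpenRouter model
  image: 0.002, // Gemini 2.0 Flash (approximate)
  video: 0.003, // Gemini 2.0 Flash (approximate)
};

// Hypothetical helper: estimate a monthly bill from call counts per feature.
function estimateMonthlyCost(callsPerMonth) {
  let total = 0;
  for (const [feature, calls] of Object.entries(callsPerMonth)) {
    if (!Object.hasOwn(PRICE_PER_CALL, feature)) {
      throw new Error(`Unknown feature: ${feature}`);
    }
    total += PRICE_PER_CALL[feature] * calls;
  }
  // Round to cents to hide floating-point noise.
  return Math.round(total * 100) / 100;
}

// Sample volumes are made up for illustration.
const estimate = estimateMonthlyCost({ chat: 5000, image: 1000, video: 500 });
console.log(estimate); // 3.5
```

With these made-up volumes, 1,000 image calls and 500 video calls come to about $3.50, while any amount of chat traffic on the free models adds nothing.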
The Hybrid Model Strategy
This is the key architectural decision. Instead of paying for one expensive model for everything, I split by capability:
Free Models for Text Tasks
Chat + Code Generation → OpenRouter API
├── openai/gpt-oss-120b:free (120B params, conversational)
└── nvidia/nemotron-3-super-120b-a12b:free (code generation)
These free 120B-parameter models hold up well in production for text tasks. GPT-OSS-120B handles context tracking, nuanced responses, and multi-turn dialogue; Nemotron-3 is strong at code generation and can build full websites from prompts.
Paid Models for Vision Tasks
Image + Video Analysis → Google Gemini 2.0 Flash
├── analyzeImage (~$0.002/call)
└── analyzeVideo (~$0.003/call)
Free models simply can't match Gemini's multimodal capabilities yet. Image understanding, OCR, visual reasoning — Gemini 2.0 Flash delivers production-quality results at extremely low per-call costs.
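The split can be expressed as a small routing table. This is a minimal sketch rather than the actual HOCKS AI code: the model IDs are the ones named in this post, while the task names, the `provider` labels, and `routeTask` are assumptions made for illustration.

```javascript
// Task -> model routing. Model IDs come from this post; task names are hypothetical.
const MODEL_ROUTES = {
  chat: { provider: "openrouter", model: "openai/gpt-oss-120b:free", paid: false },
  website: { provider: "openrouter", model: "nvidia/nemotron-3-super-120b-a12b:free", paid: false },
  image: { provider: "gemini", model: "gemini-2.0-flash", paid: true },
  video: { provider: "gemini", model: "gemini-2.0-flash", paid: true },
};

// Look up the route for a task, failing loudly on anything unexpected.
function routeTask(task) {
  if (!Object.hasOwn(MODEL_ROUTES, task)) {
    throw new Error(`No model configured for task: ${task}`);
  }
  return MODEL_ROUTES[task];
}

console.log(routeTask("chat").model);  // openai/gpt-oss-120b:free
console.log(routeTask("image").model); // gemini-2.0-flash
```

Keeping the mapping in one place means swapping a model, for example when a free vision model becomes good enough, is a one-line change.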
Architecture Deep Dive
┌─────────────────────────────────────────────┐
│         Frontend (React 18 + Vite)          │
│        Firebase Hosting / hocks.app         │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│     Firebase Cloud Functions (Node 20)      │
├─────────────────────────────────────────────┤
│ streamChat ────► OpenRouter (GPT-OSS-120B)  │
│ generateCode ──► OpenRouter (Nemotron-3)    │
│ analyzeImage ──► Google Gemini 2.0 Flash    │
│ analyzeVideo ──► Google Gemini 2.0 Flash    │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│              Firebase Services              │
│ • Firestore (users, memories, analytics)    │
│ • Authentication (Google + Email/Pass)      │
│ • Secret Manager (all API keys)             │
│ • Storage (file uploads)                    │
└─────────────────────────────────────────────┘
Key Design Decisions
1. Zero API Keys in Frontend
Every AI call is proxied through Firebase Cloud Functions. API keys are stored in Secret Manager and bound to a function only at runtime: they are never hardcoded, never committed in .env files, and never shipped in client code.
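// Cloud Function reads the secret at runtime
// (imports assume the firebase-functions v2 API and the @google/generative-ai SDK)
const { onCall } = require("firebase-functions/v2/https");
const { defineSecret } = require("firebase-functions/params");
const { GoogleGenerativeAI } = require("@google/generative-ai");

const geminiApiKey = defineSecret("GEMINI_API_KEY");

exports.analyzeImage = onCall(
  { secrets: [geminiApiKey] },
  async (request) => {
    // The key is only available server-side, inside the handler
    const genAI = new GoogleGenerativeAI(geminiApiKey.value());
    const model = genAI.getGenerativeModel({ model: "gemini-2.0-flash" });
    // ...
  }
);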
2. SSE Streaming for Real-Time Chat
Instead of waiting for the full response, the chat streams tokens in real-time using Server-Sent Events:
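// Server: forward each chunk from OpenRouter as it arrives
const reader = orResponse.body.getReader();
const decoder = new TextDecoder();
let fullText = "";
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Simplified: a real handler would first parse OpenRouter's own stream format
  const text = decoder.decode(value, { stream: true });
  fullText += text;
  res.write(`data: ${JSON.stringify({ text, fullText })}\n\n`);
}
res.end();

// Client: render tokens as they arrive
eventSource.onmessage = (event) => {
  const { text } = JSON.parse(event.data);
  updateChatUI(text); // Instant visual feedback
};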
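One detail worth spelling out is that network chunks do not line up with SSE frames: a single `data:` frame can arrive split across two reads. The sketch below shows one way to encode frames and to parse them incrementally with a buffer. `formatSseEvent` and `createSseParser` are hypothetical helpers, not functions from the HOCKS AI codebase, and the parser handles only `data:` lines.

```javascript
// Encode one JSON payload as a Server-Sent Events frame.
function formatSseEvent(payload) {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// Hypothetical incremental parser: buffers partial frames between chunks
// and calls onEvent once per complete frame.
function createSseParser(onEvent) {
  let buffer = "";
  return function feed(chunk) {
    buffer += chunk;
    let boundary;
    // A blank line ("\n\n") terminates each frame.
    while ((boundary = buffer.indexOf("\n\n")) !== -1) {
      const frame = buffer.slice(0, boundary);
      buffer = buffer.slice(boundary + 2);
      // Keep only data lines, dropping the field name and one optional space.
      const data = frame
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, ""))
        .join("\n");
      if (data) onEvent(JSON.parse(data));
    }
  };
}

// Usage: two frames, delivered in chunks that cut the first frame in half.
const received = [];
const feed = createSseParser((event) => received.push(event));
const stream = formatSseEvent({ text: "Hel" }) + formatSseEvent({ text: "lo" });
feed(stream.slice(0, 10));
feed(stream.slice(10));
console.log(received.map((e) => e.text).join("")); // Hello
```

A parser like this is mainly useful when the client reads the stream with `fetch` instead of `EventSource`, which does the framing itself but can only issue GET requests.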
3. Per-User Memory System
The AI remembers context across sessions. Users can save memories that persist in Firestore and are injected into every AI conversation:
// Inject memories into system prompt
let systemContent = SYSTEM_PROMPT;
if (memories.length > 0) {
  systemContent += "\n\n=== USER'S SAVED MEMORIES ===\n";
  memories.forEach((mem, i) => {
    systemContent += `${i + 1}. ${mem.content}\n`;
  });
}
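Because every saved memory is added to every request, it can help to bound how much text gets injected. The following is a sketch of the same injection as a pure function with two caps. The limits `MAX_MEMORIES` and `MAX_MEMORY_CHARS` and the name `buildSystemPrompt` are assumptions for illustration; the post does not say whether HOCKS AI applies any limit.

```javascript
// Hypothetical limits; tune them to the model's context window.
const MAX_MEMORIES = 20;
const MAX_MEMORY_CHARS = 300;

// Build the system prompt from a base prompt plus the user's saved memories.
function buildSystemPrompt(basePrompt, memories) {
  const usable = memories
    // Collapse whitespace so a memory cannot break out onto its own lines.
    .map((mem) => String(mem.content ?? "").replace(/\s+/g, " ").trim())
    .filter((content) => content.length > 0)
    .slice(0, MAX_MEMORIES)
    .map((content) => content.slice(0, MAX_MEMORY_CHARS));

  if (usable.length === 0) return basePrompt;

  const lines = usable.map((content, i) => `${i + 1}. ${content}`);
  return `${basePrompt}\n\n=== USER'S SAVED MEMORIES ===\n${lines.join("\n")}\n`;
}

const prompt = buildSystemPrompt("You are a helpful assistant.", [
  { content: "Prefers   TypeScript examples" },
  { content: "   " }, // blank memories are skipped
]);
console.log(prompt);
```

Collapsing whitespace is a small defensive measure: a memory containing newlines could otherwise imitate a new section of the system prompt.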
4. Admin Dashboard with Cost Tracking
Built-in analytics track every API call in real-time:
- Usage counters per feature (chat, image, video, website)
- Daily cost breakdown with budget alerts
- Feature toggles — disable any AI feature instantly
- Audit logging for all admin actions
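The counters, daily cost, budget alert, and feature toggles fit together roughly as sketched below. This is an in-memory illustration only: the real dashboard persists analytics in Firestore, and `createUsageTracker`, its option names, and the sample prices and budget are all assumptions.

```javascript
// Hypothetical in-memory tracker; the real app stores analytics in Firestore.
function createUsageTracker({ prices, dailyBudget }) {
  const days = new Map();     // "YYYY-MM-DD" -> { calls: { feature: count }, cost }
  const disabled = new Set(); // features switched off by an admin

  const dayKey = (date) => date.toISOString().slice(0, 10);

  return {
    // Feature toggle: disable or re-enable a feature instantly.
    setEnabled(feature, enabled) {
      if (enabled) disabled.delete(feature);
      else disabled.add(feature);
    },
    // Count one call. Returns false, recording nothing, if the feature is off.
    record(feature, date = new Date()) {
      if (disabled.has(feature)) return false;
      const key = dayKey(date);
      const day = days.get(key) ?? { calls: {}, cost: 0 };
      day.calls[feature] = (day.calls[feature] ?? 0) + 1;
      day.cost += prices[feature] ?? 0;
      days.set(key, day);
      return true;
    },
    // Daily breakdown plus a simple budget alert flag.
    report(date = new Date()) {
      const day = days.get(dayKey(date)) ?? { calls: {}, cost: 0 };
      return { calls: { ...day.calls }, cost: day.cost, overBudget: day.cost > dailyBudget };
    },
  };
}

const tracker = createUsageTracker({
  prices: { chat: 0, image: 0.002, video: 0.003 },
  dailyBudget: 0.004, // deliberately tiny so the alert fires in this example
});
const day = new Date("2025-01-15T12:00:00Z");
tracker.record("image", day);
tracker.record("video", day);
tracker.setEnabled("video", false);
tracker.record("video", day); // ignored: feature is disabled
console.log(tracker.report(day));
```

In production the counters would be incremented inside each Cloud Function after a successful API call, so the dashboard reflects what was actually billed.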
Security Architecture
| Layer | Implementation |
|---|---|
| API Keys | Firebase Secret Manager (never in code) |
| Data Isolation | Firestore rules enforce per-user access |
| Admin Access | Custom claims + email verification |
| Authentication | Firebase Auth (Google + email/password) |
| Audit Trail | Every admin action logged with timestamp |
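The admin check in the table could look something like the guard below, applied to a decoded Firebase ID token on the server. `email_verified` is a standard field on Firebase ID tokens; the custom claim name `admin` and the helper names are assumptions, since the post does not name the claim HOCKS AI uses.

```javascript
// Hypothetical guard over a decoded Firebase ID token.
// `admin` is an assumed custom claim name; `email_verified` is a standard token field.
function isAdmin(decodedToken) {
  return Boolean(
    decodedToken &&
      decodedToken.admin === true &&
      decodedToken.email_verified === true
  );
}

// Hypothetical audit entry matching the "logged with timestamp" row.
function auditEntry(decodedToken, action, now = new Date()) {
  if (!isAdmin(decodedToken)) {
    throw new Error("Admin access required");
  }
  return { uid: decodedToken.uid, action, timestamp: now.toISOString() };
}

const token = { uid: "user-123", admin: true, email_verified: true };
console.log(isAdmin(token));                                   // true
console.log(isAdmin({ ...token, email_verified: false }));     // false
console.log(auditEntry(token, "disable:video").action);        // disable:video
```

Checking both conditions with strict equality means a missing or string-valued claim is treated as "not an admin" rather than slipping through.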
Tech Stack
| Layer | Technology |
|---|---|
| Frontend | React 18, Vite, CSS3 (Glassmorphism dark UI) |
| Backend | Firebase Cloud Functions (Node.js 20) |
| AI Engine | Google Gemini 2.0 Flash + OpenRouter (free models) |
| Database | Cloud Firestore |
| Auth | Firebase Authentication |
| Hosting | Firebase Hosting (custom domain) |
| Secrets | Firebase Secret Manager |
Get Started in 5 Minutes
# Clone
git clone https://github.com/x-tahosin/hocks-ai.git
cd hocks-ai
# Install
cd functions && npm install && cd ..
# Set your API keys securely
firebase functions:secrets:set GEMINI_API_KEY
firebase functions:secrets:set OPENROUTER_API_KEY
# Deploy everything
firebase deploy
You need:
- Node.js 20+
- Firebase CLI (`npm i -g firebase-tools`)
- A Gemini API key from ai.google.dev (free)
- An OpenRouter API key from openrouter.ai (free models available)
What I Learned
- Free AI models are production-viable — 120B parameter models handle conversational AI surprisingly well
- Hybrid strategies save money — use free for text, paid only for vision
- Firebase Secret Manager > .env files — proper secret management matters in production
- SSE streaming transforms UX — users seeing real-time responses feels dramatically better than waiting
- Cost tracking from day one — know exactly where every dollar goes
Try It
- 🔗 Live demo: hocks.app
- 📦 Source code: github.com/x-tahosin/hocks-ai
- ⭐ Star the repo if you find it useful!
What free AI models are you using in production? I'd love to hear about your hybrid model strategies in the comments.