Alexa? Siri? They Couldn’t Even Intern for Tony Stark
I’ve always been a big fan of Iron Man. Comics, movies, the whole shebang. And like every self-respecting geek with a taste for sarcasm and gadgets, I’ve always wanted my own Jarvis. You know — an assistant that’s more than just a glorified timer who occasionally tells you the weather in a robotic whisper.
But then came Alexa. And Siri. And Google Assistant. And let’s be honest — none of them could even make coffee, let alone help manage a high-tech lab or fire up a repulsor beam. They’re basically the AI equivalent of “that one intern who keeps asking where the stapler is.” Functional, sure — but exciting? Hardly.
Now, I happen to be the lead product designer at a company that works with AI. We’re building tools that rethink what digital agents can actually do — especially in recruiting. I’ve seen firsthand how capable modern AI can be when it isn’t watered down into something that needs to live in a toaster. So one day I thought: why am I waiting for Amazon or Apple to catch up when I can just build my own Jarvis?
The Start: A Glorified Raspberry Pi with a Dream
It started with a Raspberry Pi, a 5-inch portrait touchscreen, and my stubborn obsession with sci-fi realism. I knew I wanted something always-on, something I could talk to naturally. It had to have a face, a voice, and — most importantly — an attitude.
The voice interface came first. I made it so that when I say “Hey Jarvis,” it listens, transcribes my voice, wraps it in pre- and post-prompt modifiers (for personality, of course), sends it off to OpenAI, and reads the answer back out loud. It now supports multiple voice engines — from the default espeak, to espeak-ng, and even Google’s gTTS — so users can choose their preferred sound. Whether you want it snarky, smooth, or sci-fi weird, Jarvis delivers. It listens, transcribes my voice, wraps it in pre- and post-prompt modifiers (for personality, of course), sends it off to OpenAI, and reads the answer back out loud. And no, not in a monotone. In style. With sarcasm. Always addressing me as “MR. Stark.”
Because anything less would be disrespectful.
Building a Face for an Assistant
But Jarvis couldn’t just sound like Jarvis. He needed a face — or at least something to stare at me while thinking about how to answer.
So I built a visual system using state-based video loops: idle, listening, thinking, speaking, waking up, shutting down. Each state randomly selects a video from a library of variants (idle1.mp4, idle2.mp4, etc.) so it feels organic, not robotic.
The effect? A little creepy. A lot awesome. Feels like the assistant is actually alive and reacting — not just waiting for input like a tired cash register.
The quality? Well, far from perfect. It has it’s own mind regarding going fullscreen or not, flickers, closes and opens randomly time to time, but it’s near stable. We are just learning to walk, we’ll run when Ultron is chasing us.
From Personal to Personalised
At first, it was just for me. A little desktop sidekick, custom-built to stroke my Tony Stark delusions.
But the product designer in me wouldn’t shut up. Every time I used it, I found myself wanting to change the voice. Or the prompt tone. Or the assistant’s name. And if I wanted that, others would too. So I built in personality support: folders with their own videos, voice settings, even custom personalities you can upload via a very minimal web based interface. Not pretty, but does the job.
Now Jarvis could be moody. Or charming. Or British. Or call you “sir” or “my liege” — whatever your inner superhero persona desires.
Hardware Deserves Its Suit Too
No self-respecting AI assistant should live naked on a desk. Although I’m not a good 3D modeler, I modeled a custom housing in Tinkercad, 3D printed it, and slotted everything together. The touchscreen, Pi 5, audio module — all housed in a bust of a robot. It’s far from good but looks good from far. I’ll rehome Jarvis from this temporary housing later with the help of some 3d-modeling AI in the near future. Probably give Meshy a go.
And let me just say: I’ve made way “dumber” 3D prints before that people wanted to buy like toilet paper during Covid. So I expect every geek and their nan suddenly wanting a Jarvis of their own.
Control Panel? Yes. But Make It Sexy.
I built a full web-based admin panel that runs on the device. Think “router settings page,” but way cooler.
From this panel, you can:
- Edit prompt instructions
- Change how Jarvis addresses you
- Upload new personalities or logos
- Back up and restore your setup
- Even check for updates
All mobile-optimized. All secure. All very Jarvis.
Because if I ever have to SSH into a Pi again just to change a voice setting, I might throw it into a wormhole.
Maintenance Mode, WiFi Pairing, and QR Magic
Let’s be real: even Tony Stark’s suits had glitches. So I built in a “maintenance mode.”
Say “Jarvis, maintenance mode” and it shows a QR code that links to the admin panel. No need to touch anything — just scan with your phone and tweak what you need.
If you boot Jarvis somewhere new without WiFi? He turns into his own hotspot and serves up a form to add new WiFi credentials. Just scan the QR, enter the info, and boom: he’s back online. At least that’s the idea.
Email, Calendar, and Being Genuinely Useful
Of course, I couldn’t stop at sass and screen candy. Jarvis connects to my Google accounts via Auth0 so I can:
- Send emails by voice
- Ask what meetings I have
- Choose which Google account to send from
Finally — an assistant who actually manages my life instead of just announcing the weather. And thanks to the Vosk engine, it also understands me completely offline. No cloud dependency, no awkward pauses when the internet flakes out — just uninterrupted, Stark-level responsiveness.
Talking to My Smart Home (Without Talking to Alexa)
Jarvis can also pair with IoT devices. Scan a QR, name the device, and now you can say stuff like “Jarvis, turn off the lab lights.”
Take that, Alexa.
Deployable. Sharable. Upgradeable.
I built an auto-installer so you can flash the whole thing to a Pi with one command. I also added backup, restore, and update scripts so it’s not just a hobby project — it’s a real, shippable product platform.
The ultimate goal? Something I can give to others, knowing they’ll have an assistant that’s truly their own — not another half-baked smart speaker with no soul.
The Art Behind the Face
I didn’t just grab any video for Jarvis’ face. I used Freepik’s AI portrait generator to create a surreal, cyborg-like female character. I then animated those states with Hailuo AI, and finally used CapCut to edit and format the videos into smooth portrait loops.
The result? A digital character that breathes, thinks, and stares into your soul while giving you your to-do list.
Taking it to the next level
While I was in the process, I kept thinking, Okay, all sweet and dandy, but how is this not a funkied up Alexa Echo Show? I knew needed to do something about it, so there came an idea: Holograms! Sounds splendid, but how the hell? After some research I have found that I could create a Pepper’s Hologram – even it’s name is fitting too well for the project, right Ms. Potts? So I did a quick prototype with some transparent acrillyc sheets that I had lying around and quickly uploaded the videos to my phone. The theory worked, so I decided to implement holographic mode to make Jarvis really Jarvis. (still flickery, but this is an enemy I’m not ready to fight yet)
Final Thoughts
Jarvis is part sci-fi dream, part product experiment, and part creative therapy. It’s a mix of product design, code, animation, 3D printing, and AI integration — and ChatGPT was there for all of it. From generating shell scripts to debugging Flask to writing UI forms, this assistant helped build my assistant.
Iron Man had Jarvis. Now I do too.
Do you have a feature idea for the next JarvisOS update? Leave it below in the comments and it just might make it into the next sprint.