Creating a DIY Voice Assistant: First Steps

I’ve been wanting to replace my Google and Amazon devices with a DIY voice assistant for a while. Now that I’ve got something working, I’d like to share how I did it.

First, some background

I don’t actually make very heavy use of my Google Nest Mini, but it’s extremely helpful when my hands are busy or I don’t have enough time to grab a pen without distracting myself from what I’m already doing.

In practice, I use my Google Nest Mini to:

Set kitchen timers while I’m cooking
Add things to my shopping list, again generally while I’m cooking
Quickly add tasks to my to-do list
Quickly add reminders to my to-do list
Control my Sonos speaker

I would like my DIY voice assistant to be able to do all of these things. I’ll define tasks as to-do list items that have no associated due dates or notifications. Reminders, on the other hand, will be to-do list items that do have due dates and generate reminder notifications. Everything else is hopefully self-explanatory.

I use Todoist as my to-do list application. Sadly, integration with Google and Amazon products ended a long time ago. All the more reason to come up with my own solution!

In addition to integrating with Todoist, I wanted something that I could run locally and that would respect my privacy. I also wanted to be able to change things I didn’t like and add more features as I needed them. Lastly, I wanted a fun project to work on.

Home Assistant

Home Assistant (HA) is a popular open-source home automation platform that lets you control and automate smart devices from a single place. It runs locally (on your own hardware), giving you full privacy, flexibility, and control without relying on cloud services.

And it supports voice control.

As I will be integrating with Todoist, I will in fact be sharing some data with one specific cloud service, but I will be doing this intentionally and by choice.

You can install and configure Home Assistant to control your smart devices and run various tasks in response to various cues. Then, you can set up a voice assistant to listen to and process your cues.

Running Home Assistant

There are effectively three recommended ways to run Home Assistant:

Install Home Assistant OS (HAOS) on a spare computer or a Raspberry Pi
Install Home Assistant OS on a virtual machine using something like VirtualBox
Run Home Assistant (not the full OS!) in a container using Docker or Docker Compose

If you have a spare computer around, the simplest way to get a voice assistant pipeline going is probably to:

Install HAOS onto your hardware
Install the various components of the voice assist pipeline as HAOS add-ons

With this approach, everything should be simple, and you can probably skim the rest of this post.

The slightly more complicated, but more flexible approach

Caveat: I am running the following on Linux and using Docker Engine. Home Assistant might not function properly running on Docker Desktop. I don’t know.

I don’t have any spare hardware available at the moment, and virtual machines run too slowly for my taste, so I’ll be going with the third option and running HA in a container. If you too will be running HA in a container, you won’t be able to install HA add-ons and you’ll need to get a little more hands-on.

I’ll show you how.

The official installation instructions cover this in full, but here’s a quick summary:

Create a new folder where Home Assistant will live
Create the file compose.yaml in the above folder
Edit compose.yaml to contain the following:

services:
  homeassistant:
    container_name: homeassistant
    image: "ghcr.io/home-assistant/home-assistant:stable"
    volumes:
      - ./config:/config
      - /etc/localtime:/etc/localtime:ro
      - /run/dbus:/run/dbus:ro
    restart: unless-stopped
    network_mode: host

In the directory containing compose.yaml, run the command docker compose up

No matter what you might hear, you should not need to run the container in privileged mode. And frankly, you probably shouldn’t give anything elevated privileges that it doesn’t require.
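If you want to confirm this on your own machine, Docker can report whether a container was started with elevated privileges. Here is a minimal sketch, assuming the container name homeassistant from the compose file above. The name ha_is_privileged is a hypothetical helper of my own, not something Docker or Home Assistant provides.

```shell
# Hypothetical helper: print "true" or "false" depending on whether the
# named container (default: homeassistant) was started in privileged mode.
ha_is_privileged() {
  docker inspect --format '{{.HostConfig.Privileged}}' "${1:-homeassistant}"
}
```

With the compose file shown earlier, running ha_is_privileged once the container is up should print false.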

Open http://localhost:8123 and you should be all set.
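The first start can take a little while, so the page may not load immediately. The sketch below polls the URL with curl until it answers. The function name wait_for_http and the default retry values are my own assumptions, so adjust them to taste.

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
# Arguments: URL, number of attempts (default 30), delay in seconds (default 2).
wait_for_http() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f treats HTTP errors (400 and above) as failures; the body is discarded
    if curl -fsS -o /dev/null --max-time 5 "$url" 2>/dev/null; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "no response from $url after $attempts attempts" >&2
  return 1
}
```

For example, wait_for_http http://localhost:8123 returns as soon as Home Assistant starts serving its web interface, or gives up after about a minute with the defaults.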

If you’re not sure where to start, take a look at the official onboarding documentation.

Then, some more background

As I’ve said, I won’t be able to use add-ons, so I’ll have to blaze my own path. It’s not that bad.

A smart voice assistant consists of multiple distinct services. You’ll need:

A “wake word” service that can listen to an audio stream and determine whether what it is hearing is (A) the wake word or (B) not the wake word. Yes, this is basically “Hotdog or Not Hotdog”.
A speech to text service that can understand what you are saying and transcribe it to text.
Some “brains” to understand what your cue actually means and what should be done in response
A text to speech service that can speak the brain’s text response back to you

I’m counting three AI models, so get your RAM out. I’m running things quite well with about 3 GB of available memory, an 8-year-old Intel i7 and no dedicated GPU.

You’ll also need a mic and a speaker connected to a system running a service that allows for audio streaming. Home Assistant calls this piece of hardware a satellite and so will I.

The good news is that, as far as I understand, all of the voice-assistant-related Home Assistant add-ons are actually adaptations of various services from the also-well-worth-looking-into Rhasspy project. You can download the services you need and just run them right on your workstation. If you download the Wyoming-protocol-compatible versions (and you should), Home Assistant will automatically detect all of the running services and from there it’s just a few clicks.
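As I understand the Wyoming protocol, services exchange newline-terminated JSON events over TCP, and a describe event asks a service to report what it offers. The sketch below sends that single event with nc. The name wyoming_describe is a hypothetical helper, the reply format depends on the service, and netcat variants handle timeouts differently, so treat this as a debugging aid rather than a reference.

```shell
# Hypothetical helper: send a Wyoming "describe" event to host:port and
# print whatever the service sends back before the timeout.
# Assumes an nc (netcat) that accepts -w for a timeout in seconds.
wyoming_describe() {
  printf '{"type": "describe"}\n' | nc -w 2 "${1:-127.0.0.1}" "$2"
}
```

Once a service is listening, for instance Whisper on port 10300 as set up later in this post, wyoming_describe 127.0.0.1 10300 should print a JSON reply describing it.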

Some of these services require some half-decent hardware to run efficiently. Being able to run them anywhere you want (not just on the same hardware as Home Assistant) is a pretty good deal.

Actually doing it

To review, our end goal is a computer equipped with a microphone and speakers that has the following running:

A “satellite” service (wyoming-satellite)
A wakeword detector (OpenWakeWord)
A Speech-to-Text service (Whisper)
A “brain” (Home Assistant)
A Text-to-Speech service (Piper)

Since your computer has a microphone and speakers, it will function not just as the voice assistant service, but also as its own satellite.

You already have Home Assistant running in a container if you followed the above steps. Let’s take care of the rest.

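Here are the GitHub repositories:

https://github.com/rhasspy/wyoming-satellite
https://github.com/rhasspy/wyoming-openwakeword
https://github.com/rhasspy/wyoming-faster-whisper
https://github.com/rhasspy/wyoming-piper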

I recommend creating a new assist-pipeline directory and cloning the above four repositories into it. If you follow my commands, some of the above four services will also be writing data into the assist-pipeline directory, so don’t be too surprised when you see some new directories show up.
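If you would rather fetch all four repositories in one go, something like the following should work. It is a sketch: clone_assist_repos is a hypothetical helper name, and the repository URLs are the same ones used in the per-service commands below.

```shell
# Hypothetical helper: clone the four Wyoming repositories into a target
# directory (default: assist-pipeline), skipping any that already exist.
clone_assist_repos() {
  dest=${1:-assist-pipeline}
  mkdir -p "$dest" || return 1
  for repo in wyoming-piper wyoming-faster-whisper wyoming-openwakeword wyoming-satellite; do
    if [ -d "$dest/$repo" ]; then
      echo "already present: $repo"
    else
      git clone "https://github.com/rhasspy/$repo.git" "$dest/$repo" || return 1
    fi
  done
}
```

If you use it, skip the git clone line in each of the sections below, since the directories will already exist.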

Running the services

The details are in the respective READMEs of the above repositories, but here’s a quick start:

Piper

Install / Setup

# From directory `assist-pipeline`
git clone https://github.com/rhasspy/wyoming-piper.git

cd wyoming-piper
script/setup

curl -L -s "https://github.com/rhasspy/piper/releases/download/v1.2.0/piper_amd64.tar.gz" | tar -zxvf - -C ..

Run

script/run \
  --piper '../piper/piper' \
  --voice en_US-lessac-medium \
  --uri 'tcp://0.0.0.0:10200' \
  --data-dir ../piper-data \
  --download-dir ../piper-data \
  --noise-scale 0.667 \
  --length-scale 1.0 \
  --noise-w 0.333 \
  --update-voices \
  --streaming

Whisper

Install / Setup

# From directory `assist-pipeline`
git clone https://github.com/rhasspy/wyoming-faster-whisper.git
cd wyoming-faster-whisper
script/setup

Run

script/run \
  --model small \
  --language en \
  --uri 'tcp://0.0.0.0:10300' \
  --data-dir ../wyoming-faster-whisper-data \
  --download-dir ../wyoming-faster-whisper-data

Note: Change the model to a larger model for better accuracy or a smaller model for better speed.

OpenWakeWord

Install / Setup

# From directory `assist-pipeline`
git clone https://github.com/rhasspy/wyoming-openwakeword.git
cd wyoming-openwakeword
script/setup

Run

script/run \
  --uri 'tcp://0.0.0.0:10400' \
  --preload-model 'hey_rhasspy'

Note: If script/setup fails, try using Python 3.9.19.

Satellite

Install / Setup

# From directory `assist-pipeline`
git clone https://github.com/rhasspy/wyoming-satellite.git
cd wyoming-satellite
script/setup
.venv/bin/pip3 install 'webrtc-noise-gain==1.2.3'

Run

script/run \
  --name 'my satellite' \
  --uri 'tcp://0.0.0.0:10700' \
  --mic-command 'arecord -r 16000 -c 1 -f S16_LE -t raw' \
  --snd-command 'aplay -r 22050 -c 1 -f S16_LE -t raw' \
  --wake-uri 'tcp://127.0.0.1:10400' \
  --wake-word-name 'hey_rhasspy' \
  --done-wav sounds/done.wav \
  --timer-finished-wav sounds/timer_finished.wav \
  --timer-finished-wav-repeat 8 0 \
  --mic-auto-gain 5 \
  --mic-noise-suppression 2 \
  --mic-no-mute-during-awake-wav

Note: If you’re getting errors, see the official README. You may need to install a dependency such as the alsa-utils package.
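Before blaming the satellite, it can help to check that arecord and aplay work on their own. The sketch below records a short clip with the same sample rate and format as the mic command above, then plays it back. The name audio_loopback_test and the default file path are hypothetical choices of mine.

```shell
# Hypothetical helper: record a few seconds from the default microphone
# using the satellite's capture format, then play the clip back.
# Arguments: output file (default /tmp/mic-test.wav), duration in seconds (default 3).
audio_loopback_test() {
  clip=${1:-/tmp/mic-test.wav}
  seconds=${2:-3}
  for tool in arecord aplay; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool not found; try installing alsa-utils" >&2
      return 127
    fi
  done
  arecord -r 16000 -c 1 -f S16_LE -t wav -d "$seconds" "$clip" && aplay "$clip"
}
```

If you can hear yourself after running audio_loopback_test, the audio side is fine and any remaining problem is more likely in the satellite’s own options.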

Putting it all together

1. Restart HA (stop and start the container)
2. Open HA in your browser
3. Click Settings on the left
4. Click Devices & Services
5. For each of the four services, click Add Integration in the bottom right
6. Add the service by host and port number
7. Return to Settings
8. Click Voice Assistants
9. Under Assist, click Home Assistant to configure settings
10. Set Speech-to-text to faster-whisper
11. Set Text-to-speech to piper
12. Click Update in the bottom right

Regarding step 6, the host should be 127.0.0.1 and, if you used my commands, the respective ports are 10200 (Piper), 10300 (Whisper), 10400 (OpenWakeWord) and 10700 (satellite).
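If Home Assistant refuses to add one of the services, it is worth checking that each port actually accepts connections. Here is a small sketch using nc. The name check_assist_ports is hypothetical, the ports are the ones from my commands, and it assumes an nc that supports -z (connect without sending data) and -w (timeout in seconds).

```shell
# Hypothetical helper: report which of the four Wyoming services accept
# TCP connections on the given host (default: 127.0.0.1).
# Returns non-zero if any of them is unreachable.
check_assist_ports() {
  host=${1:-127.0.0.1}
  failed=0
  for entry in piper:10200 faster-whisper:10300 openwakeword:10400 satellite:10700; do
    name=${entry%%:*}   # text before the colon
    port=${entry##*:}   # text after the colon
    if nc -z -w 2 "$host" "$port" >/dev/null 2>&1; then
      echo "$name ($port): listening"
    else
      echo "$name ($port): not reachable"
      failed=1
    fi
  done
  return "$failed"
}
```

Running check_assist_ports with all four services started should print four "listening" lines. Anything marked "not reachable" is the service to look at first.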

Trying it out

You should now be ready to go! Try this:

Say “Hey Rhasspy”
Wait about a half second
Say “What time is it?”

If you hear a chime and your assistant responds with the time, it’s working!

Next steps

Take a closer look at the official README files for all of the Rhasspy related repositories for customization options.
Likewise, take a closer look at the Voice Assistants configuration dialog and find a voice you like.

Up next

In a future post, I’ll be explaining how to integrate HA with Todoist.
