Real Time Beat Prediction with Aubio


Aubio is a great tool for extracting audio information from music in real time. It is written in C, so it runs really fast, but it also has a Python interface that is easy to use from Python scripts. Together with PyAudio, we can use it to extract live audio information from a microphone and then predict the moment of the next beat.

The following code was tested on a Raspberry Pi with a fresh installation of Raspberry Pi OS (PiOS) Bullseye.

First, we need to install Aubio and PyAudio:

sudo apt install python3-pyaudio python3-aubio
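The scripts below select the microphone by PyAudio device index. PyAudio (PortAudio) numbers devices on its own, so the index does not always match the card numbers that 'arecord -l' prints. Here is a minimal sketch for finding the right value for audioInputDeviceIndex. The helper names input_devices and main are my own, not part of PyAudio; only get_device_count and get_device_info_by_index come from the PyAudio API.

```python
#!/usr/bin/python3
# Sketch: list the devices PyAudio can record from, with the index to use
# for audioInputDeviceIndex and each device's default sample rate.

try:
    import pyaudio
except ImportError:  # lets the helper below be used (and tested) without PyAudio
    pyaudio = None


def input_devices(device_infos):
    """Return (index, name, default sample rate) for every device with input channels."""
    return [
        (info["index"], info["name"], int(info["defaultSampleRate"]))
        for info in device_infos
        if info["maxInputChannels"] > 0
    ]


def main():
    if pyaudio is None:
        print("PyAudio is not installed (sudo apt install python3-pyaudio)")
        return
    pa = pyaudio.PyAudio()
    try:
        infos = [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
        for index, name, rate in input_devices(infos):
            print(f"{index}: {name} ({rate} Hz)")
    finally:
        pa.terminate()


if __name__ == "__main__":
    main()
```

Run it once with the microphone plugged in, and use the printed index as audioInputDeviceIndex.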

Then we can create a Python script:

#!/usr/bin/python3
 
 
import pyaudio
import aubio
import numpy as np
from time import sleep
 
 
seconds = 10 # how long this script should run
 
bufferSize = 512
windowSizeMultiple = 2 # or 4 for higher accuracy, but more computational cost
 
audioInputDeviceIndex = 1 # use 'arecord -l' to check available audio devices
audioInputChannels = 1
 
 
# create the audio object and read the input device's default sample rate
# (this must happen before creating the tempo detection, which needs the sample rate):
pa = pyaudio.PyAudio()
audioInputDevice = pa.get_device_info_by_index(audioInputDeviceIndex)
audioInputSampleRate = int(audioInputDevice['defaultSampleRate'])
 
# create the aubio tempo detection:
hopSize = bufferSize
winSize = hopSize * windowSizeMultiple
tempoDetection = aubio.tempo(method='default', buf_size=winSize, hop_size=hopSize, samplerate=audioInputSampleRate)
 
 
# this function gets called by the input stream, as soon as enough samples are collected from the audio input:
def readAudioFrames(in_data, frame_count, time_info, status):
 
    # interpret the raw bytes as an array of hopSize float32 samples
    signal = np.frombuffer(in_data, dtype=np.float32)
 
    beat = tempoDetection(signal)
    if beat:
        bpm = tempoDetection.get_bpm()
        print("beat! (running with "+str(bpm)+" bpm)")
 
    return (in_data, pyaudio.paContinue)
 
 
# start the input stream
inputStream = pa.open(format=pyaudio.paFloat32,
                input=True,
                channels=audioInputChannels,
                input_device_index=audioInputDeviceIndex,
                frames_per_buffer=bufferSize,
                rate=audioInputSampleRate,
                stream_callback=readAudioFrames)
 
 
# because the input stream runs asynchronously, we just wait for a few seconds here before stopping the script:
sleep(seconds)
 
inputStream.stop_stream()
inputStream.close()
pa.terminate()

First, we create an inputStream with a stream_callback. PyAudio calls this callback function automatically, from its own thread, as soon as it has collected bufferSize samples.

We also create a tempoDetection object with aubio. Every time PyAudio calls the stream_callback, we convert in_data to a float32 NumPy array, pass it to tempoDetection, and get back whether there is (or should be) a beat in this batch of samples.
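The callback receives in_data as raw bytes. With format=pyaudio.paFloat32, every sample takes four bytes, and with more than one channel the samples are interleaved (left, right, left, right, ...). The sketch below turns such a buffer into the mono float32 frame of hopSize samples that the tempo detection expects, simulated here without a microphone. The helper to_mono_frame is hypothetical; the scripts in this post use a single channel, so they only need the np.frombuffer step.

```python
import numpy as np


def to_mono_frame(in_data, channels):
    """Convert a raw paFloat32 buffer into a mono float32 frame."""
    samples = np.frombuffer(in_data, dtype=np.float32)
    if channels == 1:
        return samples
    # interleaved layout: one row per frame, one column per channel, then average the channels
    return samples.reshape(-1, channels).mean(axis=1).astype(np.float32)


# simulate one 512-frame stereo buffer: left channel 0.5, right channel -0.5
stereo = np.empty(512 * 2, dtype=np.float32)
stereo[0::2] = 0.5
stereo[1::2] = -0.5
frame = to_mono_frame(stereo.tobytes(), channels=2)
print(frame.shape, frame.dtype)  # (512,) float32
```

Averaging the channels is only one way to downmix. Whatever method you choose, the frame passed to aubio should contain exactly hop_size samples.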

If there is a beat, we also check the current estimated bpm value. Because of how the algorithm works, the bpm value may be half or double the real bpm (the algorithm prefers measurements around 107 bpm, with a minimum of 40 and a maximum of 250 bpm, so for example 80 bpm is favored over 160 bpm).
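If you know roughly what tempo range your music lives in, you can hide the half/double ambiguity by folding the estimate into a one-octave range. This is a minimal sketch of a hypothetical helper, not something aubio provides. The default range of 90 to 180 bpm is an arbitrary assumption; pick one that suits your music.

```python
def fold_bpm(bpm, low=90.0, high=180.0):
    """Double or halve a tempo estimate until it lies in [low, high)."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    if high < 2 * low:
        raise ValueError("the range must span at least one octave (high >= 2 * low)")
    while bpm < low:
        bpm *= 2
    while bpm >= high:
        bpm /= 2
    return bpm


print(fold_bpm(80.0))   # 160.0
print(fold_bpm(200.0))  # 100.0
```

This only changes how the bpm is reported. It cannot tell whether the music is really at 80 or 160 bpm.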

hopSize is the number of samples per iteration that we feed into the beat detection algorithm. winSize is the size of the analysis window, double or quadruple hopSize. With a 44100 Hz source, a hopSize of 512 audio samples works great.
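To get a feel for these numbers, the following sketch converts them into time. It uses plain arithmetic with the values from the script: how long one hop and one analysis window last, and how often the PyAudio callback fires. The function name frame_timing is my own.

```python
def frame_timing(sample_rate, hop_size, window_multiple):
    """Durations (ms) of one hop and one window, and callbacks per second."""
    win_size = hop_size * window_multiple
    return {
        "hop_ms": 1000.0 * hop_size / sample_rate,
        "window_ms": 1000.0 * win_size / sample_rate,
        "callbacks_per_second": sample_rate / hop_size,
    }


for multiple in (2, 4):
    t = frame_timing(44100, 512, multiple)
    print(f"winSize={512 * multiple}: hop {t['hop_ms']:.1f} ms, "
          f"window {t['window_ms']:.1f} ms, {t['callbacks_per_second']:.1f} callbacks/s")
# winSize=1024: hop 11.6 ms, window 23.2 ms, 86.1 callbacks/s
# winSize=2048: hop 11.6 ms, window 46.4 ms, 86.1 callbacks/s
```

A larger window lets each analysis look at more audio, but it does not change how often the callback runs. That rate depends only on the sample rate and hopSize.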

A great video about how beat detection algorithms work under the hood is "Tempo and Beat Tracking" by AudioLabsErlangen. Another good resource is the PhD dissertation of Paul M. Brossier, Automatic Annotation of Musical Audio for Interactive Applications (which is the basis for Aubio).

We can also extend the script so that the main loop runs at a fixed frame rate:

#!/usr/bin/python3
 
 
import pyaudio
import aubio
import numpy as np
import time
from queue import Queue
 
 
frameRate = 30
 
bufferSize = 512
windowSizeMultiple = 2 # or 4 for higher accuracy, but more computational cost
 
audioInputDeviceIndex = 1 # use 'arecord -l' to check available audio devices
audioInputChannels = 1
 
 
# create the audio object and get some device information
pa = pyaudio.PyAudio()
audioInputDevice = pa.get_device_info_by_index(audioInputDeviceIndex)
audioInputSampleRate = int(audioInputDevice['defaultSampleRate'])
 
# create the aubio tempo detection:
hopSize = bufferSize
winSize = hopSize * windowSizeMultiple
tempoDetection = aubio.tempo(method='default', buf_size=winSize, hop_size=hopSize, samplerate=audioInputSampleRate)
 
# create a Queue, to handle multithreaded message exchange:
beatQueue = Queue()
 
 
# this function gets called by the input stream, as soon as enough samples are collected from the audio input:
def readAudioFrames(in_data, frame_count, time_info, status):
 
    signal = np.frombuffer(in_data, dtype=np.float32)
 
    beat = tempoDetection(signal)
    if beat:
        bpm = tempoDetection.get_bpm()
        beatQueue.put(bpm)
 
    return (in_data, pyaudio.paContinue)
 
 
# start the input stream
inputStream = pa.open(format=pyaudio.paFloat32,
                input=True,
                channels=audioInputChannels,
                input_device_index=audioInputDeviceIndex,
                frames_per_buffer=bufferSize,
                rate=audioInputSampleRate,
                stream_callback=readAudioFrames)
 
 
targetTime = 1./frameRate
running = True
bpm = -1
while running:
    try:
        
        startTime = time.time()
 
        beat = False
        if beatQueue.qsize() > 0:
            bpm = beatQueue.get()
            beat = True
 
        # do something with the beat or bpm here ..
        if beat:
            print('beat detected; running with '+str(bpm)+' bpm')
 
        usedTime = time.time() - startTime
        sleepTime = targetTime-usedTime
        if sleepTime < 0:
            # print('fps not reached')
            sleepTime = 0
        time.sleep(sleepTime)
 
    except KeyboardInterrupt:
        running = False
 
 
inputStream.stop_stream()
inputStream.close()
pa.terminate()

This runs the beat detection in a separate thread in the background and uses a Queue to store detected beats (and the estimated bpm at that time). The main thread (inside the while loop) can then run at a fixed frame rate and retrieve the detected beats (and bpm values) on the first frame after they are detected. The audio callback runs once per hop, i.e. audioInputSampleRate / hopSize times per second. With a hopSize of 512 that is about 86 times per second for a 44100 Hz input (about 94 at 48000 Hz), whether the window size is 1024 or 2048.
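To actually predict the next beat, the main loop needs to know when the last beat happened. One possible approach, sketched here with hypothetical helpers that are not part of the script above: the callback puts a (timestamp, bpm) tuple into the queue, for example beatQueue.put((time.monotonic(), tempoDetection.get_bpm())). The main loop then drains the queue once per frame and extrapolates the next beat as last beat + 60 / bpm. Note that the timestamp is taken when the callback runs, so it lags the actual sound by roughly the input buffer latency.

```python
import queue
import time


def drain_latest(q):
    """Empty the queue without blocking and return the newest item, or None."""
    latest = None
    while True:
        try:
            latest = q.get_nowait()
        except queue.Empty:
            return latest


def predict_next_beat(last_beat_time, bpm, now):
    """Return the time of the first predicted beat after `now`."""
    period = 60.0 / bpm  # seconds per beat
    beats_passed = int((now - last_beat_time) // period) + 1
    return last_beat_time + beats_passed * period


# simulated usage: two beats at 120 bpm were queued since the last frame
beats = queue.Queue()
t0 = time.monotonic()
beats.put((t0 - 0.30, 120.0))
beats.put((t0 - 0.10, 120.0))

latest = drain_latest(beats)
if latest is not None:
    beat_time, bpm = latest
    next_beat = predict_next_beat(beat_time, bpm, now=t0)
    print(f"next beat in {next_beat - t0:.2f} s")  # next beat in 0.40 s
```

Draining the queue also keeps it from growing if the main loop ever falls behind, because only the most recent beat is needed for the prediction.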

