Real-Time Beat Prediction with Aubio

Aubio is a great tool for extracting information from audio. It is written in C, so it runs fast, but it also has a Python interface, so it is easy to use from Python scripts. Together with PyAudio, we can use it to analyze live audio from a microphone and predict the moment of the next beat.

The following code was tested on a Raspberry Pi with a fresh installation of Raspberry Pi OS (Bullseye).

First, we need to install Aubio and PyAudio:

sudo apt install python3-pyaudio python3-aubio

Then we can create a Python script:

#!/usr/bin/python3

import pyaudio
import aubio
import numpy as np
from time import sleep

pa = pyaudio.PyAudio()

audioInputDeviceIndex = 1 # check available devices, e.g. with 'arecord -l' (note: PyAudio indices may differ from ALSA card numbers)

audioInputDevice = pa.get_device_info_by_index(audioInputDeviceIndex)
audioInputSampleRate = int(audioInputDevice['defaultSampleRate'])

audioInputChannels = 1

bufferSize = 512 # or 1024

hop_s = bufferSize
win_s = hop_s * 2

tempoDetection = aubio.tempo(method='default', buf_size=win_s, hop_size=hop_s, samplerate=audioInputSampleRate)

def readAudioFrames(in_data, frame_count, time_info, status):

    signal = np.frombuffer(in_data, dtype=np.float32)

    beat = tempoDetection(signal)
    if beat:
        bpm = tempoDetection.get_bpm()
        print("beat! (running with "+str(bpm)+" bpm)")

    return (in_data, pyaudio.paContinue)

print('starting .. (press CTRL+C to stop running)')

inputStream = pa.open(format=pyaudio.paFloat32,
                input=True,
                channels=audioInputChannels,
                input_device_index=audioInputDeviceIndex,
                frames_per_buffer=bufferSize,
                rate=audioInputSampleRate,
                stream_callback=readAudioFrames)

try:
    while True:
        sleep(0.2)
except KeyboardInterrupt:
    pass

inputStream.stop_stream()
inputStream.close()
pa.terminate()

We create an inputStream with a stream_callback. PyAudio calls this callback function automatically as soon as it has collected bufferSize worth of samples.
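The raw in_data handed to the callback is a byte string; np.frombuffer reinterprets it as an array of 32-bit floats without copying. A minimal sketch of that conversion, using struct to simulate four float32 samples as PyAudio would deliver them:

```python
import struct

import numpy as np

# Simulate the raw bytes PyAudio hands to the callback:
# four little-endian float32 samples packed into a byte string.
raw = struct.pack("<4f", 0.0, 0.5, -0.5, 1.0)

# This is exactly what readAudioFrames does with in_data.
signal = np.frombuffer(raw, dtype=np.float32)

print(signal.dtype, signal.shape)  # float32 (4,)
print(signal)
```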

We also create a tempoDetection object with aubio (before opening the stream, since the callback uses it). Every time PyAudio invokes the stream_callback, we pass the in_data to tempoDetection and get back whether there is (or should be) a beat in this batch of samples.
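To actually predict the moment of the next beat, aubio's tempo object reports the time of the last detected beat (get_last_s()) and the current tempo estimate (get_bpm()), so a simple extrapolation adds one beat period. A minimal sketch of that arithmetic (the input values here are made up; in the callback they would come from tempoDetection):

```python
def predict_next_beat(last_beat_s, bpm):
    """Extrapolate the time of the next beat from the last detected
    beat and the current tempo estimate (60 / bpm is one beat period)."""
    return last_beat_s + 60.0 / bpm

# e.g. last beat at 12.5 s, running at 120 bpm -> next beat at 13.0 s
print(predict_next_beat(12.5, 120))  # 13.0
```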

If there is a beat, we also check the current estimated bpm value. Because of the nature of the algorithm, the reported bpm may be half of the real bpm: the algorithm prefers longer lags, so for example 80 bpm is favored over 160 bpm.
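If the halved value matters for your application, one common workaround is to fold the estimate into a plausible range by doubling (or halving) it. A small sketch, with range limits chosen arbitrarily:

```python
def fold_bpm(bpm, low=90.0, high=180.0):
    """Double or halve a tempo estimate until it falls into [low, high).
    The limits here are an assumption; pick a range that fits your music."""
    while bpm < low:
        bpm *= 2.0
    while bpm >= high:
        bpm /= 2.0
    return bpm

print(fold_bpm(80.0))   # 160.0
print(fold_bpm(200.0))  # 100.0
```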

hop_s is the number of samples per iteration that we feed into the beat detection algorithm. win_s is double or quadruple the size of hop_s (or, put the other way around, hop_s is half or a quarter of win_s). For a hop_s that is half the size of win_s, the windows overlap like this:

<------ win_s ------>
[......step 0.......]
<- hop_s -><------ win_s ------>
           [......step 1.......]
<- hop_s -><- hop_s -><------ win_s ------>
                      [......step 2.......]

|---------------------|------------------------>
t                  t+win_s               (samples)

This diagram is adapted from a comment by piem on GitHub. Also from this comment is this description:

"the buffer size (also named window size in the code) is the number of audio samples on which each analysis is run. The longer it is, the more precision in the low frequencies (longer wavelength). hop size is the number of audio samples between 2 consecutive windows: the shorter it is, the more temporal resolution increases, as well as computational cost."
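Plugging in the numbers from the script makes this trade-off concrete: with hop_s = 512 and a typical sample rate of 44100 Hz, each analysis step advances about 11.6 ms, and with win_s = 1024 each analysis covers about 23.2 ms of audio. A quick calculation (the sample rate is assumed here; the script reads the actual rate from the device):

```python
samplerate = 44100        # assumed; the script queries the device for this
hop_s = 512               # samples between consecutive analyses
win_s = hop_s * 2         # samples per analysis window

hop_ms = 1000.0 * hop_s / samplerate   # temporal resolution per step
win_ms = 1000.0 * win_s / samplerate   # audio covered by one analysis

print(round(hop_ms, 1), round(win_ms, 1))  # 11.6 23.2
```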

A great video about how beat detection algorithms work under the hood is "Tempo and Beat Tracking" by AudioLabsErlangen. See also the PhD dissertation of Paul M. Brossier, Automatic Annotation of Musical Audio for Interactive Applications, which is the basis for Aubio.

----------
Have a comment? Drop me an email!