Real Time Beat Prediction with Aubio
Aubio is a great tool for extracting audio information from music. It is written in C, so it runs really fast, but it also has a Python interface, so it's easy to use from Python scripts. We can use it, together with PyAudio, to extract live audio information from a microphone and use it to predict the moment of the next beat.
The following code is tested on a Raspberry Pi with a fresh installation of Raspberry Pi OS Bullseye.
First, we need to install Aubio and PyAudio:
sudo apt install python3-pyaudio python3-aubio
Then we can create a Python script:
#!/usr/bin/python
import pyaudio
import aubio
import numpy as np
from time import sleep

pa = pyaudio.PyAudio()

audioInputDeviceIndex = 1  # use 'arecord -l' to check available audio devices
audioInputDevice = pa.get_device_info_by_index(audioInputDeviceIndex)
audioInputSampleRate = int(audioInputDevice['defaultSampleRate'])
audioInputChannels = 1

bufferSize = 512  # or 1024
hop_s = bufferSize  # samples fed into the detection per callback
win_s = hop_s * 2   # analysis window size

tempoDetection = aubio.tempo(method='default', buf_size=win_s,
                             hop_size=hop_s, samplerate=audioInputSampleRate)

def readAudioFrames(in_data, frame_count, time_info, status):
    signal = np.frombuffer(in_data, dtype=np.float32)
    beat = tempoDetection(signal)
    if beat:
        bpm = tempoDetection.get_bpm()
        print("beat! (running with " + str(bpm) + " bpm)")
    return (in_data, pyaudio.paContinue)

print('starting .. (press CTRL+C to stop running)')

inputStream = pa.open(format=pyaudio.paFloat32,
                      input=True,
                      channels=audioInputChannels,
                      input_device_index=audioInputDeviceIndex,
                      frames_per_buffer=bufferSize,
                      rate=audioInputSampleRate,
                      stream_callback=readAudioFrames)

try:
    while True:
        sleep(0.2)
except KeyboardInterrupt:
    pass

inputStream.stop_stream()
inputStream.close()
pa.terminate()
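A quick aside before walking through the script: instead of arecord -l, you can also list the available input devices with PyAudio itself. A small sketch (device names and indices will of course differ per system):

import pyaudio

pa = pyaudio.PyAudio()
for i in range(pa.get_device_count()):
    info = pa.get_device_info_by_index(i)
    if info.get('maxInputChannels', 0) > 0:  # only show devices that can record
        print(i, info['name'], int(info['defaultSampleRate']))
pa.terminate()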
First, we create an inputStream with a stream_callback. This callback function is automatically called by PyAudio as soon as it has collected bufferSize worth of samples.
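That also tells us how often the callback fires: every bufferSize / audioInputSampleRate seconds. A quick calculation with the values from above (assuming a 44100 Hz input device):

bufferSize = 512
audioInputSampleRate = 44100
print(bufferSize / audioInputSampleRate)  # ~0.0116 s, so roughly 86 callbacks per second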
We then create a tempoDetection object with aubio. Every time the stream_callback from PyAudio is called, we pass the in_data to the tempoDetection object and get back whether there is (or should be) a beat in this batch of samples.
If there is a beat, we also check what the current estimated bpm value is. Because of the nature of the algorithm used, the bpm value may be half of the real bpm (the algorithm prefers longer lags, so for example 80 bpm is favored over 160 bpm).
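To actually predict the moment of the next beat, we can combine get_last_s() (the position of the detected beat in seconds, counted from the start of the stream) with the beat period of 60 / bpm. Here is a sketch of a modified callback; note that the doubling heuristic for implausibly low bpm values is my own assumption, not something aubio provides, so adjust the threshold to the music you expect:

def readAudioFrames(in_data, frame_count, time_info, status):
    signal = np.frombuffer(in_data, dtype=np.float32)
    beat = tempoDetection(signal)
    if beat:
        bpm = tempoDetection.get_bpm()
        if bpm > 0:
            while bpm < 90:  # assumed octave correction: e.g. 80 bpm -> 160 bpm
                bpm *= 2
            lastBeat = tempoDetection.get_last_s()  # this beat, in stream seconds
            nextBeat = lastBeat + 60.0 / bpm        # predicted moment of the next beat
            print("beat at %.3fs, next one expected around %.3fs" % (lastBeat, nextBeat))
    return (in_data, pyaudio.paContinue)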
hop_s is the number of samples per iteration that we feed into the beat detection algorithm. win_s is double or quadruple the size of hop_s (or, put the other way around, hop_s is half or a quarter of the size of win_s). For a hop_s that is half the size of win_s, this looks like this:
<------ win_s ------>
[......step 0.......]
<- hop_s -><------ win_s ------>
           [......step 1.......]
<- hop_s -><- hop_s -><------ win_s ------>
                      [......step 2.......]
|--------------------|---------------------->
t                    t+win_s        (samples)
This diagram is adapted from a comment by piem on GitHub. The following description is from the same comment:
"the buffer size (also named window size in the code) is the number of audio samples on which each analysis is run. The longer it is, the more precision in the low frequencies (longer wavelength). hop size is the number of audio samples between 2 consecutive windows: the shorter it is, the more temporal resolution increases, as well as computational cost."
A great video about how beat detection algorithms work under the hood is "Tempo and Beat Tracking" by AudioLabsErlangen; see also the PhD dissertation of Paul M. Brossier, Automatic Annotation of Musical Audio for Interactive Applications (which Aubio is based on).
----------
Have a comment? Drop me an email!