Preparing audio data for machine learning: resampling the audio, filtering the dataset, and converting the audio into a format suitable for the model’s input

FLENcentric
2 min read · Oct 23, 2023

Below, I’ll outline these steps in detail, specifically in the context of 🤗 Datasets and the Whisper ASR (Automatic Speech Recognition) model.

Resampling the Audio Data

Resampling is the process of changing the sampling rate of audio data to match the expected sampling rate of the model. Most pretrained models, like Whisper, are trained on audio data with a specific sampling rate, often 16 kHz. If your dataset’s sampling rate differs, you should resample it.
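To make the idea concrete, here is a minimal sketch of what changing the sampling rate means. It uses naive linear interpolation in plain Python, which is not how production resamplers work (they also apply low-pass filtering to avoid aliasing); the function name `resample_linear` and the test tone are made up for illustration.

```python
import math


def resample_linear(samples, orig_rate, target_rate):
    """Resample a mono signal by linear interpolation (illustrative only)."""
    if not samples or orig_rate == target_rate:
        return list(samples)
    # The duration stays fixed, so the sample count scales with the rate.
    n_out = round(len(samples) * target_rate / orig_rate)
    out = []
    for i in range(n_out):
        # Position of output sample i on the input sample axis.
        pos = i * orig_rate / target_rate
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


# One second of a 440 Hz tone sampled at 8 kHz (hypothetical input).
tone_8k = [math.sin(2 * math.pi * 440 * t / 8_000) for t in range(8_000)]
tone_16k = resample_linear(tone_8k, 8_000, 16_000)

print(len(tone_8k), len(tone_16k))  # 8000 16000
```

The clip still lasts one second; only the number of samples describing it has doubled, which is what a model trained on 16 kHz audio expects.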

To resample audio data using 🤗 Datasets, you can use the cast_column method. Here's an example of how to resample audio data to 16 kHz:

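from datasets import Audio

# `minds` is assumed to be a 🤗 Datasets dataset, loaded earlier, with an "audio" column.
# Casting the column makes the audio get resampled to 16 kHz when examples are accessed.
minds = minds.cast_column("audio", Audio(sampling_rate=16_000))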

Filtering the Dataset

Filtering the dataset allows you to exclude examples that don’t meet certain criteria. For example, you might want to filter out audio samples that are too long or too short. To do this, you can use the filter method.
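As a rough sketch of a length-based filter, the snippet below computes each clip's duration from its sample count and sampling rate and keeps only clips inside a range. The thresholds, the helper names, and the toy examples are assumptions for illustration; the examples are shaped like a decoded audio entry with "array" and "sampling_rate" keys.

```python
# Hypothetical thresholds; choose values that suit your data and model.
MIN_DURATION_IN_SECONDS = 1.0
MAX_DURATION_IN_SECONDS = 20.0


def duration_in_seconds(audio):
    """Duration = number of samples divided by samples per second."""
    return len(audio["array"]) / audio["sampling_rate"]


def is_audio_length_in_range(audio):
    """Return True for clips that are neither too short nor too long."""
    duration = duration_in_seconds(audio)
    return MIN_DURATION_IN_SECONDS <= duration <= MAX_DURATION_IN_SECONDS


# Toy stand-ins for decoded audio examples (silence at 16 kHz).
examples = [
    {"array": [0.0] * 8_000, "sampling_rate": 16_000},    # 0.5 s, too short
    {"array": [0.0] * 48_000, "sampling_rate": 16_000},   # 3.0 s, kept
    {"array": [0.0] * 400_000, "sampling_rate": 16_000},  # 25.0 s, too long
]

kept = [ex for ex in examples if is_audio_length_in_range(ex)]
print([duration_in_seconds(ex) for ex in kept])  # [3.0]
```

With 🤗 Datasets, a predicate like this can be passed to the dataset's filter method, for example `minds.filter(is_audio_length_in_range, input_columns=["audio"])`. Treat that call as a sketch: how the decoded audio is exposed can vary between library versions.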
