Importing sound files into Python as NumPy arrays (alternatives to audiolab)
Audiolab is working for me on Ubuntu 9.04 with Python 2.6.2, so it might be a Windows problem. In your link to the forum, the author also suggests that it is a Windows error.
In the past, this option has worked for me, too:
from scipy.io import wavfile
fs, data = wavfile.read(filename)
Just beware that data may have an integer data type, in which case it is not scaled within [-1,1). For example, if data is int16, you must divide data by 2**15 to scale it within [-1,1).
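The int16 scaling described above can be generalized to whatever dtype comes back. This is a minimal sketch, not part of scipy: the helper name to_float is made up for illustration, and it assumes data is a NumPy array with an unsigned 8-bit, signed integer, or float dtype.

```python
import numpy as np


def to_float(data):
    """Scale integer PCM samples to floats in [-1, 1); pass floats through."""
    if np.issubdtype(data.dtype, np.floating):
        return data.astype(np.float64)
    if data.dtype == np.uint8:
        # 8-bit WAV data is unsigned and centred on 128.
        return (data.astype(np.float64) - 128.0) / 128.0
    # Signed integers: divide by the magnitude of the most negative value,
    # e.g. 2**15 for int16 and 2**31 for int32.
    return data.astype(np.float64) / -float(np.iinfo(data.dtype).min)
```

Dividing by the magnitude of the most negative value is what keeps the result inside [-1,1) rather than [-1,1].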
In case you want to do this for MP3, here's what I'm using. It uses pydub and scipy.
Full setup (on Mac, may differ on other systems):
import os
import tempfile

import pydub
import scipy.io.wavfile


def read_mp3(file_path, as_float=False):
    """
    Read an MP3 file into NumPy data.

    :param file_path: String path to a file
    :param as_float: Cast data to float and normalize to [-1, 1)
    :return: Tuple(rate, data), where
        rate is an integer indicating samples/s
        data is an ndarray(n_samples, 2)[int16] if as_float = False
        otherwise ndarray(n_samples, 2)[float] in range [-1, 1)
    """
    path, ext = os.path.splitext(file_path)
    assert ext == '.mp3'
    mp3 = pydub.AudioSegment.from_mp3(file_path)
    fd, path = tempfile.mkstemp()
    os.close(fd)  # mkstemp returns an open descriptor; close it so it is not leaked
    mp3.export(path, format="wav")
    rate, data = scipy.io.wavfile.read(path)
    os.remove(path)
    if as_float:
        data = data / 2.0**15  # float divisor avoids integer division on Python 2
    return rate, data
Credit to James Thompson's blog
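If the temporary WAV file in read_mp3 is a concern, the samples can also be taken from the pydub object directly. The following is a rough sketch, assuming the segment exposes get_array_of_samples(), channels, frame_rate and sample_width as pydub's AudioSegment does; the function name segment_to_array is made up for illustration.

```python
import numpy as np


def segment_to_array(segment, as_float=False):
    """Convert a pydub-style audio segment to (rate, ndarray)."""
    samples = np.array(segment.get_array_of_samples())
    # Samples are interleaved (L, R, L, R, ...); reshape to one row per frame.
    data = samples.reshape(-1, segment.channels)
    if as_float:
        # Full scale depends on the sample width in bytes, e.g. 2**15 for 16-bit.
        data = data / float(2 ** (8 * segment.sample_width - 1))
    return segment.frame_rate, data
```

It would be called as segment_to_array(pydub.AudioSegment.from_mp3(file_path)). Unlike read_mp3, this always returns a 2-D array, so mono audio comes back with shape (n_samples, 1).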
Sox (http://sox.sourceforge.net/) can be your friend for this. It can read many different formats and output them as raw data in whatever datatype you prefer. In fact, I just wrote the code to read a block of data from an audio file into a numpy array.
I decided to go this route for portability (sox is very widely available) and to maximize the flexibility of input audio types I could use. From initial testing, it isn't noticeably slower for what I'm using it for, which is reading short segments (a few seconds) of audio from very long (hours) files.
Variables you need:
SOX_EXEC # the sox / sox.exe executable filename
filename # the audio filename of course
num_channels # duh... the number of channels
out_byps # Bytes per sample you want, must be 1, 2, 4, or 8
start_samp # sample number to start reading at
len_samp # number of samples to read
The actual code is really simple. If you want to extract the whole file, you can remove the start_samp, len_samp, and 'trim' stuff.
import subprocess  # need the subprocess module
import numpy as NP  # I'm lazy and call numpy NP

cmd = [SOX_EXEC,
       filename,                  # input filename
       '-t', 'raw',               # output file type raw
       '-e', 'signed-integer',    # output encoded as signed ints
       '-L',                      # output little endian
       '-b', str(out_byps * 8),   # output bits per sample
       '-',                       # output to stdout
       'trim', str(start_samp) + 's', str(len_samp) + 's']  # only extract requested part
# frombuffer replaces the deprecated fromstring; the result is read-only, so .copy() it to modify
data = NP.frombuffer(subprocess.check_output(cmd), dtype='<i%d' % out_byps)
data = data.reshape(len(data) // num_channels, num_channels)  # make samples x channels
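To see what the final two lines do without having sox installed, here is a small sketch that uses hand-packed bytes in place of sox output. The byte string is synthetic, and the variable names simply mirror the ones listed above.

```python
import numpy as np

num_channels = 2
out_byps = 2  # bytes per sample

# Two stereo frames, hand-packed as little-endian int16: (1, -1) and (2, -2).
raw = np.array([1, -1, 2, -2], dtype='<i2').tobytes()

# Interpret the bytes as a flat run of interleaved samples...
flat = np.frombuffer(raw, dtype='<i%d' % out_byps)
# ...then fold them so rows are sample frames and columns are channels.
frames = flat.reshape(-1, num_channels)

left, right = frames[:, 0], frames[:, 1]
```

Passing -1 as the first dimension lets NumPy work out the number of frames, which avoids the explicit division by num_channels.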
PS: Here is code to read stuff from audio file headers using sox...
import subprocess
import sys

comments = ''              # filled from the lines that follow 'Comments'
other = ''                 # lines this parser does not recognise
output_unhandled = False   # set True to echo unrecognised lines to stderr

# universal_newlines=True makes check_output return text rather than bytes
info = subprocess.check_output([SOX_EXEC, '--i', filename], universal_newlines=True)
reading_comments_flag = False
for l in info.splitlines():
    if not l.strip():
        continue
    if reading_comments_flag:
        if comments:
            comments += '\n'
        comments += l
    elif l.startswith('Input File'):
        input_file = l.split(':', 1)[1].strip()[1:-1]  # drop the surrounding quotes
    elif l.startswith('Channels'):
        num_channels = int(l.split(':', 1)[1].strip())
    elif l.startswith('Sample Rate'):
        sample_rate = int(l.split(':', 1)[1].strip())
    elif l.startswith('Precision'):
        bits_per_sample = int(l.split(':', 1)[1].strip()[0:-4])  # drop '-bit'
    elif l.startswith('Duration'):
        tmp = l.split(':', 1)[1].strip()
        tmp = tmp.split('=', 1)
        duration_time = tmp[0].strip()
        duration_samples = int(tmp[1].split(None, 1)[0])
    elif l.startswith('Sample Encoding'):
        encoding = l.split(':', 1)[1].strip()
    elif l.startswith('Comments'):
        reading_comments_flag = True
    else:
        if other:
            other += '\n'
        other += l
        if output_unhandled:
            sys.stderr.write("Unhandled: %s\n" % l)
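A more compact way to get at the same header fields is to collect every "Key : value" line into a dict first and convert the entries you need afterwards. This is only a sketch: parse_soxi is a made-up name, the sample text is hand-written in the shape the code above expects rather than captured from a real run, and multi-line comments are not handled.

```python
def parse_soxi(text):
    """Collect 'Key : value' lines of `sox --i` style text into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, since values such as durations contain colons.
        key, sep, value = line.partition(':')
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


# Hand-written sample input (hypothetical file name and values).
sample = """
Input File     : 'example.wav'
Channels       : 2
Sample Rate    : 44100
Precision      : 16-bit
Duration       : 00:00:10.00 = 441000 samples
"""

info = parse_soxi(sample)
num_channels = int(info['Channels'])
sample_rate = int(info['Sample Rate'])
bits_per_sample = int(info['Precision'].split('-')[0])
duration_samples = int(info['Duration'].split('=')[1].split()[0])
```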
FFmpeg supports mp3s and works on Windows (http://zulko.github.io/blog/2013/10/04/read-and-write-audio-files-in-python-using-ffmpeg/).
Reading an mp3 file:
import subprocess as sp

FFMPEG_BIN = "ffmpeg.exe"  # on Linux/macOS this is typically just "ffmpeg"
command = [FFMPEG_BIN,
           '-i', 'mySong.mp3',
           '-f', 's16le',
           '-acodec', 'pcm_s16le',
           '-ar', '44100',  # output will have 44100 Hz
           '-ac', '2',      # stereo (set to '1' for mono)
           '-']
pipe = sp.Popen(command, stdout=sp.PIPE, bufsize=10**8)
Format data into numpy array:
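import numpy

# 88200 frames * 2 channels * 2 bytes per sample = 2 seconds of audio at 44100 Hz
raw_audio = pipe.stdout.read(88200*4)
audio_array = numpy.frombuffer(raw_audio, dtype="int16")
audio_array = audio_array.reshape((len(audio_array)//2, 2))  # samples x channels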
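For long files you may prefer to read the pipe a chunk at a time instead of in one large read. Below is a minimal sketch of that idea. The generator name iter_frames is made up, it assumes a blocking stream where a short read only happens at the end of the data, and the demo feeds it an in-memory buffer rather than a real ffmpeg pipe.

```python
import io

import numpy as np


def iter_frames(stream, frames_per_chunk, channels=2, bytes_per_sample=2):
    """Yield (n_frames, channels) little-endian signed-integer arrays from raw PCM."""
    frame_bytes = channels * bytes_per_sample
    dtype = '<i%d' % bytes_per_sample
    while True:
        raw = stream.read(frames_per_chunk * frame_bytes)
        # Drop a trailing partial frame, which can only appear at end of data.
        raw = raw[:len(raw) - len(raw) % frame_bytes]
        if not raw:
            break
        yield np.frombuffer(raw, dtype=dtype).reshape(-1, channels)


# Demo on synthetic data: five stereo frames read two frames at a time.
pcm = np.arange(10, dtype='<i2').tobytes()
chunks = list(iter_frames(io.BytesIO(pcm), frames_per_chunk=2))
```

With the ffmpeg command above, the stream argument would be pipe.stdout.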