How to convert sample format from AV_SAMPLE_FMT_FLTP to AV_SAMPLE_FMT_S16?
I found two resampling functions in FFmpeg that may perform better than converting by hand:
- avresample_convert() http://libav.org/doxygen/master/group__lavr.html
- swr_convert() http://spirton.com/svn/MPlayer-SB/ffmpeg/libswresample/swresample_test.c
Thanks, Reuben, for the solution. I did find that some sample values were slightly off compared with the output of a plain ffmpeg -i file.wav conversion. It seems that ffmpeg applies round() to the value during conversion rather than truncating it.
To do the conversion, I did what you did, with a bit of modification so it works for any number of channels:
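To see why truncating and rounding can disagree by one step, here is a minimal plain-C sketch with no FFmpeg dependency. It assumes the same clamp-then-scale-by-32767 approach used in this thread; the function names are hypothetical. The rounding is done by hand (half away from zero, which is what C's round() does) so the sketch needs nothing beyond <stdint.h>.

```c
#include <stdint.h>

/* Clamp a float sample to the nominal [-1.0, 1.0] range. */
static float clamp_unit(float s)
{
    if (s < -1.0f) return -1.0f;
    if (s > 1.0f) return 1.0f;
    return s;
}

/* Truncating conversion: a plain (int16_t) cast drops the fraction. */
int16_t to_s16_truncate(float s)
{
    return (int16_t)(clamp_unit(s) * 32767.0f);
}

/* Rounding conversion: rounds half away from zero, like round().
   The cast truncates toward zero, so adding/subtracting 0.5 first
   gives the nearest integer. Done in double to avoid float edge cases. */
int16_t to_s16_round(float s)
{
    double x = (double)clamp_unit(s) * 32767.0;
    return (int16_t)(x >= 0.0 ? x + 0.5 : x - 0.5);
}
```

For example, 0.1f scales to about 3276.7, so truncation gives 3276 while rounding gives 3277: exactly the kind of off-by-one difference described above.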
if (audioCodecContext->sample_fmt == AV_SAMPLE_FMT_FLTP)
{
    // Needs <cmath> for round()
    int nb_samples = decoded_frame->nb_samples;
    int channels = decoded_frame->channels;
    int outputBufferLen = nb_samples * channels * 2;      // size in bytes: 2 bytes per S16 sample
    short* outputBuffer = new short[outputBufferLen / 2]; // one short per sample per channel
    for (int i = 0; i < nb_samples; i++)
    {
        for (int c = 0; c < channels; c++)
        {
            // In FLTP each channel is its own plane of floats
            float* extended_data = (float*)decoded_frame->extended_data[c];
            float sample = extended_data[i];
            // Clamp to [-1.0, 1.0] before scaling
            if (sample < -1.0f) sample = -1.0f;
            else if (sample > 1.0f) sample = 1.0f;
            // Interleave: sample i of channel c goes to index i * channels + c
            outputBuffer[i * channels + c] = (short)round(sample * 32767.0f);
        }
    }
    // Do what you want with the data etc.
    delete[] outputBuffer;
}
I went from ffmpeg 0.11.1 to 1.1.3 and found the change of sample format annoying. I looked at setting request_sample_fmt to AV_SAMPLE_FMT_S16, but it seems the AAC decoder doesn't support anything other than AV_SAMPLE_FMT_FLTP anyway.
EDIT 9th April 2013: Worked out how to use libswresample to do this... much faster!
At some point in the last 2-3 years, FFmpeg's AAC decoder's output format changed from AV_SAMPLE_FMT_S16 to AV_SAMPLE_FMT_FLTP. This means that each audio channel has its own buffer, and each sample is a 32-bit floating-point value nominally in the range -1.0 to +1.0.
With AV_SAMPLE_FMT_S16, by contrast, the data is in a single buffer with the samples interleaved, and each sample is a signed 16-bit integer from -32768 to +32767.
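The layout difference comes down to indexing. Here is a minimal plain-C sketch, assuming the planar input is given as an array of per-channel pointers (the way AV_SAMPLE_FMT_FLTP exposes frame->extended_data[c]). The name interleave_float is hypothetical, and the function only rearranges samples; it does not convert them to integers.

```c
#include <stddef.h>

/* Copy planar audio (one array per channel) into one interleaved array
   (L R L R ... for stereo). Sample i of channel c lands at index
   i * channels + c in the output. */
void interleave_float(const float *const *planes, int channels,
                      int nb_samples, float *out)
{
    for (int i = 0; i < nb_samples; i++)
        for (int c = 0; c < channels; c++)
            out[(size_t)i * channels + c] = planes[c][i];
}
```

With left = {1, 2, 3} and right = {10, 20, 30}, the interleaved output is {1, 10, 2, 20, 3, 30}.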
And if you really need your audio as AV_SAMPLE_FMT_S16, then you have to do the conversion yourself. I figured out two ways to do it:
1. Use libswresample (recommended)
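#include "libavutil/opt.h"           // av_opt_set_int(), av_opt_set_sample_fmt()
#include "libswresample/swresample.h"
// (From C++, wrap these includes in extern "C" { ... })
...
SwrContext *swr;
...
// Set up SWR context once you've got codec information
// (audioCodec is the audio stream's AVCodecContext)
swr = swr_alloc();
av_opt_set_int(swr, "in_channel_layout", audioCodec->channel_layout, 0);
av_opt_set_int(swr, "out_channel_layout", audioCodec->channel_layout, 0);
av_opt_set_int(swr, "in_sample_rate", audioCodec->sample_rate, 0);
av_opt_set_int(swr, "out_sample_rate", audioCodec->sample_rate, 0);
av_opt_set_sample_fmt(swr, "in_sample_fmt", AV_SAMPLE_FMT_FLTP, 0);
av_opt_set_sample_fmt(swr, "out_sample_fmt", AV_SAMPLE_FMT_S16, 0);
if (swr_init(swr) < 0) {
    // handle the error
}
...
// In your decoder loop, after decoding an audio frame:
AVFrame *audioFrame = ...;
int16_t* outputBuffer = ...; // room for nb_samples * channels samples
// The context is the first argument. S16 output is interleaved, so one output pointer is enough.
// Returns the number of samples written per channel, or a negative value on error.
int out_samples = swr_convert(swr, (uint8_t **)&outputBuffer, audioFrame->nb_samples,
                              (const uint8_t **)audioFrame->extended_data, audioFrame->nb_samples);
...
// When you're done with it:
swr_free(&swr);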
And that's all you have to do!
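The setup above uses the same input and output sample rate, so audioFrame->nb_samples output slots per channel are enough. If you ever configure different rates, the output buffer must be larger. FFmpeg's own resampling example sizes it with av_rescale_rnd(swr_get_delay(swr, in_rate) + in_samples, out_rate, in_rate, AV_ROUND_UP); the plain-C sketch below does the same round-up arithmetic so the numbers are easy to check. The helper names are hypothetical, and delay stands in for the value swr_get_delay(swr, in_rate) would return.

```c
#include <stdint.h>

/* Upper bound on output samples per channel for one swr_convert() call:
   ceil((delay + in_samples) * out_rate / in_rate), where delay is the
   number of input-rate samples still buffered in the resampler. */
int64_t max_out_samples(int64_t delay, int64_t in_samples,
                        int64_t in_rate, int64_t out_rate)
{
    int64_t num = (delay + in_samples) * out_rate;
    return (num + in_rate - 1) / in_rate; /* integer division, rounded up */
}

/* Bytes needed for interleaved AV_SAMPLE_FMT_S16 output (2 bytes per sample). */
int64_t s16_buffer_bytes(int64_t out_samples, int channels)
{
    return out_samples * channels * (int64_t)sizeof(int16_t);
}
```

For example, 1024 input samples at 44100 Hz resampled to 48000 Hz need room for 1115 samples per channel, i.e. 4460 bytes of interleaved stereo S16.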
2. Do it by hand in C (original answer, not recommended)
So in your decode loop, when you've got an audio packet you decode it like this:
AVCodecContext *audioCodec; // init'd elsewhere
AVFrame *audioFrame; // init'd elsewhere
AVPacket packet; // init'd elsewhere
int16_t* outputBuffer; // init'd elsewhere
int got_frame = 0;            // set nonzero by the decoder when a full frame is ready
...
int len = avcodec_decode_audio4(audioCodec, audioFrame, &got_frame, &packet);
// len < 0 means a decoding error
And then, if you've got a full frame of audio, you can convert it fairly easily:
// Convert from AV_SAMPLE_FMT_FLTP to AV_SAMPLE_FMT_S16
int in_samples = audioFrame->nb_samples;
int i=0;
float* inputChannel0 = (float*)audioFrame->extended_data[0];
// Mono
if (audioFrame->channels==1) {
    for (i=0 ; i<in_samples ; i++) {
        float sample = *inputChannel0++;
        if (sample<-1.0f) sample=-1.0f; else if (sample>1.0f) sample=1.0f;
        outputBuffer[i] = (int16_t) (sample * 32767.0f);
    }
}
// Stereo (any other channel count would need its own handling)
else {
    float* inputChannel1 = (float*)audioFrame->extended_data[1];
    for (i=0 ; i<in_samples ; i++) {
        float left = *inputChannel0++;
        float right = *inputChannel1++;
        // Clamp both channels, as in the mono path
        if (left<-1.0f) left=-1.0f; else if (left>1.0f) left=1.0f;
        if (right<-1.0f) right=-1.0f; else if (right>1.0f) right=1.0f;
        outputBuffer[i*2] = (int16_t) (left * 32767.0f);    // interleaved L
        outputBuffer[i*2+1] = (int16_t) (right * 32767.0f); // interleaved R
    }
}
// outputBuffer now contains 16-bit PCM!
I've left a couple of things out for clarity: only mono and stereo are handled, and the code can easily be optimized.
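The by-hand version only handles mono and stereo. As one possible generalization, here is a plain-C sketch that converts any number of planar float channels to interleaved S16, clamping every channel and rounding instead of truncating. It walks one channel at a time so each input plane is read sequentially; whether that beats a sample-outer loop depends on channel count and hardware, so treat it as a sketch rather than a tuned implementation. The function names are hypothetical, and planes/out stand in for (const float *const *)audioFrame->extended_data and your output buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp to [-1, 1], scale by 32767 and round half away from zero. */
static int16_t float_to_s16(float s)
{
    if (s < -1.0f) s = -1.0f;
    else if (s > 1.0f) s = 1.0f;
    double x = (double)s * 32767.0;
    return (int16_t)(x >= 0.0 ? x + 0.5 : x - 0.5);
}

/* Planar float (any channel count) -> interleaved S16.
   out must hold nb_samples * channels values. */
void fltp_to_s16(const float *const *planes, int channels, int nb_samples,
                 int16_t *out)
{
    for (int c = 0; c < channels; c++) {
        const float *in = planes[c];
        int16_t *dst = out + c;          /* first slot for this channel */
        for (int i = 0; i < nb_samples; i++, dst += channels)
            *dst = float_to_s16(in[i]);  /* stride over the other channels */
    }
}
```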