What is the correct way to fix keyframes in FFmpeg for DASH?
TL;DR
I would recommend the following:
- libx264: -g X -keyint_min X (and optionally add -force_key_frames "expr:gte(t,n_forced*N)")
- libx265: -x265-params "keyint=X:min-keyint=X"
- libvpx-vp9: -g X
where X is the interval in frames and N is the interval in seconds. For example, for a 2-second interval with a 30fps video, X = 60 and N = 2.
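As a rough sketch of the arithmetic above, the following shell helpers compute X from the frame rate and the interval in seconds, then print the matching flags for each encoder. The function names gop_frames and keyframe_flags are made up for this illustration, and the sketch assumes an integer frame rate.

```shell
#!/bin/sh
# gop_frames FPS SECONDS: keyframe interval X in frames.
# Assumption: FPS is an integer (e.g. 30), since shell arithmetic is integer-only.
gop_frames() {
    echo $(( $1 * $2 ))
}

# keyframe_flags ENCODER FPS SECONDS: print the flags recommended above.
# (Hypothetical helper, not part of FFmpeg.)
keyframe_flags() {
    x=$(gop_frames "$2" "$3")
    case "$1" in
        libx264)    echo "-g $x -keyint_min $x" ;;
        libx265)    echo "-x265-params keyint=$x:min-keyint=$x" ;;
        libvpx-vp9) echo "-g $x" ;;
        *)          echo "unknown encoder: $1" >&2; return 1 ;;
    esac
}

# 2-second interval at 30 fps, as in the example above.
keyframe_flags libx264 30 2

# Possible use (input.mp4 and output.mp4 are placeholder names):
#   ffmpeg -i input.mp4 -c:v libx264 $(keyframe_flags libx264 30 2) output.mp4
```

For fractional rates such as 23.976 fps you would have to pick X by hand (48 is the value used in the examples further down).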
A note about different frame types
In order to properly explain this topic, we first have to define the two types of I-frames / keyframes:
- Instantaneous Decoder Refresh (IDR) frames: These allow independent decoding of the following frames, without access to frames previous to the IDR frame.
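- Non-IDR I-frames: These can be decoded on their own, but the frames following them may still reference frames before the I-frame, so decoding from this point requires a previous IDR frame. Non-IDR I-frames can be used for scene cuts in the middle of a GOP (group of pictures).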
What is recommended for streaming?
For the streaming case, you want to:
- Ensure that all IDR frames are at regular positions (e.g. at 2, 4, 6, … seconds) so that the video can be split up into segments of equal length.
- Enable scene cut detection, so as to improve coding efficiency / quality. This means allowing I-frames to be placed in between IDR frames. You can still work with scene cut detection disabled (many guides still recommend this), but it's not necessary.
What do the parameters do?
In order to configure the encoder, we have to understand what the keyframe parameters do. I did some tests and discovered the following, for the three encoders libx264, libx265 and libvpx-vp9 in FFmpeg:
- libx264:
  - -g sets the keyframe interval.
  - -keyint_min sets the minimum keyframe interval.
  - -x264-params "keyint=x:min-keyint=y" is the same as -g x -keyint_min y.
  - Note: When setting both to the same value, the minimum is internally set to half the maximum interval plus one, as seen in the x264 code:
    h->param.i_keyint_min = x264_clip3( h->param.i_keyint_min, 1, h->param.i_keyint_max/2+1 );
- libx265:
  - -g is not implemented.
  - -x265-params "keyint=x:min-keyint=y" works.
- libvpx-vp9:
  - -g sets the keyframe interval.
  - -keyint_min sets the minimum keyframe interval.
  - Note: Due to how FFmpeg works, -keyint_min is only forwarded to the encoder when it is the same as -g. In the code from libvpxenc.c in FFmpeg we can find:
    if (avctx->keyint_min >= 0 && avctx->keyint_min == avctx->gop_size)
        enccfg.kf_min_dist = avctx->keyint_min;
    if (avctx->gop_size >= 0)
        enccfg.kf_max_dist = avctx->gop_size;
    This might be a bug (or lack of feature?), since libvpx definitely supports setting a different value for kf_min_dist.
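To make the libx264 note concrete, here is a small sketch that mirrors only the single clamping line quoted from the x264 source. The shell function names are invented for this example, and x264 may apply other adjustments elsewhere that are not modelled here.

```shell
#!/bin/sh
# clip3 V MIN MAX: clamp V into the range [MIN, MAX], like x264_clip3.
clip3() {
    if [ "$1" -lt "$2" ]; then
        echo "$2"
    elif [ "$1" -gt "$3" ]; then
        echo "$3"
    else
        echo "$1"
    fi
}

# x264_effective_keyint_min KEYINT_MIN KEYINT_MAX
# (Hypothetical helper) applies the quoted line:
#   i_keyint_min = clip3(i_keyint_min, 1, i_keyint_max/2 + 1)
x264_effective_keyint_min() {
    clip3 "$1" 1 $(( $2 / 2 + 1 ))
}

x264_effective_keyint_min 60 60   # -g 60 -keyint_min 60 -> prints 31
x264_effective_keyint_min 10 60   # already below the cap  -> prints 10
```

So with -g 60 -keyint_min 60, the minimum that x264 actually uses is 31 frames, not 60.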
Should you use -force_key_frames?
The -force_key_frames option forcibly inserts keyframes at the given interval (expression). This works for all encoders, but it might mess with the rate control mechanism. Especially for VP9, I've noticed severe quality fluctuations, so I cannot recommend using it in this case.
Here is my fifty cents for the case.
Method 1: messing with libx264's arguments
-c:v libx264 -x264opts keyint=GOPSIZE:min-keyint=GOPSIZE:scenecut=-1
This generates I-frames only at the desired intervals.
Example 1:
ffmpeg -i test.mp4 -codec:v libx264 \
-r 23.976 \
-x264opts "keyint=48:min-keyint=48:no-scenecut" \
-c:a copy \
-y test_keyint_48.mp4
This generates I-frames at the expected positions:
Iframes Seconds
1 0
49 2
97 4
145 6
193 8
241 10
289 12
337 14
385 16
433 18
481 20
529 22
577 24
625 26
673 28
721 30
769 32
817 34
865 36
913 38
961 40
1009 42
1057 44
1105 46
1153 48
1201 50
1249 52
1297 54
1345 56
1393 58
Method 2 is deprecated and omitted.
Method 3: insert a keyframe every N seconds (MAYBE):
-force_key_frames expr:gte(t,n_forced*GOP_LEN_IN_SECONDS)
Example 2
ffmpeg -i test.mp4 -codec:v libx264 \
-r 23.976 \
-force_key_frames "expr:gte(t,n_forced*2)" \
-c:a copy \
-y test_fkf_2.mp4
This generates I-frames in a slightly different way:
Iframes Seconds
1 0
49 2
97 4
145 6
193 8
241 10
289 12
337 14
385 16
433 18
481 20
519 21.58333333
529 22
577 24
625 26
673 28
721 30
769 32
817 34
865 36
913 38
931 38.75
941 39.16666667
961 40
1008 42
1056 44
1104 46
1152 48
1200 50
1248 52
1296 54
1305 54.375
1344 56
1367 56.95833333
1392 58
1430 59.58333333
1440 60
1475 61.45833333
1488 62
1536 64
1544 64.33333333
1584 66
1591 66.29166667
1632 68
1680 70
1728 72
1765 73.54166667
1776 74
1811 75.45833333
1824 75.95833333
1853 77.16666667
1872 77.95833333
1896 78.95833333
1920 79.95833333
1939 80.75
1968 81.95833333
As you can see, it places I-frames every 2 seconds AND on scene cuts (the timestamps with a fractional part), which in my opinion is important for handling video stream complexity.
The generated file sizes are pretty much the same. Strangely, even with more keyframes, Method 3 sometimes produces a smaller file than the standard x264 algorithm.
For generating multiple-bitrate files for an HLS stream, we chose Method 3. The output is perfectly aligned at 2 seconds between chunks, each chunk has an I-frame at its beginning, and there are additional I-frames in complex scenes, which provides a better experience for users who have outdated devices and cannot play back x264 High profile streams.
Hope it helps someone.
The answer therefore seems to be:
- Method 1 is verified to work, but is libx264-specific, and comes at the cost of eliminating the very useful scenecut option in libx264.
- Method 3 works as of the FFMPEG version of April 2015, but you should verify your results with the script included at the bottom of this post, as the FFMPEG documentation is unclear as to the effect of the option. If it works, it is the superior of the two options.
- DO NOT USE Method 2; -g appears to be deprecated. It neither appears to work, nor is it explicitly defined in the documentation, nor is it found in the help, nor does it appear to be used in the code. Code inspection shows that the -g option is likely meant for MPEG-2 streams (there are even code stanzas referring to PAL and NTSC!).
Also:
- Files generated with Method 3 may be slightly larger than Method 1, as interstitial I frames (keyframes) are allowed.
- You should explicitly set the "-r" flag in both cases, even though Method 3 places an I frame at the next frameslot on or after the time specified. Failure to set the "-r" flag places you at the mercy of the source file, possibly with a variable frame rate. Incompatible DASH transitions may result.
- Despite the warnings in the FFMPEG documentation, method 3 is NOT less efficient than others. In fact, tests show that it might be slightly MORE efficient than method 1.
Script for the -force_key_frames option
Here is a short Perl program I used to verify I-frame cadence based on the output of slhck's ffprobe suggestion. It seems to verify that the -force_key_frames method will also work, and has the added benefit of allowing for scenecut frames. I have absolutely no idea how FFMPEG makes this work, or if I just lucked out somehow because my streams happen to be well-conditioned.
In my case, I encoded at 30fps with an expected GOP size of 6 seconds, or 180 frames. Passing 180 as the gopsize argument to this program verified an I-frame at each multiple of 180, but setting it to 181 (or any other number not a multiple of 180) made it complain.
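#!/usr/bin/perl
# Usage: <this script> GOPSIZE FILE
# Reports expected GOP boundaries (multiples of GOPSIZE frames) that lack an I-frame.
use strict;
use warnings;

my $gopsize = shift(@ARGV);
my $file = shift(@ARGV);
die "Usage: $0 GOPSIZE FILE\n"
    unless defined $file && $gopsize =~ /^\d+$/ && $gopsize > 0;
print "GOPSIZE = $gopsize\n";
my $linenum = 0;   # index of the current frame (0-based)
my $expected = 0;  # frame index at which the next I-frame is required
# The list form of open avoids passing the file name through a shell.
open(my $pipe, '-|', 'ffprobe', '-i', $file, '-select_streams', 'v',
     '-show_frames', '-of', 'csv', '-show_entries', 'frame=pict_type')
    or die "Cannot run ffprobe: $!";
while (<$pipe>) {
    if ($linenum > $expected) {
        # Won't catch all the misses. But even one is good enough to fail.
        print "Missed IFrame at $expected\n";
        $expected = (int($linenum/$gopsize) + 1)*$gopsize;
    }
    if (m/,I\s*$/) {
        if ($linenum < $expected) {
            # Don't care term, just an extra I frame. Snore.
            #print "Free IFrame at $linenum\n";
        } else {
            #print "IFrame HIT at $expected\n";
            $expected += $gopsize;
        }
    }
    $linenum += 1;
}
close($pipe);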
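For quick checks from the command line, the same idea can be sketched in shell with awk. This is a simplified variant, not a port of the Perl program: it flags every frame index that is a multiple of the GOP size and is not an I-frame. The function name check_cadence is made up here, and the sketch assumes the input has one picture type (I, P or B) per line, which is what ffprobe's csv=p=0 output format should give for frame=pict_type.

```shell
#!/bin/sh
# check_cadence GOPSIZE: read one picture type per line on stdin and report
# every expected GOP boundary (frame index 0, GOPSIZE, 2*GOPSIZE, ...) that
# is not an I-frame. Returns non-zero if at least one boundary was missed.
check_cadence() {
    # -F, keeps only the first CSV field in case a trailing comma is present.
    awk -F, -v gop="$1" '
        {
            if ((NR - 1) % gop == 0 && $1 != "I") {
                printf "Missed IFrame at %d\n", NR - 1
                missed = 1
            }
        }
        END { exit (missed ? 1 : 0) }
    '
}

# Synthetic demo: I-frames at indices 0 and 3 with a GOP size of 3.
printf 'I\nP\nP\nI\nP\nP\n' | check_cadence 3 && echo "cadence OK"

# Assumed real-world use (FILE is a placeholder):
#   ffprobe -v error -select_streams v:0 -show_entries frame=pict_type \
#       -of csv=p=0 FILE | check_cadence 180
```

Extra I-frames between boundaries, such as those inserted at scene cuts, are ignored, which matches the "don't care" branch of the Perl version.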