Getting green screen in ffplay: Streaming desktop (DirectX surface) as H264 video over RTP stream using Live555
It’s harder than it seems.
If you want to use the encoder the way you're doing it now, by calling the IMFTransform interface directly, you have to convert RGB frames to NV12. If you want good performance, you should do it on the GPU. It's possible to do with pixel shaders: render two passes, a full-size one into a DXGI_FORMAT_R8_UNORM render target with brightness (luma), and a half-size one into a DXGI_FORMAT_R8G8_UNORM render target with color (chroma), and write two pixel shaders to produce the NV12 values. Both render targets can render into the two planes of the same NV12 texture, but only since Windows 8.
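To make the target layout concrete, here is a minimal CPU reference sketch of the same conversion. It is not the GPU path described above, only an illustration of what the two passes have to produce. The names `Nv12Image`, `clampToByte` and `bgraToNv12` are made up for this example, the input is assumed to be tightly packed BGRA, and the BT.601 limited-range integer coefficients are an assumption; your pipeline may need BT.709 or full range instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// NV12: a full-resolution Y plane followed by a half-resolution plane of
// interleaved U,V byte pairs. (Hypothetical helper type for this sketch.)
struct Nv12Image {
    int width = 0;
    int height = 0;
    std::vector<uint8_t> y;   // width * height bytes
    std::vector<uint8_t> uv;  // width * height / 2 bytes, U then V per pair
};

static uint8_t clampToByte(int v) {
    return static_cast<uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
}

// Converts tightly packed BGRA (4 bytes per pixel) to NV12.
// Assumption: BT.601 limited-range coefficients in 8.8 fixed point.
// Note: relies on arithmetic right shift of negative ints.
Nv12Image bgraToNv12(const uint8_t* bgra, int width, int height) {
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);
    Nv12Image out;
    out.width = width;
    out.height = height;
    out.y.resize(static_cast<size_t>(width) * height);
    out.uv.resize(static_cast<size_t>(width) * height / 2);

    // "Full-size pass": one luma byte per source pixel.
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            const uint8_t* p = bgra + (static_cast<size_t>(row) * width + col) * 4;
            const int b = p[0], g = p[1], r = p[2];
            out.y[static_cast<size_t>(row) * width + col] =
                clampToByte(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);
        }
    }

    // "Half-size pass": one U,V pair per 2x2 block, from the averaged color.
    for (int row = 0; row < height; row += 2) {
        for (int col = 0; col < width; col += 2) {
            int r = 0, g = 0, b = 0;
            for (int dy = 0; dy < 2; ++dy) {
                for (int dx = 0; dx < 2; ++dx) {
                    const uint8_t* p =
                        bgra + (static_cast<size_t>(row + dy) * width + (col + dx)) * 4;
                    b += p[0];
                    g += p[1];
                    r += p[2];
                }
            }
            r = (r + 2) / 4;
            g = (g + 2) / 4;
            b = (b + 2) / 4;
            // Row stride of the UV plane is `width` bytes (width/2 pairs).
            uint8_t* dst = &out.uv[static_cast<size_t>(row / 2) * width + col];
            dst[0] = clampToByte(((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128);  // U
            dst[1] = clampToByte(((112 * r - 94 * g - 18 * b + 128) >> 8) + 128);   // V
        }
    }
    return out;
}
```

The two loops map onto the two render passes: the first corresponds to the R8 target, the second to the R8G8 target at half resolution. A reference like this can also be handy for checking shader output on a few test frames.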
The other method is to use a sink writer. It can host multiple MFTs at the same time, so you can supply RGB textures in VRAM; the sink writer will first convert them into NV12 with one MFT (likely a proprietary hardware one implemented by the GPU driver, just like the encoder), then pass them to the encoder MFT. It's relatively easy to encode into an mp4 file: use the MFCreateSinkWriterFromURL API to create the writer. It's much harder to get raw samples out of the sink writer, however: you have to implement a custom media sink and a custom stream sink for its video stream, then call MFCreateSinkWriterFromMediaSink to create the writer.
There’s more.
Regardless of the encoding method, you can't reuse frame textures. For each frame you get from DD (Desktop Duplication), you should create a new texture and pass it to MF.
Video encoders expect a constant frame rate. DD doesn't give you that; it gives you a frame every time something changes on the screen. That can be 144 FPS if you have a gaming monitor, or 2 FPS if the only change is a blinking cursor. Ideally, you should submit frames to MF at the constant frame rate specified in your video media type.
If you want to stream to the network, more often than not you also have to supply parameter sets. Unless you're using the Intel hardware h265 encoder, which is broken with no comments from Intel, MF gives you that data in the MF_MT_MPEG_SEQUENCE_HEADER attribute of the media type, by calling SetCurrentMediaType on the IMFMediaTypeHandler interface. You can implement that interface to get notified. You'll only get that data after you start encoding. That's if you use a sink writer; with the IMFTransform method it's easier: you should get the MF_E_TRANSFORM_STREAM_CHANGE code from the ProcessOutput method, then call GetOutputAvailableType to get the updated media type with that magic blob.
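Once you have the blob, an RTP sender will usually want the parameter sets as individual NAL units rather than one buffer. Below is a minimal sketch of splitting it, assuming the H.264 blob is in Annex B form, i.e. SPS and PPS each prefixed with a 00 00 01 or 00 00 00 01 start code; verify that against what your encoder actually returns. `splitAnnexB` and `h264NalType` are hypothetical helper names, not MF or Live555 APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Splits an Annex B byte stream into NAL units with the start codes removed.
// Trailing zero bytes are trimmed from each unit: they belong to the next
// 4-byte start code or are padding, since a NAL unit never ends in 0x00.
std::vector<std::vector<uint8_t>> splitAnnexB(const std::vector<uint8_t>& blob) {
    std::vector<std::vector<uint8_t>> nals;
    const size_t n = blob.size();
    size_t start = n;  // payload start of the unit being collected; n = none yet

    auto flush = [&](size_t end) {
        if (start >= n) return;
        while (end > start && blob[end - 1] == 0) --end;
        if (end > start) nals.emplace_back(blob.begin() + start, blob.begin() + end);
    };

    size_t i = 0;
    while (i + 3 <= n) {
        if (blob[i] == 0 && blob[i + 1] == 0 && blob[i + 2] == 1) {
            flush(i);       // close the previous unit, if any
            start = i + 3;  // payload begins right after the start code
            i += 3;
        } else {
            ++i;
        }
    }
    flush(n);
    return nals;
}

// H.264 NAL unit type is the low 5 bits of the first byte: 7 = SPS, 8 = PPS.
int h264NalType(const std::vector<uint8_t>& nal) {
    assert(!nal.empty());
    return nal[0] & 0x1F;
}
```

With the units separated, you can pick out the ones with type 7 and 8 and hand them to whatever part of your streaming code advertises or sends the parameter sets.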