Using VMAF for encoder quality

I have been trying to convert/encode phone videos for archival and reduce file sizes. One way to do that is to re-encode at a lower bitrate. That reduces size, but it reduces quality as well. These days, most videos are encoded in either H264/AVC or H265/HEVC.

From what I read online, HEVC/H265 is engineered to provide better quality than H264/AVC at the same bitrate, or the same quality at roughly 50% of the bitrate, hence the attraction.

Now, there are many tools for encoding/re-encoding: HandBrake, StaxRip, ffmpeg and VLC.

HandBrake, StaxRip, ffmpeg and VLC all build on libavcodec and largely use the same tools underneath. All of them except ffmpeg have a UI, exposing controls and parameters to some extent. I have tried all of them, and have mostly settled on ffmpeg and StaxRip now.

As far as encoding is concerned, there is a plethora of tools, but validation still requires very specific ones. Again, ffmpeg is the best option here.

From a validation perspective, there are multiple metrics to measure quality with: PSNR, SSIM and VMAF.

Each of these metrics captures a different aspect of quality.

PSNR: Peak signal-to-noise ratio, a logarithmic representation of the mean squared error (MSE). For images, it typically represents the error/deviation/noise introduced by compression compared to the original image. It is expressed in dB, and a higher number means closer to the original image. It is only an approximation of human perception of reconstruction quality, and it is not the preferred metric.
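For reference, here is the standard textbook definition as a short LaTeX sketch. I is the original m×n frame, K is the compressed one, and MAX_I is the largest possible pixel value (255 for 8-bit video). Identical frames give MSE = 0, so PSNR is undefined (infinite) for a perfect copy.

```latex
\mathrm{MSE} = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\bigl(I(i,j) - K(i,j)\bigr)^2,
\qquad
\mathrm{PSNR} = 10\,\log_{10}\!\left(\frac{\mathrm{MAX}_I^2}{\mathrm{MSE}}\right)\ \mathrm{dB}

% Worked example, 8-bit video (MAX_I = 255, so MAX_I^2 = 65025), MSE = 65.025:
\mathrm{PSNR} = 10\,\log_{10}\!\left(\frac{65025}{65.025}\right) = 10\,\log_{10}(1000) = 30\ \mathrm{dB}
```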

SSIM: Structural similarity index measure, a perception-based model that treats image degradation as a perceived change in structural information, while also incorporating important perceptual phenomena, including both luminance masking and contrast masking terms. Structural information is the idea that pixels have strong inter-dependencies, especially when they are spatially close. The result is a decimal value, in practice between 0 and 1, where 1 is reached only with the original data. The goal is to stay as close to 1 as possible at the lowest possible bitrate.
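The usual SSIM formula from the original paper is sketched below. It is evaluated over small local windows x and y and then averaged over the frame. μ are window means, σ² are variances, σ_xy is the covariance, and L is the dynamic range (255 for 8-bit). k₁ = 0.01 and k₂ = 0.03 are the commonly used constants. The comment at the end shows why only an identical image reaches exactly 1.

```latex
\mathrm{SSIM}(x,y) =
  \frac{(2\mu_x\mu_y + c_1)\,(2\sigma_{xy} + c_2)}
       {(\mu_x^2 + \mu_y^2 + c_1)\,(\sigma_x^2 + \sigma_y^2 + c_2)},
\qquad c_1 = (k_1 L)^2,\quad c_2 = (k_2 L)^2

% With x = y: \mu_x = \mu_y and \sigma_{xy} = \sigma_x^2 = \sigma_y^2,
% so numerator and denominator are identical and SSIM(x, x) = 1.
```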

VMAF: Video Multimethod Assessment Fusion, a full-reference video quality metric developed by Netflix. Their blog has much more detailed information on what it means and how to interpret it. See these posts: "Toward a Practical Perceptual Video Quality Metric" and "VMAF: The Journey Continues".

Interpreting VMAF Scores:
VMAF scores range from 0 to 100, with 0 indicating the lowest quality, and 100 the highest. A good way to think about a VMAF score is to linearly map it to the human opinion scale under which condition the training scores are obtained. As an example, the default model v0.6.1 is trained using scores collected by the Absolute Category Rating (ACR) methodology using a 1080p display with viewing distance of 3H. Viewers voted the video quality on the scale of “bad”, “poor”, “fair”, “good” and “excellent”, and roughly speaking, “bad” is mapped to the VMAF scale 20 and “excellent” to 100. Thus, a VMAF score of 70 can be interpreted as a vote between “good” and “fair” by an average viewer under the 1080p and 3H condition.
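To make the linear mapping above concrete, here is a small sketch. vmaf_to_acr is a hypothetical helper, not part of VMAF. It assumes the quoted mapping of "bad" = 20 through "excellent" = 100, which works out to one ACR category per 20 VMAF points. Treating scores below 20 as "bad" is my own assumption.

```shell
# Hypothetical helper: place a VMAF score (0-100) on the ACR opinion scale,
# assuming the linear mapping quoted above ("bad" = 20 ... "excellent" = 100),
# i.e. one category per 20 VMAF points. Scores below 20 are reported as "bad".
vmaf_to_acr() {
  awk -v s="$1" 'BEGIN {
    split("bad poor fair good excellent", label, " ")
    k = s / 20                      # 1 = bad, 2 = poor, ..., 5 = excellent
    if (k <= 1) { print "bad"; exit }
    if (k >= 5) { print "excellent"; exit }
    lo = int(k)                     # category at or just below the score
    if (k == lo) print label[lo]
    else         print "between " label[lo] " and " label[lo + 1]
  }'
}

vmaf_to_acr 70   # prints: between fair and good
vmaf_to_acr 80   # prints: good
```

This reproduces the example in the quote, where 70 sits between "fair" and "good".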

Now, back to the business of checking quality. I built ffmpeg with VMAF support locally on Linux/WSL, and things are simpler now. Netflix provides a tool on Windows, but it does not seem to work with files compressed with different encoders. With ffmpeg, one can invoke VMAF as a filter and let ffmpeg decode the frames for it to compare and evaluate quality metrics.
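Before going further, it helps to confirm that the ffmpeg on your PATH actually has the filter. This sketch uses a hypothetical helper, has_libvmaf, which relies only on ffmpeg's standard -filters listing. A build configured with --enable-libvmaf should list a filter named libvmaf.

```shell
# Succeeds if the ffmpeg on PATH lists a filter named "libvmaf"
# (present only in builds configured with --enable-libvmaf).
has_libvmaf() {
  ffmpeg -hide_banner -filters 2>/dev/null | grep -qw libvmaf
}

if has_libvmaf; then
  echo "libvmaf filter available"
else
  echo "this ffmpeg build has no libvmaf filter" >&2
fi
```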

Netflix has defined several models for VMAF evaluation. I am using the 1080p model (vmaf_v0.6.1) here. The first step is to download the model JSON file:

cd ~/temp
curl -o vmaf_v0.6.1.json <model-url>
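The download step can also be wrapped so that it runs only once and catches a bad download, such as an HTML error page saved under a .json name. This is a sketch built around a hypothetical function, fetch_vmaf_model. The model URL is left for the caller to supply and is not hard-coded. The "starts with {" test is only a rough sanity check that assumes the file is a JSON object.

```shell
# Hypothetical helper: download a VMAF model once into a working directory
# and sanity-check that the result looks like JSON (first byte is "{").
# Arguments: model URL, target directory, optional file name.
fetch_vmaf_model() {
  local url="$1" dir="$2" name="${3:-vmaf_v0.6.1.json}"
  mkdir -p "$dir" || return 1
  if [ ! -s "$dir/$name" ]; then
    # -f: fail on HTTP errors instead of saving the error page
    # -L: follow redirects; -sS: quiet, but still report errors
    curl -fsSL -o "$dir/$name" "$url" || return 1
  fi
  # A JSON model file is expected to begin with '{'
  [ "$(head -c 1 "$dir/$name")" = "{" ]
}

# Usage (fill in the model URL yourself):
# fetch_vmaf_model "<model-url>" ~/temp
```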

Then run the following ffmpeg command. The order of inputs is important: the first -i is always the file you want to check, and the second -i is the reference.

ffmpeg -i <tocheck.mp4> -i <ref.mp4> -lavfi libvmaf="n_threads=4:model_path=./vmaf_v0.6.1.json:log_fmt=json:log_path=./vmaf.txt:psnr=1:ssim=1" -f null -

libvmaf is quite fussy about where the model file is kept. For simplicity, I copied the model file to the ~/temp directory and ran ffmpeg from that directory. That way, specifying the path was simpler whether running on Windows, Linux or WSL.
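The "run from the model's directory" trick can be wrapped in a subshell so your own shell stays where it was. vmaf_from_model_dir is a hypothetical name, and ~/temp as the default model directory simply follows this post. Because the subshell changes directory, the two video paths are resolved to absolute paths with realpath first. The filter options are the same ones used above.

```shell
# Hypothetical wrapper: run the libvmaf comparison from inside the directory
# that holds the model, so model_path can stay a plain relative name.
# Arguments: file to check, reference file, optional model directory.
vmaf_from_model_dir() {
  local distorted ref
  # Resolve inputs before changing directory (realpath fails if missing)
  distorted="$(realpath "$1")" || return 1
  ref="$(realpath "$2")" || return 1
  local model_dir="${3:-$HOME/temp}"
  (
    cd "$model_dir" || exit 1
    # First input: file to check; second input: reference
    ffmpeg -i "$distorted" -i "$ref" \
      -lavfi "libvmaf=model_path=./vmaf_v0.6.1.json:log_fmt=json:log_path=./vmaf.txt" \
      -f null -
  )
}

# Usage: vmaf_from_model_dir tocheck.mp4 ref.mp4 ~/temp
```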

libvmaf takes several parameters:

  • n_threads: how many concurrent threads to use. I noticed little benefit from going beyond 4 threads.
  • model_path: where the model file is kept. Prefer the .json file over the .pkl file.
  • log_fmt: the format the log file is written in. I went with json, but xml and csv are supported as well.
  • log_path: where to write per-frame data. I chose the current directory, again to keep usage simple across Windows and Linux.
  • psnr: set to 1 if you want PSNR to be computed.
  • ssim: set to 1 if you want SSIM to be computed.
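Since these parameters are joined with ":" into a single filter argument, a small builder helps avoid typos and duplicated options. vmaf_filter is a hypothetical helper. It uses the option names from the list above, which match the ffmpeg 4.x-era filter used in this post, and newer ffmpeg/libvmaf builds may name them differently. The defaults of 4 threads, PSNR off and SSIM on are my own choice.

```shell
# Hypothetical helper: assemble the libvmaf option string from the parameters
# listed above. Arguments: model path, log path, threads (default 4),
# psnr (0/1, default 0), ssim (0/1, default 1).
vmaf_filter() {
  local model="$1" log="$2" threads="${3:-4}" psnr="${4:-0}" ssim="${5:-1}"
  local opts="model_path=$model:log_fmt=json:log_path=$log:n_threads=$threads"
  [ "$psnr" = 1 ] && opts="$opts:psnr=1"
  [ "$ssim" = 1 ] && opts="$opts:ssim=1"
  printf 'libvmaf=%s\n' "$opts"
}

vmaf_filter ./vmaf_v0.6.1.json ./vmaf.txt
# prints: libvmaf=model_path=./vmaf_v0.6.1.json:log_fmt=json:log_path=./vmaf.txt:n_threads=4:ssim=1
```

The result goes straight into -lavfi, for example: ffmpeg -i tocheck.mp4 -i ref.mp4 -lavfi "$(vmaf_filter ./vmaf_v0.6.1.json ./vmaf.txt)" -f null -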

Now sit back and do something useful; it will take its own sweet time. Here is a dump from my VMAF run (I ran it single-threaded; it could have been sped up with n_threads=4):

ffmpeg -i /media/sarang/028B-EF23/scratch/test2/NVENC-H264-CQP16-K0-MaxQ-High.x265.crf20.mp4 -i /media/sarang/028B-EF23/scratch/test2/NVENC-H264-CQP16-K0-MaxQ-High.mp4 -lavfi libvmaf="model_path=./vmaf_v0.6.1.json:log_fmt=json:ssim=1:log_path=VMAF.txt" -f null -
ffmpeg version N-101857-g0617e57 Copyright (c) 2000-2021 the FFmpeg developers
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/media/sarang/028B-EF23/scratch/test2/NVENC-H264-CQP16-K0-MaxQ-High.x265.crf20.mp4':
  Duration: 00:02:17.67, start: 0.000000, bitrate: 2772 kb/s
  Stream #0:0(und): Video: hevc (Main) (hev1 / 0x31766568), yuv420p(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 2508 kb/s, 30 fps, 30 tbr, 15360 tbn, 30 tbc (default)
Input #1, mov,mp4,m4a,3gp,3g2,mj2, from '/media/sarang/028B-EF23/scratch/test2/NVENC-H264-CQP16-K0-MaxQ-High.mp4':
  Duration: 00:02:17.67, start: 0.000000, bitrate: 12556 kb/s
Stream mapping:
  Stream #0:0 (hevc) -> libvmaf:main (graph 0)
  Stream #1:0 (h264) -> libvmaf:reference (graph 0)
  libvmaf (graph 0) -> Stream #0:0 (wrapped_avframe)
  Stream #0:1 -> #0:1 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2mp41
    encoder         : Lavf58.77.100
  Stream #0:0: Video: wrapped_avframe, yuv420p(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
      encoder         : Lavc58.135.100 wrapped_avframe
  Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s (default)
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]
      encoder         : Lavc58.135.100 pcm_s16le
libvmaf INFO `compute_vmaf()` is deprecated and will be removed in a future libvmaf version

Here I am comparing two differently encoded streams: the first is the H265 re-encode and the second is the H264 source. Since ffmpeg decodes both inputs itself, comparing across codecs is straightforward. Also note that the output goes to -f null -, i.e. it is not saved anywhere. The end result is a printout of the VMAF and SSIM scores.
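When scripting several comparisons, the pooled score can be pulled out of ffmpeg's console output instead of being read by eye. This is a sketch around a hypothetical helper, vmaf_score_from_log. It assumes that the libvmaf filter logs a line containing "VMAF score: <number>" and that ffmpeg writes it to stderr, which is why the usage below merges stderr into the pipe.

```shell
# Hypothetical helper: read ffmpeg's console output on stdin and print the
# number from the last line containing "VMAF score: <number>".
vmaf_score_from_log() {
  sed -n 's/.*VMAF score: *\([0-9.]*\).*/\1/p' | tail -n 1
}

# Usage (ffmpeg logs to stderr, hence 2>&1):
# ffmpeg -i tocheck.mp4 -i ref.mp4 -lavfi "libvmaf=model_path=./vmaf_v0.6.1.json" -f null - 2>&1 \
#   | vmaf_score_from_log
```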

In this case, VMAF was ~91 and SSIM was ~0.98. So overall this is a good-quality encode, and the bitrate went down from ~12556 kb/s to ~2772 kb/s (roughly a 4.5x reduction in file size with a good-quality encode). Sounds like a win to me :).
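The size arithmetic can be checked from the overall bitrates in the log. Both files have the same duration (00:02:17.67), so the ratio of overall bitrates equals the ratio of file sizes. size_reduction is a hypothetical helper that just does this division with awk.

```shell
# Ratio of overall bitrates (kb/s); for equal durations this is also the
# ratio of file sizes. Arguments: source bitrate, re-encoded bitrate.
size_reduction() {
  awk -v src="$1" -v dst="$2" 'BEGIN {
    printf "%.2fx smaller, %.1f%% saved\n", src / dst, 100 * (1 - dst / src)
  }'
}

size_reduction 12556 2772   # prints: 4.53x smaller, 77.9% saved
```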

I compared the files visually later on, and both looked the same on my monitor. I think this is good enough for archival.

A note about the model file:

  • A few tutorials online suggest getting the .pkl file; it is now deprecated in favor of the .json file.
  • The usual online recommendation is to keep your model files in /usr/local/share/model, which requires sudo access in the first place. It is better to keep them elsewhere and specify model_path.
  • I found that specifying the .json file as a path relative to the current directory where I run ffmpeg behaved better than specifying an absolute path to it.
  • I downloaded only one model file, instead of the entire bunch.
  • For some reason psnr could not be specified, but I don't care about it much. SSIM is much better.