I recently got it into my head to compare the popular video codecs, to better understand how av1 works and looks compared to x264 and x265. I was also thinking about using an Intel video card to compress footage from a home video security setup, and wanted to know what level of compression I would need to get good results.
The Setup
I used the 4K, 6.3 GB Blender project Tears of Steel as the source. I downscaled the video to 1080p with all three codecs, then compared the results at various crf levels.
To compare the results I used imgsli, FFMetrics, and my own picture viewer to see where the differences were.
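For anyone who wants to repeat the encoding runs, here is a minimal sketch of how the batch could be scripted. It only builds and prints ffmpeg command lines. The encoder names (`libsvtav1`, `libx265`, `libx264`), the `scale=1920:-2` filter, and the file names are assumptions for illustration, not necessarily the settings used for the table below.

```python
import shlex

# Assumed ffmpeg encoder names; the post only says "av1", "x265" and "x264".
ENCODERS = {"av1": "libsvtav1", "x265": "libx265", "x264": "libx264"}

# Highest crf each codec accepts, matching the "worst possible" rows in the table.
MAX_CRF = {"av1": 63, "x265": 51, "x264": 51}


def crf_levels(codec, start=18, step=3):
    """Return the crf values to test for a codec: 18, 21, ... up to its maximum."""
    return list(range(start, MAX_CRF[codec] + 1, step))


def encode_command(source, codec, crf):
    """Build one ffmpeg command that downscales to 1080p width and encodes at a crf."""
    output = f"tos_1080p_{codec}_crf{crf}.mkv"  # hypothetical output name
    return [
        "ffmpeg", "-i", source,
        "-vf", "scale=1920:-2",  # 1920 wide, height chosen to keep the aspect ratio
        "-c:v", ENCODERS[codec],
        "-crf", str(crf),
        output,
    ]


if __name__ == "__main__":
    # "tears_of_steel_4k.mov" is a placeholder for the real source file.
    for codec in ENCODERS:
        for crf in crf_levels(codec):
            print(shlex.join(encode_command("tears_of_steel_4k.mov", codec, crf)))
```

Printing the commands instead of running them makes it easy to check the list before committing to many hours of encoding.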
The Results
crf | av1 KB | x265 KB | x264 KB |
---|---|---|---|
18 | 419,261 | 632,079 | 685,217 – x264 visually lossless |
21 | 352,337 | 390,358 – x265 visually lossless | 411,439 |
24 | 301,517 – av1 VMAF visually lossless | 250,426 | 263,524 – x264 good enough |
27 | 245,685 | 165,079 – x265 good enough | 176,919 |
30 | 205,008 | 110,062 | 122,458 |
33 | 168,192 | 73,528 | 86,899 |
36 | 139,379 – av1 My visually lossless | 48,516 | 63,214 |
39 | 116,096 | 31,670 | 47,161 |
42 | 97,365 – av1 my good enough | 20,636 | 35,801 |
45 | 81,805 | 13,598 | 27,484 |
48 | 69,044 | 9,726 | 20,823 |
51 | 58,316 | 8,586 – worst possible | 16,120 – worst possible |
54 | 48,681 | - | - |
57 | 39,113 | - | - |
60 | 29,062 | - | - |
63 | 16,533 – worst possible | - | - |
I go into more detail on the hows and whys of my choices, and on how I came to these conclusions, in my journal-style blog post. In essence, if you want to lose practically no visual information, crf 24 (going by VMAF) through crf 36 (going by my own eyes) for av1, crf 21 for x265, and crf 18 for x264 will do the job.
If you are low on space, my ‘good enough’ choices (crf 42 for av1, crf 27 for x265, crf 24 for x264) will get you practically the same visual results in less space; how much less depends on the codec.
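To put the space savings in perspective, here is a small sketch that expresses each of my picks as a fraction of the x264 crf 18 file. The sizes are copied from the table above; using x264 crf 18 as the baseline is just a choice made for this illustration.

```python
# File sizes in KB, copied from the results table, keyed by (codec, crf).
SIZES_KB = {
    ("x264", 18): 685_217,  # x264 visually lossless
    ("x265", 21): 390_358,  # x265 visually lossless
    ("av1", 24): 301_517,   # av1 visually lossless by VMAF
    ("av1", 36): 139_379,   # av1 visually lossless by eye
    ("x264", 24): 263_524,  # x264 good enough
    ("x265", 27): 165_079,  # x265 good enough
    ("av1", 42): 97_365,    # av1 good enough
}


def relative_size(pick, baseline=("x264", 18)):
    """Return the size of `pick` as a fraction of the baseline file size."""
    return SIZES_KB[pick] / SIZES_KB[baseline]


if __name__ == "__main__":
    for (codec, crf), size in SIZES_KB.items():
        share = relative_size((codec, crf))
        print(f"{codec} crf {crf}: {size:,} KB ({share:.0%} of x264 crf 18)")
```

By this measure the av1 crf 36 file is roughly a fifth the size of the x264 crf 18 file.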
Can you explain what you mean by “visually lossless”? Is this a purely subjective classification, or is there a specific definition or benchmark you used?
Visually lossless means I couldn’t tell an image difference even when pixel peeping with imgsli. Good enough means I couldn’t tell a difference in video, but could occasionally see a compression artifact in imgsli.
The VMAF results are purely objective measurements. You can read more about VMAF here: https://en.wikipedia.org/wiki/Video_Multimethod_Assessment_Fusion
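I got my VMAF numbers from FFMetrics, but if you would rather compute a score yourself, ffmpeg's `libvmaf` filter is one way to do it. Below is a rough sketch that builds such a command. It assumes an ffmpeg build with libvmaf enabled and a reference file that already matches the encode's resolution and frame rate; the file names are placeholders.

```python
import shlex


def vmaf_command(distorted, reference, log_path="vmaf.json"):
    """Build an ffmpeg command that scores `distorted` against `reference`.

    The libvmaf filter treats the first input as the encode under test and
    the second input as the reference.
    """
    return [
        "ffmpeg",
        "-i", distorted,
        "-i", reference,
        "-lavfi", f"libvmaf=log_path={log_path}:log_fmt=json",
        "-f", "null", "-",  # discard the video output, keep only the score log
    ]


if __name__ == "__main__":
    # Both file names are hypothetical.
    cmd = vmaf_command("tos_1080p_av1_crf36.mkv", "tos_1080p_reference.mkv")
    print(shlex.join(cmd))
```

The score is written to the JSON log and also printed at the end of ffmpeg's console output.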