jreeves - Video Conversion

Recently I was able to upgrade the size of my TV, and after watching a bit of content, I realized that much of my movie library isn't actually as "HD" as I originally thought. This led me on a journey that ultimately is costing a dozen dollars a month in electricity usage: the quest for high quality, low bitrate movies (yes, the idea is a bit oxymoronic).

The Problem

Before, my movie library came from several different sources. These varied quite a bit depending on where it came from, what codec was used, original source material, etc. There was even a good amount that was still ripped from DVDs, and if you can believe it, VHS! I've always known that these weren't the best copies of the films out there, but to an extent I just didn't care because I didn't have a device that I could tell a difference on.

Long story short, as I tested with videos of higher bitrate, there still was quite a variety of quality. As I dove more into what makes a good video encode, I realized that the videos I had all had wildly different encoding settings. I imagine that some were even encoded from some other crappy encode. Rather than trying to deal with all the inconsistencies, I decided to go with the most straightforward approach - getting the highest quality video source file I could (and one that hadn't been re-encoded several times). This would be Blu-Ray discs, or "remuxes" (unaltered Blu-Ray material in an mkv container).

Here is the catch: even with a sizeable pool of hard drives, I'm still constrained by my server's limited capacity. I'd love to store remuxes only, but it adds up quickly - let's say the average remux is 25-26 GiB (this is a guess, but it's probably not far off). At a collection of 2,000 movies, that's ~50 TiB. I wouldn't be able to dedicate that kind of storage solely to movies, so I decided to go with the next best thing: encoding videos myself from Blu-Ray remuxes. This way, I could micromanage what I wanted regarding bitrates, encoding settings, etc., to give the "perfect" mix of quality to size ratio.

Deciding on encoding settings

This was the most difficult part of the project, because while there are some absolutes in video encoding, settings that are very similar can look different to each set of eyes - so there's a variety of opinions when it comes to what the "best" settings are. That being said, the settings I use are intended to get an overall good result when batch encoding many movies, and for each individual movie, there are probably settings that could be tweaked to get a better individual result. I don't have time to figure that out for each movie, so getting a good batch encode is going to be my next best bet.

One of the first big choices was which codec to use. I went with H.265, mostly by process of elimination. H.264 I ruled out because the bitrate required for a great encode was too much to swallow. AV1 I ruled out because while it is an awesome codec, at the time of writing (2022) hardware support is far from the norm. It also takes an incredible amount of time to encode. H.266 is so far out that it's not worth considering (plus, FFmpeg won't have a library for it for a while). H.265 encoding is slow, especially for the settings I use, but it's tolerable enough. It also has pretty good hardware decoding support across most platforms.

Diving further into the encoding settings themselves, the lines start to become blurred between fact and opinion. I'll probably state some things as facts that are far from it - just be aware that my encoding settings work pretty well for my use case, and not all principles can be applied to a project with different variables. Anyway, the two big settings I went with right off the bat were 10 bit HEVC, and using a CRF value instead of a target bitrate with either 1 or 2 passes. This is generally considered best practice, as target bitrate introduces problems of its own depending on the content.

With other settings beyond a CRF value, I tried to have as few as possible because [in my opinion] the HEVC developers did a pretty good job making the vanilla encoding settings work well on most videos. The only one that I felt strongly enough about its impact to include was disabling SAO (a tool used to smooth objects out) on animated content. Beyond that, since I'm strictly doing batch encoding of remuxes, it seemed having good general settings would be better than diving in to which option gives the best result on a specific movie.

That's enough info (and disclaimers) to get to the really cool part, the automation! Granted, it's just a simple Python script, but it does the job well. This script was written on Windows, it could be easily modified to work on Mac/Linux.


import os, subprocess, re
import tkinter as tk
from tkinter import filedialog
from collections import defaultdict

root = tk.Tk()                                                              # This is just to suppress the Tkinter window from displaying
root.withdraw()

root_path = filedialog.askdirectory(title="Select Source Directory")        # get source and destination paths
dest_path = filedialog.askdirectory(title="Select Destination Directory")

crf = input('CRF rate eg. 18: ')                                            # input variables (note that I do no error checking, since I'm the only one running this script)
conv_audio = input('Convert Audio?: ')
is_animation = input('Animated?: ')
crop = input('Crop Videos (if possible)?: ')
if crop == 'y':                                                             # bad idea to enable cropping with IMAX movies, since the parsed frames may not be in an IMAX shot
    offset = input('Seconds to skip before crop detecting (e.g. 900): ')
    if offset == '':                                                        # default to 900 seconds
        offset = '900'
    frames = input('Frames to parse (e.g. 14400): ')
    if frames == '':
        frames = '14400'                                                    # default to 14400 frames to parse (yes, it's overkill, but I wanted to be safe since I won't be checking every video output)
        
for root, directory, files in os.walk(root_path):                           # recursively find all video files
    for f in files:
        if f[-4:] in ['.mp4', '.mkv', '.mov', '.MKV', '.avi']:
            path = os.path.join(root,f)
            new_path = os.path.join(dest_path,f)
            if not os.path.exists(new_path):                                # for starting in the middle of a batch - don't process completed videos
                result = subprocess.getoutput(f'ffprobe "{path}"')
                crop_data = ''
                if crop == 'y':                                             # calculate crop data by parsing X frames if desired
                    print('Processing cropping data... for ' + path)
                    crop_output = re.findall(r'crop=(.*?)\n', subprocess.getoutput(f'ffmpeg -ss {offset} -i "{path}" -vframes {frames} -vf cropdetect -f null -'))
                    crop_list = defaultdict(int)
                    for x in crop_output:
                        crop_list[x] += 1                                   # This is one of the simplest ways to go about it, just throw all the possible crops into a dictionary
                    frames = int(frames)
                    for k,v in crop_list.items():
                        if v > (frames * .95):                              # as long as more than 95% of frames have the same crop data, we're in business. If they don't, the video will maintain the same aspect ratio.
                            crop_data = '-vf crop='+k
    
                subs = ''
                animated = ''
                sao = ''
                audio = '-c:a copy'                                         # default is copy audio
                if is_animation == 'y':
                    animated = '-tune animation'                            # I have noticed a very slight advantage in sharpness when removing sao for animation. Debatable though
                    sao = ':no-sao=1'
                if 'Subtitle: ' in result:                                  # ffmpeg will get angry if you tell it to copy subs when there aren't any
                    subs = '-c:s copy'
                if conv_audio == 'y':
                    audio = '-c:a ac3 -b:a 640k'                            # default for audio conversion is ac3 640Kbps (this is not optimal, and isn't used frequently. Most movies have a DTS-HD track)
                    if 'Audio: dts' in result:
                        audio = '-bsf:a dca_core -c:a copy'                 # extract DTS core if available. This is better than actually converting audio, because ffmpeg doesn't have a great audio encoder
                        
                # this is the bread and butter of the entire process. all the settings get thrown into this one ffmpeg call. It may not be the most optimized, but it works.
                os.system(f'ffmpeg -hide_banner -i "{path}" {crop_data} -pix_fmt yuv420p10le -preset slow {animated} -map 0:a? -map 0:s? {audio} {subs} -map 0:v -c:v libx265 -x265-params profile=main10:crf={crf}{sao} -y "{new_path}"')

Figuring out CRF and encoding speed

The FFmpeg HEVC documentation states "Use the slowest preset you have patience for.". Based off of a few opinions on message boards, I decided on the 'slow' preset. It gives enough advantages over 'medium' and 'fast', while not having an unnecessarily long encode time.

The last factor is the CRF value to use for HEVC. This has a variety of opinions. For example, these are some of the things I've heard (some may be more factual than others):

Some of those statements seem to contradict each other, so I set out to do my own research on the matter. Testing out on a subset of 10 movies, I honestly couldn't tell a visual difference between anything of CRF < 20-21. I could just set everything at CRF 19 and forget about it, but I wanted to give a little bit of wiggle room for those whose eyes are better than mine. So I decided to look at it from a storage perspective - because after all, my space is limited. At CRF 19, the average video bitrate within my [small] subset was ~8Mbps. After a few calculations based on the storage space I have and how large of a movie collection I'd like to be able to handle, I came up with a fairly reasonable number of allocating 10GiB per movie. Let's do the math here: Overestimating a little bit, the average movie is 2 hours. 10GiB/2 hours equals an overall bitrate of 11,651 Kbps. Subtracting out 1509 kbps for a DTS track and a hundred or so kbps for subtitles/overhead, we are left with ~10Mbps for the video stream.

Execution: putting the theory into practice

So 10Mbps is the target bitrate for my movies. Obviously not every movie is going to come out to exactly that bitrate with CRF encoding, so I went with a range of acceptable bitrates. The workflow currently looks like this:

There are some that still fall out of the 7-13 Mbps range even when the CRF is tweaked. For movies that are too small at CRF 14, I leave them be. No reason to complain about that! For movies that are still > 13Mbps at CRF 22, I still don't have a great solution. Out of about 250 movies that I've encoded using this method, maybe 10 or so fall into that category. A great example of this is the movie 12 Angry Men (1957). Even at CRF 24 (over what I'd like the CRF to be), the video bitrate sits at 17Mbps. I may try to figure out some reasonable H.264 encoding settings for edge cases like this at some future point.

Encoding time: The Achilles Heel?

Given that I want the best possible encode of these remuxes, I'm forced to use pure CPU encoding, with no hardware acceleration enabled on either the CPU or GPU. This presents a possible problem: is encoding >1000 movies to HEVC on the 'slow' preset actually feasible? I just happened to have a Ryzen 9 3950X laying around, which has become the main CPU used on this project. With 16 cores, it takes 2-3 simultaneous encodes to keep it at 100% continuous usage. The instantaneous encoding speed at any given time can range from an aggregate of 0.5 speed to 1.2 speed (this means that if I have 3 encodes all running at 0.3 speed, the aggregate encode speed is 0.9).

In the last batch of 100 movies I encoded, it took almost exactly 11.5 days to finish encoding. This means that my average encoding time per movie hovers at about 2.76 hours. The thorn in my side in all of this is that the assumption of CRF 16 and 18 only winds up being within the target bitrate range ~45% of the time (this is the best result I could get. Original guesses at other CRF values were only in the target bitrate range 30% of the time). 55% of all movies require a second encode at a modified CRF. So, unfortunately, that 11.5 days for 100 movies suddenly becomes 17.8 days to get encodes that are all within the bitrate constraints. That makes for a calculated encoding time of 4.28 hours per movie.

My movie library currently sits at ~1000 movies. Doing the math at 4.28 hours per movie, the total encoding time on my Ryzen is 178 days, excluding overhead. That's about 6 months. Is it way longer than I would prefer? Yes. Is it attainable? Absolutely. As of the time of writing this, I'm about 1.5 months into the project, with no regrets (so far).

Conclusion

I think that my solution for encoding movies is a pretty good compromise between storage space and video quality. There are still a LOT of movies left to encode before I can wrap this project up, but after the initial setup and research, it's fairly automated and I don't need to focus so much attention on it. Now if only I could acquire a lot more CPUs to do encoding with...

Other fun automation stuff for this project

These are some other random scripts that help reduce the manual aspect of doing this project. They're all quite simple, but worth sharing since not everybody is familiar with programming. Feel free to copy/use these scripts in your own projects.

← Back to projects list

Video Conversion Project - aka I hate my CPU

Intro