When a video is broadcast or streamed, we don't send actual frames but encoded data describing how to synthesise each frame at the end device. For efficiency, we want to represent the video with as little information as possible, but that requires complex algorithms. These algorithms break up the images into small blocks. The size and number of these blocks need to be optimal: not too big, so that we retain critical detail; not too small, so that we avoid redundant information.
The decision trees (DTs) were given many examples of coding units, the blocks of pixels the encoder works on, and told whether each one had been split up or not. Using this information, each DT forms a tree of binary decisions that sorts the coding units into categories. Once the trees were 'trained' on known data, the algorithm could estimate whether a new block of pixels that it had not seen before was likely to be split up or not, depending on its characteristics.
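To make the training step concrete, here is a minimal sketch of a decision tree learning a split/no-split rule from labelled blocks. It is an illustration only, not the code used in the codec: the two features (pixel variance and mean gradient), the sample values and the Gini-impurity splitting criterion are all assumptions chosen for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 = not split, 1 = split)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature_index, threshold) that most reduces impurity, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds sit halfway between neighbouring feature values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively build a tree of binary decisions; leaves hold the majority label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    )

def predict(tree, row):
    """Walk the tree from the root to a leaf for one unseen block."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Hypothetical training data: (pixel variance, mean gradient) per coding unit.
rows = [(2.0, 0.5), (3.5, 0.8), (4.0, 1.0), (40.0, 6.0), (55.0, 9.5), (62.0, 7.2)]
labels = [0, 0, 0, 1, 1, 1]  # 1 = the block was split, 0 = it was not

tree = build_tree(rows, labels)
unseen_busy = predict(tree, (50.0, 8.0))
unseen_flat = predict(tree, (3.0, 0.7))
```

On this toy data the tree needs only one decision, a threshold on variance, to separate the two groups. Real training data would be far larger and noisier, so the resulting trees would be deeper.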
After training, the models' criteria can be extracted in the form of very simple 'if' statements, for example, 'if X then do Y'. We wrote these into our open-source HEVC Turing codec, which now checks the ML criteria before performing the long testing process. This means the testing can sometimes be skipped, saving time and energy.
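The idea of consulting learned rules before the slow test can be sketched as follows. The function names, feature names and thresholds here are hypothetical and do not come from the Turing codec; the 'exhaustive' test is a stand-in for the encoder actually trying both options.

```python
def learned_rule(variance, mean_gradient):
    """Hypothetical 'if' statements of the kind exported from a trained tree.

    Returns True (split), False (do not split), or None when the rules are
    not confident enough and the full test should run instead.
    """
    if variance <= 5.0:
        return False
    if variance > 30.0 and mean_gradient > 4.0:
        return True
    return None

def exhaustive_split_test(features):
    """Stand-in for the slow process of trying both options in the encoder."""
    return features["variance"] > 20.0

def decide_split(features, stats):
    """Check the learned rules first; fall back to the slow test only if needed."""
    decision = learned_rule(features["variance"], features["mean_gradient"])
    if decision is None:
        stats["full_tests"] += 1
        return exhaustive_split_test(features)
    stats["skipped"] += 1
    return decision

# Three hypothetical blocks: flat, highly detailed, and in between.
blocks = [
    {"variance": 2.0, "mean_gradient": 0.5},
    {"variance": 55.0, "mean_gradient": 9.5},
    {"variance": 18.0, "mean_gradient": 3.0},
]
stats = {"skipped": 0, "full_tests": 0}
decisions = [decide_split(b, stats) for b in blocks]
```

In this sketch the slow test runs for only one of the three blocks. Returning None for uncertain cases is one way to trade speed against accuracy: tighter thresholds skip fewer tests but make fewer wrong calls.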
We defined two novel metrics for the trade-off between accuracy and speed, which make the DT tool configurable and applicable to more problems in the video coding field. Putting the rules 'learned' by the decision tree into the codec sped up the encoding process by over 40% on average, with minimal difference to the video quality! More information about the method and results was published in September 2019.
Following these results, .
As we have seen, combining video coding with machine learning lets the encoding process run a lot faster while maintaining the same visual quality and data efficiency. Our proposed DT-based training algorithm can be reused for various encoder types and applications. It adapts models to carefully selected training data and enables quick optimisation choices for any given use, supporting our ultimate goal of bringing the audience higher quality and more immersive experiences.