AhG 11 reference software for NNVC

This software package is the reference software for AhG11. The reference software includes both encoder and decoder functionality.

Reference software is useful in aiding users of a video coding standard to establish and test conformance and interoperability, and to educate users and demonstrate the capabilities of the standard. For these purposes, this software is provided as an aid for the study and implementation of Versatile Video Coding.

The software has been jointly developed by the ITU-T Video Coding Experts Group (VCEG, Question 6 of ITU-T Study Group 16) and the ISO/IEC Moving Picture Experts Group (MPEG, Working Group 11 of Subcommittee 29 of ISO/IEC Joint Technical Committee 1).

A software manual, which contains usage instructions, can be found in the "doc" subdirectory of this software package.

Content

The package contains the following components:

  • a base codec based on VTM-11.0_nnvc (VTM-11.0 + JVET-V0056 patch)
  • a data dumping feature at the decoder, activated with the macro NNVC_DUMP_DATA in TypeDef.h; the encoder log is also slightly extended to include some useful information
  • a data loading feature in Python, compatible with the data dumped by the decoder
  • an inference feature based on SADL supporting both float and int16 models
  • a common API supporting input preparation and inference of NN-based models
  • two NN-based loop filters: the default is the Low Operating Point (LOP) filter; the second is the High Operating Point (HOP) filter, available using the nn_based/HOP.cfg configuration file.
  • training scripts for the NN-based loop filters.

Build instructions

Submodule

The SADL submodule first needs to be initialized using the following commands:

git submodule init
git submodule update

The CMake tool is used to create platform-specific build files.

Although CMake may be able to generate 32-bit binaries, it is generally suggested to build 64-bit binaries. 32-bit binaries are not able to access more than 2GB of RAM, which will not be sufficient for coding larger image formats. Building in 32-bit environments is not tested and will not be supported.

Build instructions for plain CMake (suggested)

Note: A working CMake installation is required for building the software.

CMake generates configuration files for the compiler environment/development environment on each platform. The following is a list of examples for Windows (MS Visual Studio), macOS (Xcode) and Linux (make).

Open a command prompt on your system and change into the root directory of this project.

Create a build directory in the root directory:

mkdir build 

Use one of the following CMake commands, based on your platform. Feel free to change the commands to satisfy your needs.

Windows Visual Studio 2015/17/19 64 Bit:

Use the proper generator string for generating Visual Studio files, e.g. for VS 2015:

cd build
cmake .. -G "Visual Studio 14 2015 Win64"

Then open the generated solution file in MS Visual Studio.

For VS 2017 use "Visual Studio 15 2017 Win64", for VS 2019 use "Visual Studio 16 2019".

Visual Studio 2019 also allows you to open the CMake directory directly. Choose "File->Open->CMake" for this option.

macOS Xcode:

For generating an Xcode workspace type:

cd build
cmake .. -G "Xcode"

Then open the generated workspace in Xcode.

For generating Makefiles with optional non-default compilers, use the following commands:

cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=gcc-9 -DCMAKE_CXX_COMPILER=g++-9

In this example the Homebrew-installed GCC 9 is used for a release build.

Linux

For generating Linux Release Makefile:

cd build
cmake .. -DCMAKE_BUILD_TYPE=Release

For generating Linux Debug Makefile:

cd build
cmake .. -DCMAKE_BUILD_TYPE=Debug

When building in Unix-like environments, AVX2 is enabled by default for NN-related code. Use the following commands to enable AVX512:

cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DNNLF_BUILD_WITH_AVX512=1

Then type

make -j

For more details, refer to the CMake documentation: https://cmake.org/cmake/help/latest/

Build instructions for make

Note: The build instructions in this section require the make tool and Python to be installed, which are part of usual Linux and macOS environments. See below for installation instructions for Python and GnuWin32 on Windows.

Open a command prompt on your system and change into the root directory of this project.

To use the default system compiler simply call:

make all

MSYS2 and MinGW (Windows)

Note: Build files for MSYS MinGW were added on request. The build platform is not regularly tested and cannot be supported.

Open an MSYS MinGW 64-Bit terminal and change into the root directory of this project.

Call:

make all toolset=gcc

The following tools need to be installed for MSYS2 and MinGW:

Download CMake: http://www.cmake.org/ and install it.

Python and GnuWin32 are not mandatory, but they simplify the build process for the user.

python: https://www.python.org/downloads/release/python-371/

gnuwin32: https://sourceforge.net/projects/getgnuwin32/files/getgnuwin32/0.6.30/GetGnuWin32-0.6.3.exe/download

To use MinGW, install MSYS2: http://repo.msys2.org/distrib/msys2-x86_64-latest.exe

Installation instructions: https://www.msys2.org/

Install the needed toolchains:

pacman -S --needed base-devel mingw-w64-i686-toolchain mingw-w64-x86_64-toolchain git subversion mingw-w64-i686-cmake mingw-w64-x86_64-cmake

Data dumping

A dataset for a particular bitstream is dumped using:

DecoderAppStatic -b D_BlowingBubbles_416x240_50Hz_8bit_P420_37.bin --DumpBasename D_BlowingBubbles_416x240_50Hz_8bit_P420_37

The data to be dumped are controlled by macros in the file TypeDef.h (see below). The command produces several data files and a JSON description of the dump. An encoder log file is assumed to be associated with each produced bitstream; the encoder log should contain the necessary information (use --Verbosity=6). The link between the decoder dump and the encoder configuration is made through the bitstream name. Given the correct directories, the provided scripts automatically create a JSON file containing all the information (dumped data, original YUV, etc.).

Data format

The data dumper uses the following format for each component. Note: a set of macros allows enabling/disabling the data to be dumped/used. A minimal Python reading sketch is given after the list.

  • basename_rec_before_dbf.yuv: reconstruction before deblocking filter (macro NNVC_USE_REC_BEFORE_DBF): YUV format, 10 bits
  • basename_rec_after_dbf.yuv: reconstruction after deblocking filter (macro NNVC_USE_REC_AFTER_DBF): YUV format, 10 bits
  • basename_pred.yuv: prediction (macro NNVC_USE_PRED): YUV format, 10 bits
  • basename_bs.yuv: boundary strength (macro NNVC_USE_BS): YUV format, 10 bits. Boundary strength of the deblocking filter, normalized to 10 bits
  • basename_partition_cu_average.yuv: partition information as CU average (macro NNVC_USE_PARTITION_AS_CU_AVERAGE): YUV format, 10 bits. Contains the average of each CU.
  • basename_qp.dat: slice QP (macro NNVC_USE_QP): int32, binary format
  • basename_slicetype.dat: slice type (macro NNVC_USE_SLICETYPE): int32, binary format
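
The sketch below illustrates how the dumped files can be read in Python. It is not part of the package; it assumes planar 4:2:0 YUV with 16-bit little-endian samples carrying the 10-bit data, one int32 value per dumped frame in the .dat files, and an example basename.

import numpy as np

basename = "D_BlowingBubbles_416x240_50Hz_8bit_P420_37"
width, height = 416, 240

def read_yuv420_10bit_frame(filename, frame_index, width, height):
    # one frame = Y plane + two quarter-size chroma planes, 2 bytes per sample
    luma_size = width * height
    chroma_size = (width // 2) * (height // 2)
    frame_samples = luma_size + 2 * chroma_size
    with open(filename, "rb") as f:
        f.seek(frame_index * frame_samples * 2)
        data = np.fromfile(f, dtype=np.uint16, count=frame_samples)
    y = data[:luma_size].reshape(height, width)
    u = data[luma_size:luma_size + chroma_size].reshape(height // 2, width // 2)
    v = data[luma_size + chroma_size:].reshape(height // 2, width // 2)
    return y, u, v

# reconstruction before deblocking, frame 0
y, u, v = read_yuv420_10bit_frame(basename + "_rec_before_dbf.yuv", 0, width, height)

# per-frame slice QP and slice type (assumed one int32 per dumped frame)
qp = np.fromfile(basename + "_qp.dat", dtype=np.int32)
slice_type = np.fromfile(basename + "_slicetype.dat", dtype=np.int32)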

Json file

During data dumping, a JSON file named basename.json is produced for each bitstream. The format of the JSON file is as follows:

{
 "suffix_rec_before_dbf": "_rec_before_dbf.yuv", 
 "suffix_pred": "_pred.yuv",
 "suffix_partition_cu_average": "_partition_cu_average.yuv",
 "suffix_bs": "_bs.yuv",
 "suffix_qp": "_qp.dat",
 "suffix_slicetype": "_slicetype.dat",
 "data": [
  {
    "qp_base" : 32,
    "bsname": "D_BlowingBubbles_416x240_50Hz_8bit_P420_37.bin",
    "basename": "D_BlowingBubbles_416x240_50Hz_8bit_P420_37",
    "width": 416,
    "height": 240,
    "data_count": 65
  }
 ]
}

If a particular data type is present, the corresponding suffix key is defined. An array of one element is described (a small parsing sketch follows the list):

  • the width and height of the frames
  • the number of frames dumped (data_count)
  • the concatenation of a suffix and the basename gives the full filename for this data
  • bsname: helps to find the correspondence between the dataset and the original encoded sequence
  • qp_base: base QP used to encode the sequence
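
As an illustration, the following minimal sketch rebuilds the full filenames from a per-bitstream JSON description (assuming the one-element array is stored under the "data" key, as in the concatenated dataset files shown below; the JSON filename is just an example):

import json

with open("D_BlowingBubbles_416x240_50Hz_8bit_P420_37.json") as f:
    desc = json.load(f)

entry = desc["data"][0]          # array of one element
basename = entry["basename"]

# each suffix key gives the filename suffix of one data type
for key, suffix in desc.items():
    if key.startswith("suffix_"):
        print(key[len("suffix_"):], "->", basename + suffix)

print("frames dumped:", entry["data_count"], "size:", entry["width"], "x", entry["height"])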

Build the database

The database building is done in 2 passes.

First all individual dataset files are concatenated into one dataset file:

training/tools/concatenate_dataset.py --input_dir_json dir1 --input_dir_json dir2 --output_json pre_dataset.json

This produces a file with all the individual JSON files concatenated:

{
 "suffix_rec_before_dbf": "_rec_before_dbf.yuv",
 "suffix_pred": "_pred.yuv",
 "suffix_partition_cu_average": "_partition_cu_average.yuv",
 "suffix_bs": "_bs.yuv",
 "suffix_qp": "_qp.dat",
 "suffix_slicetype": "_slicetype.dat",
 "data": [
  {
   "bsname": "D_RaceHorses_416x240_30Hz_8bit_P420_37_P2.bin",
   "qp_base": 37,
   "basename": "D_RaceHorses_416x240_30Hz_8bit_P420_37_P2",
   "width": 416,
   "height": 240,
   "data_count": 33,
   "dirname": "db"
  },
  {
   "bsname": "D_BlowingBubbles_416x240_50Hz_8bit_P420_32_P1.bin",
   "qp_base": 32,
   "basename": "D_BlowingBubbles_416x240_50Hz_8bit_P420_32_P1",
   "width": 416,
   "height": 240,
   "data_count": 65,
   "dirname": "db"
  },
...]
}

For each dataset, the directory it was found in is added to its data entry as the dirname field.

The file is then consolidated with information from the encoder log, or from the encoder configuration file used to encode the sequence, in order to extract the original YUV file name and other information:

training/tools/concatenate_dataset.py --input_json pre_dataset.json --input_dir_encoder direnc1 --input_dir_encoder direnc2 --output_json dataset.json --log_extension log

Note: To use the encoder configuration files, use the correct extension (for example --log_extension cfg).

This results in a file containing all the information needed by the data loader:

{
 "suffix_rec_before_dbf": "_rec_before_dbf.yuv",
 "suffix_pred": "_pred.yuv",
 "suffix_partition_cu_average": "_partition_cu_average.yuv",
 "suffix_bs": "_bs.yuv",
 "suffix_qp": "_qp.dat",
 "suffix_slicetype": "_slicetype.dat",
 "data": [
  {
   "bsname": "D_RaceHorses_416x240_30Hz_8bit_P420_37_P2.bin",
   "qp_base": 37,
   "basename": "D_RaceHorses_416x240_30Hz_8bit_P420_37_P2",
   "width": 416,
   "height": 240,
   "data_count": 33,
   "dirname": "db",
   "original_yuv": "/home/library01/VCL_VIDEO/JVET/Linux/ClassD/D_RaceHorses_416x240_30Hz_8bit_P420.yuv",
   "original_temporal_subsample": 1,
   "original_frame_skip": 32,
   "original_bitdepth": 8
  },
  {
   "bsname": "D_BlowingBubbles_416x240_50Hz_8bit_P420_32_P1.bin",
   "qp_base": 32,
   "basename": "D_BlowingBubbles_416x240_50Hz_8bit_P420_32_P1",
   "width": 416,
   "height": 240,
   "data_count": 65,
   "dirname": "db",
   "original_yuv": "/home/library01/VCL_VIDEO/JVET/Linux/ClassD/D_BlowingBubbles_416x240_50Hz_8bit_P420.yuv",
   "original_temporal_subsample": 1,
   "original_frame_skip": 0,
   "original_bitdepth": 8
  },
  ...
  ]
}

The following fields are added to each data entry:

  • original_yuv: location of the original YUV file
  • original_bitdepth: bit depth of the original YUV file
  • original_frame_skip: number of frames to skip in the original YUV file (offset from the beginning to find the data)
  • original_temporal_subsample: corresponds to TemporalSubsampleRatio in VTM (used for the AI configuration)

Note: the last two fields are present in the encoder logs only if the NNVC encoder has been used to generate the sequence. If not present, frame_skip=0 and temporal_subsample=1 are assumed.
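
As an illustration, the sketch below maps each data entry back to frames of the original YUV file, assuming dumped frame i corresponds to original frame original_frame_skip + i * original_temporal_subsample:

import json

with open("dataset.json") as f:
    dataset = json.load(f)

for entry in dataset["data"]:
    skip = entry.get("original_frame_skip", 0)                # default when absent
    subsample = entry.get("original_temporal_subsample", 1)   # default when absent
    first = skip
    last = skip + (entry["data_count"] - 1) * subsample
    print(entry["basename"], "->", entry["original_yuv"], "frames", first, "to", last)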

Data loading

The data_loader module shows an example of data loading. Basic usage is as follows (see also the example program create_unified_dataset.py):

import data_loader

input_json="dataset.json"  # consolidated dataset file produced in the previous steps
patch_size=128
poc_list=0

# If `generate_type` is equal to 0, the data loading
# has some specificities, e.g. normalization, for NN-based loop filter set 0.
# If `generate_type` is equal to 1, the data loading
# has some specificities, e.g. normalization, for NN-based loop filter set 1.
generate_type=1
border_size=8

dl=data_loader.DataLoader(input_json,
                          patch_size,
                          poc_list,
                          generate_type=generate_type)
print("Nb samples available: {}".format(dl.nb_patches()))
print("Available components: {}".format(dl.components))

# list of components to extract
comps=["org_Y", "pred_Y", "qp_base"]
p = dl.getPatchData(42,comps,border_size)
# p now contains the patch at index 42 in the database

On-the-fly loading

Call getPatchData on random indices to create a batch of data (see the sketch below).

Pro: does not use additional disk space, because data is created on the fly from the original dataset.

Con: slower to generate a batch. The loading function should be wrapped in an asynchronous function to hide I/O latency.
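
For example, a batch can be assembled as in the following sketch, reusing the dl, comps and border_size variables from the example above. It assumes getPatchData returns an array of fixed shape for a given component list; the thread pool is only one possible way to hide I/O latency.

import random
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def make_batch(dl, comps, batch_size, border_size):
    indices = random.sample(range(dl.nb_patches()), batch_size)
    return np.stack([dl.getPatchData(i, comps, border_size) for i in indices])

executor = ThreadPoolExecutor(max_workers=1)
pending = executor.submit(make_batch, dl, comps, 16, border_size)       # prefetch first batch
for step in range(100):
    batch = pending.result()                                            # wait for prefetched batch
    pending = executor.submit(make_batch, dl, comps, 16, border_size)   # prefetch next batch
    # ... run one training step on `batch` ...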

Offline loading

One can use the create_unified_dataset.py program to create a large dataset of already prepared patches.

Pro: faster; only one file is opened and random reads are performed inside it.

Con: consumes more disk space (float patches, duplication of the data).
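
A minimal sketch of the offline approach, assuming the unified file stores float32 patches of a fixed, known shape (the file name and patch shape below are hypothetical and depend on how create_unified_dataset.py was run):

import numpy as np

patch_shape = (144, 144, 10)        # hypothetical (height, width, channels)
patch_elems = int(np.prod(patch_shape))

# a single file is opened once; random reads inside it are then cheap
data = np.memmap("unified_dataset.bin", dtype=np.float32, mode="r")
nb_patches = data.size // patch_elems

idx = 42
patch = data[idx * patch_elems:(idx + 1) * patch_elems].reshape(patch_shape)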

Full example of data dumping and data loading

The script training/example/sample_test.sh shows a full example of data dumping, dataset creation and data loading. The loaded data are put into one binary file containing all the patches. This file enables fast I/O during training. Finally, a sample program allows visualizing the resulting dumped patches.

Common API

source/Lib/CommonLib/NNInference is a common API for unifying input preparation and inference of NN models. The common API supports preparing input from any dumped information such as reconstruction, prediction, partitioning, boundary strength, base QP, slice QP, and slice type. Currently, input preparation and inference of the two NN-based loop filter sets are based on the common API.

NN-based loop filter

LOP NN loop-filter

There are several NN-based loop filter sets in the common software. By default, the LOP NN-based loop filter is activated. The default model path assumes the encoder and decoder are run from the root of the repository. When running the codec from another location, adapt the NnlfModelName parameter to point to the correct location.

HOP NN loop-filter

To use the HOP loop filter, change the NnlfModelName parameter to point to the HOP model (see the cfg/nn_based/HOP.cfg file for an example).

Legacy NN-based loop filter

NN-based loop filter set 0

To activate NN-based loop filter set 0, use --NnlfOption=10, or equivalently -c cfg/nn-based/NnlfOption_1.cfg

To specify a model path, use e.g. --ModelPath="models/", or equivalently -mp "models/". Note that model path should be specified at both encoder and decoder.

NN-based loop filter set 1

To activate NN-based loop filter set 1, use --NnlfOption=11, or equivalently -c cfg/nn-based/NnlfOption_2.cfg

To specify model paths, use e.g. the following command line. Note that model paths should be specified at both encoder and decoder.

--NnlfSet1InterLumaModel="models/NnlfSet1_LumaCNNFilter_InterSlice_int16.sadl" --NnlfSet1InterChromaModel="models/NnlfSet1_ChromaCNNFilter_InterSlice_int16.sadl" --NnlfSet1IntraLumaModel="models/NnlfSet1_LumaCNNFilter_IntraSlice_int16.sadl" --NnlfSet1IntraChromaModel="models/NnlfSet1_ChromaCNNFilter_IntraSlice_int16.sadl"

LC NN-based loop filter set

To activate the LC (low complexity) NN-based loop filter set, use --NnlfOption=12, or equivalently -c cfg/nn-based/NnlfOption_3.cfg

The LC NNLF consists of the CP Decomposed & Fused (CPDF) LC models proposed in JVET-AD0156.

The LC NNLF models are provided for both the SADL float and int16 frameworks. The framework is selected with the macro NN_FIXED_POINT_IMPLEMENTATION in the file source/Lib/CommonLib/TypeDef.h: set the macro to 0 for float or 1 for int16.

To specify LC model paths, refer to the following command-line examples. Note that model paths should be specified at both encoder and decoder. The paths need to match the framework the software is built for.

SADL float: --LCModelPath=models/NnlfSetLC/LC_float_model0.sadl,models/NnlfSetLC/LC_float_model1.sadl,models/NnlfSetLC/LC_float_model2.sadl,models/NnlfSetLC/LC_float_model3.sadl

SADL int16: --LCModelPath=models/NnlfSetLC/LC_int16_model0.sadl,models/NnlfSetLC/LC_int16_model1.sadl,models/NnlfSetLC/LC_int16_model2.sadl,models/NnlfSetLC/LC_int16_model3.sadl

Details on training and model conversion to SADL can be found in training/training_scripts/Nn_Filtering_Set_LC/README.pdf

NN-based loop filter encoder optimization

To activate NN-based loop filter encoder optimization, use --EncNnlfOpt=1. Note that the encoder optimization is disabled by default.

To specify model paths, use e.g. the following command-line options. Note that model paths should be specified at the encoder.

--RdoCnnlfInterLumaModel="models/RdNnlfSet1_LumaCNNFilter_InterSlice_int16.sadl" --RdoCnnlfIntraLumaModel="models/RdNnlfSet1_LumaCNNFilter_IntraSlice_int16.sadl"

Low-complexity version of the neural network-based intra prediction mode

When building VTM-11-NNVC, if the macro JVET_AB0149_INTRA_PRED is equal to 1 (default), the low-complexity version of the NN-based intra prediction mode can be controlled using --NnIntraPred. By default, the encoder and decoder parameter PrefixAbsolutePathsToGraphsOutput is equal to "models/intra", as it is assumed that the VTM-11-NNVC encoder/decoder executable is run from the root directory of the project. If the executable is run from a different directory, add the following argument: --PrefixAbsolutePathsToGraphsOutput=path_to_directory_models_intra, where path_to_directory_models_intra is the path to the directory "models/intra" relative to the directory from which the executable is run.

NN-based super resolution

To activate NN-based super resolution, use --NnsrOption=1. The default model path is set as "./models/super_resolution/".

For rate matching, use the following configuration file when testing class A1 or A2: cfg/nn-based/nnsr_classA1.cfg or cfg/nn-based/nnsr_classA2.cfg.

Content-adaptive post-filter

To activate the content-adaptive post-filter, use the corresponding sequence-QP configuration file under the directory cfg/nn-based/nnpf/.

Each config file describes the post-filter as well as:

  • the NNR bitstream (weight update), with the parameter SEINNPFCPayloadFilename1.
  • the over-fitted SADL model, with the parameter NnpfModelPath. This parameter must also be used when running the decoder.

The NNPFC SEI messages are signalled once for the whole video sequence (poc = 0). The bitrate of the SEI messages is computed in the same fashion as for the video bitstream. Therefore, if split encoding is used, the overhead of the SEI messages needs to be readjusted to account for the length of the whole video sequence (value reported in the encoder log).

Finally, the post-filtered reconstruction can be saved at the decoder side by adding the parameters --ReconFile=reco.yuv --NnpfReconFile=recoPost.yuv. Note that both parameters must be provided.