AhG 11 reference software for NNVC
==============================
This software package is the reference software for AhG11. The reference software includes both encoder and decoder functionality.

Reference software is useful in aiding users of a video coding standard to establish and test conformance and interoperability, and to educate users and demonstrate the capabilities of the standard. For these purposes, this software is provided as an aid for the study and implementation of Versatile Video Coding.
The software has been jointly developed by the ITU-T Video Coding Experts Group (VCEG, Question 6 of ITU-T Study Group 16) and the ISO/IEC Moving Picture Experts Group (MPEG, Working Group 11 of Subcommittee 29 of ISO/IEC Joint Technical Committee 1).
A software manual, which contains usage instructions, can be found in the "doc" subdirectory of this software package.
Content
==================
The package contains the following components:
- a base codec based on VTM-11.0_nnvc (VTM-11.0 + JVET-V0056 patch)
- a data dumping feature at the decoder, activated with the macro NNVC\_DUMP\_DATA in TypeDef.h. The encoder log is also slightly changed in order to record some useful information
- a data loading feature in python, compatible with the data dumped by the decoder
- an inference feature based on SADL, supporting both float and int16 models
- a common api supporting input preparation and inference of NN-based models
- two NN-based loop filters: the default is the Low Operating Point (LOP) filter. The second one is the High Operating Point (HOP) filter, available using the nn_based/HOP.cfg configuration file.
- training scripts for the NN-based loop filters.

Build instructions
==================
Submodule
----------------------------------------------
The SADL submodule needs to be initialized the first time using the following commands:
```bash
git submodule init
git submodule update
```

The CMake tool is used to create platform-specific build files.
Although CMake may be able to generate 32-bit binaries, **it is generally suggested to build 64-bit binaries**. 32-bit binaries are not able to access more than 2GB of RAM, which will not be sufficient for coding larger image formats. Building in 32-bit environments is not tested and will not be supported.
Build instructions for plain CMake (suggested)
----------------------------------------------
**Note:** A working CMake installation is required for building the software.
CMake generates configuration files for the compiler environment/development environment on each platform.
The following is a list of examples for Windows (MS Visual Studio), macOS (Xcode) and Linux (make).
Open a command prompt on your system and change into the root directory of this project.
Create a build directory in the root directory:
```bash
mkdir build
```
Use one of the following CMake commands, based on your platform. Feel free to change the commands to satisfy
your needs.
**Windows Visual Studio 2015/17/19 64 Bit:**
Use the proper generator string for generating Visual Studio files, e.g. for VS 2015:
```bash
cd build
cmake .. -G "Visual Studio 14 2015 Win64"
```
Then open the generated solution file in MS Visual Studio.
For VS 2017 use "Visual Studio 15 2017 Win64", for VS 2019 use "Visual Studio 16 2019".
Visual Studio 2019 also allows you to open the CMake directory directly. Choose "File->Open->CMake" for this option.
**macOS Xcode:**
```bash
cd build
cmake .. -G "Xcode"
```
Then open the generated work space in Xcode.
For generating Makefiles with optional non-default compilers, use the following commands:
```bash
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=gcc-9 -DCMAKE_CXX_COMPILER=g++-9
```
In this example, the brew-installed GCC 9 is used for a release build.
**Linux**
For generating Linux Release Makefile:
```bash
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
```
For generating Linux Debug Makefile:
```bash
cd build
cmake .. -DCMAKE_BUILD_TYPE=Debug
```
When building in Unix-like environments, AVX2 is enabled by default for the NN-related code. Use the following command to enable AVX512:
```bash
cmake .. -DCMAKE_BUILD_TYPE=Release -DNNLF_BUILD_WITH_AVX512=1
```
Then type
```bash
make -j
```
For more details, refer to the CMake documentation: https://cmake.org/cmake/help/latest/
Build instructions for make
---------------------------
**Note:** The build instructions in this section require the make tool and Python to be installed, which are
part of usual Linux and macOS environments. See below for installation instructions for Python and GnuWin32
on Windows.
Open a command prompt on your system and change into the root directory of this project.
To use the default system compiler simply call:
```bash
make all
```

**MSYS2 and MinGW (Windows)**
**Note:** Build files for MSYS MinGW were added on request. The build platform is not regularly tested and cannot be supported.
Open an MSYS MinGW 64-Bit terminal and change into the root directory of this project.
Call:
```bash
make all toolset=gcc
```

The following tools need to be installed for MSYS2 and MinGW:
Download CMake: http://www.cmake.org/ and install it.
Python and GnuWin32 are not mandatory, but they simplify the build process for the user.
python: https://www.python.org/downloads/release/python-371/
gnuwin32: https://sourceforge.net/projects/getgnuwin32/files/getgnuwin32/0.6.30/GetGnuWin32-0.6.3.exe/download
To use MinGW, install MSYS2: http://repo.msys2.org/distrib/msys2-x86_64-latest.exe
Installation instructions: https://www.msys2.org/
Install the needed toolchains:
```bash
pacman -S --needed base-devel mingw-w64-i686-toolchain mingw-w64-x86_64-toolchain git subversion mingw-w64-i686-cmake mingw-w64-x86_64-cmake
```
Data dumping
==================
The dumping of a dataset for a particular bitstream is obtained using:
```bash
DecoderAppStatic -b D_BlowingBubbles_416x240_50Hz_8bit_P420_37.bin --DumpBasename D_BlowingBubbles_416x240_50Hz_8bit_P420_37
```
The data to be dumped are controlled by macros in the file TypeDef.h (see below).
It will produce several data files and a json description of the dump.
We assume an encoder log file is also associated with each produced bitstream. The encoder log files should contain the necessary information (use --Verbosity=6). The link between the decoder dump and the encoder configuration is made using the bitstream name.
The provided scripts will automatically create a json file containing all information (dumped data, original YUV, etc.), provided the correct directories are given.
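For reference, a minimal sketch of a matching encoder run is shown below; the executable name, configuration file and input/output names are assumptions, only the --Verbosity=6 requirement comes from this README.
```bash
# Hypothetical encoder run producing the bitstream and a log with the information
# needed by the dataset scripts (executable, cfg and file names are placeholders).
EncoderAppStatic -c encoder_randomaccess_vtm.cfg \
  -i D_BlowingBubbles_416x240_50Hz_8bit_P420.yuv \
  -b D_BlowingBubbles_416x240_50Hz_8bit_P420_37.bin \
  --QP=37 --Verbosity=6 > D_BlowingBubbles_416x240_50Hz_8bit_P420_37.log
```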
Data format
----------------------------------------------
The data dumper uses the following format for each component (a quick way to inspect the scalar .dat files is shown after the list).
Note: a set of macros allows enabling/disabling the data to be dumped/used.
- basename\_rec\_before\_dbf.yuv: reconstruction before deblocking filter (macro NNVC\_USE\_REC\_BEFORE\_DBF): YUV format, 10bits
- basename\_rec\_after\_dbf.yuv: reconstruction after deblocking filter (macro NNVC\_USE\_REC\_AFTER\_DBF): YUV format, 10bits
- basename\_pred.yuv: prediction (macro NNVC\_USE\_PRED): YUV format, 10bits
- basename\_bs.yuv: boundary strength (macro NNVC\_USE\_BS): YUV format, 10bits. Boundary strength of the deblocking filter, normalized to 10 bits
- basename\_partition\_cu\_average.yuv: partition information as cu average (macro NNVC\_USE\_PARTITION\_AS\_CU\_AVERAGE): YUV format, 10 bits. Contains the average of each CU.
- basename\_qp.dat: QP slice (macro NNVC\_USE\_QP): int32, binary format
- basename\_slicetype.dat: slice type (macro NNVC\_USE\_SLICETYPE): int32, binary format
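As a quick sanity check, the scalar .dat files can be inspected directly from the shell; the sketch below assumes the int32 values are stored in the machine's native byte order.
```bash
# Print the per-slice QP values of one dump as signed 32-bit integers
# (file name taken from the dumping example above; native byte order assumed).
od -A d -t d4 D_BlowingBubbles_416x240_50Hz_8bit_P420_37_qp.dat | head
```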
Json file
----------------------------------------------
During data dump, a json file is produced for each bitstream, with the name basename.json.
The format of the json file is as follows:
```json
{
  "suffix_rec_before_dbf": "_rec_before_dbf.yuv",
  "suffix_pred": "_pred.yuv",
  "suffix_partition_cu_average": "_partition_cu_average.yuv",
  "suffix_bs": "_bs.yuv",
  "suffix_qp": "_qp.dat",
  "suffix_slicetype": "_slicetype.dat",
  "data": [
    {
      "qp_base": 32,
      "bsname": "D_BlowingBubbles_416x240_50Hz_8bit_P420_37.bin",
      "basename": "D_BlowingBubbles_416x240_50Hz_8bit_P420_37",
      "width": 416,
      "height": 240,
      "data_count": 65
    }
  ]
}
```
If a particular data type is present, the corresponding suffix field is defined.
An array of one element is described, containing:
- the width and height of the frames
- the number of frames dumped (data\_count)
- the concatenation of the basename and a suffix gives the full filename for this data (see the example after this list)
- bsname: helps to find the correspondence between the dataset and the original encoded sequence
- qp\_base: base QP used to encode the sequence
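For example, assuming the json layout shown above and that jq is available, the full filenames of the "reconstruction before deblocking" data can be rebuilt as follows:
```bash
# Concatenate basename and suffix to obtain the dumped file names
# (jq is assumed to be installed; json name taken from the dumping example above).
jq -r '.suffix_rec_before_dbf as $s | .data[] | .basename + $s' \
  D_BlowingBubbles_416x240_50Hz_8bit_P420_37.json
```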
Build the database
----------------------------------------------
The database building is done in 2 passes.
First all individual dataset files are concatenated into one dataset file:
```bash
training/tools/concatenate_dataset.py --input_dir_json dir1 --input_dir_json dir2 --output_json pre_dataset.json
```
It will give a file with all individual json files concatenated:
```json
{
  "suffix_rec_before_dbf": "_rec_before_dbf.yuv",
  "suffix_pred": "_pred.yuv",
  "suffix_partition_cu_average": "_partition_cu_average.yuv",
  "suffix_bs": "_bs.yuv",
  "suffix_qp": "_qp.dat",
  "suffix_slicetype": "_slicetype.dat",
  "data": [
    {
      "bsname": "D_RaceHorses_416x240_30Hz_8bit_P420_37_P2.bin",
      "qp_base": 37,
      "basename": "D_RaceHorses_416x240_30Hz_8bit_P420_37_P2",
      "width": 416,
      "height": 240,
      "data_count": 33,
      "dirname": "db"
    },
    {
      "bsname": "D_BlowingBubbles_416x240_50Hz_8bit_P420_32_P1.bin",
      "qp_base": 32,
      "basename": "D_BlowingBubbles_416x240_50Hz_8bit_P420_32_P1",
      "width": 416,
      "height": 240,
      "data_count": 65,
      "dirname": "db"
    },
    ...
  ]
}
```
For each dataset, the directory _dirname_ is added to the data field.
The file is then consolidated with information from the encoder log or the encoder configuration file used to encode the sequence, in order to extract the original YUV file and other information:
```bash
training/tools/concatenate_dataset.py --input_json pre_dataset.json --input_dir_encoder direnc1 --input_dir_encoder direnc2 --output_json dataset.json --log_extension log
```
Note: To use the encoder configuration files, use the correct extension (for example --log_extension cfg).
It results in a file with all information needed by the data loader.
```json
{
  "suffix_rec_before_dbf": "_rec_before_dbf.yuv",
  "suffix_pred": "_pred.yuv",
  "suffix_partition_cu_average": "_partition_cu_average.yuv",
  "suffix_bs": "_bs.yuv",
  "suffix_qp": "_qp.dat",
  "suffix_slicetype": "_slicetype.dat",
  "data": [
    {
      "bsname": "D_RaceHorses_416x240_30Hz_8bit_P420_37_P2.bin",
      "qp_base": 37,
      "basename": "D_RaceHorses_416x240_30Hz_8bit_P420_37_P2",
      "width": 416,
      "height": 240,
      "data_count": 33,
      "dirname": "db",
      "original_yuv": "/home/library01/VCL_VIDEO/JVET/Linux/ClassD/D_RaceHorses_416x240_30Hz_8bit_P420.yuv",
      "original_temporal_subsample": 1,
      "original_frame_skip": 32,
      "original_bitdepth": 8
    },
    {
      "bsname": "D_BlowingBubbles_416x240_50Hz_8bit_P420_32_P1.bin",
      "qp_base": 32,
      "basename": "D_BlowingBubbles_416x240_50Hz_8bit_P420_32_P1",
      "width": 416,
      "height": 240,
      "data_count": 65,
      "dirname": "db",
      "original_yuv": "/home/library01/VCL_VIDEO/JVET/Linux/ClassD/D_BlowingBubbles_416x240_50Hz_8bit_P420.yuv",
      "original_temporal_subsample": 1,
      "original_frame_skip": 0,
      "original_bitdepth": 8
    },
    ...
  ]
}
```
The following fields are added to each data entry:
- original\_yuv: location of the original yuv file
- original\_bitdepth: bitdepth of the original yuv file
- original\_frame\_skip: number of frames to skip in the original yuv file (offset from the beginning to find the data)
- original\_temporal\_subsample: corresponds to TemporalSubsampleRatio in VTM (used for the AI configuration)
Note: the last 2 fields are present in the encoder logs only if the NNVC encoder has been used to generate the sequence. If not present, frame\_skip=0 and temporal\_subsample=1 are assumed.
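As an illustration (this mapping is an assumption based on the field descriptions above, not taken from the loader code), the i-th dumped frame would correspond to frame original\_frame\_skip + i * original\_temporal\_subsample of the original YUV file:
```bash
# Hypothetical mapping from the i-th dumped frame to its index in the original YUV.
i=5
original_frame_skip=32
original_temporal_subsample=1
echo $(( original_frame_skip + i * original_temporal_subsample ))   # -> 37
```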
Data loading
==================
The module data_loader shows an example of data loading. The basic usage is as follows (see also the program create\_unified\_dataset.py for a complete usage example):
```python
import data_loader

patch_size = 128
poc_list = 0
# If `generate_type` is equal to 0, the data loading has some specificities,
# e.g. normalization, for NN-based loop filter set 0.
# If `generate_type` is equal to 1, the data loading has some specificities,
# e.g. normalization, for NN-based loop filter set 1.
generate_type = 0
border_size = 8
# input_json: path to the dataset json built in the previous section
input_json = "dataset.json"
dl = data_loader.DataLoader(input_json,
                            patch_size,
                            poc_list,
                            generate_type=generate_type)
print("Nb samples available: {}".format(dl.nb_patches()))
print("Available components: {}".format(dl.components))
# list of components to extract
comps = ["org_Y", "pred_Y", "qp_base"]
p = dl.getPatchData(42, comps, border_size)
# p now contains the 42nd patch in the database
```
On the fly loading
----------------------------------------------
Call getPatchData on random indices to create a batch of data.
Pro: does not use more disk space, because data is created on the fly from the original dataset.
Cons: slower to generate a batch. The loading function should be put in an asynchronous function to avoid I/O latency.
Offline loading
----------------------------------------------
One can use the create\_unified\_dataset.py program to create a large dataset of already prepared patches.
Pro: faster; only one file is opened, with random reads inside it.
Cons: consumes more disk space (float patches, duplication of the data).
Full example of data dumping and data loading
==================
The script training/example/sample_test.sh shows a full example of data dumping, dataset creation and data loading.
The loaded data are put into one binary file with all the patches inside. This file can be used for fast I/O during training.
Finally, a sample program allows visualizing the resulting dumped patches.
Common api
==================
source/Lib/CommonLib/NNInference is a common api for unifying input preparation and inference of NN models.
The common api supports preparing input from any dumped information such as reconstruction, prediction, partitioning, boundary strength, base QP, slice QP, and slice type.
Currently, input preparation and inference of the two NN-based loop filter sets are based on the common API.
NN-based loop filter
==================
LOP NN loop-filter
------------------
There are several NN-based loop filter sets in the common software.
By default, the LOP NN-based loop filter is activated. The default model path is set assuming the encoder and decoder are run from the root of the repository.
When running the codec from another location, please adapt the NnlfModelName parameter to point to the correct location.
HOP NN loop-filter
------------------
In order to use the HOP loop filter, just change the NnlfModelName parameter to point to the HOP model (see the cfg/nn_based/HOP.cfg file for an example).
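A minimal sketch of an encoder call using the HOP configuration file (the executable name, base configuration and input/output names are assumptions; only the HOP cfg path comes from this README):
```bash
# Stack the HOP configuration on top of a regular encoder configuration
# (executable, base cfg and file names are placeholders).
EncoderAppStatic -c encoder_randomaccess_vtm.cfg -c cfg/nn_based/HOP.cfg \
  -i input.yuv -b str.bin
```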
Legacy NN-based loop filter
==================
NN-based loop filter set 0
----------------------------------------------
To activate NN-based loop filter set 0, use --NnlfOption=10, or equivalently -c cfg/nn-based/NnlfOption\_1.cfg
To specify a model path, use e.g. --ModelPath="models/", or equivalently -mp "models/". Note that the model path should be specified at both the encoder and the decoder.
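A hedged sketch of the corresponding encoder and decoder calls (executable names and the non-NN arguments are assumptions; the NN-related options come from this README):
```bash
# Enable legacy loop filter set 0 and point both encoder and decoder to the models
# (executable names and file names are placeholders).
EncoderAppStatic -c encoder_randomaccess_vtm.cfg --NnlfOption=10 --ModelPath="models/" \
  -i input.yuv -b str.bin
DecoderAppStatic -b str.bin -o rec.yuv --ModelPath="models/"
```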
NN-based loop filter set 1
----------------------------------------------
To activate NN-based loop filter set 1, use --NnlfOption=11, or equivalently -c cfg/nn-based/NnlfOption\_2.cfg
To specify model paths, use e.g. the following command-line arguments. Note that the model paths should be specified at both the encoder and the decoder.
- ``--NnlfSet1InterLumaModel="models/NnlfSet1_LumaCNNFilter_InterSlice_int16.sadl"``
- ``--NnlfSet1InterChromaModel="models/NnlfSet1_ChromaCNNFilter_InterSlice_int16.sadl"``
- ``--NnlfSet1IntraLumaModel="models/NnlfSet1_LumaCNNFilter_IntraSlice_int16.sadl"``
- ``--NnlfSet1IntraChromaModel="models/NnlfSet1_ChromaCNNFilter_IntraSlice_int16.sadl"``
LC NN-based loop filter set
----------------------------------------------
To activate the LC (low complexity) NN-based loop filter set, use --NnlfOption=12, or equivalently -c cfg/nn-based/NnlfOption\_3.cfg.
The LC NNLF consists of the CP Decomposed & Fused (CPDF) LC models proposed in JVET-AD0156.
The LC NNLF models are available in both SADL float and int16 versions. This can be selected with the macro ``NN_FIXED_POINT_IMPLEMENTATION`` in the file ``source/Lib/CommonLib/TypeDef.h``: set the macro to 0 for float or to 1 for int16.
To specify the LC model paths, refer to the following command-line examples. Note that the model paths should be specified at both the encoder and the decoder. The paths need to match the version (float or int16) the software is built for.
SADL float: ``--LCModelPath=models/NnlfSetLC/LC_float_model0.sadl,models/NnlfSetLC/LC_float_model1.sadl,models/NnlfSetLC/LC_float_model2.sadl,models/NnlfSetLC/LC_float_model3.sadl``
SADL int16: ``--LCModelPath=models/NnlfSetLC/LC_int16_model0.sadl,models/NnlfSetLC/LC_int16_model1.sadl,models/NnlfSetLC/LC_int16_model2.sadl,models/NnlfSetLC/LC_int16_model3.sadl``
Details on training and on model conversion to SADL can be found in ``training/training_scripts/Nn_Filtering_Set_LC/README.pdf``.
NN-based loop filter encoder optimization
----------------------------------------------
To activate NN-based loop filter encoder optimization, use --EncNnlfOpt=1. Note that the encoder optimization is disabled by default.
To specify model paths, use e.g. the following command-line arguments. Note that the model paths should be specified at the encoder.
- ``--RdoCnnlfInterLumaModel="models/RdNnlfSet1_LumaCNNFilter_InterSlice_int16.sadl"``
- ``--RdoCnnlfIntraLumaModel="models/RdNnlfSet1_LumaCNNFilter_IntraSlice_int16.sadl"``
Low-complexity version of the neural network-based intra prediction mode
------------------------------------------------------------------------
When building VTM-11-NNVC, if the macro `JVET_AB0149_INTRA_PRED` is equal to 1 (default), the low-complexity version of the NN-based
intra prediction mode can be controlled using `--NnIntraPred`. By default, the encoder and decoder parameter `PrefixAbsolutePathsToGraphsOutput`
is equal to "models/intra", as it is assumed that the VTM-11-NNVC encoder/decoder executable is run from the root directory of the
project. If the VTM-11-NNVC encoder/decoder executable is run from a directory different from the root directory of the project,
please add the following argument when running the VTM-11-NNVC encoder/decoder executable: `--PrefixAbsolutePathsToGraphsOutput=path_to_directory_models_intra`,
where `path_to_directory_models_intra` is the path to the directory "models/intra" relative to the directory from which the
VTM-11-NNVC encoder/decoder executable is run.
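For instance, a decoder run from a different working directory might look like the sketch below; the executable name, file names and the relative path are placeholders, only the parameter name comes from this README.
```bash
# Run the decoder from outside the project root and point it at models/intra
# (file names and the relative path are placeholders).
DecoderAppStatic -b str.bin -o rec.yuv \
  --PrefixAbsolutePathsToGraphsOutput=../NNVC/models/intra
```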
NN-based super resolution
==================
To activate NN-based super resolution, use --NnsrOption=1. The default model path is set as "./models/super_resolution/".
For rate matching, use the following config files when testing class A1 or A2.
[cfg/nn-based/nnsr_classA1.cfg](cfg/nn-based/nnsr_classA1.cfg)
[cfg/nn-based/nnsr_classA2.cfg](cfg/nn-based/nnsr_classA2.cfg)
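A hedged sketch of a class A1 encoding with super resolution (the executable name, base configuration and input/output names are assumptions; --NnsrOption=1 and the class A1 cfg come from this README):
```bash
# Class A1 encoding with NN-based super resolution and rate matching
# (executable, base cfg and file names are placeholders).
EncoderAppStatic -c encoder_randomaccess_vtm.cfg -c cfg/nn-based/nnsr_classA1.cfg \
  --NnsrOption=1 -i input_classA1.yuv -b str.bin
```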
Content-adaptive post-filter
==================
To activate the content-adaptive post-filter, use the corresponding sequence/QP configuration file under the following directory:
[cfg/nn-based/nnpf/](cfg/nn-based/nnpf/)
Each config file describes the post-filter as well as:
* The NNR bitstream (weight update), with the parameter `SEINNPFCPayloadFilename1`.
* The over-fitted SADL model, with the parameter `NnpfModelPath`. This parameter must also be used when running the decoder.
The NNPFC SEI messages are to be signalled **once** for the whole video sequence (poc = 0).
The bitrate of the SEI messages is computed in the same fashion as the video bitstream. Therefore,
if split encoding is considered, the overhead of the SEI messages needs to be readjusted to consider the
length of the whole video sequence (value reported in the encoder log).
Finally, the post-filtered reconstruction can be saved at the decoder side by adding the parameters
`--ReconFile=reco.yuv --NnpfReconFile=recoPost.yuv`. **Note:** Both parameters must be provided.
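A minimal sketch of such a decoder call (the executable name, bitstream name and model file are placeholders; the parameters come from this README):
```bash
# Decode and save both the regular and the post-filtered reconstructions
# (bitstream and model names are placeholders).
DecoderAppStatic -b str.bin --NnpfModelPath=overfitted_model.sadl \
  --ReconFile=reco.yuv --NnpfReconFile=recoPost.yuv
```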