AhG 11 reference software for NNVC
==================
    
    This software package is the reference software for AhG11. The reference software includes both encoder and decoder functionality.
    
    
    Reference software is useful in aiding users of a video coding standard to establish and test conformance and interoperability, and to educate users and demonstrate the capabilities of the standard. For these purposes, this software is provided as an aid for the study and implementation of Versatile Video Coding.
    
    The software has been jointly developed by the ITU-T Video Coding Experts Group (VCEG, Question 6 of ITU-T Study Group 16) and the ISO/IEC Moving Picture Experts Group (MPEG, Working Group 11 of Subcommittee 29 of ISO/IEC Joint Technical Committee 1).
    
    A software manual, which contains usage instructions, can be found in the "doc" subdirectory of this software package.
    
    
    Content
    ==================
    The package contains the following components:
    - a base codec based on VTM-11.0_nnvc (VTM-11.0 + JVET-V0056 patch)
    
- a data dumping feature at the decoder, activated with the macro NNVC\_DUMP\_DATA in TypeDef.h. The encoder log is also slightly extended in order to record some useful information

- a data loading feature in python, compatible with the data dumped by the decoder
- an inference feature based on SADL supporting both float and int16 models

- a common API supporting input preparation and inference of NN-based models

- two NN-based loop filters: the default is the Low Operating Point (LOP) filter; the second is the High Operating Point (HOP) filter, available using the nn_based/HOP.cfg configuration file.
- training scripts for the NN-based loop filters.
    
    Submodule 
    ----------------------------------------------
The SADL submodule needs to be initialized the first time using the following commands:
    ```bash
    git submodule init
    git submodule update
    ```
    
    
    The CMake tool is used to create platform-specific build files. 
    
    Although CMake may be able to generate 32-bit binaries, **it is generally suggested to build 64-bit binaries**. 32-bit binaries are not able to access more than 2GB of RAM, which will not be sufficient for coding larger image formats. Building in 32-bit environments is not tested and will not be supported.
    
    
    
    Build instructions for plain CMake (suggested)
    ----------------------------------------------
    
    **Note:** A working CMake installation is required for building the software.
    
    CMake generates configuration files for the compiler environment/development environment on each platform. 
    The following is a list of examples for Windows (MS Visual Studio), macOS (Xcode) and Linux (make).
    
    Open a command prompt on your system and change into the root directory of this project.
    
    Create a build directory in the root directory:
    ```bash
    mkdir build 
    ```
    
    Use one of the following CMake commands, based on your platform. Feel free to change the commands to satisfy
    your needs.
    
    
    **Windows Visual Studio 2015/17/19 64 Bit:**
    
    Use the proper generator string for generating Visual Studio files, e.g. for VS 2015:
    
    
    ```bash
    cd build
    cmake .. -G "Visual Studio 14 2015 Win64"
    ```
    
    Then open the generated solution file in MS Visual Studio.
    
    
    For VS 2017 use "Visual Studio 15 2017 Win64", for VS 2019 use "Visual Studio 16 2019".
    
    Visual Studio 2019 also allows you to open the CMake directory directly. Choose "File->Open->CMake" for this option.
    
    
    
**macOS Xcode:**

For generating an Xcode workspace type:
    
    ```bash
    cd build
    cmake .. -G "Xcode"
    ```
    Then open the generated work space in Xcode.
    
    
    For generating Makefiles with optional non-default compilers, use the following commands:
    
    ```bash
    cd build
    cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=gcc-9 -DCMAKE_CXX_COMPILER=g++-9
    ```
In this example, the Homebrew-installed GCC 9 is used for a release build.
    
    
    **Linux**
    
    For generating Linux Release Makefile:
    ```bash
    cd build
    cmake .. -DCMAKE_BUILD_TYPE=Release
    ```
    For generating Linux Debug Makefile:
    ```bash
    cd build
    cmake .. -DCMAKE_BUILD_TYPE=Debug
    ```
    
    
When building in Unix-like environments, AVX2 is enabled by default for the NN-related code. Use the following command to enable AVX-512:

```bash
cmake .. -DCMAKE_BUILD_TYPE=Release -DNNLF_BUILD_WITH_AVX512=1
```
    
    Then type
    ```bash
    make -j
    ```
    
    For more details, refer to the CMake documentation: https://cmake.org/cmake/help/latest/
    
    Build instructions for make
    ---------------------------
    
**Note:** The build instructions in this section require the make tool and Python to be installed, both of which are
part of usual Linux and macOS environments. See below for installation instructions for Python and GnuWin32
on Windows.
    
    Open a command prompt on your system and change into the root directory of this project.
    
    To use the default system compiler simply call:
    ```bash
    make all
    ```
    
    
    
    **MSYS2 and MinGW (Windows)**
    
**Note:** Build files for MSYS MinGW were added on request. The build platform is not regularly tested and cannot be supported.
    
    Open an MSYS MinGW 64-Bit terminal and change into the root directory of this project.
    
The following tools need to be installed for MSYS2 and MinGW:

- CMake: http://www.cmake.org/
- Python (not mandatory, but it simplifies the build process): https://www.python.org/downloads/release/python-371/
- GnuWin32 (not mandatory, but it simplifies the build process): https://sourceforge.net/projects/getgnuwin32/files/getgnuwin32/0.6.30/GetGnuWin32-0.6.3.exe/download
- MSYS2 (required for MinGW): http://repo.msys2.org/distrib/msys2-x86_64-latest.exe (installation instructions: https://www.msys2.org/)
    
    Install the needed toolchains:
    ```bash
    pacman -S --needed base-devel mingw-w64-i686-toolchain mingw-w64-x86_64-toolchain git subversion mingw-w64-i686-cmake mingw-w64-x86_64-cmake
    ```
    
    
    
    Data dumping
    ==================
    
A dataset for a particular bitstream is dumped using:
    ```bash
    DecoderAppStatic -b D_BlowingBubbles_416x240_50Hz_8bit_P420_37.bin --DumpBasename D_BlowingBubbles_416x240_50Hz_8bit_P420_37
    ```
    
    The data to be dumped are controlled by macros in the file TypeDef.h (see below).
    
It will produce several data files and a json description of the dump.
We assume an encoder log file is also associated with each produced bitstream. The encoder log files should contain the necessary information (use --Verbosity=6). The link between the decoder dump and the encoder configuration is made using the bitstream name.
The provided scripts will automatically create a json file containing all information (dumped data, original YUV, etc.), provided the correct directories are given.
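
For illustration, an encoder call that satisfies these assumptions might look as follows. This is only a sketch: the configuration file, sequence parameters and paths are placeholders to adapt to your setup, and only --Verbosity=6 and the matching bitstream/log naming actually matter.

```bash
# Hypothetical encoder run producing a log usable by the dataset scripts.
# The bitstream name must match the basename used for the decoder dump.
EncoderAppStatic -c cfg/encoder_randomaccess_vtm.cfg \
  -i D_BlowingBubbles_416x240_50Hz_8bit_P420.yuv -wdt 416 -hgt 240 -fr 50 -f 65 \
  --QP=37 --Verbosity=6 \
  -b D_BlowingBubbles_416x240_50Hz_8bit_P420_37.bin \
  > D_BlowingBubbles_416x240_50Hz_8bit_P420_37.log
```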
    
    Data format
    ----------------------------------------------
The data dumper uses the following format for each component.
Note: a set of macros allows enabling/disabling the data to be dumped/used.
    
    - basename\_rec\_before\_dbf.yuv:  reconstruction before deblocking filter (macro NNVC\_USE\_REC\_BEFORE\_DBF): YUV format, 10bits
    - basename\_rec\_after\_dbf.yuv:  reconstruction after deblocking filter (macro NNVC\_USE\_REC\_AFTER\_DBF): YUV format, 10bits
    - basename\_pred.yuv: prediction (macro NNVC\_USE\_PRED): YUV format, 10bits
- basename\_bs.yuv: boundary strength (macro NNVC\_USE\_BS): YUV format, 10bits. Boundary strength of the deblocking filter, normalized to 10 bits
    
    - basename\_partition\_cu\_average.yuv: partition information as cu average (macro NNVC\_USE\_PARTITION\_AS\_CU\_AVERAGE): YUV format, 10 bits. Contains the average of each CU.
    
    - basename\_qp.dat: QP slice (macro NNVC\_USE\_QP): int32, binary format 
    - basename\_slicetype.dat: slice type (macro NNVC\_USE\_SLICETYPE): int32, binary format 
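
As a quick sanity check, the dumped files can be inspected with standard tools. The commands below are only a sketch: they assume `od` and `ffprobe` are available, that the sequence is 416x240, and that the 10-bit YUV data is stored as planar 4:2:0 little-endian.

```bash
# Print the first per-slice QP values (int32, native byte order).
od -A d -t d4 D_BlowingBubbles_416x240_50Hz_8bit_P420_37_qp.dat | head

# Count the frames of a dumped 10-bit YUV file (geometry is a placeholder).
ffprobe -count_frames -show_entries stream=nb_read_frames \
  -f rawvideo -pixel_format yuv420p10le -video_size 416x240 \
  D_BlowingBubbles_416x240_50Hz_8bit_P420_37_rec_before_dbf.yuv
```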
    
    
    
    Json file 
    ----------------------------------------------
    During data dump, a json file is produced for each bitstream, with the name basename.json.
    The format of the json file is as follows:
    ```json
    {
    
     "suffix_rec_before_dbf": "_rec_before_dbf.yuv", 
    
     "suffix_partition_cu_average": "_partition_cu_average.yuv",
    
     "suffix_bs": "_bs.yuv",
     "suffix_qp": "_qp.dat",
     "suffix_slicetype": "_slicetype.dat",
"data": [
      {
        "qp_base" : 32,
        "bsname": "D_BlowingBubbles_416x240_50Hz_8bit_P420_37.bin",
        "basename": "D_BlowingBubbles_416x240_50Hz_8bit_P420_37",
        "width": 416,
        "height": 240,
        "data_count": 65
      }
     ]
    }
    ```
If a particular data type is present, the corresponding suffix variable is defined.
An array of one element is described:
- the width and height of the frames
- the number of frames dumped (data\_count)
- the concatenation of a suffix and the basename gives the full filename for this data (see the example below)
- bsname: helps to find the correspondence between the dataset and the original encoded sequence
- qp\_base: base QP used to encode the sequence
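
For illustration, the full filename of a given data type can be rebuilt from the json with a tool such as jq. This is only a sketch, assuming jq is installed and that the per-bitstream array is stored under the data key, as in the concatenated file shown below.

```bash
# Rebuild the filename of the QP data file from basename + suffix.
jq -r '.data[0].basename + .suffix_qp' D_BlowingBubbles_416x240_50Hz_8bit_P420_37.json
# -> D_BlowingBubbles_416x240_50Hz_8bit_P420_37_qp.dat
```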
    
    
    Build the database 
    ----------------------------------------------
The database building is done in two passes.
    
    First all individual dataset files are concatenated into one dataset file:
    ```bash
    training/tools/concatenate_dataset.py --input_dir_json dir1 --input_dir_json dir2 --output_json pre_dataset.json
    ```
This produces a file with all individual json files concatenated:
    ```json
    {
    
     "suffix_rec_before_dbf": "_rec_before_dbf.yuv",
    
     "suffix_partition_cu_average": "_partition_cu_average.yuv",
    
     "suffix_bs": "_bs.yuv",
     "suffix_qp": "_qp.dat",
     "suffix_slicetype": "_slicetype.dat",
     "data": [
      {
       "bsname": "D_RaceHorses_416x240_30Hz_8bit_P420_37_P2.bin",
       "qp_base": 37,
       "basename": "D_RaceHorses_416x240_30Hz_8bit_P420_37_P2",
       "width": 416,
       "height": 240,
       "data_count": 33,
       "dirname": "db"
      },
      {
       "bsname": "D_BlowingBubbles_416x240_50Hz_8bit_P420_32_P1.bin",
       "qp_base": 32,
       "basename": "D_BlowingBubbles_416x240_50Hz_8bit_P420_32_P1",
       "width": 416,
       "height": 240,
       "data_count": 65,
       "dirname": "db"
      },
    ...]
    }
    ```
    For each dataset, the directory _dirname_ is added to the data field.
    
    
    
The file is then consolidated with information from the encoder log, or from the encoder configuration file used to encode the sequence, in order to extract the original YUV file name and other information:
    ```bash
    training/tools/concatenate_dataset.py --input_json pre_dataset.json --input_dir_encoder direnc1 --input_dir_encoder direnc2 --output_json dataset.json --log_extension log
    ```
Note: To use the encoder configuration files, use the correct extension (for example --log_extension cfg).
    
    It results in a file with all information needed by the data loader.
    ```json
    {
    
     "suffix_rec_before_dbf": "_rec_before_dbf.yuv",
    
     "suffix_partition_cu_average": "_partition_cu_average.yuv",
    
     "suffix_bs": "_bs.yuv",
     "suffix_qp": "_qp.dat",
     "suffix_slicetype": "_slicetype.dat",
     "data": [
      {
       "bsname": "D_RaceHorses_416x240_30Hz_8bit_P420_37_P2.bin",
       "qp_base": 37,
       "basename": "D_RaceHorses_416x240_30Hz_8bit_P420_37_P2",
       "width": 416,
       "height": 240,
       "data_count": 33,
       "dirname": "db",
       "original_yuv": "/home/library01/VCL_VIDEO/JVET/Linux/ClassD/D_RaceHorses_416x240_30Hz_8bit_P420.yuv",
       "original_temporal_subsample": 1,
       "original_frame_skip": 32,
       "original_bitdepth": 8
      },
      {
       "bsname": "D_BlowingBubbles_416x240_50Hz_8bit_P420_32_P1.bin",
       "qp_base": 32,
       "basename": "D_BlowingBubbles_416x240_50Hz_8bit_P420_32_P1",
       "width": 416,
       "height": 240,
       "data_count": 65,
       "dirname": "db",
       "original_yuv": "/home/library01/VCL_VIDEO/JVET/Linux/ClassD/D_BlowingBubbles_416x240_50Hz_8bit_P420.yuv",
       "original_temporal_subsample": 1,
       "original_frame_skip": 0,
       "original_bitdepth": 8
      },
      ...
      ]
    }
    ```
    The following fields are added to each data:
    - original\_yuv: location of the original yuv file
    - original\_bitdepth: bitdepth of the original yuv file
- original\_frame\_skip: frames to skip in the original yuv file (offset from the beginning to find the data)
- original\_temporal\_subsample: corresponds to TemporalSubsampleRatio in VTM (used for the AI configuration)

Note: the last 2 fields are present in the encoder logs only if the NNVC encoder has been used to generate the sequence. If they are not present, frame\_skip=0 and temporal\_subsample=1 are assumed.
    
    
    Data loading
    ==================
The module data_loader shows an example of data loading. The basic usage is as follows (see also the program create\_unified\_dataset.py for a usage example; the values of input\_json, patch\_size and border\_size below are placeholders):
```python
import data_loader

# Placeholders to adapt to your setup
input_json = "dataset.json"   # json produced by the dataset-building scripts
patch_size = 128              # patch size (illustrative value)
border_size = 8               # patch border size (illustrative value)
poc_list = 0

# If `generate_type` is equal to 0, the data loading
# has some specificities, e.g. normalization, for NN-based loop filter set 0.
# If `generate_type` is equal to 1, the data loading
# has some specificities, e.g. normalization, for NN-based loop filter set 1.
generate_type = 1

dl = data_loader.DataLoader(input_json,
                            patch_size,
                            poc_list,
                            generate_type=generate_type)

print("Nb samples available: {}".format(dl.nb_patches()))
print("Available components: {}".format(dl.components))

# list of components to extract
comps = ["org_Y", "pred_Y", "qp_base"]
p = dl.getPatchData(42, comps, border_size)
# p now contains the patch at index 42 in the database
```
    
    On the fly loading
    ----------------------------------------------
Call getPatchData on random indices to create a batch of data.

Pro: does not use additional disk space, because data is created on the fly from the original dataset.

Cons: slower to generate a batch. The loading function should be wrapped in an asynchronous function to hide I/O latency.
    
    
    Offline loading
    ----------------------------------------------
One can use the create\_unified\_dataset.py program to create a large dataset of already prepared patches.

Pro: faster; only one file is opened and random reads are performed inside it.

Cons: consumes more disk space (float patches, duplication of the data).
    
    
    Full example of data dumping and data loading
    ==================
The script training/example/sample_test.sh shows a full example of data dumping, dataset creation and data loading.
The loaded data are put into one binary file with all the patches inside. This file can be used for fast I/O during training.
Finally, a sample program allows visualizing the resulting dumped patches.
    
    
    
Common API
==================

source/Lib/CommonLib/NNInference is a common API for unifying input preparation and inference of NN models.
The common API supports preparing inputs from any dumped information such as reconstruction, prediction, partitioning, boundary strength, base QP, slice QP, and slice type.
Currently, input preparation and inference of the two NN-based loop filter sets are based on this common API.
    
    LOP NN loop-filter
    ------------------
There are several NN-based loop filter sets in the common software.
By default, the LOP NN-based loop filter is activated. The default model path assumes the encoder and decoder are run from the root of the repository.
When running the codec from another location, adapt the NnlfModelName parameter to point to the correct location.
    
    HOP NN loop-filter
    ------------------
To use the HOP loop filter, change the NnlfModelName parameter to point to the HOP model (see the cfg/nn\_based/HOP.cfg file for an example).
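
For illustration, a possible encoder call using the HOP configuration is sketched below; the sequence configuration file, geometry and I/O names are placeholders, and the command assumes it is run from the repository root.

```bash
# Hypothetical invocation: cfg/nn_based/HOP.cfg points NnlfModelName to the HOP model.
EncoderAppStatic -c cfg/encoder_randomaccess_vtm.cfg -c cfg/nn_based/HOP.cfg \
  -i input.yuv -wdt 416 -hgt 240 -fr 50 -f 65 --QP=37 -b str.bin
```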
    
    
    Legacy NN-based loop filter
    ==================
    
    
    NN-based loop filter set 0
    ----------------------------------------------
To activate NN-based loop filter set 0, use --NnlfOption=10, or equivalently -c cfg/nn-based/NnlfOption\_1.cfg

To specify a model path, use e.g. --ModelPath="models/", or equivalently -mp "models/". Note that the model path should be specified at both the encoder and the decoder.
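
A possible encoder/decoder pair is sketched below; the file names are placeholders, the flag names are the ones given above, and the same model path is passed on both sides.

```bash
# Hypothetical commands: note the identical --ModelPath at encoder and decoder.
EncoderAppStatic -c cfg/encoder_randomaccess_vtm.cfg -c cfg/nn-based/NnlfOption_1.cfg \
  --ModelPath="models/" -i input.yuv -wdt 416 -hgt 240 -fr 50 -f 65 --QP=37 -b str.bin
DecoderAppStatic -b str.bin --ReconFile=rec.yuv --ModelPath="models/"
```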
    
    
    
NN-based loop filter set 1
----------------------------------------------
    
    To activate NN-based loop filter set 1, use --NnlfOption=11, or equivalently -c cfg/nn-based/NnlfOption\_2.cfg
    
    
To specify model paths, use e.g. the following command-line options. Note that the model paths should be specified at both the encoder and the decoder.
    
    --NnlfSet1InterLumaModel="models/NnlfSet1_LumaCNNFilter_InterSlice_int16.sadl"
    --NnlfSet1InterChromaModel="models/NnlfSet1_ChromaCNNFilter_InterSlice_int16.sadl"
    --NnlfSet1IntraLumaModel="models/NnlfSet1_LumaCNNFilter_IntraSlice_int16.sadl"
    --NnlfSet1IntraChromaModel="models/NnlfSet1_ChromaCNNFilter_IntraSlice_int16.sadl"
    
    LC NN-based loop filter set
    ----------------------------------------------
    
To activate the LC (low-complexity) NN-based loop filter set, use --NnlfOption=12, or equivalently -c cfg/nn-based/NnlfOption\_3.cfg
    
    
    The LC NNLF consists of the CP Decomposed & Fused (CPDF) LC models proposed in JVET-AD0156.  
    
The LC NNLF models are available in both SADL float and int16 versions. The version is selected with the macro ``NN_FIXED_POINT_IMPLEMENTATION`` in ``source/Lib/CommonLib/TypeDef.h``: set the macro to 0 for float or to 1 for int16.

To specify LC model paths, refer to the following command-line examples. Note that the model paths should be specified at both the encoder and the decoder, and must match the version (float or int16) the software is built for.
    
    SADL float: ``--LCModelPath=models/NnlfSetLC/LC_float_model0.sadl,models/NnlfSetLC/LC_float_model1.sadl,models/NnlfSetLC/LC_float_model2.sadl,models/NnlfSetLC/LC_float_model3.sadl``
    
    SADL int16: ``--LCModelPath=models/NnlfSetLC/LC_int16_model0.sadl,models/NnlfSetLC/LC_int16_model1.sadl,models/NnlfSetLC/LC_int16_model2.sadl,models/NnlfSetLC/LC_int16_model3.sadl``
    
Details on the training and on the model conversion to SADL can be found in ``training/training_scripts/Nn_Filtering_Set_LC/README.pdf``
    
    
    NN-based loop filter encoder optimization
    ----------------------------------------------
    To activate NN-based loop filter encoder optimization, use --EncNnlfOpt=1. Note that the encoder optimization is disabled by default.
    
To specify model paths, use e.g. the following command-line options. Note that the model paths should be specified at the encoder.
    --RdoCnnlfInterLumaModel="models/RdNnlfSet1_LumaCNNFilter_InterSlice_int16.sadl"
    --RdoCnnlfIntraLumaModel="models/RdNnlfSet1_LumaCNNFilter_IntraSlice_int16.sadl"
    
    
    
    Low-complexity version of the neural network-based intra prediction mode
    ------------------------------------------------------------------------
    
When building VTM-11-NNVC, if the macro `JVET_AB0149_INTRA_PRED` is equal to 1 (default), the low-complexity version of the NN-based
intra prediction mode can be controlled using `--NnIntraPred`. By default, the encoder and decoder parameter `PrefixAbsolutePathsToGraphsOutput`
is equal to "models/intra", as it is assumed that the VTM-11-NNVC encoder/decoder executable is run from the root directory of the
project. If the executable is run from a different directory, add the following argument:
`--PrefixAbsolutePathsToGraphsOutput=path_to_directory_models_intra`, where `path_to_directory_models_intra` is the path to the
directory "models/intra" relative to the directory from which the executable is run.
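
For example, running the encoder from outside the repository root might look like the following sketch, where every path is a placeholder.

```bash
# Hypothetical run from a working directory other than the repository root.
/path/to/NNVC/bin/EncoderAppStatic -c /path/to/NNVC/cfg/encoder_intra_vtm.cfg \
  --NnIntraPred=1 \
  --PrefixAbsolutePathsToGraphsOutput=/path/to/NNVC/models/intra \
  -i input.yuv -wdt 416 -hgt 240 -fr 50 -f 65 --QP=37 -b str.bin
```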
    
NN-based super resolution
----------------------------------------------
To activate NN-based super resolution, use --NnsrOption=1. The default model path is set to "./models/super_resolution/".

For rate matching, use the following configuration files when testing class A1 or A2:
[cfg/nn-based/nnsr_classA1.cfg](cfg/nn-based/nnsr_classA1.cfg)
[cfg/nn-based/nnsr_classA2.cfg](cfg/nn-based/nnsr_classA2.cfg)
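
A possible encoder call for a class A1 sequence is sketched below; the base configuration and I/O names are placeholders.

```bash
# Hypothetical invocation combining the RA configuration with the class A1
# rate-matching configuration for NN-based super resolution.
EncoderAppStatic -c cfg/encoder_randomaccess_vtm.cfg -c cfg/nn-based/nnsr_classA1.cfg \
  --NnsrOption=1 -i input.yuv -wdt 3840 -hgt 2160 -fr 60 -f 65 --QP=37 -b str.bin
```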
    
    
Content-adaptive post-filter
----------------------------------------------
To activate the content-adaptive post-filter, use the corresponding sequence-QP configuration file under the following directory:
[cfg/nn-based/nnpf/](cfg/nn-based/nnpf/)
    
Each configuration file describes the post-filter as well as:

* the NNR bitstream (weight update), with the parameter `SEINNPFCPayloadFilename1`;
* the over-fitted SADL model, with the parameter `NnpfModelPath`. This parameter must also be used when running the decoder.
    
    
    
The NNPFC SEI messages are to be signalled **once** for the whole video sequence (poc = 0).
The bitrate of the SEI messages is computed in the same fashion as the video bitstream. Therefore,
if split encoding is used, the overhead of the SEI messages needs to be readjusted to account for the
length of the whole video sequence (value reported in the encoder log).
    
Finally, the post-filtered reconstruction can be saved at the decoder side by adding the parameters
`--ReconFile=reco.yuv --NnpfReconFile=recoPost.yuv`. **Note:** Both parameters must be provided.