From 4a442788385939c36bcb21ef0099f8a25a3858b7 Mon Sep 17 00:00:00 2001
From: Franck Galpin <franck.galpin@interdigital.com>
Date: Thu, 15 Jun 2023 09:38:06 +0200
Subject: [PATCH] update readme for testing configuration

---
 .../NN_Filtering_HOP/readme.md | 290 ++++--------------
 1 file changed, 66 insertions(+), 224 deletions(-)

diff --git a/training/training_scripts/NN_Filtering_HOP/readme.md b/training/training_scripts/NN_Filtering_HOP/readme.md
index a7197a7515..12ec19d271 100644
--- a/training/training_scripts/NN_Filtering_HOP/readme.md
+++ b/training/training_scripts/NN_Filtering_HOP/readme.md
@@ -10,6 +10,8 @@ Other keys should not be edited except for testing reasons.
 
 ## I- Model Stage I
 
+The total size required for stage 1 (without deleting the intermediate data) is about 3312GB.
+
 ### A- Data extraction for intra from vanilla VTM
 #### 1. Dataset preparation - div2k conversion
@@ -87,12 +89,14 @@ python3 tools/create_unified_dataset.py --json_config training_scripts/NN_Filter
     --output_location stage1/dataset_valid
 ```
 It will generate a unique dataset of patches ready for training in ["stage1"]["dataset"]["path"] from the dataset in ["stage1"]["encdec"]["path"].
+
 **Note:** the directories in encdec can now be deleted if there is no need to regenerate an offline dataset.
 
 The dataset can be visualized using
 ```sh
 python3 tools/example/display_patches_dataset.py stage1/dataset/dataset.json
 ```
+
 **Note:**
 The size requirement is about 2300GB for the training set and about 91GB for the validation set.
@@ -145,11 +149,30 @@ and the correct data macros should be set:
 #define NNVC_USE_QP 1 // QP slice
 #define JVET_AC0089_NNVC_USE_BPM_INFO 1 // JVET-AC0089: dump Block Prediction Mode
 ```
-The model is tested using with parameters
+
+##### 3.1 Test on intra only
+The model is tested with NNVC-5.1 using the following parameters:
+```
+--NnlfOption=4 --NnlfHopModelName=stage1/train/model_float.sadl --NnlfHopDebugOption=1
+```
+The configuration file is ``encoder_xxx_vtm.cfg`` and the anchor is VTM-11.0\_NNVC.
+
+##### 3.2 Test with forced usage
+The model is tested with NNVC-5.1 using the following parameters:
 ```
---NnlfOption=4 --NnlfHopModelName=stage1/train/model_float.sadl
+--NnlfOption=4 --NnlfHopModelName=stage1/train/model_float.sadl --NnlfHopDebugOption=2
 ```
-using NNVC 5.1
+The configuration file is ``encoder_xxx_vtm.cfg`` and the anchor is VTM-11.0\_NNVC.
+The flag ``NnlfHopDebugOption`` is also needed at the decoder since it forces the usage of the model on inter slices using the intra "part" of the model.
+
+
+
+
+
+
+
+
+
 
 ## II- Model Stage 2
 ### A- Data extraction
@@ -271,6 +294,36 @@ python3 training_scripts/NN_Filtering_HOP/convert/to_sadl.py training_scripts/NN
 
 **Note:** the directory in dataset can now be deleted if there is no need to retrain.
 
+#### 3. Test model
+To test the float model, the type should be set as float in ``TypeDef.h``:
+```
+#if NN_HOP_UNIFIED
+using TypeSadlHOP=float;
+#endif
+```
+and the correct data macros should be set:
+```
+// which data are used for inference/dump
+#define NNVC_USE_REC_BEFORE_DBF 1 // reconstruction before DBF
+#define NNVC_USE_PRED 1 // prediction
+#define NNVC_USE_BS 1 // BS of DBF
+#define NNVC_USE_QP 1 // QP slice
+#define JVET_AC0089_NNVC_USE_BPM_INFO 1 // JVET-AC0089: dump Block Prediction Mode
+```
+
+The model is tested with NNVC-5.1 using the following parameters:
+```
+--NnlfOption=4 --NnlfHopModelName=stage2/train/model_float.sadl
+```
+The configuration file is ``encoder_xxx_vtm.cfg`` and the anchor is VTM-11.0\_NNVC.
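+
+As an illustration only, the sketch below shows how these options might be combined into a full encoder call; the binary name, the sequence parameters and the generic VTM options (-c, -i, -b, -o, -f, -q) are placeholders/assumptions, only the Nnlf options above come from this readme:
+```sh
+# hypothetical single-sequence run with the stage2 float HOP model enabled
+./bin/EncoderAppStatic -c cfg/encoder_randomaccess_vtm.cfg -c cfg/per-sequence/MySequence.cfg \
+  -i MySequence.yuv -b str.bin -o rec.yuv -f 65 -q 32 \
+  --NnlfOption=4 --NnlfHopModelName=stage2/train/model_float.sadl
+```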
+
+
+
+
+
+
+
+
 ## III- Model Stage 3
 ### A- Data extraction
 #### 1. Dataset preparation
@@ -375,146 +428,14 @@ python3 training_scripts/NN_Filtering_HOP/convert/quantize.py --json_config tra
 
 **Note:** the directory in dataset can now be deleted if there is no need to retrain.
 
-
-## IV-Advanced
-
-### A-Dataset
-#### 1- Format
-dataset:
- either start/count/path
-OR
- path/dataset_file
-
-preprend path variable to dataset_file if dataset_file is just a name
-
-dataset:
-if contains path_yuv: prepend filename, else assume same location json
-
-### A-Parameters
-
-
-
-
-## II- Model Stage 2
-### A- Data extraction
-#### 1. tvd last frame conversion
-
-Convert BVI-DVC 64 frames sequence into 65 frames sequence, and create the TVD dataset file (YUV are still taken from source).
-
-```sh
- python3 tools/convert_dataset.py --json_config training_scripts/NN_Filtering_HOP/config.json --input_dataset dataset/bvi --output_location stage2/yuv_bvi --extend 65
- python3 tools/convert_dataset.py --json_config training_scripts/NN_Filtering_HOP/config.json --input_dataset dataset/bvi_valid --output_location stage2/yuv_bvi_valid --extend 65
- python3 tools/convert_dataset.py --json_config training_scripts/NN_Filtering_HOP/config.json --input_dataset dataset/tvd --output_location stage2/yuv_tvd
- python3 tools/convert_dataset.py --json_config training_scripts/NN_Filtering_HOP/config.json --input_dataset dataset/tvd_valid --output_location stage2/yuv_tvd_valid
-```
-dataset files are placed in the target directory (as set in the config.json ["stage2"]["yuv"]["path"]), a json file named ["stage2"]["yuv"]["dataset_filename"] is updated with the new data.
-
-#### 2. prepare script for encoding/decoding of the dataset
-Please note that a VTM with the model of stqge 1 is used. NNVC-5.0 can be used to generate the binaries and cfg file. The configuration file is the vanilla VTM one (see config.json), the input model is added automatically.
-The macro for data dump should be:
-```
-// which data are used for inference/dump
-#define NNVC_USE_REC_BEFORE_DBF 1 // reconstruction before DBF
-#define NNVC_USE_PRED 1 // prediction
-#define NNVC_USE_BS 1 // BS of DBF
-#define NNVC_USE_QP 1 // QP slice
-#define JVET_AC0089_NNVC_USE_BPM_INFO 1 // JVET-AC0089: dump Block Prediction Mode
-```
-Other macros can be set to 0.
-
-Extract cfg files and encoding/decoding script:
-```sh
-python3 tools/dataset_to_encoding_script.py --json_config training_scripts/NN_Filtering_HOP/config.json --input_dataset stage2/yuv_tvd --output_location stage2/encdec_tvd
-python3 tools/dataset_to_encoding_script.py --json_config training_scripts/NN_Filtering_HOP/config.json --input_dataset stage2/yuv_tvd_valid --output_location stage2/encdec_tvd_valid
-python3 tools/dataset_to_encoding_script.py --json_config training_scripts/NN_Filtering_HOP/config.json --input_dataset stage2/yuv_bvi --output_location stage2/encdec_bvi
-python3 tools/dataset_to_encoding_script.py --json_config training_scripts/NN_Filtering_HOP/config.json --input_dataset stage2/yuv_bvi_valid --output_location stage2/encdec_bvi_valid
-```
-It will generate the cfg files for the dataset and a shell script to encode and decode all sequences in the dataset in the directory ["stage2"]["encdec"]["path"].
-
-#### 3. encode/decode the sequences:
-
-Loop on all sequences to encode, for example:
-```sh
-cd stage2/encdec;
-LIST="encode_decode_dataset_tvd encode_decode_dataset_bvi";
-for DS in $LIST; do
-  N=$(cat ./${DS}.sh | grep "\$1 ==" | tail -1 | sed -E 's/.*== ([0-9]+) .*/\1/g');
-  for((i=0;i<=N;i++)); do
-    ./${DS}.sh $i;
-  done;
-  N=$(cat ./${DS}_valid.sh | grep "\$1 ==" | tail -1 | sed -E 's/.*== ([0-9]+) .*/\1/g');
-  for((i=0;i<=N;i++)); do
-    ./${DS}_valid.sh $i;
-  done;
-done
-```
-or you can use the script to encode on your cluster. N is the number of sequences (run ./encode_decode_dataset.sh to get the value N).
-
-#### 4. create a consolidated dataset
-
-```sh
-LIST="bvi bvi_valid tvd tvd_valid";
-for DS in $LIST; do
-  python3 tools/concatenate_dataset.py --json_config training_scripts/NN_Filtering_HOP/config.json --input_dir_json stage2/encdec_${DS} --input_dir_dataset_json stage2/yuv_${DS} --output_json stage2/encdec_${DS}
-done
-```
-It will generate a unique dataset for each dataset in ["stage2"]["encdec"]["path"] from all individual datasets in ["stage2"]["encdec_xxx"]["path"]/["dump_dir"] and encoder logs in ["stage2"]["encdec_xxx"]["enc_dir"].
-
-#### 5a. create an offline dataset with all batches
-
-```sh
-LIST="bvi tvd";
-for DS in $LIST; do
-  python3 tools/create_unified_dataset.py --json_config training_scripts/NN_Filtering_HOP/config.json \
-   --nb_patches -1 --patch_size 128 --border_size 8 --input_dataset stage2/encdec_${DS} \
-   --components org_Y,org_U,org_V,pred_Y,pred_U,pred_V,rec_before_dbf_Y,rec_before_dbf_U,rec_before_dbf_V,bs_Y,bs_U,bs_V,qp_base,qp_slice,ipb_Y \
-   --output_location stage2/dataset/dataset_files/${DS}
-
-  python3 tools/create_unified_dataset.py --json_config training_scripts/NN_Filtering_HOP/config.json \
-   --nb_patches -1 --patch_size 128 --border_size 8 --input_dataset stage2/encdec_valid_${DS} \
-   --components org_Y,org_U,org_V,pred_Y,pred_U,pred_V,rec_before_dbf_Y,rec_before_dbf_U,rec_before_dbf_V,bs_Y,bs_U,bs_V,qp_base,qp_slice,ipb_Y \
-   --output_location stage2/dataset_valid/dataset_files/${DS}
-done
-```
-
-The div2k dataset is alo reused. The json files are copied into the stage2/dataset directory:
-```
-# copy dataset json of div2k
-cp stage1/dataset/*.json stage2/dataset/;
-```
-
-**Note:** the directories in encdec can now be deleted if there is no need to regenerate an offline dataset.
-
-#### 5b. create a dataset for on-the-fly batch extraction
-TODO
-
-### B- Training stage
-#### 1. Train HOP model
-If you need to adapt the settings of your device for training, please edit the file ``training_default.json`` (default parameters) or ``stage2.json`` (parameters specific to stage2). You can also change the loggers verbosity in these files.
-When ready, simply run:
-
-```sh
-python3 training_scripts/NN_Filtering_HOP/training/main.py --json_config training_scripts/NN_Filtering_HOP/config.json \
-   training_scripts/NN_Filtering_HOP/model/model.json \
-   training_scripts/NN_Filtering_HOP/training/cfg/training_default.json \
-   training_scripts/NN_Filtering_HOP/training/cfg/stage2.json
+#### 3. Test model
+To test the int16 model, the type should be set as int16 in ``TypeDef.h``:
 ```
-
-#### 2. Convert model
-The last ONNX model is converted into float SADL format.
-```sh
-python3 training_scripts/NN_Filtering_HOP/convert/to_sadl.py training_scripts/NN_Filtering_HOP/config.json \
-   --input_model stage2/training --output_model stage2/conversion
+#if NN_HOP_UNIFIED
+using TypeSadlHOP=int16_t;
+#endif
 ```
-
-**Note:** the directory in dataset can now be deleted if there is no need to retrain.
-
-## III- Model Stage 3
-### A- Data extraction
-#### 1. prepare script for encoding/decoding of the dataset
-This step is similar to II-A-2.
-Please note that a VTM with the model of stqge 1 is used. NNVC-5.0 can be used to generate the binaries and cfg file. The configuration file is the vanilla VTM one (see config.json), the input model is added automatically.
-The macro for data dump should be:
+and the correct data macros should be set:
 ```
 // which data are used for inference/dump
 #define NNVC_USE_REC_BEFORE_DBF 1 // reconstruction before DBF
@@ -523,92 +444,12 @@ The macro for data dump should be:
 #define NNVC_USE_QP 1 // QP slice
 #define JVET_AC0089_NNVC_USE_BPM_INFO 1 // JVET-AC0089: dump Block Prediction Mode
 ```
-Other macros can be set to 0.
-Extract cfg files and encoding/decoding script:
-```sh
-python3 tools/dataset_to_encoding_script.py --json_config training_scripts/NN_Filtering_HOP/config.json --input_dataset stage1/yuv --output_location stage3/encdec_dik2k_valid --model stage1/conversion
-python3 tools/dataset_to_encoding_script.py --json_config training_scripts/NN_Filtering_HOP/config.json --input_dataset stage1/yuv_valid --output_location stage3/encdec_div2k_valid --model stage1/conversion
-python3 tools/dataset_to_encoding_script.py --json_config training_scripts/NN_Filtering_HOP/config.json --input_dataset stage2/yuv_tvd --output_location stage3/encdec_tvd
-python3 tools/dataset_to_encoding_script.py --json_config training_scripts/NN_Filtering_HOP/config.json --input_dataset stage2/yuv_tvd_valid --output_location stage3/encdec_tvd_valid
-python3 tools/dataset_to_encoding_script.py --json_config training_scripts/NN_Filtering_HOP/config.json --input_dataset stage2/yuv_bvi --output_location stage3/encdec_bvi
-python3 tools/dataset_to_encoding_script.py --json_config training_scripts/NN_Filtering_HOP/config.json --input_dataset stage2/yuv_bvi_valid --output_location stage3/encdec_bvi_valid
+The model is tested with NNVC-5.1 using the following parameters:
 ```
-It will generate the cfg files for the dataset and a shell script to encode and decode all sequences in the dataset in the directory ["stage3"]["encdec"]["path"].
-
-#### 2. encode/decode the sequences:
-This step is similar to II-A-3.
-
-Loop on all sequences to encode:
-```sh
-cd stage3/encdec;
-LIST="encode_decode_dataset_div2k encode_decode_dataset_tvd encode_decode_dataset_bvi";
-for DS in $LIST; do
-  N=$(cat ./${DS}.sh | grep "\$1 ==" | tail -1 | sed -E 's/.*== ([0-9]+) .*/\1/g');
-  for((i=0;i<N;i++)); do
-    ./${DS}.sh $i;
-  N=$(cat ./${DS}_valid.sh | grep "\$1 ==" | tail -1 | sed -E 's/.*== ([0-9]+) .*/\1/g');
-  for((i=0;i<N;i++)); do
-    ./${DS}_valid.sh $i;
-done
-```
-or you can use the script to encode on your cluster. N is the number of sequences (run ./encode_decode_dataset.sh to get the value N).
-
-#### 3. create a consolidated dataset
-This step is similar to II-A-4.
-```sh
-LIST="div2k div2k_valid bvi bvi_valid tvd tvd_valid";
-for DS in $LIST; do
-  python3 tools/concatenate_dataset.py --json_config training_scripts/NN_Filtering_HOP/config.json --input_dir_json stage3/encdec_${DS} --output_json stage3/encdec_${DS}
-done
-```
-It will generate a unique dataset for each dataset in ["stage3"]["encdec"]["path"] from all individual datasets in ["stage3"]["encdec_xxx"]["path"]/["dump_dir"] and encoder logs in ["stage3"]["encdec_xxx"]["enc_dir"].
-
-#### 4a. create an offline dataset with all batches
-This step is similar to II-A-5a.
-```sh
-LIST="div2k div2k_valid bvi bvi_valid tvd tvd_valid";
-for DS in $LIST; do
-  python3 tools/create_unified_dataset.py --json_config training_scripts/NN_Filtering_HOP/config.json \
-   --nb_patches -1 --patch_size 128 --border_size 8 --input_dataset stage3/encdec_${DS} \
-   --components org_Y,org_U,org_V,pred_Y,pred_U,pred_V,rec_before_dbf_Y,rec_before_dbf_U,rec_before_dbf_V,bs_Y,bs_U,bs_V,qp_base,qp_slice,ipb_Y \
-   --output_location stage3/dataset
-done
-```
-**Note:** the directories in encdec can now be deleted if there is no need to regenerate an offline dataset.
-
-#### 5b. create a dataset for on-the-fly batch extraction
-TODO
-
-
-### B- Training stage
-If you need to adapt the settings of your device for training, please edit the file ``training_default.json`` (default parameters) or ``stage3.json`` (parameters specific to stage2). You can also change the loggers verbosity in these files.
-When ready, simply run:
-
-```sh
-python3 training_scripts/NN_Filtering_HOP/training/main.py --json_config training_scripts/NN_Filtering_HOP/config.json \
-   training_scripts/NN_Filtering_HOP/model/model.json \
-   training_scripts/NN_Filtering_HOP/training/cfg/training_default.json \
-   training_scripts/NN_Filtering_HOP/training/cfg/stage3.json
+--NnlfOption=4 --NnlfHopModelName=stage2/train/model_int16.sadl
 ```
-
-#### 2. Convert model
-The last ONNX model is converted into float SADL format.
-```sh
-python3 training_scripts/NN_Filtering_HOP/convert/to_sadl.py training_scripts/NN_Filtering_HOP/config.json \
-   --input_model stage3/training --output_model stage3/conversion
 ```
-
-#### 3. Integerize model
-The float model is integerized into int16 SADL format.
-```sh
-python3 training_scripts/NN_Filtering_HOP/convert/quantize.py --json_config training_scripts/NN_Filtering_HOP/config.json \
-   --input_model stage3/conversion --output_model stage3/conversion_int --quantize training_scripts/NN_Filtering_HOP/convert/quantize.json
-```
-
-
-**Note:** the directory in dataset can now be deleted if there is no need to retrain.
-
+The configuration file is ``encoder_xxx_vtm.cfg`` and the anchor is VTM-11.0\_NNVC.
 
 ## IV-Advanced
 
 ### A-Dataset
 #### 1- Format
 dataset:
  either start/count/path
 OR
  path/dataset_file
 
 prepend path variable to dataset_file if dataset_file is just a name
 
 dataset:
 if contains path_yuv: prepend filename, else assume same location json
 
 ### B-Parameters
+
--
GitLab
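
A minimal sketch of the two dataset description forms mentioned in the "IV-Advanced / A-Dataset" notes above; the entry names and paths are hypothetical, only the keys start, count, path, dataset_file and path_yuv are taken from those notes, and the authoritative schema is whatever config.json actually uses:

```json
{
  "dataset_from_directory": {
    "path": "/data/div2k/yuv",
    "start": 0,
    "count": 800
  },
  "dataset_from_file": {
    "path": "/data/tvd",
    "dataset_file": "dataset.json",
    "path_yuv": "/data/tvd/yuv"
  }
}
```
Per those notes, when dataset_file is a bare name the path value is prepended to it, and when path_yuv is present it is prepended to each YUV filename; otherwise the YUV files are assumed to sit next to the json file.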