conv2dtranspose extension
Dear Franck,
This MR extends the Conv2dtranspose feature with three main changes:
- Add support for 4x4 kernels.
- Speed up the non-SIMD function conv2dtranspose with two strategies: contiguous data access (to increase L2 cache hits) and indirect addressing.
- Based on conv2dtranspose, add the function conv2dtranspose_simd256 (AVX2) and update conv2dtranspose_simd512, which is also faster as a result.
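
For reviewers, here is a minimal sketch of how the two strategies in the scalar path might look. It is not the code in this MR: the function names, the channel-last (HWC) layout, the weight layout `[ky][kx][cin][cout]`, and the square kernel are all assumptions made for illustration. The index tables stand in for "indirect addressing" (bounds checks are precomputed once per axis), and the innermost loop walks contiguous memory in both the weights and the output.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: table[i * k + t] is the output coordinate written by
// input coordinate i and kernel tap t, or -1 if it falls outside the output.
std::vector<int> build_index_table(int in, int k, int stride, int pad, int out) {
    std::vector<int> table(static_cast<std::size_t>(in) * k);
    for (int i = 0; i < in; ++i) {
        for (int t = 0; t < k; ++t) {
            const int o = i * stride - pad + t;
            table[static_cast<std::size_t>(i) * k + t] = (o >= 0 && o < out) ? o : -1;
        }
    }
    return table;
}

// Hypothetical scalar transposed convolution (scatter form).
// Assumed layouts: in is [h][w][cin], wt is [k][k][cin][cout], result is [oh][ow][cout].
std::vector<float> conv2d_transpose_sketch(const std::vector<float>& in, int h, int w, int cin,
                                           const std::vector<float>& wt, int k, int cout,
                                           int stride, int pad, int out_pad) {
    const int oh = (h - 1) * stride - 2 * pad + k + out_pad;
    const int ow = (w - 1) * stride - 2 * pad + k + out_pad;
    // Indirect addressing: look up output rows/columns instead of recomputing bounds.
    const std::vector<int> rows = build_index_table(h, k, stride, pad, oh);
    const std::vector<int> cols = build_index_table(w, k, stride, pad, ow);
    std::vector<float> out(static_cast<std::size_t>(oh) * ow * cout, 0.0f);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            const float* src = &in[(static_cast<std::size_t>(y) * w + x) * cin];
            for (int ky = 0; ky < k; ++ky) {
                const int oy = rows[static_cast<std::size_t>(y) * k + ky];
                if (oy < 0) continue;
                for (int kx = 0; kx < k; ++kx) {
                    const int ox = cols[static_cast<std::size_t>(x) * k + kx];
                    if (ox < 0) continue;
                    float* dst = &out[(static_cast<std::size_t>(oy) * ow + ox) * cout];
                    for (int ci = 0; ci < cin; ++ci) {
                        const float v = src[ci];
                        const float* wp =
                            &wt[((static_cast<std::size_t>(ky) * k + kx) * cin + ci) * cout];
                        // Contiguous access: dst and wp are both walked linearly.
                        for (int co = 0; co < cout; ++co) dst[co] += v * wp[co];
                    }
                }
            }
        }
    }
    return out;
}
```

With this layout the innermost loop is also a natural candidate for vectorization, which is one plausible reason a SIMD variant can be derived from the scalar function.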
The model files used for speed testing are:
- conv2dt_64_256x256x64_k3_3_s2_2_p1_1_op1_1.onnx
- conv2dt_256_128x128x256_k3_3_s2_2_p1_1_op1_1.onnx
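
As a quick sanity check on the expected output sizes, the sketch below applies the standard transposed-convolution size formula (no dilation). It assumes the file names encode a 256x256 or 128x128 spatial input, a 3x3 kernel (`k3_3`), stride 2 (`s2_2`), padding 1 (`p1_1`), and output padding 1 (`op1_1`); that reading of the names is a guess, and `transposed_out_dim` is a hypothetical helper, not a function from this MR.

```cpp
// Hypothetical helper: output length along one axis of a transposed convolution
// without dilation: (in - 1) * stride - 2 * pad + kernel + output_padding.
constexpr int transposed_out_dim(int in, int kernel, int stride, int pad, int out_pad) {
    return (in - 1) * stride - 2 * pad + kernel + out_pad;
}

// Under the assumed parameters, each test model doubles the spatial size.
static_assert(transposed_out_dim(256, 3, 2, 1, 1) == 512, "256 -> 512");
static_assert(transposed_out_dim(128, 3, 2, 1, 1) == 256, "128 -> 256");
```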
Please check!