conv2d_ixj_s_peel calculates 'too much' of the result and conv2d_ixj_s_core_dispatch 'too little'
(See text file in next comment for better text formatting.)
With a 1x3 convolution (half_size_h=0, half_size_w=1) and 'same' padding (pad=(0,1)), conv2d_ixj_s_peel appears to calculate not only the left and right borders, which is expected, but also the top border (though not the bottom one). After running conv2d_ixj_s_peel with kernel [0 1 2] and bias=[0] on the input tensor
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31
32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55
56 57 58 59 60 61 62 63
we get the result
2 5 8 11 14 17 20 7
26 0 0 0 0 0 0 15
50 0 0 0 0 0 0 23
74 0 0 0 0 0 0 31
98 0 0 0 0 0 0 39
122 0 0 0 0 0 0 47
146 0 0 0 0 0 0 55
170 0 0 0 0 0 0 63
whereas the expected result would be
2 0 0 0 0 0 0 7
26 0 0 0 0 0 0 15
50 0 0 0 0 0 0 23
74 0 0 0 0 0 0 31
98 0 0 0 0 0 0 39
122 0 0 0 0 0 0 47
146 0 0 0 0 0 0 55
170 0 0 0 0 0 0 63
After additionally running conv2d_ixj_s_core_dispatch, we get the correct full result
2 5 8 11 14 17 20 7
26 29 32 35 38 41 44 15
50 53 56 59 62 65 68 23
74 77 80 83 86 89 92 31
98 101 104 107 110 113 116 39
122 125 128 131 134 137 140 47
146 149 152 155 158 161 164 55
170 173 176 179 182 185 188 63
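For reference, the full matrix above can be reproduced independently of the library with a small pure-Python sketch. It assumes the operation is a zero-padded cross-correlation (no kernel flip), which is what the numbers in this report are consistent with; the function name conv2d_same_1x3 is made up for this illustration and is not part of the library.

```python
def conv2d_same_1x3(image, kernel, bias=0):
    """Zero-padded 1x3 cross-correlation along each row ('same' padding, pad=(0,1)).

    Hypothetical reference helper, not the library's implementation.
    """
    height, width = len(image), len(image[0])
    out = [[bias] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            acc = bias
            for k, weight in enumerate(kernel):
                src = c + k - 1  # tap k reads column c-1, c or c+1
                if 0 <= src < width:  # taps outside the row read zero padding
                    acc += weight * image[r][src]
            out[r][c] = acc
    return out


# 8x8 input holding 0..63 in row-major order, as in the report.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
full = conv2d_same_1x3(image, [0, 1, 2])
for row in full:
    print(" ".join(str(v) for v in row))
```

The first printed row is 2 5 8 11 14 17 20 7 and the last is 170 173 176 179 182 185 188 63, matching the combined peel + core output.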
It does not look like the top row is calculated twice, because just running conv2d_ixj_s_core_dispatch and not conv2d_ixj_s_peel gives
0 0 0 0 0 0 0 0
0 29 32 35 38 41 44 0
0 53 56 59 62 65 68 0
0 77 80 83 86 89 92 0
0 101 104 107 110 113 116 0
0 125 128 131 134 137 140 0
0 149 152 155 158 161 164 0
0 173 176 179 182 185 188 0
which means that conv2d_ixj_s_core_dispatch calculates 'too little': it skips the middle part of the top row, which conv2d_ixj_s_peel computes instead.
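To make the expected split explicit, here is a small sketch of how I would expect the output cells to be divided between the two functions. It assumes a cell belongs to the border (peel) exactly when its kernel window overlaps the padding, and to the interior (core) otherwise. This is my reading of the intended behaviour, not something confirmed from the library source, and split_regions is a hypothetical helper.

```python
def split_regions(height, width, half_size_h, half_size_w):
    """Return (border, interior) sets of (row, col) output cells.

    Assumption: a cell is 'border' iff its kernel window reaches into the padding.
    """
    border, interior = set(), set()
    for r in range(height):
        for c in range(width):
            touches_padding = (
                r < half_size_h or r >= height - half_size_h
                or c < half_size_w or c >= width - half_size_w
            )
            (border if touches_padding else interior).add((r, c))
    return border, interior


border, interior = split_regions(8, 8, half_size_h=0, half_size_w=1)

# Cells (0,1)..(0,6): the part of the top row that peel computes in the report.
top_row_middle = {(0, c) for c in range(1, 7)}
print(top_row_middle <= interior)                  # True
print(sorted(c for r, c in border if r == 0))      # [0, 7]
```

Under this assumption, with half_size_h=0 there is no top or bottom border at all, so the middle of the top row should be handled by conv2d_ixj_s_core_dispatch and only columns 0 and 7 by conv2d_ixj_s_peel.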