JVET-O0304: Reduction of number of multiplications in BDOF
1 unresolved thread
Edited by Karsten Suehring
Activity
added 17 commits
- 9821a5ee...ba80d1b1 - 16 commits from branch jvet:master
- 67c03eeb - Merge branch 'master' into 'O0304'
added 5 commits
- 67c03eeb...829fd694 - 4 commits from branch jvet:master
- 3970c5c8 - Merge branch 'master' into 'O0304'
added 20 commits
- 3970c5c8...29fb1a22 - 19 commits from branch jvet:master
- 3488e48e - Merge branch 'master' into 'O0304'
- Resolved by Xiang Li
added 37 commits
- 3488e48e...bbd3400f - 36 commits from branch jvet:master
- e6dd775b - Merge branch 'master' into 'O0304'
added 2 commits
- e7372920 - JVET_O0304_SIMPLIFIED_BDOF: Reduction of number of multiplications in BDOF
- a54fe3ff - Merge branch 'O0304' of https://vcgit.hhi.fraunhofer.de/ykato/VVCSoftware_VTM into O0304
- Resolved by Frank Bossen
__m128i shiftSrcY0Tmp = _mm_srai_epi16(_mm_loadu_si128((__m128i*)(srcY0Tmp)), shift4);
__m128i shiftSrcY1Tmp = _mm_srai_epi16(_mm_loadu_si128((__m128i*)(srcY1Tmp)), shift4);
__m128i loadGradX0 = _mm_loadu_si128((__m128i*)(gradX0));
__m128i loadGradX1 = _mm_loadu_si128((__m128i*)(gradX1));
__m128i loadGradY0 = _mm_loadu_si128((__m128i*)(gradY0));
__m128i loadGradY1 = _mm_loadu_si128((__m128i*)(gradY1));
__m128i subTemp1 = _mm_sub_epi16(shiftSrcY1Tmp, shiftSrcY0Tmp);
__m128i packTempX = _mm_srai_epi16(_mm_add_epi16(loadGradX0, loadGradX1), shift5);
__m128i packTempY = _mm_srai_epi16(_mm_add_epi16(loadGradY0, loadGradY1), shift5);
__m128i gX = _mm_abs_epi16(packTempX);
__m128i gY = _mm_abs_epi16(packTempY);
__m128i maskXlt = _mm_cmplt_epi16(packTempX, zero);
__m128i maskXgt = _mm_cmpgt_epi16(packTempX, zero);
__m128i maskYlt = _mm_cmplt_epi16(packTempY, zero);
__m128i maskYgt = _mm_cmpgt_epi16(packTempY, zero);
__m128i dIX = _mm_or_si128(_mm_and_si128(maskXgt, subTemp1), _mm_and_si128(maskXlt, _mm_sub_epi16(zero, subTemp1)));
@ykato Did you see my comment from 2 days ago?
added 198 commits
- 12e8c6f5...caa2e65c - 197 commits from branch jvet:master
- fc40c69b - Merge branch 'master' into 'O0304'
mentioned in commit c4ee7791
Sorry for the late response. Your suggestion is appropriate. I have confirmed that the following implementation produces the same results as the previous one. It may be too late for this merge request, but I will submit it as a new one.
__m128i dIX = _mm_sign_epi16(subTemp1, packTempX );
__m128i dIY = _mm_sign_epi16(subTemp1, packTempY );
__m128i signGY_GX = _mm_sign_epi16(packTempX, packTempY );
Edited by Kato Yusuke
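The equivalence claimed above can be checked lane by lane with a scalar model. The following is only a sketch under stated assumptions: `negWrap`, `signByMask`, `signLane`, `lanesAgree` and `spotCheck` are hypothetical helper names that are not part of VTM, and the code models one 16-bit lane in plain C++ rather than calling the SSE intrinsics. `signByMask` mirrors the compare/and/or sequence from the reviewed snippet, and `signLane` mirrors the documented behaviour of `_mm_sign_epi16(a, b)`: negate `a` where `b` is negative, zero it where `b` is zero, and pass it through otherwise.

```cpp
#include <cassert>
#include <cstdint>

// Wrapping 16-bit negation, modelling one lane of _mm_sub_epi16(zero, a).
// (Hypothetical helper; assumes the usual two's-complement conversion.)
inline int16_t negWrap(int16_t a)
{
  return static_cast<int16_t>(static_cast<uint16_t>(0u - static_cast<uint16_t>(a)));
}

// One lane of the mask-based form:
//   or(and(maskGt, a), and(maskLt, 0 - a))
inline int16_t signByMask(int16_t a, int16_t b)
{
  const uint16_t maskGt = (b > 0) ? 0xFFFFu : 0u;  // lane of _mm_cmpgt_epi16(b, zero)
  const uint16_t maskLt = (b < 0) ? 0xFFFFu : 0u;  // lane of _mm_cmplt_epi16(b, zero)
  const uint16_t ua = static_cast<uint16_t>(a);
  const uint16_t un = static_cast<uint16_t>(negWrap(a));
  return static_cast<int16_t>(static_cast<uint16_t>((maskGt & ua) | (maskLt & un)));
}

// One lane of _mm_sign_epi16(a, b).
inline int16_t signLane(int16_t a, int16_t b)
{
  if (b < 0)  return negWrap(a);
  if (b == 0) return 0;
  return a;
}

// Compares both forms for every 16-bit value of a and a set of b values
// covering each sign case, including the extremes.
inline bool lanesAgree()
{
  const int16_t signs[] = { INT16_MIN, -1, 0, 1, INT16_MAX };
  for (int a = INT16_MIN; a <= INT16_MAX; ++a)
  {
    for (int16_t b : signs)
    {
      if (signByMask(static_cast<int16_t>(a), b) != signLane(static_cast<int16_t>(a), b))
      {
        return false;
      }
    }
  }
  return true;
}

// A few hand-picked cases.
inline void spotCheck()
{
  assert(signLane(5, -3) == -5);
  assert(signLane(5, 0) == 0);
  assert(signLane(-7, 2) == -7);
}
```

Both forms depend only on the sign of the second operand, which is why the comparison can sweep all values of `a` against a handful of representative `b` values. The two forms also agree for `a = -32768`, where negation wraps back to `-32768` in either version.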