JVET-O0304: Reduction of number of multiplications in BDOF

Merged Kato Yusuke requested to merge ykato/VVCSoftware_VTM:O0304 into master
1 unresolved thread

Edited by Karsten Suehring

Activity

  • Kato Yusuke added 37 commits

  • Kato Yusuke added 2 commits

  • Xiang Li
  • Kato Yusuke added 1 commit

    • 12e8c6f5 - Update BufferX86.h, Modify variable names.

  • Author Contributor

    Thanks for pointing that out. I have modified it accordingly.

  • Frank Bossen resolved all discussions

  • __m128i shiftSrcY0Tmp = _mm_srai_epi16(_mm_loadu_si128((__m128i*)(srcY0Tmp)), shift4);
    __m128i shiftSrcY1Tmp = _mm_srai_epi16(_mm_loadu_si128((__m128i*)(srcY1Tmp)), shift4);
    __m128i loadGradX0 = _mm_loadu_si128((__m128i*)(gradX0));
    __m128i loadGradX1 = _mm_loadu_si128((__m128i*)(gradX1));
    __m128i loadGradY0 = _mm_loadu_si128((__m128i*)(gradY0));
    __m128i loadGradY1 = _mm_loadu_si128((__m128i*)(gradY1));
    __m128i subTemp1 = _mm_sub_epi16(shiftSrcY1Tmp, shiftSrcY0Tmp);
    __m128i packTempX = _mm_srai_epi16(_mm_add_epi16(loadGradX0, loadGradX1), shift5);
    __m128i packTempY = _mm_srai_epi16(_mm_add_epi16(loadGradY0, loadGradY1), shift5);
    __m128i gX = _mm_abs_epi16(packTempX);
    __m128i gY = _mm_abs_epi16(packTempY);
    __m128i maskXlt = _mm_cmplt_epi16(packTempX, zero);
    __m128i maskXgt = _mm_cmpgt_epi16(packTempX, zero);
    __m128i maskYlt = _mm_cmplt_epi16(packTempY, zero);
    __m128i maskYgt = _mm_cmpgt_epi16(packTempY, zero);
    __m128i dIX = _mm_or_si128(_mm_and_si128(maskXgt, subTemp1), _mm_and_si128(maskXlt, _mm_sub_epi16(zero, subTemp1)));
  • @ykato Did you see my comment from 2 days ago?

  • Frank Bossen added 198 commits

  • Frank Bossen added 1 commit

    • aa18b741 - Update source/Lib/CommonLib/TypeDef.h

  • Frank Bossen added 1 commit

    • 3530f663 - Update source/Lib/CommonLib/x86/BufferX86.h

  • merged

  • Frank Bossen mentioned in commit c4ee7791

  • Author Contributor

    Sorry for the late response. Your suggestion is appropriate. I have confirmed that the following implementation matches the previous one. It may be too late for this merge request, so I will submit it as a new one.
    __m128i dIX = _mm_sign_epi16(subTemp1, packTempX );
    __m128i dIY = _mm_sign_epi16(subTemp1, packTempY );
    __m128i signGY_GX = _mm_sign_epi16(packTempX, packTempY );

    Edited by Kato Yusuke
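
    For readers who want to see why the `_mm_sign_epi16` form can match the earlier mask-based `dIX`, here is a minimal scalar sketch of a single 16-bit lane. It is not code from this merge request: the helper names `signByMasks`, `signDirect` and `lanesAgree` are hypothetical, and the lane behaviour is modelled from the documented semantics of the intrinsics rather than by running them. It assumes the usual two's-complement wraparound when converting back to `int16_t`.

    ```cpp
    #include <cstdint>

    // Hypothetical scalar model of one lane of the mask-based form:
    //   (maskGt & a) | (maskLt & (0 - a))
    inline int16_t signByMasks(int16_t a, int16_t b)
    {
      const uint16_t ua     = static_cast<uint16_t>(a);
      const uint16_t negA   = static_cast<uint16_t>(0u - ua);  // wraps like _mm_sub_epi16(zero, a)
      const uint16_t maskGt = (b > 0) ? 0xFFFFu : 0u;          // models _mm_cmpgt_epi16(b, zero)
      const uint16_t maskLt = (b < 0) ? 0xFFFFu : 0u;          // models _mm_cmplt_epi16(b, zero)
      return static_cast<int16_t>(static_cast<uint16_t>((maskGt & ua) | (maskLt & negA)));
    }

    // Hypothetical scalar model of one lane of _mm_sign_epi16(a, b):
    // a if b > 0, 0 if b == 0, and the wrapped negation of a if b < 0.
    inline int16_t signDirect(int16_t a, int16_t b)
    {
      if (b == 0) return 0;
      if (b > 0)  return a;
      return static_cast<int16_t>(static_cast<uint16_t>(0u - static_cast<uint16_t>(a)));
    }

    // Compare both models for every value of a. Only the sign of b matters,
    // so a few representative values of b cover all three cases.
    inline bool lanesAgree()
    {
      const int16_t signs[] = { INT16_MIN, -1, 0, 1, INT16_MAX };
      for (int a = INT16_MIN; a <= INT16_MAX; ++a)
      {
        for (int16_t b : signs)
        {
          if (signByMasks(static_cast<int16_t>(a), b) != signDirect(static_cast<int16_t>(a), b))
          {
            return false;
          }
        }
      }
      return true;
    }
    ```

    Calling `lanesAgree()` checks the two models against each other, including the `a == -32768` lane, where the negation wraps back to `-32768` in both forms. Note that `_mm_sign_epi16` requires SSSE3, as does the `_mm_abs_epi16` already used in the excerpt above.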