Fix the CTU width in the xGetSSE_NxN_SIMD function for sizes 128/256.
The bug occurs when LMCS is disabled in the current ECM for Class A. When the CTU size equals 256 and SSE distortion is applied, the wrong width (CTU_width=128) is used for the 256-wide CTU. Fix by adding a new xGetSSE_NxN_SIMD function for sizes 128/256, which fetches the correct CTU width.
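
A minimal scalar sketch of the failure mode, not the actual ECM code: it assumes the old path behaves like a kernel whose width is fixed at compile time, while the fix reads the width at run time. The names sseFixedWidth and sseRuntimeWidth are hypothetical and exist only for illustration; the real function uses SIMD intrinsics.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a width-templated SSE kernel.
// WIDTH is fixed at compile time, so a 256-wide block passed to the
// WIDTH=128 instantiation only has its left 128 columns measured.
template<int WIDTH>
uint64_t sseFixedWidth(const int16_t* org, const int16_t* cur,
                       int rows, int strideOrg, int strideCur)
{
  uint64_t sum = 0;
  for (int y = 0; y < rows; y++)
  {
    for (int x = 0; x < WIDTH; x++)
    {
      const int64_t diff = int64_t(org[x]) - int64_t(cur[x]);
      sum += uint64_t(diff * diff);
    }
    org += strideOrg;
    cur += strideCur;
  }
  return sum;
}

// Hypothetical stand-in for the fixed variant: the width is taken from
// the caller (128 or 256), so every column of the block is covered.
uint64_t sseRuntimeWidth(const int16_t* org, const int16_t* cur,
                         int cols, int rows, int strideOrg, int strideCur)
{
  uint64_t sum = 0;
  for (int y = 0; y < rows; y++)
  {
    for (int x = 0; x < cols; x++)
    {
      const int64_t diff = int64_t(org[x]) - int64_t(cur[x]);
      sum += uint64_t(diff * diff);
    }
    org += strideOrg;
    cur += strideCur;
  }
  return sum;
}
```

For a 256x256 block in which every sample differs by 1, sseFixedWidth<128> reports half of the distortion that sseRuntimeWidth reports with cols=256, which is the kind of underestimate described above.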