One problem with WPP parallelization is the long start lag for CTU rows near the bottom of the picture, because the (k+1)-th thread can start only after the k-th thread has completed two CTUs of its row:
In the figure above, thread 8 starts only after the first thread has completed half of the CTUs in its row.
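The start lag described above can be estimated with a small sketch. It assumes that every CTU takes the same time to process (one "CTU unit") and that one thread is available per CTU row; the function name `wpp_start_lags` and its parameters are hypothetical, chosen only for illustration.

```python
def wpp_start_lags(num_rows, lag_ctus=2):
    """Return the start time, in CTU units, of each CTU row under plain WPP.

    Assumption: row k+1 may start once row k has completed `lag_ctus` CTUs,
    and every CTU costs exactly one time unit.
    """
    return [lag_ctus * row for row in range(num_rows)]


lags = wpp_start_lags(num_rows=8)
# Thread 8 handles the row with index 7, so it waits 2 * 7 = 14 CTU units.
print(lags[7])  # 14
```

Under these assumptions the lag grows linearly with the row index, so with 28 CTUs per row thread 8 would start exactly when the first thread is halfway through its row.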
Splitting the picture into slices, however, reduces these start lags, so the available parallelism can be exploited better:
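A rough sketch of why slicing helps: if each slice starts at the beginning of a CTU row and its first row does not wait for the row above (slice boundaries break the dependency), the wavefront restarts in every slice. The model below is simplified; it assumes uniform CTU time, one thread per row, and equal-sized slices, and the name `sliced_wpp_start_lags` is hypothetical.

```python
def sliced_wpp_start_lags(num_rows, rows_per_slice, lag_ctus=2):
    """Return per-row start times (in CTU units) when WPP restarts in each slice.

    Assumption: the first row of every slice can start at time 0, and inside
    a slice each row starts `lag_ctus` CTUs after the row above it.
    """
    return [lag_ctus * (row % rows_per_slice) for row in range(num_rows)]


plain = sliced_wpp_start_lags(num_rows=8, rows_per_slice=8)   # one slice
sliced = sliced_wpp_start_lags(num_rows=8, rows_per_slice=4)  # two slices
# Worst-case start lag drops from 2 * 7 = 14 to 2 * 3 = 6 CTU units.
print(max(plain), max(sliced))  # 14 6
```

The sketch ignores the compression-efficiency cost of slices, which is the trade-off for the shorter start lags.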
23+ years of programming and theoretical experience in computer science fields such as video compression, media streaming, and artificial intelligence (co-author of several papers and patents).