In the figure above, thread 8 starts only after the first thread has completed half of its CTUs.
However, slicing reduces the start lag of each thread, so parallelism can be better exploited:
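
To make the effect concrete, here is a minimal sketch of the start-lag arithmetic. It rests on several assumptions that are not taken from the figures: every CTU takes one time unit, a CTU row may start once the row above it has finished two CTUs (the usual wavefront offset), every slice begins at the start of a CTU row and does not wait on the slice above it, and there are enough threads for every row. The function name `wpp_start_lags` and its parameters are made up for illustration.

```python
def wpp_start_lags(num_rows, rows_per_slice=None, lag_ctus=2):
    """Return the start lag, in CTU time units, of each CTU row.

    Assumptions (illustrative only): uniform CTU processing time,
    one thread per row, and slices that start on row boundaries and
    are processed independently of each other.
    """
    if rows_per_slice is None:
        # No slicing: the whole frame behaves like a single slice.
        rows_per_slice = num_rows
    # Within a slice, each row waits lag_ctus CTUs on the row above it,
    # so the lag grows with the row's position inside its slice.
    return [lag_ctus * (row % rows_per_slice) for row in range(num_rows)]


no_slices = wpp_start_lags(8)
two_rows_per_slice = wpp_start_lags(8, rows_per_slice=2)

print(no_slices)           # [0, 2, 4, 6, 8, 10, 12, 14]
print(two_rows_per_slice)  # [0, 2, 0, 2, 0, 2, 0, 2]
```

Under these assumptions the eighth thread waits 14 CTU time units without slices, but only 2 when each slice spans two rows. The sketch ignores the coding-efficiency cost of slices, which is a separate trade-off.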

23+ years of programming and theoretical experience in computer science fields such as video compression, media streaming and artificial intelligence (co-author of several papers and patents).
The author is looking for a new job; see my resume.