Let’s assume the first audio and video frames are rendered (or created) simultaneously.

The audio encoder then takes less time than the video encoder, since video encoding is much more complex than audio encoding. Consequently, the encoded audio frame is expected to be ready for transmission earlier than the corresponding video frame.

Moreover, an audio frame is about 1 KB, so it is transmitted quickly, whereas an ordinary video frame is 20-40 KB, and transmitting a chunk of that size takes a while (say, 16-30 ms).
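
As a rough check of these figures, the sketch below computes how long one frame takes to put on the wire. The 10 Mbit/s link speed and the reading of 1 KB as 1000 bytes are assumptions chosen for illustration, not values from a measured setup; the function name is hypothetical, and propagation delay and packet overhead are ignored.

```python
def transmission_ms(frame_bytes: int, link_mbps: float) -> float:
    """Time in milliseconds to serialize one frame onto a link.

    Considers only the frame payload; propagation delay, queuing and
    packet headers are deliberately left out of this sketch.
    """
    bits = frame_bytes * 8
    return bits / (link_mbps * 1_000_000) * 1000


# Assumed link speed (not from the article): 10 Mbit/s.
LINK_MBPS = 10.0

audio_ms = transmission_ms(1_000, LINK_MBPS)         # ~1 KB audio frame
video_small_ms = transmission_ms(20_000, LINK_MBPS)  # 20 KB video frame
video_large_ms = transmission_ms(40_000, LINK_MBPS)  # 40 KB video frame

print(f"audio: {audio_ms:.1f} ms")
print(f"video: {video_small_ms:.1f}-{video_large_ms:.1f} ms")
```

Under these assumptions the audio frame needs under a millisecond, while the video frame needs roughly 16 to 32 ms, which is in line with the range quoted above.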

In addition, decoding audio tends to take less time than decoding video.

 

Thus, at the start, audio is output earlier than video even though both frames were rendered (created) simultaneously, so a lag between audio and video may occur, with audio ahead of video. Although this initial lag (20-30 ms) is not expected to be noticeable as a lip-sync error, it’s recommended to delay the first audio frame so that audio and video appear simultaneously on the Client’s side, because the lag might grow over time until the lip-sync error becomes noticeable.
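
One way to estimate how long to hold back the first audio frame is to take the difference between the end-to-end latency of the video path and that of the audio path. The following is a minimal sketch of that idea; the class, the function and all per-stage timings are hypothetical illustration values, not measurements.

```python
from dataclasses import dataclass


@dataclass
class PipelineLatency:
    """Per-stage latency of one media path, in milliseconds."""
    encode_ms: float
    transmit_ms: float
    decode_ms: float

    def total_ms(self) -> float:
        return self.encode_ms + self.transmit_ms + self.decode_ms


def initial_audio_delay_ms(audio: PipelineLatency, video: PipelineLatency) -> float:
    """Delay to apply to the first audio frame so it is output together
    with the first video frame. Never negative: if video happens to be
    faster, audio is not delayed at all."""
    return max(0.0, video.total_ms() - audio.total_ms())


# Hypothetical timings, chosen only to illustrate the calculation.
audio = PipelineLatency(encode_ms=2.0, transmit_ms=1.0, decode_ms=1.0)
video = PipelineLatency(encode_ms=8.0, transmit_ms=16.0, decode_ms=4.0)

delay = initial_audio_delay_ms(audio, video)
print(f"delay first audio frame by {delay:.0f} ms")
```

With these example numbers the audio path takes 4 ms and the video path 28 ms, so the first audio frame would be held for 24 ms. In a real system the stage timings would have to be measured rather than assumed.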
