Content
How to Align Cloud Server’s timeline to Client’s?
Clock Drift (Skew) Estimation and Compensation
What’s Cloud Gaming?
The game is not executed locally on your computer, but on a distant server with dedicated hardware in a large data center. In other words, Cloud Gaming is the ability to run games in the cloud as if they were installed locally.
Motivation: Cloud Gaming emancipates players from the need to constantly upgrade their computers to play more advanced games. Cloud gaming aims to revolutionize gaming; the players are no longer restricted to one device and can play games anywhere, anytime and using any device.
This figure is taken from the paper “Quality of Experience (QoE) in Cloud Gaming Architectures: A Review”, Asif Ali Lahgari et al.
Prelude to Clock Alignment Method
Clock or timeline alignment (I deliberately do not use “synchronization”) between Sender (Game Server) and Receiver (Client) is necessary, because Client presents video/audio frames according to PTS (presentation timestamps).
Indeed, Sender and Receiver commonly start their clocks at different moments; moreover, the clock rates in Sender and Receiver are slightly different (this is called clock skew). The PTS of a frame is determined at Sender’s side by sampling Sender’s clock, usually after the frame is rendered.
If one determines PTS values at the Muxer stage (i.e. after encoding), then audio tends to be ahead of video and the lip-sync error might be noticeable (because audio encoding takes 1-2 ms while video encoding takes 5 ms on NVIDIA Tesla T4, and with a SW encoder the encoding times are greater than 5 ms). My recommendation is to sample the clock and signal PTS (for both audio and video) after rendering.
To align Client’s clock to Server’s, the following operations must run ceaselessly in Client:
1) Estimation of the relative offset between the two clocks (because Game Server and Client start working at different times and therefore have different clock values).
2) Clock drift (skew) compensation (because the clock frequencies at Game Server and at Client might differ slightly, and the difference accumulates to a significant discrepancy over several hours).
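To get a feeling for item 2, here is a small worked calculation of how a frequency mismatch accumulates. The 20 ppm figure is only an assumed, illustrative value (not a measurement from this post), and the function name is hypothetical.

```python
def accumulated_drift_ms(ppm: float, hours: float) -> float:
    """Drift (in milliseconds) accumulated between two clocks whose
    frequencies differ by `ppm` parts per million, after `hours` hours."""
    seconds = hours * 3600.0
    # ppm * 1e-6 * seconds gives the drift in seconds; * 1e3 converts to ms.
    return ppm * seconds / 1000.0


if __name__ == "__main__":
    # Assumed mismatch of 20 ppm over a 3-hour session.
    print(accumulated_drift_ms(20.0, 3.0))
```

Under this assumption the clocks diverge by roughly a fifth of a second after three hours, which is several frame periods at 60 fps.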
It’s recommended that Client present video/audio based on PTS and not on frame availability. Presenting frames on availability means that Client presents frames immediately after decoding, totally ignoring PTS. I consider this approach (presenting frames on availability) erroneous in Cloud Gaming. Why?
The frame rate is found to be variable (e.g. sometimes the frame rate is less than 60 fps, since Game Engine is overloaded or a scene is too complex). Moreover, in the worst-case scenario, ignoring PTS values in the display schedule can cause lip-sync issues. In such a case Client must discard an audio frame (consequently a ‘click’ audio noise may be noticeable and misinterpreted by gamers).
How to Align Cloud Server’s timeline to Client’s?
For Cloud Gaming I suggest streaming media via RTP in order to exploit RTP timestamps for the alignment of clocks, by ceaselessly updating the offset between the Cloud Server and Client clocks. (By the way, due to network jitter we can only estimate the clock offset.)
Due to network jitter, the difference between the RTP timestamp (set by Cloud Server) and Client’s local clock sampled when the RTP packet has completely arrived does not equal the previously measured clock offset between Sender and Client. Therefore, we need to collect some statistics to minimize the impact of network jitter.
The following algorithm outlines how to estimate the offset between Cloud Server and Client clocks by inspection of RTP timestamps:
- Ts(n) denotes Cloud Server’s timestamp of the n-th RTP packet (i.e. the clock sample taken before sending this packet).
- Tr(n) denotes Client’s clock at the moment the n-th RTP packet has completely arrived.
Client collects the absolute differences between each RTP packet’s arrival time (according to its own clock) and the packet’s RTP timestamp for a batch of N packets (N is usually 5 or 10):
{ d0 = |Tr(n) − Ts(n)|, d1 = |Tr(n + 1) − Ts(n + 1)| …, dN−1 = |Tr(n + N − 1) − Ts(n + N − 1)| } (1)
At the end of the collection, each difference dk is an estimate of the clock offset between Cloud Server and Client, but corrupted by network jitter.
To obtain a good estimate of the clock offset and to minimize the impact of network jitter, Client finds the index k* of the smallest difference:
k* = argmin(d0, d1, … , dN−1) (2)
The clock offset is estimated as offset = Tr(n + k*) − Ts(n + k*), i.e. the difference for the packet that was delayed the least. Notice that PTS values should also be adjusted by the offset to be coherent with Client’s clock.
The above method (formulas (1) and (2)) estimates the clock offset plus the nominal network transit delay and is correct for short periods, since clock drift is not taken into account. A minor difference between the frequencies of the Server and Client clocks can cause a significant drift over long periods. If your Cloud Gaming service restricts game sessions to one hour, you can stop reading this post: the clock drift is negligible.
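Formulas (1) and (2) can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, both timestamp lists are assumed to be already converted to seconds, and Client’s clock is assumed to read ahead of Server’s, so that the smallest absolute difference belongs to the least-delayed packet.

```python
from typing import Sequence, Tuple


def estimate_offset(ts: Sequence[float], tr: Sequence[float]) -> Tuple[int, float]:
    """Estimate the clock offset from one batch of N RTP packets.

    ts[k]: Server's RTP timestamp of the k-th packet in the batch (seconds).
    tr[k]: Client's clock when the k-th packet completely arrived (seconds).
    Returns (k_star, offset).
    """
    if len(ts) != len(tr) or len(ts) == 0:
        raise ValueError("ts and tr must be non-empty and of equal length")
    # Formula (1): absolute differences, each one a jitter-corrupted offset estimate.
    diffs = [abs(r - s) for s, r in zip(ts, tr)]
    # Formula (2): index of the smallest difference.
    k_star = min(range(len(diffs)), key=diffs.__getitem__)
    return k_star, tr[k_star] - ts[k_star]


if __name__ == "__main__":
    # Made-up batch of N = 5 packets sent every 20 ms.
    ts = [0.000, 0.020, 0.040, 0.060, 0.080]
    tr = [5.012, 5.031, 5.050, 5.075, 5.093]
    print(estimate_offset(ts, tr))
```

In this made-up batch the third packet (index 2) has the smallest difference, so the offset is taken from it.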
Note: due to network jitter, the adjusted time of the second RTP packet is 7.2 s although its arrival time is 7.4 s.
Each frame carries a PTS (the time at which it is to be presented). Client is expected to present frames according to their adjusted PTS to avoid stuttering. Therefore, the relative clock offset should be added to each received PTS for the adjustment.
Each frame should be presented according to its updated PTS. The clock offset should be updated periodically to account for potential long-term variations in network delay.
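A minimal sketch of PTS-based presentation could look as follows. The function name and the late_tolerance parameter are assumptions made for illustration; all times are in seconds on Client’s clock.

```python
from typing import Tuple


def schedule_frame(pts: float, offset: float, now: float,
                   late_tolerance: float = 0.0) -> Tuple[str, float]:
    """Decide what to do with a decoded frame.

    pts:            PTS as stamped by Server (seconds, Server's clock).
    offset:         estimated clock offset (Client minus Server), seconds.
    now:            current reading of Client's clock, seconds.
    late_tolerance: hypothetical slack before a late frame is dropped.
    Returns (action, wait_seconds) with action in {"wait", "present", "drop"}.
    """
    due = pts + offset            # adjusted PTS on Client's timeline
    if now < due:
        return "wait", due - now  # too early: hold the frame
    if now - due > late_tolerance:
        return "drop", 0.0        # overdue: discard the frame
    return "present", 0.0


if __name__ == "__main__":
    print(schedule_frame(pts=2.0, offset=5.0, now=6.9))
    print(schedule_frame(pts=2.0, offset=5.0, now=7.0))
    print(schedule_frame(pts=2.0, offset=5.0, now=7.5, late_tolerance=0.1))
```

The same adjusted PTS is used for audio and video, which is what keeps them in lip-sync on Client’s timeline.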
Use Case:
Upon arrival at Client, all RTP packets (except control packets) are queued. The packets are released to DeMuxer (the aim of DeMuxer is to collect a complete frame, which is sent to a decoder) according to their aligned timestamps plus a particular queuing offset. The longer the offset, the longer an RTP packet can dwell in the input queue. As mentioned above, timestamps in RTP packets are also used to estimate the network jitter. However, if RTP packets contain interleaved video-audio mpeg2ts packets, then PCRs can be used to assess the network jitter.
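The input queue described here might be sketched like this. The class and its method names are hypothetical, timestamps are assumed to be in seconds, and a real client would also have to handle RTP timestamp wrap-around and packet loss.

```python
import heapq
from typing import List, Tuple


class InputQueue:
    """Holds RTP payloads until aligned timestamp + queuing offset is reached."""

    def __init__(self, clock_offset: float, queuing_offset: float) -> None:
        self.clock_offset = clock_offset      # estimated Client-minus-Server offset
        self.queuing_offset = queuing_offset  # extra dwell time in the queue
        self._heap: List[Tuple[float, int, bytes]] = []
        self._seq = 0                         # tie-breaker for equal release times

    def push(self, rtp_ts: float, payload: bytes) -> None:
        release_at = rtp_ts + self.clock_offset + self.queuing_offset
        heapq.heappush(self._heap, (release_at, self._seq, payload))
        self._seq += 1

    def pop_ready(self, now: float) -> List[bytes]:
        """Return, in release order, all payloads whose release time has come."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap)[2])
        return ready


if __name__ == "__main__":
    q = InputQueue(clock_offset=5.0, queuing_offset=0.05)
    q.push(0.02, b"second")   # arrives out of order
    q.push(0.00, b"first")
    print(q.pop_ready(5.06))  # only the first packet is due
    print(q.pop_ready(5.10))  # now the second one is due as well
```

A priority queue keyed on the release time also reorders packets that arrived out of order, as long as they arrive before their release time.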
If the RTP payload length is 1316 bytes, then it holds exactly 7 mpeg2ts packets (7 × 188 = 1316 bytes).
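Splitting such a payload back into mpeg2ts packets is straightforward. The sketch below assumes the payload contains only whole TS packets and checks the standard 0x47 sync byte; the function name is hypothetical.

```python
from typing import List

TS_PACKET_SIZE = 188  # bytes per mpeg2ts packet
TS_SYNC_BYTE = 0x47   # first byte of every mpeg2ts packet


def split_ts_packets(payload: bytes) -> List[bytes]:
    """Split an RTP payload into its 188-byte mpeg2ts packets."""
    if len(payload) % TS_PACKET_SIZE != 0:
        raise ValueError("payload is not a whole number of TS packets")
    packets = [payload[i:i + TS_PACKET_SIZE]
               for i in range(0, len(payload), TS_PACKET_SIZE)]
    for index, packet in enumerate(packets):
        if packet[0] != TS_SYNC_BYTE:
            raise ValueError(f"TS packet {index} has no sync byte")
    return packets


if __name__ == "__main__":
    # Synthetic payload: 7 dummy TS packets (sync byte followed by zeros).
    payload = (bytes([TS_SYNC_BYTE]) + bytes(TS_PACKET_SIZE - 1)) * 7
    print(len(payload), len(split_ts_packets(payload)))
```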
Clock Drift (Skew) Estimation and Compensation
Let’s assume that Client presents video/audio frames according to adjusted PTS. Then, due to clock drift, either the input buffer gets too large (this means Server’s clock is slightly fast) or, after a while, frames are discarded by Client since they are mistakenly considered overdue.
Let the difference between Client’s clock at the arrival of the first RTP packet and the packet’s sending time be d0 = Tr(0) − Ts(0).
Then the following weighted moving average is expected to catch a long-term clock skew and to ignore short-term network jitter fluctuations:
D0 = d0
Dn = 31/32 * Dn−1 + (1 − 31/32) * dn , in the simplified form: Dn = (31*Dn−1 + dn)/32
The clock skew (between Client’s clock and Server’s):
skew = d0 − Dn , for sufficiently large ‘n’
The clock skew should be estimated over a window of ‘n’ RTP packets. If the skew exceeds a predetermined threshold, then Client should update the clock offset to compensate for it.
Notice that the threshold should be larger than the maximal network jitter, otherwise Client can mistake network jitter for clock skew. On the other hand, the threshold should be small enough to avoid frame drops (i.e. frames are mistakenly considered as having arrived too late according to Client’s clock) or an overly large playout buffer.
Note: It’s tacitly assumed that the network jitter distribution is symmetric and any systematic bias is due to clock skew.
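The moving average and the threshold test can be put together as in the following sketch. The class name and the threshold value are hypothetical; the 31/32 weight is the one used in the formula above, and timestamps are assumed to be in seconds.

```python
from typing import Optional


class SkewEstimator:
    """Tracks Dn = (31 * Dn-1 + dn) / 32 and reports skew = d0 - Dn."""

    def __init__(self, threshold: float, weight: float = 31.0 / 32.0) -> None:
        self.threshold = threshold   # must exceed the maximal network jitter
        self.weight = weight
        self.d0: Optional[float] = None
        self.avg: Optional[float] = None

    def update(self, ts: float, tr: float) -> float:
        """Feed one packet (Server timestamp, Client arrival time); return skew."""
        d = tr - ts
        if self.d0 is None or self.avg is None:
            self.d0 = d              # D0 = d0
            self.avg = d
            return 0.0
        self.avg = self.weight * self.avg + (1.0 - self.weight) * d
        return self.d0 - self.avg

    def needs_compensation(self, skew: float) -> bool:
        return abs(skew) > self.threshold


if __name__ == "__main__":
    est = SkewEstimator(threshold=0.050)  # assumed threshold of 50 ms
    skew = 0.0
    for n in range(1000):
        ts = 0.020 * n
        tr = ts + 5.0 - 0.001 * n         # synthetic drift of 1 ms per packet
        skew = est.update(ts, tr)
    print(skew, est.needs_compensation(skew))
```

Because the average reacts slowly, a single jittered packet moves Dn by only 1/32 of its deviation, while a persistent drift eventually shows up in full.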
An instantaneous clock offset update (for clock skew compensation) can cause frame drops: if Client’s clock is instantaneously updated, then some received frames can become “too late” and be discarded accordingly. Another effect of an instantaneous clock offset update is stuttering: some frames are delayed until Client’s clock equals their PTS.
23+ years’ programming and theoretical experience in the computer science fields such as video compression, media streaming and artificial intelligence (co-author of several papers and patents).