Video Compression

VideoNerd

Content

What’s Cloud Gaming?

History of Cloud Gaming

Cloud Gaming in Details

Cloud Gaming Problems

Appendix A: Gaming content has unique characteristics different from live video

Appendix B:  Temporal Masking

Appendix C: the paper “A Network Analysis on Cloud Gaming: Stadia, GeForce Now and PSNow”

Appendix D: the requirements for tolerable cloud gaming

 

 

What’s Cloud Gaming?

Cloud Gaming is the ability to run games in the cloud as if they were locally installed.

The game is not executed locally on your computer, but on a distant server with dedicated hardware in a large data center.

Motivation: Cloud Gaming frees players from the need to constantly upgrade their computers to play more advanced games. Cloud gaming aims to revolutionize gaming: players are no longer restricted to one device and can play games anywhere, anytime and on any device.

History of Cloud Gaming

The first serious work on Cloud Gaming I have found so far is the paper “A Server-based Interactive Remote Walkthrough” by Daniel Cohen-Or, Yuval Noimark and Tali Zvi, dated approximately 2000.

 

Because the images are rendered by a computer-graphics engine, it is easy to divide each image into foreground and background and to quantize the foreground more finely (i.e. with better visual quality) at the expense of background quality. E.g. human vision is highly sensitive to noise on faces, so it is worth encoding faces at higher quality to give the user a better visual experience.

Nowadays, most encoders support an ROI (region of interest) mode. E.g. the NVIDIA hardware encoder on the Tesla T4 supports ROI and allows the quantization to be tuned in these regions. However, the NVIDIA ROI mode is rectangle-based (i.e. each ROI is a rectangle), while a foreground can be a set of very complex geometric shapes, so the bounding rectangles might include a significant part of the background. Therefore rectangle-based ROI mode is not a good solution.

An MB-map based ROI (one quantization setting per macroblock) is more suitable, even though the foreground boundary rarely coincides with MB boundaries.
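The MB-map idea can be illustrated with a minimal sketch. It assumes 16x16 macroblocks, a per-pixel binary foreground mask supplied by the game engine, and hypothetical QP offsets of -4 (foreground) and +4 (background); none of these values come from a real encoder API.

```python
MB_SIZE = 16  # assumed macroblock size in pixels (H.264-style)

def qp_delta_map(mask, fg_delta=-4, bg_delta=4):
    """Build a per-macroblock QP offset map from a per-pixel foreground mask.

    mask: list of rows, each a list of 0/1 values (1 = foreground pixel).
    A macroblock is treated as foreground if it contains at least one
    foreground pixel. The delta values are illustrative assumptions.
    """
    height, width = len(mask), len(mask[0])
    mb_rows = (height + MB_SIZE - 1) // MB_SIZE
    mb_cols = (width + MB_SIZE - 1) // MB_SIZE
    deltas = [[bg_delta] * mb_cols for _ in range(mb_rows)]
    for y, row in enumerate(mask):
        for x, is_foreground in enumerate(row):
            if is_foreground:
                deltas[y // MB_SIZE][x // MB_SIZE] = fg_delta
    return deltas

# A 32x32 image with one foreground pixel in the bottom-left macroblock.
example_mask = [[0] * 32 for _ in range(32)]
example_mask[20][5] = 1
example_deltas = qp_delta_map(example_mask)
```

A negative offset means finer quantization, so the single foreground macroblock is encoded at higher quality than its three background neighbours.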

 

 

Cloud Gaming in Details

Gamer actions (e.g. pressing a key or moving the mouse) are captured by the Game Client and sent to the Cloud Server. The resulting game scenes are rendered by the Cloud Server, and the audio and video frames are streamed back to the Game Client over the Internet, usually over UDP/IP. UDP is used instead of TCP to minimize delay and maximize the responsiveness of the system (of course, at the expense of error resilience). TCP is not the best bet for Cloud Gaming because it must re-transmit every dropped packet, as well as every packet whose acknowledgement was lost or arrived late (i.e. if a packet arrives correctly but its ACK from the Receiver gets lost, or reaches the Sender after the Retransmission Time-Out interval, then the packet is re-transmitted and thus duplicated).

 

Processing in the Cloud Server: the game engine runs the game logic, renders the game scenes (images), and sends the scenes frame by frame (usually in RGBA format) to the video encoder. The video encoder in turn compresses them into a video stream, which is then streamed back to the Game Client. The video encoder is usually agnostic of the structure of the scenes.

 

 

 

Another figure, taken from the paper “Quality of Experience (QoE) in Cloud Gaming Architectures: A Review”, Asif Ali Lahgari et al.

From the above figure one can see that a game is rendered at a cloud server, encoded images and sound are sent to the end users, and commands are sent from the end users to the cloud server.

Note: there is a less popular approach to Cloud Gaming, let’s call it streaming graphics. No scene is rendered at the cloud server; only the Game Logic is executed there. The Game Logic module sends graphics commands to the Client, and the Client in turn renders the scene. This approach is not popular because the Client has to be equipped with a powerful GPU.

 

 

 

 

 

Using cloud gaming systems, users can instantly run game applications, without losing time on lengthy installations or updates. Another important advantage of cloud gaming is cross-platform support: the Client OS can be macOS, Windows, Linux or Android.

 

The ‘heavy lifting’ of game processing is done by servers in the cloud. Different game sessions run on the same cloud gaming server, sharing its GPU and HW encoder.

A network connection is influenced by a number of factors: propagation delay, queuing delay in intermediate nodes during packet transmission, network jitter, packet loss, packet re-ordering and even packet duplication. Transmission over the public Internet mainly introduces delays, jitter, and packet loss. The use of large buffers at the Sender and Receiver sides helps to cope with jitter and packet loss (by means of re-transmission), but it increases the latency, so this approach is not suitable for low-latency applications like Cloud Gaming.

Three parameters are important for the Cloud Gaming ecosystem (Server, network and Client): bandwidth, latency and packet loss rate. In addition to these, jitter (the variation of latency over time) is also relevant, since significant jitter (i.e. significant fluctuations of network and/or processing delays) can cause visible stuttering (buffer depletion at the Client’s side). Jitter can be absorbed by buffering at the Client, but this increases latency, and such large buffers are not suitable for cloud-gaming applications.

Gamers thus interact with and control games through thin-client SW. The thin client (actually the Game Client) is a lightweight process which interacts with the remote cloud server (the Thin Client paradigm requires that all the resource-intensive processing be done on the server in the cloud).

For this reason, cloud gaming allows gamers to play games on simple devices without having to install the games or to continuously upgrade computer hardware or software (e.g. DirectX versions). However, Cloud Gaming may cause some perceived latency (also known as lag in gaming); such latency is especially annoying in the First Person Shooter game genre.

To make cloud gaming seamless, low latency (= low response delay), high video quality, video and audio synchronization, and error resiliency (against packet losses) have to be achieved.

There are two traffic flows: client to server (c2s) and server to client (s2c). The first flow, c2s, is thin: it consists of control messages, i.e. keyboard, mouse and gamepad events. A special hook procedure intercepts these events, encapsulates them in packets and sends them to the game server.

client to server traffic:

Keyboard event consists of two bytes – the key number and the key status (if the key was pressed or released). 

Mouse event contains 9 bytes:

    • The type of mouse event – mouse movement or mouse button click.
    • The remaining 8 bytes represent the x and y coordinates of the mouse position at which the event was fired.
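The two event layouts above can be sketched with Python's `struct` module. The text gives only the sizes, so the byte order (network order), the one-byte type and status codes, and the signed 32-bit coordinates are assumptions made for illustration.

```python
import struct

# Assumed wire layouts (the text specifies only the total sizes):
KEY_EVENT = struct.Struct("!BB")     # key number, key status -> 2 bytes
MOUSE_EVENT = struct.Struct("!Bii")  # event type, x, y -> 1 + 4 + 4 = 9 bytes

# Hypothetical status and type codes.
KEY_RELEASED, KEY_PRESSED = 0, 1
MOUSE_MOVE, MOUSE_CLICK = 0, 1

def pack_key_event(key_number, pressed):
    """Serialize a keyboard event into its 2-byte form."""
    status = KEY_PRESSED if pressed else KEY_RELEASED
    return KEY_EVENT.pack(key_number, status)

def pack_mouse_event(event_type, x, y):
    """Serialize a mouse event into its 9-byte form."""
    return MOUSE_EVENT.pack(event_type, x, y)

key_packet = pack_key_event(65, pressed=True)
mouse_packet = pack_mouse_event(MOUSE_MOVE, 640, 360)
```

The `!` prefix selects standard sizes with no padding, which is what keeps the mouse event at exactly 9 bytes.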

Note: Client events (like mouse) can be buffered in Server, in so called “action buffer”:

 

The figure above is taken from the paper “Timely Cloud Gaming“, Roy D. Yates et al.

According to the paper “On the Quality of Service of Cloud Gaming Systems“, Kuan-Ta Chen et al., 2014, measurements of the OnLive cloud gaming service revealed that client-to-server (events) traffic ranges from 10 kbps to 70 kbps, which is negligible compared to the server-to-client traffic.

 

server to client traffic:

The second flow, s2c, is heavy: it contains the encoded video and audio streams (usually transmitted via UDP/IP to meet the ultra-low-latency requirement).

 

The end-to-end delay (or Response Delay) is composed of:

  1. Server Processing Delay: the time between the moment the server receives the control event and the moment it sends the encoded frame to the client. Server Processing Delay includes rendering time, encoding time, delay in the Muxer and delay in the Streamer.
  2. Client Playout Delay: the time needed on the client side to demux (incl. buffering), to decode and to render the decoded frames on the screen.
  3. Network Delay or Round Trip Delay: the time required for a round-trip data exchange between the client and the server. In other words, the time required to deliver a player’s command to the server and return an encoded game screen to the client. Sužnjević and Homen, in the paper “Use of Cloud Gaming in Education” (2020), stated that cloud gaming systems require a Round Trip Time (RTT) network latency of less than 70 ms.
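As a rough illustration, the three components can be added up into a latency budget. The figures below are made-up example values, not measurements; the 70 ms RTT limit is the one quoted above.

```python
from dataclasses import dataclass

RTT_LIMIT_MS = 70.0  # RTT requirement quoted from Sužnjević and Homen

@dataclass
class DelayBudget:
    server_processing_ms: float  # render + encode + mux + stream
    network_rtt_ms: float        # command upstream + frame downstream
    client_playout_ms: float     # demux + decode + display

    def end_to_end_ms(self):
        """Response delay as the sum of the three components."""
        return (self.server_processing_ms
                + self.network_rtt_ms
                + self.client_playout_ms)

    def rtt_ok(self):
        return self.network_rtt_ms < RTT_LIMIT_MS

# Hypothetical example values.
budget = DelayBudget(server_processing_ms=25.0,
                     network_rtt_ms=60.0,
                     client_playout_ms=20.0)
total_ms = budget.end_to_end_ms()  # 105.0
```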

Packet losses in s2c traffic are unavoidable. Therefore an enhanced error-resilience mechanism is required in the Game Client. In case of packet loss, the Game Client notifies the Game Server (via a TCP/IP message) and the Game Server sends an IDR frame or starts an intra-refresh to clean up the visual distortions caused by the packet loss.

Note:

  • Some info on the scenes could be useful, e.g. object motions: if the game engine provided object motions, the video encoder could skip the motion estimation operation for certain blocks in the frame, leading to faster encoding. According to the paper “A VIDEO ENCODING SPEED-UP ARCHITECTURE FOR CLOUD GAMING“, by Mehdi Semsarzadeh et al., using the motion information of objects speeds up encoding (the motion estimation process was accelerated by up to 19.85%) with a penalty in visual quality of 1dB.

 

  • There is one interesting variant of the Cloud Gaming paradigm, mentioned in the paper “NETWORK TRAFFIC ADAPTATION FOR CLOUD GAMES”, Richard Ewelle et al. The server executes the game, captures the graphics commands and sends them to the client (not encoded frames, but graphics commands). The commands are rendered on the client device, allowing the full game experience. In other words, the game’s logic is executed on the server, a graphics stream is sent to the client (instead of a video stream) and it is the responsibility of the client to render the scene.

 

Figure: Cloud Gaming Server

 

Cloud Gaming Problems

Cloud gaming providers should provide both high bandwidth (= high video quality) and low latency (= high responsiveness). If the latency is high, the average user may feel that the game is unresponsive. Generally speaking, delay and packet loss ratio impact the QoE (Quality of Experience) of cloud gaming. The QoE of cloud gaming can be evaluated in two ways: subjectively (MOS – Mean Opinion Score) and objectively.

Subjective quality assessment employs human subjects who rate the quality of the game as they perceive it. This is a time-intensive and resource-intensive task. It’s worth mentioning that the subjective score MOS can be deceiving, since a mean does not reflect the diversity (i.e. the standard deviation, STD) of user ratings.
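A small example shows why MOS alone can mislead. The two rating sets below are invented for illustration (on the usual 1 to 5 scale): they have the same MOS but very different spread.

```python
from statistics import mean, pstdev

# Invented ratings on a 1-5 scale.
ratings_consistent = [3, 3, 3, 3, 3, 3]  # everybody finds the game "fair"
ratings_polarized = [1, 5, 1, 5, 1, 5]   # half love it, half find it unplayable

mos_consistent = mean(ratings_consistent)
mos_polarized = mean(ratings_polarized)
std_consistent = pstdev(ratings_consistent)
std_polarized = pstdev(ratings_polarized)

print(f"consistent: MOS={mos_consistent}, STD={std_consistent}")
print(f"polarized:  MOS={mos_polarized}, STD={std_polarized}")
```

Both sets yield a MOS of 3, yet the standard deviation is 0 in the first case and 2 in the second, so reporting the STD (or the full distribution) next to the MOS is worthwhile.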

Alternatively, the objective evaluation uses application-based Key Performance Indicators (KPIs) to estimate the quality of perception.

 

Cloud Gaming is a case of low-latency media streaming. Low latency video/audio streaming can conceptually be thought to consist of the following steps:

  • Server: Partition the compressed video into packets
  • Server: Start delivery of these packets as soon as possible
  • Client: Begin decoding (and playback, once its time arrives) at the Client while the video is still being delivered

There are four basic problems to cope with in Cloud Gaming streaming:

  • Bandwidth – internet bandwidth is time-varying; the smaller the bandwidth, the worse the visual quality.
  • Delay jitter – fluctuations in the arrival times of packets at the Client side, which might cause stuttering.
  • Packet loss rate – because retransmission is not possible under the low-latency requirement, severe video impairments are expected.
  • E2E latency – the time between a player pressing a key or moving the mouse and the corresponding effect appearing on the display (it is recommended to keep the end-to-end delay below 150 ms).

 

Notes:

  • Not all cloud games are equally sensitive to latency. For Real-time Strategy (RTS) games, latency of up to 1 s is tolerated. However, First Person Shooter (FPS) games, where users are shooting at a moving target, tend to be more sensitive to latency, with delays of over 100 ms seen as unacceptable.

 

  • The startup delay is another important factor which affects QoE. The startup delay is the time lag between the moment the client joins the Cloud Gaming service and the moment gameplay starts.
  • Sub-frame decoding is desirable for Cloud Gaming to reduce end-to-end latency. With sub-frame decoding, decoding starts as soon as the first slice/tile of a frame has been received, without waiting for the last byte of the frame. Most decoders start decoding a frame only after the whole frame has been received.

    Here we are faced with the large-frame problem. If a frame is regular, in the sense that it has the expected size or smaller, it is completely received within 16.6 ms (the 60 fps case) and the end-to-end latency is not impacted. However, if a frame is 3x the expected size, it is completely received only after about 50 ms, so the end-to-end latency increases by about 33 ms. Therefore the maximal frame size should be limited to 2x the expected size.
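The large-frame arithmetic can be written as a one-line model. It assumes a constant-rate channel that delivers a frame of the expected size in exactly one frame interval, so an oversized frame costs proportionally more time; `extra_latency_ms` is a hypothetical helper name.

```python
def extra_latency_ms(frame_size_ratio, fps=60.0):
    """Extra end-to-end latency caused by a frame larger than expected.

    frame_size_ratio: actual frame size divided by the expected frame size.
    Assumes the channel delivers an expected-size frame in one frame interval.
    """
    frame_interval_ms = 1000.0 / fps
    return max(0.0, frame_size_ratio - 1.0) * frame_interval_ms

delay_3x = extra_latency_ms(3.0)  # two extra frame intervals at 60 fps
delay_2x = extra_latency_ms(2.0)  # one extra frame interval at 60 fps
```

Under this model, capping the frame size at 2x the expected size bounds the penalty to a single frame interval.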

 

Bandwidth

Because internet bandwidth is time-varying, the Server should be aware of the bandwidth currently available at the Client side:

If the Server transmits faster than the available client bandwidth can absorb, congestion inevitably occurs; consequently packets are lost and a severe drop in video quality is observed. If the Server transmits slower than the available bandwidth, sub-optimal video quality is produced.

Therefore it’s up to the Client to notify the Server of each change in the bandwidth. The Client should repeatedly report to the Server the rates of lost and too-late-arrived packets (a packet arriving too late is just as bad as a lost packet). If packet loss events are not sporadic, the Server has to reduce the video bitrate accordingly to match the available bandwidth.
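One possible server-side reaction to the Client's loss reports is sketched below. It is not taken from any particular cloud gaming service: the window of three reports, the 1% loss threshold, the back-off and probe factors and the bitrate limits are all assumed values.

```python
def adapt_bitrate(current_kbps, loss_reports, loss_threshold=0.01,
                  window=3, backoff=0.85, probe=1.05,
                  min_kbps=1000, max_kbps=20000):
    """Pick the next video bitrate from the Client's recent loss reports.

    loss_reports: loss ratios (lost or overdue packets / sent packets),
    oldest first. All default parameters are illustrative assumptions.
    """
    recent = loss_reports[-window:]
    if len(recent) < window:
        return current_kbps                # not enough feedback yet
    if all(loss > loss_threshold for loss in recent):
        target = current_kbps * backoff    # sustained loss: back off
    elif all(loss == 0.0 for loss in recent):
        target = current_kbps * probe      # clean window: probe upward
    else:
        target = current_kbps              # sporadic loss: hold
    return int(round(min(max(target, min_kbps), max_kbps)))

after_sustained_loss = adapt_bitrate(10000, [0.02, 0.03, 0.05])
after_sporadic_loss = adapt_bitrate(10000, [0.0, 0.05, 0.0])
after_clean_window = adapt_bitrate(10000, [0.0, 0.0, 0.0])
```

Holding the bitrate on sporadic loss reflects the point above: only non-sporadic loss is treated as a sign that the bandwidth has dropped.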

 

Delay Jitter

The end-to-end delay may fluctuate from packet to packet. This variation is referred to as the delay jitter.
Delay jitter is a problem because the Client must receive, decode and display frames at a constant rate, and any late frame resulting from the delay jitter can produce stuttering in game playback (whether we display such a frame or discard it). Buffering absorbs delay jitter, but because the end-to-end latency budget in Cloud Gaming is only 100-150 ms we can’t keep a large pre-roll buffer, so some stuttering might be noticeable.
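One common way to quantify delay jitter is the interarrival jitter estimator defined for RTP in RFC 3550, a running average with gain 1/16 of the change in transit time between consecutive packets. The sketch below applies it to timestamps in milliseconds (RTP itself uses media clock units); the sample timestamps are invented.

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """RFC 3550 style interarrival jitter estimate.

    For consecutive packets i-1 and i:
        D = (R_i - R_{i-1}) - (S_i - S_{i-1})
        J = J + (|D| - J) / 16
    """
    jitter = 0.0
    for i in range(1, len(send_times_ms)):
        recv_gap = recv_times_ms[i] - recv_times_ms[i - 1]
        send_gap = send_times_ms[i] - send_times_ms[i - 1]
        jitter += (abs(recv_gap - send_gap) - jitter) / 16.0
    return jitter

# Invented example: the third packet arrives 8 ms later than its spacing implies.
sent = [0.0, 16.0, 32.0]
received = [40.0, 56.0, 80.0]
jitter_ms = interarrival_jitter(sent, received)
```

A constant network delay yields zero jitter regardless of how large that delay is, which is why jitter has to be tracked separately from latency.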

Because re-transmission is not usually used in Cloud Gaming, the Server need not keep already-transmitted packets in a buffer; the Server works in a send-and-forget manner.

 

Packet Losses

There are two main methods to cope with packet losses:

  • Retransmissions – not suitable for Cloud Gaming due to the low-latency requirement
  • Error concealment and error-resilient video coding.

Error-resilient video coding is usually realized by dividing each frame into self-contained slices/tiles; in such a case a packet loss corrupts a single slice and not the whole frame. The corrupted slice can be replaced with the co-located slice from the previous frame. To stop the propagation of visual corruption due to temporal dependency, the Client has to request an I-frame or an intra-refresh from the Server; if I-frames are transmitted periodically anyway, requesting an additional I-frame might be sub-optimal.

Note on intra-refresh: 1/N of the MBs in each frame are intra-coded in a predefined order, so after N frames all the MBs have been intra-coded and the visual distortion has been cleaned up.
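The intra-refresh schedule can be sketched as follows. The sketch assumes a simple raster-scan order as the predefined order; real encoders may refresh by columns or rows instead, and `intra_refresh_schedule` is a hypothetical helper.

```python
import math

def intra_refresh_schedule(num_mbs, period):
    """Return, for each of `period` frames, the MB indices to intra-code.

    Roughly 1/period of the macroblocks are refreshed per frame, in raster
    order (an assumption), so the whole picture is refreshed after `period`
    frames.
    """
    per_frame = math.ceil(num_mbs / period)
    schedule = []
    for frame in range(period):
        start = frame * per_frame
        end = min(start + per_frame, num_mbs)
        schedule.append(list(range(start, end)))
    return schedule

# 1920x1088 coded picture -> 120 x 68 = 8160 macroblocks, refreshed over 60 frames.
schedule = intra_refresh_schedule(8160, 60)
refreshed = sorted(mb for frame_mbs in schedule for mb in frame_mbs)
```

Here each frame intra-codes 136 macroblocks, so at 60 fps the distortion from a loss is cleaned up within one second without the bitrate spike of a full I-frame.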

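In Internet traffic, “dirty” packets (i.e. packets with bit errors) are discarded along the way, typically when a checksum fails. The Client either receives a clean packet in its entirety or loses the packet completely. Therefore bit-level error correction is irrelevant in Cloud Gaming due to the absence of dirty packets; FEC (Forward Error Correction) can only help in its packet-level (erasure) form, where redundant packets let the Client rebuild lost ones at the cost of extra bandwidth.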

A fundamental question arises: what is perceived as worse by users, delay or packet loss?

The paper “An Evaluation of QoE in Cloud Gaming Based on Subjective Tests”, Michael Jarschel et al., reports the following (based on subjective tests):

“Delay appears to be the decisive factor in fast paced games. Players of fast games would rather accept higher packet loss rates than they would tolerate high delays..”

 

 

 

Appendix A: Gaming content has unique characteristics different from live video

Gaming content has unique characteristics (different from live video) which make coding of this content challenging for H264 and even for HEVC:

  • Mixture of text and textures
  • Multiple layers of content
  • Overlaid with a top layer displaying gaming statistics
  • Certain level of transparency
  • Layer showing gaming statistics has limited motion
  • Fast panning of viewport and fast zooming

 

 

Appendix B:  Temporal Masking

In the paper “Streaming DirectX-Based Games on Windows” by Alexander Franiak et al., 2012 the following observation is reported:
“we observe that the shooting game can afford lower bitrates  than the adventure game. The image quality is more important in the adventure game, where the environment contains small details are useful for the gamer …”

I think that so-called Temporal Masking is the reason for the reported observation. What is Temporal Masking? Coding artifacts in fast-moving objects are far less perceptible than in slowly moving ones; in other words, the accuracy of visual perception is significantly reduced when the speed of motion is high. Blocks containing fast-moving objects can therefore be quantized more coarsely. When a scene has a lot of motion, the sensitivity of the HVS (Human Visual System) is reduced (the HVS is distracted by too many moving objects) and coarser quantization is worthwhile.

Shooting games are highly dynamic and visual distortions are less visible, therefore we can get good quality at a lower bitrate.

 

 

Appendix C: the paper “A Network Analysis on Cloud Gaming: Stadia, GeForce Now and PSNow”

According to the paper “A Network Analysis on Cloud Gaming: Stadia, GeForce Now and PSNow”, by Andrea Di Domenico et al., 2021:

“[in Stadia] A single UDP flow carries multiple RTP streams identified by different Source Stream Identifiers (SSRC). A stream is dedicated to the video, while another one to the audio track. We also find a third flow used for video retransmission …”

Instead of interleaving the video, audio and retransmission streams within a single MPEG-TS super-stream, Stadia utilizes three separate RTP streams.

The maximal Ethernet frame size is 1522 bytes (according to IEEE 802.3, including the VLAN tag), which leaves a payload (MTU) of 1500 bytes; bear in mind that the maximal length of a UDP packet is 64KB.

For low-latency applications it’s not recommended to compose large UDP packets, e.g. if you transmit MPEG-2 TS data, it’s not recommended to pack more than 7 TS packets (each 188 bytes) into a single UDP packet. UDP packets that do not fit into a single Ethernet frame are fragmented into smaller chunks at the IP layer, transmitted and reassembled at the Client’s side, which adds delay.
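The limit of 7 TS packets follows from simple arithmetic, sketched here under the assumption of the common Ethernet MTU of 1500 bytes, an IPv4 header without options (20 bytes) and the 8-byte UDP header.

```python
TS_PACKET_SIZE = 188   # bytes per MPEG-2 TS packet
IPV4_HEADER_SIZE = 20  # assumes IPv4 without options
UDP_HEADER_SIZE = 8

def max_ts_packets_per_datagram(mtu=1500):
    """How many whole TS packets fit into one unfragmented UDP datagram."""
    udp_payload = mtu - IPV4_HEADER_SIZE - UDP_HEADER_SIZE
    return udp_payload // TS_PACKET_SIZE

ts_per_datagram = max_ts_packets_per_datagram()    # 1472 // 188 = 7
payload_bytes = ts_per_datagram * TS_PACKET_SIZE   # 1316
```

An eighth TS packet would bring the payload to 1504 bytes, which no longer fits and would trigger IP fragmentation.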

 

 

 

 

Appendix D: the requirements for tolerable cloud gaming

In the paper “Gaming in the clouds: QoE and the users’ perspective“, by Michael Jarschel et al., the requirements for tolerable cloud gaming of racing and first-person shooter games are presented:

1) Packet loss should be no more than 1% (without re-transmission); if packet loss exceeds this threshold, the bitrate should be reduced.

2) End-to-End delay should be no more than 150 ms, otherwise stuttering and even lip-sync issues might occur.

3) Network jitter should be no more than 30 ms, otherwise stuttering might be noticeable.
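The three thresholds can be turned into a simple check on measured session statistics. The dictionary keys and the sample measurements are hypothetical; only the limits come from the list above.

```python
# Limits from the requirements above.
LIMITS = {
    "packet_loss_pct": 1.0,
    "end_to_end_ms": 150.0,
    "jitter_ms": 30.0,
}

def violated_requirements(measured):
    """Return the names of the requirements that the measurements exceed."""
    return [name for name, limit in LIMITS.items() if measured[name] > limit]

# Hypothetical measurements of two sessions.
good_session = {"packet_loss_pct": 0.5, "end_to_end_ms": 120.0, "jitter_ms": 10.0}
bad_session = {"packet_loss_pct": 2.0, "end_to_end_ms": 140.0, "jitter_ms": 45.0}

good_violations = violated_requirements(good_session)
bad_violations = violated_requirements(bad_session)
```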
