Introduction

Scalable video coding is coding of video in multiple layers, where each layer represents a different quality representation of the same video scene.

The base layer (BL) is the lowest quality representation. One or more enhancement layers (ELs) may be coded by referencing lower layers; they improve video quality by exploiting inter-layer redundancy (i.e. prediction from collocated data in a lower layer).

Example of inter-layer redundancy: enhancement-layer motion vectors and modes can be predicted from the base layer.

Scalable video streams degrade more gracefully than non-scalable video, where a reduction in bitrate typically causes a more severe drop in quality. In case of network congestion a middle box (MANE, media-aware network element) removes one or more enhancement layers and sends the trimmed stream to the client. However, the MANE cannot discard upper layers at an arbitrary point (or frame), only at key frames. The key-frame interval therefore determines how quickly the stream can respond to changes in network conditions.
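To put a number on the relation between key-frame interval and responsiveness, here is a minimal Python sketch. It assumes the simplified model described above (layers can only be switched at key frames), and the function name is hypothetical.

```python
def worst_case_switch_delay(intra_period_frames: int, frame_rate: float) -> float:
    """Upper bound, in seconds, on the wait for the next key frame.

    Simplified model: congestion is detected right after a key frame,
    so the middle box has to wait (almost) one full intra period.
    """
    if intra_period_frames <= 0 or frame_rate <= 0:
        raise ValueError("intra period and frame rate must be positive")
    return intra_period_frames / frame_rate


# Compare a few intra periods at 60 fps.
for period in (15, 60, 240):
    delay = worst_case_switch_delay(period, 60.0)
    print(f"intra period {period:3d} frames -> up to {delay:.2f} s")
```

A shorter intra period reacts faster but spends more bits on key frames, so the value is a trade-off rather than a free parameter.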

 

HEVC/H.265 has a special scalability extension called SHVC.

Scalable video is tailored for internet streaming with changing bandwidth or network conditions.

Note that alternative solutions such as HLS or MPEG-DASH are more popular because of their low complexity, but they require a huge amount of disk space on the server side to keep multiple replicas of the same sources at different bitrates, resolutions and bit depths (SDR/HDR).

In a scalable solution a video encoder generates several compressed bitstreams: a base-layer and enhancement-layers.  Each layer uses the previous one as a reference, thus inter-layer redundancy is exploited.

There are three main  types of video scalability (the figures taken from the paper “Scalable Internet video using MPEG-4” by Hayder Radha, 1999):

  • SNR scalability  – each layer is encoded with a progressively smaller quantization step size, so video quality improves progressively. Progressive JPEG was the first practical use of SNR scalability.

SNR (or quality) scalability is less time-consuming than spatial scalability on both the encoder and the decoder side, since no image scaling is applied.

        

Example (two-layered): frames in the enhancement layer exploit both inter-layer and inter-frame redundancy.

 

 

  • Spatial Scalability  – each layer is encoded with progressively increasing resolution

 

  • Temporal Scalability – each layer adds frame rate.

Example (two-layer solution): the enhancement layer contains non-reference B-frames that use the base layer as reference. If the base-layer frame rate is 30 fps, the stream with the enhancement layer is 60 fps.


Bit depth scalability: coding a video with different bit depths for different layers. The base layer has the lowest bit depth (usually 8bpp).
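The idea behind SNR scalability can be illustrated with a toy Python sketch: a base layer quantizes the samples coarsely, and an enhancement layer quantizes what is left over with a smaller step. This is only an illustration under simplifying assumptions (scalar samples, no prediction loop or entropy coding); SHVC itself works with inter-layer prediction from reconstructed base-layer pictures. All names and sample values are made up.

```python
def quantize(value: float, step: float) -> float:
    """Uniform scalar quantizer: snap the value to the nearest multiple of step."""
    return round(value / step) * step


def encode_two_layers(samples, base_step, enh_step):
    """Return (base layer, enhancement layer) for a list of samples."""
    base = [quantize(s, base_step) for s in samples]
    # The enhancement layer codes the base-layer error with a finer step.
    enh = [quantize(s - b, enh_step) for s, b in zip(samples, base)]
    return base, enh


def max_abs_error(original, reconstructed):
    return max(abs(o - r) for o, r in zip(original, reconstructed))


samples = [3.0, 17.0, 42.0, 99.0, 128.0, 200.0]
base, enh = encode_two_layers(samples, base_step=16.0, enh_step=4.0)
refined = [b + e for b, e in zip(base, enh)]

print("base-only max error :", max_abs_error(samples, base))
print("base+enh max error  :", max_abs_error(samples, refined))
```

A receiver that gets only the base layer still has a usable (coarser) signal; adding the enhancement layer bounds the error by half of the finer step.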

 

SHM Reference Codec

Download & Build

To download the SHM software you need to install an SVN client (I prefer TortoiseSVN).

Then take the url of SHM:

https://hevc.hhi.fraunhofer.de/trac/shvc/browser

or clone with ‘git’:

git clone https://vcgit.hhi.fraunhofer.de/jvet/SHM.git

In the root SHM-dev folder in sub-folder ‘build’ you can find Visual-Studio solutions for different VS versions.

Compile TAppEncoder and TAppDecoder projects in x64 Release mode.

You may need to modify the ‘cstdint’ file to avoid errors like: ‘uintmax_t’: is not a member of ‘`global namespace'’

I use cstdint located here to build Encoder and Decoder

Encode and Decode Base Layer

Example Encode Base Layer in Low Latency Mode 

No B-frames; the source is a 384×320 YUV sequence in yuv420p format, frame rate 60 fps, single reference frame.

Encode the base layer:

TAppEncoder.exe -c ipp.cfg -i0 test_384x320.yuv -o0 NUL -wdt0 384 -hgt0 320 -ip0 60 -fr0 60  -c single_layer.cfg   -f  10000 -b test_bl.h265

  • -ip0  – intra period for base layer (in our case each 60-th frame is IDR).
  • -fr0  – the frame rate of base layer
  • -f  –     number of frames to encode
  • ipp.cfg  – the main config file which specifies GOP structure (in our case IPPPP), number of references, mode of motion estimation, deblock filter parameters, switching such tools as PCM and AMP.
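The per-layer options follow a regular pattern (the option name with the layer index appended), so the command line can be assembled programmatically. The Python sketch below is a hypothetical helper, not part of SHM; it uses only the options that appear in the command above and returns an argument list that could be passed to subprocess.run.

```python
def build_encoder_args(layers, main_cfg, layer_cfg, frames, bitstream,
                       exe="TAppEncoder.exe"):
    """Assemble a TAppEncoder argument list for one or more layers.

    `layers` is a list of dicts, one per layer, in layer order.
    """
    args = [exe, "-c", main_cfg]
    for n, layer in enumerate(layers):
        args += [
            f"-i{n}", layer["input"],              # source YUV of layer n
            f"-o{n}", layer.get("recon", "NUL"),   # reconstructed YUV (NUL = discard)
            f"-wdt{n}", str(layer["width"]),
            f"-hgt{n}", str(layer["height"]),
            f"-ip{n}", str(layer["intra_period"]),
            f"-fr{n}", str(layer["frame_rate"]),
        ]
    args += ["-c", layer_cfg, "-f", str(frames), "-b", bitstream]
    return args


base_layer = {"input": "test_384x320.yuv", "width": 384, "height": 320,
              "intra_period": 60, "frame_rate": 60}
args = build_encoder_args([base_layer], "ipp.cfg", "single_layer.cfg",
                          10000, "test_bl.h265")
print(" ".join(args))
```

Passing a second dict in `layers` yields the -i1, -o1, -wdt1, ... options needed for a two-layer run.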

The content of ipp.cfg (many parameters have the same meaning as in HM; documentation is available at https://hevc.hhi.fraunhofer.de/shvc):

  • single_layer.cfg – a second cfg-file that specifies the parameters of each layer. In the cfg-file below rate control is enabled for layer 0 with a target bitrate of 10 Mbps (TargetBitrate0 : 10000000):

 

NumLayers : 1
NonHEVCBase : 0
ScalabilityMask1 : 0 # Multiview
ScalabilityMask2 : 1 # Scalable
ScalabilityMask3 : 0 # Auxiliary pictures
AdaptiveResolutionChange : 0 # Resolution change frame (0: disable)
SkipPictureAtArcSwitch : 0 # Code higher layer picture as skip at ARC switching (0: disable (default), 1: enable)
MaxTidRefPresentFlag : 1 # max_tid_ref_present_flag (0=not present, 1=present(default))
CrossLayerPictureTypeAlignFlag: 1 # Picture type alignment across layers
CrossLayerIrapAlignFlag : 1 # Align IRAP across layers
SEIpictureDigest : 0

#============= LAYER 0 ==================
QP0 : 30
MaxTidIlRefPicsPlus10 : 1 # max_tid_il_ref_pics_plus1 for layer0
#============ Rate Control ==============
RateControl0 : 1 # Rate control: enable rate control for layer 0
TargetBitrate0 : 10000000 # Rate control: target bitrate for layer 0, in bps
KeepHierarchicalBit0 : 1 # Rate control: keep hierarchical bit allocation for layer 0 in rate control algorithm
LCULevelRateControl0 : 1 # Rate control: 1: LCU level RC for layer 0; 0: picture level RC for layer 0
RCLCUSeparateModel0 : 1 # Rate control: use LCU level separate R-lambda model for layer 0
InitialQP0 : 25 # Rate control: initial QP for layer 0
RCForceIntraQP0 : 0 # Rate control: force intra QP to be equal to initial QP for layer 0
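To check such values without reading the whole file, the listing can be parsed with a few lines of Python. This is a minimal sketch that assumes only the simple "Name : value # comment" syntax seen in the listing above; the real SHM config parser may accept more than this, and the function name is hypothetical.

```python
def parse_cfg(text: str) -> dict:
    """Parse 'Name : value # comment' lines into a dict of strings."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments first
        if not line or ":" not in line:
            continue                           # blank line or separator
        key, value = line.split(":", 1)
        params[key.strip()] = value.strip()
    return params


sample = """
NumLayers : 1
CrossLayerIrapAlignFlag : 1 # Align IRAP across layers
#============= LAYER 0 ==================
QP0 : 30
RateControl0 : 1 # Rate control: enable rate control for layer 0
TargetBitrate0 : 10000000 # Rate control: target bitrate for layer 0, in bps
"""

cfg = parse_cfg(sample)
if cfg.get("RateControl0") == "1":
    mbps = int(cfg["TargetBitrate0"]) / 1e6
    print(f"layer 0: rate control on, target {mbps:g} Mbps")
else:
    print(f"layer 0: constant QP {cfg['QP0']}")
```

Comments are stripped before splitting on the colon, because the comments themselves contain colons.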

 

Example Decode Base Layer 

TAppDecoder.exe -b test_bl.h265 -o0  base_layer.yuv

  • -o0  – output file for the decoded base layer

 

You can play decoded yuv-file:

ffplay -s 384x320 base_layer.yuv

 

Encode and Decode Dual SNR

Example Encode Two SNR  Layers

Encoding two layers in SNR mode, the base layer is coded with constant QP=30 and the enhancement layer is coded with QP=20

 

TAppEncoder.exe -c ipp.cfg -i0  test_384x320.yuv -i1 test_384x320.yuv -o0 NUL -o1 NUL -wdt0 384 -wdt1 384 -hgt0 320 -hgt1 320 -ip0 60 -ip1 60 -fr0 60 -fr1 60  -c dual_layer.cfg  -f 200 -b dual_layer.h265

 

POC    9 LId: 0 TId: 0 ( P-SLICE    TRAIL_R, nQP 31 QP 31 )       1544 bits [Y 37.5578 dB    U 42.4246 dB    V 43.4092 dB] [ET     1 ] [L0 8c ] [L1 ]

POC    9 LId: 1 TId: 0 ( P-SLICE     STSA_R, nQP 21 QP 21 )       6656 bits [Y 44.8422 dB    U 47.5345 dB    V 48.1982 dB] [ET     1 ] [L0 8 9(0, {1.00, 1.00}x)c ] [L1 ]

POC   10 LId: 0 TId: 0 ( P-SLICE    TRAIL_R, nQP 31 QP 31 )       1864 bits [Y 37.5857 dB    U 42.4312 dB    V 43.4382 dB] [ET     1 ] [L0 9c ] [L1 ]

POC   10 LId: 1 TId: 0 ( P-SLICE     STSA_R, nQP 21 QP 21 )      13376 bits [Y 44.8429 dB    U 47.5713 dB    V 48.2687 dB] [ET     1 ] [L0 9 10(0, {1.00, 1.00}x)c ] [L1 ]

POC   11 LId: 0 TId: 0 ( P-SLICE    TRAIL_R, nQP 31 QP 31 )       2400 bits [Y 37.5694 dB    U 42.4481 dB    V 43.2989 dB] [ET     1 ] [L0 10c ] [L1 ]

POC   11 LId: 1 TId: 0 ( P-SLICE     STSA_R, nQP 21 QP 21 )      12704 bits [Y 44.8439 dB    U 47.5875 dB    V 48.0735 dB] [ET     1 ] [L0 10 11(0, {1.00, 1.00}x)c ] [L1 ]

POC   12 LId: 0 TId: 0 ( P-SLICE    TRAIL_R, nQP 31 QP 31 )       2336 bits [Y 37.5577 dB    U 42.4618 dB    V 42.9799 dB] [ET     1 ] [L0 11c ] [L1 ]

POC   12 LId: 1 TId: 0 ( P-SLICE     STSA_R, nQP 21 QP 21 )      11512 bits [Y 44.8213 dB    U 47.4915 dB    V 47.9293 dB] [ET     1 ] [L0 11 12(0, {1.00, 1.00}x)c ] [L1 ]

POC   13 LId: 0 TId: 0 ( P-SLICE    TRAIL_R, nQP 31 QP 31 )       2448 bits [Y 37.5799 dB    U 42.4388 dB    V 42.6812 dB] [ET     1 ] [L0 12c ] [L1 ]
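The per-picture log lines above can be summarized per layer with a short Python sketch. The regular expression is an assumption derived only from the lines shown here (other SHM versions or slice types may print a different layout), and the helper name is hypothetical.

```python
import re
from collections import defaultdict

# Captures POC, layer id, coded bits and luma PSNR from one log line.
LINE_RE = re.compile(
    r"POC\s+(\d+)\s+LId:\s*(\d+).*?\)\s+(\d+)\s+bits\s+\[Y\s+([\d.]+)\s+dB"
)


def summarize(log_text: str, frame_rate: float) -> dict:
    """Return per-layer frame count, bits, bitrate and mean Y-PSNR."""
    bits = defaultdict(int)
    psnr = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue
        layer = int(match.group(2))
        bits[layer] += int(match.group(3))
        psnr[layer].append(float(match.group(4)))
    summary = {}
    for layer in sorted(bits):
        frames = len(psnr[layer])
        summary[layer] = {
            "frames": frames,
            "bits": bits[layer],
            "bitrate_bps": bits[layer] * frame_rate / frames,
            "mean_y_psnr_db": sum(psnr[layer]) / frames,
        }
    return summary


log = """
POC    9 LId: 0 TId: 0 ( P-SLICE    TRAIL_R, nQP 31 QP 31 )       1544 bits [Y 37.5578 dB    U 42.4246 dB    V 43.4092 dB] [ET     1 ] [L0 8c ] [L1 ]
POC    9 LId: 1 TId: 0 ( P-SLICE     STSA_R, nQP 21 QP 21 )       6656 bits [Y 44.8422 dB    U 47.5345 dB    V 48.1982 dB] [ET     1 ] [L0 8 9(0, {1.00, 1.00}x)c ] [L1 ]
POC   10 LId: 0 TId: 0 ( P-SLICE    TRAIL_R, nQP 31 QP 31 )       1864 bits [Y 37.5857 dB    U 42.4312 dB    V 43.4382 dB] [ET     1 ] [L0 9c ] [L1 ]
POC   10 LId: 1 TId: 0 ( P-SLICE     STSA_R, nQP 21 QP 21 )      13376 bits [Y 44.8429 dB    U 47.5713 dB    V 48.2687 dB] [ET     1 ] [L0 9 10(0, {1.00, 1.00}x)c ] [L1 ]
"""

summary = summarize(log, frame_rate=60.0)
for layer, stats in summary.items():
    print(layer, stats)
```

The bits reported for layer 1 are the enhancement-layer bits only; a client that decodes the enhancement layer receives the bits of both layers.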

Because this is SNR mode, the input to each layer is the same: -i0  test_384x320.yuv -i1 test_384x320.yuv
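With identical inputs, the two layers differ only in how coarsely they quantize. As a rough guide, the HEVC quantization step size doubles for every increase of 6 in QP, commonly approximated as Qstep = 2^((QP-4)/6). The Python sketch below applies this approximation to the QP values used here; it ignores scaling lists and rounding inside the codec.

```python
def qstep(qp: int) -> float:
    """Approximate HEVC quantization step size for a given QP."""
    return 2.0 ** ((qp - 4) / 6.0)


base_qp, enh_qp = 30, 20
ratio = qstep(base_qp) / qstep(enh_qp)
print(f"Qstep(QP={base_qp}) = {qstep(base_qp):.2f}")
print(f"Qstep(QP={enh_qp}) = {qstep(enh_qp):.2f}")
print(f"base layer step is about {ratio:.2f}x coarser")
```

The ratio depends only on the QP difference, so the QP 31 and QP 21 pair reported in the encoder log gives the same factor.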

 

ipp.cfg  remains the same but instead of single-layer cfg-file we use dual_layer.cfg:

NumLayers : 2
NonHEVCBase : 0
ScalabilityMask1 : 0 # Multiview
ScalabilityMask2 : 1 # Scalable
ScalabilityMask3 : 0 # Auxiliary pictures
AdaptiveResolutionChange : 0 # Resolution change frame (0: disable)
SkipPictureAtArcSwitch : 0 # Code higher layer picture as skip at ARC switching (0: disable (default), 1: enable)
MaxTidRefPresentFlag : 1 # max_tid_ref_present_flag (0=not present, 1=present(default))
CrossLayerPictureTypeAlignFlag: 1 # Picture type alignment across layers
CrossLayerIrapAlignFlag : 1 # Align IRAP across layers
SEIpictureDigest : 0

#============= LAYER 0 ==================
QP0 : 30
MaxTidIlRefPicsPlus10 : 1 # max_tid_il_ref_pics_plus1 for layer0
#============ Rate Control ==============
RateControl0 : 0 # Rate control: enable rate control for layer 0
TargetBitrate0 : 1000000 # Rate control: target bitrate for layer 0, in bps
KeepHierarchicalBit0 : 1 # Rate control: keep hierarchical bit allocation for layer 0 in rate control algorithm
LCULevelRateControl0 : 1 # Rate control: 1: LCU level RC for layer 0; 0: picture level RC for layer 0
RCLCUSeparateModel0 : 1 # Rate control: use LCU level separate R-lambda model for layer 0
InitialQP0 : 0 # Rate control: initial QP for layer 0
RCForceIntraQP0 : 0 # Rate control: force intra QP to be equal to initial QP for layer 0

#============ WaveFront ================
WaveFrontSynchro0 : 0 # 0: No WaveFront synchronisation (WaveFrontSubstreams must be 1 in this case).
# >0: WaveFront synchronises with the LCU above and to the right by this many LCUs.

#============= LAYER 1 ==================
QP1 : 20
NumSamplePredRefLayers1 : 1 # number of sample pred reference layers
SamplePredRefLayerIds1 : 0 # reference layer id
NumMotionPredRefLayers1 : 1 # number of motion pred reference layers
MotionPredRefLayerIds1 : 0 # reference layer id
NumActiveRefLayers1 : 1 # number of active reference layers
PredLayerIds1 : 0 # inter-layer prediction layer index within available reference layers

#============ Rate Control ==============
RateControl1 : 0 # Rate control: enable rate control for layer 1
TargetBitrate1 : 1000000 # Rate control: target bitrate for layer 1, in bps
KeepHierarchicalBit1 : 1 # Rate control: keep hierarchical bit allocation for layer 1 in rate control algorithm
LCULevelRateControl1 : 1 # Rate control: 1: LCU level RC for layer 1; 0: picture level RC for layer 1
RCLCUSeparateModel1 : 1 # Rate control: use LCU level separate R-lambda model for layer 1
InitialQP1 : 0 # Rate control: initial QP for layer 1
RCForceIntraQP1 : 0 # Rate control: force intra QP to be equal to initial QP for layer 1

#============ WaveFront ================
WaveFrontSynchro1 : 0 # 0: No WaveFront synchronisation (WaveFrontSubstreams must be 1 in this case).
# >0: WaveFront synchronises with the LCU above and to the right by this many LCUs.

NumLayerSets : 2 # Include default layer set, value of 0 not allowed
NumLayerInIdList1 : 2 # 0-th layer set is default, need not specify LayerSetLayerIdList0 or NumLayerInIdList0
LayerSetLayerIdList1 : 0 1

NumAddLayerSets : 0
NumOutputLayerSets : 2 # Include default OLS, value of 0 not allowed
DefaultTargetOutputLayerIdc : 1
NumLayersInOutputLayerSet : 1 # The number of layers in the 0-th OLS should not be specified,
# ListOfOutputLayers0 need not be specified
ListOfOutputLayers1 : 1

Notes:

In Layer 1 (enhancement)  of dual_layer.cfg: 

    • NumSamplePredRefLayers1 = 1 – only one layer (the base layer) lies below the enhancement layer
    • SamplePredRefLayerIds1 = 0 – the reference layer for sample prediction is layer 0 (the base layer)
    • NumMotionPredRefLayers1 = 1 – number of layers used for motion prediction; in our case only the base layer is available
    • MotionPredRefLayerIds1 = 0 – the reference layer for motion data prediction is layer 0 (the base layer)

 

Example Decode Layers 

The file dual_layer.h265 contains two layers. To get the YUV of the base layer use:

TAppDecoder.exe -b  dual_layer.h265 -o0  base_layer.yuv

To get the YUV of the enhancement layer use -o1 and -ls 2:

TAppDecoder.exe -b  dual_layer.h265 -ls 2 -o1  enh_layer.yuv
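To see which layers a bitstream actually carries without running the decoder, the NAL unit headers can be inspected directly. The Python sketch below assumes an Annex B byte stream (NAL units separated by 0x000001 start codes) and reads the two-byte HEVC NAL unit header, which holds nal_unit_type, nuh_layer_id and nuh_temporal_id_plus1. The function names are hypothetical and the demo bytes are synthetic, not taken from a real encode.

```python
from collections import Counter

START_CODE = b"\x00\x00\x01"


def iter_nal_units(stream: bytes):
    """Yield the NAL units of an Annex B byte stream, without start codes."""
    pos = stream.find(START_CODE)
    while pos != -1:
        start = pos + len(START_CODE)
        nxt = stream.find(START_CODE, start)
        end = len(stream) if nxt == -1 else nxt
        # Trailing zero bytes belong to the next (4-byte) start code.
        nal = stream[start:end].rstrip(b"\x00")
        if len(nal) >= 2:
            yield nal
        pos = nxt


def parse_nal_header(nal: bytes):
    """Return (nal_unit_type, nuh_layer_id, temporal_id) of one NAL unit."""
    b0, b1 = nal[0], nal[1]
    nal_unit_type = (b0 >> 1) & 0x3F
    nuh_layer_id = ((b0 & 0x01) << 5) | (b1 >> 3)
    temporal_id = (b1 & 0x07) - 1
    return nal_unit_type, nuh_layer_id, temporal_id


def count_nal_units_per_layer(stream: bytes) -> Counter:
    return Counter(parse_nal_header(nal)[1] for nal in iter_nal_units(stream))


# Synthetic stream: two NAL units with layer id 0, one with layer id 1.
demo = (b"\x00\x00\x00\x01\x40\x01\xaa\x80"
        b"\x00\x00\x01\x02\x01\xaa\x80"
        b"\x00\x00\x01\x0a\x09\xaa\x80")
counts = count_nal_units_per_layer(demo)
print(dict(counts))
```

On a real file the same function could be fed with open("dual_layer.h265", "rb").read(); a two-layer stream should then report NAL units for layer ids 0 and 1.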

 

You can play decoded yuv-file:

ffplay -s 384x320 enh_layer.yuv

 

Performance Results

I measured the performance (encoding time) of SHM (Scalable HEVC) in dual-layer SNR mode with PowerShell’s Measure-Command:

Measure-Command {.\TAppEncoder.exe -c ipp.cfg -i0 Fifa17_1920x1080.yuv -i1 Fifa17_1920x1080.yuv -o0 NUL -o1 NUL -wdt0 1920 -wdt1 1920 -hgt0 1080 -hgt1 1080 -ip0 60 -ip1 60 -fr0 60 -fr1 60 -c dual_layer.cfg -f 5 -b snr_layer_1080p.h265}

Encoding five 1080p frames takes SHM about 208 s, i.e. roughly 42 s per frame:

Days              : 0
Hours             : 0
Minutes           : 3
Seconds           : 28
Milliseconds      : 304
Ticks             : 2083040816
TotalDays         : 0.00241092687037037
TotalHours        : 0.0578622448888889
TotalMinutes      : 3.47173469333333
TotalSeconds      : 208.3040816
TotalMilliseconds : 208304.0816
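The arithmetic behind the per-frame figure, and how far it is from real time at the 60 fps source rate, can be worked out in a few lines of Python. The inputs are the TotalSeconds value and the frame count from the measurement above; the function name is hypothetical.

```python
def encode_speed(total_seconds: float, frames: int, frame_rate: float):
    """Return (seconds per frame, slowdown factor versus real time)."""
    seconds_per_frame = total_seconds / frames
    # Real time means one frame every 1/frame_rate seconds.
    slowdown = seconds_per_frame * frame_rate
    return seconds_per_frame, slowdown


seconds_per_frame, slowdown = encode_speed(208.3040816, frames=5, frame_rate=60.0)
print(f"{seconds_per_frame:.1f} s per frame")
print(f"about {slowdown:.0f} times slower than real time at 60 fps")
```

With only five frames the measurement includes start-up cost and an intra picture per layer, so it is a rough indication rather than a steady-state figure.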
