Introduction
Scalable video coding is the coding of video in multiple layers, where each layer is a different quality representation of the same video scene.
The base layer (BL) is the lowest-quality representation. One or more enhancement layers (ELs) are coded by referencing lower layers and improve video quality by exploiting inter-layer redundancy (i.e. prediction from collocated data in a lower layer).
Example of inter-layer redundancy: enhancement-layer motion vectors and modes can be predicted from the base layer.
Scalable video streams allow a more graceful degradation in video quality than non-scalable video, where a reduction in bitrate typically causes more severe drops in quality. In case of network congestion a middlebox (MANE, media-aware network element) removes a number of enhancement layers and sends the trimmed stream to the client. However, the MANE can't discard upper layers at an arbitrary point (or frame), only at key frames. The key-frame interval therefore determines how quickly the stream can respond to changes in network conditions.
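The layer-dropping rule above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not a real MANE: the stream is assumed to be already split into NAL units, only the two-byte HEVC NAL unit header is inspected (it carries nuh_layer_id), and a requested layer change takes effect at the next base-layer key frame (IRAP picture, NAL unit types 16 to 23). The names parse_nal_header, make_header, mane_filter and target_changes are hypothetical.

```python
def parse_nal_header(header: bytes):
    """Decode the 2-byte HEVC NAL unit header.

    Bit layout: forbidden_zero_bit(1) | nal_unit_type(6) |
                nuh_layer_id(6) | nuh_temporal_id_plus1(3)
    """
    b0, b1 = header[0], header[1]
    nal_unit_type = (b0 >> 1) & 0x3F
    nuh_layer_id = ((b0 & 0x01) << 5) | (b1 >> 3)
    temporal_id = (b1 & 0x07) - 1
    return nal_unit_type, nuh_layer_id, temporal_id


def make_header(nal_unit_type: int, layer_id: int, temporal_id: int = 0) -> bytes:
    """Build a header for the demo below (inverse of parse_nal_header)."""
    b0 = (nal_unit_type << 1) | (layer_id >> 5)
    b1 = ((layer_id & 0x1F) << 3) | (temporal_id + 1)
    return bytes([b0, b1])


def is_irap(nal_unit_type: int) -> bool:
    # NAL unit types 16..23 are IRAP (key) pictures in HEVC.
    return 16 <= nal_unit_type <= 23


def mane_filter(nal_headers, target_changes, initial_max_layer):
    """Return the indices of the NAL units that are forwarded.

    target_changes maps a NAL index to the highest layer the network can
    carry from that point on (hypothetical congestion signal). The change
    is applied only when the next base-layer key frame arrives.
    """
    max_layer = initial_max_layer
    pending = None
    kept = []
    for i, header in enumerate(nal_headers):
        if i in target_changes:
            pending = target_changes[i]
        nal_type, layer_id, _ = parse_nal_header(header)
        if pending is not None and layer_id == 0 and is_irap(nal_type):
            max_layer = pending
            pending = None
        if layer_id <= max_layer:
            kept.append(i)
    return kept


IDR, TRAIL = 19, 1  # IDR_W_RADL and TRAIL_R
stream = [make_header(t, layer) for t in (IDR, TRAIL, IDR, TRAIL) for layer in (0, 1)]
# Congestion is signalled at NAL index 2, but layer 1 is dropped only
# from the next base-layer key frame (index 4) onwards.
kept = mane_filter(stream, {2: 0}, initial_max_layer=1)
print(kept)
```

In this toy stream the enhancement-layer units at indices 5 and 7 are dropped, while index 3 is still forwarded because no key frame has arrived yet.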
HEVC/H.265 has a special scalability extension called SHVC.
Scalable video is tailored for internet streaming with changing bandwidth or network conditions.
Note that alternative solutions such as HLS or MPEG-DASH are more popular because of their low complexity, but they require a huge amount of disk space on the server side to keep many replicas of the same sources at different bitrates, resolutions and bit depths (SDR/HDR).
In a scalable solution the video encoder generates several compressed bitstreams: a base layer and enhancement layers. Each layer uses the previous one as a reference, so inter-layer redundancy is exploited.
There are three main types of video scalability (the figures are taken from the paper “Scalable Internet video using MPEG-4” by Hayder Radha, 1999):
- SNR scalability – each layer is encoded with a progressively smaller quantization step size, so video quality improves progressively. Progressive JPEG was the first practical use of SNR scalability.
SNR (or quality) scalability is less time-consuming than spatial scalability for both the encoder and the decoder, since no image scaling is applied.
Example (two layers): frames in the enhancement layer exploit both inter-layer and inter-frame redundancy.
- Spatial scalability – each layer is encoded at a progressively higher resolution.
- Temporal scalability – each layer adds frame rate.
Example (two layers): the enhancement layer contains non-reference B-frames which use the base layer as reference. If the base-layer frame rate is 30 fps, then with the enhancement layer the stream is 60 fps.
In addition, there is bit-depth scalability: the video is coded with a different bit depth in each layer. The base layer has the lowest bit depth (usually 8 bits per sample).
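To make the SNR case concrete, here is a toy sketch of two-layer quantization. It is not how SHVC codes pictures (there is no transform, prediction or entropy coding); it only shows the idea that the base layer uses a coarse quantization step and the enhancement layer re-quantizes the remaining error with a finer step. The sample values and step sizes are made up for illustration.

```python
def quantize(values, step):
    # Map each value to an integer level (Python rounds halves to even).
    return [round(v / step) for v in values]


def dequantize(levels, step):
    return [level * step for level in levels]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


samples = [12.0, 47.0, 101.0, 33.0, 250.0, 7.0]  # made-up sample values
base_step, enh_step = 32, 4                      # coarse and fine steps

# Base layer: coarse quantization of the source.
base_rec = dequantize(quantize(samples, base_step), base_step)

# Enhancement layer: finer quantization of what the base layer missed
# (this residual is the inter-layer prediction error).
residual = [s - b for s, b in zip(samples, base_rec)]
enh_rec = [b + r for b, r in
           zip(base_rec, dequantize(quantize(residual, enh_step), enh_step))]

print("base layer MSE:       ", mse(samples, base_rec))
print("base + enhancement MSE:", mse(samples, enh_rec))
```

Decoding only the base layer gives a usable but coarse reconstruction; adding the enhancement layer brings the error down to at most half of the fine step per sample.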
SHM Reference Codec
Download & Build
To download the SHM software you need to install an SVN client (I prefer TortoiseSVN).
Then take the URL of SHM:
https://hevc.hhi.fraunhofer.de/trac/shvc/browser
or clone with ‘git’:
git clone https://vcgit.hhi.fraunhofer.de/jvet/SHM.git
In the ‘build’ sub-folder of the root SHM-dev folder you can find Visual Studio solutions for different VS versions.
Compile the TAppEncoder and TAppDecoder projects in x64 Release mode.
Some changes in the ‘cstdint’ file may be required to avoid errors like: 'uintmax_t': is not a member of '`global namespace''
I use the cstdint located here to build the encoder and decoder
Encode and Decode Base Layer
Example Encode Base Layer in Low Latency Mode
No B-frames; the source is a 384×320 YUV sequence in yuv420p format, frame rate 60, single reference.
Encode the base layer:
TAppEncoder.exe -c ipp.cfg -i0 test_384x320.yuv -o0 NUL -wdt0 384 -hgt0 320 -ip0 60 -fr0 60 -c single_layer.cfg -f 10000 -b test_bl.h265
- -ip0 – intra period for the base layer (in our case every 60th frame is an IDR)
- -fr0 – the frame rate of the base layer
- -f – number of frames to encode
- ipp.cfg – the main config file, which specifies the GOP structure (in our case IPPPP), the number of references, the motion estimation mode, the deblocking filter parameters, and switches for tools such as PCM and AMP.
The content of ipp.cfg (many parameters have the same meaning as in HM; documentation is available at https://hevc.hhi.fraunhofer.de/shvc):
- single_layer.cfg – a second cfg-file, which specifies the parameters of each layer. In the cfg-file below we enable rate control and set the target bitrate with TargetBitrate0 (in bps):
NumLayers : 1
NonHEVCBase : 0
ScalabilityMask1 : 0 # Multiview
ScalabilityMask2 : 1 # Scalable
ScalabilityMask3 : 0 # Auxiliary pictures
AdaptiveResolutionChange : 0 # Resolution change frame (0: disable)
SkipPictureAtArcSwitch : 0 # Code higher layer picture as skip at ARC switching (0: disable (default), 1: enable)
MaxTidRefPresentFlag : 1 # max_tid_ref_present_flag (0=not present, 1=present(default))
CrossLayerPictureTypeAlignFlag: 1 # Picture type alignment across layers
CrossLayerIrapAlignFlag : 1 # Align IRAP across layers
SEIpictureDigest : 0
#============= LAYER 0 ==================
QP0 : 30
MaxTidIlRefPicsPlus10 : 1 # max_tid_il_ref_pics_plus1 for layer0
#============ Rate Control ==============
RateControl0 : 1 # Rate control: enable rate control for layer 0
TargetBitrate0 : 10000000 # Rate control: target bitrate for layer 0, in bps
KeepHierarchicalBit0 : 1 # Rate control: keep hierarchical bit allocation for layer 0 in rate control algorithm
LCULevelRateControl0 : 1 # Rate control: 1: LCU level RC for layer 0; 0: picture level RC for layer 0
RCLCUSeparateModel0 : 1 # Rate control: use LCU level separate R-lambda model for layer 0
InitialQP0 : 25 # Rate control: initial QP for layer 0
RCForceIntraQP0 : 0 # Rate control: force intra QP to be equal to initial QP for layer 0
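When juggling several layer cfg-files it can help to read them programmatically. The following is a minimal sketch of a reader for the "Name : value # comment" format shown above; parse_cfg is a hypothetical helper, not part of SHM, and it assumes every setting fits on one line.

```python
def parse_cfg(text):
    """Parse 'Name : value  # comment' lines into a dict of strings."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and separators
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        params[key.strip()] = value.strip()
    return params


# A few lines copied from the single-layer listing above.
SAMPLE = """\
NumLayers : 1
#============= LAYER 0 ==================
QP0 : 30
#============ Rate Control ==============
RateControl0 : 1 # Rate control: enable rate control for layer 0
TargetBitrate0 : 10000000 # Rate control: target bitrate for layer 0, in bps
LCULevelRateControl0 : 1 # Rate control: 1: LCU level RC for layer 0; 0: picture level RC for layer 0
"""

params = parse_cfg(SAMPLE)
print("rate control enabled:", params["RateControl0"] == "1")
print("target bitrate (Mbps):", int(params["TargetBitrate0"]) / 1e6)
```

Stripping the comment before splitting on the colon matters here, because several comments contain colons themselves.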
Example Decode Base Layer
TAppDecoder.exe -b test_bl.h265 -o0 base_layer.yuv
-o0 – output of the base layer
You can play the decoded yuv-file:
ffplay -s 384x320 base_layer.yuv
Encode and Decode Dual SNR
Example Encode Two SNR Layers
Encoding two layers in SNR mode: the base layer is coded with constant QP=30 and the enhancement layer with QP=20.
TAppEncoder.exe -c ipp.cfg -i0 test_384x320.yuv -i1 test_384x320.yuv -o0 NUL -o1 NUL -wdt0 384 -wdt1 384 -hgt0 320 -hgt1 320 -ip0 60 -ip1 60 -fr0 60 -fr1 60 -c dual_layer.cfg -f 200 -b dual_layer.h265
POC 9 LId: 0 TId: 0 ( P-SLICE TRAIL_R, nQP 31 QP 31 ) 1544 bits [Y 37.5578 dB U 42.4246 dB V 43.4092 dB] [ET 1 ] [L0 8c ] [L1 ]
POC 9 LId: 1 TId: 0 ( P-SLICE STSA_R, nQP 21 QP 21 ) 6656 bits [Y 44.8422 dB U 47.5345 dB V 48.1982 dB] [ET 1 ] [L0 8 9(0, {1.00, 1.00}x)c ] [L1 ]
POC 10 LId: 0 TId: 0 ( P-SLICE TRAIL_R, nQP 31 QP 31 ) 1864 bits [Y 37.5857 dB U 42.4312 dB V 43.4382 dB] [ET 1 ] [L0 9c ] [L1 ]
POC 10 LId: 1 TId: 0 ( P-SLICE STSA_R, nQP 21 QP 21 ) 13376 bits [Y 44.8429 dB U 47.5713 dB V 48.2687 dB] [ET 1 ] [L0 9 10(0, {1.00, 1.00}x)c ] [L1 ]
POC 11 LId: 0 TId: 0 ( P-SLICE TRAIL_R, nQP 31 QP 31 ) 2400 bits [Y 37.5694 dB U 42.4481 dB V 43.2989 dB] [ET 1 ] [L0 10c ] [L1 ]
POC 11 LId: 1 TId: 0 ( P-SLICE STSA_R, nQP 21 QP 21 ) 12704 bits [Y 44.8439 dB U 47.5875 dB V 48.0735 dB] [ET 1 ] [L0 10 11(0, {1.00, 1.00}x)c ] [L1 ]
POC 12 LId: 0 TId: 0 ( P-SLICE TRAIL_R, nQP 31 QP 31 ) 2336 bits [Y 37.5577 dB U 42.4618 dB V 42.9799 dB] [ET 1 ] [L0 11c ] [L1 ]
POC 12 LId: 1 TId: 0 ( P-SLICE STSA_R, nQP 21 QP 21 ) 11512 bits [Y 44.8213 dB U 47.4915 dB V 47.9293 dB] [ET 1 ] [L0 11 12(0, {1.00, 1.00}x)c ] [L1 ]
POC 13 LId: 0 TId: 0 ( P-SLICE TRAIL_R, nQP 31 QP 31 ) 2448 bits [Y 37.5799 dB U 42.4388 dB V 42.6812 dB] [ET 1 ] [L0 12c ] [L1 ]
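The encoder output above can be summarized per layer with a short script. This is a sketch that assumes the log lines keep exactly the layout shown (POC, LId, TId, slice info in parentheses, bit count, luma PSNR); the regular expression and the summarize helper are my own, not part of SHM.

```python
import re
from collections import defaultdict

# Matches e.g. "POC 9 LId: 0 TId: 0 ( P-SLICE ... ) 1544 bits [Y 37.5578 dB"
LINE_RE = re.compile(
    r"POC\s+(\d+)\s+LId:\s+(\d+)\s+TId:\s+(\d+)\s+\(.*?\)\s+(\d+) bits \[Y ([\d.]+) dB"
)


def summarize(log_text):
    """Return {layer_id: (total_bits, mean_luma_psnr)} for the given log."""
    bits = defaultdict(int)
    psnr = defaultdict(list)
    for match in LINE_RE.finditer(log_text):
        layer = int(match.group(2))
        bits[layer] += int(match.group(4))
        psnr[layer].append(float(match.group(5)))
    return {layer: (bits[layer], sum(psnr[layer]) / len(psnr[layer]))
            for layer in bits}


# Four lines copied from the log above (POC 9 and POC 10, both layers).
LOG = """\
POC 9 LId: 0 TId: 0 ( P-SLICE TRAIL_R, nQP 31 QP 31 ) 1544 bits [Y 37.5578 dB U 42.4246 dB V 43.4092 dB] [ET 1 ] [L0 8c ] [L1 ]
POC 9 LId: 1 TId: 0 ( P-SLICE STSA_R, nQP 21 QP 21 ) 6656 bits [Y 44.8422 dB U 47.5345 dB V 48.1982 dB] [ET 1 ] [L0 8 9(0, {1.00, 1.00}x)c ] [L1 ]
POC 10 LId: 0 TId: 0 ( P-SLICE TRAIL_R, nQP 31 QP 31 ) 1864 bits [Y 37.5857 dB U 42.4312 dB V 43.4382 dB] [ET 1 ] [L0 9c ] [L1 ]
POC 10 LId: 1 TId: 0 ( P-SLICE STSA_R, nQP 21 QP 21 ) 13376 bits [Y 44.8429 dB U 47.5713 dB V 48.2687 dB] [ET 1 ] [L0 9 10(0, {1.00, 1.00}x)c ] [L1 ]
"""

summary = summarize(LOG)
for layer, (total_bits, mean_psnr) in sorted(summary.items()):
    print(f"layer {layer}: {total_bits} bits, mean Y-PSNR {mean_psnr:.2f} dB")
```

On these lines the enhancement layer (LId 1) costs several times more bits than the base layer and raises luma PSNR from about 37.6 dB to about 44.8 dB.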
Because this is SNR mode, the input to each layer is the same: -i0 test_384x320.yuv -i1 test_384x320.yuv
ipp.cfg remains the same, but instead of the single-layer cfg-file we use dual_layer.cfg:
NumLayers : 2
NonHEVCBase : 0
ScalabilityMask1 : 0 # Multiview
ScalabilityMask2 : 1 # Scalable
ScalabilityMask3 : 0 # Auxiliary pictures
AdaptiveResolutionChange : 0 # Resolution change frame (0: disable)
SkipPictureAtArcSwitch : 0 # Code higher layer picture as skip at ARC switching (0: disable (default), 1: enable)
MaxTidRefPresentFlag : 1 # max_tid_ref_present_flag (0=not present, 1=present(default))
CrossLayerPictureTypeAlignFlag: 1 # Picture type alignment across layers
CrossLayerIrapAlignFlag : 1 # Align IRAP across layers
SEIpictureDigest : 0
#============= LAYER 0 ==================
QP0 : 30
MaxTidIlRefPicsPlus10 : 1 # max_tid_il_ref_pics_plus1 for layer0
#============ Rate Control ==============
RateControl0 : 0 # Rate control: enable rate control for layer 0
TargetBitrate0 : 1000000 # Rate control: target bitrate for layer 0, in bps
KeepHierarchicalBit0 : 1 # Rate control: keep hierarchical bit allocation for layer 0 in rate control algorithm
LCULevelRateControl0 : 1 # Rate control: 1: LCU level RC for layer 0; 0: picture level RC for layer 0
RCLCUSeparateModel0 : 1 # Rate control: use LCU level separate R-lambda model for layer 0
InitialQP0 : 0 # Rate control: initial QP for layer 0
RCForceIntraQP0 : 0 # Rate control: force intra QP to be equal to initial QP for layer 0
#============ WaveFront ================
WaveFrontSynchro0 : 0 # 0: No WaveFront synchronisation (WaveFrontSubstreams must be 1 in this case).
# >0: WaveFront synchronises with the LCU above and to the right by this many LCUs.
#============= LAYER 1 ==================
QP1 : 20
NumSamplePredRefLayers1 : 1 # number of sample pred reference layers
SamplePredRefLayerIds1 : 0 # reference layer id
NumMotionPredRefLayers1 : 1 # number of motion pred reference layers
MotionPredRefLayerIds1 : 0 # reference layer id
NumActiveRefLayers1 : 1 # number of active reference layers
PredLayerIds1 : 0 # inter-layer prediction layer index within available reference layers
#============ Rate Control ==============
RateControl1 : 0 # Rate control: enable rate control for layer 1
TargetBitrate1 : 1000000 # Rate control: target bitrate for layer 1, in bps
KeepHierarchicalBit1 : 1 # Rate control: keep hierarchical bit allocation for layer 1 in rate control algorithm
LCULevelRateControl1 : 1 # Rate control: 1: LCU level RC for layer 1; 0: picture level RC for layer 1
RCLCUSeparateModel1 : 1 # Rate control: use LCU level separate R-lambda model for layer 1
InitialQP1 : 0 # Rate control: initial QP for layer 1
RCForceIntraQP1 : 0 # Rate control: force intra QP to be equal to initial QP for layer 1
#============ WaveFront ================
WaveFrontSynchro1 : 0 # 0: No WaveFront synchronisation (WaveFrontSubstreams must be 1 in this case).
# >0: WaveFront synchronises with the LCU above and to the right by this many LCUs.
NumLayerSets : 2 # Include default layer set, value of 0 not allowed
NumLayerInIdList1 : 2 # 0-th layer set is default, need not specify LayerSetLayerIdList0 or NumLayerInIdList0
LayerSetLayerIdList1 : 0 1
NumAddLayerSets : 0
NumOutputLayerSets : 2 # Include default OLS, value of 0 not allowed
DefaultTargetOutputLayerIdc : 1
NumLayersInOutputLayerSet : 1 # The number of layers in the 0-th OLS should not be specified,
# ListOfOutputLayers0 need not be specified
ListOfOutputLayers1 : 1
Notes:
In Layer 1 (enhancement) of dual_layer.cfg:
- NumSamplePredRefLayers1 = 1 – there is only one layer below (the base layer)
- SamplePredRefLayerIds1 = 0 – the reference layer for sample prediction is layer 0 (the base layer)
- NumMotionPredRefLayers1 = 1 – the number of layers used for motion prediction; in our case only the base layer is available
- MotionPredRefLayerIds1 = 0 – the reference layer for motion data prediction is layer 0 (the base layer)
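The relationships in these notes can be checked mechanically before launching a long encode. The sketch below is a hypothetical sanity check, not something SHM provides: it assumes the settings are already available as a dict of strings, and that the reference layer IDs are layer indices, so each one must be lower than the layer that uses it.

```python
def check_layer_refs(params, layer):
    """Return a list of problems found in the reference-layer settings."""
    problems = []
    pairs = (
        ("NumSamplePredRefLayers", "SamplePredRefLayerIds"),
        ("NumMotionPredRefLayers", "MotionPredRefLayerIds"),
    )
    for count_key, ids_key in pairs:
        count = int(params[f"{count_key}{layer}"])
        ids = [int(x) for x in params[f"{ids_key}{layer}"].split()]
        if len(ids) != count:
            problems.append(f"{ids_key}{layer}: expected {count} id(s), got {len(ids)}")
        for ref in ids:
            # A layer can only predict from a lower layer.
            if ref >= layer:
                problems.append(f"{ids_key}{layer}: layer {ref} is not below layer {layer}")
    return problems


# Values taken from the Layer 1 section of dual_layer.cfg above.
layer1 = {
    "NumSamplePredRefLayers1": "1",
    "SamplePredRefLayerIds1": "0",
    "NumMotionPredRefLayers1": "1",
    "MotionPredRefLayerIds1": "0",
}
print(check_layer_refs(layer1, layer=1))

# A deliberately broken variant: layer 1 referencing itself.
broken = dict(layer1, SamplePredRefLayerIds1="1")
print(check_layer_refs(broken, layer=1))
```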
Example Decode Layers
The h265 file dual_layer.h265 contains two layers. To get the yuv of the base layer use:
TAppDecoder.exe -b dual_layer.h265 -o0 base_layer.yuv
To get the yuv of the enhancement layer use -o1 and -ls 2:
TAppDecoder.exe -b dual_layer.h265 -ls 2 -o1 enh_layer.yuv
You can play the decoded yuv-file:
ffplay -s 384x320 enh_layer.yuv
Performance Results
Measure-Command {.\TAppEncoder.exe -c ipp.cfg -i0 Fifa17_1920x1080.yuv -i1 Fifa17_1920x1080.yuv -o0 NUL -o1 NUL -wdt0 1920 -wdt1 1920 -hgt0 1080 -hgt1 1080 -ip0 60 -ip1 60 -fr0 60 -fr1 60 -c dual_layer.cfg -f 5 -b snr_layer_1080p.h265}
Hours : 0
Minutes : 3
Seconds : 28
Milliseconds : 304
Ticks : 2083040816
TotalDays : 0.00241092687037037
TotalHours : 0.0578622448888889
TotalMinutes : 3.47173469333333
TotalSeconds : 208.3040816
TotalMilliseconds : 208304.0816
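To put the measurement in perspective, the figures above can be converted into time per frame. The sketch below uses only the Ticks value and the -f 5 argument from the run above; the one assumption is that a Measure-Command tick is a .NET TimeSpan tick of 100 ns.

```python
TICKS_PER_SECOND = 10_000_000   # one TimeSpan tick is 100 ns

ticks = 2083040816              # "Ticks" from the output above
frames = 5                      # value passed with -f

total_seconds = ticks / TICKS_PER_SECOND
seconds_per_frame = total_seconds / frames
frames_per_second = frames / total_seconds

print(f"total:     {total_seconds:.1f} s")
print(f"per frame: {seconds_per_frame:.1f} s")
print(f"speed:     {frames_per_second:.3f} fps")
```

That is roughly 42 seconds per 1080p frame for the two-layer SNR configuration, far from real time for a 60 fps source, which is expected for a reference codec.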