General
What is Hierarchical B-Frame Mode, or B-pyramid? (Notice that in my opinion B-pyramid is a bad term.)
If a run of B-frames contains B-frames that are themselves used as references by other B-frames in the run, this mode is called Hierarchical B-Frame Coding, or B-pyramid.
The following figure, taken from the paper “Analysis of Hierarchical B Pictures and MCTF” by Heiko Schwarz, Detlev Marpe, and Thomas Wiegand, illustrates the concept of B-pyramid:
Let’s display the first GOP from the above figure slightly differently:
So a geometric form is revealed, but it is not a pyramid. Therefore, in my opinion, the term B-pyramid is not a good choice.
To exploit the B-pyramid feature fully, the GOP size (in frames) should be a power of two (2^n), e.g. GOP size = 16 frames or 32 frames.
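To see why a power-of-two GOP size fits so well, here is a minimal Python sketch of my own (the function names are hypothetical, and nothing here is taken from the paper). It assumes the classic dyadic hierarchy, where the key picture at the end of the GOP is level 0 and every further level halves the distance between already coded frames.

```python
def temporal_level(pos: int, gop_size: int) -> int:
    """Temporal level of the frame at display position pos (1..gop_size).

    Assumes a dyadic hierarchy: gop_size is a power of two and the key
    picture (pos == gop_size) sits at level 0.
    """
    if gop_size <= 0 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    if not 1 <= pos <= gop_size:
        raise ValueError("pos must be in 1..gop_size")
    n = gop_size.bit_length() - 1                   # gop_size == 2**n
    trailing_zeros = (pos & -pos).bit_length() - 1  # largest power of two dividing pos
    return n - trailing_zeros


def coding_order(gop_size: int) -> list:
    """Display positions sorted coarse-to-fine (one possible coding order)."""
    return sorted(range(1, gop_size + 1),
                  key=lambda p: (temporal_level(p, gop_size), p))
```

For a GOP of 8 frames, coding_order(8) gives 8, 4, 2, 6, 1, 3, 5, 7: every frame lies exactly halfway between two frames coded earlier, which only works out evenly at all levels when the GOP size is 2^n.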
According to the results of the above-mentioned paper “Analysis of Hierarchical B Pictures and MCTF”, using Hierarchical B-Frames commonly improves coding efficiency (e.g. on Football CIF 30Hz the improvement is about 0.5 dB in Y-PSNR).
Pros and Cons of Hierarchical B-frames
Pros: better exploitation of temporal redundancy.
Cons: long coding latency (not suitable for low-latency applications)
How to Detect Hierarchical B-Frames or B-Pyramid in a Stream?
For each frame we check that all four of the following conditions hold:
- Current frame is B
- Previous frame (in decoding order) is also B (i.e. the number of successive B-frames is greater than one)
- Previous RefIdc (nal_ref_idc) is non-zero (i.e. the previous B-frame is used for reference)
- POC of the current B-frame is smaller than that of the previous one
If all of the above conditions are met then B-pyramid is detected.
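The four conditions above can be sketched in a few lines of Python. This is only an illustration: the Frame class and its field names are my own assumptions, and the frames are supposed to be already parsed and listed in decoding order.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    slice_type: str   # "I", "P" or "B"
    nal_ref_idc: int  # 0 means "not used for reference"
    order: int        # POC (or PTS, if that is what is available)


def has_b_pyramid(frames_in_decoding_order) -> bool:
    """Return True if some frame satisfies all four conditions."""
    prev = None
    for cur in frames_in_decoding_order:
        if (prev is not None
                and cur.slice_type == "B"       # current frame is B
                and prev.slice_type == "B"      # previous frame is B
                and prev.nal_ref_idc != 0       # previous B is a reference
                and cur.order < prev.order):    # current is displayed earlier
            return True
        prev = cur
    return False
```

For example, the decoding order I0 P4 B2 b1 b3 (with B2 marked as a reference) is reported as B-pyramid, while I0 P4 b1 b2 b3 is not.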
If the elementary stream is encapsulated in an MPEG-TS container, we can use the PTS instead of the POC. It’s worth mentioning that the PTS is easily picked from the PES header, while deriving the POC is a complicated process: it’s necessary to dive into the SPS and pick up log2_max_pic_order_cnt_lsb_minus4 (for pic_order_cnt_type=0) or a dozen other parameters (for pic_order_cnt_type=1).
B-Pyramid versus non-reference B-frames
What is the gain of the B-pyramid GOP structure IPbBbPbBb… over IPbbbPbbb… (three consecutive non-reference B-frames)? Here ‘B’ denotes a B-frame used for reference and ‘b’ denotes a B-frame not used for reference. I used x264 in constant-QP mode (QP=25) with a closed GOP of 30 frames.
On the test YUV sequence “container” (384×320, 300 frames), the bit-size saving is ~0.7%.
On the test YUV sequence “akiyo” (384×320, 300 frames), the bit-size saving is ~1.7%.
Working with x264
IPbbbPbbb…
x264 --input-res 384x320 --fps 30 --b-adapt 0 --bframes 3 --b-pyramid none --ref 1 --no-scenecut --keyint 30 --min-keyint 30 --qp 25 --output test_ibbb.h264 container_384x320.yuv
IPbBbPbBb…
x264 --input-res 384x320 --fps 30 --b-adapt 0 --bframes 3 --b-pyramid strict --ref 1 --no-scenecut --keyint 30 --min-keyint 30 --qp 25 --output test_ibBb.h264 container_384x320.yuv
How to Detect B-Pyramid if the Elementary Stream is Encapsulated in an MPEG-TS or MPEG4 Container?
MPEG TS Container
When the elementary stream is encapsulated in an MPEG-TS container, we look for video frame boundaries to pick up the PTS. We get the PTS from the PES header, and the frame start is indicated by an AUD (nal_type=9), which is mandatory, in the transport packet payload. Notice that if the DTS is not present in the PES header then DTS = PTS; if this holds for every frame, there is no frame reordering and no B-pyramid can exist in such a stream. Picture data (or slice data in the case of multiple slices per picture) is contained in a NALU with nal_type = 1 or 5 (IDR). It is possible that slice data is absent from the current transport packet and appears only in the next video packet or the one after it (e.g. if the SPS is too long).
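Picking the PTS from the PES header can be sketched as follows. This is a minimal Python sketch with a hypothetical function name; it assumes the buffer starts at the PES start code of a video stream (stream_id 0xE0..0xEF), so that the optional PES header with the PTS_DTS_flags field is present.

```python
def parse_pes_pts(pes: bytes):
    """Return the 33-bit PTS of a PES packet, or None if it carries no PTS."""
    if len(pes) < 9 or pes[0:3] != b"\x00\x00\x01":
        raise ValueError("not the start of a PES packet")
    pts_dts_flags = pes[7] >> 6      # '10' = PTS only, '11' = PTS and DTS
    if not pts_dts_flags & 0x2:
        return None
    if len(pes) < 14:
        raise ValueError("truncated PES header")
    p = pes[9:14]                    # 5 bytes: 3 + 15 + 15 PTS bits plus marker bits
    return (((p[0] >> 1) & 0x07) << 30 |
            p[1] << 22 |
            (p[2] >> 1) << 15 |
            p[3] << 7 |
            p[4] >> 1)
```

The returned value is in 90 kHz units, so for a 30 fps stream successive frames in display order differ by 3000.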
Once a NAL unit with nal_type 1 or 5 is found, we need to extract nal_ref_idc from the NAL header and the first two parameters from the slice header: first_mb_in_slice and slice_type.
The NAL unit of each slice consists of:
Start code (0x000001 or 0x00000001), NAL header (1 byte), slice header and slice data.
nalType = nal_header & 0x1f
nal_ref_idc = ( nal_header & 0x60 )>>5
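The two formulas above can be tried on a raw byte-stream buffer with a small Python sketch. The function names are my own, and the input is assumed to be an elementary stream with start codes (no emulation-prevention handling is needed just to read the NAL header).

```python
def parse_nal_header(nal_header: int):
    """Split the one-byte H.264 NAL header into (nal_ref_idc, nal_type)."""
    if nal_header & 0x80:
        raise ValueError("forbidden_zero_bit is set")
    nal_ref_idc = (nal_header & 0x60) >> 5
    nal_type = nal_header & 0x1F
    return nal_ref_idc, nal_type


def iter_nal_headers(es: bytes):
    """Yield (header_offset, nal_ref_idc, nal_type) for each NAL unit in es."""
    # Searching for the 3-byte start code also finds the 4-byte variant,
    # because 00 00 00 01 ends with 00 00 01.
    pos = es.find(b"\x00\x00\x01")
    while pos != -1 and pos + 3 < len(es):
        yield (pos + 3, *parse_nal_header(es[pos + 3]))
        pos = es.find(b"\x00\x00\x01", pos + 3)
```

For instance, the header byte 0x65 gives nal_ref_idc = 3 and nal_type = 5 (IDR slice), and 0x01 gives nal_ref_idc = 0 and nal_type = 1 (a non-reference slice).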
To determine whether a slice is the first one in a picture and whether its slice_type is B, we need to read the first byte of the slice header, slh[0]. Both first_mb_in_slice and slice_type are Exp-Golomb coded, and the Exp-Golomb codeword of the value 0 is the single bit ‘1’. We execute the following operations:
- Get the leading bit: first_bit = slh[0]>>7
- If first_bit == 1 then first_mb_in_slice = 0, i.e. the current slice is the first slice in a picture and it is actually the start of picture data (in this case the next step is to determine whether the slice type is B or not).
- If first_bit == 0 then first_mb_in_slice > 0, i.e. the current slice is not the first one in a picture and the picture type has already been determined.
- For the first slice we have to determine whether the slice type is B or not. The slice type code corresponding to B has two possible values: 1 or 6. The Exp-Golomb bit representation of 1 is ‘010’ and that of 6 is ‘00111’.
Hence, if the current slice is the first slice in a picture (i.e. the MSbit is ‘1’) and the picture type is B, then the first byte slh[0] of the slice header starts with one of the following two bit patterns:
1010 or 100111
Based on the above patterns we derive the following rules to determine whether the picture type is B or not:
if (slh[0]>>4) == 0xA then the current slice is the first slice and the picture type is B
if (slh[0] & 0xFC) == 0x9C then the current slice is the first slice and the picture type is B
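The two rules can be wrapped into one small Python helper. This is only a sketch with a hypothetical name; slh0 is the byte that immediately follows the NAL header of a slice NAL unit (nal_type 1 or 5).

```python
def first_slice_is_b(slh0: int) -> bool:
    """True if slh0 starts the first slice of a picture and that slice is B."""
    # A leading '1' bit marks the first slice of the picture.
    # It is followed by '010' (slice_type 1) or '00111' (slice_type 6),
    # and both of these slice_type values mean B.
    return (slh0 >> 4) == 0xA or (slh0 & 0xFC) == 0x9C
```

For example, 0xA0 (1010 0000) and 0x9C (1001 1100) are detected as B, while 0x88 (an I slice with slice_type 7) and 0x98 (a P slice with slice_type 5) are not.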
For each frame we check that all four of the following conditions hold:
- Current frame is B
- Previous frame (in decoding order) is also B (i.e. the number of successive B-frames is greater than one)
- Previous RefIdc (nal_ref_idc) is non-zero (i.e. the previous frame is used for reference)
- PTS of the current B-frame is smaller than that of the previous one
If all of the above conditions are met then B-pyramid is detected.
MPEG4 Container (non-fragmented)
With the ‘stco’ and ‘stsz’ tables (together with ‘stsc’) in the meta-data we can access all access units successively in decoding order.
For each access unit we skip over non-VCL units (e.g. SEI) until the first slice data NAL is found (nal_type=1 or 5). Notice that in this container each NAL unit is preceded by a length field rather than by a start code.
Then we read the NAL header (to determine nal_ref_idc) and the following byte (which corresponds to the first byte of the slice header) to determine the slice type (B or not B). Slice type and nal_ref_idc are determined in the same way as in the previous section. Alternatively, ref_idc can be derived from the sdtp-box, provided that this box is present in the meta-data (notice that it’s not mandatory to signal the sdtp-box).
With the ctts-table in the meta-data we derive the PTS of each access unit (if ctts is not present then PTS = DTS and no B-pyramid can exist in such a stream).
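Deriving the PTS from the ctts-table can be sketched in Python as below. Note the assumptions: the DTS of each sample comes from the ‘stts’ table (sample durations), which is not discussed above, and both tables are supposed to be already parsed into lists of (sample_count, value) pairs. The function name is hypothetical.

```python
def sample_pts(stts_entries, ctts_entries=None):
    """Return the PTS of every sample (in timescale units), in decoding order.

    stts_entries: list of (sample_count, sample_delta) pairs
    ctts_entries: list of (sample_count, sample_offset) pairs, or None
    """
    dts_list = []
    dts = 0
    for count, delta in stts_entries:
        for _ in range(count):
            dts_list.append(dts)
            dts += delta
    if not ctts_entries:
        return dts_list              # no 'ctts': PTS == DTS, no reordering
    offsets = [off for count, off in ctts_entries for _ in range(count)]
    if len(offsets) != len(dts_list):
        raise ValueError("'ctts' and 'stts' disagree on the sample count")
    return [d + o for d, o in zip(dts_list, offsets)]
```

The resulting list can then be checked sample by sample: a B-frame whose PTS is smaller than that of the preceding reference B-frame indicates B-pyramid.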
For each frame we check that all four of the following conditions hold:
- Current frame is B
- Previous frame (in decoding order) is also B (i.e. the number of successive B-frames is greater than one)
- Previous RefIdc (nal_ref_idc) is non-zero (i.e. the previous frame is used for reference)
- PTS of the current B-frame is smaller than that of the previous one
If all of the above conditions are met then B-pyramid is detected.
23+ years’ programming and theoretical experience in the computer science fields such as video compression, media streaming and artificial intelligence (co-author of several papers and patents).