Video Compression

VideoNerd

General

What Is Hierarchical B-Frame Mode, or B-Pyramid (in my opinion, B-pyramid is a poor term)?

If a run of B frames contains some B frames that are used as references by other B frames in the same run, then this mode is called Hierarchical B-Frame Coding, or B-pyramid.

The following figure, taken from the paper “Analysis of Hierarchical B Pictures and MCTF” by Heiko Schwarz, Detlev Marpe, and Thomas Wiegand, illustrates the concept of B-pyramid:

Let’s display the first GOP from the above figure slightly differently:

So a geometric shape is revealed, but it is not a pyramid. Therefore, in my opinion, the term B-pyramid is not a good choice.

To exploit the B-pyramid feature fully, the GOP size (in frames) should be a dyadic number (a power of two, 2^n), e.g. a GOP size of 16 or 32 frames.

According to the results of the above-mentioned paper “Analysis of Hierarchical B Pictures and MCTF”, using Hierarchical B-Frames commonly improves coding efficiency (e.g. on Football CIF 30Hz, the improvement is about 0.5 dB in Y-PSNR).
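To see why a power-of-two GOP fits the hierarchy, here is a minimal Python sketch. It assumes the classic dyadic structure, in which every B frame sits midway between its two references; the function name `temporal_levels` and the level numbering (0 for the key picture that closes the GOP) are my own illustration, not something an encoder exposes.

```python
def temporal_levels(gop_size):
    """Map each display position 1..gop_size to its hierarchy level.

    Level 0 is the key picture that closes the GOP. Odd positions get the
    highest level; in a dyadic hierarchy these are the B frames that no
    other frame references.
    """
    if gop_size < 2 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    n = gop_size.bit_length() - 1              # gop_size == 2**n
    levels = {}
    for pos in range(1, gop_size + 1):
        # Number of trailing zero bits of pos: the more zeros, the lower
        # (more important) the level.
        trailing_zeros = (pos & -pos).bit_length() - 1
        levels[pos] = n - trailing_zeros
    return levels


for pos, level in temporal_levels(8).items():
    print(f"position {pos}: level {level}")
```

With a GOP size that is not a power of two, the midpoints do not split evenly and some levels end up only partly filled.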

Pros and Cons of Hierarchical B-frames

Pros: better exploitation of temporal redundancy.

Cons: long coding latency (not suitable for low-latency applications)

How to Detect Hierarchical B-Frames or B-Pyramid in a Stream?

For each frame we check all of the following four conditions:

  • Current frame is B

  • Previous frame (in decoding order) is also B (i.e. the number of successive B frames is greater than one)

  • Previous RefIdc (nal_ref_idc) is non-zero (i.e. the previous B-frame is used for reference)

  • POC of the current B frame is smaller than that of the previous one

If all the above conditions are met, then a B-pyramid is detected.
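The four conditions can be sketched in Python as follows. The `Frame` tuple and the function `has_b_pyramid` are hypothetical names for illustration; the sketch assumes the frame type, nal_ref_idc and POC (or PTS) of every frame have already been extracted and are supplied in decoding order.

```python
from typing import Iterable, NamedTuple


class Frame(NamedTuple):
    is_b: bool          # True if the picture is a B frame
    nal_ref_idc: int    # 0 means the picture is not used for reference
    order: int          # POC (or PTS) of the picture


def has_b_pyramid(frames: Iterable[Frame]) -> bool:
    """Scan frames in decoding order and apply the four conditions."""
    prev = None
    for cur in frames:
        if (prev is not None
                and cur.is_b                  # current frame is B
                and prev.is_b                 # previous frame is B
                and prev.nal_ref_idc != 0     # previous B is a reference
                and cur.order < prev.order):  # current precedes it in display
            return True
        prev = cur
    return False


# Decoding order of "I b B b P": I0, P4, B2, b1, b3
pyramid = [Frame(False, 3, 0), Frame(False, 2, 4),
           Frame(True, 1, 2), Frame(True, 0, 1), Frame(True, 0, 3)]
print(has_b_pyramid(pyramid))
```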

If the elementary stream is encapsulated in an MPEG-TS container, then we can use PTS instead of POC. It’s worth mentioning that PTS values are easily picked from the PES header, while deriving POC is a complicated process, especially when pic_order_cnt_type=1. Indeed, to obtain the POC value it’s necessary to dive into the SPS and pick up log2_max_pic_order_cnt_lsb_minus4 (for pic_order_cnt_type=0) or a dozen other parameters in the case of pic_order_cnt_type=1.

 

B-Pyramid versus non-reference B-frames

What is the gain of the B-pyramid GOP structure IPbBbPbBb… over IPbbbPbbb… (three consecutive non-reference B-frames)? Here ‘B’ denotes a B-frame used for reference and ‘b’ denotes a B-frame not used for reference. I use x264 in constant-QP mode (QP=25) with a closed GOP of 30 frames.

On the test YUV sequence “container” (384×320, 300 frames): the bit-size saving is ~0.7%

On the test YUV sequence “akiyo” (384×320, 300 frames): the bit-size saving is ~1.7%

Working with x264

IPbbbPbbb…

x264 --input-res 384x320 --fps 30 --b-adapt 0 --bframes 3 --b-pyramid none --ref 1 --no-scenecut --keyint 30 --min-keyint 30 --qp 25 --output test_ibbb.h264 container_384x320.yuv

IPbBbPbBb…

x264 --input-res 384x320 --fps 30 --b-adapt 0 --bframes 3 --b-pyramid strict --ref 1 --no-scenecut --keyint 30 --min-keyint 30 --qp 25 --output test_ibBb.h264 container_384x320.yuv
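Since both runs use the same constant QP, the comparison comes down to the sizes of the two output files. A small Python sketch of that arithmetic follows; the function name `bit_saving_percent` is my own, and in practice the two sizes would come from something like `os.path.getsize` on the two .h264 files.

```python
def bit_saving_percent(size_reference, size_test):
    """Percentage of bits saved by the test stream relative to the reference.

    size_reference -- size of the stream without B-pyramid (bytes or bits)
    size_test      -- size of the stream with B-pyramid (same unit)
    """
    if size_reference <= 0:
        raise ValueError("size_reference must be positive")
    return 100.0 * (size_reference - size_test) / size_reference


# Hypothetical sizes, for illustration only.
print(bit_saving_percent(1000, 993))
```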

How to Detect B-Pyramid if the Elementary Stream Is Encapsulated in an MPEG-TS or MPEG-4 Container?

MPEG TS Container

When the elementary stream is encapsulated in an MPEG-TS container, we look for video frame boundaries to pick up the PTS. We get the PTS from the PES header, and the start of each frame is mandatorily signaled by an AUD (nal_type=9) in the transport packet payload. Notice that if the PES headers carry no separate DTS, then DTS = PTS for every frame, there is no reordering, and no B-pyramid can exist in such a case. Picture data (or slice data in the case of multiple slices per picture) is contained in a NALU with nal_type = 1 or 5 (IDR). The slice data may be absent from the current transport packet and appear only in the next video packet or the one after it (e.g. if the SPS is very long).

Once a NAL unit with nal_type 1 or 5 is found, we need to extract nal_ref_idc from the NAL header and the first two parameters of the slice header: first_mb_in_slice and slice_type.
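Picking the PTS out of a PES header can be sketched in Python as follows. This is a minimal sketch that assumes the buffer starts exactly at the PES start code of a video stream and that the PES packet has already been reassembled from transport packets; the function name `parse_pes_pts` is hypothetical.

```python
def parse_pes_pts(pes):
    """Return the PTS (in 90 kHz ticks) of a PES packet, or None if absent."""
    if len(pes) < 9 or pes[0:3] != b"\x00\x00\x01":
        raise ValueError("buffer does not start with a PES header")
    pts_dts_flags = pes[7] >> 6            # '10' = PTS only, '11' = PTS + DTS
    if not (pts_dts_flags & 0x2):
        return None                        # no PTS in this header
    if len(pes) < 14:
        raise ValueError("PES header is truncated")
    p = pes[9:14]                          # 5 bytes: 33-bit PTS plus marker bits
    return ((((p[0] >> 1) & 0x07) << 30)   # PTS[32..30]
            | (p[1] << 22)                 # PTS[29..22]
            | ((p[2] >> 1) << 15)          # PTS[21..15]
            | (p[3] << 7)                  # PTS[14..7]
            | (p[4] >> 1))                 # PTS[6..0]


# Hand-built header carrying only a PTS, for illustration.
example = bytes([0x00, 0x00, 0x01, 0xE0, 0x00, 0x00, 0x80, 0x80, 0x05,
                 0x21, 0x00, 0x05, 0xBF, 0x21])
print(parse_pes_pts(example))
```

Comparing the returned values of consecutive frames is enough for the detection; there is no need to convert the ticks to seconds.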

In the byte stream, the NAL unit of each slice consists of:

a start code (000001 or 00000001), the NAL header (1 byte), the slice header and the slice data.
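Splitting a byte-stream buffer into NAL units at the start codes can be sketched in Python like this. The function name `iter_nal_units` is hypothetical, and the sketch assumes the whole buffer is in memory; it does not remove emulation-prevention bytes, which is not needed just to read the NAL header and the first slice-header byte.

```python
def iter_nal_units(data):
    """Yield the bytes of each NAL unit (header included) in a byte stream."""
    starts = []
    i = 0
    while True:
        i = data.find(b"\x00\x00\x01", i)   # a 4-byte start code contains this too
        if i < 0:
            break
        starts.append(i + 3)                # first byte after the start code
        i += 3
    for k, begin in enumerate(starts):
        end = starts[k + 1] - 3 if k + 1 < len(starts) else len(data)
        # A NAL unit never ends with 0x00, so trailing zeros belong to the
        # next (4-byte) start code or to padding and can be dropped.
        yield data[begin:end].rstrip(b"\x00")


stream = b"\x00\x00\x00\x01\x09\xf0\x00\x00\x01\x41\xa0\x11"
for nal in iter_nal_units(stream):
    print(nal.hex())
```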

nalType = nal_header & 0x1f

nal_ref_idc =  ( nal_header & 0x60 )>>5
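The two masks above, written as a small Python helper (the name `parse_nal_header` is my own):

```python
def parse_nal_header(nal_header):
    """Split the one-byte H.264 NAL header into (nal_ref_idc, nal_type)."""
    nal_ref_idc = (nal_header & 0x60) >> 5   # bits 6..5
    nal_type = nal_header & 0x1F             # bits 4..0
    return nal_ref_idc, nal_type


# 0x65: IDR slice used for reference; 0x01: non-reference slice; 0x09: AUD
for byte in (0x65, 0x01, 0x09):
    print(hex(byte), parse_nal_header(byte))
```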

To determine first_mb_in_slice and slice_type we need to read the first byte of the slice header, slh[0]. Both fields are Exp-Golomb coded, so the value 0 is transmitted as the single bit ‘1’:

  • Get the first bit of first_mb_in_slice: first_bit = slh[0]>>7

  • If first_bit==1 then first_mb_in_slice=0, so the current slice is the first slice in a picture and it actually is the start of picture data (in that case the next step is to determine whether the slice type is B or not)

  • If first_bit==0 then first_mb_in_slice is non-zero, so the current slice is not the first one in a picture and the picture type has already been determined.

  • For the first slice we have to determine whether the slice type is B or not. The slice type code corresponding to B has two values, 1 or 6. The Exp-Golomb bit representation of 1 is ‘010’ and of 6 is ‘00111’.

Hence, if the current slice is the first slice in a picture (i.e. first_mb_in_slice=0, so the MSbit is ‘1’) and the picture type is B, then the first byte slh[0] of the slice starts with one of the following two bit patterns:

1010     or      100111
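The two patterns can be reproduced with a short Python sketch of unsigned Exp-Golomb encoding (the function name `ue_codeword` is my own). The leading ‘1’ is the codeword of the value 0, followed by the codeword of slice type 1 or 6.

```python
def ue_codeword(value):
    """Return the unsigned Exp-Golomb codeword of value as a bit string."""
    code = bin(value + 1)[2:]               # binary representation of value + 1
    return "0" * (len(code) - 1) + code     # prefixed by len - 1 zero bits


print(ue_codeword(0) + ue_codeword(1))      # first slice, slice type 1 (B)
print(ue_codeword(0) + ue_codeword(6))      # first slice, slice type 6 (B)
```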

Based on the above patterns, we derive the following rules to determine whether the picture type is B or not:

 if (slh[0]>>4) == 0xA then the current slice is the first slice and the picture type is B

 if (slh[0] & 0xFC) == 0x9C then the current slice is the first slice and the picture type is B
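Both rules fit into one Python predicate on the first slice-header byte. The name `is_first_slice_of_b_picture` is hypothetical; the sample bytes below are hand-built from the Exp-Golomb codewords of a few slice types.

```python
def is_first_slice_of_b_picture(slh0):
    """True if slh0 starts with '1010' or '100111' (first slice, type B)."""
    return (slh0 >> 4) == 0xA or (slh0 & 0xFC) == 0x9C


samples = {
    0xA0: "first slice, slice type 1 (B)",
    0x9C: "first slice, slice type 6 (B)",
    0xB0: "first slice, slice type 2 (I)",
    0xC0: "first slice, slice type 0 (P)",
    0x50: "not the first slice",
}
for byte, meaning in samples.items():
    print(hex(byte), meaning, is_first_slice_of_b_picture(byte))
```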

For each frame we check all of the following four conditions:

  • Current frame is B

  • Previous frame (in decoding order) is also B (i.e. the number of successive B frames is greater than one)

  • Previous RefIdc (nal_ref_idc) is non-zero (i.e. the previous frame is used for reference)

  • PTS of the current B frame is smaller than that of the previous one

If all the above conditions are met, then a B-pyramid is detected.

MPEG-4 Container (non-fragmented)

With the ‘stco’ and ‘stsz’ tables in the metadata we can visit all access units successively in decoding order.

For each access unit we skip over non-VCL NAL units (e.g. SEI) until the first slice-data NAL unit is found (nal_type=1 or 5).

Then we read the NAL header (to determine nal_ref_idc) and the following byte (which corresponds to the first byte of the slice header) to determine the slice type (B or not B). Slice type and nal_ref_idc are determined exactly as in the previous section. Alternatively, ref_idc can be derived from the ‘sdtp’ box, provided that this box is present in the metadata (notice that signaling the ‘sdtp’ box is not mandatory).

With the ‘ctts’ table in the metadata we derive the PTS of each access unit (if ‘ctts’ is not present then PTS = DTS and no B-pyramid can exist in such a stream).
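Turning the tables into per-sample file offsets can be sketched in Python as below. The sketch assumes the boxes have already been parsed into plain lists; besides ‘stco’ and ‘stsz’ it also takes the number of samples in each chunk, which in a real file comes from the ‘stsc’ (sample-to-chunk) box. The function name `sample_offsets` is hypothetical.

```python
def sample_offsets(chunk_offsets, samples_per_chunk, sample_sizes):
    """Return the file offset of every sample, in decoding order.

    chunk_offsets     -- file offset of each chunk (from 'stco')
    samples_per_chunk -- number of samples in each chunk (expanded from 'stsc')
    sample_sizes      -- size in bytes of each sample (from 'stsz')
    """
    offsets = []
    sample_index = 0
    for chunk_offset, count in zip(chunk_offsets, samples_per_chunk):
        pos = chunk_offset
        for _ in range(count):
            offsets.append(pos)
            pos += sample_sizes[sample_index]   # samples are contiguous in a chunk
            sample_index += 1
    return offsets


# Two chunks: the first holds two samples, the second holds one.
print(sample_offsets([100, 1000], [2, 1], [10, 20, 30]))
```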
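Inside an MP4 sample the NAL units are not separated by start codes; each one is preceded by its length. A minimal Python sketch of the skipping step follows. It assumes a 4-byte length field (the actual size is signaled in the ‘avcC’ configuration record), and the function name `first_vcl_nal` is my own.

```python
def first_vcl_nal(sample, length_size=4):
    """Return the first slice NAL unit (nal_type 1 or 5) in a sample, or None."""
    pos = 0
    while pos + length_size <= len(sample):
        nal_len = int.from_bytes(sample[pos:pos + length_size], "big")
        pos += length_size
        nal = sample[pos:pos + nal_len]
        if nal and (nal[0] & 0x1F) in (1, 5):
            return nal                      # header byte first, then slice header
        pos += nal_len                      # skip SEI, AUD, SPS, PPS, ...
    return None


# A 2-byte SEI followed by a 3-byte non-IDR slice, both length-prefixed.
sample = b"\x00\x00\x00\x02\x06\x80" + b"\x00\x00\x00\x03\x41\xa0\x11"
nal = first_vcl_nal(sample)
print(nal.hex() if nal else None)
```

The returned bytes start with the NAL header, so nal[0] gives nal_ref_idc and nal[1] is the first byte of the slice header.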
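Deriving the timestamps can be sketched in Python as follows. The DTS values come from the ‘stts’ (decoding time-to-sample) table, which the text above does not mention but which every track carries; the PTS is the DTS plus the composition offset from ‘ctts’. Both tables are assumed to be already parsed into run-length lists of (sample_count, value) pairs, and the function name `sample_timestamps` is hypothetical.

```python
def sample_timestamps(stts_entries, ctts_entries=None):
    """Return a list of (dts, pts) pairs, one per sample, in decoding order.

    stts_entries -- list of (sample_count, sample_delta)
    ctts_entries -- list of (sample_count, sample_offset), or None if the
                    track has no 'ctts' box
    """
    dts_list = []
    dts = 0
    for sample_count, sample_delta in stts_entries:
        for _ in range(sample_count):
            dts_list.append(dts)
            dts += sample_delta
    if not ctts_entries:
        return [(d, d) for d in dts_list]    # no 'ctts': PTS equals DTS
    offsets = []
    for sample_count, sample_offset in ctts_entries:
        offsets.extend([sample_offset] * sample_count)
    return [(d, d + o) for d, o in zip(dts_list, offsets)]


# Decoding order I P B b b, one tick per frame, two ticks of reordering delay.
stts = [(5, 1)]
ctts = [(1, 2), (1, 5), (1, 2), (1, 0), (1, 1)]
for dts, pts in sample_timestamps(stts, ctts):
    print(dts, pts)
```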

For each frame we check all of the following four conditions:

  • Current frame is B

  • Previous frame (in decoding order) is also B (i.e. the number of successive B frames is greater than one)

  • Previous RefIdc (nal_ref_idc) is non-zero (i.e. the previous frame is used for reference)

  • PTS of the current B frame is smaller than that of the previous one

If all the above conditions are met, then a B-pyramid is detected.

