Video Compression

VideoNerd

Spatial activity (or edge activity) of a frame or video sequence is an important parameter for an encoder's settings (e.g. for determining the frame quantization level). On the one hand, high edge activity in a scene means that the scene is difficult to encode (more bits are required to retain the edges); on the other hand, dense edge patterns mask quantization distortions and blockiness, so fewer bits are required.

I use OpenCV to compute edge activity per frame (the percentage of edge pixels in a frame); edges are detected on luma pixels only. As input the program takes:

  • yuv-file (4:2:0, 8 bits, planar)
  • frame width in pixels
  • frame height in pixels
  • downscale flag – if this parameter is set, each YUV frame is downscaled by a factor of two (with the OpenCV function cvPyrDown) and edge detection is performed on the luma pixels of the downscaled image. This parameter is useful for reducing processing time (the penalty in the edge-activity estimate is expected to be negligible, while processing time is roughly halved).
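Before looking at the program, it helps to pin down the byte layout it assumes. The following sketch (not part of the original post) shows the arithmetic for one 8-bit 4:2:0 planar frame: a luma plane of w*h bytes followed by two chroma planes of (w/2)*(h/2) bytes each, which is exactly what the program's picSize/uvPicSize variables and the per-frame fseek rely on:

```c
#include <assert.h>

/* Byte layout of one 8-bit 4:2:0 planar YUV frame:
 * luma plane is w*h bytes; the two chroma planes together
 * contribute 2*(w/2)*(h/2) bytes. */
static long yuv420_luma_bytes(int w, int h)   { return (long)w * h; }
static long yuv420_chroma_bytes(int w, int h) { return 2L * (w >> 1) * (h >> 1); }
static long yuv420_frame_bytes(int w, int h)
{
    return yuv420_luma_bytes(w, h) + yuv420_chroma_bytes(w, h);
}

/* Byte offset of frame n in the file: frames are stored back to back. */
static long yuv420_frame_offset(int n, int w, int h)
{
    return (long)n * yuv420_frame_bytes(w, h);
}
```

For a QCIF (176x144) sequence this gives 25344 luma bytes and 38016 bytes per frame, which matches the picSize + uvPicSize the program skips per iteration.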

Notice that edge detection is performed with the Canny algorithm.
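Canny produces a binary mask (0 or 255 per pixel), and the edge activity is then just the fraction of mask pixels above a threshold. The counting step can be sketched standalone as follows (a hypothetical helper, not the post's code; the threshold of 64 mirrors the `val>64` test in the program below):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given an edge mask (e.g. Canny output, where
 * edge pixels are 255 and the rest 0), count pixels above a threshold
 * and return the edge-activity ratio in [0,1].  `stride` allows for
 * images whose rows are padded beyond `width` bytes. */
static float edge_activity(const unsigned char *mask,
                           int width, int height, int stride,
                           unsigned char thresh)
{
    long edge_pixels = 0;
    int r, c;
    for (r = 0; r < height; r++)
        for (c = 0; c < width; c++)
            if (mask[(size_t)r * stride + c] > thresh)
                edge_pixels++;
    return (float)edge_pixels / ((float)width * height);
}
```

A 4x4 mask with edges along the diagonal yields an activity of 4/16 = 0.25.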

#include "cv.h"
#include "highgui.h"
#include <stdio.h>
#include <string.h>
#include <assert.h>
#include <stdlib.h>

#define CANNY_TH1  10
#define CANNY_TH2  255

int main(int argc, char** argv)
{
    IplImage *pY;
    IplImage *edges, *edges2, *pyrImg;
    FILE *fin;
    int picSize, uvPicSize, n, w, h, frames;
    unsigned char *ybuf, *imgdata;
    float activity, maxActivity, minActivity, totalActivity;
    unsigned int val;
    int r, c, edgeWidth, edgeHeight, edgeStep, edgepixels;
    int downscale = 0;

    if (argc < 4)
    {
        printf("Usage:  yuv-file  w  h  downscale (optional, default 0)\n");
        return -1;
    }

    fin = fopen(argv[1], "rb");
    if (!fin)
    {
        fprintf(stderr, "failed to open yuv-file %s\n", argv[1]);
        return -1;
    }

    w = atoi(argv[2]);
    h = atoi(argv[3]);
    if (argc >= 5)
        downscale = atoi(argv[4]);

    picSize = w*h;
    uvPicSize = 2*(w>>1)*(h>>1);
    pY = cvCreateImage(cvSize(w,h), IPL_DEPTH_8U, 1);  // luma plane

    edges  = cvCreateImage(cvSize(w,h), IPL_DEPTH_8U, 1);
    edges2 = cvCreateImage(cvSize(w/2,h/2), IPL_DEPTH_8U, 1);  // edge mask from downscaled image
    pyrImg = cvCreateImage(cvSize(w/2,h/2), IPL_DEPTH_8U, 1);
    ybuf = (unsigned char*)malloc(picSize);
    assert(ybuf);   // sanity check

    edgeWidth = w; edgeHeight = h;
    imgdata = (unsigned char*)edges->imageData;
    edgeStep = edges->widthStep;
    if (downscale) {
        imgdata = (unsigned char*)edges2->imageData;
        edgeStep = edges2->widthStep;
        edgeWidth = w/2; edgeHeight = h/2;
    }

    frames = 0;
    minActivity = 1.1f;
    maxActivity = 0.0f;
    totalActivity = 0.0f;

    while (1)
    {
        n = fread(ybuf, 1, picSize, fin);
        if (n < picSize)
            break;  // EOF

        // copy the luma plane row by row: IplImage rows may be padded to widthStep
        for (r = 0; r < h; r++)
            memcpy(&pY->imageData[r*pY->widthStep], &ybuf[r*w], w);
        fseek(fin, uvPicSize, SEEK_CUR);  // skip the chroma planes

        if (downscale) {
            cvPyrDown(pY, pyrImg, IPL_GAUSSIAN_5x5);  // 5x5 gaussian
            cvCanny(pyrImg, edges2, CANNY_TH1, CANNY_TH2, 3);
        }
        else
            cvCanny(pY, edges, CANNY_TH1, CANNY_TH2, 3);

        edgepixels = 0;
        for (r = 0; r < edgeHeight; r++)
            for (c = 0; c < edgeWidth; c++)
            {
                val = (unsigned int)imgdata[r*edgeStep + c];
                if (val > 64)
                    edgepixels++;
            }

        activity = (float)edgepixels/(edgeWidth*edgeHeight);
        printf("Frame %d, edge activity %.2f\n", frames, activity);

        if (activity < minActivity)
            minActivity = activity;
        if (activity > maxActivity)   // no 'else': the first frame can set both extremes
            maxActivity = activity;

        frames++;
        totalActivity += activity;
    }

    fclose(fin);

    if (frames == 0)
    {
        printf("yuv file does not contain a complete frame\n");
        return -1;
    }

    float avgActivity = totalActivity/frames;
    printf("Number frames %d, AvgActivity %.2f\n", frames, avgActivity);
    printf("MinActivity %.2f\n", minActivity);
    printf("MaxActivity %.2f\n", maxActivity);

    // free image memory
    free(ybuf);
    cvReleaseImage(&pY);
    cvReleaseImage(&edges2);
    cvReleaseImage(&edges);
    cvReleaseImage(&pyrImg);

    return 0;
}

 

The file is compiled with gcc on Linux as follows:

gcc `pkg-config --cflags opencv` -o Edge3ActivityYUV Edge3ActivityYUV.c -lopencv_calib3d -lopencv_imgproc -lopencv_contrib -lopencv_legacy -lopencv_core -lopencv_ml -lopencv_features2d -lopencv_objdetect -lopencv_flann -lopencv_video -lopencv_highgui

 

Note:

In the paper "Suitability of VVC and HEVC for Video Telehealth Systems" (Muhammad Arslan Usman et al., 2020), the spatial activity (or Spatial Index, SI) of a frame I is specified as follows:

   SI = std[Sobel(I)]

The Sobel operator is applied to the frame I, and then the std (the standard deviation of the resulting pixels) is computed.
