Spatial activity (or edge activity) of a frame or video sequence is an important parameter for encoder settings (e.g. for determining the frame quantization level). On the one hand, high edge activity in a scene means that the scene is difficult to encode (more bits are required to retain the edges); on the other hand, dense edge patterns mask quantization distortions and blockiness, and as a result fewer bits are required.
I use OpenCV to compute the edge activity per frame (i.e. the fraction of edge pixels in the frame); edges are searched for on luma pixels only. As input, the program gets:
- yuv-file (4:2:0, 8 bits, planar)
- frame width in pixels
- frame height in pixels
- downscale-flag: if this parameter is set, each yuv frame is downscaled by a factor of two in each dimension (with the OpenCV function cvPyrDown) and edge detection is performed on the luma pixels of the downscaled image. This parameter is useful for reducing processing time (the expected penalty in the edge activity estimate is negligible, while processing time was found to be roughly halved).
Note that edge detection is performed with the Canny algorithm.
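Since the program reads only the luma plane and skips the chroma planes, the byte layout of a planar 4:2:0, 8-bit file matters. The sketch below spells out that arithmetic. The helper names (luma_plane_size, chroma_planes_size, luma_offset) are hypothetical and are not part of the program; the sketch assumes even frame dimensions and uses the same truncating division as the listing.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes in the luma (Y) plane of one 8-bit frame. */
size_t luma_plane_size(int w, int h)
{
    assert(w > 0 && h > 0);
    return (size_t)w * (size_t)h;
}

/* Bytes in the two chroma planes (U followed by V) of one 8-bit 4:2:0 frame.
   Each chroma plane is subsampled by two horizontally and vertically.
   Assumption: w and h are even (truncating division otherwise). */
size_t chroma_planes_size(int w, int h)
{
    assert(w > 0 && h > 0);
    return 2 * (size_t)(w / 2) * (size_t)(h / 2);
}

/* Byte offset, from the start of the file, of the luma plane of the frame
   with the given zero-based index. */
size_t luma_offset(int w, int h, size_t index)
{
    return index * (luma_plane_size(w, h) + chroma_planes_size(w, h));
}
```

For a CIF frame (352x288), the luma plane takes 101376 bytes and the two chroma planes together take 50688 bytes, so each frame occupies 1.5 bytes per pixel.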
#include "cv.h"
#include "highgui.h"
#include <stdio.h>
#include <string.h>
#include <assert.h>
#include <stdlib.h>
#define CANNY_TH1 10
#define CANNY_TH2 255
int main(int argc, char** argv)
{
IplImage *pY;
IplImage *edges,*edges2,*pyrImg;
FILE *fin;
int picSize,uvPicSize,n,w,h,frames;
unsigned char *ybuf,*imgdata;
float activity, maxActivity,minActivity, totalActivity;
unsigned int val;
int r,c,edgeWidth, edgeHeight,edgepixels;
int downscale=0;
if(argc<4)
{
printf("Usage: yuv-file w h downscale (optional, default 0)\n");
return -1;
}
fin = fopen(argv[1], "rb");
if(!fin)
{
fprintf(stderr, "failed to open yuv-file %s\n",argv[1]);
return -1;
}
w=atoi(argv[2]);
h=atoi(argv[3]);
if(argc>=5)
downscale=atoi(argv[4]);
picSize = w*h;
uvPicSize = 2*(w>>1)*(h>>1);
pY = cvCreateImage(cvSize(w,h), IPL_DEPTH_8U, 1); // luma plane
edges = cvCreateImage(cvSize(w,h), IPL_DEPTH_8U, 1);
edges2 = cvCreateImage(cvSize(w/2,h/2), IPL_DEPTH_8U, 1); // edge mask from downscaled image
pyrImg = cvCreateImage(cvSize(w/2,h/2), IPL_DEPTH_8U, 1);
ybuf = (unsigned char*)malloc(picSize);
assert(ybuf); // sanity checks
edgeWidth = w; edgeHeight=h;
imgdata=(unsigned char*)edges->imageData; // imageData is char*, cast explicitly
if(downscale) {
imgdata=(unsigned char*)edges2->imageData;
edgeWidth = w/2; edgeHeight=h/2;
}
frames=0;
minActivity=1.1;
maxActivity=0.0;
totalActivity = 0.0;
while(1)
{
n=fread(ybuf,1,picSize,fin);
if(n<picSize)
break; // EOF
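// copy row by row: IplImage rows are widthStep bytes apart and may be padded beyond w
for(r=0;r<h;r++)
memcpy(pY->imageData + r*pY->widthStep, ybuf + r*w, w);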
fseek(fin,uvPicSize,SEEK_CUR);
if(downscale) {
cvPyrDown(pY,pyrImg,IPL_GAUSSIAN_5x5); // 5×5 gaussian
cvCanny(pyrImg, edges2, CANNY_TH1, CANNY_TH2, 3);
}
else cvCanny(pY, edges, CANNY_TH1, CANNY_TH2, 3); // full-resolution luma, not the pyramid image
edgepixels=0;
for(r=0;r<edgeHeight;r++)
for(c=0;c<edgeWidth;c++)
{
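// rows are widthStep bytes apart, which can exceed the width because of row padding
val = (unsigned int)imgdata[r*(downscale ? edges2->widthStep : edges->widthStep)+c];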
if(val>64)
edgepixels++;
}
activity = (float)edgepixels/(edgeWidth*edgeHeight);
printf("Frame %d, edge activity %.2f\n",frames,activity);
if(activity<minActivity)
minActivity=activity;
if(activity>maxActivity) // independent check: a frame can be both the minimum and the maximum so far
maxActivity=activity;
frames++;
totalActivity+=activity;
}
if(frames==0)
{
printf("yuv file does not contain a complete frame\n");
return -1;
}
float avgActivity = totalActivity/frames;
printf("Number frames %d, AvgActivity %.2f\n",frames,avgActivity);
printf("MinActivity %.2f\n",minActivity);
printf("MaxActivity %.2f\n",maxActivity);
// Close the input file and free image memory
fclose(fin);
free(ybuf);
cvReleaseImage(&pY);
cvReleaseImage(&edges2);
cvReleaseImage(&edges);
cvReleaseImage(&pyrImg);
return 0;
}
The file is compiled with gcc as follows in Linux:
gcc `pkg-config --cflags opencv` -o Edge3ActivityYUV Edge3ActivityYUV.c -lopencv_calib3d -lopencv_imgproc -lopencv_contrib -lopencv_legacy -lopencv_core -lopencv_ml -lopencv_features2d -lopencv_objdetect -lopencv_flann -lopencv_video -lopencv_highgui
Note:
In the paper “Suitability of VVC and HEVC for Video Telehealth Systems” (Muhammad Arslan Usman et al., 2020), the spatial activity (or Spatial Index, SI) of a frame I is specified as follows:
SI = std[Sobel(I)]
The Sobel operator is applied to the frame I, and then the standard deviation (std) of the filtered pixels is computed.