Content
Sensitivity to low-frequency components
Visual Cortex, Lateral Geniculate Nucleus etc.
HVS is capable of perceiving the light approximately at contrast ratio of 10^5:1
There are two types of photoreceptors – rods and cones
Why practical profiles of most video compression standards require YUV (or YCbCr) 4:2:0?
Visibility of coding artifacts AROUND a scene cut
The human eye does not absorb the entire visual stimulus at the same resolution
The color we assign an object depends …
Brightness is not always proportionate to the intensity of light entering the eye
Critical Fusion Frequency (CFF)
JND (Just Noticeable Distortion)
End-to-end visual communication system:
taken from “A Survey on Perceptually Optimized Video Coding”, YUN ZHANG et al.,2022
1. It is well known that the human visual system (HVS) is less sensitive to distortion in high-frequency components than in low-frequency components. Video coding methods exploit this property: the quantization step size applied to a DCT coefficient increases with the frequency of that coefficient. HEVC and H.264 support custom quantization matrices, which can be chosen so that high-frequency DCT coefficients are quantized coarsely and low-frequency coefficients are quantized more finely.
2. Visual processing in primates is mainly performed by the visual cortex and the lateral geniculate nucleus (parts of the brain). The eyes are sensors with simple preprocessing functions; this preprocessing (e.g. edge enhancement) is executed by ganglion cells in the retina. Incidentally, preprocessing by ganglion cells is an excellent example of distributed computation.
Recognition of primitive shapes (e.g. circle, triangle) is carried out in the brain, in columnar cells of the cerebral cortex. Each column of cells is responsible for detecting its own shape; within a column the processing is serial.
HVS-related video compression is based on eliminating information that the human retina discards anyway. If we remove more than the retina discards, visual impairments are observed. If we remove less, the compression ratio is low and hence the compression is ineffective.
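The idea of a frequency-dependent step size can be sketched in a few lines of Python. This is only an illustration: the function names, the linear growth rule and the values of base_step and slope are assumptions made for the example and are not taken from HEVC or H.264.

```python
# Illustrative sketch: a quantization matrix whose step size grows with frequency.
# base_step and slope are made-up values, not default matrices from any standard.

def make_quant_matrix(size=8, base_step=8.0, slope=2.0):
    """Step size grows linearly with the frequency index (u + v)."""
    return [[base_step + slope * (u + v) for v in range(size)]
            for u in range(size)]

def quantize(coeffs, qmatrix):
    """Divide each DCT coefficient by its step and round to the nearest level."""
    return [[int(round(c / q)) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, qmatrix)]

def dequantize(levels, qmatrix):
    """Reconstruct coefficients from the quantized levels."""
    return [[level * q for level, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, qmatrix)]

qm = make_quant_matrix()
# A toy block in which every coefficient has the same magnitude,
# so that only the step size differs between frequencies.
block = [[50.0] * 8 for _ in range(8)]
levels = quantize(block, qm)
recon = dequantize(levels, qm)

dc_error = abs(recon[0][0] - block[0][0])    # lowest frequency, finest step
hf_error = abs(recon[7][7] - block[7][7])    # highest frequency, coarsest step
```

With these toy values the reconstruction error at the highest frequency is larger than at the lowest one, which is exactly the trade-off a perceptually tuned matrix accepts.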
3. The HVS is capable of perceiving light at a contrast ratio of approximately 10^5:1 simultaneously in one scene. This range is far beyond the dynamic range that the majority of existing capture and display devices can provide. Presently, the vast majority of consumer cameras and display devices support only Low Dynamic Range (LDR) video content, with a contrast ratio of approximately 100:1 to 1000:1.
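To get a feeling for the gap, the contrast ratios above can be converted to a logarithmic scale. The sketch below expresses them in doublings of light (photographic "stops"); the unit and the helper name are choices made for this example, not something the cited figures prescribe.

```python
import math

def contrast_ratio_to_stops(ratio):
    """Number of doublings of light (stops) spanned by a contrast ratio."""
    return math.log2(ratio)

hvs_stops = contrast_ratio_to_stops(1e5)        # about 16.6
ldr_low_stops = contrast_ratio_to_stops(100)    # about 6.6
ldr_high_stops = contrast_ratio_to_stops(1000)  # about 10.0
```

On this scale the HVS covers roughly 6 to 10 more doublings within one scene than typical LDR content.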
4. There are two types of photoreceptors: rods and cones. Rods are sensitive to low light levels; they are unable to distinguish color and are predominant in the periphery. Cones, on the other hand, are sensitive to higher light levels of long, medium, and short wavelengths. They form the basis of color perception. Cone cells are mostly concentrated in the center region of the retina, called the fovea. The number of the rods, about 100 million, is higher by more than an order of magnitude compared to the number of cones, which is about 6.5 million.
However, the most popular chroma subsampling format is 4:2:0 (for every 4 luma samples there are 2 chroma samples), so the ratio of luma samples to chroma samples is only 2:1. Why is 8:2:0, with a luma/chroma ratio of 4:1, not popular?
5. Why do practical profiles of most video compression standards require YUV (or YCbCr) 4:2:0? The human visual system (HVS) is more sensitive to structure and pattern (i.e. to luminance) than it is to color. Thus, it makes sense to keep luma samples at a higher fidelity than chroma samples, and chroma subsampling is applied for this reason. The most common chroma subsampling scheme subsamples the chroma channels by a factor of two in each dimension: the 4:2:0 format (one Cb and one Cr sample for every four luma samples).
6. According to the paper “Visual masking at video scene cuts” by W.J. Tam et al., the visibility of coding artifacts AROUND a scene cut is significantly reduced (masked), both in the first subsequent frame and in the previous frame. The reduction in the visibility of visual impairments after a scene cut is called “forward masking” (a similar effect is observed in audio perception too).
In addition to forward masking at scene cuts, another, unexpected phenomenon called “backward masking” is observed: the visibility of coding artifacts in the frame before a scene cut is significantly reduced (a similar backward masking is observed in audio perception). Backward masking may be explained as the result of variation in the latency of neural signals in the visual system.
Many coding artifacts are less visible in complex regions, such as tree leaves, than in uniform regions, such as the sky. The same amount of random noise is noticed differently when it is added to backgrounds with different frequency content. Noise added to a flat (low-frequency) background is much more visible than noise added to a textured (high-frequency) background:
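A minimal sketch of 4:2:0 subsampling follows. It assumes the chroma plane is a list of rows with even width and height and uses plain 2x2 averaging; real encoders and converters may use other filters and chroma sample positions, so treat the function names and the filter as illustrative.

```python
def subsample_420(plane):
    """Average every 2x2 block of a chroma plane (even width and height assumed)."""
    height, width = len(plane), len(plane[0])
    return [[(plane[y][x] + plane[y][x + 1] +
              plane[y + 1][x] + plane[y + 1][x + 1]) / 4.0
             for x in range(0, width, 2)]
            for y in range(0, height, 2)]

def samples_per_frame(width, height, chroma_format):
    """Total number of samples (Y + Cb + Cr) in one frame."""
    luma = width * height
    if chroma_format == "4:4:4":
        chroma = 2 * luma                          # full-resolution Cb and Cr
    elif chroma_format == "4:2:0":
        chroma = 2 * (width // 2) * (height // 2)  # Cb and Cr halved in each dimension
    else:
        raise ValueError("unsupported chroma format: " + chroma_format)
    return luma + chroma

cb_small = subsample_420([[1, 3], [5, 7]])         # one 2x2 block becomes one sample
full = samples_per_frame(1920, 1080, "4:4:4")
sub = samples_per_frame(1920, 1080, "4:2:0")
```

For a 1920x1080 frame, 4:2:0 carries half as many samples as 4:4:4 before any actual compression is applied.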
taken from the paper “A Human Visual System-Based Objective Video Distortion Measurement System”, Zhou Wang and Alan C. Bovik
Note: distortion in regions with a regular pattern, such as parallel lines, is more perceivable than distortion in chaotic textured regions, such as grass, according to the paper “A Survey on Perceptually Optimized Video Coding”, Yun Zhang et al., 2022.
8. Scene Cut Masking. The ability of the human visual system to notice coding artifacts is significantly reduced after a scene cut (i.e. at abrupt temporal decorrelation). The first pictures of the new scene can be quantized more harshly without compromising visual quality.
According to the paper “Visual masking at video scene cuts” by W.J. Tam et al. (which is itself based on earlier reports), the visibility of coding artifacts AROUND a scene cut is significantly reduced (masked): in the first subsequent frame and in the previous frame. The reduction in the visibility of visual impairments after a scene cut is called “forward masking” (a similar effect is observed in audio perception too). In addition to forward masking at scene cuts, another, unexpected phenomenon called “backward masking” is observed: the visibility of coding artifacts in the frame before a scene cut is significantly reduced (a similar backward masking is observed in audio perception). Backward masking may be explained if video frames are buffered in some way; otherwise it would contradict causality, since the scene cut occurs after the preceding frame and nevertheless affects how that frame is perceived.
According to the M.A. thesis “Visual Temporal Masking at Video Scene Cuts” by Carol English, 1997, visual masking is observed in three frames on each side of a scene cut, but the masking strength was found to vary with image content. Moreover, forward masking was found to conceal more noise than backward masking. The strongest masking effects were observed in the first frame after a scene cut and in the last frame before a scene cut; in other words, the frames neighboring a scene cut can be degraded severely without affecting perceived image quality.
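An encoder could exploit this by quantizing the frames near a cut more coarsely. The sketch below is a hypothetical illustration only: the function name, the three-frame window and the QP offset values are assumptions chosen to mirror the qualitative findings above (stronger forward than backward masking, strongest next to the cut), not values from the cited works or from any real encoder.

```python
def qp_offsets_near_cuts(num_frames, cut_frames, max_offset=4, window=3):
    """Per-frame QP offsets; cut_frames lists the first frame index of each new scene.

    Offsets are hypothetical: forward masking gets max_offset at the cut and
    decays by 1 per frame; backward masking starts one step lower.
    """
    offsets = [0] * num_frames
    for cut in cut_frames:
        for d in range(window):
            forward = cut + d        # first frames of the new scene
            backward = cut - 1 - d   # last frames of the old scene
            if 0 <= forward < num_frames:
                offsets[forward] = max(offsets[forward], max_offset - d)
            if 0 <= backward < num_frames:
                offsets[backward] = max(offsets[backward],
                                        max(max_offset - 1 - d, 0))
    return offsets

offsets = qp_offsets_near_cuts(num_frames=12, cut_frames=[6])
```

Since the thesis reports that masking strength varies with image content, a practical scheme would probably need to adapt these offsets per sequence rather than use fixed values.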
9. The human eye does not absorb the entire visual stimulus at the same resolution. That part of the stimulus which is imaged on the fovea has the highest resolution and regions which are imaged farther away have lower resolution.
10. Patch Redundancy. Natural images tend to contain repetitive visual content. In particular, small (e.g., 5 × 5) image patches in a natural image tend to redundantly recur many times inside the image, within the same scale.
11. The color we assign an object depends not only on the particular spectrum of light reflection from it but also on the light reflected from surrounding objects.
12. Temporal Silencing. This phenomenon is triggered by the presence of large temporal image flows: objects changing in hue, luminance, size, or shape appear to stop changing. For details, see the paper “Motion Silences Awareness of Visual Change” by Jordan W. Suchow and George A. Alvarez, Department of Psychology, Harvard University, 2011.
13. Brightness is not always proportional to the intensity of light entering the eye. The perceived brightness of an object depends not only on the intensity of the object itself but also on its surrounding background:
The left patch appears brighter than the right one because of its dark surround.
14. Critical Fusion Frequency (CFF)
Critical Fusion Frequency is the frame rate at which we perceive continuity between frames. For laptops a rate of 60 fps suffices; for the cinema 24 fps suffices.
The field of view (FoV) of the human eyes covers 200° in width and 135° in height. Visual acuity is not evenly distributed over the FoV. The photoreceptors and ganglion cells are packed extremely densely at the center, in the retinal fovea, whose radius is about 1.5 mm. The fovea covers about 1% of the retina and is the most sensitive visual area. The densities of the photoreceptors and ganglion cells decrease rapidly from the fovea toward the periphery; consequently, visual acuity decreases progressively as the distance from the fovea increases.
The pupil diameter also affects visual acuity. It varies from 3 mm in daytime to 9 mm in night vision, and visual acuity decreases from day to night.
The HVS senses light with wavelengths between 380 nm and 800 nm. There are three kinds of cones (S, M and L), and each type is sensitive to a specific range of wavelengths: S to blue (maximum at 437 nm), M to green (maximum at 533 nm) and L to red (maximum at 564 nm). The three types of cones explain why the RGB representation is useful.
17. JND (Just Noticeable Distortion)
Due to visual sensitivity and masking effects in the HVS, not every distortion is perceivable. The minimum visibility threshold of a pixel intensity change is denoted as JND; in other words, JND refers to the maximum distortion that the HVS cannot perceive.
JND depends on many factors such as average brightness, contrast, colorfulness and temporal activity. For example, HVS sensitivity to error is generally higher in smooth regions and lower in textured (highly detailed) regions.
The cone photoreceptors of the human retina are most densely packed in a small circular region called the fovea, which is located on the visual axis. The part of the scene projected onto the fovea (the center of our gaze) is therefore perceived in high resolution. The fovea covers only about 2-5 degrees of our visual field.
Visual attention is a complex cognitive process, so it is challenging to model.
Human faces tend to attract visual attention, as do moving objects (this is an evolutionarily acquired feature).
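The dependence of the threshold on local texture can be illustrated with a toy model. Everything in this sketch is hypothetical: the threshold formula (a base value plus a gain times the local standard deviation), the constants and the function names are assumptions chosen only to show the principle, not a published JND model.

```python
import statistics

def jnd_threshold(patch, base=3.0, texture_gain=0.5):
    """Hypothetical visibility threshold that grows with local activity.

    patch is a flat list of pixel intensities; base and texture_gain
    are illustrative constants, not measured values.
    """
    return base + texture_gain * statistics.pstdev(patch)

def is_distortion_visible(original, distorted, base=3.0, texture_gain=0.5):
    """Treat a distortion as visible if its largest pixel error exceeds the threshold."""
    max_error = max(abs(o - d) for o, d in zip(original, distorted))
    return max_error > jnd_threshold(original, base, texture_gain)

smooth = [128] * 16        # flat region, e.g. sky
textured = [100, 160] * 8  # high-activity region, e.g. leaves

# The same error of 5 intensity levels is applied to both patches.
smooth_visible = is_distortion_visible(smooth, [p + 5 for p in smooth])
textured_visible = is_distortion_visible(textured, [p + 5 for p in textured])
```

In this toy model the same error exceeds the threshold in the smooth patch but stays below it in the textured patch, matching the behaviour described above.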
23+ years’ programming and theoretical experience in the computer science fields such as video compression, media streaming and artificial intelligence (co-author of several papers and patents).