Video Compression

VideoNerd

Use cases from real problems in simplistic form

 

You are going to verify a SW product by random tests, i.e. you activate a random number generator producing pseudo-random numbers (“pseudo” is not a typo, a deterministic machine can’t produce pure random numbers) say in the range [1..100].

You pick the first 10 random numbers and configure the SW product accordingly (each number corresponding to a predefined preset). However, the random generator produces numbers with replacement, i.e. some numbers may appear several times in a sequence. In such case you may twice or thrice or even more check your SW product in the same mode. It’s redundant.

To avoid duplication of generated pseudo-random numbers you have to keep already obtained random numbers and if the current number has been generated then you activate the random generator again.

 

1) What’s probability that the list of 10 random numbers from the range [1..100] contain at least one repetition (or duplication)?

This problem is similar to Birthday’s Paradox.

Let’s compute the probability P0 that all random numbers are different (the order in the sequence matters): 

                      P0 = (100)10 /10010 = (1 − 1/100)(1 − 2/100) …. (1 9/100)                      (1)

We simplify the above formula by neglecting cross products (because 10 << 100 ).

                       P0 ≈  1 10·9/200 = 0.55                                                                            (1′)

 

This is very rough approximation, there is better approximation:

What’s is the probability to pick a sample of the length ‘n’ of random numbers (repetitions allowed) from the range [1..N] such that no duplication is present in the sample.

                    P0 ≈  exp(-n*(n-1)/(2*N))           (2)

For our case n=10 and N=100 we obtain:

                         P0 ≈  exp(-10*9*/200) = 0.63  (versus 0.55 according the formula (1′))

Thus, the probability for at least one repetition to occur among 10 random numbers from the range [1..100] is ≈ 0.36 .

In such case it’s worth keep already obtained random numbers and to remove duplication (in order to avoid redundant testing of your SW product).

You can generate random numbers by a SW Random Number Generator, however pls. bear in mind these numbers are not really random (they even are called ‘pseudo-random’). Indeed, a deterministic machine can’t in virtue generate real random numbers!!!

If you wish real random figures then you can take two different real dices (not virtual) and roll them, the random number is composed from two digits (first digit is the result of the first dice and the second digit is the result of the second dice respectively). Thus, each roll gives you random numbers in the range [11..66], totally 55 different numbers.

In spirit of the formula (1′) we derive that the probability of a sequence 10 mutually different random numbers obtained by rolling two dices :

  P0 ≈  1 10·9/110 =  0.18

Hence, the probability of a repetition of random numbers obtaining by rolling of two dices 10 times is   0.82

Perhaps, rolling of dices is slightly old-fashioned way to fetch real random numbers, i suggest fetching of real random data from www.random.org

RANDOM.ORG offers true random numbers to anyone on the Internet. The randomness comes from atmospheric noise, which for many purposes is better than the pseudo-random number algorithms typically used in computer programs.

The following python-script fetches random number from 1..100  from random.org

import urllib2

# get random number from 1 to 100
n=20

url = “https://www.random.org/strings/?num=1&len=” + str(n) + \
“&digits=on&upperalpha=on&loweralpha=on&unique=on&format=” \
“plain&rnd=new”

response = urllib2.urlopen(url)
raw = response.read().decode(‘utf8’)  # a raw sequence
val = sum(map(ord,raw))%100+1

print(“random value from 1 to 100:  %d” %val)

***************************************************************************************************************

 

 

2) We generate random numbers in the range [1..100] until a duplication appears. what’s the probability that the process lasts for more than 10 iterations?

The answer: ≈ 1 P0 , where  P0 is taken from the formula (1′)0.45

***************************************************************************************************************

 

 

3) You generate 100 numbers in the range [1..200]. What’s the probability that the number 13 is absent in the 100-element list of random numbers?

We can use Poisson approximation (n=100 is large and p=1/200 is small).  Therefore the answer is about e-np= e-0.5 0.6

What’s the probability that the number 13 is present exactly one time in the list:  1/2*e-0.5 0.3

***************************************************************************************************************

 

 

4) A SW program runs 100 times and 8 times crashed. What’s the range of the real failure rate with 95% confidence? 

Generally speaking for ‘n’ runs with ‘x’ observed crashes the real failure proportion ‘p’ with 95% confidence lies in the range:

In our case 8 failures from 100 runs the real failure rate with confidence 95% lies in the range [0.026 …  0.13]

 

8/100 ± 1.96*sqrt(8/100*(1-8/100)/100)

***************************************************************************************************************

5) QA team tested a program several hundred times and got 25% of tests crashed.

A programmer modified the program to decrease the failure rate and delivered the updated program to QA team. QA  re-tested the program 24 times only and observed only 3 crashes.

Does the modification decrease the crash rate (or alternatively increase the success rate)?

Let’s formulate the null hypothesis:  the success rate is unchanged after the modification and is 75%.

What’s the probability of getting 3 failures or less of 24 tests given the success rate 75%?

The number of successes is binomially distributed  Binom(0.75,24).  The probability of getting 3 failures or less of 24 tests is expressed in the following formula:

To compute the above formula i recommend to use python module scipy.stats :

from scipy.stats import binom

n = 24
p = 0.75
binom.pmf(21,n,p)+binom.pmf(22,n,p)+binom.pmf(23,n,p)+binom.pmf(24,n,p)

the answer:  0.115

If the significance level of rejection of the null hypothesis (p=0.75) is 0.05, then we can’t reject the null hypothesis that the modification indeed increases the success rate.

 

6) Your QA engineer said that he ran a program 100 times and 50 times the program was crashed. Assuming that the probability of the program to crash is 50% (hence to succeed is 50% too). This problem is equivalent to finding the probability of tossing a fair coin 100 times and getting 50 heads.

 

Python code:

import math

p1=float(math.comb(100, 50))/2**100

print(p1)

0.07980868844676221

 

 

 

 

 

21 Responses

  1. I am in fact pleased to glance at this webpage posts which carries lots of valuable data, thanks for providing these
    kinds of data.

  2. Wonderful site. Plenty of helpful information here. I?¦m sending it to a few pals ans also sharing in delicious. And obviously, thanks on your effort!

  3. Greetings from California! I’m bored to tears at work so
    I decided to browse your site on my iphone
    during lunch break. I love the information you provide here and can’t wait to take a look when I get home.

    I’m amazed at how fast your blog loaded on my mobile .. I’m not even using
    WIFI, just 3G .. Anyways, great blog!

  4. I believe what you posted was very reasonable. But, what about this?
    suppose you were to create a killer headline?
    I ain’t saying your information is not good, but suppose you
    added a post title that grabbed folk’s attention? I mean Probability Theory and SW Verification – VideoNerd is kinda vanilla.
    You ought to peek at Yahoo’s home page and note how they create news
    headlines to get viewers interested. You might add a related
    video or a picture or two to grab people interested about what you’ve written. Just my
    opinion, it would make your posts a little bit more interesting.

    1. it’s not my interest to grab people’s attention. This site is tailored for very narrow ring of specialists and beginners in video compression and media streaming.

  5. Hi there! This post couldn’t be written any better! Reading through this post reminds me of my previous room mate! He always kept talking about this. I will forward this article to him. Pretty sure he will have a good read. Thank you for sharing!

  6. you are really a good webmaster. The site loading speed is incredible. It seems that you’re doing any unique trick. In addition, The contents are masterpiece. you’ve done a great job on this topic!

  7. of course like your web-site but you have to take a look at the spelling on quite a few of your posts. Several of them are rife with spelling problems and I find it very troublesome to tell the truth however I will surely come again again.

  8. Hello there, I found your website via Google while searching for a related topic, your web site came up, it looks great. I’ve bookmarked it in my google bookmarks.

Leave a Reply

Your email address will not be published. Required fields are marked *