Pearson correlation measures the strength of a linear relationship between X and Y, while Spearman correlation measures the strength of a monotonic relationship, which may be non-linear. In some cases, the relationship between X and Y (e.g. Bitrate and Quality) is strictly non-linear:

 

Figure taken from https://avichawla.substack.com/p/the-limitation-of-pearson-correlation
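For reference, the two coefficients can be written out as below. The notation is chosen for this note and is not taken from the linked article: n is the number of samples, \bar{x} and \bar{y} are the sample means, and R(.) is the rank of a value within its own sample.

```latex
% Pearson correlation coefficient
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

% Spearman correlation coefficient: Pearson's r computed on the ranks
\rho = r\big(R(x_1),\dots,R(x_n);\; R(y_1),\dots,R(y_n)\big)
```

Because Spearman's coefficient only sees ranks, any strictly increasing transformation of Y leaves it unchanged, whereas Pearson's coefficient generally changes.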

 

The following Python script shows how to compute Pearson and Spearman correlation coefficients:

 

import numpy as np
import scipy.stats
import matplotlib.pyplot as plt

# generate noisy linear data from Y = 2X
X = np.arange(40)
Y = 2 * X
D = 4 * np.random.randn(40)  # Gaussian noise, standard deviation 4
Yd = Y + D  # add noise

fig, ax = plt.subplots()
plt.grid()
plt.title('Y = 2X+noise')
ax.set_ylabel('Y',fontdict=dict(weight='bold'))
ax.set_xlabel('X',fontdict=dict(weight='bold'))
ax.scatter(X, Yd)

 

# Pearson correlation coefficient

scipy.stats.pearsonr(X, Yd)[0]

0.9858533253709376

# Spearman correlation coefficient

scipy.stats.spearmanr(X, Yd)[0]

0.9838649155722327

# generate noisy quadratic data from Y = X^2
Y1 = X**2
D1 = 32 * np.random.randn(40)  # Gaussian noise, standard deviation 32
Yd1 = Y1 + D1  # add noise

fig, ax = plt.subplots()
plt.grid()
plt.title('Y = X^2+noise')
ax.set_ylabel('Y',fontdict=dict(weight='bold'))
ax.set_xlabel('X',fontdict=dict(weight='bold'))
ax.scatter(X, Yd1)



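# Pearson correlation coefficient (lower than in the linear case)
scipy.stats.pearsonr(X, Yd1)[0]
0.9627985894542643

# Spearman correlation coefficient (still close to 1, since the trend is monotonic)
scipy.stats.spearmanr(X, Yd1)[0]
0.9904315196998127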
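The gap between the two coefficients becomes clearer on noise-free data. The sketch below is an illustration only: the exponential curve and the variable names are assumptions made for this example, not part of the scripts above. It checks that Spearman equals 1 for a strictly increasing curve while Pearson stays noticeably below 1, and that Spearman is the same as Pearson computed on ranks.

```python
import numpy as np
from scipy import stats

# Noise-free, strictly increasing but strongly non-linear data
# (the exponential curve is an arbitrary choice for illustration)
x = np.arange(1, 41, dtype=float)
y = np.exp(x / 5.0)

# Pearson only rewards a straight-line fit
pearson_r = stats.pearsonr(x, y)[0]

# Spearman only looks at the ordering of the values
spearman_r = stats.spearmanr(x, y)[0]

# Spearman is Pearson applied to the ranks of x and y
rank_r = stats.pearsonr(stats.rankdata(x), stats.rankdata(y))[0]

print("Pearson :", pearson_r)
print("Spearman:", spearman_r)
print("Pearson on ranks:", rank_r)
```

Any other strictly increasing function of x would give the same Spearman value, because the ranks of y would not change.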

 

 


Appendix A: Examples of Pearson Correlation ‘r’

Appendix B: Linear Regression Equation

Given training inputs {xi} and outputs {yi}, the coefficients of the linear equation y = c0 + c1 * x are determined as follows:
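Written out, the ordinary least-squares estimates, in the same form that the script below computes them, are as follows, where \bar{x} and \bar{y} denote the means of the inputs and outputs:

```latex
c_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
c_0 = \bar{y} - c_1\,\bar{x}
```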

 

Python script to compute c0 and c1:

import numpy as np
# generate noisy linear data from Y = 2X+3
X = np.arange(40)
Y = 2 * X + 3
D = 4 * np.random.randn(40)  # Gaussian noise, standard deviation 4
Yd = Y + D  # add noise
# compute the means of the 'X' and 'Yd' arrays
xm = np.mean(X)
ym = np.mean(Yd)
# least-squares slope (c1) and intercept (c0)
c1 = np.sum((X - xm) * (Yd - ym)) / np.sum((X - xm) ** 2)
c0 = ym - c1 * xm
print(c1)
print(c0)
1.9718933743372042
3.091355148859236
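As a sanity check, the hand-computed coefficients can be compared with library routines. The sketch below is one possible cross-check, not part of the original script: the fixed seed and the use of np.polyfit and scipy.stats.linregress are choices made here for illustration. It also shows how the slope relates to the Pearson coefficient: r = c1 * std(X) / std(Yd).

```python
import numpy as np
from scipy import stats

# Fixed seed (an assumption for reproducibility; the original script is unseeded)
rng = np.random.default_rng(0)

# Noisy linear data from Y = 2X + 3
X = np.arange(40)
Yd = 2 * X + 3 + 4 * rng.standard_normal(40)

# Closed-form least-squares estimates
xm, ym = X.mean(), Yd.mean()
c1 = np.sum((X - xm) * (Yd - ym)) / np.sum((X - xm) ** 2)
c0 = ym - c1 * xm

# Cross-check 1: degree-1 polynomial fit returns [slope, intercept]
slope, intercept = np.polyfit(X, Yd, 1)

# Cross-check 2: linregress also reports the Pearson coefficient as rvalue
res = stats.linregress(X, Yd)

# Slope rescaled by the standard deviations gives Pearson's r
r_from_slope = c1 * X.std() / Yd.std()

print(c1, slope, res.slope)
print(c0, intercept, res.intercept)
print(r_from_slope, res.rvalue)
```

All three slope values and all three intercept values should agree up to floating-point rounding.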

 
