What explains a Pearson correlation coefficient that is markedly larger than the Spearman rank correlation coefficient?



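What explains obtaining a Pearson correlation coefficient that is much larger (roughly 2×) than the Spearman rank correlation coefficient for the same data?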

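Doesn't this contradict the view of Spearman's rank correlation coefficient (i.e., Pearson's correlation coefficient computed on ranked data) as a generalization of Pearson's that assesses monotonic rather than linear dependence? How can the correlation coefficient for a monotonic relationship be smaller than the one for a linear relationship?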
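To make the premise concrete: Spearman's coefficient is Pearson's coefficient applied to ranks. The minimal plain-Python sketch below checks this on a perfectly monotonic but nonlinear relation, $y = x^3$ for $x = 1, \dots, 10$. The helper names `pearson`, `ranks` and `spearman` are hypothetical, not a library API; ties get average ranks.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = list(range(1, 11))
y = [v ** 3 for v in x]  # strictly increasing, but far from linear

print(f"pearson  = {pearson(x, y):.3f}")   # 0.928
print(f"spearman = {spearman(x, y):.3f}")  # 1.000
```

For a perfectly monotonic relation the rank version is exactly 1, so it can never fall below Pearson. That guarantee disappears as soon as the relation is not perfectly monotonic.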

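I was surprised to see that this is possible in a dataset with $N \sim 100$ elements. I should add that the p-value associated with Pearson's correlation coefficient is 0.0, while the p-value for Spearman's rank correlation is ~0.10.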

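Possible explanation:

This behavior might be driven by extreme values in the dataset. I compared the values of Pearson's correlation coefficient ($\rho$) and Spearman's rank correlation coefficient ($\rho_r$) after removing them. I report two-sided p-values.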

  • Full dataset: $\rho$ = 0.381 (p-value: 0.000), $\rho_r$ = 0.151 (p-value: 0.131)

  • One outlier removed: $\rho$ = 0.336 (p-value: 0.001), $\rho_r$ = 0.125 (p-value: 0.213)

  • Three outliers removed: $\rho$ = 0.167 (p-value: 0.100), $\rho_r$ = 0.076 (p-value: 0.459)
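The trimming comparison in the list above can be sketched in plain Python. This is a minimal sketch: the helper `compare_after_trimming` and the toy dataset (one extreme point in the first row, six trendless points after it) are hypothetical and are not the asker's data. For p-values, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` each return the coefficient together with a two-sided p-value.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def compare_after_trimming(x, y, k):
    """Both coefficients after dropping the first k rows (where the outliers sit)."""
    xs, ys = x[k:], y[k:]
    return pearson(xs, ys), spearman(xs, ys)

# Hypothetical toy data: an extreme point in row 0, then six points with no trend.
x = [50, 1, 2, 3, 4, 5, 6]
y = [50, 1, 3, 2, 2, 3, 1]

for k in (0, 1):
    r, rho = compare_after_trimming(x, y, k)
    print(f"dropped {k}: pearson={r:.3f}, spearman={rho:.3f}")
# dropped 0: pearson=0.994, spearman=0.385
# dropped 1: pearson=0.000, spearman=0.000
```

A single extreme point inflates Pearson far more than Spearman, because Spearman only registers that the point has the highest rank on both axes, not how far out it lies.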

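The remaining distribution (plotted) does not appear to be affected by outliers, yet it still shows the same behavior. The full data are available at 2020_0; note that the outliers correspond to the first three rows.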

Distribution after the three outliers are removed (the first three rows in the attached file)


This is a simple dataset in which the points alternate between two linear functions:

The raw data

The Pearson correlation detects a general upward trend in the combined data (red and black together): r = .453. The Spearman correlation sees only the ranks, which are distributed like this:

The ranks of the above data

High and low ranks alternate, so Spearman sees no clear trend: Spearman r = .079. The Pearson r is 5.7 times as high, and you can easily increase that ratio by extending the series. You can even get a negative Spearman r alongside a positive Pearson r simply by leaving out the last value. So nothing prevents a combination of a large Pearson r and a small Spearman r, and the picture above even looks a bit like yours.
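Both claims (extending the series raises the ratio; dropping the last value flips the sign of Spearman) can be checked with a short plain-Python sketch. It assumes the construction from this answer, values k = 1..m interleaved with -0.01k, and assumes the x values are simply the positions 1, 2, 3, ...; `alternating_series` and the other helpers are hypothetical names.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def alternating_series(m):
    """Hypothetical helper: k = 1..m interleaved with -0.01*k (k = 1..m-1),
    paired with the assumed x values 1, 2, 3, ..."""
    y = []
    for k in range(1, m + 1):
        y.append(float(k))
        if k < m:
            y.append(-0.01 * k)
    return list(range(1, len(y) + 1)), y

# Extending the series: Pearson stays near 0.45 while Spearman shrinks.
for m in (10, 50, 200):
    x, y = alternating_series(m)
    r, rho = pearson(x, y), spearman(x, y)
    print(f"n={len(x):4d}  pearson={r:.3f}  spearman={rho:.4f}  ratio={r / rho:.1f}")

# Leaving out the last value of the 19-point series.
x, y = alternating_series(10)
print(f"pearson={pearson(x[:-1], y[:-1]):.3f}  spearman={spearman(x[:-1], y[:-1]):.3f}")
# pearson=0.330  spearman=-0.084
```

For this particular construction (no ties, odd n), the rank correlation works out to exactly $3/(2n)$ for n points, so it tends to 0 as the series grows while Pearson levels off around 0.45. That is why the ratio can be pushed as high as you like.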

You can easily see how I constructed the data by looking at them:

1, -.01, 2, -.02, 3, -.03, 4, -.04, 5, -.05, 6, -.06, 7, -.07, 8, -.08, 9, -.09, 10
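The listed values can be fed straight into a small plain-Python sketch that reproduces the alternating ranks and both coefficients. The x values are not listed, so this assumes they are simply the positions 1 to 19; `pearson` and `ranks` are hypothetical helper names, and Spearman is computed as Pearson on the ranks.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

y = [1, -.01, 2, -.02, 3, -.03, 4, -.04, 5, -.05,
     6, -.06, 7, -.07, 8, -.08, 9, -.09, 10]
x = list(range(1, len(y) + 1))  # assumed: x is the position in the list

ry = ranks(y)
print([int(v) for v in ry])
# [10, 9, 11, 8, 12, 7, 13, 6, 14, 5, 15, 4, 16, 3, 17, 2, 18, 1, 19]

print(f"pearson  = {pearson(x, y):.3f}")
print(f"spearman = {pearson(ranks(x), ry):.3f}")  # 0.079
```

The printed ranks show the high/low alternation directly: every positive value outranks every negative one, so the rank sequence zigzags around the middle and its correlation with position is close to zero.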

Hope that helps, Bernhard