Mann-Whitney U Test

A non-parametric test (a statistical test that does not assume the data are normally distributed) used to test the difference between two data sets. It ranks the pooled data and compares the rank sums of the two groups; when both groups have similarly shaped distributions, this can be read as a comparison of their medians. The result of the test lets us assess whether the difference between the two data sets is real or just a fluke. It is the non-parametric counterpart of the t-test for independent samples. A few things to consider when deciding whether this test is appropriate:

  • You are investigating the difference between two data sets.
  • The data need not be normally distributed, but must be at least ordinal (data that can be ranked).
  • Although it's a non-parametric test, both data sets should have similarly shaped distributions.
  • As a rule of thumb, each data set should have more than 5 values. Printed critical-value tables for hand calculation usually stop at about 20 values per set; software such as R has no such upper limit.

Remember, the test is also known as the Mann–Whitney–Wilcoxon (MWW) test or the Wilcoxon rank-sum test.

MWW in R.

Let's compare the points scored per season by Kobe Bryant and Michael Jordan to see if they are any different.

I downloaded the points per season for both players from http://www.databasebasketball.com/players

I only intend to compare the first 15 seasons, as Jordan only played for 15 seasons and Kobe is in his 16th.

We start with the null hypothesis that there is no difference between the two samples.

Here is the dataset that I used:

Bryant Jordan
539 2313
1220 408
996 3041
1485 2868
1938 2633
2019 2753
2461 2580
1557 2404
1819 2541
2832 457
2430 2491
2323 2431
2201 2357
1970 1375
2078 1640

I saved this as a tab-delimited text file and then loaded the dataset into R using:

> BryantJordan <- read.delim("score_test.txt")

> Bryant <- BryantJordan$Bryant
> Jordan <- BryantJordan$Jordan

> wilcox.test(Bryant, Jordan, correct = FALSE)

Wilcoxon rank sum test

data: Bryant and Jordan
W = 69, p-value = 0.0742
alternative hypothesis: true location shift is not equal to 0

The p-value is greater than 0.05, which means that we fail to reject the null hypothesis at the 5% level: the data do not show a significant difference between the two data sets.
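
To see where W = 69 comes from, here is a minimal sketch that recomputes it by hand. It assumes the two columns are typed in directly from the table in this post instead of being read from score_test.txt, and the names pooled_ranks, W_manual and W_pairs are just illustrative. As far as I understand it, the W that R reports is the rank sum of the first sample minus the smallest value that rank sum could take.

```r
# Points per season, copied from the table in this post
Bryant <- c(539, 1220, 996, 1485, 1938, 2019, 2461, 1557,
            1819, 2832, 2430, 2323, 2201, 1970, 2078)
Jordan <- c(2313, 408, 3041, 2868, 2633, 2753, 2580, 2404,
            2541, 457, 2491, 2431, 2357, 1375, 1640)

# Rank all 30 values together (rank 1 = smallest value)
pooled_ranks <- rank(c(Bryant, Jordan))

# Sum of the ranks that belong to Bryant's seasons
n_bryant <- length(Bryant)
rank_sum_bryant <- sum(pooled_ranks[seq_len(n_bryant)])

# Subtract the smallest possible rank sum, n * (n + 1) / 2
W_manual <- rank_sum_bryant - n_bryant * (n_bryant + 1) / 2

# Same quantity counted directly: number of (Bryant, Jordan) season
# pairs in which Bryant's total is higher (there are no ties here)
W_pairs <- sum(outer(Bryant, Jordan, ">"))

print(W_manual)  # 69
print(W_pairs)   # 69
```

Both numbers agree with the W in the wilcox.test output, so W can be read as "in how many of the 15 × 15 season pairings did the first player score more".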

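If I swap the order of the samples and run wilcox.test(Jordan, Bryant), the W statistic changes (156 = 15 × 15 − 69) but the p-value stays the same. The correct argument makes no difference here either: with 15 values per sample and no ties, R computes an exact p-value, and the continuity correction only applies to the normal approximation.

> wilcox.test(Jordan, Bryant, correct = TRUE)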

Wilcoxon rank sum test

data: Jordan and Bryant
W = 156, p-value = 0.0742
alternative hypothesis: true location shift is not equal to 0
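
The relationship between the two runs can be checked directly. This is only a sketch under the same assumption as before (the data are typed in from the table rather than loaded from score_test.txt), and res_bj and res_jb are names I made up for the two results. The two W values should add up to the product of the sample sizes, and the two p-values should agree.

```r
# Points per season, copied from the table in this post
Bryant <- c(539, 1220, 996, 1485, 1938, 2019, 2461, 1557,
            1819, 2832, 2430, 2323, 2201, 1970, 2078)
Jordan <- c(2313, 408, 3041, 2868, 2633, 2753, 2580, 2404,
            2541, 457, 2491, 2431, 2357, 1375, 1640)

# Run the test in both orders, once with and once without
# the continuity correction
res_bj <- wilcox.test(Bryant, Jordan, correct = FALSE)
res_jb <- wilcox.test(Jordan, Bryant, correct = TRUE)

# The two W statistics add up to n1 * n2
w_total <- unname(res_bj$statistic) + unname(res_jb$statistic)
print(w_total)                          # 225
print(length(Bryant) * length(Jordan))  # 225

# The p-values are the same in both directions
same_p <- isTRUE(all.equal(res_bj$p.value, res_jb$p.value))
print(same_p)                           # TRUE
```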
