## Mann-Whitney U Test

The Mann-Whitney U test is a non-parametric test (a statistical test that does not assume the data are normally distributed) used to test for a difference between two data sets. It ranks the pooled data and compares the rank sums of the two samples; when both samples have similarly shaped distributions, the result can be read as a comparison of their medians. The test tells us whether the difference between the two data sets is likely to be real or just a fluke. It is the non-parametric counterpart of the t-test for independent samples. A few things to consider when deciding whether this test is appropriate:

- You are investigating the difference between two data sets.
- The data need not be normally distributed, but must be at least ordinal (data that can be ranked).
- Although it is a non-parametric test, both data sets should have similarly shaped distributions.
- The number of observations in each data set should be more than 5, and fewer than 20 is also recommended.

Remember, the test is also known as the **Mann–Whitney–Wilcoxon** (**MWW**) test or the **Wilcoxon rank-sum test.**
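To make the ranking step concrete, here is a minimal sketch of how the U statistic comes out of pooled ranks. The two toy samples are hypothetical (they are not from any real data set) and were chosen to contain no ties.

```r
# Hypothetical toy samples, chosen without ties (not real data)
group_a <- c(12, 15, 9, 20, 17, 11)
group_b <- c(22, 14, 25, 19, 30, 27)

n_a <- length(group_a)
n_b <- length(group_b)

# Rank the pooled observations from smallest (rank 1) to largest
pooled_ranks <- rank(c(group_a, group_b))
ranks_a <- pooled_ranks[seq_len(n_a)]
ranks_b <- pooled_ranks[n_a + seq_len(n_b)]

# U for each group: its rank sum minus the smallest rank sum it could have
u_a <- sum(ranks_a) - n_a * (n_a + 1) / 2
u_b <- sum(ranks_b) - n_b * (n_b + 1) / 2

# The two U values always add up to n_a * n_b
stopifnot(u_a + u_b == n_a * n_b)

# wilcox.test() reports the U of its first argument as W
w <- unname(wilcox.test(group_a, group_b)$statistic)
stopifnot(w == u_a)
```

A U value close to 0 or close to n_a * n_b means one group holds most of the low ranks, which is evidence against the null hypothesis.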

**MWW in R.**

Let's compare the points scored per season by Kobe Bryant and Michael Jordan to see if they differ.

I downloaded the points per season for both players from http://www.databasebasketball.com/players

I only intend to compare the first 15 seasons, as Jordan played only 15 seasons and Kobe is in his 16th.

We start with the null hypothesis that there is no difference between the two samples.

Here is the data set that I used:

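| Bryant | Jordan |
| --- | --- |
| 539 | 2313 |
| 1220 | 408 |
| 996 | 3041 |
| 1485 | 2868 |
| 1938 | 2633 |
| 2019 | 2753 |
| 2461 | 2580 |
| 1557 | 2404 |
| 1819 | 2541 |
| 2832 | 457 |
| 2430 | 2491 |
| 2323 | 2431 |
| 2201 | 2357 |
| 1970 | 1375 |
| 2078 | 1640 |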

I saved this as a tab-delimited text file and then loaded it into R using:

    > BryantJordan <- read.delim("score_test.txt")
    > Bryant <- BryantJordan$Bryant
    > Jordan <- BryantJordan$Jordan
    > wilcox.test(Bryant, Jordan, correct = FALSE)

        Wilcoxon rank sum test

    data:  Bryant and Jordan
    W = 69, p-value = 0.0742
    alternative hypothesis: true location shift is not equal to 0

The p-value is greater than 0.05, so we fail to reject the null hypothesis: the test does not show a significant difference between the two data sets.
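The W = 69 reported by R can be reproduced by hand, which helps show what the statistic measures. The sketch below types the two columns in as vectors instead of reading the text file (the lowercase variable names are my own), then computes W in two ways: from Bryant's rank sum, and by counting the season pairs in which Bryant outscored Jordan.

```r
# The two columns of the table, entered directly so no file is needed
bryant <- c(539, 1220, 996, 1485, 1938, 2019, 2461, 1557,
            1819, 2832, 2430, 2323, 2201, 1970, 2078)
jordan <- c(2313, 408, 3041, 2868, 2633, 2753, 2580, 2404,
            2541, 457, 2491, 2431, 2357, 1375, 1640)

n <- length(bryant)

# Rank all 30 seasons together, then sum the ranks belonging to Bryant
pooled_ranks <- rank(c(bryant, jordan))
rank_sum_bryant <- sum(pooled_ranks[seq_len(n)])

# W is the rank sum minus the smallest value it could take, n(n + 1)/2
w_by_ranks <- rank_sum_bryant - n * (n + 1) / 2

# Equivalent view: the number of (Bryant, Jordan) pairs where Bryant scored more
w_by_pairs <- sum(outer(bryant, jordan, ">"))

# Both should agree with the statistic reported by wilcox.test()
result <- wilcox.test(bryant, jordan)
stopifnot(w_by_ranks == w_by_pairs,
          w_by_ranks == unname(result$statistic))
```

Out of 15 × 15 = 225 possible pairs, Bryant's season is the higher one in 69, which is below the 112.5 expected under the null hypothesis but not far enough below to be significant at the 0.05 level.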

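If I had run wilcox.test(Jordan, Bryant, ...) with the samples swapped, the W statistic would change but the p-value would remain the same. Because both samples are small and contain no ties, R computes an exact p-value here, so the correct argument (which only applies to the normal approximation) makes no difference either: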

    > wilcox.test(Jordan, Bryant, correct = TRUE)

        Wilcoxon rank sum test

    data:  Jordan and Bryant
    W = 156, p-value = 0.0742
    alternative hypothesis: true location shift is not equal to 0
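The relationship between the two runs can be checked programmatically. This is a small sketch, again with the data typed in as vectors (the names forward and reverse are my own): the two W values should add up to the number of pairs, 15 × 15, and the two p-values should match.

```r
# Same data as in the table above, entered directly
bryant <- c(539, 1220, 996, 1485, 1938, 2019, 2461, 1557,
            1819, 2832, 2430, 2323, 2201, 1970, 2078)
jordan <- c(2313, 408, 3041, 2868, 2633, 2753, 2580, 2404,
            2541, 457, 2491, 2431, 2357, 1375, 1640)

# Run the test in both orders, mirroring the two calls above
forward <- wilcox.test(bryant, jordan, correct = FALSE)
reverse <- wilcox.test(jordan, bryant, correct = TRUE)

w_forward <- unname(forward$statistic)
w_reverse <- unname(reverse$statistic)

# Every (Bryant, Jordan) pair is won by one side, so the W values sum to n * m
stopifnot(w_forward + w_reverse == length(bryant) * length(jordan))

# A two-sided test gives the same p-value whichever sample comes first
stopifnot(isTRUE(all.equal(forward$p.value, reverse$p.value)))

print(c(W_forward = w_forward, W_reverse = w_reverse))
print(forward$p.value)
```

With larger samples or tied values, wilcox.test() falls back to a normal approximation, and in that case the correct argument would start to matter.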

**SOURCE**

- http://www.slideshare.net/mhsgeography/mann-whitney-u-test-2880296
- http://faculty.vassar.edu/lowry/utest.html
- http://www.r-bloggers.com/wilcoxon-mann-whitney-rank-sum-test-or-test-u/