$\def\floor#1{\left\lfloor #1\right\rfloor}$

# How to calculate the five number summary (correctly)?

The following summary yields a method compatible with the function fivenum of the statistical package R.

## A note on notation

Instead of giving concrete numerical examples, we will assume that the data are
$$x_1,\, x_2,\, \ldots,\, x_n$$
and that they have already been sorted in ascending order. The sample size will always be denoted by $$n$$.

## The basics

The five number summary of a data sample consists of 5 numbers:
• The minimum
• The first quartile ($$Q_1$$)
• The median ($$M$$)
• The third quartile ($$Q_3$$)
• The maximum

## The "floor" function

If $$x$$ is a real number, then $$\floor{x}$$ is the largest whole number not greater than $$x$$. The floor function is available in Microsoft Excel and in most programming languages and software packages. When entering this function from the keyboard, one typically types
floor(x)
instead of $$\floor{x}$$.

### Examples

• $$\floor{5}=5$$
• $$\floor{3.14}=3$$
• $$\floor{-1}=-1$$
• $$\floor{-1.3}=-2$$
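In R, the examples above can be reproduced with the built-in floor function. This is only a small sketch; the variable name floor_examples is chosen here for illustration.

```r
# floor() rounds toward minus infinity, so -1.3 becomes -2, not -1.
floor_examples <- floor(c(5, 3.14, -1, -1.3))
floor_examples   # 5, 3, -1, -2

# For comparison, trunc() drops the fractional part and rounds toward zero.
trunc(-1.3)      # -1
```

The difference between rounding down and rounding toward zero only shows up for negative non-integers. Positions are always positive, so it does not affect the calculations that follow.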

## Sorting of the data

The first step is ordering the data from the smallest to the largest.

## Determining the positions of the five numbers

Let $$n$$ be the sample size. The positions in the sorted dataset are computed from $$n$$ only. The positions may be fractional, that is, they may fall between the actual data positions, which are the whole numbers $$1, 2, \ldots, n$$. Here are the positions, in order of increasing difficulty:
• The position of the minimum is 1;
• The position of the maximum is $$n$$;
• The position of the median is $$\frac{n+1}{2}$$ and it is fractional if $$n$$ is even;
• The position of $$Q_1$$ is $$n_4 = \frac{\floor{\frac{n+3}{2}}}{2}$$;
• The position of $$Q_3$$ is $$n+1-n_4$$.
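The list above translates almost line by line into code. The following R sketch uses a hypothetical helper, five_positions, which is not part of R, and assumes that n is at least 1.

```r
# Hypothetical helper (not part of R): positions of the five numbers
# for a sample of size n, computed from n alone.
five_positions <- function(n) {
  n4 <- floor((n + 3) / 2) / 2   # position of Q1
  c(min = 1, Q1 = n4, median = (n + 1) / 2, Q3 = n + 1 - n4, max = n)
}

five_positions(10)   # 1, 3, 5.5, 8, 10
```

For n = 10 only the median falls between two data positions; both quartiles sit exactly on positions 3 and 8.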

## How to calculate the five numbers from their position?

The general rule is simple:
• If a position is a whole number, the corresponding number equals the data value at that position;
• If a position is a fractional number, the corresponding number is obtained by averaging the data at the nearest two whole number positions.
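Every position in this method is either a whole number or ends in .5, so both cases of the rule can be covered by one expression: average the data at the position rounded down and at the position rounded up. The R sketch below illustrates this with a hypothetical helper, value_at, and made-up data.

```r
# Hypothetical helper: value at a (possibly fractional) position d
# in the already sorted data x.
value_at <- function(x, d) {
  # For a whole d, floor(d) and ceiling(d) coincide, so the average is x[d].
  0.5 * (x[floor(d)] + x[ceiling(d)])
}

x <- c(2, 4, 7, 11)   # illustrative data, already sorted
value_at(x, 3)        # 7
value_at(x, 2.5)      # (4 + 7) / 2 = 5.5
```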

## An interpretation of $$n_4$$

Let us elaborate on the meaning of the formula
$$n_4 = \frac{\floor{\frac{n+3}{2}}}{2}$$
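Since $$\frac{n+3}{2} = \frac{n+1}{2} + 1$$, and adding a whole number inside the floor function adds the same whole number to the result, this formula is equivalent to: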
$$n_4 = \frac{\floor{\frac{n+1}{2}}+1}{2}$$
The number $$m=\floor{\frac{n+1}{2}}$$ is the right-most whole-number position that does not lie to the right of the median: it is the position of the median itself when $$n$$ is odd, and the position just to its left when $$n$$ is even. Thus,
$$n_4 = \frac{m+1}{2}$$
That is, $$n_4$$ is the position of the median of all data to the left of the median (including the median itself when $$n$$ is odd). This gives an alternative method to compute the position of $$Q_1$$, summarized in the next section.
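The equivalence of the two formulas for $$n_4$$ can also be spot-checked numerically. The R sketch below compares them for sample sizes 1 to 1000; the range and the variable names are arbitrary choices made for illustration.

```r
n <- 1:1000
n4_direct <- floor((n + 3) / 2) / 2   # formula for the position of Q1
m <- floor((n + 1) / 2)               # median position rounded down
n4_via_m <- (m + 1) / 2               # median position of x_1, ..., x_m
all(n4_direct == n4_via_m)            # TRUE
```

All quantities involved are whole numbers or halves, which are represented exactly in floating point, so comparing with == is safe here.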

## Summary of the calculation of $$Q_1$$

1. Compute the position of the median first; this is $$\frac{n+1}{2}$$;
2. Round it down to the nearest integer; this is $$m$$ defined above, the right-most whole-number position not to the right of the median (it coincides with the position of the median when $$n$$ is odd);
3. Compute the position of the median of the data $$x_1,\,x_2,\,\ldots,\,x_m$$; this is $$n_4=\frac{m+1}{2}$$;
4. Compute $$Q_1$$: if $$n_4$$ is a whole number, $$Q_1 = x_{n_4}$$; if $$n_4$$ is fractional, $$Q_1 = \frac{1}{2}(x_{n_4'}+x_{n_4''})$$, where $$n_4'$$ and $$n_4''$$ are the two whole numbers nearest to $$n_4$$.
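The four steps can be sketched in R as follows. The helper q1_by_steps and the sample data are hypothetical and serve only as an illustration; the helper assumes a numeric vector without missing values, and its result is compared with the second element returned by fivenum.

```r
# Hypothetical helper following the four steps above.
q1_by_steps <- function(x) {
  x <- sort(x)                            # data must be in ascending order
  n <- length(x)
  median_pos <- (n + 1) / 2               # step 1: position of the median
  m <- floor(median_pos)                  # step 2: round it down
  n4 <- (m + 1) / 2                       # step 3: median position of x_1..x_m
  0.5 * (x[floor(n4)] + x[ceiling(n4)])   # step 4: value at position n4
}

x <- c(12, 5, 8, 21, 3, 17, 9)   # illustrative data, n = 7
q1_by_steps(x)                   # 6.5
fivenum(x)[2]                    # 6.5, the Q1 reported by fivenum
```

Here the sorted data are 3, 5, 8, 9, 12, 17, 21, the median position is 4, so $$m=4$$ and $$n_4=2.5$$, which gives the average of 5 and 8.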

### Examples

#### $$n=50$$

We compute $$n_4$$ (the position of the first quartile) first:
$$n_4 = \frac{\floor{\frac{50+3}{2}}}{2} = \frac{\floor{26.5}}{2} = \frac{26}{2} = 13.$$

| Number | Position |
|---|---|
| Minimum | 1 |
| Maximum | 50 |
| Median | $$\frac{50+1}{2}=25.5$$ |
| $$Q_1$$ | 13 |
| $$Q_3$$ | $$50+1-13=38$$ |

Thus, the positions of the five numbers are (in increasing order):
$$1,\, 13,\, 25.5,\, 38,\, 50$$
The five numbers are:
$$x_1,\, x_{13},\, \frac{x_{25}+x_{26}}{2},\, x_{38},\, x_{50}$$

#### $$n=65$$

We compute $$n_4$$ first:
$$n_4 = \frac{\floor{\frac{65+3}{2}}}{2} = \frac{\floor{34}}{2} = \frac{34}{2} = 17.$$

| Number | Position |
|---|---|
| Minimum | 1 |
| Maximum | 65 |
| Median | $$\frac{65+1}{2}=33$$ |
| $$Q_1$$ | 17 |
| $$Q_3$$ | $$65+1-17=49$$ |

Thus, the positions of the five numbers are (in increasing order):
$$1,\, 17,\, 33,\, 49,\, 65$$
The five numbers are:
$$x_1,\,x_{17},\,x_{33},\,x_{49},\,x_{65}$$

#### $$n=19$$

We compute $$n_4$$ first:
$$n_4 = \frac{\floor{\frac{19+3}{2}}}{2} = \frac{\floor{11}}{2} = \frac{11}{2} = 5.5.$$

| Number | Position |
|---|---|
| Minimum | 1 |
| Maximum | 19 |
| Median | $$\frac{19+1}{2}=10$$ |
| $$Q_1$$ | 5.5 |
| $$Q_3$$ | $$19+1-5.5=14.5$$ |

Thus, the positions of the five numbers are (in increasing order):
$$1,\, 5.5,\, 10,\, 14.5,\, 19$$
The five numbers are:
$$x_1,\,\frac{x_5+x_6}{2},\, x_{10},\, \frac{x_{14}+x_{15}}{2},\, x_{19}$$
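The three examples can be checked in R with the data $$1, 2, \ldots, n$$, for which every value equals its own position, so fivenum should return exactly the positions derived in each example. The choice of this particular data set is an illustration, not part of the method.

```r
# With x_i = i, each of the five numbers equals its position.
fivenum(1:50)   # 1, 13, 25.5, 38, 50
fivenum(1:65)   # 1, 17, 33, 49, 65
fivenum(1:19)   # 1, 5.5, 10, 14.5, 19
```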

## For the curious - the R language code of the fivenum

The following R session reveals the code of fivenum. The above explanation concerns the 3 lines of R code in the curly braces after the word "else":
> fivenum
function (x, na.rm = TRUE)
{
    xna <- is.na(x)
    if (na.rm)
        x <- x[!xna]
    else if (any(xna))
        return(rep.int(NA, 5))
    x <- sort(x)
    n <- length(x)
    if (n == 0)
        rep.int(NA, 5)
    else {

        n4 <- floor((n + 3)/2)/2
        d <- c(1, n4, (n + 1)/2, n + 1 - n4, n)
        0.5 * (x[floor(d)] + x[ceiling(d)])

    }
}

>

Apart from these 3 lines, the code mostly deals with removing missing data, sorting, and the case of an empty sample.
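The effect of the surrounding code can be seen by calling fivenum on data with a missing value and on an empty sample. The data below are made up for illustration; the behavior follows from the listing above.

```r
x <- c(3, NA, 1, 2)         # illustrative data with one missing value
fivenum(x)                  # NA is dropped first: 1, 1.5, 2, 2.5, 3
fivenum(x, na.rm = FALSE)   # five NA values, since a missing value is present
fivenum(numeric(0))         # five NA values, since the sample is empty
```

In the first call the remaining data are 1, 2, 3, so $$n=3$$ and $$n_4=\frac{\floor{3}}{2}=1.5$$, which explains the fractional quartiles.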