# How to calculate the five number summary (correctly)?

**Note:** This page explains the so-called *Method 2*, which includes the median in the calculation of the quartiles when \(n\) is odd. This method is used by the function *fivenum* of the statistical package *R*.

## A note on notation

Instead of giving concrete numerical examples, we will assume that the sorted data are
$$ x_1,\, x_2,\, \ldots,\, x_n $$

We also assume that the data has been sorted **in ascending order**. The sample size will always be denoted by \(n\).

## The basics

The five number summary of a data sample consists of 5 numbers:

- The minimum
- The first quartile (\( Q_1 \))
- The median (\( M \))
- The third quartile (\( Q_3 \))
- The maximum

## The "floor" function

If \(x\) is a real number, then \(\floor{x}\) is the largest whole number not greater than \(x\). The floor function is available in most programming languages and software packages, where one typically types

`floor(x)`

In Microsoft Excel, the same value is computed by `INT(x)`.

### Examples

- \( \floor{5}=5 \)
- \( \floor{3.14}=3 \)
- \( \floor{-1}=-1 \)
- \( \floor{-1.3}=-2 \)
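In R, the package this page refers to, `floor()` is vectorised, so the four examples above can be reproduced in a single call. This is a minimal sketch; `examples` is just an illustrative variable name:

```r
# floor() returns, for each element, the largest whole number not greater than it
examples <- c(5, 3.14, -1, -1.3)
floor(examples)   # returns 5 3 -1 -2
```

For negative inputs this differs from simply dropping the decimals: R's `trunc(-1.3)` is `-1`, whereas `floor(-1.3)` is `-2`.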

## Sorting of the data

The first step is ordering the data from the smallest to the largest.

## Determining the positions of the five numbers

Let \(n\) be the sample size. The positions in the sorted dataset **are computed based on \(n\) only**. The positions **may be fractional**, that is, they may fall **between** the real data positions, which are the whole numbers \(1, 2, \ldots, n\). Here are the positions, in order of increasing difficulty:

- The position of the minimum is 1;
- The position of the maximum is \(n\);
- The position of the median is \(\frac{n+1}{2}\); it is fractional if \(n\) is **even**;
- The position of \( Q_1 \) is \(n_4 = \frac{\floor{\frac{n+3}{2}}}{2}\);
- The position of \( Q_3 \) is \( n+1-n_4 \).
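The position formulas can be collected in a small R function. This is a sketch for illustration only: `five_positions` is a hypothetical name, not part of R, and it returns the positions in the usual summary order (minimum, \(Q_1\), median, \(Q_3\), maximum) rather than in the order of the list above:

```r
# Hypothetical helper: positions (not values) of the five numbers for sample size n
five_positions <- function(n) {
  n4 <- floor((n + 3) / 2) / 2   # position of Q1
  c(min    = 1,
    Q1     = n4,
    median = (n + 1) / 2,
    Q3     = n + 1 - n4,         # position of Q3, the mirror image of n4
    max    = n)
}

five_positions(19)   # min 1, Q1 5.5, median 10, Q3 14.5, max 19
```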

## How to calculate the five numbers from their positions?

The general rule is simple:

- If a position is a whole number, the corresponding number equals the data value at that position;
- If a position is fractional (with the formulas above it is then always a whole number plus one half), the corresponding number is the average of the data at the two nearest whole-number positions.
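In R the rule can be written without testing whether a position is whole: for a whole number \(p\), `floor(p)` and `ceiling(p)` coincide, so averaging `x[floor(p)]` and `x[ceiling(p)]` simply returns `x[p]`. The helper name `value_at` and the data below are hypothetical, chosen for this sketch:

```r
# Hypothetical helper: value of the sorted data x at position p (p may be fractional)
value_at <- function(x, p) {
  # whole p: floor(p) == ceiling(p), so this is x[p];
  # fractional p: average of the two nearest whole-number positions
  0.5 * (x[floor(p)] + x[ceiling(p)])
}

x <- sort(c(12, 3, 7, 9, 1, 15))   # hypothetical data; sorted: 1 3 7 9 12 15
value_at(x, 2)     # returns 3
value_at(x, 3.5)   # average of x[3] = 7 and x[4] = 9, returns 8
```

The same expression appears in the last line of R's own *fivenum* source, reproduced at the end of this page.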

## An interpretation of \(n_4\)

Let us elaborate on the meaning of the formula
$$ n_4 = \frac{\floor{\frac{n+3}{2}}}{2} $$

Since \(\frac{n+3}{2} = \frac{n+1}{2} + 1\) and \(\floor{y+1} = \floor{y} + 1\) for every real number \(y\), this formula is equivalent to:
$$ n_4 = \frac{\floor{\frac{n+1}{2}}+1}{2} $$
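The equivalence can also be spot-checked numerically. This R snippet is a sanity check, not a proof: it compares both formulas for every sample size from 1 to 1000. Exact comparison with `==` is safe here because every value is a whole number or a half, both exactly representable in floating point:

```r
n <- 1:1000
n4_a <- floor((n + 3) / 2) / 2         # formula used for the position of Q1
n4_b <- (floor((n + 1) / 2) + 1) / 2   # equivalent form
all(n4_a == n4_b)                      # returns TRUE
```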

The number \(m=\floor{\frac{n+1}{2}}\) is the right-most whole-number position **not to the right of the position of the median**. When \(n\) is odd, the position of the median is itself a whole number, so \(m\) **is** that position; including the median in this way is why the method explained here is *Method 2*. Thus,

$$ n_4 = \frac{m+1}{2} $$

That is, \(n_4\) is the position of the median of the data to the left of the median (including the median itself when \(n\) is odd). This gives an alternative way to compute the position of \(Q_1\).

## Summary of the calculation of \(Q_1\)

- Compute the position of the median first; this is \(\frac{n+1}{2}\);
- Round it down to the nearest integer; this gives \(m\) as defined above, the right-most whole-number position not to the right of the position of the median (it coincides with the position of the median when \(n\) is odd);
- Compute the position of the median of the data \(x_1,\,x_2,\,\ldots,\,x_m\); this is \(n_4=\frac{m+1}{2}\);
- Compute \(Q_1\): if \(n_4\) is a whole number, \(Q_1 = x_{n_4}\); if \(n_4\) is fractional, \(Q_1 = \frac{1}{2}\left(x_{\floor{n_4}}+x_{\floor{n_4}+1}\right)\), the average of the data at the **two whole-number positions** nearest to \(n_4\).
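The steps above translate almost line by line into R. In the sketch below, the function name `q1_lower_half` and the random test data are assumptions for illustration; it takes the median of \(x_1,\ldots,x_m\) with R's `median()` and compares the result with the second number returned by `fivenum()`:

```r
# Hypothetical helper: Q1 as the median of the first m sorted values
q1_lower_half <- function(x) {
  x <- sort(x)
  n <- length(x)
  m <- floor((n + 1) / 2)    # right-most whole position not to the right of the median
  median(x[seq_len(m)])      # median of x[1], ..., x[m]; includes the median when n is odd
}

set.seed(1)
y <- rnorm(19)               # hypothetical data, n = 19
q1_lower_half(y)
fivenum(y)[2]                # should print the same value
```

Up to floating-point rounding the two agree for every \(n \ge 1\), because the median of \(m\) sorted values sits at position \(\frac{m+1}{2} = n_4\).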

### Examples

#### \(n=50\)

We compute \( n_4 \) (the position of the first quartile) first:
$$ n_4 = \frac{\floor{\frac{50+3}{2}}}{2} = \frac{\floor{26.5}}{2} = \frac{26}{2} = 13. $$

| Number | Position |
|---|---|
| Minimum | 1 |
| Maximum | 50 |
| Median | \( \frac{50+1}{2}=25.5\) |
| \( Q_1 \) | 13 |
| \( Q_3 \) | \( 50+1-13=38 \) |

$$ 1,\, 13,\, 25.5,\, 38,\, 50 $$

The five numbers are:
$$ x_1,\, x_{13},\, \frac{x_{25}+x_{26}}{2},\, x_{38},\, x_{50} $$

#### \( n=65 \)

We compute \( n_4 \) first:
$$ n_4 = \frac{\floor{\frac{65+3}{2}}}{2} = \frac{\floor{34}}{2} = \frac{34}{2} = 17. $$

| Number | Position |
|---|---|
| Minimum | 1 |
| Maximum | 65 |
| Median | \( \frac{65+1}{2}=33\) |
| \( Q_1 \) | 17 |
| \( Q_3 \) | \( 65+1-17=49 \) |

$$ 1,\, 17,\, 33,\, 49,\, 65 $$

The five numbers are:
$$ x_1,\,x_{17},\,x_{33},\,x_{49},\,x_{65} $$

#### \( n=19 \)

We compute \( n_4 \) first:
$$ n_4 = \frac{\floor{\frac{19+3}{2}}}{2} = \frac{\floor{11}}{2} = \frac{11}{2} = 5.5. $$

| Number | Position |
|---|---|
| Minimum | 1 |
| Maximum | 19 |
| Median | \( \frac{19+1}{2}=10 \) |
| \( Q_1 \) | 5.5 |
| \( Q_3 \) | \( 19+1-5.5=14.5 \) |

$$ 1,\, 5.5,\, 10,\, 14.5,\, 19 $$

The five numbers are:
$$ x_1,\,\frac{x_5+x_6}{2},\, x_{10},\, \frac{x_{14}+x_{15}}{2},\, x_{19} $$
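The three tables can be checked with R's `fivenum()` by feeding it the hypothetical data \(x_i = i\), that is, the whole numbers \(1, \ldots, n\). With such data each of the five numbers equals its own position (for a half position \(k + 0.5\), the average of \(k\) and \(k+1\) is \(k + 0.5\)):

```r
# With x = 1, 2, ..., n each value equals its position,
# so fivenum() returns the positions listed in the tables above
for (n in c(50, 65, 19)) {
  cat("n =", n, ":", fivenum(seq_len(n)), "\n")
}
# n = 50 : 1 13 25.5 38 50
# n = 65 : 1 17 33 49 65
# n = 19 : 1 5.5 10 14.5 19
```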

## For the curious: the R source code of *fivenum*

The following R session reveals the code of *fivenum*. The explanation above concerns the **3 lines of R code** inside the curly braces after the word "else":

```r
> fivenum
function (x, na.rm = TRUE)
{
    xna <- is.na(x)
    if (na.rm)
        x <- x[!xna]
    else if (any(xna))
        return(rep.int(NA, 5))
    x <- sort(x)
    n <- length(x)
    if (n == 0)
        rep.int(NA, 5)
    else {
        n4 <- floor((n + 3)/2)/2
        d <- c(1, n4, (n + 1)/2, n + 1 - n4, n)
        0.5 * (x[floor(d)] + x[ceiling(d)])
    }
}
<environment: namespace:stats>
```
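The lines before the `else` handle missing values and empty input. A short illustration with hypothetical data `v`, chosen for this example only (after the `NA` is dropped, \(n = 3\) and the positions are 1, 1.5, 2, 2.5, 3):

```r
v <- c(4, NA, 1, 3)
fivenum(v)                  # NA dropped (na.rm = TRUE by default): returns 1 2 3 3.5 4
fivenum(v, na.rm = FALSE)   # an NA with na.rm = FALSE: returns NA NA NA NA NA
fivenum(numeric(0))         # empty input (n == 0): returns NA NA NA NA NA
```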