A histogram is a graphical representation of a set of data. It is useful for visually inspecting data for its range, distribution, location, scale, skewness, etc. There are many uses for a histogram, so you should know how to create one.
Let’s explore a set of data and create default histograms using a variety of methods. If you have a way to create a histogram using some other method or software package please send it over and we’ll add it to the article.
The Data
This is just a completely made-up dataset.
5, 7, 3, 4, 3, 6, 9, 2, 4, 3, 6, 9, 1, 3, 4, 7, 4, 5, 4, 3
The values range from a low of 1 to a high of 9. All are integers.
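A quick way to confirm those facts before drawing anything is to let a few lines of code summarize the data. This is only a sketch in Python using the standard library; the variable names are mine and not part of any of the tools discussed below.

```python
from collections import Counter

data = [5, 7, 3, 4, 3, 6, 9, 2, 4, 3, 6, 9, 1, 3, 4, 7, 4, 5, 4, 3]

n = len(data)                      # number of observations
low, high = min(data), max(data)   # smallest and largest values
all_integers = all(isinstance(v, int) for v in data)
tally = Counter(data)              # how often each distinct value occurs

print(n, low, high, all_integers)
for value in sorted(tally):
    print(value, tally[value])
```

The tally is already a histogram of sorts: one bin per integer.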
A Manually Created Histogram
Draw and label the x- and y-axes of the chart. For the x-axis, a span from zero to ten will encompass all the values in the dataset. For the y-axis, we can include integers starting at zero. We could go up to 20, since that is the number of values in the dataset, yet the values are not all the same, so no single bin will hold all 20. Let’s start with zero to ten.
The x-axis is our values (test scores, plant heights rounded to centimeters, whatever the dataset represents). The y-axis is the count of values within the specific bin.
Determining the bin size is a bit of a flexible process. In part it depends on what you want to learn about your data. If we want to know how many of each integer there are, then each bin is one integer. If the data included more significant digits, we could specify each bin as a range. For example, for the above dataset, we could use bins of 0 to 1, > 1 to 2, > 2 to 3, etc. I’m using greater-than signs to indicate that a value just above 1 on the number line (1.0001, for example) would be counted in the bin that ranges from >1 to 2. A value of exactly 1 would belong to the bin ranging from zero to 1.
Note that bin sizes do not need to be equal, yet a histogram is easier to interpret if they are all the same size. One fancy way to choose the number of bins, and thus the bin size, is known as Sturges’ rule. It may provide a good bin size when dealing with a larger dataset, or serve as a starting point for exploring your data.
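The edge rule described above (each bin includes its upper edge, and the first bin also includes zero) can be written down precisely. Here is a minimal Python sketch; `bin_for` and `edges` are illustrative names of my own, not functions from any package.

```python
from bisect import bisect_left

# Edges 0, 1, 2, ..., 10 give the bins 0 to 1, >1 to 2, >2 to 3, and so on.
edges = list(range(0, 11))

def bin_for(value, edges):
    """Return (lower, upper) for the bin that holds value.

    Every bin includes its upper edge; only the first bin
    also includes its lower edge.
    """
    if not edges[0] <= value <= edges[-1]:
        raise ValueError(f"{value} is outside the binned range")
    # bisect_left finds the first edge >= value, which is the bin's upper edge.
    i = max(bisect_left(edges, value), 1)
    return edges[i - 1], edges[i]

print(bin_for(1, edges))       # exactly 1 stays in the bin 0 to 1
print(bin_for(1.0001, edges))  # just above 1 moves to the bin >1 to 2
```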
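For reference, Sturges’ rule suggests a number of bins of about 1 + log2(n) for n observations, rounded up, and the bin width follows from dividing the data range by that number. A small Python sketch, with the function name being my own choice:

```python
import math

def sturges_bin_count(n):
    """Suggested number of bins for n observations: ceil(log2(n)) + 1."""
    return math.ceil(math.log2(n)) + 1

n = 20                    # number of values in the dataset above
k = sturges_bin_count(n)  # suggested number of bins
width = (9 - 1) / k       # data range (1 to 9) divided by the bin count

print(k, round(width, 2))
```

For this dataset the rule suggests six bins, each a little over one unit wide, so it is a starting point rather than a reason to abandon round bin edges.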
Let’s use 5 bins: 0 – 1, >1 – 3, >3 – 5, >5 – 7, >7 – 9.
With our bins established, we simply count the number of values that fall within each bin. We can create a frequency table to lay out the count of values within each bin.
Bin (upper limit) | Count |
---|---|
1 | 1 |
3 | 6 |
5 | 7 |
7 | 4 |
9 | 2 |
Between zero and one, there is only one value (a 1), thus the bin’s count is one. There are six values that are a 2 or a 3, and so on.
Now, for each bin, draw a rectangle that spans the bin range and rises to the bin’s count.
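The same counting can be checked with a short script. This Python sketch uses the five bins chosen above, with the same edge rule (upper edge included, and zero included in the first bin); `frequency_table` is a name I made up for the illustration.

```python
data = [5, 7, 3, 4, 3, 6, 9, 2, 4, 3, 6, 9, 1, 3, 4, 7, 4, 5, 4, 3]
edges = [0, 1, 3, 5, 7, 9]  # bins: 0-1, >1-3, >3-5, >5-7, >7-9

def frequency_table(values, edges):
    """Count the values per bin; each bin includes its upper edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            lower, upper = edges[i], edges[i + 1]
            on_first_lower_edge = i == 0 and v == lower
            if on_first_lower_edge or lower < v <= upper:
                counts[i] += 1
                break
    return counts

counts = frequency_table(data, edges)
for upper, count in zip(edges[1:], counts):
    print(upper, count)
```

The counts should add up to the 20 values in the dataset, which is a handy check on any hand tally.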
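If pen and paper are not at hand, the rectangles can be approximated in plain text. This Python sketch prints one row per bin with one `#` per counted value, so the bars run sideways. It takes the counts from the frequency table above, and the helper name `render_bars` is my own.

```python
labels = ["0 - 1", ">1 - 3", ">3 - 5", ">5 - 7", ">7 - 9"]
counts = [1, 6, 7, 4, 2]

def render_bars(labels, counts, mark="#"):
    """Return one text line per bin: right-aligned label, bar, count."""
    width = max(len(label) for label in labels)
    return [
        f"{label:>{width}} | {mark * count} {count}"
        for label, count in zip(labels, counts)
    ]

lines = render_bars(labels, counts)
print("\n".join(lines))
```

Every text row has the same thickness, so this rendering hides the fact that the first bin is narrower than the others.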
Here’s my rough hand-drawn histogram.
Maybe I should have selected even numbers for the bins to keep the bin widths the same. Maybe you can create one starting with the bin zero to two and send over a photo of it.
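For anyone trying the equal-width version, here is what the counts would be with bins of 0 to 2, >2 to 4, and so on up to 10. With equal widths, the bin index can be computed directly instead of searching through edges. This is a Python sketch under the same edge rule as before (a value on an edge goes in the lower bin).

```python
import math

data = [5, 7, 3, 4, 3, 6, 9, 2, 4, 3, 6, 9, 1, 3, 4, 7, 4, 5, 4, 3]
bin_width = 2
n_bins = 5  # 0-2, >2-4, >4-6, >6-8, >8-10

counts = [0] * n_bins
for v in data:
    # ceil(v / width) - 1 puts edge values (2, 4, ...) in the lower bin;
    # max(..., 0) keeps a value of exactly 0 in the first bin.
    index = max(math.ceil(v / bin_width) - 1, 0)
    counts[index] += 1

print(counts)
```

Notice how much the shape changes compared with the first set of bins, even though the data are identical.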
Excel for Mac (version 16 – current as of May 2021)
I added the data to a column in Excel and selected the dataset values. On the ribbon, click the Insert tab, then click the Statistical chart icon and, under Histogram, select Histogram.
Excel selected three bins and created the plot.
The bins are 2.8 units wide and range from 1 up to 9.4. Of course, you can edit the titles, labels, and even the bin sizes.
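The 2.8 width is consistent with Scott’s normal reference rule, which Microsoft’s documentation names as the automatic bin setting for histogram charts. In its common form the rule sets the width to 3.5 × s / n^(1/3), where s is the sample standard deviation and n the number of values. The Python sketch below applies that formula to the dataset. How Excel rounds the result and places the first bin at the minimum value is my assumption, based only on the plot described above.

```python
import math
import statistics

data = [5, 7, 3, 4, 3, 6, 9, 2, 4, 3, 6, 9, 1, 3, 4, 7, 4, 5, 4, 3]

n = len(data)
s = statistics.stdev(data)      # sample standard deviation
width = 3.5 * s / n ** (1 / 3)  # Scott's normal reference rule

# Assumption: bins start at the minimum and continue until the maximum is covered.
n_bins = math.ceil((max(data) - min(data)) / width)
upper_limit = min(data) + n_bins * round(width, 1)

print(round(width, 1), n_bins, round(upper_limit, 1))
```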
Google Sheets with XLMiner Analysis ToolPak (May 2021)
Wondering how Sheets would deal with this graphic, I found that I needed to install an add-on to have histogram functionality. That didn’t take long. I added the dataset in a column, selected the data, opened the add-on, and then found the Histogram function.
It asked for the range of the data, a range for the bin values (I used the same as in my manual approach above), and where to locate the frequency table. It also created the graph.
A bit nicer than my hand-drawn one, yet it contains the same information.
R software (version 4.0.4)
The command hist has many options and ways to alter the final graphic, yet here let’s just use all defaults. The command in R is:
hist(c(5, 7, 3, 4, 3, 6, 9, 2, 4, 3, 6, 9, 1, 3, 4, 7, 4, 5, 4, 3))
and the output is
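It looks a bit odd, as the first bin holds anything 2 or below: it has a count of two, for the single 1 and the single 2 in the dataset. The remaining bins cover single integers: >2 to 3, then >3 to 4, etc. This happens because, by default, hist closes each bin on the right (right = TRUE) and also includes the lowest break in the first bin (include.lowest = TRUE).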
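The bar heights in the R plot can be reproduced outside R. This Python sketch assumes breaks at 1, 2, ..., 9 (inferred from the plot description, not read out of R) and mimics hist’s defaults of right-closed bins (right = TRUE) with the lowest break counted in the first bin (include.lowest = TRUE).

```python
from bisect import bisect_left

data = [5, 7, 3, 4, 3, 6, 9, 2, 4, 3, 6, 9, 1, 3, 4, 7, 4, 5, 4, 3]
breaks = list(range(1, 10))  # assumed breaks: 1, 2, ..., 9

counts = [0] * (len(breaks) - 1)
for v in data:
    # bisect_left gives the position of the first break >= v (the bin's upper break).
    # max(..., 1) keeps a value equal to the lowest break in the first bin.
    index = max(bisect_left(breaks, v), 1) - 1
    counts[index] += 1

print(counts)
```

The first count covers the closed interval from 1 to 2, which is why the 1 and the 2 share a bar while every other integer gets its own.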
Mathematica (Mac version 12.2)
The built-in command Histogram has a vast array of options to craft and format the histogram. Here again I am just using all defaults, except that I did add axis labels. The command I used is:
Histogram[{5, 7, 3, 4, 3, 6, 9, 2, 4, 3, 6, 9, 1, 3, 4, 7, 4, 5, 4,3}, AxesLabel -> {"Values", "Count"}]
and the output then appears as
Cool, the package chose the bins in groups of two, from 0 to 2, >2 to 4, etc.
Summary
Here are five ways to create a histogram of the same dataset. Note how different each resulting histogram appears based on how the bins were assigned. When using histograms, I often alter the bins a few times to explore the dataset. Of course, with ample data each of the above methods would likely create a very similar looking histogram.
Finally, if you have Minitab, Numbers, Item Software, MATLAB, or one of the many other software packages out there, please use the dataset above and create a histogram. Send me the software name and version, a brief description of how to create the histogram, and the resulting image. I’ll add them to the article above and, with your permission, include attribution to you for your contribution.