Sometimes we just need a simple plot of a few data points. When there is scant data, a histogram or box plot is just not informative. This is a great use for a one-dimensional scatter plot, dot plot, or what is called a strip chart in R.
The basic idea is to see where the data lie along a line. For example, let's say we have 20 times to first failure. A table of numbers is not all that helpful. We could explore them using a cumulative distribution plot (Weibull analysis), yet it would be difficult to fit a distribution with so little data.
Let’s turn to a strip chart to get a look at the data.
We’ve just launched a product and now have reports of the first 20 field failures. Here’s the data in days.
8 56 3 15 60 4 13 10 1 5 11 1 1 19 1 6 12 12 6 9
A summary of the data, in days:
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 3.75 8.50 12.65 12.25 60.00
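To follow along in R, the failure times can be typed in as a numeric vector. The sketch below assumes the name x, matching the commands used later, and should reproduce the summary table above.

```r
# Days to first failure for the first 20 field failures
x <- c(8, 56, 3, 15, 60, 4, 13, 10, 1, 5, 11, 1, 1, 19, 1, 6, 12, 12, 6, 9)

# Min, quartiles, median, mean, and max
summary(x)
```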
We enjoyed at least one failure on the first day, and one system survived 60 days before failing. Yet there is more to learn here. A scan through the data shows 3 failures occurred on day 1.
Plotting the Data
Creating a strip chart is rather straightforward. Just draw a line for the x-axis and add tick marks that cover the range of the data. For the sample data above, a line that spans from 0 to 60 is fine.
Then mark the location of each data point along the line. (I’m using R, where each data point is marked with a small square.)
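The manual recipe above can be sketched in base R graphics using only the data. The function name draw_strip is hypothetical; it draws a 0 to 60 axis with tick marks every 10 days, then places an open square (pch = 0) at each failure time, all at the same height.

```r
x <- c(8, 56, 3, 15, 60, 4, 13, 10, 1, 5, 11, 1, 1, 19, 1, 6, 12, 12, 6, 9)

# Hypothetical helper: a bare-bones strip chart drawn by hand
draw_strip <- function(values, from = 0, to = 60) {
  plot.new()
  plot.window(xlim = c(from, to), ylim = c(0, 2))
  axis(1, at = seq(from, to, by = 10))              # the line plus tick marks
  points(values, rep(1, length(values)), pch = 0)   # pch = 0 is an open square
  invisible(values)
}

draw_strip(x)
```

Because every point sits at the same height, repeated values land exactly on top of each other, just as in the default stripchart(x) plot.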
In R I loaded the data into a vector named x and ran the command stripchart(x) to create this plot.
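Now, we do know from inspecting the dataset that there are 3 points at day one, so this plot, using the default settings, is hiding some of the data: overlapping marks are simply drawn on top of one another. Let’s try a plot that jitters the marks, nudging each one up or down by a small random amount so overlapping points separate.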
stripchart(x, method = "jitter", jitter = 0.1) produces the following plot.
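Jittering is random, so two runs of the command above will not look identical. As far as I know, stripchart() shifts each point's vertical position by a uniform random amount of up to jitter in either direction; the sketch below does the same by hand with runif(), with set.seed() fixed so the plot is repeatable. The names amount and y are illustrative.

```r
x <- c(8, 56, 3, 15, 60, 4, 13, 10, 1, 5, 11, 1, 1, 19, 1, 6, 12, 12, 6, 9)

set.seed(42)    # jitter is random; a fixed seed makes the plot repeatable
amount <- 0.1   # same spread as jitter = 0.1 in the stripchart() call

# Every point starts at height 1 and is nudged by a uniform random offset
y <- 1 + runif(length(x), min = -amount, max = amount)

plot(x, y, pch = 0, ylim = c(0.5, 1.5), yaxt = "n",
     xlab = "Days to first failure", ylab = "")
```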
Another method is to stack repeated data points. This makes it clear how many failures occurred on any day with more than one.
stripchart(x, method = "stack", offset = 0.5) produces the following plot.
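Stacking can be sketched the same way: each repeated value is raised by one offset step per earlier occurrence. Here ave() with seq_along numbers the occurrences of each distinct value (1 for the first, 2 for the second, and so on). This only approximates method = "stack"; to my understanding stripchart() also scales the step by the plotting symbol size, so its spacing will not match exactly. The names level and y are illustrative.

```r
x <- c(8, 56, 3, 15, 60, 4, 13, 10, 1, 5, 11, 1, 1, 19, 1, 6, 12, 12, 6, 9)

# Occurrence number of each value within its group of identical values
level <- ave(x, x, FUN = seq_along)

offset <- 0.5
y <- 1 + (level - 1) * offset   # repeats climb upward in steps of `offset`

plot(x, y, pch = 0, yaxt = "n", ylab = "", xlab = "Days to first failure")
```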
Hmm, there are actually four failures on day one, one more than I counted when scanning the dataset.
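Rather than scanning by eye, a frequency table counts the repeats directly. A minimal sketch using base R's table(); counts is an illustrative name.

```r
x <- c(8, 56, 3, 15, 60, 4, 13, 10, 1, 5, 11, 1, 1, 19, 1, 6, 12, 12, 6, 9)

counts <- table(x)   # how many failures occurred on each distinct day
counts[counts > 1]   # keep only the days with more than one failure
```

For this dataset it lists day 1 four times and days 6 and 12 twice each.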
All of these plots do show more failures earlier and just a few occurring later. This may indicate a declining hazard rate over time.
Compare to Histogram or Box Plot
For almost any dataset I would first have tried a histogram, so let’s see what a basic histogram reveals. Here I’m using the command hist(x) in R.
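To see where detail goes, the bins can be inspected without drawing anything: hist(..., plot = FALSE) returns the break points and counts. With R's default Sturges rule, this data should get breaks at 0, 10, 20, and so on up to 60, with 12 of the 20 failures in the first bin, so the cluster on day one is absorbed into a single bar. The name h is illustrative.

```r
x <- c(8, 56, 3, 15, 60, 4, 13, 10, 1, 5, 11, 1, 1, 19, 1, 6, 12, 12, 6, 9)

h <- hist(x, plot = FALSE)   # compute the default bins without plotting
h$breaks                     # bin edges
h$counts                     # failures per bin
```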
We could still draw the declining hazard rate conclusion, yet the histogram carries less information than the strip charts above, since grouping values into bins hides the individual failure times.
How about a box plot? What would it reveal? Still in R, boxplot(x) creates the following plot.
Here we do see that most failures occur early, yet we again lose some of the information provided by the strip chart.
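Similarly, boxplot.stats() shows what the box plot keeps: five numbers plus any points flagged as outliers. For this data it should report whiskers at 1 and 19, hinges at 3.5 and 12.5 around the median of 8.5, and the two late failures (56 and 60 days) as outliers. The hinges come from fivenum(), so they differ slightly from the quartiles in the summary table earlier. The name b is illustrative.

```r
x <- c(8, 56, 3, 15, 60, 4, 13, 10, 1, 5, 11, 1, 1, 19, 1, 6, 12, 12, 6, 9)

b <- boxplot.stats(x)
b$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
b$out     # values drawn individually as outliers
```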
A strip chart is handy when taking a look at a small set of one-dimensional data. It shows all the data, and since there aren’t too many points cluttering the plot, it is informative.
Histograms and box plots are not quite as useful for small datasets. So my recommendation is to turn to a strip chart when first plotting a small dataset.
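To compare the three views directly, base R can place them side by side. A minimal sketch using par(mfrow = ...) to split the plotting device into one row of three panels; the panel titles are my own.

```r
x <- c(8, 56, 3, 15, 60, 4, 13, 10, 1, 5, 11, 1, 1, 19, 1, 6, 12, 12, 6, 9)

old <- par(mfrow = c(1, 3))   # one row, three panels; save the old setting
stripchart(x, method = "stack", offset = 0.5, main = "Strip chart")
hist(x, main = "Histogram", xlab = "Days to first failure")
boxplot(x, main = "Box plot")
par(old)                      # restore the previous layout
```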