Small Multiples for Characterization

In the last edition of R for Engineering, we learned how to draw small multiple plots in R and harness the power of comparison. We went from a busy graph to being able to use ggplot’s faceting functions to create a small multiples plot. If you need a recap, here’s a link to the last edition.

That is to say, nature’s laws are causal; they reveal themselves by comparison and difference, and they operate at every multi-variate space-time point.
– Edward Tufte

Small multiples have many uses in engineering, but I use them most for characterization and diagnosis. In my line of work, quality engineering, the ability to diagnose problems in physical systems (both product and machine/process-related) is a critical skill, and I would go as far as to say that it is a critical skill in any engineering discipline.

When it comes to characterizing or diagnosing physical systems, I tend to follow a framework set up by a group of experts called The New Science of Fixing Things. I will not go into the nitty-gritty of their framework, as this newsletter is not necessarily the right format for that, but I will say that it is very powerful and relies heavily on small multiples and on what’s called the process of elimination (or progressive search). There is a great deal more to the approach; I highly recommend their wonderful and very detailed book Diagnosing Performance and Reliability, written by David Hartshorne.

For characterizing a process, it’s useful to start with what they call Matryoshka Characterization*. Think of it as a nested characterization strategy: as a first step, you design a study to observe the behavior of the product or process characteristic you’re interested in, and you do this in a bottom-up fashion. The study design has to be intentional for it to be useful and to provide insight into what’s happening* or information for subsequent characterization steps.

The bottom-up approach is based on capturing behavior by stratifying the observations into four groups, starting at the bottom:

Elemental variation* – within one machine cycle (this can be either within one piece or between pieces within one machine cycle)
Cyclical variation* – consecutive machine cycles within the same manufacturing structure
Structural variation* – between parallel manufacturing processing structures (e.g. multiple lines)
Temporal variation* – between time periods
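
To make the nesting concrete, here is a minimal sketch of how such a study layout could be laid out in R before any data is collected. The level counts (one unit per cycle, three cycles, five lines, three time periods) and all labels are placeholders I chose for illustration, not a prescribed design from the book.

```r
# Hypothetical study layout: one row per planned observation.
# All labels and level counts below are illustrative assumptions.
study_plan <- expand.grid(
    Unit  = 1,                    # elemental: units within one machine cycle
    Cycle = 1:3,                  # cyclical: consecutive machine cycles
    Line  = paste("Line", 1:5),   # structural: parallel manufacturing lines
    Time  = c("T1", "T2", "T3"),  # temporal: time periods
    stringsAsFactors = FALSE
)

# Empty column to be filled in as measurements are taken
study_plan$Response <- NA_real_

nrow(study_plan)   # number of planned observations
head(study_plan)
```

Because expand.grid() varies the first column fastest, the rows come out in the same bottom-up order as the list above: units within cycles, cycles within lines, lines within time periods.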

After you’ve carried out the study, small multiples plots are, you guessed it, just the right technique for observing system behavior graphically.

Going back to the bond strength example we’ve been discussing: the small multiples plot we created in the last edition was essentially a multivari plot, whose main purpose is to answer the question “which group sees the most variation?”.

This time we are going to use the draw_multivari_plot() function from the sherlock package, which is a quick and easy way to draw a multivari plot (take a look at the code here).

# WEEK 005: SMALL MULTIPLES, HUGE ADVANTAGE

# 0. LOAD LIBRARIES ----
library(tidyverse)
library(sherlock)


# 1. READ IN DATA ----
bond_strength_wide <- load_file("https://raw.githubusercontent.com/gaboraszabo/datasets-for-sherlock/main/bond_strength_wide.csv", 
                                filetype = ".csv")

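# 2. DATA TRANSFORMATION ----
# The wide file has one row per bonding cycle (9 rows) and one column per line
bond_strength_long <- bond_strength_wide %>%
    # label the three consecutive bonding cycles (1-3) within each time period
    mutate(Cycle = rep(1:3, times = 3) %>% as_factor()) %>%
    # stack the five line columns (columns 2 to 6) into Line / Bond_Strength pairs
    pivot_longer(cols = 2:6, names_to = "Line", values_to = "Bond_Strength") %>%
    # keep only the line number, e.g. "Line 3" becomes "3"
    mutate(Line = Line %>% str_remove("Line ")) %>%
    arrange(Time, Line)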


# 3. MULTIVARI PLOT ----

bond_strength_long %>% 
    draw_multivari_plot(response = Bond_Strength, 
                        factor_1 = Cycle, 
                        factor_2 = Line, 
                        factor_3 = Time)

A multivari plot of cycle by line by time.
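
If you’d rather stay with plain ggplot2 faceting, as in the last edition, a roughly comparable view can be sketched as follows. This is not what draw_multivari_plot() does internally, and the toy data frame below is made up purely so the example runs on its own; it is not the bond strength dataset.

```r
library(ggplot2)

# Made-up toy data: 2 time periods x 2 lines x 3 consecutive cycles
toy <- data.frame(
    Time     = rep(c("T1", "T2"), each = 6),
    Line     = rep(rep(c("A", "B"), each = 3), times = 2),
    Cycle    = rep(1:3, times = 4),
    Strength = c(10, 16, 12, 11, 17, 13, 12, 18, 11, 10, 15, 14)
)

# One panel per Time x Line combination; cycles run along the x axis
p <- ggplot(toy, aes(x = factor(Cycle), y = Strength, group = 1)) +
    geom_line() +
    geom_point() +
    facet_grid(Time ~ Line, labeller = label_both) +
    labs(x = "Cycle", y = "Strength")

p
```

All panels share the same y axis by default, which is what makes the comparison across cycles, lines and time periods possible at a glance.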

If you look closely, you may notice that all but one of the four groups in the bottom-up hierarchy are represented on the plot.

Cyclical variation is represented by the consecutive bonding cycles, structural by the five manufacturing lines and temporal by the three time periods. The only group not represented is elemental, as only one unit is bonded in each bonding cycle (there are ways to assess elemental variation by looking at energetic behavior within one unit, but I won’t go into detail on that today). This provides plenty of information to understand what’s happening*.

So, which group sees the most variation? It looks like most of the variation is already captured between two consecutive bonding cycles (look at lines 3 and 5 at 2pm or lines 4 and 5 at 5pm). This is an adhesive bonding operation where a specific amount of adhesive is applied to the two mating surfaces (an injection molded stopcock and a tubing component), and the bond is then cured for a specific amount of time. The adhesive amount and the curing time are controlled.

The observed behavior (most of the variation can be seen in the cyclical group) provides a strong clue pointing not to the bonding process but to the inputs, i.e. the components, to the bonding process.
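
The plot is the primary tool here, but a rough numeric cross-check can back up what the eye sees. The sketch below compares the spread captured by each group using simple ranges. It is my own back-of-the-envelope summary, not a method taken from the book, and the toy numbers are made up for illustration (they are not the bond strength data).

```r
# Made-up toy data: 2 time periods x 2 lines x 3 consecutive cycles
toy <- data.frame(
    Time     = rep(c("T1", "T2"), each = 6),
    Line     = rep(rep(c("A", "B"), each = 3), times = 2),
    Cycle    = rep(1:3, times = 4),
    Strength = c(10, 16, 12, 11, 17, 13, 12, 18, 11, 10, 15, 14)
)

range_width <- function(x) diff(range(x))

# Cyclical: range across consecutive cycles within each Time x Line cell
cell_ranges <- aggregate(Strength ~ Time + Line, data = toy, FUN = range_width)

# Structural: range of the line averages within each time period
cell_means  <- aggregate(Strength ~ Time + Line, data = toy, FUN = mean)
line_spread <- aggregate(Strength ~ Time, data = cell_means, FUN = range_width)

# Temporal: range of the time period averages
time_means <- aggregate(Strength ~ Time, data = toy, FUN = mean)

variation <- c(
    cyclical   = mean(cell_ranges$Strength),
    structural = mean(line_spread$Strength),
    temporal   = range_width(time_means$Strength)
)

variation
names(which.max(variation))  # group capturing the largest spread
```

In this toy example the cyclical group dominates, which mirrors the pattern described above: the spread between consecutive cycles is much larger than the spread between lines or between time periods.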

While this barely scratches the surface of this powerful framework for diagnosis and characterization, I hope it provided enough information for you to start digging into both the methodology itself and how to create these small multiples plots in R. You can also check out The New Science of Fixing Things blog site, which has information-rich articles on the details of the framework and its application.

The package sherlock, which I built last year, focuses on ready-made small multiples type of graphs, specifically for this type of characterization and diagnosis. I will discuss various small multiples type of graphs in future editions.

Hope you’re enjoying the newsletter so far. I would love to get feedback from you on what you’d like to see discussed in future editions. So, don’t hesitate to reach out!

Resources:

R for Data Science reference book
sherlock package
draw_multivari_plot() documentation
Reference book Diagnosing Performance and Reliability, David J. Hartshorne and The New Science of Fixing Things

* Source: Diagnosing Performance and Reliability, David J. Hartshorne and The New Science of Fixing Things, 2019

About Gabor Szabo
