Being in a State of Flow(charting)

Today we look at how to draw simple flowcharts in R.

I think I am not far off when I say that flowcharts are an essential tool in the engineering toolbox. They provide a visual way to describe a set of activities, or a process if you will. This can range from listing sequential steps in a manufacturing process to laying out a project plan to describing a decision-making process (think decision trees).

“If you can’t describe what you are doing as a process, you don’t know what you’re doing.”
– W. Edwards Deming

It comes as no surprise that engineers love to use flowcharts to describe or document stuff. If you’re like me, you’ve probably used various Microsoft Office applications to draw flowcharts. I, for example, have mastered creating flowcharts in PowerPoint over the years. Some prefer Visio or maybe some other application (Figma, Miro and the like).

It should also not come as a surprise that it is possible to make great-looking flowcharts in R, as there are various packages out there that facilitate it, DiagrammeR being one of them. What I have found is that a lot of times, especially in the case of more complex flowcharts, it is easier to just make one using your favorite desktop application, and that’s what I’ve done myself over the years. It’s still worth taking a look at DiagrammeR as it is highly customizable and has a ton of functionality.

As for simple flowcharts: I recently stumbled upon a fairly new package called ggflowchart developed by Nicola Rennie. ggflowchart is a great package that allows you to create simple flowcharts with ease. Let’s go through a couple of examples!

To get started, let’s install ggflowchart and call the libraries that we will be using. Click here for the full script.

# EDITION 008: FLOWCHARTS

# 0. INSTALL PACKAGE GGFLOWCHART AND LOAD LIBRARIES ----

install.packages("ggflowchart")

library(tidyverse)
library(sherlock)
library(ggflowchart)

Using the package is really easy: the ggflowchart() function takes a data frame with two columns. One is for “from” node names, the other is for “to” node names (in other words, this is how you specify the directionality of the flow). Let’s look at a simple example.

# 1. GGFLOWCHART() FUNCTION ----

# 1.1 A SIMPLE EXAMPLE ----
simple_flow <- tibble(from = c("Step 1", "Step 2"),
                      to   = c("Step 2", "Step 3"))

ggflowchart(data = simple_flow)

We’ve simply created a flowchart consisting of three steps, Step 1, Step 2 and Step 3, and specified the flow. This is what the end result looks like. Pretty sweet.

OK, let’s look at a somewhat more complex example: a specific kind of decision tree called a search tree* (or diagnostic tree), used in the diagnosis of product quality or reliability-related problems. The specific search tree we are going to build is for the process characterization example discussed in previous editions: 3) A Pivotal Moment, 4) Small Multiples, Huge Advantage and 5) Small Multiples for Characterization.

Let’s see what the code looks like. Note that for this decision tree, we did some customization such as text size, text color, arrow color and such.

# 1.2 DECISION TREE EXAMPLE ----
# Note: the second "Higher family group " ends with a space on purpose,
# so that it is treated as a separate node from the first one
search_tree <- tibble(from = c(rep("What drives variation in bond strength?", times = 2),
                               "Higher family group", "Higher family group",
                               "Higher family group ", "Higher family group "),
                      to   = c("Elemental variation", "Higher family group",
                               "Cyclical variation", "Higher family group ",
                               "Structural variation", "Temporal variation"))


ggflowchart(data = search_tree, text_size = 3.5, 
            x_nudge = 0.35, text_colour = "grey30", 
            arrow_colour = "grey50", color = "grey30")

OK, time to see what the object looks like:

Not too shabby for a quick and dirty function call, right?
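
An alternative to relying on a trailing space to tell the two "Higher family group" nodes apart is to give every node a short unique ID and supply the display text separately. As far as I can tell from the package documentation, the node_data argument accepts a name column to join on and an optional label column for the text shown in each box. The sketch below assumes that behavior; the node IDs (root, hfg_1 and so on) are made up for illustration.

```r
# Edge list using short, unique node IDs (hypothetical names chosen for this sketch)
search_edges <- data.frame(
  from = c("root", "root", "hfg_1", "hfg_1", "hfg_2", "hfg_2"),
  to   = c("elemental", "hfg_1", "cyclical", "hfg_2", "structural", "temporal"),
  stringsAsFactors = FALSE
)

# Display text for each node; two different nodes share the same label
search_nodes <- data.frame(
  name  = c("root", "elemental", "hfg_1", "cyclical",
            "hfg_2", "structural", "temporal"),
  label = c("What drives variation in bond strength?",
            "Elemental variation", "Higher family group",
            "Cyclical variation", "Higher family group",
            "Structural variation", "Temporal variation"),
  stringsAsFactors = FALSE
)

# Every node used in the edge list should have a row in the node table
stopifnot(all(c(search_edges$from, search_edges$to) %in% search_nodes$name))

# Draw the chart only if ggflowchart is installed
# (assumes node_data honors a "label" column, as described above)
if (requireNamespace("ggflowchart", quietly = TRUE)) {
  search_plot <- ggflowchart::ggflowchart(
    data = search_edges, node_data = search_nodes,
    text_size = 3.5, text_colour = "grey30",
    arrow_colour = "grey50", colour = "grey30"
  )
  print(search_plot)
}
```

Keeping IDs and labels apart also makes it easier to reword a box later without touching the edge list.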

A custom function to draw a simple flowchart

To save some time, one could create a custom function to speed up creating simple flowcharts where there is only one stream (no decision points, parallel flows or diversions). An example of this could be a simple manufacturing process.

I went ahead and wrote a simple function called draw_flowchart() that accomplishes what I describe above. The function takes a vector of the steps (in order of appearance) as well as an optional second vector with additional information about each of the steps. You can also add a title, choose between a horizontal and a vertical layout, and adjust the width and height of the step boxes through the x_nudge and y_nudge arguments.

# 2 FUNCTION TO CREATE SIMPLE FLOWCHARTS ----

draw_flowchart <- function(steps, category = NULL, horizontal = TRUE, chart_title = "", 
                           x_nudge = 0.1, y_nudge = 0.25) {
    
    process_steps <- steps
    
    process_steps_length <- length(process_steps)
    
    # parentheses matter here: 1:n-1 is parsed as (1:n) - 1
    from_column <- process_steps[1:(process_steps_length - 1)]
    to_column   <- process_steps[2:process_steps_length]
    
    process_flow_tbl <- tibble(from = from_column,
                               to   = to_column)
    
    if (!is.null(category)) {
        nodes <- tibble(name = process_steps, 
                        type = category %>% as_factor())  
    }
    
    # plotting ----
    if (is.null(category)) {
        flowchart <- ggflowchart(process_flow_tbl, horizontal = horizontal, 
                                 x_nudge = x_nudge, y_nudge = y_nudge) +
            scale_fill_sherlock()  
        
    } else {
        flowchart <- ggflowchart(process_flow_tbl, node_data = nodes, fill = type, horizontal = horizontal, 
                                 x_nudge = x_nudge, y_nudge = y_nudge) +
            scale_fill_sherlock()  
    }
    
    flowchart <- flowchart +
        labs(title = chart_title) +
        theme(plot.title   = element_text(color = "grey20"), 
              legend.title = element_blank(), 
              legend.text  = element_text(color = "grey20")
        )
    
    return(flowchart) 
    
}
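
The core of draw_flowchart() is turning an ordered vector of steps into a from/to table in which each step points to the one after it. Here is a minimal base R sketch of just that part. The helper name steps_to_edges() is hypothetical and not part of ggflowchart or sherlock.

```r
# Hypothetical helper: build a from/to edge table for a single-stream process
steps_to_edges <- function(steps) {
  # at least two steps are needed to form one edge
  stopifnot(length(steps) >= 2)
  data.frame(
    from = head(steps, -1),  # every step except the last
    to   = tail(steps, -1),  # every step except the first
    stringsAsFactors = FALSE
  )
}

edges <- steps_to_edges(c("Step 1", "Step 2", "Step 3"))
print(edges)
```

Using head() and tail() with a negative n avoids index arithmetic altogether, which sidesteps the operator-precedence trap in expressions such as 1:n-1 (R reads that as (1:n) - 1).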

We can now use this simple function to create a flowchart for a gravity die casting process. Let’s see what this looks like. The vector called casting_process_steps lists the process steps in order of appearance in the final flowchart, and the vector called location contains additional information about each of the steps (namely, which building of the plant each step takes place in).

# 3. TEST FUNCTION ----

# 3.1 STEPS OF A GRAVITY DIE CASTING PROCESS ----
casting_process_steps <- c("Core making", "Casting", "Shakeout", 
                           "Riser saw-off", "Cleaning", "Heat treatment", 
                           "Shot blasting", "Machining", "Final inspection")
# 3.2 METADATA (LOCATION WITHIN MANUFACTURING PLANT) ----
location <- c("Building 1", rep("Building 2", times = 6), rep("Building 3", times = 2))


# 3.3 DRAW_FLOWCHART() FUNCTION ----

# 3.3.1 No location information ----
draw_flowchart(steps = casting_process_steps, horizontal = FALSE, 
               chart_title = "Gravity Die Casting Process Steps")

# 3.3.2 With location information ----
draw_flowchart(steps = casting_process_steps, category = location, horizontal = FALSE, 
               chart_title = "Gravity Die Casting Process Steps")

When we run the function not specifying location in the category argument, this is what we get:

Here’s what we get when we specify location in the category argument:

This looks even better as it displays additional information about each of the steps (which building each one takes place in). It is easy to see how coloring the steps based on a categorical variable can provide more information about the process.
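
One thing to keep in mind: draw_flowchart() assumes that category has exactly one entry per step and that step names are unique (a repeated name would most likely be treated as the same node and bend the flow back on itself). A small check along these lines could be run before plotting. check_flow_inputs() is a hypothetical helper written for this sketch, not part of any package.

```r
# Hypothetical helper: return a character vector describing input problems
# (an empty vector means the inputs look fine)
check_flow_inputs <- function(steps, category = NULL) {
  problems <- character(0)
  if (anyDuplicated(steps) > 0) {
    problems <- c(problems, "step names are not unique")
  }
  if (!is.null(category) && length(category) != length(steps)) {
    problems <- c(problems, "category and steps differ in length")
  }
  problems
}

demo_steps <- c("Core making", "Casting", "Shakeout")

# One category per step: no problems reported
ok_result <- check_flow_inputs(demo_steps, c("Building 1", "Building 2", "Building 2"))

# One category missing: the length mismatch is reported
bad_result <- check_flow_inputs(demo_steps, c("Building 1", "Building 2"))

print(ok_result)
print(bad_result)
```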

I hope you enjoyed this week’s edition!

Resources for this week’s edition:

ggflowchart package documentation – special thanks to Nicola Rennie for having built it
GitHub repo
sherlock package

Resources for learning R:

R for Data Science: a very thorough reference book by Hadley Wickham, the creator of the tidyverse. Absolutely free of charge and full of relevant examples and practice tests.
ggplot2 reference book: a super detailed online book on the ggplot2 plotting package.
My favorite R course, Business Science DS4B101-R: I learned R mainly through this course. Highly recommended if you want to get up to speed and beyond in a relatively short time. It has everything one will need from data cleaning to data visualization to modeling. This course is especially useful for engineers trying to learn or get good at R as it heavily focuses on the fundamentals but goes way beyond just that. Note: this is an affiliate link, meaning you get a hefty discount if you purchase a course, and I receive a small commission.

* Source: Diagnosing Performance and Reliability, David J. Hartshorne and The New Science of Fixing Things, 2019

About Gabor Szabo
