3 Plots In R



One of R’s key strengths is what it offers as a free platform for exploratory data analysis; indeed, this is one of the things that attracted me to the language as a freelance consultant. Given the appeal of using charts and graphics to explain your findings to others, we’re going to provide a basic demonstration of how to plot categorical data in R.

The generic function plot handles X-Y plotting of R objects. For more details about the graphical parameter arguments, see par. For simple scatter plots, plot.default will be used.

Introducing the Scenario

R can create almost any plot imaginable and, as with most things in R, if you don’t know where to start, try Google. The Introduction to R curriculum summarizes some of the most used plots, but cannot begin to expose people to the breadth of plot options that exist. For example, plot3D, from Karline Soetaert, is an R package containing many functions for 2D and 3D plotting: scatter3D, points3D, lines3D, text3D, ribbon3D, hist3D, etc. In addition to the x, y (and z) values, an additional data dimension can be represented by a color variable (argument colvar).

Imagine we are looking at some customer complaint data. This consists of a log of phone calls (we can refer to them by number) and a reason code that summarizes why they called us. Assume we have several reason codes:

  • A – Bad Product
  • B – Shipping
  • C – Packaging
  • D – Other

Now that we’ve defined our defect codes, we can set up a data frame with the last couple of months of complaints.
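The original listing is not reproduced here, so the following is only a minimal sketch of what such a data frame could look like. The column names (call, reason, product), the product names, and all twelve rows are invented for illustration; they are not the author's data.

```r
# Hypothetical complaint log: every value below is invented for illustration.
complaints <- data.frame(
  call    = 1:12,                                    # call number
  reason  = factor(c("A", "B", "A", "C", "D", "A",
                     "B", "A", "C", "A", "B", "D"),
                   levels = c("A", "B", "C", "D")),  # reason codes A to D
  product = factor(c("Tissues", "Towels", "Tissues", "Napkins",
                     "Tissues", "Tissues", "Towels", "Napkins",
                     "Tissues", "Tissues", "Towels", "Tissues"))
)

str(complaints)   # confirm that reason and product are factors
head(complaints)  # look at the first few rows
```

Storing the codes as factors up front matters because, since R 4.0, data.frame no longer converts character columns to factors automatically.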

So, now that we’ve got a lovely set of complaints, let’s do some analysis.

How To Plot Categorical Data in R

A good starting point for plotting categorical data is to summarize the values of a particular variable into groups and plot their frequency. We’re going to do that here. The one liner below does a couple of things.

  • Use table() to summarize the frequency of complaints by product
  • Sort the table in decreasing order
  • Use barplot to generate a basic plot of the distribution
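The three steps can be sketched as a single call. The product column here is a made-up stand-in (hypothetical product names and counts), so the resulting bars illustrate the technique rather than reproduce the author's chart.

```r
# Hypothetical product column standing in for the complaint data frame.
complaints <- data.frame(
  product = c("Tissues", "Towels", "Tissues", "Napkins", "Tissues", "Tissues",
              "Towels", "Napkins", "Tissues", "Tissues", "Towels", "Tissues")
)

# table() counts complaints per product, sort() puts the largest count first,
# and barplot() draws one bar per product.
barplot(sort(table(complaints$product), decreasing = TRUE))
```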

Yielding the following chart

If we produced the products in similar quantities, we might want to check into what is going on with our paper tissue manufacturing lines.

How To Plot Categorical Data in R – Proportions

Another common ask is to look at the overlap between two factors. You can accomplish this by plotting one factor against the other, so that each combination of levels is shown separately. We’re going to use the plot function for this.
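A minimal sketch of this, again on invented data (the reason and product values are hypothetical). When both arguments are factors, plot draws a spineplot.

```r
# Hypothetical reason codes and products, invented for illustration.
complaints <- data.frame(
  reason  = factor(c("A", "B", "A", "C", "D", "A",
                     "B", "A", "C", "A", "B", "D")),
  product = factor(c("Tissues", "Towels", "Tissues", "Napkins",
                     "Tissues", "Tissues", "Towels", "Napkins",
                     "Tissues", "Tissues", "Towels", "Tissues"))
)

# With two factors, plot() produces a spineplot: bar widths follow the first
# factor and the shading inside each bar follows the second.
plot(complaints$reason, complaints$product,
     xlab = "Reason code", ylab = "Product")
```

The same counts can be inspected numerically with table(complaints$reason, complaints$product).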

Which produces:

Other Common Tables and Charts for Categorical Data

These two charts represent two of the more popular graphs for categorical data. The bar graph is a staple of visualizations for categorical data. The spineplot heat-map allows you to look at interactions between different factors. These are not the only things you can plot using R.

You can easily generate a pie chart for categorical data in R using the pie function.
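As a brief sketch, pie accepts the counts produced by table. The reason codes below are invented for illustration.

```r
# Hypothetical reason codes, invented for illustration.
reason <- factor(c("A", "B", "A", "C", "D", "A",
                   "B", "A", "C", "A", "B", "D"))

# One slice per reason code, sized by its share of the complaints.
pie(table(reason), main = "Complaints by reason code")
```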

Along the same lines, if your dependent variable is continuous, you can use side-by-side boxplots to compare it across the categories.

That concludes our introduction to how to plot categorical data in R. As you can see, there are a number of tools here which can help you explore your data…

Going Deeper…

Interested in Learning More About Categorical Data Analysis in R? Check Out

Tutorials

  • How To Run A Chi Square Test in R (earlier article)

We look at some of the ways R can display information graphically. This is a basic introduction to some of the basic plotting commands. It is assumed that you know how to enter data or read data files, which is covered in the first chapter, and it is assumed that you are familiar with the different data types.

In each of the topics that follow it is assumed that two different data sets, w1.dat and trees91.csv, have been read and defined using the same variables as in the first chapter. Both of these data sets come from the study discussed on the web site given in the first chapter. We assume that they are read using “read.csv” into variables w1 and tree:
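A minimal sketch of that setup follows. It assumes both files sit in the working directory and have a header row. Because the files are not distributed with this page, the sketch falls back to simulated stand-in data with the column names used on this page (vals, C, STBM, LFBM); the fallback values are invented and are not the study's data.

```r
set.seed(1)  # make the simulated fallback reproducible

# Read w1.dat if it is available; otherwise simulate a stand-in column.
if (file.exists("w1.dat")) {
  w1 <- read.csv(file = "w1.dat", header = TRUE)
} else {
  w1 <- data.frame(vals = round(rnorm(40, mean = 0.8, sd = 0.3), 1))
}

# Read trees91.csv if it is available; otherwise simulate a stand-in frame.
if (file.exists("trees91.csv")) {
  tree <- read.csv(file = "trees91.csv", header = TRUE)
} else {
  stbm <- runif(40, min = 0.1, max = 1)
  tree <- data.frame(C    = rep(1:4, each = 10),
                     STBM = stbm,
                     LFBM = 0.5 * stbm + rnorm(40, sd = 0.05))
}

names(w1)    # column names of w1
names(tree)  # column names of tree
```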

A strip chart is the most basic type of plot available. It plots the data in order along a line with each data point represented as a box. Here we provide examples using the w1 data frame mentioned at the top of this page, and the one column of the data is w1$vals.

To create a strip chart of this data use the stripchart command:
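A minimal sketch of the call. The w1 data frame here is a simulated stand-in (invented values), used only so the example runs without the original file.

```r
# Simulated stand-in for the real w1 data frame (values are invented).
set.seed(1)
w1 <- data.frame(vals = round(rnorm(40, mean = 0.8, sd = 0.3), 1))

# Draw every observation as a small box along one horizontal line.
stripchart(w1$vals)
```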

As you can see this is about as bare bones as you can get. There is neither a title nor axis labels. It only shows how the data looks if you were to put it all along one line and mark out a box at each point. If you would prefer to see which points are repeated you can specify that repeated points be stacked:

A variation on this is to have the boxes moved up and down so that there is more separation between them:

If you do not want the boxes plotted in the horizontal direction you can plot them in the vertical direction:
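These three variations can be sketched with the method and vertical arguments of stripchart. The data below are a simulated stand-in for w1 (invented values, rounded so that some repeat).

```r
# Simulated stand-in for the real w1 data frame (values are invented).
set.seed(1)
w1 <- data.frame(vals = round(rnorm(40, mean = 0.8, sd = 0.3), 1))

stripchart(w1$vals, method = "stack")   # repeated values are stacked
stripchart(w1$vals, method = "jitter")  # boxes are nudged up and down at random
stripchart(w1$vals, vertical = TRUE)    # plot along the vertical axis

# The options can be combined.
stripchart(w1$vals, vertical = TRUE, method = "jitter")
```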

Since you should always annotate your plots there are many different ways to add titles and labels. One way is within the stripchart command itself:

If you have a plot already and want to add a title, you can use the title command:
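Both approaches are sketched here. The title and label text are placeholders chosen for illustration, and w1 is a simulated stand-in for the real data.

```r
# Simulated stand-in for the real w1 data frame (values are invented).
set.seed(1)
w1 <- data.frame(vals = round(rnorm(40, mean = 0.8, sd = 0.3), 1))

# Option 1: pass the title and axis label to stripchart() directly.
stripchart(w1$vals, method = "stack",
           main = "Strip chart of w1$vals", xlab = "vals")

# Option 2: draw the plot first, then annotate it with title().
stripchart(w1$vals, method = "stack")
title(main = "Strip chart of w1$vals", xlab = "vals")
```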

Note that this simply adds the title and labels and will write over the top of any titles or labels you already have.

A histogram is a very common plot. It plots the frequencies with which the data appear within certain ranges. Here we provide examples using the w1 data frame mentioned at the top of this page, and the one column of data is w1$vals.

To plot a histogram of the data use the “hist” command:
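A minimal sketch, using a simulated stand-in for w1 (invented values). hist also returns the computed bins, which can be handy for checking what was drawn.

```r
# Simulated stand-in for the real w1 data frame (values are invented).
set.seed(1)
w1 <- data.frame(vals = round(rnorm(40, mean = 0.8, sd = 0.3), 1))

# Draw the histogram and keep the object that hist() returns.
h <- hist(w1$vals)

h$breaks  # interval boundaries chosen by R
h$counts  # number of observations in each interval
```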

As you can see R will automatically calculate the intervals to use. There are many options to determine how to break up the intervals. Here we look at just one way, varying the domain size and number of breaks. If you would like to know more about the other options check out the help page:

You can specify the number of breaks to use using the breaks option. Here we look at the histogram for various numbers of breaks:

You can also vary the size of the domain using the xlim option. This option takes a vector with two entries in it, the left value and the right value:
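A short sketch of both options on simulated stand-in data (invented values). The specific break counts and limits are arbitrary choices for illustration. Note that when breaks is a single number, R treats it as a suggestion and may pick a nearby number of intervals with round boundaries.

```r
# Simulated stand-in for the real w1 data frame (values are invented).
set.seed(1)
w1 <- data.frame(vals = round(rnorm(40, mean = 0.8, sd = 0.3), 1))

h_few  <- hist(w1$vals, breaks = 2)    # ask for very few intervals
h_many <- hist(w1$vals, breaks = 12)   # ask for many intervals

# Widen the visible domain: xlim takes the left and right values.
hist(w1$vals, breaks = 12, xlim = c(-1, 3))
```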

The options for adding titles and labels are exactly the same as for strip charts. You should always annotate your plots and there are many different ways to add titles and labels. One way is within the hist command itself:

If you have a plot already and want to change or add a title, you can use the title command:

Note that this simply adds the title and labels and will write over the top of any titles or labels you already have.

It is not uncommon to add other kinds of plots to a histogram. For example, one of the options to the stripchart command is to add it to a plot that has already been drawn. For example, you might want to have a histogram with the strip chart drawn across the top. The addition of the strip chart might give you a better idea of the density of the data:
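One way to sketch this is to leave headroom above the tallest bar with ylim and then place the strip chart there with add and at. The data are a simulated stand-in for w1 (invented values), and the 1.25 and 1.15 scaling factors are arbitrary choices for illustration.

```r
# Simulated stand-in for the real w1 data frame (values are invented).
set.seed(1)
w1 <- data.frame(vals = round(rnorm(40, mean = 0.8, sd = 0.3), 1))

# Compute the bins without drawing, to find the height of the tallest bar.
h   <- hist(w1$vals, plot = FALSE)
top <- max(h$counts)

# Draw the histogram with extra room on top, then add the strip chart there.
hist(w1$vals, ylim = c(0, top * 1.25),
     main = "Histogram with strip chart", xlab = "vals")
stripchart(w1$vals, add = TRUE, at = top * 1.15)
```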

A boxplot provides a graphical view of the median, quartiles, maximum, and minimum of a data set. Here we provide examples using two different data sets. The first is the w1 data frame mentioned at the top of this page, and the one column of data is w1$vals. The second is the tree data frame from the trees91.csv data file, which is also mentioned at the top of the page.

We first use the w1 data set and look at the boxplot of this data set:
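A minimal sketch on simulated stand-in data (invented values). boxplot returns the statistics it draws, so the median can be checked against the value computed directly.

```r
# Simulated stand-in for the real w1 data frame (values are invented).
set.seed(1)
w1 <- data.frame(vals = round(rnorm(40, mean = 0.8, sd = 0.3), 1))

# Draw the box plot and keep the statistics that boxplot() returns.
b <- boxplot(w1$vals)

b$stats          # lower whisker, lower hinge, median, upper hinge, upper whisker
median(w1$vals)  # matches the third entry of b$stats
```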

Again, this is a very plain graph, and the title and labels can be specified in exactly the same way as in the stripchart and hist commands:

Note that the default orientation is to plot the boxplot vertically. Because of this we used the ylab option to specify the axis label. There are a large number of options for this command. To see more of the options see the help page:

As an example you can specify that the boxplot be plotted horizontally by specifying the horizontal option:

The option to plot the box plot horizontally can be put to good use to display a box plot on the same image as a histogram. You need to specify the add option, specify where to put the box plot using the at option, and turn off the addition of axes using the axes option:
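A sketch of both the horizontal box plot and the combined figure, using simulated stand-in data (invented values). The labels and the 1.25 and 1.15 scaling factors are arbitrary choices for illustration.

```r
# Simulated stand-in for the real w1 data frame (values are invented).
set.seed(1)
w1 <- data.frame(vals = round(rnorm(40, mean = 0.8, sd = 0.3), 1))

# A horizontal box plot on its own; the data axis is now x, so use xlab.
boxplot(w1$vals, horizontal = TRUE, xlab = "vals")

# A histogram with headroom above the tallest bar...
h   <- hist(w1$vals, plot = FALSE)
top <- max(h$counts)
hist(w1$vals, ylim = c(0, top * 1.25),
     main = "Histogram with box plot", xlab = "vals")

# ...and a horizontal box plot added in that space, without its own axes.
boxplot(w1$vals, horizontal = TRUE, at = top * 1.15,
        add = TRUE, axes = FALSE)
```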

If you are feeling really crazy you can take a histogram and add a box plot and a strip chart:

Some people shell out good money to have this much fun.

For the second part on boxplots we will look at the second data frame, “tree,” which comes from the trees91.csv file. To reiterate the discussion at the top of this page and the discussion in the data types chapter, we need to specify which columns are factors:

We can look at the boxplot of just the data for the stem biomass:

That plot does not tell the whole story. It is for all of the trees, but the trees were grown in different kinds of environments. The boxplot command can be used to plot a separate box plot for each level. In this case the data is held in “tree$STBM,” and the different levels are stored as factors in “tree$C.” The command to create different boxplots is the following:
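The steps for the tree data can be sketched as follows. The data frame here is a simulated stand-in with only the C and STBM columns (four invented levels, invented biomass values), so the boxes will not match those from trees91.csv.

```r
# Simulated stand-in for the tree data frame (values are invented).
set.seed(1)
tree <- data.frame(C    = rep(1:4, each = 10),
                   STBM = runif(40, min = 0.1, max = 1))

# Mark the grouping column as a factor.
tree$C <- factor(tree$C)

# Box plot of stem biomass for all trees together.
boxplot(tree$STBM)

# One box per level of C, using the formula interface.
b <- boxplot(tree$STBM ~ tree$C)
b$n  # number of observations behind each box
```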

Note that for the level called “2” there are four outliers which are plotted as little circles. There are many options to annotate your plot including different labels for each level. Please use the help(boxplot) command for more information.

A scatter plot provides a graphical view of the relationship between two sets of numbers. Here we provide examples using the tree data frame from the trees91.csv data file which is mentioned at the top of the page. In particular we look at the relationship between the stem biomass (“tree$STBM”) and the leaf biomass (“tree$LFBM”).

The command to plot each pair of points as an x-coordinate and a y-coordinate is “plot”:

It appears that there is a strong positive association between the biomass in the stems of a tree and the leaves of the tree. It appears to be a linear relationship. In fact, the correlation between these two sets of observations is quite high:
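A minimal sketch of the scatter plot and the correlation. The tree data frame below is simulated, with leaf biomass deliberately constructed as a noisy linear function of stem biomass, so the correlation it prints illustrates the call and says nothing about the real study.

```r
# Simulated stand-in for the tree data frame (values are invented, and LFBM
# is built to depend linearly on STBM).
set.seed(1)
stbm <- runif(40, min = 0.1, max = 1)
tree <- data.frame(STBM = stbm,
                   LFBM = 0.5 * stbm + rnorm(40, sd = 0.05))

# Scatter plot: stem biomass on the x-axis, leaf biomass on the y-axis.
plot(tree$STBM, tree$LFBM)

# Pearson correlation between the two columns.
r <- cor(tree$STBM, tree$LFBM)
r
```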

Getting back to the plot, you should always annotate your graphs. The title and labels can be specified in exactly the same way as with the other plotting commands:

The final type of plot that we look at is the normal quantile plot. This plot is used to determine if your data is close to being normally distributed. The plot cannot prove that the data is normally distributed, but it can show when the data clearly is not. Here we provide examples using the w1 data frame mentioned at the top of this page, and the one column of data is w1$vals.

The command to generate a normal quantile plot is qqnorm. You can give it one argument, the univariate data set of interest:

You can annotate the plot in exactly the same way as all of the other plotting commands given here:

After you create the normal quantile plot you can also add the theoretical line that the data should fall on if they were normally distributed:
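A minimal sketch of the quantile plot with its reference line. The w1 values here are simulated draws from a normal distribution (a stand-in, not the study's data), so they will likely hug the line more closely than the real data do. The title and label text are placeholders.

```r
# Simulated stand-in for the real w1 data frame (values are invented).
set.seed(1)
w1 <- data.frame(vals = round(rnorm(40, mean = 0.8, sd = 0.3), 1))

# Normal quantile plot, annotated like the other plotting commands.
qq <- qqnorm(w1$vals, main = "Normal Q-Q plot of w1$vals", ylab = "vals")

# Add the line through the first and third quartiles.
qqline(w1$vals)
```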

In this example you should see that the data is not quite normally distributed. There are a few outliers, and it does not match up at the tails of the distribution.