There are a number of ways in ggplot2 to express your data. The first are the geometric objects (geoms):
- geom_bar(): Bar charts
- geom_point(): Points – great for scatter plots
- geom_line(): Line charts
- geom_boxplot(): A box & whiskers plot
- geom_smooth(): Smoothed conditional means with a confidence interval
- geom_histogram(): Histogram
- geom_density(): Smoothed density plot
- geom_qq(): A quantile-quantile plot
- geom_errorbar(): Error bars
All of these geoms (and more) are highly customizable. But first we must set up the data.
We’ll call it myData (I’m going to keep this fairly generic). After loading myData into R, we declare the variable that holds the plot we’re going to build:

    myVariable <- ggplot(myData, aes(myX, myY))

Here myX is our X variable and myY is our Y variable; aes is an abbreviation of aesthetic. If you are going to do something like a histogram or a density plot, you’ll only need an X variable. At this point, you can set other aesthetics, like color, that will apply to all layers of the graph. To add color depending on the value of myX, we could do the following:

    myVariable <- ggplot(myData, aes(myX, myY, color = myX))

If we wanted to add a title, we could do the following (ggtitle() replaces the older opts(title = ...), which has been removed from ggplot2):

    myVariable <- ggplot(myData, aes(myX, myY, color = myX)) + ggtitle("My Title")

Let’s say we wanted a bar graph. Because we have supplied a Y variable, the bars need stat = "identity" (geom_col() is shorthand for this); plain geom_bar() counts rows and accepts only an X variable:

    myVariable + geom_bar(stat = "identity")

Additionally, we want to add a point plot:

    myVariable + geom_bar(stat = "identity") + geom_point()

We can layer plots on top of each other.
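To see the layering on actual numbers, here is a minimal sketch. The values in myData are made up purely for illustration, and I’m assuming a reasonably recent ggplot2 in which ggtitle() and geom_col() are available (geom_col() draws bars straight from the Y values).

```r
library(ggplot2)

# Made-up illustration data (not from any real data set).
myData <- data.frame(myX = 1:5, myY = c(3, 7, 4, 9, 6))

# Base plot: the data plus the aesthetics shared by every layer.
myVariable <- ggplot(myData, aes(myX, myY, color = myX)) +
  ggtitle("My Title")

# Two layers on one plot: bars drawn from the Y values, then points on top.
layered <- myVariable + geom_col() + geom_point()
print(layered)
```

Each + adds a layer, and layers are drawn in the order they are added, so the points end up on top of the bars.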
We can also customize each plot. For example, if we wanted to make the scatter plot red, we could do the following:

    myVariable + geom_bar(stat = "identity") + geom_point(color = "red")

We can change the line type, shape, size, color, and transparency of most geoms by passing the right argument to the geom. We can also change what the plot computes, for example:

    myVariable <- ggplot(myData, aes(myX))
    myVariable + geom_histogram(aes(y = ..density..))

(In ggplot2 3.3.0 and later, the preferred spelling is aes(y = after_stat(density)).)
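Here is a small sketch of both ideas: fixed appearance settings on a geom, and a histogram scaled to density. The data is a stand-in I picked (two columns of the mtcars data set that ships with R), the particular size, shape, and alpha values are arbitrary, and after_stat() assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Stand-in data (an assumption): two columns of the built-in mtcars data set.
myData <- data.frame(myX = mtcars$wt, myY = mtcars$mpg)

# Fixed (non-mapped) appearance settings go outside aes().
redPoints <- ggplot(myData, aes(myX, myY)) +
  geom_point(color = "red", size = 3, shape = 17, alpha = 0.6)

# Histogram showing density instead of counts; the bin width is set explicitly.
densityHist <- ggplot(myData, aes(myX)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.4)

print(redPoints)
print(densityHist)
```

On the density scale the bar areas add up to 1, which makes it easy to overlay a geom_density() curve on the same axes.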
Mapping y to the density changes the histogram from the default of counting the observations in each bin to showing the density in each bin. We can also change the width of the bins by adding binwidth:

    myVariable + geom_histogram(binwidth = 0.4)

A couple of other really cool features are facet_wrap and facet_grid. Both help avoid overplotting (putting too much data in one graph) by splitting the graph into multiple panels. facet_wrap creates one panel for each value of the variable given, so it works best with a categorical variable:

    myVariable <- ggplot(myData, aes(myX, myY))
    myVariable + geom_point() + facet_wrap(~ myY)

We can also dictate how many columns (ncol) or rows (nrow) by adding that:

    myVariable + geom_point() + facet_wrap(~ myY, ncol = 2)

facet_grid works similarly, laying the panels out in rows and columns (rows ~ columns):

    myVariable + geom_point() + facet_grid(myX ~ myY)
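A runnable sketch of the two facet functions follows. Faceting needs discrete variables to be useful, so I’ve added two hypothetical grouping columns, myGroup and myOtherGroup, built from mtcars (cylinders and transmission type); those names and that choice of data are my assumptions, not part of the generic setup above.

```r
library(ggplot2)

# Stand-in data (an assumption): mtcars columns, with two factors to facet on.
myData <- data.frame(
  myX = mtcars$wt,
  myY = mtcars$mpg,
  myGroup = factor(mtcars$cyl),      # hypothetical grouping column, 3 levels
  myOtherGroup = factor(mtcars$am)   # hypothetical grouping column, 2 levels
)

base <- ggplot(myData, aes(myX, myY)) + geom_point()

# One panel per level of myGroup, wrapped into 2 columns.
wrapped <- base + facet_wrap(~ myGroup, ncol = 2)

# A full grid: rows from myOtherGroup, columns from myGroup.
gridded <- base + facet_grid(myOtherGroup ~ myGroup)

print(wrapped)
print(gridded)
```

facet_wrap only draws panels for the levels that exist, while facet_grid draws every row and column combination, even if a cell turns out to be empty.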
Let’s say that we want to split the data based on a third variable in our data – myThirdVariable. We do that by adding the third variable to the plot setup (myVariable) as well as to the geom we want to use it in:

    myVariable <- ggplot(myData, aes(myX, myY, color = myThirdVariable))
    myVariable + geom_point() + geom_smooth(method = "lm", aes(fill = myThirdVariable))
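For something you can paste and run, here is the same idea with stand-in data. Using mtcars, and treating the number of cylinders as myThirdVariable, is my assumption; I’ve also spelled out formula = y ~ x, which only stops geom_smooth() from printing a message about the formula it picked.

```r
library(ggplot2)

# Stand-in data (an assumption): mtcars, with cylinders as the third variable.
myData <- data.frame(
  myX = mtcars$wt,
  myY = mtcars$mpg,
  myThirdVariable = factor(mtcars$cyl)
)

myVariable <- ggplot(myData, aes(myX, myY, color = myThirdVariable))

# Points plus one fitted straight line (with its band) per group.
grouped <- myVariable +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, aes(fill = myThirdVariable))
print(grouped)
```

Because the third variable is a factor, ggplot2 treats each level as its own group and fits a separate line for each one.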
I threw in method = "lm", which makes geom_smooth fit a straight line (a linear model). The color and fill mappings split the data based on the value of myThirdVariable. We can use stat_summary() if we want to plot a summary of myY against one independent variable, like so:

    myVariable <- ggplot(myData, aes(myX, myY))
    myVariable + stat_summary(fun = mean, geom = "bar")

This would create a bar chart of the mean of myY for each value of myX. (Versions of ggplot2 before 3.3.0 spell the argument fun.y instead of fun.) Here are six options for stat_summary():
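- fun = mean: The mean (fun.y in ggplot2 versions before 3.3.0)
- fun = median: The median (fun.y in ggplot2 versions before 3.3.0)
- fun.data = mean_cl_normal: 95% confidence intervals assuming normality
- fun.data = mean_cl_boot: 95% confidence intervals based on a bootstrap
- fun.data = mean_sdl: Sample mean plus or minus a multiple of the standard deviation (two standard deviations by default)
- fun.data = median_hilow: Median and upper and lower quantiles
The four fun.data helpers wrap functions from the Hmisc package, so Hmisc must be installed to use them. Note that the functions are passed by name, without parentheses.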
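To show what a fun.data function has to return, here is a sketch that avoids the Hmisc dependency. meanCiNormal is a hypothetical helper I wrote as a hand-rolled stand-in for a normal-theory (t-based) 95% interval; it is not a ggplot2 function. The data is again a stand-in from mtcars, and fun assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Stand-in data (an assumption): cylinders as a categorical X, mpg as Y.
myData <- data.frame(myX = factor(mtcars$cyl), myY = mtcars$mpg)

# Hypothetical helper: a fun.data function receives the Y values of one group
# and must return a one-row data frame with columns y, ymin, and ymax.
meanCiNormal <- function(y) {
  n <- length(y)
  m <- mean(y)
  half <- qt(0.975, df = n - 1) * sd(y) / sqrt(n)  # t-based 95% half-width
  data.frame(y = m, ymin = m - half, ymax = m + half)
}

summaryPlot <- ggplot(myData, aes(myX, myY)) +
  stat_summary(fun = mean, geom = "bar") +                   # one bar per group mean
  stat_summary(fun.data = meanCiNormal, geom = "errorbar")   # interval on each bar
print(summaryPlot)
```

The difference between the two arguments is the shape of the result: fun returns a single number per group, while fun.data returns the value together with its lower and upper bounds.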
Again, if we want to separate the graphs by our third variable, we can do that by using facet_wrap:

    myVariable <- ggplot(myData, aes(myX, myY))
    myVariable + stat_summary(fun = mean, geom = "bar") + facet_wrap(~ myThirdVariable)

There is so much more that can be done with just these functions. You can explore all the options and combinations by installing ggplot2 and playing around with the possibilities.
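As a starting point for that kind of playing around, here is a sketch of summary bars split into facets. The data is a stand-in I chose from mtcars (gears as myX, miles per gallon as myY, transmission type as myThirdVariable), and fun assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Stand-in data (an assumption): three columns of the built-in mtcars data set.
myData <- data.frame(
  myX = factor(mtcars$gear),
  myY = mtcars$mpg,
  myThirdVariable = factor(mtcars$am)
)

# Mean of myY for each myX, drawn separately for each level of myThirdVariable.
facetedMeans <- ggplot(myData, aes(myX, myY)) +
  stat_summary(fun = mean, geom = "bar") +
  facet_wrap(~ myThirdVariable)
print(facetedMeans)
```

The means are computed within each panel, so a bar only appears where that combination of myX and myThirdVariable actually has data.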