This document provides some suggestions and resources for environmental engineers using R. This information is based on my own personal experience of learning and using R.
First, you will need the core R program, which can be downloaded from the Comprehensive R Archive Network (CRAN). CRAN is the official repository of the R program as well as the various packages you will be using.
Second, download and install RStudio Desktop, which runs on top of the core R program and provides many useful features. While you could use the R GUI program that comes with R, RStudio provides a much richer programming environment that makes using R much more productive and enjoyable.
If you're brand new to R (and even to programming in general), you'll need to learn the basics of how R works: the types of variables (e.g. numeric, character, boolean, ...) and the programming structures (e.g. conditional if/else statements, for loops, while loops, ...).
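As a rough illustration of those basics, the sketch below touches each item in turn. The variable names and flow values are made up for the example; note that R calls its boolean type "logical".

```r
# Basic variable types (all values here are illustrative)
flow <- 123.3            # numeric
station <- "StationA"    # character
is_flooding <- FALSE     # logical (R's boolean type)
print(c(class(flow), class(station), class(is_flooding)))

# Conditional if/else statement
if (flow > 100) {
  status <- "high"
} else {
  status <- "low"
}

# for loop: add up a vector of flows one element at a time
flows <- c(123.3, 125.2, 128.6)
total <- 0
for (f in flows) {
  total <- total + f
}

# while loop: count the days needed for the cumulative flow to reach 200
i <- 0
cumulative <- 0
while (cumulative < 200 && i < length(flows)) {
  i <- i + 1
  cumulative <- cumulative + flows[i]
}
```

In practice the for loop would usually be replaced by the vectorized sum(flows), but the explicit loop shows the structure.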
The best starting point is probably one of the many books on general R programming. My recommendation is The Art of R Programming by Norman Matloff. There appears to be a free version available here, although I'm not sure if it's exactly the same as the book.
Another great starting point is the many free online classes offered through Coursera. In particular, the Data Science Specialization is a sequence of courses that will effectively make you at least an intermediate, if not advanced, R programmer. These are all free and can be
The real power of R is the ecosystem of packages that have been created by countless developers. Packages are
If you need to install a package, your best bet is to try the standard install.packages() function. Simply pass the name of the package as a string.
> install.packages('ggplot2')
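Installing a package does not load it; each session still needs a library() call. One possible pattern, sketched below, installs a package only when it is missing and then loads it. The helper name ensure_package and the choice of the cloud.r-project.org mirror are assumptions made for this example, not part of R itself.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns FALSE (without an error) if the package is absent
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  # character.only = TRUE lets library() accept the name as a string
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

ensure_package("ggplot2")
```

Setting repos explicitly avoids the mirror prompt, which otherwise stops install.packages() in a non-interactive session.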
R comes with plotting functions (e.g. plot(x, y)) that are a good place to start. However, the ggplot2 package is a very popular alternative that has little in common with the basic R plotting functions. I would strongly recommend learning ggplot2 sooner rather than later. I personally only use ggplot2 and very rarely use the basic plotting functions. Not only do ggplot2 graphics look much better, but they provide a different kind of language for creating plots that is extremely powerful.
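To give a feel for the difference, here is a small side-by-side sketch. The data frame flows and its values are made up for illustration. The base version draws the plot in a single call, while the ggplot2 version builds a plot object by mapping columns to aesthetics and adding layers.

```r
library(ggplot2)

# Illustrative data: three days of flow at a single station
flows <- data.frame(
  Date = as.Date(c("2012-01-01", "2012-01-02", "2012-01-03")),
  Flow = c(123.3, 125.2, 128.6)
)

# Base R: one function call draws points joined by lines
plot(flows$Date, flows$Flow, type = "b", xlab = "Date", ylab = "Flow")

# ggplot2: map columns to aesthetics, then add one layer per geometry
p <- ggplot(flows, aes(x = Date, y = Flow)) +
  geom_line() +
  geom_point() +
  labs(x = "Date", y = "Flow")
print(p)
```

Because p is an ordinary R object, further layers or themes can be added to it later without rewriting the original call.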
The best way to learn ggplot2 is to start with the book by the author of the package, Hadley Wickham.
Once you have learned the basics of R and are ready to start analyzing some data, I suggest reading the following two papers by Hadley Wickham.
The tidy data paper focuses on ways of storing data. For example, one could have a dataset containing daily streamflows from a variety of stations. One way to store this data would be a so-called wide format where the first column has the date, and the remaining columns are the flows with one column for each station:
library(lubridate)
q <- data.frame(Date=c("2012-01-01", "2012-01-02", "2012-01-03"),
                StationA=c(123.3, 125.2, 128.6),
                StationB=c(13.2, 14.1, 16.6),
                StationC=c(1423.2, 1434.9, 1501.3))
print(q)
## Date StationA StationB StationC
## 1 2012-01-01 123.3 13.2 1423
## 2 2012-01-02 125.2 14.1 1435
## 3 2012-01-03 128.6 16.6 1501
However, you could also store this in a long format where each row represents a single flow value. This would require adding a column that indicates the station.
library(reshape2)
q.long <- melt(q, id.vars=c('Date'), measure.vars=c('StationA', 'StationB', 'StationC'),
               value.name='Flow', variable.name='Station')
print(q.long)
## Date Station Flow
## 1 2012-01-01 StationA 123.3
## 2 2012-01-02 StationA 125.2
## 3 2012-01-03 StationA 128.6
## 4 2012-01-01 StationB 13.2
## 5 2012-01-02 StationB 14.1
## 6 2012-01-03 StationB 16.6
## 7 2012-01-01 StationC 1423.2
## 8 2012-01-02 StationC 1434.9
## 9 2012-01-03 StationC 1501.3
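One reason the long format is convenient is that the station becomes an ordinary column that can be filtered or grouped on. The sketch below rebuilds the long table directly with data.frame() so that it runs on its own, then uses base R's aggregate() and subset(). The names mean.flow and station.b are chosen only for this example.

```r
# Rebuild the long-format table from above so this example is self-contained
q.long <- data.frame(
  Date = rep(c("2012-01-01", "2012-01-02", "2012-01-03"), times = 3),
  Station = rep(c("StationA", "StationB", "StationC"), each = 3),
  Flow = c(123.3, 125.2, 128.6, 13.2, 14.1, 16.6, 1423.2, 1434.9, 1501.3)
)

# Convert the date strings to real Date objects
q.long$Date <- as.Date(q.long$Date)

# Mean flow per station: one grouped call instead of one call per column
mean.flow <- aggregate(Flow ~ Station, data = q.long, FUN = mean)
print(mean.flow)

# Pull out the rows for a single station
station.b <- subset(q.long, Station == "StationB")
print(station.b)
```

In the wide format the same summary would need a separate mean() call for each station column, or an apply over the columns.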
If you don't know how to do something or cannot get something to work correctly, chances are the solution is out there on the Internet already. I frequently search for problems on
If you can't find the solution, then you can certainly post a question to Stack Overflow or the R mailing list. But before you do, read How To Ask Questions The Smart Way by Eric Steven Raymond. This will help you communicate your problem effectively so that others can more easily help you figure it out.