So by now you may have noticed I’m focused on the basics of how R represents numbers and vectors. That is the general point of this tutorial…not to show you how to type cor.test(…) and get a number out, but rather how to manipulate data and data structures to work for you. In Computer Science, one thing you’re taught early on is: the more sophisticated a data structure, the more simplified the code will (generally) be. R is no exception…except for one big exception which I’ll get to later on. For now, on to the second dimension.
To do this, we’re going to read in a file rather than make it as we’ve done in the past. First, you’re going to have to change your working directory. If you’re running the R GUI console, you can do this in the menubar under the Misc->Change Working Directory… command. Or, if you are like me and your idea of a GUI is a VT220, you can use the command prompt:
> getwd()  "/Users/shamma" > setwd("/Users/shamma/tmp") > getwd()  "/Users/shamma/tmp" >
Change your working directory into a new directory somewhere and make this simple little file and call it sample.csv:
CA,CB,CC 11,12,13 21,22,23 31,32,33 41,42,43
Ok, so if your working directory is set right, we should be able to read the file in easily.
> s <- read.csv("sample.csv", header=TRUE, sep=",") > class(s)  "data.frame" > s CA CB CC 1 11 12 13 2 21 22 23 3 31 32 33 4 41 42 43
I’m specifying the comma delimiter, but it defaults to a comma already…so feel free to leave it out. We also told the read.csv function that this data has a header row. In the first column you see there then numbers 1 to 4 are just row numbers for your viewing pleasure. To get stuff out of this “data.frame” (and we’ll worry about what that is later), R does something called “column-major order” like most scientific languages. Meaning, complex structures are collections of columns. However, we access it row then column like so:
> s[1, 2]  12
Gets us row 1 column 2. Nice ya? Lets look at some other examples. I’m going to put comments after each command to explain what’s happing inline of the code.
> s # col 1, don't do this because it looks odd. CA 1 11 2 21 3 31 4 41 > s[1, ] # row 1 (note the blank where the col is to be given) CA CB CC 1 11 12 13 > s[ ,2] # col 2 (same trick as before with the blank)  12 22 32 42 > s[3, -2] # row 3, No col 2 CA CC 3 31 33 > s[-2, -2] # no 2nd row or col CA CC 1 11 13 3 31 33 4 41 43
Pretty simple and follows what we learned last time. Hey, remember that header row? We can access columns by name. This is pretty handy and will keep you from counting which column that was in your dataset.
> s$CA  11 21 31 41 > s$CA[2:4]  21 31 41 > s$CB[2:4]  22 32 42
Great! We know how to read something in and how to pick out exactly what we need. Let’s do something real. First, make this file and call it cities.csv.
name,long,lat Newcastle,-1.6917,55.0375 Austin,-97.7,30.3 Cairo,31.25,30.05
Next, read it in like so:
> cities <- read.csv("cities.csv", header=TRUE) > plot(cities$long, cities$lat, pch=20, col="blue", cex=.9)
And you should see a very useless plot window like:
Great…so we need a map to make this, well, intelligible. To do this, we’ll need our first package. You can install these little puppies from the menubar somewhere under Packages & Data. I prefer to use the keyboard (mice carry diseases). However you want to do it, get the maps package.
> install.packages("maps") Installing package(s) into ‘/Users/shamma/Library/R/2.13/library’ (as ‘lib’ is unspecified) trying URL 'http://cran.cnr.berkeley.edu/bin/macosx/leopard/contrib/2.13/maps_2.2-5.tgz' Content type 'application/x-gzip' length 2104264 bytes (2.0 Mb) opened URL ================================================== downloaded 2.0 Mb The downloaded packages are in /var/folders/un/unv7BK-CG2qWofD8jLjha+++Q3I/-Tmp-//RtmpkxL1RA/downloaded_packages >
Great! Now we are going to plot it again, but this time put the points on a world map as long and lat. First we load the package. Display the map plot with map(…). Then add the points (remember from the first tutorial, the call points(…) adds dots to an existing plot).
> library(maps) > map(database="world", col="grey") > points(cities$long, cities$lat, pch=20, col="blue")
You should get something like:
Better? Finally, lets color the regions. This is where we start to dive into package magic. We can ask the package, where are these places. Then fill the map regions. Then plot our points. Notice how we are using variable and column names to make our code human readable.
> cities name long lat 1 Newcastle -1.6917 55.0375 2 Austin -97.7000 30.3000 3 Cairo 31.2500 30.0500 > places <- map.where(x=cities$long, y=cities$lat) > places  "UK:Great Britain" "USA" "Egypt" > map(database="world", col="grey") > map(database="world", col="grey", regions=places, fill=TRUE, add=TRUE) > points(cities$long, cities$lat, pch=20, col="blue")
And bingo! The map package found the three countries. The first call to map(…) [line 9] displays the world map. The second call to map(…) [line 10] adds our regions to the plot (see the add=TRUE parameter) by filling in the countries. Then points(…) adds our three city dots via their long and lat. Our first kinda real thing and we have a nice geo plot! This is how I did the geo plots in the Statler Inauguration demo. In the next installment, we’ll enter the zoo and I’ll explain why you’ll love and hate the data.frame object.