As part of a class in Landscape Genetics, the faculty (mostly Melanie Murphy and Jeffrey Evans) compiled an extensive list of spatial data sources. These were made available on the course website we hosted, but I wanted to make a more persistent copy of them here so they will not be lost. They are listed below the break.
OK, so I just ‘found’ shiny and it has a lot of cool stuff to it. OK, I’ve known about it for a long time but have only now had the opportunity to sit down, work it out, and see how it can fit into the presentation and learning I’m trying to develop in my Applied Population Genetics online textbook. Here is a brief overview of how I set up the shiny server on the Ubuntu box that is hosting the book (so I can embed more interactivity in the display).
A very cool writeup on making blow out maps.
Here are some very useful cheat sheets put out by RStudio. A great resource of information!
I just uploaded a new plugin for RStudio called dlab. I’ll be migrating over all the little helper functions I use to this as a general require() on startup. What it has now is an AddIn that allows you to select text and have it wrapped in the r-code markup. I’m moving stuff between ePub and Markdown and it was needed.
You can install it as:
then look at the AddIns menu for wrapCode.
The program STRUCTURE is a ubiquitous feature of many population genetic studies these days (whether it is appropriate is another question). Today, while covering model-based clustering in population genetics, we ran into a problem where STRUCTURE was unable to run and the OS said it was corrupted and should be thrown away. Jump below for our fix; it really is an easy one.
An analysis common to modern population genetics is that of finding ecological distances between objects on a landscape. The estimation of pairwise distance derived from spatial data is computationally intensive, and if you are not careful it will bring your laptop to its knees! One way to mitigate this problem is to use a minimal amount of raster area so that the estimation of the underlying distance graph can be done on a smaller set of points. This example provides a simple solution using convex hulls. Jump below for the complete example.
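A minimal sketch of the convex hull idea, using only base R: `chull()` returns the indices of the points on the hull, and the hull's bounding box (padded by a buffer) gives a smaller extent to crop the raster to. The coordinates, the `buffer` value, and the name `big_raster` are all made up for illustration, and the cropping step itself is only shown as a comment.

```r
# Hypothetical sampling coordinates (longitude, latitude); not real data.
sites <- data.frame(
  x = c(-93.2, -92.8, -92.5, -93.0, -92.9),
  y = c( 35.6,  35.9,  35.5,  36.1,  35.7)
)

# chull() returns the row indices of the points that lie on the convex hull.
hull_idx <- chull(sites$x, sites$y)
hull <- sites[hull_idx, ]

# Pad the hull's bounding box so paths near the edge are not cut off.
buffer <- 0.1  # assumed buffer, in the same units as the coordinates
crop_extent <- c(
  xmin = min(hull$x) - buffer, xmax = max(hull$x) + buffer,
  ymin = min(hull$y) - buffer, ymax = max(hull$y) + buffer
)
print(crop_extent)

# With the raster package loaded, the cropping step would look something like
# this (big_raster is a hypothetical RasterLayer):
# e <- raster::extent(crop_extent[["xmin"]], crop_extent[["xmax"]],
#                     crop_extent[["ymin"]], crop_extent[["ymax"]])
# small <- raster::crop(big_raster, e)
```

The smaller the cropped area, the fewer cells go into the distance graph, which is where the savings come from.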
It is often the case that the raster we are working with is not the exact size of the area from which our data are collected. It is a much easier situation if the raster is larger than the area than if you need to stitch together two raster tiles to get all your data onto one extent. In my doctoral thesis work, the area of the southern Ozark mountains that my sites were in was not only straddling a boundary between existing rasters, it was also at the boundary of two UTM zones! What a pain.
A raster is essentially an image whose pixel size corresponds to a particular spatial extent, and the data contained within each pixel represents a particular feature on the landscape. Common rasters are DEMs (measuring elevation), rainfall, temperature, buildings, etc. In R, it is common to think of rasters as matrices whose values measure some feature on the landscape. In this section, we will examine how to acquire, load, manipulate, and extract data from raster objects.
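To make the raster-as-matrix idea concrete, here is a small base R sketch with no spatial packages. The elevation values, the extent, and the helper `value_at()` are all made up for illustration; real raster objects carry this bookkeeping for you.

```r
# A made-up 3 x 4 grid of elevations (metres); row 1 is the northern edge.
elev <- matrix(
  c(310, 325, 340, 360,
    290, 300, 320, 335,
    270, 280, 295, 310),
  nrow = 3, byrow = TRUE
)

# Assumed spatial extent and the pixel size that follows from it.
xmin <- 0; xmax <- 4   # 4 columns, so each pixel is 1 unit wide
ymin <- 0; ymax <- 3   # 3 rows, so each pixel is 1 unit tall
res_x <- (xmax - xmin) / ncol(elev)
res_y <- (ymax - ymin) / nrow(elev)

# Hypothetical helper: return the value of the pixel containing (x, y).
# Coordinates exactly on the xmax or ymin edge are not handled here.
value_at <- function(x, y) {
  col <- floor((x - xmin) / res_x) + 1
  row <- floor((ymax - y) / res_y) + 1   # rows count down from the top
  elev[row, col]
}

value_at(2.5, 0.5)   # third column, bottom row
```

The only spatial part is the mapping between coordinates and row/column indices; everything else is ordinary matrix indexing.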
I will be posting portions of all 10 chapters of my upcoming textbook, Applied Population Genetics, as early draft chapters to this website over the spring semester.
Been working on a lexicographic analysis of ‘Sustainability’ as published by the journals PNAS and Sustainability. Here are the stemmed word forms for 366 published articles represented as a hierarchical clustering. The wordclouds represent the top 10 word stems per group.
Every time I upgrade in any significant way, two R libraries seem to raise their ugly heads and scream like a spoiled child: rgdal and rgeos. Why do these two have to be SOOOO much of a pain? Why can’t we have an automated build of a binary with all the options in it for OSX? Who knows? I always feel like I get the fuzzy end of the lollipop with these two. Here is my latest approach for getting them going.
In R, there is often the need to merge two data.frame objects (say, one with individual samples and the other with population coordinates). The merge() function is pretty awesome, though it may take a little getting used to.
Here are some things to remember:
- You need to have two data.frame objects to merge
- The first one in the function call is the one that gets merged onto; the second one is added to the first.
- Each will need a column to use as an index, that is, a column that will be used to match rows of data. If the column names are the same, the function will do it automagically; if no common names are found in the names() of the two data.frame objects, you can specify the columns using the optional by.x= and by.y= function arguments.
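The points above can be sketched with two small, made-up data.frame objects (the sample IDs, population names, and coordinates are invented for illustration). The first call relies on the shared column name; the second shows by.x= and by.y= when the index columns are named differently.

```r
# Made-up example data: individual samples and population coordinates.
samples <- data.frame(
  ID = c("a1", "a2", "b1", "c1"),
  Population = c("A", "A", "B", "C")
)
coords <- data.frame(
  Population = c("A", "B", "C"),
  Latitude = c(35.6, 35.9, 36.1),
  Longitude = c(-93.2, -92.8, -93.0)
)

# Both share the column "Population", so merge() finds the index itself.
df <- merge(samples, coords)
print(df)

# If the index columns have different names, name them explicitly.
coords2 <- coords
names(coords2)[1] <- "Pop"
df2 <- merge(samples, coords2, by.x = "Population", by.y = "Pop")
```

Note that the result puts the index column first and, by default, sorts the rows on it, so the row order may differ from the first data.frame.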
Much of the work in my laboratory uses spatial data in some context. As such, it is important to be able to grab and use spatial data in an easy fashion. At present, R is probably the best way to grab, visualize, and analyze spatial data. For this example, I went to http://worldclim.org and downloaded the elevation (altitude) for tile 13 (eastern North America) as a GeoTiff. A GeoTiff is a specific type of image format that has spatial data contained within it. The tile data has a pixel resolution of 30 arc seconds, which puts us in the general area of ~1 km. First, we need to get things set up to work.
# Set the working directory to where you want it.
# load in the raster library
Loading required package: raster
Loading required package: sp
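As a quick check on the ~1 km figure quoted above for 30 arc seconds, here is the arithmetic in base R. The value of roughly 111 km per degree is an approximation for a degree of arc at the equator, and the latitude of 36 degrees is just an assumed example.

```r
# Approximate length of one degree of arc at the equator, in km.
km_per_degree <- 111.32

arc_seconds <- 30
degrees <- arc_seconds / 3600          # 3600 arc seconds per degree
pixel_km <- degrees * km_per_degree    # north-south size of one pixel
round(pixel_km, 2)

# East-west, a pixel narrows with the cosine of the latitude
# (36 degrees north is an assumed example latitude).
pixel_km_ew <- pixel_km * cos(36 * pi / 180)
round(pixel_km_ew, 2)
```

So a pixel is a bit under 1 km north to south, and somewhat narrower east to west away from the equator.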
Here is a short (39 minute) video of some basic graphics approaches in R I use in a class on population genetics.
The default CRAN repository is not the only place that R packages are stored. You can also find them on github. When I develop libraries for R, I typically develop them on http://github.com/dyerlab and then upload them to CRAN when I get to major milestones. The latest versions of all my software will always be found on github. So here is how to install packages directly.
I’ve been working on integrating the Swift language into my analysis workflow but much of what I do involves the GNU Scientific Libraries for matrix analysis and other tools. Here is a quick tutorial on how to install the GSL library on a clean OSX platform.
- It is easiest if you have XCode installed. You can get this from the App Store for free. Go download it and install it.
- Download the latest version of the GSL libraries. You can grab them by:
- Looking for your nearest mirror site listed at http://www.gnu.org/prep/ftp.html and connecting to it.
- Open the directory gsl/ where all the versions will be listed. Scroll down and grab
- Open the terminal (Utilities -> Terminal.app) and type:
- Unpack the archive by: tar zxvf gsl-latest.tar.gz then cd gsl-1.16/ (or whatever the version actually was; it will probably be some number larger than 1.16).
- Inside that folder will be a README file (which you probably won’t read) and an INSTALL file (which you should read). In that folder it will tell you to:
sudo make install. This last command will require you to type in your password as it is going to install something into the base system.
- All the libraries and header files will be installed into the
There seems to be some nefarious conspiracy against packaging spatial R packages on the mac platform. Don’t quite understand it but it sucks. Here is how to install the rgeos package.
If you try the normal way, you get the following error:
package ‘rgeos’ is available as a source package but not as a binary
Warning in install.packages : package ‘rgeos’ is not available (as a binary package for R version 3.1.3)
which is not very helpful.
Working on some code and was having a tough time configuring the color palette in GGally since it does not produce a ggplot object. It appears to be a larger problem. So, here is one hack: redefine the ggplot function and change the default palette there. Need to make a dyerlab::palette now…
ggplot <- function(...) ggplot2::ggplot(...) + scale_color_brewer(palette="Set1")
ggpairs(df,columns = 3:7,axisLabels="none",color="color")