Using Google Drive as an R Data Repository

This comes up so often these days that it is easier to post it here than to search through my class notes each time someone asks me how to do it.

Here is the issue.  Say you have some data associated with your research project and are adding to it and doing analyses.  Chances are, you have it shoved into an Excel spreadsheet that is on your laptop, your home computer, the computer in the lab, a backup disk (you are keeping backups, right?), and even perhaps shared on a Cloud Drive with your collaborators/advisors/partner/whatever.  Great!  Now you have absolutely no way to know which version of the dataset is the real one and which are wrong.

Publishing Spreadsheets from Google Drive

Google Drive can serve spreadsheet-like data as *.csv files, and R can read them straight from there.  This way, the data are in one (and only one) location and can be accessed by anyone you would like to grant access.  Here is how to set it up.

First, in Google Drive, you need to make the spreadsheet available and choose how it is published.  This is done from the menu as File -> Publish to the Web…  A dialog box will pop up, like the one below, and let you select which sheet is published and in what format.  The salient part here is that you should select Comma-separated values (*.csv) as the output type.  Copy the URL that the dialog provides, as we will be using it in R to grab the data.

Dialog from Google Docs allowing you to select the sheet to be published and the format it will be published in.

Next, fire up R (I use RStudio as a sane interface) and make sure you have the RCurl package installed.  If not, install it like this:

install.packages("RCurl")

To load the file, we take the URL we copied from Google Drive and download its contents as text:

library(RCurl)
link <- "https://docs.google.com/spreadsheets/d/1QL9fYeKkDKphba12WLVTBJrv_d1WHTc9SrZoBeIFgj8/pub?gid=0&single=true&output=csv"
url <- getURL( link )
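
The published link follows a predictable pattern, so if you have several sheets it can be handy to build it from its parts.  Here is a minimal sketch of that idea; the function name published_csv_url() is made up for illustration, and the pattern is simply read off the link above rather than taken from any Google documentation.

```r
# Build a "Publish to the Web" CSV address from its parts.
# Hypothetical helper: the URL pattern is copied from the example link,
# not from an official specification.
published_csv_url <- function(sheet_id, gid = 0) {
  sprintf(
    "https://docs.google.com/spreadsheets/d/%s/pub?gid=%s&single=true&output=csv",
    sheet_id, gid
  )
}

# Rebuild the link used in this post from its sheet id.
link <- published_csv_url("1QL9fYeKkDKphba12WLVTBJrv_d1WHTc9SrZoBeIFgj8")
link
```

Here gid identifies the individual sheet within the spreadsheet, so a second sheet would only need a different gid value.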

Despite its name, url now holds the contents of the file as one long string of text.  Wrap that text in a text connection so that R can read it the way it would read a file

con <- textConnection( url )

and then pull the data into R as if it were on the local filesystem.

data <- read.csv( con )
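
To see the parsing step in isolation, here is a minimal sketch that uses a small made-up CSV string in place of the text returned by getURL().  The values are invented for illustration.  It also shows that read.csv() can take the text directly through its text argument, which does the same job as wrapping it in textConnection().

```r
# Stand-in for the text that getURL() returns; the values are made up.
csv_text <- "Population,SampleID,X.Coordinate\n2,203,346\n3,204,1482\n"

# Option 1: wrap the text in a connection and read from that.
con <- textConnection(csv_text)
from_con <- read.csv(con)
close(con)

# Option 2: hand the text straight to read.csv().
from_text <- read.csv(text = csv_text)

# Compare the two results.
all.equal(from_con, from_text)
```

Either route ends with an ordinary data frame, so everything downstream works the same way.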

And your data should be there.

summary(data)

#   Population       SampleID      X.Coordinate   Y.Coordinate      Cf.G8      
# Min.   :2.000   Min.   :203.0   Min.   : 346   Min.   : 254   Min.   :147.0  
# 1st Qu.:3.000   1st Qu.:315.5   1st Qu.:1482   1st Qu.:2231   1st Qu.:155.0  
# Median :4.000   Median :428.0   Median :1656   Median :2928   Median :157.0  
# Mean   :3.809   Mean   :428.0   Mean   :1747   Mean   :2588   Mean   :160.3  
# 3rd Qu.:5.000   3rd Qu.:540.5   3rd Qu.:1914   3rd Qu.:3082   3rd Qu.:165.0  
# Max.   :6.000   Max.   :653.0   Max.   :3778   Max.   :6148   Max.   :199.0  
#                                                               NAs   :9      
#       X           Cf.H18           X.1            Cf.N5            X.2     
# Min.   :149   Min.   : 83.0   Min.   : 83.0   Min.   :148.0   Min.   :150  
# 1st Qu.:161   1st Qu.: 99.0   1st Qu.:107.0   1st Qu.:165.0   1st Qu.:170  
# Median :167   Median :105.0   Median :115.0   Median :170.0   Median :170  
# Mean   :172   Mean   :104.5   Mean   :112.8   Mean   :167.7   Mean   :170  
# 3rd Qu.:181   3rd Qu.:111.0   3rd Qu.:119.0   3rd Qu.:170.0   3rd Qu.:170  
# Max.   :519   Max.   :123.0   Max.   :123.0   Max.   :172.0   Max.   :172  
# NAs   :9     NAs   :1       NAs   :1       NAs   :36      NAs   :36   
#     Cf.N10           X.3            Cf.O5            X.4       
# Min.   :171.0   Min.   :175.0   Min.   :176.0   Min.   :176.0
# 1st Qu.:187.0   1st Qu.:193.0   1st Qu.:178.0   1st Qu.:182.0
# Median :189.0   Median :197.0   Median :182.0   Median :194.0
# Mean   :189.4   Mean   :196.3   Mean   :182.5   Mean   :190.3
# 3rd Qu.:193.0   3rd Qu.:201.0   3rd Qu.:182.0   3rd Qu.:196.0
# Max.   :205.0   Max.   :205.0   Max.   :202.0   Max.   :204.0
# NAs   :13      NAs   :13      NAs   :8       NAs   :8

The Atom Editor

 
Being new to the Windows platform, I’m on the lookout for a good text editor that can handle the myriad tasks we do each day.  Notepad is not an option, let’s be real.  I’m looking for something that can be extended and has been designed from the ground up for wrangling text and writing code.  Ultimately, I would like something that is amenable to teaching both R and Python using a single interface.  RStudio is great for R but sucks for Python.  Jupyter notebooks are clunky and toy-like.
 
Atom is created by the GitHub folks and is integrated with ‘the mothership’ repository.  Here is what I did to get it up and running, with Python working correctly.
 
Packages
 
Packages are extensions to the main editor that each add some function to make your life a bit easier.  Here are some of the ones I find helpful.  You can find and install packages using Settings -> Install.  Search for the package and hit the install button.
 
Minimap
 
If you have scripts and/or code that run longer than a single page (and who doesn’t?), minimap provides a graphical depiction of your code on the right-hand side of the window so you can easily jump up and down the file.  Here is an example with an R script.
 
 
Script
 
Script is a package that runs code in the editor directly.  This means you can run individual lines as you develop and look at the output.  Very helpful.
 
Themes
 
There are a ton of themes available, both for the editor overall and for syntax highlighting.  To install one, go to Settings, select Install -> Themes, and type a name.  Here is atom-material-syntax being installed.
 
 
Once installed, you can change both the UI and the syntax colors.
 
 
 
As I expand more into using Atom, I’ll add additional posts showing how I have configured it for use in my daily coding activities.
 
 
 

Rice Center Presentations

Some awesome presentations by lab members at today’s research symposium.

Bonnie Roderique speaking about the James Spinymussel.

Matt DeSaix presenting work on migration connectivity in Prothonotary warblers.

Jane Remfert & Wyatt Carpenter discussing the Carver Community Project

Dr. Gary Machlis

Today, we had the first event in the Global Environment Speaker Series hosted by the VCU Center for Environmental Studies and the Departments of Geography & the Environment and Environmental Studies from the University of Richmond—a set of units that already has a lot of collaborations among faculty, curricula, and students.

We invited Dr. Gary Machlis to come and speak on his new book, “The Future of Conservation in America: A Chart for Rough Water.”

It was a great talk and poster session.  We have to incorporate more inter-unit dialog and collaboration.

Syndication Between WP Sites

Syndication is a process whereby you can post something to your site and other locations will detect that you have posted something and then pull in the content to their site, making it look like you wrote it and posted it on their site.  Is that clear?  Here is my use-case:

  1. I post everything I do to rodneydyer.com.
  2. For things that I want to be shown on my work page (https://dyerlab.org), I select a particular Category or Tag for the post.
  3. My work site monitors my personal site, and any time a post at rodneydyer.com carries the key Category and/or Tag, dyerlab.org pulls in the content of the post and formats it to look just like I wrote it for that site.

This is particularly interesting for teaching and other uses.  If a class uses WordPress for its webpage, students can provide content for that class page by publishing on their own site.  This allows each student to create a "Digital Portfolio" of work that they maintain (see my thing on Content Silos for more on this).

FeedWordPress Plugin

I'm going to use the FeedWordPress Plugin for this because it is the one that my university uses and I want to standardize the approaches.

To install it, go to Plugins -> Add New and search for it.  Install & Activate.  I'm going to use a new Category, named Dyerlab, to trigger the syndication, so I add a new one.

A new top-level category to handle the syndication.

Configuration

OK, now on my personal page, I have a category "Dyerlab" that I will attach to things that I want to show up on my Dyerlab WordPress site.  To make the connection, we need to get the category feed address.  Unless you have changed something drastic, it has the following structure:

https://yoursiteurl/category/categoryname

which in my case is:

https://rodneydyer.com/category/Dyerlab/

You can try it out and you should see (if you have any posts with that category published) a list of just those posts.  If so, perfect.  If not, then you either have not posted anything with that category or you have not set up the category correctly.  Go back and check.
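
If you are setting this up for a whole class of sites, the address can be built rather than typed.  The following is a rough sketch in R under a few assumptions: the helper name category_urls() is made up, the site uses default "pretty" permalinks, lowercasing is a simplification of how WordPress makes slugs, and the /feed/ suffix is the usual WordPress convention rather than something the plugin requires you to enter.

```r
# Build the category archive address and its feed for a WordPress site.
# Hypothetical helper; assumes default "pretty" permalinks.
category_urls <- function(site, category) {
  slug <- tolower(category)   # simplification: real slugs also replace spaces
  base <- sprintf("https://%s/category/%s/", site, slug)
  c(archive = base, feed = paste0(base, "feed/"))
}

category_urls("rodneydyer.com", "Dyerlab")
```

The archive address is the one to open in a browser to check that the category lists the posts you expect.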

Now, I need to set up the other site, in this case my laboratory site, to monitor my personal site, and any time something is posted, grab it.  Go open your other site and make sure the plugin is installed.  This site will "Pull" the posts from the original site.  Click on Syndication in the bottom left panel and you will open the settings page.  

The syndication settings page.

In the "New Source" box, paste in the category address from your other site.  In my case I pasted in:

https://rodneydyer.com/category/dyerlab

You will be taken to a verification screen where you can verify that things are working properly and select the correct feed type.  There is a 'verify' link that you can use to make sure it is providing good input.  After you select which kind of feed you want, you will be redirected back to the list, as above, but with your new feed in it.

Success

Now, when I write something (like this post) on my site, it will automagically show up on my laboratory site as well.  The cool thing is that wherever it is displayed, it is reformatted to look as if it belongs at that location.  Here is this post on my personal site.

And here it is on my laboratory site.

The two are identical in content, though individually styled.  Pretty cool!

Featured image by amattox mattox (CC BY-NC 2.0).

Capturing contents within Curly Brackets

OK, just a quickie here.  I’m working with a colleague on a manuscript using LaTeX.  The citation format for the journal we are looking at uses numerical citations, but bibtex will number the citations in the order in which the \bibitem entries occur in the bibliography section.  So, it would be great to get them into the order in which they occur in the text.

So, our old friend (and sometimes enemy) grep comes to the rescue.  Here is a quick one-liner that allows you to search the text for all the \cite{}  entries and return only the contents within the curly brackets.
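
The same extraction can also be sketched in R.  To be clear, this is not the grep one-liner itself, just an illustration of the idea: the tex lines are a made-up stand-in for the manuscript, and the citation keys are invented.

```r
# Made-up manuscript lines standing in for the real .tex file.
tex <- c(
  "Gene flow shapes structure \\cite{dyer2004}.",
  "See also \\cite{smouse1999,dyer2004} and \\cite{wright1943}."
)

# Match each whole \cite{...} token; [^}]* stops at the first closing bracket.
hits <- unlist(regmatches(tex, gregexpr("\\\\cite\\{[^}]*\\}", tex)))

# Strip the leading \cite{ and the trailing } to leave only the contents.
keys <- sub("^\\\\cite\\{", "", sub("\\}$", "", hits))

# Split comma-separated keys and keep the first occurrence of each,
# which gives the order in which citations appear in the text.
cite_order <- unique(trimws(unlist(strsplit(keys, ","))))
cite_order
```

For a real manuscript, tex would come from readLines() on the file, and cite_order is the sequence to follow when rearranging the \bibitem entries.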

Once all the editing is done and we’ve finished the main body of the text, we can reorder the bibliography section so that the numbers run sequentially through the text.

Sometimes I forget how awesome and powerful the unix underpinnings are.