Ubuntu Server is a nice platform for server-related work. Here is a short tutorial on how I updated my RStudio Server installation to the latest version available from rstudio.org.
If this is your first install, you need to grab gdebi first:
sudo apt-get install gdebi-core
Next, download the latest .deb from RStudio. I typically like to try out the preview release, which is often stable enough to get what you want done while also highlighting the latest features. At the time of writing, it was version 1.2.1321.
Installing the .deb with gdebi went out and grabbed some additional libraries, installed everything for me, and then turned the server back on. Since I already had it installed, that was the end of it. If this is your first time installing it, you can configure it by following the RStudio Server installation guide.
So, as a way to expand the analytical tools we offer the students at my work, I’m developing a version of my Data Literacy course that will use Python as well as R. There is a lot of overlap between these two languages, and both are of interest to our students as they develop their toolkits. This document walks through how to set up Pweave on your machine so you can engage in a little Literate Programming (trust me, it will make your life suck a lot less). To see how to set up Atom, see my previous post.
I believe that the time is ripe for significantly better documentation of programs, and that we can best achieve this by considering programs to be works of literature. Hence, my title Literate Programming.
Knuth – 1992
If you think about it a bit, as data scientists, the documents and manuscripts we work on every day are just extensions of the programs and scripts we use to do our work. However, in academia we are taught the process in entirely the wrong sequence. Traditionally, we are taught the following sequence.
We are funneled by the primary interface for writing scientific documents, the word processor, into that monstrous chunk of software we use to craft our tales about the data we are presenting. How many times have we started working on a new project and the first thing we do is fire up an editor and start an outline of a manuscript? We never really liked it, but this was the main tool we were taught to use (along with the crappy reference managers tacked onto it).
In a separate interface, we would perform our analyses. In my career, I’ve used:
That VAX machine over in the Math Department at UMSL. It ran SAS and I did most of my work in IML.
One-off software packages that worked on the ‘special’ kind of data we were working with. These were typically FORTRAN programs written by some wizard at a far-off university. Anyone remember BioSYS from Swofford & Selander?
Workarounds in C (my own popgraph software is written in C).
Extensions that could be shoved into Excel (GenAlEx is a good example of how far you can push VB).
Scripting languages such as R, Perl (no one uses this one any longer, which is probably a good thing), Python, Julia, etc.
Then we would export the raw output to some kind of plotting software to make our graphics. I always hated this step because, inevitably, we’d have to come back and redo the graphics (higher DPI, says the publisher) and we’d have to remember how we made them the last time, since most of these interfaces are stupid point-and-click software packages.
The main problem is that any iteration of the manuscript requires manually going through the whole process of changing the text document, rerunning the analysis, and then replotting the figures. “Move this section up here,” and then you have to go back through and make sure all your figure and table references are still correct.
But this is entirely upside-down! Instead of Communicate -> Analysis -> Visualize, our workflow should be more like:
We should be data-focused, not manuscript focused!
The research manuscript is simply an advertisement of our research and the data; it IS NOT the research or data.
Dyer – Just now!
Pweave is like Sweave (and its better successor, knitr) in R. It is a tool we can use to interdigitate our analysis with how we present it, all in one place. This gives us a single document containing the data, the analyses, the output, and the text we use to describe what we are doing. This tight coupling of the data to the rest of the components supports Reproducible Research.
To install Pweave, you need to have Atom and Python already configured. Then, in Atom, install the following packages:
Next, you can prepare a short script. Here is a fragment of one.
What this does is mix Markdown text and code. If you have not used Markdown before, it is pretty straightforward. Here are some simple rules.
Lines starting with one or more # marks are headings.
A word or bit of text between asterisks (e.g., *this*) is italicized.
A word or bit of text between pairs of asterisks (e.g., **this**) is bold.
Links put the URL in parentheses, preceded by the word(s) to use as the link in square brackets: [link](http://foo.bar)
Lists are written the way they look: start each item on a new line with a dash for an unordered list, or with a number for a numbered list.
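To make these rules concrete, here is a toy Python sketch that applies a few of them (headings, italics, bold, and links) with regular expressions. It is only meant to show what the syntax turns into; it is not how Pweave or any real Markdown renderer works, and the function names `tiny_markdown` and `_inline` are hypothetical.

```python
import html
import re


def _inline(text: str) -> str:
    """Apply the inline rules: bold, italics, and links."""
    # Bold first, because the bold marker (**) also contains the italic marker (*).
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    # Links: [link text](url)
    text = re.sub(r"\[([^\]]+)\]\(([^)]+)\)", r'<a href="\2">\1</a>', text)
    return text


def tiny_markdown(source: str) -> str:
    """Convert a tiny subset of Markdown to HTML, one line at a time."""
    out = []
    for line in source.splitlines():
        # Escape <, >, & and quotes so only our own tags end up in the HTML.
        line = html.escape(line)
        heading = re.match(r"^(#{1,6})\s+(.*)$", line)
        if heading:
            # The number of leading # marks sets the heading level.
            level = len(heading.group(1))
            out.append(f"<h{level}>{_inline(heading.group(2))}</h{level}>")
        elif line.strip():
            out.append(f"<p>{_inline(line)}</p>")
    return "\n".join(out)


print(tiny_markdown("# Rules\nSome *this* and **that**, plus [link](http://foo.bar)."))
```

Lists are left out on purpose: grouping consecutive items into a single list element needs state across lines, which is exactly the kind of bookkeeping a real Markdown library handles for you.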
All the Python code must be within the bounds marked by three backticks. The code is evaluated from the top of the document to the bottom. You do not have to show the code for it to run.
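Conceptually, weaving does something like the following: pull out every Python chunk, run the chunks in order in one shared namespace (so later chunks can use variables from earlier ones), and decide per chunk whether to display the source. This is a simplified model written for illustration, not Pweave's actual implementation. The name `run_chunks` is hypothetical, and the `echo=False` option parsing here is an assumption modeled on Pweave's `echo` chunk option; check the Pweave documentation for its exact chunk syntax.

```python
import contextlib
import io
import re

# Build the three-backtick fence programmatically so it cannot clash with
# the fence around this example.
FENCE = "`" * 3

# A chunk opens with a line like FENCE + "python" (optionally followed by
# options such as ", echo=False") and closes at the next line that is just FENCE.
CHUNK_RE = re.compile(
    "^" + FENCE + r"python(?P<opts>[^\n]*)\n(?P<code>.*?)^" + FENCE + r"[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def run_chunks(document: str) -> list[tuple[str, bool, str]]:
    """Run every Python chunk top to bottom in one shared namespace.

    Returns a (source, echo, captured_stdout) tuple for each chunk.
    """
    namespace: dict = {}  # shared, so later chunks see earlier variables
    results = []
    for match in CHUNK_RE.finditer(document):
        code = match.group("code")
        # A hidden chunk still runs; its source just would not be displayed.
        echo = "echo=False" not in match.group("opts")
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
        results.append((code, echo, buffer.getvalue()))
    return results


# A tiny document: a hidden setup chunk, some prose, then a visible chunk.
doc = "\n".join([
    "# Demo",
    FENCE + "python, echo=False",
    "x = 21",
    FENCE,
    "Some *Markdown* text in between.",
    FENCE + "python",
    "print(x * 2)",
    FENCE,
])

for source, echo, output in run_chunks(doc):
    print("shown" if echo else "hidden", repr(source), repr(output))
```

The hidden first chunk still defines `x`, which is why the second, visible chunk can print a result from it. That shared, top-to-bottom state is what makes the prose and the analysis stay in sync.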
To weave the document into HTML (we can do other formats as well, but this gets us going), open the terminal and type:
It should produce a document in the same folder, but as an .html file.
Which is pretty cool. Now, there are a lot more things you can do with Markdown.
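If you would rather drive the weave step from Python (say, to rebuild several documents in one go), a small sketch like this one works under two assumptions: that the `pweave` command-line tool installed with the Pweave package is on your PATH, and that its `-f md2html` option selects HTML output, as described in Pweave's documentation (check `pweave --help` for your version). The helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def weave_command(source: Path) -> list[str]:
    """Build the pweave call for HTML output (the -f md2html flag is assumed)."""
    return ["pweave", "-f", "md2html", str(source)]


def expected_html(source: Path) -> Path:
    """The woven HTML lands in the same folder, with an .html extension."""
    return source.with_suffix(".html")


def weave_to_html(source: Path) -> Path:
    """Run pweave on one file and return the HTML path it should produce."""
    if shutil.which("pweave") is None:
        raise RuntimeError("pweave is not on PATH; install Pweave first")
    # check=True raises CalledProcessError if pweave exits with an error.
    subprocess.run(weave_command(source), check=True)
    return expected_html(source)
```

For a hypothetical file `analysis.pmd`, `weave_to_html(Path("analysis.pmd"))` would run the same command you would type in the terminal and hand back `analysis.html` as the path to open in a browser.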
Being new to the Windows platform, I’m on the lookout for a good text editor that can handle the myriad tasks we do each day. Notepad is not an option, let’s be real. I’m looking for something that can be extended and has been designed from the ground up for wrangling text and writing code. Ultimately, I would like something that is amenable to teaching both R and Python using a single interface. RStudio is great for R but sucks for Python. Jupyter notebooks are clunky and toy-like.
Atom is made by the GitHub folks and is integrated with ‘the mothership’ repository. Here is what I did to get it up and running with Python working correctly.
Packages are extensions to the main editor that add some function to make your life a bit easier. Here are some of the ones I find helpful. You can find and install packages via Settings -> Install: search for the package and hit the Install button.
If you have scripts and/or code longer than a single page (and who doesn’t?), minimap provides a graphical depiction of your code on the right-hand side of the window so you can easily jump up and down the file. Here is an example from an R script.
Script is a package that runs code in the editor directly. This means you can run individual lines as you develop and look at the output. Very helpful.
There are a ton of themes available, both for the overall editor UI and for syntax highlighting. To install one, go to Settings -> Install, switch to Themes, and type a name. Here is atom-material-syntax being installed.
Once installed, you can change both the UI and the syntax colors.
As I expand more into using Atom, I’ll add additional posts showing how I have configured it for use in my daily coding activities.