SAS: best practices for working with datasets and libraries

The way SAS handles datasets and libraries of datasets is fiddly, and can cause big problems with analysis if not handled properly. Here, I explain briefly what SAS is doing behind the scenes and how I avoid common pitfalls.

Datasets in SAS

A SAS dataset is essentially a glorified Excel sheet (each variable is a column; each record is a row). SAS saves datasets in folders somewhere on your computer (more on that below). These files have a .sas7bdat extension. The name of the dataset in SAS matches the part of the filename before the .sas7bdat extension.

Libraries in SAS

A library is a collection of multiple datasets. It maps to a folder on your computer with .sas7bdat files in it. You can give the library a short alias that's 1 to 8 characters long (letters, digits, or underscores, starting with a letter or underscore). You do this by placing libname baz "c:\path\to\folder"; at the top of your SAS Editor file.

That libname statement tells SAS to look in c:\path\to\folder whenever the library "baz" is referenced. A later reference such as data=baz.foo then tells SAS to look in the folder with the alias "baz" (that's c:\path\to\folder) for a file called foo.sas7bdat.
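A minimal sketch of that setup, assuming the placeholder folder c:\path\to\folder exists and contains a file called foo.sas7bdat:

```sas
/* Point the alias "baz" at a folder of .sas7bdat files */
libname baz "c:\path\to\folder";

/* Describe the dataset stored in c:\path\to\folder\foo.sas7bdat */
proc contents data=baz.foo;
run;
```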

The default library (work)

Perhaps the most confusing thing in SAS is what happens when you don't specify a library name when referencing a dataset. SAS has a temporary library it calls "work", and this is used whenever you don't specify a library. Datasets in "work" are deleted every time you restart SAS.

This means that proc contents data=foo; and proc contents data=work.foo; are exactly equivalent.

This is convenient but confusing, because proc contents data=foo; and proc contents data=baz.foo; are referencing two entirely different datasets.
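As a rough illustration, the following creates two unrelated datasets that happen to share the name "foo". It assumes the library "baz" has already been defined with a libname statement, and the variable x and its values are made up. Only try it against a scratch folder, since it would overwrite an existing foo.sas7bdat.

```sas
/* No library given, so this dataset goes to the temporary "work" library */
data foo;
    x = 1;
run;

/* Same dataset name, but saved permanently in the "baz" library
   (assumes "baz" points at a scratch folder) */
data baz.foo;
    x = 2;
run;

/* These two steps describe the same dataset (work.foo) */
proc contents data=foo;
run;

proc contents data=work.foo;
run;

/* This step describes a different dataset (baz.foo) */
proc contents data=baz.foo;
run;
```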

Note that the output of proc contents shows both the library and the name of the dataset (in the "Data Set Name" field), so you can check which dataset you're actually using.

Best practices

Here's how I avoid confusion about datasets and libraries in my own code:

  • I always use the full "libname.datasetname" syntax so there's no ambiguity about whether I'm using "work" or my own manually defined library.
  • I only use "work" for something that's truly temporary. Anything I want to reference in a later proc, I save to a manually defined library so it won't disappear if I close SAS.
  • If I create a dataset based on an existing dataset, I prefix the name with "drv_" (for example, "drv_foo"). This stands for "derived", which tells me that it's based on another dataset and could theoretically be recreated by running my code again. In contrast, my raw datasets with no "drv" prefix can't be reproduced by running my code.
  • I use descriptive names for my derived datasets. "drv_foo1" and "drv_foo2" are not particularly helpful when trying to remember what has changed between them. "drv_foo_drop_missing" is much better.
  • SAS will allow you to run proc whatever; without explicitly specifying a dataset, in which case it uses the most recently created dataset. I never do this because it makes it ambiguous which dataset I'm using.
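
The naming conventions above might look something like this in practice. This is only a sketch: the library "baz", the raw dataset "foo", and the variable "age" are hypothetical stand-ins.

```sas
libname baz "c:\path\to\folder";

/* Derived dataset: the raw baz.foo with records missing "age" dropped.
   The "drv_" prefix marks it as reproducible from code. */
data baz.drv_foo_drop_missing;
    set baz.foo;
    if not missing(age);
run;

/* Always name the dataset explicitly, library included */
proc means data=baz.drv_foo_drop_missing;
    var age;
run;
```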

An example

If I were doing a Broad Street Pump analysis, I would make a folder called c:\projects\broadstreet where I would save all my SAS Editor (.sas) files.

I would also make a subfolder called c:\projects\broadstreet\data and point my SAS library at it with libname broadst "c:\projects\broadstreet\data"; at the top of my Editor file.

If my primary dataset was called "primary", then I would expect there to be a file on my computer called c:\projects\broadstreet\data\primary.sas7bdat.

I would then use "broadst.primary" to reference this dataset in my procs. For example: proc contents data=broadst.primary;.
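
Put together, the top of the Editor file for this hypothetical project might look roughly like this. The proc print step is just an illustrative extra for eyeballing the first few records.

```sas
/* Library alias for the project's data subfolder */
libname broadst "c:\projects\broadstreet\data";

/* Describe c:\projects\broadstreet\data\primary.sas7bdat */
proc contents data=broadst.primary;
run;

/* Print the first 10 records as a quick sanity check */
proc print data=broadst.primary (obs=10);
run;
```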

How to enable old-style listing output in SAS 9.3 and best practices for sharing SAS output

Starting with version 9.3, SAS defaults to the "new" ODS HTML-style output instead of the "old" listing-style output format. While the new format is arguably easier to read, it may be more difficult to copy/paste into documents or emails.


How do you get the old-style listing output in SAS 9.3?

Add the following line to the very top of your SAS Editor file:

ods listing; /* Turn on old style output */


How do you avoid weird ƒ characters when copying/pasting listing output?

SAS uses a special font for displaying lines in listing output. If that font isn't available or isn't selected, you get strange characters instead of lines.

There are two ways to fix this. The first is to paste your SAS output into the form on https://sasfix.herokuapp.com, and then paste from the website into your document/email. This looks the best but requires an extra step.

The second is to add the following line to the top of your SAS Editor file. It doesn't look as good but it saves the extra step of the first method.

options formchar="|----|+|---+=|-/\<>*";

Here's what both look like:


Remember to use a monospaced font

If you paste SAS output into Microsoft Word or a similar editor, you should use a monospaced font like Courier New.

MailMate: select previous (older) message on archive

By default, when your messages are sorted from newest to oldest, MailMate selects the message above the one you just archived. If you want to mimic the Gmail behavior of selecting the next older message instead, you need to set this hidden preference:

defaults write com.freron.MailMate MmMessagesOutlineMoveStrategy -string 'previous'

This is posted in the MailMate bug tracker, but I wanted to post again here to potentially enhance googleability.

Update: this is also posted on the official documentation page for hidden preferences.

MailMate: add "open in MailMate" link to FastMail's web interface

Search in MailMate is very accurate, but it's a bit slow with ~65,000 messages. (I search "common headers or body", which is the most resource-intensive option, because I prefer to search everything from one field a la Spotlight.) Search in the web interface for FastMail (my email provider) is both accurate and very fast. So why not use FastMail for searching and MailMate for everything else?

To this end, I made a quick-and-dirty Chrome extension for adding an "Open in MailMate" link to FastMail's web interface.

No guarantees this will work for you because it's quite a hack. It has to load the raw message via an XHR request and then find the "message-id" in order to build a message:// URL that will trigger MailMate.

When MailMate opens the message, you can use the Message > Go to Source menu item to show the message in context (possibly followed by showing all messages in a thread).

Head over to GitHub to grab the extension.

MailMate: combination of conversation and thread arc views

MailMate has an unsupported feature for creating custom layouts. I quickly created a combination of the two best (IMO) layouts: the conversation layout and the thread arc layout.

Here's a screenshot:

Note:

Here's the code:

MailMate: keyboard shortcut to archive all messages in a thread

One arguably strange behavior in MailMate is that when you archive a message, other messages in the thread can stay in your inbox.

Fortunately, it's possible to re-map the archive keyboard shortcut to archive all messages in a thread. Using the instructions for adding custom keybindings, add the following:

"y" = ( "selectWithFilter:", "#thread-id = ${#thread-id}", "archive:");

Note that I use "y" instead of "e" because I use FastMail instead of Gmail.

Ulysses III RTF manuscript style (.ulss)

Ulysses III has an awesome feature for styling RTF exports of Markdown documents. I created a .ulss stylesheet file for a double-spaced manuscript. It's still a work in progress – it's not specific to any journal's requirements currently, but it certainly could be.

Click here to view the gist containing the .ulss file (too long to embed).

Here's a sample file in Ulysses:

And here's the direct export in Word:

I haven't finished playing around with embedding figures or inserting citations. Citations are obviously huge for academic writing, and the new Papers 3 is awesome but doesn't seem to fully support Ulysses. Part of the problem seems to be that the Papers 3 popup citation inserter can't generate the references list inside Ulysses. The other part of the problem is that by default, Ulysses does crazy parsing stuff to the Papers citation placeholder tags (e.g. {Smith:2013xx}).

I've got some work-arounds but they are too fragile to be worth writing about at this point. Even if the workflow is a little jury-rigged, it will still probably be better than WYSIWYG formatting with Pages or Word – it's amazing to have completely uniform formatting without futzing around with manually applying styles.

I'm asking for some help...we'll see what happens: