Extracting Email Addresses

Email is a pretty seamless tool for most of us: sitting at a computer, it’s easy to write a message, send it, and read whatever comes back. That’s hardly worth saying.

And I’m not here to make a commentary on how inefficient email can be in the workplace when there are better productivity tools (that’s another blog post).

At work we recently received a lot of emails, and we respond to each one. In this case, the emails were not unique. An email campaign had been mounted and many of the emails were the same; where there were variations, they had come from the same source. In the body of each email was the originator’s email address.

My job was to harvest these email addresses by hand and make a list, so that we could send each constituent a response that acknowledged receipt and talked about next actions. However, as I began this work, I noticed there were quite a few emails. Over 500.

Recognizing Email Addresses

The first step toward automating this process is to get your computer to recognize an email address. Such a search can be done using regular expressions. I am not a whiz at these and I really need to spend a weekend studying them; all I knew going into this was that they were powerful, and I was sure someone had needed to do this same type of search before.

I use a Mac, which is built on UNIX and comes with text processing tools built in. One such tool is called grep, short for global regular expression print, and it lets you search a file using a regular expression. The following command, flags, and regex worked for me:

grep -E -o "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b"

That is only the first part. I also needed to give grep a file to search, and that required a little more preparation.
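Before pointing the pattern at real mail, it can help to try it on a few lines you control. The sketch below is only an illustration: the sample text and addresses are made up, and the temporary file stands in for the mailbox file. Note the \. before the final run of letters; an unescaped . matches any character, so escaping it keeps the match stricter.

```shell
# Create a small sample file (contents are made up for illustration).
sample="$(mktemp)"
cat > "$sample" <<'EOF'
From: Jane Doe <jane.doe@example.com>
Please reply to j_smith+news@mail.example.org by Friday.
This line has no address, just an @ sign and a dot.
EOF

# -E turns on extended regex syntax; -o prints only the matched text,
# one match per line, instead of the whole line that contains it.
matches="$(grep -E -o "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" "$sample")"
printf '%s\n' "$matches"

# Clean up the temporary file.
rm -f "$sample"
```

Only the two addresses should come back; the third line has nothing the pattern accepts.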

Preparing the Emails to be Searched

We use Gmail through the web browser, which, as far as I know, is no help for any kind of regex search. The UNIX mailbox format (.mbox), on the other hand, is a text file and can be searched. While there are guides online for exporting your entire Gmail account to a single .mbox file, what I needed was just a subset of the received emails.

  1. First, you’ll want to apply a label to the emails in question. This allows you to sequester them. I would choose a unique label for your extraction process. You can use the checkboxes in Gmail to check all the emails then use the tools at the top to apply the same label to all those emails.
  2. Next, I selected all the messages in that label group (select the label in the sidebar, check the box at the top of the page, and choose the option to select all the messages with that label). I forwarded those emails to my personal email account. Why? Because on my Mac, I use Apple Mail. And within Mail, you can go to Mailbox > Create New Mailbox.
  3. Once the email came in from Google, I dragged the email with all the attached messages into the new mailbox. Then I Control-clicked (right-clicked) on that mailbox in Mail’s sidebar and exported the mailbox. Now I had an .mbox folder.
  4. Inside the folder I now had a single file with all the emails. Including those precious email addresses.
  5. After pasting the grep command into the Terminal, type a space after the closing quotation mark and then drag the file from the Finder into the Terminal window. The path to this file magically appears.
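A quick sanity check on the exported file can save some head-scratching. In mbox format each message begins with a line starting with "From " (with the trailing space), so counting those lines gives a rough message count. This is a sketch only: count_messages is a name I made up, and the two-message mailbox below is invented sample data.

```shell
# count_messages is a hypothetical helper name, not part of any tool.
# It counts lines that begin with "From ", the mbox message separator.
count_messages() {
  grep -c '^From ' "$1"
}

# Made-up two-message mailbox for illustration.
demo="$(mktemp)"
cat > "$demo" <<'EOF'
From alice@example.com Mon Jan  1 10:00:00 2024
Subject: Hello

Body of the first message.

From bob@example.com Mon Jan  1 11:00:00 2024
Subject: Hello again

Body of the second message.
EOF

count="$(count_messages "$demo")"
echo "$count"

# Clean up the temporary file.
rm -f "$demo"
```

On a real export you would pass the path that the Terminal filled in for you. If the count is close to the number of emails you labeled, the export probably worked.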

Finishing the Task

Once you have the emails and you’re back in the Terminal, you’re almost done. I did not want to see the email addresses on the screen; I wanted them in a file. To do that, we have to say where the output of the grep command should go. I also wanted to add a couple of extra steps before that happens.

| sort | uniq -i > addresses.txt

So put a space before that pipe symbol. The pipe (|) hands the output of one command to the next command as its input. sort alphabetizes the lines, and uniq removes duplicate lines that sit next to each other, which is why sorting comes first. The -i flag tells uniq to ignore case when comparing lines. The > symbol redirects the final output to a file, which I called addresses.txt. So my entire command looked something like this:

grep -E -o "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" /Users/jhendron/Desktop/messages.mbox/mbox | sort | uniq -i > addresses.txt
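One wrinkle worth knowing about: sort compares case-sensitively in some locales while uniq -i does not, so two spellings of the same address that differ only in capitalization can end up apart in the sorted list and both survive. A possible variant, not the command I ran, is to lowercase everything first and let sort -u drop the duplicates. The addresses here are made up.

```shell
# Made-up sample of extracted addresses, with duplicates that differ in case.
# tr lowercases every line; sort -u then sorts and keeps one copy of each.
deduped="$(printf '%s\n' \
  'Bob@Example.com' \
  'alice@example.com' \
  'bob@example.com' \
  'ALICE@example.com' \
  | tr '[:upper:]' '[:lower:]' \
  | sort -u)"
printf '%s\n' "$deduped"
```

The trade-off is that the list loses whatever capitalization people used, which is usually harmless for a mailing list.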

The resulting file held more than I needed, but because the list was sorted, tidying it up was quick work before I passed it on to be used in composing responses.
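If the extras follow a pattern (say, your own office’s addresses or the campaign’s sending address), some of that tidying could be scripted too. A minimal sketch, assuming two made-up domains to exclude; the file names and addresses are placeholders as well.

```shell
# Made-up list standing in for addresses.txt, kept in a scratch directory.
workdir="$(mktemp -d)"
printf '%s\n' \
  'alice@example.com' \
  'noreply@campaign.example' \
  'staff@ouroffice.example' \
  'bob@example.com' > "$workdir/addresses.txt"

# -v inverts the match (keep lines that do NOT match), -i ignores case,
# and each -e adds one pattern to exclude. Both domains are hypothetical.
grep -v -i -e '@campaign\.example$' -e '@ouroffice\.example$' \
  "$workdir/addresses.txt" > "$workdir/constituents.txt"

kept="$(cat "$workdir/constituents.txt")"
printf '%s\n' "$kept"

# Clean up the scratch directory.
rm -rf "$workdir"
```

Anything that doesn’t fit a pattern still needs a human eye, but this trims the obvious entries first.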

I hope this might be of use to anyone trying to respond to a lot of emails at once using a common response.
