Mining Online Reviews for Fun and Profit

As an interpretive planner and visitor experience advisor, I’m a big believer in evidence-based decision making. When I start a new project, the first thing I ask for is data: visitor surveys, gate revenue statistics, comment cards… anything I can get my hands on. 

Recently I started an exhibit planning project where my client knows next to nothing about their visitors. It’s a busy place, with visitation in the hundreds of thousands per year, but the visit is entirely self-guided and admission is free. There’s no front gate data; virtually everything they know about visitation comes from a basic camera over the front door… which doesn’t break those counts down by time of day, day of week, or time of year. Nothing. It’s a statistical desert.

So what’s a data-thirsty planner to do? I had a little extra time on my hands last week, so I decided to turn my mad analysis skills to online reviews. It’s something I’ve been wanting to try for a while: dive into online reviews to answer the questions that drive interpretive planning. Who are our visitors, what are their needs and interests, and what is the way to their heart?

The Plan

I used three online review sites: Google, Tripadvisor, and Yelp. The latter two are not terribly relevant anymore, but their reviews are still online, and fortunately for me, my client’s visitor offer (exhibits and wildlife viewing) hasn’t changed much in recent years. Or in the last forty years, honestly, but that’s a story for another day. 

Gathering the data

If you’ve got thousands of data points to harvest, you can set up a data scraper: a bot that pulls the HTML off a website and organizes it into a form you can read and work with. Whether scraping is allowed depends on the site’s terms of service and your jurisdiction, and it takes a bit of time and technical savvy to set up. I only wanted about 400 reviews to work with, so I decided to kick it old school and simply note what I saw in each review. Clickity click, typity type. No, I have no social life whatsoever. Don’t judge me.
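For the curious, here is a minimal sketch of the “organizes it for you” half of a scraper, using Python’s built-in html.parser. The HTML snippet and its class names (review, rating, text) are hypothetical placeholders; real review sites use their own markup, which changes often, and fetching the pages is a separate step that should respect each site’s terms.

```python
from html.parser import HTMLParser

# Hypothetical markup: real review sites use different (and frequently
# changing) HTML, so these class names are placeholders for illustration.
SAMPLE_HTML = """
<div class="review"><span class="rating">5</span><p class="text">Clean washrooms!</p></div>
<div class="review"><span class="rating">3</span><p class="text">Hard to find.</p></div>
"""


class ReviewParser(HTMLParser):
    """Collects (rating, text) pairs from elements with the placeholder classes."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished (rating, text) tuples
        self._field = None   # which field the text currently being read belongs to
        self._current = {}   # fields collected for the review being parsed

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in ("rating", "text"):
            self._field = cls

    def handle_endtag(self, tag):
        self._field = None
        # A closing </div> ends one review block.
        if tag == "div" and self._current:
            self.rows.append((int(self._current["rating"]), self._current["text"]))
            self._current = {}

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = self._current.get(self._field, "") + data.strip()


parser = ReviewParser()
parser.feed(SAMPLE_HTML)
print(parser.rows)  # [(5, 'Clean washrooms!'), (3, 'Hard to find.')]
```

Each review ends up as one tidy row, which is the same shape as the hand-built spreadsheet described below.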

Tripadvisor was the juiciest data source for me because each review combines a star rating (out of five) with a text review, plus the reviewer’s point of origin, plus the reviewer’s group composition (solo, family, friends, couples).

[Figure: Reviews, tags, categories. The raw data from online reviews.]

Here’s what I did

I entered the above data in a spreadsheet in simple columns: Rating, Review, Group Type, Origin. I broke Origin down into just three categories: Locals, Canada but not Local, and Outside of Canada. That alone took the analysis further than you can get by glancing at the website. I discovered that locals rate the site higher than international travellers do, and I started to imagine how we could work that into the plan, building a relationship with locals that harnesses their love of the site to help drive advocacy and attendance.
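The origin bucketing and the rating comparison can be sketched in a few lines of Python. The rows, the “Town, Country” origin format, and the LOCAL_PLACES set are hypothetical, made up for illustration; deciding what counts as “local” is a project-specific judgment call.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows shaped like the spreadsheet columns
# (Rating, Review, Group Type, Origin); values are made up for illustration.
reviews = [
    {"rating": 5, "review": "Love it", "group": "Family", "origin": "Hometown, Canada"},
    {"rating": 4, "review": "Nice stop", "group": "Couples", "origin": "Othertown, Canada"},
    {"rating": 3, "review": "Quick visit", "group": "Friends", "origin": "Elsewhere, USA"},
    {"rating": 5, "review": "Great exhibits", "group": "Solo", "origin": "Hometown, Canada"},
]

# Assumption: "local" is whatever set of places you decide counts as local.
LOCAL_PLACES = {"Hometown, Canada"}


def origin_category(origin: str) -> str:
    """Collapse a free-text 'Town, Country' origin into the three buckets."""
    if origin in LOCAL_PLACES:
        return "Locals"
    country = origin.rsplit(", ", 1)[-1]
    return "Canada but not Local" if country == "Canada" else "Outside of Canada"


def mean_rating_by(rows, key):
    """Average star rating for each distinct value of key(row)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row["rating"])
    return {k: mean(v) for k, v in groups.items()}


by_origin = mean_rating_by(reviews, lambda r: origin_category(r["origin"]))
by_group = mean_rating_by(reviews, lambda r: r["group"])
print(by_origin)
print(by_group)
```

The same mean_rating_by helper handles Group Type just by swapping the key function, so one small function covers every “who rates us higher?” question the spreadsheet can answer.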

I also discovered that solo travellers rate the site higher than other groups do; that wasn’t a big surprise to me, as the exhibits are fairly information-heavy and geeky. (Yes, I’m a solo traveller, and yes, I geek out when I go to interpretive sites.) We’ll need to de-geek the scientific content a bit, I think, to raise satisfaction among couples and groups of friends.

But I wasn’t done with the data. Oh no: here’s where it started to get interesting. 

For each review, I did a simple qualitative text analysis. It sounds pretty fancy when you put it like that, but it’s a simple process: every time a review mentions a specific thing (the clean washrooms, for example), you tag it. I went crazy with tagging, and recurring themes started popping up. People were really positive about the free admission, for example, but negative about how hard the site was to find from the main road.
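One way to keep track of the “positive about X, negative about Y” pattern is to record each tag as a (code, polarity) pair and tally them. This is a hypothetical sketch: the tags below are typed in by hand, just as the coding itself is done by reading each review, and the + / - polarity scheme is an assumption rather than a standard.

```python
from collections import Counter

# Hypothetical coded reviews: one list of (code, polarity) tags per review,
# typed in by hand after reading each review. "+" = positive, "-" = negative.
coded_reviews = [
    [("free admission", "+"), ("wayfinding", "-")],
    [("clean washrooms", "+"), ("free admission", "+")],
    [("wayfinding", "-")],
]

# Flatten all tags and count each (code, polarity) pair.
tally = Counter(tag for tags in coded_reviews for tag in tags)

# Print themes alphabetically with their polarity and count.
for (code, polarity), n in sorted(tally.items()):
    print(f"{code:<16} {polarity} {n}")
```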

I converted my basic spreadsheet into a relational database using Airtable, which has a free plan and is just so freaking easy to use. As my different tags (or codes, as we call them in the qualitative analysis business) started adding up, I had the database count them and rank them from most to least frequently mentioned. With this kind of analysis, you start giving purely qualitative data a bit of quantitative credibility: if my client says, “Hang on, WHO SAYS we need to fix the wayfinding?” I can say, “56 out of 400 reviewers did, and many of them are local families who are otherwise your biggest supporters.” Boom, chew on THAT tasty data, my friend.
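Airtable isn’t SQL, so as an assumed stand-in, here is the same relational idea in Python’s built-in sqlite3: a reviews table, a codes table, and a link table between them, then a query that ranks codes from most to least mentioned. The table names and sample data are hypothetical.

```python
import sqlite3

# Assumed stand-in for Airtable: an in-memory SQLite database with
# reviews, codes, and a many-to-many link table between them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reviews (id INTEGER PRIMARY KEY, rating INTEGER, origin TEXT);
CREATE TABLE codes (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE review_codes (
    review_id INTEGER REFERENCES reviews(id),
    code_id INTEGER REFERENCES codes(id),
    PRIMARY KEY (review_id, code_id)  -- each code at most once per review
);
""")

# Hypothetical sample data, made up for illustration.
conn.executemany("INSERT INTO reviews VALUES (?, ?, ?)",
                 [(1, 5, "Locals"), (2, 4, "Locals"), (3, 3, "Outside of Canada")])
conn.executemany("INSERT INTO codes VALUES (?, ?)",
                 [(1, "wayfinding"), (2, "free admission"), (3, "clean washrooms")])
conn.executemany("INSERT INTO review_codes VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1), (3, 1), (3, 3)])

# Rank codes from most to least mentioned, with each code's share of reviews.
ranking = conn.execute("""
SELECT c.name,
       COUNT(*) AS mentions,
       ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM reviews), 1) AS pct_of_reviews
FROM review_codes rc JOIN codes c ON c.id = rc.code_id
GROUP BY c.name
ORDER BY mentions DESC, c.name
""").fetchall()

for name, mentions, pct in ranking:
    print(f"{name}: {mentions} mentions ({pct}% of reviews)")
```

Because the link table’s primary key allows each code only once per review, “mentions” here is the number of reviews that mention a code, so the percentage reads directly as “X out of N reviewers.” At the scale quoted above, 56 out of 400 works out to 14%.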

[Figure: The aggregated data codes from online reviews.]

What I learned

In addition to the visitor preferences I mentioned above, I discovered a few things about the site that I didn’t know at all. 

First, I discovered that from the visitor’s point of view there’s more to the visit than just our site. Many, many reviewers associated our little building in the woods with the hiking trails around it; it’s clear that our exhibit is part of a greater destination. My client never mentioned that in my scope of work, but I am now suggesting that we approach the parks agency that controls the forest and trails and set up a complementary visitor offer.

I also discovered that the site is much more appealing to families with children than my client or I originally believed; the number of family visitors (reviewers who either self-identified as Family or mentioned “the kids had a blast” or similar in their review) was much higher than we expected. This will have a direct effect on our interpretive plan: hello, interactivity.
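The family count uses an either/or rule, which can be sketched as below. The sample reviews and the FAMILY_WORDS pattern are hypothetical; a keyword list is only a rough stand-in for “or similar,” so matches are worth spot-checking by eye.

```python
import re

# Hypothetical reviews; "group" is the self-identified group type (None if absent).
reviews = [
    {"group": "Family", "text": "Lovely walk."},
    {"group": None, "text": "The kids had a blast at the exhibits!"},
    {"group": "Couples", "text": "Quiet and beautiful."},
    {"group": None, "text": "Our children loved the river view."},
]

# Assumed keyword pattern standing in for "or similar"; refine it as you read.
FAMILY_WORDS = re.compile(r"\b(kids?|children|child|family)\b", re.IGNORECASE)


def is_family_visit(review) -> bool:
    """A review counts as a family visit if it self-identifies as Family OR mentions kids."""
    return review["group"] == "Family" or bool(FAMILY_WORDS.search(review["text"]))


family_count = sum(is_family_visit(r) for r in reviews)
print(f"{family_count} of {len(reviews)} reviews indicate a family visit")
```

Counting only the self-identified Family tag would miss the second and fourth reviews here, which is exactly why the either/or rule turns up more families than a quick look at the tags suggests.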

I learned that non-local visitors tend to arrive as part of a guided bus tour that includes other regional attractions. Their stop at our site is generally 20 to 30 minutes, including a bathroom break (the free, clean washrooms are what keep the bus companies coming). So we’ll need a quick visitor flow for those tour groups, while we set up a more in-depth exhibit experience for the locals and their visiting families and friends.

The analysis also confirmed what I felt when first visiting the place: the building’s forest setting, next to a steep river gorge, is jaw-droppingly beautiful and is memorable for both locals and out-of-towners. Documenting that fact will really help me make recommendations about how we promote the site when we get to the marketing phase. 

[Photo: Magical, in a west-coast way]

How long did this take?

This kind of data analysis takes a fair bit of concentration and I find I can’t do it all day long. I set up the timer on my watch for 20-minute bursts; then I put on some downtempo tunes and worked my way through it a bit at a time. All told, I estimate that documenting and analyzing 428 reviews took about eight hours spread over several days. 
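For a sense of pace, the arithmetic behind those figures works out roughly as follows, assuming all eight hours went into 20-minute bursts:

```latex
\[
\frac{8\ \text{h} \times 60\ \text{min/h}}{428\ \text{reviews}} \approx 1.1\ \text{min per review},
\qquad
\frac{480\ \text{min}}{20\ \text{min per burst}} = 24\ \text{bursts}
\]
```

That is a little over a minute per review, or about 67 seconds each.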

How I will present the data

Some of the best advice I received from a statistician years ago was, “Use the data to tell a story.” I will lay out what I learned in narrative form, step by step and point by point, painting a verbal picture of the people who visit our site. I will use a presentation model I recently discovered, the assertion-evidence model, to lay it all out with charts. I’ll try to give you a glimpse of that in my next article.
