Opportunities for Data Innovation Are All Around Us

I recently finished reading Sexy Little Numbers by Dimitri Maex. It’s a timely and interesting look at the data trends that anyone can, and should, be tapping with the customer data and insights they already have. What struck me was the book’s emphasis on statistical modeling: the opportunity to draw useful, actionable information from data sets that are probably already in front of you.

The opportunity to use data creatively to yield useful results was absolutely present during our development of Dashter. And it’s worth thinking about within your business as well: what data do you have, and how can you use it to your advantage?

Here are a couple of examples from within Dashter where we created greater value by tapping data that was already pouring in.

Trending in Friends

One of the first things a Dashter user sees on their homepage is the Trending in Friends pane. There, we show 18 “trends” drawn from a sample of ~200 tweets from your timeline. So what does it do? Basically, it counts every hashtag found in those ~200 tweets and then shows you the top 18, ranked from most used to least used.
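Here is a minimal sketch of that counting step in Python. It is not Dashter’s actual code: the function name top_hashtags, the regular expression, and the choice to treat #Data and #data as the same tag are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a hashtag is "#" followed by letters, digits, or underscores.
HASHTAG = re.compile(r"#(\w+)")

def top_hashtags(tweets, limit=18):
    """Count hashtags across a list of tweet texts and return the most used.

    Returns a list of (hashtag, count) pairs, most used first.
    """
    counts = Counter()
    for text in tweets:
        # Assumption: tags are compared case-insensitively.
        counts.update(tag.lower() for tag in HASHTAG.findall(text))
    return counts.most_common(limit)
```

Pass it the text of the ~200 most recent tweets and you get back the ranked list for the pane. Re-running it on a fresh sample every 15 minutes or so keeps the list current.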

Why is this useful? Well, when exploring Twitter, it’s often hard to get a grip on all the data swirling around you. But there are two assumptions that can be made: 1) You follow people who say things that interest you, and 2) Events of significance often peak within groups. By organizing hashtags among the people you follow, it’s easy to develop a personalized trend list that refreshes every 15 minutes or so. That means instead of having to pore through dozens or hundreds of tweets to see what’s happening, you can merely glance at an ordered list of top hits.

Translating this to Your Business

You likely have an overwhelming amount of data pouring in on any given day. If you’re running online lead campaigns, for instance, you likely use Google Analytics and/or your CRM platform to track and manage those leads. But perhaps you can do more? And perhaps you can devise a way to provide snapshot information that allows you to evaluate performance without having to look through every record.

A chart that simply counts your leads throughout the day and groups them into hourly buckets could show you when your incoming leads are peaking. That could allow you to reallocate sales & marketing activities to accommodate surges and down-times. With an hourly breakdown of your results, you can put salespeople on the phone quickly as leads come in, and save administrative and managerial tasks for off-hours. The data to optimize your results is sitting right in front of you. It just takes a moment to design a view of that information that is going to improve bottom-line revenue performance.
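The hourly breakdown described above can be sketched in a few lines of Python. This assumes you can export each lead’s arrival time as a datetime from your analytics tool or CRM; the function names leads_by_hour and peak_hours are made up for this example.

```python
from collections import Counter
from datetime import datetime

def leads_by_hour(timestamps):
    """Return a 24-item list: the number of leads that arrived in each hour."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]

def peak_hours(hourly, top=3):
    """Return the busiest hours of the day, busiest first.

    Ties keep the earlier hour first because Python's sort is stable.
    """
    return sorted(range(24), key=lambda hour: hourly[hour], reverse=True)[:top]

# Hypothetical sample data: six leads on one day.
sample = [
    datetime(2013, 5, 1, 9, 5),
    datetime(2013, 5, 1, 9, 40),
    datetime(2013, 5, 1, 9, 55),
    datetime(2013, 5, 1, 10, 15),
    datetime(2013, 5, 1, 14, 2),
    datetime(2013, 5, 1, 14, 30),
]
hourly = leads_by_hour(sample)
busiest = peak_hours(hourly, top=2)
```

The 24 numbers in hourly are all you need to draw the chart, and busiest tells you which hours to staff first.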

Archetwypes inside Dashter

Another data innovation that we created was Archetwypes. We wrote about them pretty extensively on the Dashter blog, and Jeff did some writing on them as well. They’re an example of information by extrapolation. The “stats” of an Archetwype are simply collected from a pool of tweets, this time all from a single Twitter user. We used 4 “metrics” to form our assumptions about a Twitter user: Tweet Frequency, % of tweets that were a reply to another tweet (inferring a conversation was underway), % of tweets that included an @mention of another user (inferring a person who was attempting to engage others on a one-on-one basis), and % of tweets that contained links (distinguishing a person engaging in conversation from someone simply using Twitter as a linking hub).

If all we did was show those 4 metrics on a profile page, it would provide the beginnings of information, but it would be severely lacking one crucial element: context.

Context is what makes data-driven information useful. In this case, we created a 2×2 grid model with 2 options per grid entry (2^4 = 16 Archetwypes). That allowed us to build actionable information using Archetwypes, because now we could clearly identify for our users the different types of Twitter users on the network. It wasn’t 100% accurate, but it didn’t need to be. This is another example of a snapshot: a way to develop guesses based on incomplete information.
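To make the arithmetic concrete, here is a rough Python sketch of how four yes/no metrics produce 16 buckets. It is an illustration, not Dashter’s implementation. The only cutoff taken from this post is the “more than half of your tweets are replies” line; applying the same 50% line to mentions and links, and the tweets-per-day cutoff, are assumptions.

```python
from itertools import product

# Hypothetical cutoff: what counts as a "frequent" tweeter is an assumption.
FREQUENCY_CUTOFF = 5.0  # tweets per day
# "More than half" is the line used for replies; using it for mentions and
# links as well is an assumption made for this sketch.
SHARE_CUTOFF = 0.5

def archetype_key(tweets_per_day, reply_share, mention_share, link_share):
    """Reduce four metrics to four yes/no answers.

    Shares are fractions between 0 and 1. The result is a tuple of booleans:
    (frequent, conversational, engages_individuals, link_heavy).
    """
    return (
        tweets_per_day > FREQUENCY_CUTOFF,
        reply_share > SHARE_CUTOFF,
        mention_share > SHARE_CUTOFF,
        link_share > SHARE_CUTOFF,
    )

# Every possible combination of four yes/no answers: 2^4 = 16 buckets.
ALL_KEYS = list(product([False, True], repeat=4))
```

Each of the 16 tuples in ALL_KEYS would map to one named bucket. Note that a user whose tweets are 51% replies and one whose tweets are 98% replies get the same answer on that metric, which is exactly the simplification the model depends on.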

Yes, we could have (in theory) scrubbed a user’s complete Twitter history and provided a complex report of their activities. But for our users, we simply wanted to provide a quicker way to answer “should I follow this person or not?” Does this person behave in ways that you find appealing to engage with? If not, don’t follow them. Another byproduct of this methodology was the development of TwitShowdown, which you can take a look at separately.

What Can You Do in Your Business?

So, again, what can this type of thinking do for your business and data? Making assumptions is a critical part of building actionable information. Going back to the sales / lead form example: you’re collecting a ton of data every time you reach out to your customers. Perhaps you can start to come up with your own metric model to better sort & organize your customers. For instance, maybe you break up the country by region (or your sales area into sub-regions). And perhaps add a gender breakdown. Even basic data might be useful: Do your prospects have a @business.com domain name, or a more generic @gmail or @yahoo domain? Start bundling those sorts of results in a matrix model, and see if you can use past data to infer a potential positive or negative trend.

For instance, maybe when looking at your last 12 months of sales, you see that men in the Northeast US who use a business email address are 3x more likely to purchase from you than women in the Southwest using Hotmail. Now suddenly you can add that assumption into your revenue forecasting, and also into your advertising targeting. Those are just 3 basic metrics that might be able to give you a massive advantage in growing your bottom line, with essentially zero additional cost. It’s just an identification effort. You just need to start guessing at things to look for.
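A matrix like that is mostly bookkeeping. Here is a small Python sketch that buckets past leads by region, gender, and email type, then reports each bucket’s purchase rate. The field names (region, gender, email, purchased) and the list of free mail domains are assumptions, so adjust them to match whatever your CRM exports.

```python
from collections import defaultdict

# Assumption: a short, incomplete list of free webmail domains.
FREE_MAIL = {"gmail.com", "yahoo.com", "hotmail.com"}

def email_type(address):
    """Label an address "free" or "business" based on its domain."""
    domain = address.rsplit("@", 1)[-1].lower()
    return "free" if domain in FREE_MAIL else "business"

def conversion_by_segment(leads):
    """Group leads into (region, gender, email type) segments.

    Returns {segment: (purchase_rate, number_of_leads)} so that a high rate
    resting on a handful of leads is easy to spot.
    """
    totals = defaultdict(lambda: [0, 0])  # segment -> [purchases, leads seen]
    for lead in leads:
        segment = (lead["region"], lead["gender"], email_type(lead["email"]))
        if lead["purchased"]:
            totals[segment][0] += 1
        totals[segment][1] += 1
    return {seg: (bought / seen, seen) for seg, (bought, seen) in totals.items()}
```

Comparing the rates between two segments gives you the “3x more likely” style of statement. The lead count sitting next to each rate is there as a reminder that a segment with only a few leads is a guess, not a finding.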

Beware False Logic

Okay, so all that being said, I think it’s fair to wrap up with a quick note, in Latin. Post hoc, ergo propter hoc. “After this, therefore because of this.” It’s a logical fallacy: since one thing happened after another, the second thing must have been caused by the first. Creating an archetype (or, in our case, an Archetwype) doesn’t mean that the person is actually a member of it. It simply means that for now you’ve decided which bucket they seem most likely to fit in. It doesn’t mean that they’ll tweet things you enjoy. It just means that over the last couple of weeks, when they have tweeted, they’ve followed a pattern that fits our model.

In our sales & leads example, let’s say you discover that Northeastern men at work buy more of your product because your sales floor is in Connecticut, is filled with other men, and it’s a corporate product line that you’re selling. You haven’t stumbled upon the holy grail of your marketing campaign. You’ve just figured out that you’ve found the customer you’re most like. It doesn’t do much good to evaluate data based on things that are obvious, that you already know, or that you have already adjusted to compensate for. You’re not going to get different results just by shifting the data toward factors that you’re most comfortable with and already know the answers to.

Don’t fall into any self-made logical traps. Use the smallest possible data requirements (so you have the largest sample size to pull from your database) and the broadest alternative models. The success of Archetwypes was that we relied on something as simple as a coin flip to stand in for a very nuanced result. If more than half of your tweets are replies to other tweets, we consider you a conversationalist of some kind. But we never differentiated between 51% and 98%. You have to pick a line somewhere. Do the same with your data. Build the simplest model you can, see how it matches up with your assumptions, and continue refining it.