Patrick's Programming Blog

Use Segments to View Historical Data without Spam

Multiple data filters


If you’ve had your site for a while you’ll likely have experienced spam in your analytics. I know I have.

Google Analytics Language Spam

Even if you add filters to your site, those only work moving forward. They don’t change anything retroactively.

So if we want to look at historical data without spam, we’ll have to use something other than filters. We're going to use segments.

Adding a Segment

To start we have to add a segment. I have one of my reports open and I can click on Add Segment.

Start by adding a segment.

Let’s give it a name. I’ll call it All Users (Not Spam).

We're going to configure our segment like so.

Scroll down to Conditions.

Set Include to Exclude since we want to remove any traffic that's spam.

Exclude Spammy Languages

Click on the drop down and select Language.

Then for the middle drop down select matches regex – which stands for regular expression. We'll be using a regular expression to match spam.

In an earlier post I created a regular expression for language spam. You can see how I did it there or you can just copy and paste this: \s+
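If you'd like to see what that pattern catches before pasting it in, here is a minimal Python sketch. It assumes that matches regex does a partial match, which I mimic with `re.search`. The function name and the sample language values are made up for illustration.

```python
import re

# The language pattern from the post: one or more whitespace characters.
LANGUAGE_SPAM = re.compile(r"\s+")


def is_language_spam(language: str) -> bool:
    """Return True if the language value contains any whitespace.

    Real language codes such as "en-us" have no spaces, while spam
    "languages" tend to be whole phrases or sentences.
    """
    return LANGUAGE_SPAM.search(language) is not None


# Hypothetical values, not taken from a real report.
samples = ["en-us", "pt-br", "Free traffic for your site", "(not set)"]
for value in samples:
    label = "spam" if is_language_spam(value) else "ok"
    print(f"{value!r}: {label}")
```

Note that a value like "(not set)" also contains a space, so this pattern treats it as spam too. Whether that is what you want depends on your data.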

Now we’re going to click OR, which will let us add additional conditions.

Exclude Crawler Spam

Let’s check for crawler spam.

Select Source in the left drop down. And then matches regex again in the middle.

And then for the textbox copy and paste this in:

(best|dollar|success|top1)\-seo|(videos|buttons)\-for|anticrawler|^scripted\.|\-gratis|semalt|forum69|7make|sharebutton|ranksonic|sitevaluation|dailyrank|vitaly|profit\.xyz|rankings\-|dbutton|\-crew|uptime(bot|check|\.com)|datract|hacĸer|ɢoogl|responsive\-test|torrent\-to|magnet\-to|dogsrun|tkpass|free\-video|keywords\-monitoring|pr\-cy\.ru|fix\-website|checkpagerank|seo\-2\-0\.|platezhka|timer4web|share\-buttons|99seo|3\-letter|top10\-way

Credit: Carlos Escalera put together the list above.
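To get a feel for how a long alternation like this behaves, here is a rough Python sketch that runs the same pattern against a few sources. It again assumes partial matching (`re.search`). The helper name `is_crawler_spam` and the sample sources are my own picks for illustration. Python's `re` module is not the engine Analytics uses, so treat this as an approximation.

```python
import re

# The crawler pattern from the post, split across lines for readability.
# Adjacent string literals are joined into one pattern by Python.
CRAWLER_SPAM = re.compile(
    r"(best|dollar|success|top1)\-seo|(videos|buttons)\-for|anticrawler|"
    r"^scripted\.|\-gratis|semalt|forum69|7make|sharebutton|ranksonic|"
    r"sitevaluation|dailyrank|vitaly|profit\.xyz|rankings\-|dbutton|\-crew|"
    r"uptime(bot|check|\.com)|datract|hacĸer|ɢoogl|responsive\-test|"
    r"torrent\-to|magnet\-to|dogsrun|tkpass|free\-video|"
    r"keywords\-monitoring|pr\-cy\.ru|fix\-website|checkpagerank|"
    r"seo\-2\-0\.|platezhka|timer4web|share\-buttons|99seo|3\-letter|"
    r"top10\-way"
)


def is_crawler_spam(source: str) -> bool:
    """Return True if any alternative matches somewhere in the source."""
    return CRAWLER_SPAM.search(source) is not None


# Hypothetical sources for illustration.
for source in ["google", "semalt.com", "buttons-for-website.com",
               "scripted.com", "unscripted.com"]:
    label = "spam" if is_crawler_spam(source) else "ok"
    print(f"{source}: {label}")
```

Most alternatives match anywhere in the source, but `^scripted\.` is anchored to the start. That is why "scripted.com" is flagged and "unscripted.com" is not.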

And then click OR.

Exclude Hostname Spam

Lastly let's get hostname spam.

Select Hostname in the left drop down. And then matches regex one more time.

And then in the text box use the hostname filter I created for an earlier post: ([a-z].vitaly\.com|o-o-11-o-o\.com|nmrk\.ru|co\.lomb\.co|localhost|UA-8379211-12|127\.0\.0\.1)

And then you can click Save.
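Conceptually, the saved segment drops a session if any one of the three conditions matches. Here is a small Python sketch of that OR logic. Everything in it is an assumption for illustration: the `Session` class, the made-up sessions, and the shortened source and hostname patterns, which keep only a few alternatives from the full lists. It is not how Analytics evaluates segments internally.

```python
import re
from dataclasses import dataclass


@dataclass
class Session:
    language: str
    source: str
    hostname: str


# Abbreviated stand-ins for the three conditions (assumption: only a few
# alternatives from the source and hostname lists are kept for brevity).
EXCLUDE_CONDITIONS = {
    "language": re.compile(r"\s+"),
    "source": re.compile(r"semalt|(videos|buttons)\-for|ranksonic"),
    "hostname": re.compile(r"o-o-11-o-o\.com|nmrk\.ru|localhost"),
}


def is_excluded(session: Session) -> bool:
    """True if ANY condition matches, mirroring conditions joined with OR."""
    return any(
        pattern.search(getattr(session, field_name)) is not None
        for field_name, pattern in EXCLUDE_CONDITIONS.items()
    )


# Made-up sessions, not real report data.
sessions = [
    Session("en-us", "google", "www.example.com"),
    Session("en-us", "semalt.com", "www.example.com"),
    Session("Free traffic for your site", "(direct)", "www.example.com"),
    Session("en-us", "(direct)", "localhost"),
    Session("fr-fr", "twitter.com", "www.example.com"),
]

clean = [s for s in sessions if not is_excluded(s)]
print(f"kept {len(clean)} of {len(sessions)} sessions")
```

One matching condition is enough to exclude a session, so each pattern only has to cover its own kind of spam.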

Results

Now you might not see a huge difference at first, so I recommend looking at the past year. I went through the past year and found a three-week period where spam hit me pretty hard.

During those weeks, spam accounted for 12% of my traffic.

In addition, the spam made me think there were significant bumps, when in fact my traffic was pretty consistent the whole time.

I hope your analytics weren't hit as hard as mine were. It's very disappointing, and the spam is impossible to get rid of completely. Do put some filters in place if you haven't already.

And when you need to look at your historical data you can use this segment and you'll be fine. 🙂

Happy analyzing!
