Crawler spam is what happens when a bot crawls through your site and leaves fake data. It might leave a fake referral just to get you to check out the referring website.
In fact I've had this exact thing happen. I was looking at my analytics and thought someone had mentioned one of my articles on Reddit, because over a period of a few days I got several hundred visits. I wasted my time digging through analytics to try to find the exact post.
And in the end I came to this page:
Reddit has already cleaned this up. They reported this as spam and it's pretty obvious now. But if you got hundreds of hits from this page wouldn't you be curious? Wouldn't you click the link to see how this relates to your site and the surge in traffic?
The whole goal is to leave fake referral data so you click on one of the spammy links. Crawler spam like this dirties your data and will influence any decisions you make based on that data. We want to prevent any of this from actually being recorded so you don't click on spam links and so you can make smart decisions with your data.
Create a Filter
This process is going to be very similar to creating a filter for language spam. So I'll abbreviate the steps here:
- Log into Google Analytics
- Pull up a report for one of your views
- Click on Admin
- Click on Filters
- Click + Add Filter
- Fill in the following fields (see below for the Filter Pattern)
- Click Save
Now the Filter Pattern is so long that we actually have to break this into two filters. So copy the first one in. And then repeat the steps for a second filter.
(best|dollar|success|top1)\-seo|(videos|buttons)\-for|anticrawler|^scripted\.|semalt|forum69|7makemon|sharebutton|ranksonic|sitevaluation|dailyrank|vitaly|profit\.xyz|rankings\-|dbutton|uptime(bot|check|\.com)
datract|hacĸer|ɢoogl|responsive\-test|dogsrun|tkpass|free\-video|keywords\-monitoring|pr\-cy\.ru|fix\-website|checkpagerank|seo\-2\-0\.|platezhka|timer4web|share\-buttons|99seo|3\-letter|top10\-way
I have to give kudos to Carlos Escalera for compiling this list.
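If you want to sanity-check the two patterns above before pasting them into Google Analytics, a minimal sketch in Python could look like this. It assumes the filter matches the pattern anywhere in the field and ignores case, which `re.search` with `re.IGNORECASE` only approximates. The function name `is_spam_referrer` and the sample referrers are made up for illustration.

```python
import re

# The two filter patterns from the article, copied as-is.
PATTERN_1 = (
    r"(best|dollar|success|top1)\-seo|(videos|buttons)\-for|anticrawler|"
    r"^scripted\.|semalt|forum69|7makemon|sharebutton|ranksonic|sitevaluation|"
    r"dailyrank|vitaly|profit\.xyz|rankings\-|dbutton|uptime(bot|check|\.com)"
)
PATTERN_2 = (
    r"datract|hacĸer|ɢoogl|responsive\-test|dogsrun|tkpass|free\-video|"
    r"keywords\-monitoring|pr\-cy\.ru|fix\-website|checkpagerank|seo\-2\-0\.|"
    r"platezhka|timer4web|share\-buttons|99seo|3\-letter|top10\-way"
)

# Assumption: partial, case-insensitive matching, like an unanchored filter.
FILTERS = [re.compile(p, re.IGNORECASE) for p in (PATTERN_1, PATTERN_2)]


def is_spam_referrer(referrer: str) -> bool:
    """Return True if either pattern matches somewhere in the referrer."""
    return any(f.search(referrer) for f in FILTERS)


# Illustrative sample values, not real analytics data.
for referrer in ["semalt.com", "buttons-for-website.com", "reddit.com", "google.com"]:
    print(referrer, is_spam_referrer(referrer))
```

Note that `ɢoogl` and `hacĸer` in the second pattern use lookalike characters on purpose, so the real `google.com` is not caught by them.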
The Challenge With Crawler Spam
Crawler spam is really hard to detect. A crawler can look identical to a browser requesting information, and it can send identical data to Google Analytics.
The only way to filter out this data is to use a list of known spam website referrers, which is what we're doing above.
The downside of using known spam websites is that spammers can keep making new ones and your filter won't catch them. It can feel a bit like whack-a-mole.
The good news is that while crawler spam is hard to prevent, it's less common than you think. It requires a lot more resources than ghost spam, where a program sends information directly to Google Analytics without actually crawling your website.
Don't worry about preventing 100% of crawler spam. It's impossible. But at least by filtering out the most common known sources you're going to drastically reduce it.
Verify Your Filter
It's always a good idea to verify your filter. Make sure you don't have a typo that will eliminate legitimate data.
You can do this before you press the save button. You should either see a list of spam being filtered out, or since the verify button uses a small subset of data you might see the following error message:
This filter would not have changed your data. Either the filter configuration is incorrect, or the set of sampled data is too small.
As long as you don't see legitimate data you're good to go.
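To show how a single typo can eliminate legitimate data, here is a rough sketch in Python. It is not how the verify button works. It just checks a pattern against a hand-written list of sources you know are legitimate. The helper name `wrongly_excluded`, the shortened pattern, and the sample sources are all assumptions for illustration.

```python
import re


def wrongly_excluded(pattern: str, legitimate_sources: list[str]) -> list[str]:
    """Return the legitimate sources that the pattern would filter out."""
    try:
        compiled = re.compile(pattern, re.IGNORECASE)
    except re.error as exc:
        # An unbalanced parenthesis or similar typo ends up here.
        raise ValueError(f"pattern does not compile: {exc}") from exc
    return [s for s in legitimate_sources if compiled.search(s)]


# Hypothetical sources you would never want to exclude.
LEGIT = ["google", "reddit.com", "twitter.com"]

good = r"semalt|ranksonic|dailyrank"
typo = r"semalt|ranksonic|dailyrank|"  # stray trailing pipe

print(wrongly_excluded(good, LEGIT))  # []
print(wrongly_excluded(typo, LEGIT))  # ['google', 'reddit.com', 'twitter.com']
```

The stray `|` at the end adds an empty alternative, and an empty alternative matches every string. That is exactly the kind of typo that would wipe out legitimate data.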
Happy filtering!
Have you ever seen GA spam that appends a prefix or suffix to a valid hostname?
On my All Pages report, I’m seeing stuff like:
cdc10-www.valid-hostname.com
http://www.valid-hostname.com.pk
Another view I have of the same site only shows the sub-directory of the URLs (starting with / ) and it isn’t showing any of these.
Any help would be appreciated!
Vic
That spam hasn’t hit my site. But you can write an inclusive hostname filter. I tend to prefer exclusive filters, but inclusive filters work great if the spam is specifically targeting you.
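As a rough sketch of what an inclusive hostname pattern could look like, here is a Python check using the placeholder `valid-hostname.com` from the question. The pattern and the function name `is_own_hostname` are assumptions for illustration. The key idea is anchoring both ends with `^` and `$` so that added prefixes and suffixes no longer match.

```python
import re

# Hypothetical include pattern: only the bare domain and its www subdomain.
INCLUDE_HOSTNAME = re.compile(r"^(www\.)?valid\-hostname\.com$", re.IGNORECASE)


def is_own_hostname(hostname: str) -> bool:
    """Return True only for hostnames the include filter would keep."""
    return INCLUDE_HOSTNAME.search(hostname) is not None


for host in [
    "www.valid-hostname.com",
    "valid-hostname.com",
    "cdc10-www.valid-hostname.com",
    "www.valid-hostname.com.pk",
]:
    print(host, is_own_hostname(host))
```

Only the first two hostnames pass. The prefixed and suffixed variants are rejected because of the anchors.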
Hi Patrick, these posts you’re doing on analytics spam are very useful. With regards to crawler spam I’ve used a few 3rd party services in the past: referrer-spam.help, paveiq.com/referrer-spam-remover/ (both free), and also analytics-toolkit.com.
You have to allow access to your account but then they add new filters as and when.
Have you had any experience with these? Of course one is relying on them keeping their eye on the ball.
When I was helping WooCommerce with this we used a 3rd party to set up & configure our analytics. I like the idea of a service.
I’m using these filters on a small number of accounts but if I wanted to update anything across all accounts that would be super useful. Honestly for $35 a month you only need 1 consistent client for something like analytics-toolkit to be worth it.