Running a WordPress or WooCommerce site can attract more than just customers. Crawlers and bots hit your site every day—some are legitimate search engines (Google, Bing, Yandex), but many are aggressive scrapers that eat up bandwidth, slow down your store, and generate no real traffic.

This guide shows you how to:

  • Diagnose the 20 worst crawlers in your logs.
  • Verify whether they are real or fake.
  • Block them with AIOS (All-In-One Security) if you use WordPress.
  • Block them with .htaccess if you don’t run AIOS.

Step 1: Diagnosing Problem Crawlers

First, check your access logs. If you are on a cPanel/WHM or CWP (Rocky/Alma/CloudLinux) server, you can view live requests with this command:

Code Block:

tail -f /usr/local/apache/domlogs/yourdomain.com.log

Note: This log path (/usr/local/apache/domlogs/) is specific to cPanel/WHM and CWP servers.

  • If you are on a plain Apache or Nginx server, logs are usually under /var/log/apache2/ or /var/log/nginx/.
  • For managed hosts (like Cloudways, Kinsta, etc.), check their dashboard or support docs for the correct log location.

Look for repeating requests with strange User-Agents.
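
If the live tail is too noisy to read, you can instead summarize which User-Agents hit the site most often. A minimal sketch, assuming the standard combined log format (where the User-Agent is the sixth double-quote-delimited field) and the same example log path as above:

Code Block:

# Count the 20 most frequent User-Agents in the access log
awk -F'"' '{print $6}' /usr/local/apache/domlogs/yourdomain.com.log | sort | uniq -c | sort -rn | head -20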

The worst 20 crawlers usually include:

  • GPTBot (AI scraper)
  • BLEXBot (SEO crawler)
  • MJ12bot (link indexer)
  • SemrushBot (SEO tool)
  • AhrefsBot (SEO tool)
  • DotBot (scraper)
  • Bytespider (ByteDance / TikTok crawler)
  • PetalBot (Huawei)
  • meta-externalagent (Facebook crawler)
  • archive.org_bot (Wayback Machine)
  • SEOkicks-Robot
  • DataForSeoBot
  • SearchMetricsBot
  • Linkdexbot
  • Trendictionbot
  • ZoominfoBot
  • MegaIndex
  • Sogou
  • Qwantify
  • Fake Googlebots

Step 2: Checking if a Bot Is Real or Fake

Bad actors often spoof well-known bots, especially Googlebot. To confirm a bot's legitimacy:

  1. Find the suspicious IP in your logs. Example: 66.249.76.134.
  2. Run a reverse DNS lookup.

Code Block:

host 66.249.76.134

It should return a hostname ending in .googlebot.com or .google.com.

  3. Run a forward lookup on that hostname.

Code Block:

host crawl-66-249-76-134.googlebot.com

It must resolve back to the same IP.

  • ✅ If both checks pass, it’s a real Googlebot.
  • ❌ If not, it’s a fake bot.
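
Doing both lookups by hand gets tedious if you have many suspicious IPs. Here is a small shell sketch of the same two-step check, using the example IP above (the parsing assumes the usual output format of the host command):

Code Block:

# Verify a claimed Googlebot: reverse lookup, then forward-confirm
IP="66.249.76.134"                                      # the IP from your logs
PTR=$(host "$IP" | awk '/pointer/ {print $NF}' | sed 's/\.$//')
echo "Reverse DNS: $PTR"
case "$PTR" in
  *.googlebot.com|*.google.com)
    # The forward lookup must resolve back to the same IP
    if host "$PTR" | grep -qF "$IP"; then
      echo "PASS: real Googlebot"
    else
      echo "FAIL: forward lookup does not return $IP"
    fi
    ;;
  *)
    echo "FAIL: hostname is not under googlebot.com or google.com"
    ;;
esac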

Step 3: Blocking Crawlers in WordPress with AIOS

If you run WordPress, the All-In-One Security (AIOS) plugin is the easiest solution.

  1. Go to Dashboard → All-In-One Security → Firewall → Internet Bots.
  2. Enable:
    • Enable User Agent Blacklist
    • Block fake Googlebots
  3. Add the worst offenders into the User Agent Blacklist.

Blacklist Example:

GPTBot
BLEXBot
MJ12bot
SemrushBot
AhrefsBot
DotBot
Bytespider
PetalBot
meta-externalagent
archive.org_bot
SEOkicks-Robot
DataForSeoBot
SearchMetricsBot
Linkdexbot
trendictionbot
Qwantify
ZoominfoBot
MegaIndex
Sogou
ClaudeBot
Barkrowler

AIOS will now return 403 Forbidden for those bots.


Step 4: Blocking Crawlers via .htaccess

If you don’t use AIOS, you can block crawlers directly at the server level with .htaccess.

Add this snippet before the WordPress rules (the section marked # BEGIN WordPress):

Code Block:

<IfModule mod_rewrite.c>
RewriteEngine On

# Block worst crawlers ([NC] = case-insensitive match; [F] = 403 Forbidden, [L] = stop processing)
RewriteCond %{HTTP_USER_AGENT} (GPTBot|BLEXBot|MJ12bot|SemrushBot|AhrefsBot|DotBot|Bytespider|PetalBot|meta-externalagent|archive\.org_bot|SEOkicks-Robot|DataForSeoBot|SearchMetricsBot|Linkdexbot|trendictionbot|Qwantify|ZoominfoBot|MegaIndex|Sogou|ClaudeBot|Barkrowler) [NC]
RewriteRule .* - [F,L]
</IfModule>

This forces a 403 Forbidden for those bots while letting humans and real search engines through.
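
Note that this snippet only takes effect if mod_rewrite is loaded and .htaccess overrides are allowed. A quick way to confirm the module is present (the control binary may be apache2ctl on Debian/Ubuntu):

Code Block:

# List loaded Apache modules and check for mod_rewrite
apachectl -M | grep rewrite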


Step 5: Testing Your Block

You can test whether a bot is blocked using curl.

Test a bad bot (should get 403):

Code Block:

curl -I -A "BLEXBot/1.0" https://yourdomain.com/

Expected result:

HTTP/1.1 403 Forbidden

Test a normal browser (should get 200):

Code Block:

curl -I -A "Mozilla/5.0" https://yourdomain.com/

Expected result:

HTTP/1.1 200 OK
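
If you want to test the whole blocklist in one pass, a short loop prints just the status code for each User-Agent (yourdomain.com is a placeholder, as above):

Code Block:

# Print the HTTP status for each User-Agent: bad bots should show 403, the browser UA 200
for ua in "GPTBot" "MJ12bot" "AhrefsBot" "Mozilla/5.0"; do
  code=$(curl -s -o /dev/null -w "%{http_code}" -A "$ua" https://yourdomain.com/)
  echo "$ua -> $code"
done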

Final Thoughts

Aggressive crawlers can silently drain your bandwidth and make WooCommerce pages load more slowly for real customers. By:

  • Diagnosing bots in your server logs
  • Verifying real vs fake Googlebots
  • Blocking with AIOS in WordPress
  • Or blocking with .htaccess on non-WordPress setups

…you can protect your site, save resources, and keep your SEO intact.

👉 Pro tip: Review your logs weekly and update your blocklists. New crawlers appear all the time, and staying proactive prevents wasted bandwidth.
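
If you want a standing reminder, the log summary from Step 1 can be scheduled. A rough weekly cron sketch, assuming the same log path and a working mail command on the server:

Code Block:

# Weekly cron job (e.g. in /etc/cron.weekly/): e-mail the top 20 User-Agents
awk -F'"' '{print $6}' /usr/local/apache/domlogs/yourdomain.com.log | sort | uniq -c | sort -rn | head -20 | mail -s "Weekly top crawlers" you@example.com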

Last Update: October 10, 2025
