
Yuk…

    http {
        # ... other http settings

        # 10 MB shared zone keyed on client IP, sustained rate of 10 requests/second
        limit_req_zone $binary_remote_addr zone=mylimit:10m rate=10r/s;
        # ...
    }

    server {
        # ... other server settings
        location / {
            # allow a burst of 20 extra requests served immediately (nodelay);
            # anything beyond that is rejected with 503
            limit_req zone=mylimit burst=20 nodelay;
            # ... proxy_pass or other location-specific settings
        }
    }

Rate limit read-only access at the very least. I know this is a hard problem for open source projects that have relied on web access like this for a while. Anubis?
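If you go the Anubis route, the nginx side is just a reverse proxy in front of it; a minimal sketch, assuming an Anubis instance you've pointed at your backend and configured to listen on 127.0.0.1:8923 (that address is an assumption, use whatever you bind it to):

    server {
        # ... other server settings
        location / {
            # send everything through Anubis, which challenges suspect
            # clients and proxies the rest to the real backend
            proxy_pass http://127.0.0.1:8923;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
    }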



Easier said than done; I have 700k requests from bots in my access.log, coming from 15k different IP addresses:

:: ~/website ‹master*› » rg '(GPTBot|ClaudeBot|Bytespider|Amazonbot)' access.log | awk '{print $1}' | sort -u | wc -l

15163


    # classify requests by User-Agent
    map $http_user_agent $uatype {
        default                       'user';
        ~*(googlebot|bingbot)         'good_bot';
        ~*(nastybot|somebadscraper)   'bad_bot';
    }
You can also do something like this to rate limit by user agent instead of by IP address, limiting the ‘bad_bots’ but not the ‘good_bots’.

I’m not dismissing the difficulty of the problem, but there are multiple vectors that can identify these ‘bad_bots’.
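For completeness, here's a minimal sketch of wiring that map into limit_req: a second map turns the classification into a rate-limit key, and nginx doesn't account requests whose key is empty, so only the ‘bad_bot’ group gets throttled. The zone name, zone size, rate, and burst below are placeholder values, not recommendations.

    # http context
    map $uatype $bot_limit_key {
        default     '';                    # empty key => not rate limited
        'bad_bot'   $binary_remote_addr;   # limit bad bots per source IP
    }

    limit_req_zone $bot_limit_key zone=badbots:10m rate=1r/s;

    server {
        location / {
            limit_req zone=badbots burst=5 nodelay;
        }
    }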


We used fail2ban to do rate limiting first. It wasn't adequate.
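If it helps anyone else, the usual pairing is fail2ban's stock nginx-limit-req filter watching the nginx error log for limit_req rejections and banning repeat offenders; a sketch with illustrative paths and thresholds, not the exact setup we ran:

    # /etc/fail2ban/jail.local (illustrative values)
    [nginx-limit-req]
    # stock filter shipped with fail2ban; matches nginx "limiting requests" errors
    enabled  = true
    port     = http,https
    logpath  = /var/log/nginx/error.log
    findtime = 600
    maxretry = 10
    bantime  = 3600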

Ooof, maybe a write-up is in order? An opinionated blog post? I'd love to know more.

As noted by others, the scrapers do not seem to respond to rate limiting. When you're being hit by 10-100k different IPs per hour that simply ignore the limits, rate limiting isn't very effective.


