How to Block Bad Bots and IP Addresses with .htaccess
Not all website visitors are human. Robots such as crawlers, bots, and link scrapers also visit websites, each identifying itself with a User-Agent string. These robots are programmed to scan and scrape a site, and that scanning or scraping often overloads the server's resources.
As a result, the web server starts returning errors such as 502 Bad Gateway, 508 Resource Limit Is Reached, or 500 Internal Server Error, and the site becomes unavailable. To prevent this, we need to learn how to block bad bots and IP addresses using the .htaccess file.
What Are Bad Bots, User Agent Bots, Crawlers, and Link Scrapers?
Bad bots are used for many different purposes, such as scanning, scraping, DDoS attacks, account takeovers, and more. Bots can also distort the traffic you get from search engines, skew your metrics, and sometimes overload the system.
Websites with tens to hundreds of thousands of visitors per day are especially vulnerable to bad bots.
The most common cases are crawler bots and link scrapers. A crawler continuously browses every page until it reaches the end of the website. Even images and files are scanned by the crawler. For details, see the following web crawler schema:
While crawling, the bot also scrapes. Scraping is what puts a burden on website resources: the bot deliberately accesses every file so it can copy the data and collect it on its own server.
In this context, a bot is identified by its User-Agent string. Many of these bots are operated by organizations such as Ahrefs, Semrush, Moz, and so on.
Why Block Bad Bots?
- They distort traffic analytics metrics
- They sometimes overload server resources
- They drain server bandwidth
- Scrapers can expose a website's backlinks
- They leave the website more vulnerable to attacks
- They make the site prone to spam and harmful advertising
- Bad bots do not obey the rules in robots.txt
How to Block IP Addresses and Bad Bots with .htaccess Files
To block bad bots and IP addresses via .htaccess, you need access to the website's file manager. If the .htaccess file does not exist yet, create it manually. If it already exists, you only need to edit it and add the code.
1. Open cPanel, Plesk Panel, ISPConfig, or connect via FTP
2. Next, go to the website root folder
3. Find the .htaccess file and open it for editing
4. Add the following code at the top:
# Remove or add more rules as per your needs.
# Each line sets the bad_bots variable when the User-Agent matches the pattern (case-insensitive).
BrowserMatchNoCase "Baiduspider" bad_bots
BrowserMatchNoCase "BLEXBot" bad_bots
BrowserMatchNoCase "SemrushBot" bad_bots
BrowserMatchNoCase "AhrefsBot" bad_bots
BrowserMatchNoCase "DotBot" bad_bots
BrowserMatchNoCase "MJ12bot" bad_bots
BrowserMatchNoCase "Rogerbot" bad_bots
BrowserMatchNoCase "aiHitBot" bad_bots
BrowserMatchNoCase "spbot" bad_bots
BrowserMatchNoCase "oBot" bad_bots
BrowserMatchNoCase "DeuSu" bad_bots
BrowserMatchNoCase "ia_archiver" bad_bots
BrowserMatchNoCase "ExaBot" bad_bots
BrowserMatchNoCase "Sitebot" bad_bots
BrowserMatchNoCase "Gigabot" bad_bots
BrowserMatchNoCase "MetaURI" bad_bots
BrowserMatchNoCase "FlipboardProxy" bad_bots

# Allow everyone, then deny the flagged bots and the listed IP range.
Order Allow,Deny
Allow from all
Deny from env=bad_bots
Deny from 126.96.36.199/24
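5. To block a single IP address, add the following line, replacing IPADDRESS with the address you want to block: Deny from IPADDRESS
6. You can also block a whole IP address range by converting it to CIDR notation with an IP Range to CIDR tool
7. To find a bot's User-Agent name, check the website's access logs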
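Note that the Order, Allow, and Deny directives from step 4 are the older Apache 2.2 syntax; on Apache 2.4 they only work when mod_access_compat is loaded. Below is a minimal sketch of the same rules in the Apache 2.4 Require syntax. It assumes the BrowserMatchNoCase lines from step 4 are still present to set the bad_bots variable, and that your host allows AuthConfig overrides in .htaccess. The IP addresses are placeholders taken from the reserved documentation ranges, not real offenders, so replace them with addresses from your own logs.

```apache
# Apache 2.4 sketch: grant access to everyone except flagged bots and listed IPs.
<RequireAll>
    # Start by allowing all visitors
    Require all granted

    # Reject requests where a BrowserMatchNoCase rule set bad_bots
    Require not env bad_bots

    # Reject a single IP address (placeholder)
    Require not ip 203.0.113.5

    # Reject an entire range in CIDR notation (placeholder)
    Require not ip 192.0.2.0/24
</RequireAll>
```

Use either this block or the Order/Allow/Deny block, not both, since mixing the two syntaxes in one file can produce results that are hard to predict.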
However, not all bots are bad. Social media bots (Facebook, Twitter, et al.) and search engine bots (Google, Bing, Yandex, et al.) are useful to your site. Never block User Agents from Google, Bing, social media, and the like. That concludes this tutorial on how to block bad bots and IP addresses with .htaccess. Hopefully it is useful!