How Nexcess limits bad bots

Overview
At Nexcess, we block or rate-limit bots that tend to cause performance issues for clients’ websites. If you need a restricted bot for your site, you can ask our support team to grant it access.

What are bots, and why do I care?

Over half of all web traffic is caused by web robots, commonly known as bots. Also known as “spiders” or “crawlers,” these automated scripts crawl virtually every page on every site on the Internet to gather as much data as they can.

Good bots benefit your site and do not noticeably affect its performance. Typical examples include commercial crawlers, search engine crawlers, monitoring bots, and feed fetchers, but any of these can qualify as bad bots if they hog your system resources and degrade site performance.

  • Search engine crawlers collect information for search engines to help them rank their results.

  • Commercial crawlers perform authorized data extractions to generate analytics and SEO data for companies tracking trends in eCommerce.

  • Feed fetchers carry your content to mobile and web applications. Some examples include Facebook Mobile App, Twitter Bot, and Android Framework Bot.

  • Monitoring bots check your site for availability and functionality.

Bad bots slow down or even crash your site. Some are well-intentioned but grossly inefficient. Many are malicious and even attempt to impersonate legitimate human traffic. They may scrape your site for email addresses (spambots), pull content to use elsewhere without your permission, or perform other actions harmful to your site and its visitors.

How we limit bad bots

One traditional way of limiting bots is to edit your site’s robots.txt file, which in theory sets rules for all bots to follow. However, bad bots characteristically ignore these rules, which makes robots.txt unreliable as a defense.
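
To illustrate why robots.txt is only advisory, the sketch below shows how a well-behaved crawler might consult the file before fetching a page, using Python's standard urllib.robotparser module. The robots.txt content, the bot name ExampleBot, the example.com URLs, and the may_fetch helper are all hypothetical and are not taken from any Nexcess configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the bot name and paths are illustrative only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /checkout/
Crawl-delay: 10

User-agent: *
Disallow: /admin/
"""

# Parse the rules from memory instead of fetching them over the network.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the parsed rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)
```

A compliant bot would call may_fetch before each request and honor parser.crawl_delay("ExampleBot") between requests. Nothing forces a bot to perform this check, which is why a bad bot can skip it entirely.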

For our clients, our default solution is to assign each bot to one of three lists: whitelist, graylist, or blacklist. We do not block or limit known good bots; only bots known to be abusive, malicious, or of no meaningful value are added to our graylist or blacklist.

  • Whitelist bots function without limit. They benefit your site and do not noticeably hamper performance.

  • Graylist bots perform useful functions, but can crawl too aggressively, tie up your system resources, and slow down your site. Often, they ignore robots.txt rules. We rate-limit these bots, which slows their activity but allows them to function.

  • Blacklist bots offer little-to-no redeeming value. They tend to disrupt your site, act as a vector for attack, or both.
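
The three-list approach above can be sketched as a small decision function. This is a minimal illustration and not Nexcess's actual implementation: the list contents, the rate-limit values, and the status_for helper are assumptions made for the example, and the sliding-window limiter is just one common way to rate-limit. The 429 and 400 status codes match the ones this article describes for graylisted and blacklisted bots.

```python
import time
from collections import defaultdict, deque

# Hypothetical list contents; the real lists are not published in this article.
WHITELIST = {"goodbot"}
GRAYLIST = {"eagerbot"}
BLACKLIST = {"badbot"}

# Assumed limit for graylisted bots: 3 requests per 60-second window.
GRAY_MAX_REQUESTS = 3
GRAY_WINDOW_SECONDS = 60.0

# Timestamps of recently allowed requests, per graylisted bot.
_recent = defaultdict(deque)


def status_for(bot_name, now=None):
    """Return the HTTP status a request from bot_name would receive."""
    now = time.monotonic() if now is None else now
    name = bot_name.lower()
    if name in BLACKLIST:
        return 400  # blocked outright
    if name in GRAYLIST:
        window = _recent[name]
        # Drop timestamps that have aged out of the sliding window.
        while window and now - window[0] >= GRAY_WINDOW_SECONDS:
            window.popleft()
        if len(window) >= GRAY_MAX_REQUESTS:
            return 429  # rate-limited: too many recent requests
        window.append(now)
        return 200
    # Whitelisted and unlisted bots are not limited in this sketch.
    return 200
```

In this sketch a graylisted bot is served normally until it exceeds the assumed limit, after which its requests receive 429 until older requests age out of the window.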

We can tailor these lists as needed. If we are blocking a bot that you need for legitimate purposes, or if you have identified a whitelisted bot that is causing excessive traffic or other issues, please contact our 24/7 support team for assistance.

Identifying graylisted and blacklisted bots in your logs

In your Apache transfer logs, graylisted bot requests return HTTP code 429, and blacklisted bot requests return HTTP code 400.
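
As a rough way to find these requests, the sketch below tallies 429 and 400 responses by user agent. It assumes your transfer log uses Apache's combined log format; the regular expression, the LABELS mapping, and the count_limited_bots helper are illustrative names, not part of any Nexcess tooling. Keep in mind that a 400 can also result from a genuinely malformed request, so treat the counts as a starting point.

```python
import re
from collections import Counter

# Matches Apache's combined log format (assumed; adjust if your format differs).
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Status codes this article associates with limited bots.
LABELS = {"429": "graylisted", "400": "blacklisted"}


def count_limited_bots(lines):
    """Count 429 and 400 responses, keyed by (label, user agent)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        label = LABELS.get(match["status"])
        if label:
            counts[(label, match["agent"])] += 1
    return counts
```

To use it, open your transfer log and pass the file object to count_limited_bots, then inspect counts.most_common() to see which user agents are being limited most often.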

For 24-hour assistance any day of the year, contact our Support Team by email or through the Client Portal.
