
Re: Error 500


On 06/02/2026 at 19:52, Lee wrote:
On 2/6/26 3:49 AM, Fabien Bodard wrote:
robots.txt is ignored by AI crawlers. They do not respect this convention, so we need to find other, indirect ways to block them.

I doubt there would be any significant monetary incentive for AI crawlers to slurp up the Gambas Wiki, so it is probably not much of a target. However, if it is an issue, there are ways to detect and block the less sophisticated bots:

1. User-agent filtering. But user agents are easily and commonly spoofed.
2. IP address filtering. Same caveat as above, but cloud services would have no legitimate reason to be visiting GambasWiki, so their address ranges could be blocked.
3. Rate limiting and throttling. I assume the server running GambasWiki already has that in place. (Items 1-3 are sketched in code after this list.)
4. A honeypot, e.g. a hidden link that robots.txt forbids; any client that follows it is almost certainly a bot and can be banned.
5. Fingerprinting. This is more effectively accomplished at the server itself, but JavaScript could be employed.
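
For illustration, here is a minimal sketch of items 1-3 as a Python WSGI middleware. It is not the actual GambasWiki setup: the blocked agent names, the network range, and the rate limits are assumptions chosen for the example.

import time
from collections import defaultdict, deque
from ipaddress import ip_address, ip_network

BLOCKED_AGENTS = ("GPTBot", "CCBot", "Bytespider")  # assumed AI-crawler UA substrings
BLOCKED_NETWORKS = [ip_network("203.0.113.0/24")]   # placeholder cloud range (TEST-NET-3)
RATE_LIMIT = 30   # max requests per client...
WINDOW = 60       # ...within a 60-second sliding window

class BotFilter:
    def __init__(self, app):
        self.app = app
        self.hits = defaultdict(deque)  # client IP -> timestamps of recent requests

    def __call__(self, environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        client = environ.get("REMOTE_ADDR", "0.0.0.0")

        # 1. User-agent filtering (easily spoofed, so only a first line of defence)
        if any(bot in ua for bot in BLOCKED_AGENTS):
            return self._deny(start_response)

        # 2. IP filtering against known cloud/crawler ranges
        if any(ip_address(client) in net for net in BLOCKED_NETWORKS):
            return self._deny(start_response)

        # 3. Rate limiting over a sliding window
        now = time.monotonic()
        window = self.hits[client]
        while window and now - window[0] > WINDOW:
            window.popleft()
        window.append(now)
        if len(window) > RATE_LIMIT:
            return self._deny(start_response, status="429 Too Many Requests")

        return self.app(environ, start_response)

    def _deny(self, start_response, status="403 Forbidden"):
        body = status.encode()
        start_response(status, [("Content-Type", "text/plain"),
                                ("Content-Length", str(len(body)))])
        return [body]

You would wrap the existing application with "application = BotFilter(application)". In practice the same checks usually live in the web server or a front-end proxy rather than in application code, which is why I said fingerprinting is better done at the server itself.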



It's a crawler, but not necessarily an illegitimate bot. It could be Cloudflare updating its cache; I don't know whether it refreshes the cache page by page or by crawling the site... But in this last case, the user agent was a Mozilla browser, so I bet it was a crawler with a fake user agent.

Regards,

--
Benoît Minisini.


References:
Error 500, Gianluigi <gradobag@xxxxxxxxxxx>
Re: Error 500, Gianluigi <gradobag@xxxxxxxxxxx>
Re: Error 500, Gianluigi <gradobag@xxxxxxxxxxx>
Re: Error 500, Bruce Steers <bsteers4@xxxxxxxxx>
Re: Error 500, Benoît Minisini <benoit.minisini@xxxxxxxxxxxxxxxx>
Re: Error 500, Benoît Minisini <benoit.minisini@xxxxxxxxxxxxxxxx>
Re: Error 500, Bruce Steers <bsteers4@xxxxxxxxx>
Re: Error 500, Benoît Minisini <benoit.minisini@xxxxxxxxxxxxxxxx>
Re: Error 500, Fabien Bodard <gambas.fr@xxxxxxxxx>
Re: Error 500, Lee <t.lee.davidson@xxxxxxxxx>