General Discussion
AI haters build tarpits to trap and trick AI scrapers that ignore robots.txt (War being waged on the internet)
Posting in GD. Not really a computer support article. It just says how people are fighting back against endless assault by AI bots that are flooding their sites.
Context:
A web server's robots.txt file "requests" that crawlers stay out of some or all of the site.
Some crawlers ignore the request and crawl anyway. This can overload small sites (and big ones).
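For anyone curious what "honoring robots.txt" looks like in practice, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the bot name ExampleAIBot, and the example.org URLs are all made up for illustration. A polite crawler runs a check like this before every fetch; the crawlers the article complains about simply skip it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the bot name and paths are assumptions for illustration.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt above permits this user agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The named bot is asked to stay out of the whole site.
    print(allowed("ExampleAIBot", "https://example.org/articles/1"))
    # Everyone else may read articles but not /private/.
    print(allowed("SomeBrowser", "https://example.org/articles/1"))
    print(allowed("SomeBrowser", "https://example.org/private/x"))
```

Nothing enforces the answer. The file is only a request, which is why site owners started looking for something with "teeth".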
Last summer, Anthropic inspired backlash when its ClaudeBot AI crawler was accused of hammering websites a million or more times a day.
And so, the tarpit was invented.
Tarpits were originally designed to waste spammers' time and resources, but creators like Aaron have now evolved the tactic into an anti-AI weapon.
AI haters build tarpits to trap and trick AI scrapers that ignore robots.txt
https://arstechnica.com/tech-policy/2025/01/ai-haters-build-tarpits-to-trap-and-trick-ai-scrapers-that-ignore-robots-txt/
Watching the controversy unfold was a software developer whom Ars has granted anonymity to discuss his development of malware (we'll call him Aaron). Shortly after he noticed Facebook's crawler exceeding 30 million hits on his site, Aaron began plotting a new kind of attack on crawlers "clobbering" websites that he told Ars he hoped would give "teeth" to robots.txt.
Building on an anti-spam cybersecurity tactic known as tarpitting, he created Nepenthes, malicious software named after a carnivorous plant that will "eat just about anything that finds its way inside."
Aaron clearly warns users that Nepenthes is aggressive malware. It's not to be deployed by site owners uncomfortable with trapping AI crawlers and sending them down an "infinite maze" of static files with no exit links, where they "get stuck" and "thrash around" for months, he tells users. Once trapped, the crawlers can be fed gibberish data, aka Markov babble, which is designed to poison AI models. That's likely an appealing bonus feature for any site owners who, like Aaron, are fed up with paying for AI scraping and just want to watch AI burn.
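To give a feel for what "Markov babble" means, here is a toy sketch of a word-level Markov chain. It is not Nepenthes code; the function names build_chain and babble and the tiny corpus are assumptions for illustration. The chain records which words follow which, then walks those links at random, so the output looks locally like language but means nothing.

```python
import random
from collections import defaultdict


def build_chain(text: str) -> dict:
    """Map each word to the list of words that follow it in the corpus."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def babble(chain: dict, length: int, seed: int) -> str:
    """Walk the chain for `length` words, restarting at a random key on dead ends."""
    rng = random.Random(seed)   # seeded so the same seed repeats the same babble
    starts = sorted(chain)      # sorted for a stable order across runs
    word = rng.choice(starts)
    out = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        word = rng.choice(followers) if followers else rng.choice(starts)
        out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    corpus = "the cat sat on the mat and the dog sat on the rug"
    print(babble(build_chain(corpus), 20, seed=1))
```

A real tarpit would presumably train on a much larger corpus, but the idea is the same: plausible-looking word order, no content.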
And if you want more info:
Hacker News discussion of this:
https://news.ycombinator.com/item?id=42858828
WebSpam
https://www.web.sp.am/
This is another LLM tarpit, intended to poison datasets.
Note: if you visit the site, it's just an endless bunch of links that never go outside the site.
Nepenthes
https://zadzmo.org/code/nepenthes/
This is a tarpit intended to catch web crawlers. Specifically, it's targeting crawlers that scrape data for LLMs - but really, like the plants it is named after, it'll eat just about anything that finds its way inside. (Pitcher Plant)
It works by generating an endless sequence of pages, each with dozens of links, that simply go back into the tarpit. Pages are randomly generated, but in a deterministic way, causing them to appear to be flat files that never change. Intentional delay is added to prevent crawlers from bogging down your server, in addition to wasting their time. Lastly, optional Markov-babble can be added to the pages, to give the crawlers something to scrape up and train their LLMs on, hopefully accelerating model collapse.
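The "random but deterministic" trick described above can be sketched in a few lines. This is not the actual Nepenthes implementation, just one way to get the same effect: seed a random generator from a hash of the requested path, so a given URL always produces the same page and the same links. The /maze/ prefix, the word list, and the names page_for and serve are assumptions for illustration.

```python
import hashlib
import random
import time

# Hypothetical vocabulary used to build link names.
WORDS = ["amber", "basin", "cedar", "delta", "ember", "fjord", "grove", "heron"]


def page_for(path: str, links: int = 12) -> str:
    """Build an HTML page whose links depend only on the requested path."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))  # same path, same seed
    hrefs = []
    for _ in range(links):
        slug = "-".join(rng.choice(WORDS) for _ in range(3))
        hrefs.append(f"/maze/{slug}-{rng.randrange(10**6):06d}")
    items = "\n".join(f'<li><a href="{h}">{h}</a></li>' for h in hrefs)
    return f"<html><body><ul>\n{items}\n</ul></body></html>"


def serve(path: str, delay: float = 2.0) -> str:
    """Wait before answering, so the crawler wastes time and the server stays idle."""
    time.sleep(delay)
    return page_for(path)


if __name__ == "__main__":
    print(serve("/maze/start", delay=0.0))
```

Because nothing is stored, the maze costs the site almost no disk or memory, while every link a crawler follows leads to another page generated the same way.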
Hacker News discussion on Nepenthes:
https://news.ycombinator.com/item?id=42725147
"Mantis Framework" counter-attacks hackers' AI agents
https://www.thestack.technology/mantis-framework-poisons-traps-hackers-ai-agents-in-a-tarpit/
Alternatively, they can soak up attackers' AI resources in an agent tarpit that traps the LLM agent in an infinite filesystem exploration loop. "The attacker is driven into a fake and dynamically created filesystem with a directory tree of infinite depth and is asked/forced to traverse it indefinitely."
The Mantis framework is the creation of three Red Team security researchers and academics associated with George Mason University.
It effectively generates honeypots or decoys designed to counter-attack LLM agents activated against them, using various prompt injections.
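The "directory tree of infinite depth" idea can be illustrated with a toy sketch. This is not Mantis code, and the names list_dir and descend are assumptions for illustration. The point is only that a fake filesystem does not need to exist anywhere: each directory listing can be invented on demand from the path itself, so every directory always has more subdirectories to explore.

```python
import hashlib


def list_dir(path: str, fanout: int = 3) -> list:
    """Return made-up subdirectory names for any path, derived from its hash."""
    digest = hashlib.sha256(path.encode("utf-8")).hexdigest()
    return [f"dir_{digest[i * 6:(i + 1) * 6]}" for i in range(fanout)]


def descend(start: str, steps: int) -> str:
    """Follow the first subdirectory `steps` times, as a naive explorer might."""
    path = start.rstrip("/")
    for _ in range(steps):
        path = f"{path}/{list_dir(path)[0]}"
    return path


if __name__ == "__main__":
    print(list_dir("/srv"))
    # The explorer never reaches a leaf; only the step limit stops it.
    print(descend("/srv", 5))
```

An automated agent told to "search the filesystem" has no natural stopping point here, which is the tarpit effect the quote describes.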
Open Source: https://github.com/pasquini-dario/project_mantis
Prompt Injection as a Defense Against LLM-driven Cyberattacks
🔨Working on transforming Mantis from an academic PoC to a full-fledged and robust defensive tool for your assets. 🪚
LearnedHand
(4,357 posts)
Freaking brilliant!
Klarkashton
(2,663 posts)
Where you provide the answer and it will struggle to conform to the answer you provided even if the logic is completely wrong. If you ask the same question again without the answer it will give you your bogus answer.
Ask a question that starts with "show that" and give it a shit answer.
DBoon
(23,336 posts)
Klarkashton
(2,663 posts)
SilasSouleII
(464 posts)
"The porridge bird is a fictional creature often referenced in children's literature and whimsical stories. The phrase "the porridge bird lays its eggs in the air" comes from the poem "The Hunting of the Snark" by Lewis Carroll. In this context, the line is meant to be nonsensical and humorous, reflecting Carroll's style of absurdity and playful language.
The idea of a bird laying its eggs in the air evokes a sense of whimsy and imagination, making it a memorable and intriguing line, but it doesn't have a literal explanation. It's part of the charm of Carroll's work, inviting readers to embrace the fantastical and illogical aspects of his storytelling."
DBoon
(23,336 posts)
Side two opens with the exhibit of "the President" (Austin), who sounds like Richard Nixon. Each visitor is asked to speak their name, which is then played back to appear as if the president is addressing them by name. A black welfare recipient named Jim (Bergman) relates his family's harsh urban living conditions and asks the President where he can get a job. The President responds with a vague, positive-sounding reply only remotely related to the question and completely unrelated to Jim's concerns, and Jim is given the "bum's rush". When it is Clem's turn, he puts the President into maintenance mode by saying, "This is Worker speaking. Hello." The computer responds with the length of time that it has been running. Clem then gets access to Doctor Memory (the master control), and attempts to confuse the system with a riddle: "Why does the porridge bird lay his egg in the air?" This causes the President to shut itself down. As Clem leaves, an Hispanic visitor is heard to say "He broke the President!".
https://en.wikipedia.org/wiki/I_Think_We're_All_Bozos_on_This_Bus
SilasSouleII
(464 posts)
Life under Nixon. Got my first job at age 11, delivering the morning Express-News on my bike. It was a very good year for rock music, one of the best. Days long gone by...
usonian
(15,376 posts)
The Lampoon, IIRC, lampooned "expletive deleted" with "executive deleted".
So needed now.
Bernardo de La Paz
(52,062 posts)
Maru Kitteh
(29,449 posts)
dgauss
(1,199 posts)
A new terminology is needed, but this is getting beyond traditional comprehension however we describe it.
JHB
(37,543 posts)
You want to use the work of writers and artists and any other creator to train your AI, you should goddamn well pay them for it. And if you don't, you're a thief. Theft protection does not make anyone a "hater" except to thieves.
usonian
(15,376 posts)
I use aggregators, namely DU and Hacker News, so pretty much anything gets posted, without regard to SEO rank, but they focus on tech/startup/developer matters and politics.
Suits me.
Fun stuff makes its way in both.
WestMichRad
(1,964 posts)
It makes me happy to learn that there are efforts to foul AI.
Those who have unleashed AI upon us
had it coming.
Hekate
(95,871 posts)
Pinback
(12,945 posts)
I look forward to learning more about this and will share with techies I know.