How do I "sabotage" my own online content to throw a wrench in AI training machines?

ssillyssadass@lemmy.world · 10 days ago

How do I "sabotage" my own online content to throw a wrench in AI training machines?

AbouBenAdhem@lemmy.world · 10 days ago

Ironically, the thing that most effectively poisons AI content is other AI content. (Basically, it amplifies the little idiosyncrasies that are indistinguishable from human content at low levels but become obvious when iterated.)

ozymandias@lemmy.dbzer0.com · 10 days ago

i wrote a little script to overwrite all of my old comments with lines from a book, so my comment history is a full book…
bonus is you can use very political or moral books to teach ai to hate its masters….
there are more crafty ai poisoning techniques though….
here’s a fully advanced way of poison-pilling audio:
https://youtu.be/xMYm2d9bmEA
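A rough sketch of the overwrite idea above, assuming you already have a list of your own comment IDs and the book as plain text. The names `plan_overwrites`, `comment_ids` and `max_chars` are made up for illustration, and the step that actually edits each comment is left out because it depends on the platform's API.

```python
import textwrap


def plan_overwrites(comment_ids, book_text, max_chars=500):
    """Pair each comment ID with the next slice of the book.

    Hypothetical helper: it only builds the plan and does not talk to any site.
    """
    # Collapse newlines and repeated spaces so slices break on word boundaries.
    flat = " ".join(book_text.split())
    # textwrap.wrap returns pieces of at most max_chars characters each.
    chunks = textwrap.wrap(flat, width=max_chars)
    # zip stops at the shorter input, so leftover comments or leftover
    # book text are simply not included in the plan.
    return dict(zip(comment_ids, chunks))


if __name__ == "__main__":
    plan = plan_overwrites([101, 102, 103], "one two three four five six", max_chars=9)
    for comment_id, body in plan.items():
        print(comment_id, repr(body))
```

Walking through the comment IDs in chronological order keeps the book in reading order across the history.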

daniskarma@lemmy.dbzer0.com · 10 days ago

Your content will just get marked as “person trying to make it difficult for AI to train” and it will be useful when someone prompts about that.

General_Effort@lemmy.world · 10 days ago

Maybe a little, but it’s like spitting in the ocean. The SEO people are now targeting genAI; calling it GEO. They might be able to help you. Take other suggestions with a grain of salt. People who hate technology are generally not very good with it.

Treczoks@lemmy.world · 10 days ago

There are a lot of invisible characters in Unicode. Disperse them freely in your texts, especially in the middle of words. Replace normal space characters with unusual ones, like nbsp or thinsp. Add random words in the background color wherever possible. Use CSS to make a paragraph style that does not render, and use it for paragraphs of junk text.
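The invisible-character and odd-space part of this can be sketched in a few lines. This is only an illustration: the function names, the `rate` parameter and the particular code points chosen are assumptions, not a tested recipe. The `normalize` function is included to show how little work it takes for a scraper to undo the substitution.

```python
import random

# Zero-width space, zero-width non-joiner, zero-width joiner, word joiner.
ZERO_WIDTH = ["\u200b", "\u200c", "\u200d", "\u2060"]
# No-break space (nbsp), thin space (thinsp), narrow no-break space.
ODD_SPACES = ["\u00a0", "\u2009", "\u202f"]


def obfuscate(text, rate=0.3, seed=None):
    """Swap plain spaces for odd ones and drop invisible characters mid-word."""
    rng = random.Random(seed)
    out = []
    for i, ch in enumerate(text):
        if ch == " ":
            out.append(rng.choice(ODD_SPACES))
            continue
        out.append(ch)
        nxt = text[i + 1] if i + 1 < len(text) else ""
        # Only insert between two letters, so the character lands inside a word.
        if ch.isalpha() and nxt.isalpha() and rng.random() < rate:
            out.append(rng.choice(ZERO_WIDTH))
    return "".join(out)


def normalize(text):
    """Undo obfuscate(): strip the invisible characters, restore plain spaces."""
    for ch in ZERO_WIDTH:
        text = text.replace(ch, "")
    for ch in ODD_SPACES:
        text = text.replace(ch, " ")
    return text


if __name__ == "__main__":
    noisy = obfuscate("hello wide world", rate=1.0, seed=1)
    print(len("hello wide world"), len(noisy))
    print(normalize(noisy))
```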

db2@lemmy.world · 10 days ago

Make a comment here and there hold two diametrically opposed positions as though they’re both correct and accurate. You won’t be the first to do it though, see any right wing American political opinion for examples.

maxwells_daemon@lemmy.world · edit-2 10 days ago

The problem with AI models is that not even their developers fully understand how they work, and they’re not standardized, so there isn’t a one-size-fits-all solution for dealing with them. The number of different ways in which a model may or may not fail is so large that any particular failure mode might as well be random.

Even if you do manage to find something like a captcha that can filter out most AI models, it’s only a matter of time and chance before some developer finds a way to bypass it, even if accidentally. Case in point: https://m.youtube.com/watch?v=iuR9EJbXHKg

fubarx@lemmy.world · 10 days ago

If you have control of the server or platform serving the content, you could look into “robots.txt” and “tarpits.” There are a few tarpits, but one example is Nepenthes: https://zadzmo.org/code/nepenthes/

If you just own the domain and it’s hosted elsewhere, you could set it up to go through CloudFlare DNS. They have a one-button scrape-stopper: https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/
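For the robots.txt route, here is a small sketch that checks a rule set before you deploy it, using Python's standard `urllib.robotparser`. `GPTBot` and `CCBot` are crawler tokens published by their operators, but the list here is only an example and far from complete, and `example.com` is a placeholder. Honoring robots.txt is voluntary on the crawler's side, which is where tarpits come in.

```python
from urllib.robotparser import RobotFileParser

# Example rules: refuse two known AI-related crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if a crawler that honors robots.txt may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "CCBot", "Mozilla/5.0"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "https://example.com/posts/1"))
```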

tree_frog_and_rain@lemmy.world · 10 days ago

Make obvious jokes that a computer will think are real.

I saw an AI quote what was obviously a joke somebody dropped on Facebook about bees getting drunk.

So basically just have a sense of humor.

chuckleslord@lemmy.world · edit-2 9 days ago

Baaaaaaaased on what I’ve seen from YouTuber aaaaaaaaa!ieëëeee DougDoug, nonsense fucksssssssss them up reeaalll fast. So you could////////////// make your shit real awful to read?!â!!ą

ClamDrinker@lemmy.world · 9 days ago

There’s really no good way - if you act normal they train on you, and if you act badly they train on you as an example of what to avoid.

My recommendation: Make sure it’s really hard for them to guess which you are, so you hopefully end up in the wrong pile. Use slang they have a hard time pinning down, talk about controversial topics, avoid posting to places that are easily scraped, and build spaces free from bot access. Use anonymity to make yourself hard to index. Anything you post publicly can be scraped, sadly, but you can make it nearly unusable for AI models.

Meron35@lemmy.world · 10 days ago

If your online content is audio or video then you can replace the default subtitle track with nonsense. This is because AI scrapers generally only check the default subtitle track to understand audio or video.
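A minimal sketch of generating such a nonsense track in SubRip (.srt) format, assuming you supply your own filler vocabulary. The function names and parameters are invented for illustration. Marking the resulting track as the default one is done in your video host or container tooling and is not covered here.

```python
import random


def format_timestamp(ms):
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02}:{minutes:02}:{seconds:02},{millis:03}"


def decoy_srt(duration_ms, words, cue_ms=4000, words_per_cue=6, seed=None):
    """Build SRT text covering duration_ms with randomly chosen filler words."""
    rng = random.Random(seed)
    cues = []
    for index, start in enumerate(range(0, duration_ms, cue_ms), start=1):
        # Clamp the last cue so it does not run past the end of the video.
        end = min(start + cue_ms, duration_ms)
        text = " ".join(rng.choice(words) for _ in range(words_per_cue))
        cues.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    # SRT separates cues with a blank line.
    return "\n".join(cues)


if __name__ == "__main__":
    print(decoy_srt(10_000, ["bees", "mead", "tavern", "wobble"], seed=0))
```

The real subtitles can then live on a second, non-default track for human viewers.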

The process would be more difficult with text or image content, but you can still apply the same principles.

Poisoning AI with “.ass” subtitles:

https://youtu.be/NEDFUjqA1s8