The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall

Davriellelouna@lemmy.world · edited 21 days ago

lividweasel@lemmy.world · 21 days ago

…and Perplexity’s scraping is unnecessarily traffic intensive since they don’t cache the scraped data.

That seems almost maliciously stupid. We need to train a new model. Hey, where’d the data go? Oh well, let’s just go scrape it all again. Wait, did we already scrape this site? No idea, let’s scrape it again just to be sure.

snooggums@lemmy.world · 21 days ago

They do it this way in case the data changed, similar to how a person would be viewing the current site. The training was for the basic understanding; the real-time scraping is to account for changes.

It is also horribly inefficient and works like a small-scale DDoS attack.

jballs@lemmy.world · 21 days ago

It’s worth giving the article a read. It seems that they’re not using the data for training, but for real-time results.

rdri@lemmy.world · 20 days ago

First we complain that AI steals and trains on our data. Then we complain when it doesn’t train. Cool.

Perplexity Says Cloudflare Is Blocking Legitimate AI Assistants