manton

I get the distrust of AI bots, but I think discussions about sabotaging crawled data go too far, potentially making a mess of the open web. There has never been a system like AI before, and old assumptions about what is fair use don’t really fit. But robots.txt still works! No need to burn everything down yet.

2024-06-18 2:01 am

johninfante@mastodon.world

@manton AI companies are potentially already finding ways around it: https://rknight.me/blog/perplexity-ai-is-lying-about-its-user-agent/

2024-06-18 2:11 am

moonmehta

@manton Didn’t OpenAI make its market entry by not caring about the robots.txt file, and only make its agent known after backlash?

2024-06-18 6:53 am

paulrobertlloyd

@manton And yet in many cases robots.txt and other valid attempts to block AI bots are being ignored. AI companies are not playing fair, and are a clear and present danger to the open web.

2024-06-18 8:54 am

mortenbekditlevsen@mastodon.social

@manton ‘Burning things down’ could also be rephrased: ‘testing out the prompt injection weaknesses of LLMs in the open before someone actively exploits them’, could it not?

2024-06-18 1:05 pm

In reply to

manton

@paulrobertlloyd I see it a little differently. I'd like to see more effort by AI companies to credit sources, so that the balance between crawling and publishing is closer to what we have with search engines. AI companies are not going to go away, but we need to push them in the right directions.

2024-06-18 1:37 pm

manton

@moonmehta Yes, that's true.

2024-06-18 1:39 pm

manton

@mortenbekditlevsen Ha, I guess so!

2024-06-18 1:40 pm

adactio

@manton How on Earth does sabotaging a piece-of-shit abusive scraper (that's already ignoring robots.txt) in any way lead to “potentially making a mess of the open web”? These bad actors are the opposite of the open web. I’m advocating for us to protect the open web.

2024-06-28 7:46 am

manton

@adactio I think it's a bad precedent. It's already hard enough for legitimate crawling because of tricks that paywalls use, or JavaScript that gets in the way. Mucking up text and images is bound to create problems for non-AI tools too. There's gotta be a better way to address this.

2024-06-28 12:32 pm

Micro.blog