ChatGPT 'got absolutely wrecked' by Atari 2600 in beginner's chess match — OpenAI's newest model bamboozled by 1970s logic

Lifecoach5000@lemmy.world · 10 days ago

NeilBrü@lemmy.world · edit-2 9 days ago

An LLM is a poor computational/predictive paradigm for playing chess.

surph_ninja@lemmy.world · 9 days ago

This just in: a hammer makes a poor screwdriver.

Takapapatapaka@lemmy.world · 9 days ago

Actually, a very specific model (gpt-3.5-turbo-instruct) was pretty good at chess (around 1700 Elo, if I remember correctly).

NeilBrü@lemmy.world · 9 days ago

I’m impressed, if that’s true! In general, an LLM’s training cost vs. an LSTM, RNN, or some other more appropriate DNN algorithm suitable for the ruleset is laughably high.

Takapapatapaka@lemmy.world · 9 days ago

Oh yes, the cost of training is ofc a great loss here, it’s not optimized at all, and it’s stuck at an average level.

Interestingly, I believe some people did research on it and found some parameters in the model that seemed to represent the state of the chess board (as in, they seem to reflect the current state of the board, and when they are artificially modified, the model takes the modification into account in its play). It was used by a French YouTuber to show how LLMs can somehow have a kind of representation of the world. I can try to get the sources back if you’re interested.
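For anyone wondering what "finding the board state inside the model" looks like in practice, here is a toy sketch of a linear probe. Everything in it is an assumption for illustration: the activations are synthetic random vectors, not taken from any real model, and `DIRECTION`, `fake_activation` and `train_probe` are made-up names. It only shows the reading half of the idea (a simple linear classifier recovering a square's state from hidden vectors), not the intervention half.

```python
import random

random.seed(0)  # fixed seed so the toy run is repeatable
DIM = 8         # size of the pretend hidden state

# Hypothetical direction along which the toy "model" encodes one square.
DIRECTION = [random.gauss(0, 1) for _ in range(DIM)]

def fake_activation(occupied):
    """Synthetic hidden state: +DIRECTION if occupied, -DIRECTION if empty, plus noise."""
    sign = 1.0 if occupied else -1.0
    return [sign * d + random.gauss(0, 0.3) for d in DIRECTION]

def make_dataset(n):
    labels = [random.random() < 0.5 for _ in range(n)]
    return [(fake_activation(y), y) for y in labels]

def score(w, b, x):
    """Linear readout: positive means the probe says 'occupied'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_probe(data, epochs=20, lr=0.1):
    """Perceptron-style linear probe: nudge the weights only on mistakes."""
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for x, y in data:
            if (score(w, b, x) > 0) != y:
                step = lr if y else -lr
                w = [wi + step * xi for wi, xi in zip(w, x)]
                b += step
    return w, b

def accuracy(w, b, data):
    return sum((score(w, b, x) > 0) == y for x, y in data) / len(data)

train_set, test_set = make_dataset(200), make_dataset(200)
w, b = train_probe(train_set)
print(f"probe accuracy on held-out toy data: {accuracy(w, b, test_set):.2f}")
```

In the real experiments the vectors would come from a layer of the actual model while it reads a game, and a probe that works is only evidence that the information is linearly readable there, not proof of how the model uses it.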

NeilBrü@lemmy.world · edit-2 9 days ago

Absolutely interested. Thank you for your time to share that.

My career path in neural networks began as a researcher in cancerous tissue object detection for medical diagnostic imaging. It has since switched to generative models for CAD (architecture, product design, game assets, etc.). I don’t really mess about with fine-tuning LLMs.

However, I do self-host my own LLMs as code assistants. Thus, I’m only tangentially involved with the current LLM craze.

But it does interest me, nonetheless!

Takapapatapaka@lemmy.world · 8 days ago

Here is the main blog post that I remembered: it has a follow-up, a more scientific version, and uses two other articles as a basis, so you might want to dig around what they mention in the introduction.

It is indeed quite a technical discovery, and it still lacks a complete and wider analysis, but it is very interesting because it kinda invalidates the common gut feeling that LLMs are pure lucky randomness.

Bleys@lemmy.world · 9 days ago

The underlying neural network tech is the same as what the best chess AIs (AlphaZero, Leela) use. The problem is, as you said, that ChatGPT is designed specifically as an LLM so it’s been optimized strictly to write semi-coherent text first, and then any problem solving beyond that is ancillary. Which should say a lot about how inconsistent ChatGPT is at solving problems, given that it’s not actually optimized for any specific use cases.

NeilBrü@lemmy.world · edit-2 9 days ago

Yes, I agree wholeheartedly with your clarification.

My career path, as I stated in a different comment in regards to neural networks, is focused on generative DNNs for CAD applications and parametric 3D modeling. Before that, I began as a researcher in cancerous tissue classification and object detection in medical diagnostic imaging.

Thus, large language models are well out of my area of expertise in terms of the architecture of their models.

However, fundamentally it boils down to the fact that the specific large language model used was designed to predict text and not necessarily solve problems/play games to “win”/“survive”.

(I admit that I’m just parroting what you stated and maybe rehashing what I stated even before that, but I like repeating and refining in simple terms to practice explaining to laymen and, dare I say, clients. It helps me feel as if I don’t come off too pompously when talking about this subject to others; forgive my tedium.)

OBJECTION!@lemmy.ml · 10 days ago

Tbf, the article should probably mention the fact that machine learning programs designed to play chess blow everything else out of the water.

andallthat@lemmy.world · edit-2 9 days ago

Machine learning has existed for many years, now. The issue is with these funding-hungry new companies taking their LLMs, repackaging them as “AI” and attributing every ML win ever to “AI”.

ML programs designed and trained specifically to identify tumors in medical imaging have become good diagnostic tools. But if you read in news that “AI helps cure cancer”, it makes it sound like it was a lone researcher who spent a few minutes engineering the right prompt for Copilot.

Yes, a specifically designed and finely tuned ML program can now beat the best human chess player, but calling it “AI” and bundling it together with the latest Gemini or Claude iteration’s “reasoning capabilities” is intentionally misleading. That’s why articles like this one are needed. ML is a useful tool but far from the “super-human general intelligence” that is meant to replace half of human workers by the power of wishful prompting.

cley_faye@lemmy.world · 9 days ago

Ah, you used logic. That’s the issue. They don’t do that.

anubis119@lemmy.world · 10 days ago

A strange game. How about a nice game of Global Thermonuclear War?

Lifecoach5000@lemmy.world · 10 days ago

Lmao! 🤣 that made me spit!!

Xanthobilly@lemmy.world · 10 days ago

Furbag@lemmy.world · 10 days ago

Can ChatGPT actually play chess now? Last I checked, it couldn’t remember more than 5 moves of history so it wouldn’t be able to see the true board state and would make illegal moves, take its own pieces, materialize pieces out of thin air, etc.

bountygiver [any]@lemmy.ml · 10 days ago

and still lose to stockfish even after conjuring 3 queens out of thin air lol

jsomae@lemmy.ml · 9 days ago

Using an LLM as a chess engine is like using a power tool as a table leg. Pretty funny honestly, but it’s obviously not going to be good at it, at least not without scaffolding.
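To make "scaffolding" concrete, here is a minimal sketch of one possible wrapper, under loud assumptions: `propose` stands in for whatever call asks the model for a move (it is hypothetical, not a real API), and the list of legal moves is assumed to come from a proper rules engine that the wrapper trusts over the model. The wrapper rejects illegal suggestions, tells the model what was rejected, and falls back to a legal move if the model keeps failing.

```python
from typing import Callable, List, Sequence

def choose_legal_move(
    propose: Callable[[str, Sequence[str]], str],
    board_fen: str,
    legal_moves: Sequence[str],
    max_retries: int = 3,
) -> str:
    """Ask the model for a move, but only ever return a legal one."""
    if not legal_moves:
        raise ValueError("no legal moves: the game is over")
    rejected: List[str] = []
    for _ in range(max_retries):
        move = propose(board_fen, rejected).strip()
        if move in legal_moves:
            return move
        rejected.append(move)  # feed the failure back on the next attempt
    # The model never produced a legal move, so fall back to any legal one.
    return legal_moves[0]

# Usage with a stand-in "model" that hallucinates once, then behaves.
def flaky_model(board_fen: str, rejected: Sequence[str]) -> str:
    return "e1e8" if not rejected else "e2e4"

START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
SOME_LEGAL_MOVES = ["e2e4", "d2d4", "g1f3"]  # partial list, for illustration only

print(choose_legal_move(flaky_model, START_FEN, SOME_LEGAL_MOVES))
```

This only stops the illegal moves; it does nothing for move quality, which is the part the Atari is actually good at.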

arc99@lemmy.world · 10 days ago

Hardly surprising. LLMs aren’t -thinking-, they’re just shitting out the next token for any given input of tokens.
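A toy version of "predict the next token" for the curious. This is a bigram counter, which is far simpler than the transformer behind ChatGPT, and the tiny move list it trains on is made up for illustration. It does show the basic failure mode though: it continues sequences from frequency alone and has no concept of a board or of legality.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, length):
    """Greedily append the most frequent follower of the last token."""
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break  # never saw anything after this token
        out.append(followers.most_common(1)[0][0])
    return out

# Made-up training text: three short opening fragments run together.
corpus = "e4 e5 Nf3 Nc6 Bb5 a6 e4 e5 Nf3 Nc6 Bc4 Bc5 e4 e5 Nf3 Nf6".split()
model = train_bigram(corpus)

print(generate(model, "e4", 3))  # ['e4', 'e5', 'Nf3', 'Nc6']
```

Nothing in there checks whether a move is possible in the current position, so once the game leaves familiar sequences the output can be any plausible-looking token.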

Nurse_Robot@lemmy.world · 10 days ago

I’m often impressed at how good ChatGPT is at generating text, but I’ll admit it’s hilariously terrible at chess. It loves to manifest pieces out of thin air, or make absurd illegal moves, like jumping its king halfway across the board and claiming checkmate.

Blaster M@lemmy.world · 10 days ago

ChatGPT is playing Anarchy Chess

Lifecoach5000@lemmy.world · 10 days ago

Yeah! I’ve loved watching GothamChess’s videos on these. They’ve always been good for a laugh.

oni ᓚᘏᗢ@lemmy.world · 10 days ago

This made my day

hogmomma@lemmy.world · 10 days ago

Get your booty on the floor tonight.

finitebanjo@lemmy.world · 9 days ago

All these comments asking “why don’t they just have chatgpt go and look up the correct answer”.

That’s not how it works, you buffoons, it trains off of datasets long before it releases. It doesn’t think. It doesn’t learn after release, it won’t remember things you try to teach it.

Really lowering my faith in humanity when even the AI skeptics don’t understand that it generates statistical representations of an answer based on answers given in the past.

seven_phone@lemmy.world · 10 days ago

You say you produce good oranges but my machine for testing apples gave your oranges a very low score.

wizardbeard@lemmy.dbzer0.com · 10 days ago

No, more like “Your marketing team, sales team, the news media at large, and random hype men all insist your orange machine works amazingly on any fruit if you know how to use it right. It didn’t work on my strawberries when I gave it all the help I could, and was outperformed by my 40-year-old strawberry machine. Please stop selling the idea it works on all fruit.”

This study is specifically a counter to the constant hype that these LLMs will revolutionize absolutely everything, and the constant word choices used in discussion of LLMs that imply they have reasoning capabilities.

Harbinger01173430@lemmy.world · 9 days ago

LLMs useless confirmed once again

krigo666@lemmy.world · edit-2 10 days ago

Next, pit ChatGPT against 1K ZX Chess on a ZX81.

NotMyOldRedditName@lemmy.world · 10 days ago

Okay, but could ChatGPT be used to vibe code a chess program that beats the Atari 2600?

GreenKnight23@lemmy.world · 10 days ago

no.

the answer is always, no.

NotMyOldRedditName@lemmy.world · 10 days ago

The answer might be no today, but always seems like a stretch.

vane@lemmy.world · 10 days ago

It’s not that hard to beat a dumb 6-year-old whose only purpose is to mine your privacy to sell you ads or product-place some shit for you in the future.