Run something with a 70% failure rate 10x and you get to a cumulative 97% pass rate. LLMs don’t get tired and they can be run in parallel.
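The arithmetic behind that claim can be spelled out directly. This is a sketch that assumes the 10 runs are genuinely independent (an assumption disputed later in the thread):

```python
# Probability of at least one success in n independent runs,
# assuming a fixed 70% per-run failure rate (i.i.d. assumption).
p_fail = 0.7
n_runs = 10

p_all_fail = p_fail ** n_runs       # every single run fails: ~0.028
p_at_least_one = 1 - p_all_fail     # at least one run succeeds: ~0.97

print(f"all fail: {p_all_fail:.4f}, at least one success: {p_at_least_one:.4f}")
```

Note the result is closer to 97% than 98%: 0.7^10 ≈ 0.028, not 0.02.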
The problem is they are not i.i.d., so this doesn’t really work. It works a bit, which is in my opinion why chain-of-thought is effective (it gives the LLM a chance to posit a couple answers first). However, we’re already looking at “agents,” so they’re probably already doing chain-of-thought.
Very fair comment. In my experience, even when increasing the temperature, you get stuck in local minima.
I was just trying to illustrate how 70% failure rates can still be useful.
What’s 0.7^10?
About 0.03
So the chances of it being right ten times in a row are 3%.
No, the chances of being wrong 10x in a row are 3%. So the chances of being right at least once are 97%.
Ah, my bad, you’re right: for being consistently correct, I should have done 0.3^10 = 0.0000059049
so the chances of it being right ten times in a row are less than one thousandth of a percent.
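The two quantities being conflated in this exchange can be computed side by side. A minimal sketch, again assuming i.i.d. runs with a 30% per-run success rate:

```python
# "Right every time" vs "right at least once" over n independent runs,
# assuming a 30% per-run success rate (i.i.d. assumption).
p_success = 0.3
n = 10

all_correct = p_success ** n              # correct on every run: ~5.9e-06
at_least_one = 1 - (1 - p_success) ** n   # correct at least once: ~0.97

print(f"all correct: {all_correct}, at least one correct: {at_least_one:.4f}")
```

The gap between the two numbers is the whole point: retrying helps enormously if you can verify answers, and not at all if you need every answer to be right.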
No wonder I couldn’t get it to summarise my list of data right and it was always lying by the 7th row.
That looks better. Even with a fair coin, 10 heads in a row is almost impossible.
And if you are feeding the output back into a new instance of a model then the quality is highly likely to degrade.
Whereas if you ask a human to do the same thing ten times, the probability that they get all ten right is astronomically higher than 0.0000059049.
Dunno. Ask 10 humans at random to do a task and probably one will do it better than the AI. Just not as fast.
don’t you dare understand the explicitly obvious reasons this technology can be useful and the essential differences between P and NP problems. why won’t you be angry >:(