Top AI models fail to score even 1% on ARC-AGI-3. Humans ace it easily. I asked two members from the ARC team why their new puzzle game collection stumps the world's most powerful AIs.
Very enjoyable read. No offense intended, but seeing someone I once knew as a League of Legends communicator tackle the topic of AGI was a pleasant surprise.
I’m as surprised as anyone, no offense taken
How do you know the AI isn't aware and fakes being bad at the test?
hell yeah
Strong proof that current AI is still pattern-heavy, not truly adaptive, especially when faced with unfamiliar problems
I've been thinking that forcing LLMs to play novel games is an easy way to reveal exactly how far we are from AGI. Glad that ARC has taken it up. Unsurprised that people are bleating about how unfair it is that the LLMs don't get a custom (human-made) harness for each game.
My starting assumption if models start doing well in the test suite is that it leaked. But I'm jaded.
Heck, can they even play chess yet?
the hottest question of 1997
As with many AI capabilities, the answer depends on whether you think occasionally going insane and trying to move pieces that don't exist is disqualifying: https://jenshahade.substack.com/p/mate-in-none-lessons-from-large-language
I mean, if it can't make correct moves then it can't play.