100ideas 13 hours ago

I found the opening quote of this article to be intriguing, especially since it was from a 1992 research lab:

“One year of research in neural networks is sufficient to believe in God.” The writing on the wall of John Hopfield’s lab at Caltech made no sense to me in 1992. Three decades later, and after years of building large language models, I see its sense if one replaces sufficiency with necessity: understanding neural networks as we teach them today requires believing in an immanent entity.

  • 100ideas 13 hours ago

    Basically, as LLMs scale up, the author (Soatto, a VP at AWS) suggests they're beginning to resemble Solomonoff inference: a hypothetically optimal but computationally infinite approach that executes all possible programs to match the observed data. Applying it to any given question by definition gives the best answer, yet it requires no learning, since the whole process can simply be rerun for every new query (thanks to the unlimited computation).
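
    For anyone unfamiliar with Solomonoff induction, a rough statement of the idea (my paraphrase; the post uses its own notation): every program p that makes a universal machine U print something beginning with the observed data x gets weight 2^{-|p|}, and prediction is just conditioning on that prior:

      M(x) = \sum_{p \,:\, U(p) = x\ast} 2^{-|p|}, \qquad P(a \mid x) = \frac{M(xa)}{M(x)}

    M isn't computable in practice, which is the point: it's an idealized limit that trained predictors can only approximate.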

    The article develops a theoretical framework contrasting traditional inductive learning (which emphasizes generalization over memorization) with transductive inference (which embraces memorization and reasoning). Here's a quote:

    "What matters is that LLMs are inductively trained transductive-inference engines and can therefore support both forms of inference.[2] They are capable of performing inference by inductive learning, like any trained classifier, akin to Daniel Kahneman’s “system 1” behavior — the fast thinking of his book title Thinking Fast and Slow. But LLMs are also capable of rudimentary forms of transduction, such as in-context-learning and chain of thought, which we may call system 2 — slow-thinking — behavior. The more sophisticated among us have even taught LLMs to do deduction — the ultimate test for their emergent abilities."

    Sadly, the opening quote is not elucidated.

gnabgib 13 hours ago

Blog title: Solomonic learning: Large language models and the art of induction

  • 100ideas 13 hours ago

    Yes, I should have made that clear in my first comment. Thanks for doing so. I used the quote in my title because I found it a fascinating way to open a technical blog post, and it made me want to read the article to understand where the author was going with such a beginning.

gtsop 13 hours ago

Dark times for science when quotes like this are thrown around as legitimate.

The article is extremely technical and doesn't really explain the quote, other than acknowledging that there are things we don't yet understand.

And really, a person will never grasp machine learning and AI as long as they keep drawing unfounded parallels between humans and machines.

  • talldayo 13 hours ago

    The article is bullshit shrouded in turboencabulator-speak. He's trying to roll the same Sisyphean boulder that crushed the computational linguists, thinking that AI makes it different this time. Conveniently, he also waves away any attempt to provide proof for his claims.

    >> If the training data subtend latent logical structures, as do sensory data such as visual or acoustic data, models trained as optimal predictors are forced to capture their statistical structure.

    There are a lot of red flags to pick from in this article, but this one stood out to me as the most absurd. AI doesn't get magical multimodal powers from reading secondhand accounts of a sensation. You can phrase it as fancily as you want, but the proof is in the pudding: the "statistical structure" of that text doesn't convey a meaningful understanding of much of anything in the real world.

    > And really, a person will never grasp machine learning and AI as long as they keep drawing unfounded parallels between humans and machines.

    I think you're right on the money with this one.

    • 100ideas 13 hours ago

      I think you both make valid points, but I also get the sense that the article is articulating insights gained from pure-math explorations of the theoretical limits of learning, which can sound like "turboencabulator-speak" when compressed into prose.

      Maybe I should have just linked to the research paper:

      [B'MOJO: Hybrid state space realizations of foundation models with eidetic and fading memory](https://www.arxiv.org/abs/2407.06324)