But what exactly does this product do that you can’t from just parsing the stream?
Besides, the problem with hallucinations is the unknown unknowns: if what you’re doing is easily verifiable (like parsing JSON or checking valid chess moves) it’s trivial. But what if you don’t know the answer yourself? Then it’s basically impossible to solve.
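To make the "easily verifiable" case concrete, here's a rough sketch of what I mean (assuming Python, the standard json module, and the third-party python-chess package; the function names are mine for illustration, not anything from the product):

    import json
    import chess  # third-party: pip install python-chess

    def looks_like_valid_json(text):
        # Trivially verifiable: the output either parses or it doesn't.
        try:
            json.loads(text)
            return True
        except json.JSONDecodeError:
            return False

    def is_legal_move(fen, move_uci):
        # Trivially verifiable: the move is either legal in this position or not.
        try:
            move = chess.Move.from_uci(move_uci)
        except ValueError:
            return False  # not even syntactically a move
        return move in chess.Board(fen).legal_moves

Both checks have a ground truth to compare against; my point is that most outputs don't.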
Yes, you are absolutely right. My work only demonstrates how to catch hallucinations in real time when we know what kind of hallucination we're looking for and have a clear definition of it.
Detecting unknown hallucinations is a very hard problem. If you don't know the target, how do you shoot it?
So you have to be able to identify a priori what is and isn't a hallucination, right?
The oracle problem is solved. Just use an actual oracle.
I guess the real question is how often you see the same class of hallucination? For something where you're using an LLM agent/workflow and running it repeatedly, I could totally see this being worthwhile.
Yeah, reading the headline got me excited too. I thought they are going to propose some novel solution or use the recent research by OpenAI on reward function optimization.
It's rather cheeky to call it "real-time AI hallucination detection" when all they're doing is checking for invalid moves and playing twice. You don't even need real-time processing for this, do you?
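As far as I can tell, something like this run after the fact would catch the same thing (a rough sketch, assuming python-chess and a SAN move list; I'm only guessing at what their checker actually does):

    import chess  # third-party: pip install python-chess

    def first_illegal_move(san_moves):
        # Replay a finished game and return the index of the first illegal or
        # malformed move, or None if every move was legal. No streaming needed.
        board = chess.Board()
        for i, san in enumerate(san_moves):
            try:
                board.push_san(san)  # raises ValueError on illegal/unparseable SAN
            except ValueError:
                return i
        return None

    # e.g. first_illegal_move(["e4", "e5", "Ng5"]) -> 2 (the knight can't reach g5)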
Chess is just a simple example; real-time processing is critical in many AI monitoring use cases.
There's more generalizable recent work on this, for those expecting more: https://github.com/leochlon/hallbayes
AI can hallucinate, but real-time detection is key.
I didn't quite understand the point of the claims at the end of the page. Surely self-driving cars or health/banking services don't use language models for anything important. Everyone knows those hallucinate. Traditional ML is a much better alternative.
is this satire?