LLM performance on Path-X or Path-256?
The FlashAttention paper reports the first Transformers to achieve non-random (better-than-chance) performance on the Path-X and Path-256 challenges. Has any modern LLM been evaluated on these tasks? If not, what is the reason?