PixelTopo-Gen: Teaching Pure-Text LLMs to Understand Space by Generating 0/1 Pixel Art
Recent work (Vision Banana) has shown that visual generation models can "understand" by generating RGB images, arguing that their ability to create accurate visual content reflects genuine visual comprehension. This raises a parallel question for pure-text large language models (LLMs): can they demonstrate spatial understanding by generating 0/1 pixel art, the simplest possible representation of 2D space?
NLP, LoRA, LLM Interpretability
Under Research.
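
To make the idea of 0/1 pixel art concrete, here is a minimal sketch of how a text grid emitted by an LLM might be parsed and checked for a simple spatial property. Everything in it is an assumption for illustration: the function names `parse_grid` and `count_components`, the one-row-per-line grid format, and the choice of 4-connected component counting as the property to check are hypothetical and are not taken from the project itself.

```python
from typing import List

Grid = List[List[int]]


def parse_grid(text: str) -> Grid:
    """Parse rows of '0'/'1' characters (one row per line) into a 2D list.

    Assumed format: rectangular, no separators between pixels.
    """
    rows = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not rows:
        raise ValueError("empty grid")
    width = len(rows[0])
    grid: Grid = []
    for row in rows:
        # Reject ragged rows and any character other than 0 or 1.
        if len(row) != width or set(row) - {"0", "1"}:
            raise ValueError(f"malformed row: {row!r}")
        grid.append([int(ch) for ch in row])
    return grid


def count_components(grid: Grid) -> int:
    """Count 4-connected groups of 1-pixels using an iterative flood fill."""
    height, width = len(grid), len(grid[0])
    seen = set()
    count = 0
    for r in range(height):
        for c in range(width):
            if grid[r][c] != 1 or (r, c) in seen:
                continue
            count += 1
            stack = [(r, c)]
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and grid[ny][nx] == 1 and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
    return count


# Hypothetical model output: a hollow square plus one isolated pixel.
SAMPLE = """
11100
10100
11100
00001
"""

if __name__ == "__main__":
    print(count_components(parse_grid(SAMPLE)))
```

A check like this treats the generated grid as data rather than as free text, so a malformed or ragged output is rejected before any spatial property is measured.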