Fig. 4: Multi-agent answer generation and aggregation in the MoA reasoning path.

Two layers of heterogeneous small language models (SLMs) independently generate candidate responses, which a LLaMA-3.2-11B-Vision-Instruct-Turbo model then aggregates into the final structured materials data triples.
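
The flow in Fig. 4 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. Agents are modeled as plain callables rather than real model calls. The second layer is assumed to receive the first layer's candidates as added context, which is the usual MoA pattern but is not specified in the caption. The names `run_layer`, `parse_triples`, `aggregate`, and `mixture_of_agents`, as well as the `material | property | value` line format, are hypothetical. The `aggregate` function is a de-duplicating stand-in for the aggregator, which in the described system is an LLM.

```python
from typing import Callable, List, Sequence, Tuple

Agent = Callable[[str], str]   # hypothetical: prompt -> raw text response
Triple = Tuple[str, str, str]  # hypothetical: (material, property, value)


def run_layer(agents: Sequence[Agent], prompt: str,
              previous: Sequence[str]) -> List[str]:
    """Query every agent in one layer independently and collect the responses."""
    if previous:
        # Assumption: later layers see the earlier layer's candidates as context.
        listing = "\n".join(f"- {p}" for p in previous)
        prompt = f"{prompt}\n\nCandidates from the previous layer:\n{listing}"
    return [agent(prompt) for agent in agents]


def parse_triples(text: str) -> List[Triple]:
    """Parse lines of the assumed form 'material | property | value'."""
    triples: List[Triple] = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.append((parts[0], parts[1], parts[2]))
    return triples


def aggregate(candidates: Sequence[str]) -> List[Triple]:
    """Stand-in for the aggregator model: merge candidates, dropping duplicates."""
    merged: List[Triple] = []
    for text in candidates:
        for triple in parse_triples(text):
            if triple not in merged:
                merged.append(triple)
    return merged


def mixture_of_agents(layers: Sequence[Sequence[Agent]], prompt: str,
                      aggregator: Callable[[Sequence[str]], List[Triple]] = aggregate
                      ) -> List[Triple]:
    """Run the layers in order, then aggregate the last layer's candidates."""
    candidates: List[str] = []
    for agents in layers:
        candidates = run_layer(agents, prompt, candidates)
    return aggregator(candidates)


# Toy stub agents standing in for heterogeneous SLMs (illustrative data only).
def agent_a(prompt: str) -> str:
    return "TiO2 | band gap | 3.2 eV"


def agent_b(prompt: str) -> str:
    return "TiO2 | band gap | 3.2 eV\nTiO2 | crystal structure | anatase"


layers = [[agent_a, agent_b], [agent_b, agent_a]]
result = mixture_of_agents(layers, "Extract materials data triples from the text.")
print(result)
```

In a real pipeline each stub would be replaced by a call to a model endpoint, and the aggregator would be prompted with the candidate list rather than merging it by rule. Keeping agents as callables leaves the layer logic independent of any particular model provider.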