Minghao Wu, Alham Fikri Aji: Style Over Substance: Evaluation Biases for Large Language Models. COLING 2025: 297-312