Wink Pings

Quantifying Model Evaluation: Creating a Benchmark for Different Precision Models

Inspired by community discussions, I plan to establish a benchmark for quantized models to evaluate the relationship between precision loss and VRAM/performance gains, covering areas such as programming, mathematics, translation, and general knowledge.

A few days ago, I saw a community post discussing whether people can actually perceive quality differences between different quantized versions of the same model. That gave me an idea: build a benchmark suite for comparing quantized models.

The goal is to quantify more clearly the trade-off between the quality lost to quantization and the VRAM savings and inference-speed gains it buys.
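
To make the speed and memory side concrete, here is a minimal sketch of how one quantized GGUF file could be profiled, assuming llama-cpp-python and pynvml on an NVIDIA GPU; the model filenames and prompt are placeholders, not part of the actual benchmark yet:

```python
# Minimal sketch: measure tokens/sec and VRAM for one quantized GGUF file.
# Assumes llama-cpp-python and pynvml are installed and an NVIDIA GPU is present;
# model paths and the prompt below are hypothetical placeholders.
import time
import pynvml
from llama_cpp import Llama

def gpu_memory_used_mib() -> float:
    """Current VRAM usage of GPU 0 in MiB."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    used = pynvml.nvmlDeviceGetMemoryInfo(handle).used
    pynvml.nvmlShutdown()
    return used / 1024**2

def profile_quant(model_path: str, prompt: str, max_tokens: int = 256) -> dict:
    baseline = gpu_memory_used_mib()
    llm = Llama(model_path=model_path, n_gpu_layers=-1, n_ctx=4096, verbose=False)
    loaded = gpu_memory_used_mib()

    start = time.perf_counter()
    out = llm(prompt, max_tokens=max_tokens)
    elapsed = time.perf_counter() - start

    completion_tokens = out["usage"]["completion_tokens"]
    return {
        "model": model_path,
        "vram_mib": loaded - baseline,
        "tokens_per_sec": completion_tokens / elapsed,
    }

if __name__ == "__main__":
    for path in ["model-Q8_0.gguf", "model-Q4_K_M.gguf"]:  # hypothetical filenames
        print(profile_quant(path, "Explain quantization in one paragraph."))
```

A real run would average over many prompts and repeat generations to smooth out warm-up and caching effects, but the shape of the measurement stays the same.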

Currently, I'm planning to include the following test dimensions:

- Programming capabilities

- Mathematical reasoning

- Translation quality

- World knowledge

![Quantization illustration](https://example.com/quantization.png)

Some community members have suggested adding a test for instruction-following capability; I agree it is necessary and will add it to the list.
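
For the quality side, one possible way to organize tasks across these dimensions is sketched below; the `Task` structure, `exact_match` scorer, and `generate()` hook are hypothetical and only illustrate the shape of the harness, not the final design:

```python
# Sketch of a per-dimension task list and scoring loop.
# Everything here (task wording, scorers, the generate() hook) is a placeholder.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    dimension: str                   # e.g. "programming", "math", "translation"
    prompt: str
    score: Callable[[str], float]    # maps model output to a 0.0-1.0 score

def exact_match(expected: str) -> Callable[[str], float]:
    """Score 1.0 if the expected answer appears in the output, else 0.0."""
    return lambda output: 1.0 if expected.strip() in output else 0.0

TASKS = [
    Task("math", "What is 17 * 24?", exact_match("408")),
    Task("world_knowledge", "Which planet in the solar system is largest?", exact_match("Jupiter")),
    # programming, translation, and instruction-following tasks would follow
]

def evaluate(generate: Callable[[str], str]) -> dict:
    """Average score per dimension for one quantized model's generate() function."""
    per_dim: dict[str, list[float]] = {}
    for task in TASKS:
        per_dim.setdefault(task.dimension, []).append(task.score(generate(task.prompt)))
    return {dim: sum(scores) / len(scores) for dim, scores in per_dim.items()}
```

Open-ended dimensions such as translation and instruction following would need more than exact-match scoring (reference metrics or model-graded evaluation), which is exactly the kind of thing I would like feedback on.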

Before I start, someone recommended the [The Great Quant Wars of 2025](https://www.reddit.com/r/LocalLLaMA/comments/1khwxal/the_great_quant_wars_of_2025/) post, which contains very valuable discussion for reference.

What other aspects do you think should be tested? What metrics would best demonstrate the differences between various quantized versions? I welcome your suggestions.

(First time posting, please be gentle)

Posted: 2025-10-22 16:04