Wink - AI-native innovation, loyal to users, a personalized intelligent experience

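SupraLabs has just released Supra-50M, a new small model and the first result of the team's "scale-up plan". A Base version and an instruction-tuned Instruct version were released together, and both are open-sourced on HuggingFace.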

![SupraLabs brand logo](https://wink.run/image?url=https%3A%2F%2Fi.redd.it%2Fkx39ammxno2h1.jpg%3Fwidth%3D1080%26format%3Dpjpg%26auto%3Dwebp%26s%3Dd1a2d5b27920a5b61a50547a6e70a6378445cae4)

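Supra-50M uses the same decoder-only Transformer architecture as Llama and was trained entirely from scratch on 20 billion tokens of high-quality educational web text. With only 50M parameters, it outperforms considerably larger models of the same kind on several public benchmarks.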

We compiled the official benchmark numbers into a table for a direct comparison:

| Benchmark | | | | |
| -------- | --------- | ----------------------- | ------------------------- | -------------------------- |
| BLiMP (linguistics) | **76.3%** | 63.0% | 69.8% | N/A |
| SciQ (science knowledge) | 77.2% | 53.2% | 73.4% | **84.70%** |
| ARC-Easy (general knowledge) | **52.2%** | 42.0% | 49.2% | 45.08% |
| PIQA (logical reasoning) | 62.2% | 63.0% | 67.3% | **69.75%** |
| HellaSwag (contextual understanding) | 31.8% | 29.5% | 42.0% | **46.71%** |

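As the table shows, apart from the ceiling imposed by its parameter count, this 50M-parameter model already beats an older model with more than twice as many parameters on knowledge and linguistic ability.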

The core architecture and hyperparameters are as follows:

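| Hyperparameter | Value |
| ------ | ---- |
| Architecture | Llama decoder-only Transformer |
| Total parameters | ~50M |
| Vocabulary size | 32000 |
| Hidden size | 512 |
| Intermediate size | 1408 |
| Number of hidden layers | 12 |
| Number of attention heads | 8 |
| Number of KV attention heads | 4 (GQA, grouped-query attention) |
| Max position embeddings | 1024 |
| RoPE theta | 10000 |
| Tied word embeddings | Yes |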
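As a sanity check on the "~50M" figure, the parameter count can be estimated from the table above. This is a rough sketch that assumes the standard Llama layout (bias-free projections, a gated MLP with three weight matrices, two RMSNorm weights per layer, one final norm) and counts the tied embedding once. The function name is illustrative, and the result is an estimate rather than an official number.

```python
def estimate_llama_params(vocab_size=32000, hidden=512, intermediate=1408,
                          layers=12, heads=8, kv_heads=4, tie_embeddings=True):
    """Rough parameter count for a Llama-style decoder (assumes no bias terms)."""
    head_dim = hidden // heads                # 512 / 8 = 64
    kv_dim = kv_heads * head_dim              # GQA: K and V project to 4 * 64 = 256
    embed = vocab_size * hidden               # token embedding matrix
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # Q and O, plus K and V
    mlp = 3 * hidden * intermediate           # gate, up and down projections
    norms = 2 * hidden                        # two RMSNorm weight vectors per layer
    total = embed + layers * (attn + mlp + norms) + hidden  # + final norm
    if not tie_embeddings:
        total += vocab_size * hidden          # separate output head
    return total


total = estimate_llama_params()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the estimate comes to 51,786,240 parameters, which is consistent with the "~50M" in the table. Roughly a third of that is the 32000 x 512 embedding matrix, which is why tying the embeddings matters at this scale.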

The training details are also fully disclosed:

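- Training data: the sample-100BT subset of HuggingFaceFW/fineweb-edu, 20 billion tokens in total

- Tokenizer: byte-level BPE trained from scratch, with a 32000-token vocabulary that includes 5 special tokens

- Training setup: trained on a single GPU for 1 epoch, final loss 3.259, bfloat16 precision, with torch.compile enabled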
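Two derived numbers help put these figures in context: the ratio of training tokens to parameters, and the perplexity implied by the final loss. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats (the usual PyTorch convention, not something the source states) and uses the rounded 50M parameter count.

```python
import math

TRAIN_TOKENS = 20e9   # 20 billion training tokens, from the list above
PARAMS = 50e6         # rounded "~50M" parameter count
FINAL_LOSS = 3.259    # final training loss; assumed to be nats per token

tokens_per_param = TRAIN_TOKENS / PARAMS
perplexity = math.exp(FINAL_LOSS)   # perplexity = exp(cross-entropy in nats)

print(f"tokens per parameter: {tokens_per_param:.0f}")
print(f"implied training perplexity: {perplexity:.1f}")
```

That works out to about 400 tokens per parameter and a training perplexity of roughly 26, if the loss assumption holds.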

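According to the team's own tests, the model runs very fast even on an old, low-power processor such as the Ryzen 3 3200G, so it can run locally without a high-end GPU. The original model weights are already on HuggingFace: [Supra-50M-Base](https://huggingface.co/SupraLabs/Supra-50M-Base) | [Supra-50M-Instruct](https://huggingface.co/SupraLabs/Supra-50M-Instruct)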
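A back-of-the-envelope memory estimate shows why a CPU-only machine is enough. The sketch below uses the rounded 50M parameter count and the layer and head figures from the hyperparameter table. The head dimension of 64 is derived as 512 / 8 rather than stated in the source, and activations and runtime overhead are ignored, so treat the numbers as lower bounds.

```python
BYTES_PER_DTYPE = {"float32": 4, "bfloat16": 2}

N_PARAMS = 50_000_000  # rounded "~50M" from the article


def weight_bytes(n_params, dtype):
    """Memory needed to hold the weights alone."""
    return n_params * BYTES_PER_DTYPE[dtype]


def kv_cache_bytes(layers=12, kv_heads=4, head_dim=64, seq_len=1024,
                   dtype="float32", batch=1):
    """KV cache size: one K and one V tensor per layer, per KV head."""
    elements = 2 * batch * layers * kv_heads * head_dim * seq_len
    return elements * BYTES_PER_DTYPE[dtype]


for dtype in BYTES_PER_DTYPE:
    w = weight_bytes(N_PARAMS, dtype) / 1e6
    kv = kv_cache_bytes(dtype=dtype) / 2**20
    print(f"{dtype}: weights ~{w:.0f} MB, KV cache at 1024 tokens ~{kv:.0f} MiB")
```

In float32, the CPU fallback used by the inference code in this article, this is about 200 MB of weights plus 24 MiB of KV cache at the full 1024-token context. With 8 KV heads instead of 4 the cache would be twice as large, which is the saving GQA provides.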

The author also answered several questions from the community in the thread:

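On use cases, the author says Supra-50M is aimed at lightweight, fast inference, small-model scaling research, and low-resource, low-latency deployment. It can handle simple instruction following, lightweight classification, and named entity recognition, and it can be used for tagging in simple text preprocessing pipelines. It is not suited to production tasks that need complex reasoning, and it cannot match large models on factual accuracy.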
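For the tagging use case, one plausible approach is to wrap each text in the instruction-plus-input prompt template used by the official Instruct inference code in this article and then map the free-form reply back to a fixed label set. The sketch below is an assumption about how that could look, not the author's method: the helper names, the instruction wording, and the label set are hypothetical, and a stub stands in for the real model call.

```python
# Prompt template copied from the Instruct inference code in this article.
ALPACA_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input_text}\n\n### Response:\n"
)


def build_classification_prompt(text, labels):
    """Hypothetical helper: ask the model to choose one label for `text`."""
    instruction = (
        "Classify the input text into exactly one of these labels: "
        + ", ".join(labels)
        + ". Answer with the label only."
    )
    return ALPACA_WITH_INPUT.format(instruction=instruction, input_text=text)


def parse_label(generated, labels, default=None):
    """Return the label mentioned earliest in the reply, or `default`."""
    lowered = generated.lower()
    hits = [(lowered.find(label.lower()), label)
            for label in labels if label.lower() in lowered]
    return min(hits)[1] if hits else default


labels = ["positive", "negative"]
prompt = build_classification_prompt("The battery died after two days.", labels)
fake_reply = " Negative.\n"  # stub; a real call would pass `prompt` to the model
print(parse_label(fake_reply, labels))
```

Falling back to a default when no label is found matters with a model this small, since it will not always follow the "label only" instruction.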

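On the GGUF quantization format: because the current version uses a custom tokenizer, converting it to GGUF is awkward. The upcoming 124M and 350M versions will switch to a standard tokenizer and be GGUF-compatible, making it easier for more users to run them locally on low-end hardware.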

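On training optimization: this run used AdamW. One user recommended trying the Muon optimizer, which they said DeepSeek uses, and the author said they would look into it and test it later.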
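For readers unfamiliar with AdamW, a single update step on one scalar parameter looks roughly like the sketch below. The source does not give the training hyperparameters, so the learning rate, betas, epsilon, and weight decay here are placeholder assumptions (they match PyTorch's `torch.optim.AdamW` defaults), and the function is illustrative rather than the team's training code.

```python
import math


def adamw_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a scalar parameter; t is the 1-based step count."""
    # Decoupled weight decay: shrink the parameter directly instead of
    # adding an L2 term to the gradient.
    param = param * (1.0 - lr * weight_decay)
    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction for the zero-initialised averages.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


p, m, v = adamw_step(param=1.0, grad=1.0, m=0.0, v=0.0, t=1,
                     lr=0.1, weight_decay=0.0)
print(p)
```

On the first step the bias-corrected ratio is close to 1 in magnitude, so the parameter moves by approximately the learning rate regardless of the gradient's scale.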

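SupraLabs plans two larger versions next: the 124M model will come in base, chat, and experimental reasoning variants, and the 350M model will cover base, chat, reasoning, and code. The current 50M version has already been included in HuggingFace's "base text generation models under 360M parameters" collection, so anyone interested can pull it down and test it.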

Below is the official inference code, which can be run directly:

**Inference code for the Instruct version**

```python
import os, warnings

# Silence TensorFlow and transformers warnings before the heavy imports.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
warnings.filterwarnings("ignore", category=UserWarning, module="transformers")

import torch
from transformers import pipeline, AutoTokenizer, logging

logging.set_verbosity_error()

MODEL_ID = "SupraLabs/Supra-50M-Instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, clean_up_tokenization_spaces=False)

pipe = pipeline(
    "text-generation",
    model=MODEL_ID,
    tokenizer=tokenizer,
    device_map="auto",
    # bfloat16 on a CUDA GPU, float32 on CPU
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32
)


def build_prompt(instruction, input_text=""):
    """Build the instruction prompt, with an Input section if context is given."""
    if input_text.strip():
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )


def generate(instruction, input_text=""):
    """Sample a response and return only the newly generated text."""
    result = pipe(
        build_prompt(instruction, input_text),
        max_new_tokens=512, do_sample=True, temperature=0.7,
        top_k=50, top_p=0.9, repetition_penalty=1.15,
        pad_token_id=pipe.tokenizer.pad_token_id,
        eos_token_id=pipe.tokenizer.eos_token_id,
        return_full_text=False
    )
    return result[0]['generated_text'].strip()


# Simple interactive loop; type 'exit' to quit.
while True:
    print("\nEnter an instruction (or 'exit' to quit):")
    user_input = input().strip()
    if user_input.lower() == "exit":
        break
    print("\nEnter additional context (optional, press Enter to skip):")
    context_input = input().strip()
    print(f"\nResponse:\n{generate(user_input, context_input)}\n")
```

**Inference code for the Base version**

```python
from transformers import pipeline
import torch

pipe = pipeline(
    "text-generation",
    # Repo ID as given in the source snippet; the links earlier in the
    # article point to SupraLabs/Supra-50M-Base.
    model="SupraLabs/Supra-50M_BASE",
    device_map="auto",
    # float16 on a CUDA GPU, float32 on CPU
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32
)


def generate_text(prompt, max_new_tokens=150):
    """Sample a continuation; the returned text includes the prompt."""
    result = pipe(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True, temperature=0.5,
        top_k=25, top_p=0.9, repetition_penalty=1.2,
        pad_token_id=pipe.tokenizer.pad_token_id,
        eos_token_id=pipe.tokenizer.eos_token_id
    )
    return result[0]['generated_text']


prompt = "The importance of education is"
print(f"Prompt: {prompt}\n" + "-" * 40)
print("\nOutput:\n" + generate_text(prompt))
```


Supra-50M: an open-source small model with only 50M parameters that outperforms the 124M-parameter GPT-2