
Engineer Pau Labarta Bajo recently shared this practical tip for AI practitioners: you don't have to deploy Whisper on a server just to get speech transcription done.

Liquid AI's LFM2-Audio-1.5B model, paired with the llama.cpp inference framework, enables real-time speech transcription on a regular laptop. The entire process works offline, with all audio and transcription results stored only locally and never uploaded to any external servers. The full workflow architecture is shown below:

![Local Transcription Architecture Diagram](https://raw.githubusercontent.com/Liquid4All/cookbook/main/examples/audio-transcription-cli/media/diagram.gif)

### Quick Start Guide

No manual dependency compilation is required for this setup; the official team has already prepared an automated script. Just follow these four steps:

1. Clone the official example repository

```
git clone https://github.com/Liquid4All/cookbook.git
cd cookbook/examples/audio-transcription-cli
```

2. Install the uv package manager (skip this step if you already have it on your system)

**macOS/Linux:**

```
curl -LsSf https://astral.sh/uv/install.sh | sh
```

**Windows:**

```
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

3. Download test audio samples

```
uv run download_audio_samples.py
```

4. Run the transcription command to watch the results appear in your console in real time. Add the `--play-audio` flag to play the audio alongside the transcription.

```
uv run transcribe --audio './audio-samples/barackobamafederalplaza.mp3' --play-audio
```
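To transcribe every downloaded sample rather than a single file, the command from step 4 can be wrapped in a short script. This is a minimal sketch, not part of the official example: the helper name `transcribe_commands` is hypothetical, and it assumes the samples are `.mp3` files in `./audio-samples` and that `uv run transcribe --audio <path>` accepts any of them.

```python
import subprocess
from pathlib import Path
from typing import List


def transcribe_commands(sample_dir: str) -> List[List[str]]:
    """Build one `uv run transcribe` command per .mp3 file, in name order."""
    files = sorted(Path(sample_dir).glob("*.mp3"))
    # Only the --audio flag shown in step 4 is used here.
    return [["uv", "run", "transcribe", "--audio", str(f)] for f in files]


if __name__ == "__main__":
    for cmd in transcribe_commands("./audio-samples"):
        print("Running:", " ".join(cmd))
        # check=True stops the loop if a transcription run fails.
        subprocess.run(cmd, check=True)
```

Run it from the `audio-transcription-cli` directory so that the relative path and the `transcribe` entry point resolve.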

### Supported Platforms

The pre-compiled official packages currently support four platforms:

- android-arm64

- macos-arm64

- ubuntu-arm64

- ubuntu-x64

Users on other platforms will need to wait for official compatibility updates.
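To check in advance whether a machine falls into one of these four targets, something like the following can help. It is a rough sketch under stated assumptions: the function name `platform_tag` and the mapping rules are illustrative, any Linux system is treated as `ubuntu`, and Android is not detected. The CLI's own detection logic may differ.

```python
import platform
from typing import Optional

# Platform tags listed in the article; the mapping below is an assumption.
SUPPORTED = {"android-arm64", "macos-arm64", "ubuntu-arm64", "ubuntu-x64"}

ARCH_ALIASES = {"arm64": "arm64", "aarch64": "arm64", "x86_64": "x64", "amd64": "x64"}
OS_ALIASES = {"darwin": "macos", "linux": "ubuntu"}


def platform_tag(system: str, machine: str) -> Optional[str]:
    """Map OS and CPU names (as reported by `platform`) to a tag, or None."""
    os_name = OS_ALIASES.get(system.lower())
    arch = ARCH_ALIASES.get(machine.lower())
    if os_name is None or arch is None:
        return None
    tag = f"{os_name}-{arch}"
    return tag if tag in SUPPORTED else None


if __name__ == "__main__":
    tag = platform_tag(platform.system(), platform.machine())
    print(tag or "no pre-compiled package listed for this machine")
```

An Intel Mac or a Windows machine yields `None` here, matching the list above.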

### Extended Use Cases

This solution uses llama.cpp under the hood, an open-source, lightweight inference framework written in C++. Because it runs quantized GGUF models without needing a Python runtime, it generally has a much smaller footprint than PyTorch-based stacks such as Hugging Face transformers, which makes it well suited to edge deployment. The CLI automatically downloads the llama.cpp build that matches your platform, so no manual compilation is needed.

Beyond basic automatic speech recognition (ASR), LFM2-Audio-1.5B also supports text-to-speech (TTS), and even allows custom voice styling. The official team has provided command-line examples for three core use cases:

1. Speech Transcription

```
# Automatic Speech Recognition (ASR)
./llama-lfm2-audio \
  -m "$CKPT/LFM2-Audio-1.5B-Q8_0.gguf" \
  --mmproj "$CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf" \
  -mv "$CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf" \
  -sys "Perform ASR." \
  --audio "$INPUT_WAV"
```

2. Basic Text-to-Speech

```
# Text To Speech (TTS)
./llama-lfm2-audio \
  -m "$CKPT/LFM2-Audio-1.5B-Q8_0.gguf" \
  --mmproj "$CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf" \
  -mv "$CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf" \
  -sys "Perform TTS." \
  -p "My name is Pau Labarta Bajo and I love AI" \
  --output "$OUTPUT_WAV"
```

3. Custom-Style Text-to-Speech

```
# Text To Speech (TTS) with a custom voice description
./llama-lfm2-audio \
  -m "$CKPT/LFM2-Audio-1.5B-Q8_0.gguf" \
  --mmproj "$CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf" \
  -mv "$CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf" \
  -sys "Perform TTS.
Use the following voice: A male speaker delivers a very expressive and animated speech, with a low-pitch voice and a slightly close-sounding tone. The recording carries a slight background noise." \
  -p "What is your name man?" \
  --output "$OUTPUT_WAV"
```
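The three commands above share the same model arguments and differ only in the system prompt and the input or output flags. To drive the binary from a script, the argument lists can be assembled as in this sketch. The function names are hypothetical, and it assumes the three GGUF files sit in a single checkpoint directory; only the flags shown in the commands above are used.

```python
from pathlib import Path
from typing import List, Optional

# File names as they appear in the official command-line examples.
MODEL = "LFM2-Audio-1.5B-Q8_0.gguf"
MMPROJ = "mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf"
DECODER = "audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf"


def base_command(binary: str, ckpt_dir: str, system_prompt: str) -> List[str]:
    """Arguments common to all three use cases."""
    ckpt = Path(ckpt_dir)
    return [
        binary,
        "-m", str(ckpt / MODEL),
        "--mmproj", str(ckpt / MMPROJ),
        "-mv", str(ckpt / DECODER),
        "-sys", system_prompt,
    ]


def asr_command(binary: str, ckpt_dir: str, input_wav: str) -> List[str]:
    """Speech transcription: audio in, text out."""
    return base_command(binary, ckpt_dir, "Perform ASR.") + ["--audio", input_wav]


def tts_command(binary: str, ckpt_dir: str, text: str, output_wav: str,
                voice: Optional[str] = None) -> List[str]:
    """Text-to-speech, optionally with a free-text voice description."""
    prompt = "Perform TTS."
    if voice:
        prompt += "\nUse the following voice: " + voice
    return base_command(binary, ckpt_dir, prompt) + ["-p", text, "--output", output_wav]
```

Passing the resulting list to `subprocess.run(cmd, check=True)` avoids shell quoting problems, which matters for the multi-line voice prompt.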

### Quality Optimization

Raw transcription output can contain misplaced sentence breaks and ungrammatical phrasing, because the audio is processed in overlapping chunks. The official team suggests a local fix: pair the audio model with LFM2-350M, a smaller text model from the same series, for post-processing cleanup. This two-step workflow improves transcription quality while remaining completely offline.
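To see why overlapping chunks produce these artifacts, consider the toy sketch below. It is an illustration only, not the cookbook's implementation: the function name `chunk_spans` and the sizes are made up, and words stand in for audio samples. Each chunk shares its boundary with the next one, so content at the seams is transcribed twice.

```python
from typing import List, Tuple


def chunk_spans(total: int, chunk: int, overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs covering `total` items with overlap."""
    if chunk <= 0 or not 0 <= overlap < chunk:
        raise ValueError("need chunk > 0 and 0 <= overlap < chunk")
    step = chunk - overlap
    spans = []
    start = 0
    while start < total:
        end = min(start + chunk, total)
        spans.append((start, end))
        if end == total:
            break
        start += step
    return spans


# Words stand in for audio samples to make the seams visible.
words = "the quick brown fox jumps over the lazy dog today".split()
for start, end in chunk_spans(len(words), chunk=4, overlap=1):
    print(" ".join(words[start:end]))
# the quick brown fox
# fox jumps over the
# the lazy dog today
```

Joining the three lines naively repeats "fox" and "the" at the seams, which is the kind of duplication and broken phrasing a text model can clean up afterwards.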

Full official documentation and the latest updates are available here: [Official Real-Time Audio Transcription Documentation](https://docs.liquid.ai/examples/laptop-examples/audio-to-text-in-real-time)

Wink Pings

Real-Time Speech Transcription That Runs Locally On Your Laptop — No Whisper Server Deployment Required