Wink - AI-native innovation, loyal to users, a personalized intelligent experience

Many people working on LLM training or reinforcement learning find it hard to assemble a complete picture of the pipeline from text alone when faced with abbreviations like SFT, RLHF, DPO, and GRPO. Their knowledge stays fragmented, like loose building blocks that never come together into a coherent system.

A GitHub project called LLM-RL-Visualized tackles this problem head-on. With 4.3k stars so far, it is maintained by Yu Changye, author of *Large Language Model Algorithms*, and is available in both Chinese and English under the MIT license. Its core is a collection of more than 100 original hand-drawn algorithm diagrams that lay out the full logic of large language models and reinforcement learning, from pre-training to alignment.

Core features of the project:

- End-to-end diagrams: the hand-drawn diagrams follow the logical progression of the material, from basic LLM architecture and complete training workflows to core reinforcement learning algorithms. Points that are hard to untangle in prose become clear at a glance;

- Coverage of mainstream training and alignment methods: widely used methods including SFT, DPO, RLHF, and GRPO each get dedicated diagrams, so there is no need to piece them together from scattered technical blogs;

- SVG vector format: every diagram is available as an SVG, so it can be zoomed to any size without blurring, and the text in it can be selected directly for note-taking or quoting;

- Dedicated reinforcement learning content: more than 50 detailed diagrams focus on reinforcement learning, and extended topics such as inference optimization, MCTS, knowledge distillation, and Constitutional AI are also covered.
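To give a concrete taste of two of the methods listed above, here is a minimal Python sketch of the quantities the GRPO and DPO material revolves around. It follows the commonly published formulations and is not code from the repository: GRPO's group-relative advantage normalizes each sampled response's reward by the mean and standard deviation of its group, and the DPO loss compares how much the policy (relative to a frozen reference model) favors the chosen answer over the rejected one. The function names, the `beta=0.1` default, and the use of the population standard deviation are illustrative assumptions; real trainers operate on batched tensors.

```python
import math
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantage (GRPO-style): normalize each response's
    reward by the mean and std of the group sampled for the same prompt.
    Uses the population std here; some implementations use the sample std.
    If all rewards are equal, every advantage is 0."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, from summed sequence log-probs.

    margin = beta * [(log pi(y_w) - log ref(y_w)) - (log pi(y_l) - log ref(y_l))]
    loss   = -log(sigmoid(margin))
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) == log(1 + exp(-m)) == softplus(-m)
    return softplus(-margin)


# Four sampled answers to one prompt, scored 1 (correct) or 0 (wrong).
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))
# The policy favors the chosen answer more than the reference does,
# so the loss falls below log(2), its value when the margin is zero.
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The advantage needs no learned value model, since the group itself serves as the baseline; the DPO loss needs no reward model, since the preference signal enters directly through the log-probability ratios.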

Getting started is simple: clone the repository to your local machine and read the diagrams alongside the documentation for the best experience. Each diagram is provided in both PNG and SVG formats: PNG for direct preview, SVG for further editing when preparing courseware or technical presentations. The project's GitHub URL: https://github.com/changyeyu/LLM-RL-Visualized
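Because the SVG versions keep their labels as selectable text, a local clone (for example via `git clone https://github.com/changyeyu/LLM-RL-Visualized.git`) can also be searched by keyword. The helper below is hypothetical and not part of the repository: it assumes only that the diagrams sit somewhere under the cloned folder as `.svg` files, and it does a plain case-insensitive substring match, so a phrase whose words are split across separate SVG text elements may be missed.

```python
from pathlib import Path


def find_svgs_mentioning(repo_root, term):
    """Return SVG files under repo_root whose text contains `term`
    (case-insensitive). Hypothetical helper: it assumes nothing about the
    repository's folder layout beyond diagrams being stored as *.svg files."""
    root = Path(repo_root)
    if not root.is_dir():
        raise FileNotFoundError(f"not a directory: {root}")
    needle = term.lower()
    matches = []
    for path in sorted(root.rglob("*.svg")):
        if not path.is_file():
            continue
        # SVG is XML text; skip undecodable bytes instead of failing.
        text = path.read_text(encoding="utf-8", errors="ignore")
        if needle in text.lower():
            matches.append(path)
    return matches
```

For example, `find_svgs_mentioning("LLM-RL-Visualized", "GRPO")` would list the SVG diagrams whose text mentions GRPO, assuming the clone sits in `./LLM-RL-Visualized`.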

Below are some screenshots of the project:

![This is a screenshot of the introduction page for the "Illustrated LLM Algorithms" project. At the top of the page is the title "Illustrated LLM Algorithms | LLM-RL-Visualized", followed by a list of related technical terms such as LLM, RL, DPO, GRPO, SFT, and RAG. It also mentions Yu Changye, author of the best-selling book *Large Language Model Algorithms*. In the middle of the page is an introduction stating that the repository contains more than 100 architecture diagrams that systematically explain large language models and reinforcement learning, covering the principles of large models such as LLMs and VLMs, training algorithms (RL, RLHF, GRPO, DPO, SFT, CoT distillation, etc.), performance optimization, RAG, and other topics. It also gives more detail about the diagrams, invites readers to follow the project, and thanks contributors. Finally, it explains how to view the high-resolution diagrams and where to find the .svg vector versions in the repository.](https://wink.run/image?url=https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FHI43vZhacAAhmx0%3Fformat%3Djpg%26name%3Dlarge)

![This is a documentation screenshot for machine learning and reinforcement learning technologies. The content is divided into two sections: Section 5: Training-free LLM Optimization Techniques - Comparison between CoT (Chain of Thought) and traditional question answering - CoT, Self-consistency CoT, ToT, GoT - Exhaustive Search - Greedy Search - Beam Search - Multinomial Sampling - Top-K Sampling - Top-P Sampling - RAG (Retrieval-Augmented Generation) - Function Calling Section 6: Fundamentals of Reinforcement Learning (RL) - Development history of Reinforcement Learning (RL) - Three major machine learning paradigms - Basic architecture of reinforcement learning - Reinforcement learning trajectories - Markov Chain vs Markov Decision Process (MDP) - Exploration and Exploitation - Dynamic ε values under ε-greedy policy - Comparison of reinforcement learning training paradigms - Classification of reinforcement learning algorithms - Return (cumulative reward) - Backward iteration to calculate return G - Relationship between Reward, Return and Value - Relationship between value function Qπ and Vπ - Monte Carlo (MC) method for estimating the value of state St - Relationship between TD target and TD error - Relationship between TD(0), multi-step TD and Monte Carlo - Characteristics of Monte Carlo methods and TD methods - Relationship between Monte Carlo, TD, DP and Exhaustive Search - Two DQN (Deep Q-Network) models with different input-output structures - Practical application examples of DQN - The "overestimation" problem of DQN - Value-Based vs Policy-Based - Policy Gradient - Multi-agent reinforcement learning (MARL) - Multi-agent DDPG](https://wink.run/image?url=https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FHI43vkQaEAAnglx%3Fformat%3Djpg%26name%3Dlarge)

![This is a mind map of content related to reinforcement learning. The diagram shows concepts and technologies from basic to advanced, including LLM algorithms, policy optimization algorithms, supervised fine-tuning, direct preference optimization, training-free performance optimization techniques, logical reasoning capability optimization, comprehensive practice and performance optimization, and many other aspects. Each section has detailed branches covering specific algorithms, technologies and application scenarios. Overall, this mind map aims to provide readers with a comprehensive and systematic framework for reinforcement learning knowledge.](https://wink.run/image?url=https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FHI43vw-awAA0DQg%3Fformat%3Djpg%26name%3Dlarge)

![Algorithm schematic diagram related to the project](https://wink.run/image?url=https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FHI43wWTbgAAWr8g%3Fformat%3Djpg%26name%3Dlarge)
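Two of the RL fundamentals listed in the second screenshot, backward iteration to compute the return G and the relationship between the TD target and the TD error, fit in a few lines of code. This Python sketch is our own illustration rather than code from the repository; it assumes `rewards[t]` is the reward received after the action at step t, a discount factor `gamma`, and that the episode ends after the last reward.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute every return G_t = r_t + gamma * G_{t+1} in one backward pass.

    rewards[t] is the reward received after acting at step t; the return
    after the final step is taken to be 0 (the episode ends there)."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g  # reuse G_{t+1} instead of re-summing the tail
        returns[t] = g
    return returns


def td_error(reward, value_s, value_next, gamma=0.99, terminal=False):
    """One-step TD: the TD target is r + gamma * V(s'), and the TD error is
    the target minus the current estimate V(s). No bootstrapping at a
    terminal state, where V(s') is treated as 0."""
    td_target = reward + (0.0 if terminal else gamma * value_next)
    return td_target - value_s


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
print(td_error(1.0, value_s=2.0, value_next=3.0, gamma=0.5))  # 0.5
```

The backward pass computes all returns in time linear in the episode length. Monte Carlo methods regress V(s_t) toward the full return G_t, while TD(0) regresses it toward the bootstrapped target r + gamma * V(s'), which is available after a single step.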

This project is particularly suited to three groups. First, practitioners researching LLM training, reinforcement learning theory, or model alignment, who can use it to organize their knowledge quickly. Second, AI beginners and students, who no longer have to feel overwhelmed by screens full of formulas and abbreviations. Third, anyone preparing technical presentations or courseware, who can reuse the vector graphics directly and save a great deal of time.

Wink Pings

Open-source Project with 100+ Hand-drawn Original Algorithm Diagrams: Visualizing the Obscure Abbreviations of LLM Training and Reinforcement Learning