🚀 RL | Tags | Yuki’s Blog

How to train DeepSeek R1?

2025-1-31

What is the Python code to reproduce DeepSeek R1?

DeepSeek R1: Main Takeaways and Insights

2025-1-20

The key points of the DeepSeek R1 research paper.

PPO Explained for Dummies (With Python)

2022-3-9

Proximal Policy Optimization (PPO) is a widely used reinforcement learning algorithm that balances stability and efficiency. This article breaks down how an agent gradually improves its decision-making through trial, error, and carefully limited policy updates, much like learning to ride a bike!

REINFORCE Explained for Dummies (With Python)

2022-3-8

The REINFORCE algorithm is the most basic policy gradient reinforcement learning algorithm. Imagine you’re learning to ride a bicycle without a teacher to guide you on what to do. You can only learn through "try → see the result → adjust → try again." The REINFORCE algorithm is the mathematical expression of this learning process.

MCTS Explained for Dummies (With Python)

2022-3-10

Imagine you're playing a game of chess, with many possible moves at each turn. Monte Carlo Tree Search is like a smart assistant that helps you find the best move by "simulating the future."

Advantage Actor-Critic (A2C) Explained for Dummies (With Python)

2022-3-12

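A2C (Advantage Actor-Critic) is essentially an upgrade of REINFORCE: it adds a learned critic whose value estimate serves as a baseline, which reduces the variance of the policy gradient.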

Reinforcement Learning Workshop Collection

2025-5-21