4 papers accepted to EMNLP, ACM MM, and the ICML 2025 Workshop! More are coming soon!

AI News

Summary of My Papers

This article introduces the full process of building an mRAG (Multimodal Retrieval-Augmented Generation) application and provides detailed explanations of its key principles.

Multimodal RAG - Paper Q&A System Based on Qwen2VL + Evaluation


Google releases its new Gemma 3n models! ✨ Gemma 3n supports audio, vision, video, and text, and needs as little as 2GB of RAM for fast local inference.

Gemma 3n – Multimodal for Edge AI (Released Today)

IntentVC Challenge at ACM MM 2025 - Second place winner

IntentVC Challenge at ACM MM 2025

An explanation of Agentic RAG and its core differences from standard RAG, with a Python demo.

RAG vs Agentic RAG (Trending 2025Q2)

LangGraph, LangChain, AutoGen, AutoGPT, and more…

AI Agent Frameworks Comparison

What Python code do you need to reproduce DeepSeek R1?

How to train DeepSeek R1?

The key points of the DeepSeek R1 research paper

DeepSeek R1: Main Takeaways and Insights

LLaVA 1.5 Reproduction Notes

An end-to-end document-processing pipeline on Azure that automatically ingests, analyzes, and indexes medical documents.

Streamlined Medical Documentation with Azure

Proximal Policy Optimization (PPO) is one of the most widely used reinforcement learning algorithms because it balances training stability with sample efficiency. This article breaks down how an agent gradually improves its decisions through trial and error and carefully limited policy updates, just like learning to ride a bike!

PPO Explained for Dummies (With Python)
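Much of the "stability" mentioned above comes from PPO's clipped surrogate objective, which stops a single update from moving the policy too far from the one that collected the data. A minimal pure-Python sketch of that objective, assuming per-sample log-probabilities under the old and new policies and advantage estimates are already available; `eps=0.2` is a common default, not a value taken from the post, and the function name is hypothetical.

```python
import math

def ppo_clip_objective(new_log_probs, old_log_probs, advantages, eps=0.2):
    """Mean PPO clipped surrogate objective (to be maximized).

    For each sample: min(r * A, clip(r, 1 - eps, 1 + eps) * A),
    where r = pi_new(a|s) / pi_old(a|s) is computed from log-probabilities.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)                # probability ratio r
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # clip(r, 1-eps, 1+eps)
        total += min(ratio * adv, clipped * adv)         # pessimistic (lower) bound
    return total / len(advantages)

# A good action whose probability rose by 50% is only credited for +20%:
print(ppo_clip_objective([math.log(1.5)], [0.0], [1.0]))  # about 1.2
```

Taking the minimum of the clipped and unclipped terms means the objective never rewards pushing the ratio outside `[1 - eps, 1 + eps]`, which is what keeps each policy update small.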

REINFORCE is the most basic policy-gradient reinforcement learning algorithm. Imagine you're learning to ride a bicycle without a teacher to tell you what to do. You can only learn through "try → see the result → adjust → try again." REINFORCE is the mathematical expression of this learning process.

REINFORCE Explained for Dummies (With Python)
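The "try → see the result → adjust" loop maps directly onto the REINFORCE update θ ← θ + α · G · ∇ log π(a). A minimal sketch on a one-step, two-armed bandit with a softmax policy and no baseline; the bandit, its reward values, the Gaussian reward noise, and the learning rate are illustrative assumptions, not details from the post.

```python
import math
import random

def softmax(prefs):
    """Convert action preferences (policy parameters) into probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_bandit(true_rewards, episodes=2000, lr=0.1, noise=0.1, seed=0):
    """REINFORCE on a one-step bandit with a softmax policy (no baseline)."""
    rng = random.Random(seed)
    prefs = [0.0] * len(true_rewards)  # theta: one preference per action
    for _ in range(episodes):
        probs = softmax(prefs)
        # try: sample an action from the current policy
        a = rng.choices(range(len(prefs)), weights=probs)[0]
        # see the result: a one-step episode, so the return G is just the reward
        g = true_rewards[a] + rng.gauss(0.0, noise)
        # adjust: theta += lr * G * grad log pi(a); for softmax, d/d theta_k = 1[k == a] - pi_k
        for k in range(len(prefs)):
            grad_log_pi = (1.0 if k == a else 0.0) - probs[k]
            prefs[k] += lr * g * grad_log_pi
    return softmax(prefs)

print(reinforce_bandit([0.2, 1.0]))  # the second (better) arm should dominate
```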

Imagine you're playing chess and have many possible moves at every turn. Monte Carlo Tree Search (MCTS) is like a smart assistant that helps you find the best move by "simulating the future."

MCTS Explained for Dummies (With Python)
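"Simulating the future" is usually organized into four steps: selection, expansion, simulation, and backpropagation. A minimal sketch on a toy Nim game instead of chess (remove 1 to 3 stones; whoever takes the last stone wins), assuming UCB1 selection with `c=1.4` and purely random rollouts; the game, class, and function names are illustrative and not taken from the post.

```python
import math
import random

MOVES = (1, 2, 3)  # toy Nim: remove 1-3 stones; whoever takes the last stone wins

class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones          # stones left for the player to move here
        self.parent = parent
        self.move = move              # move that led from parent to this node
        self.children = []
        self.untried = [m for m in MOVES if m <= stones]
        self.visits = 0
        self.wins = 0.0               # wins for the player who made `move`

    def ucb1(self, c=1.4):
        # exploitation (average win rate) + exploration bonus for rarely visited nodes
        return (self.wins / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def rollout(stones, rng):
    """Random playout; True if the player to move at `stones` ends up winning."""
    moves_made = 0
    while stones > 0:
        stones -= rng.choice([m for m in MOVES if m <= stones])
        moves_made += 1
    return moves_made % 2 == 1  # odd count: the starting player took the last stone

def mcts(stones, iterations=3000, seed=0):
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: follow UCB1 while the node is fully expanded
        while not node.untried and node.children:
            node = max(node.children, key=Node.ucb1)
        # 2. Expansion: add one unexplored child
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: play random moves to the end of the game
        mover_won = not rollout(node.stones, rng)
        # 4. Backpropagation: credit alternates between the two players
        while node is not None:
            node.visits += 1
            node.wins += 1.0 if mover_won else 0.0
            mover_won = not mover_won
            node = node.parent
    # Recommend the most-visited move at the root
    return max(root.children, key=lambda n: n.visits).move

print(mcts(5))  # taking 1 leaves 4 stones, a losing position for the opponent
```

Returning the most-visited move rather than the highest win rate is a common, more robust choice, since visit counts reflect how consistently UCB1 preferred that branch.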

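A2C (Advantage Actor-Critic) is essentially an upgrade of REINFORCE: instead of weighting each action by the full, noisy return, it uses an advantage estimated by a learned critic.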

Advantage Actor-Critic (A2C) Explained for Dummies (With Python)
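The difference from REINFORCE shows up in what multiplies log π(a|s): REINFORCE uses the return G, while A2C uses an advantage built from the critic's value estimates. A minimal sketch of that step, assuming V(s_t) and V(s_{t+1}) have already been computed by a critic and using the one-step TD advantage (n-step returns and GAE are common alternatives); the function names are hypothetical.

```python
def a2c_advantages(rewards, values, next_values, dones, gamma=0.99):
    """One-step TD advantages: A_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)."""
    advantages = []
    for r, v, next_v, done in zip(rewards, values, next_values, dones):
        bootstrap = 0.0 if done else gamma * next_v  # no future value after a terminal step
        advantages.append(r + bootstrap - v)
    return advantages

def a2c_losses(log_probs, advantages):
    """Actor and critic losses for one batch (both to be minimized).

    In an autodiff framework the advantage is treated as a constant (detached)
    in the actor loss, while the critic loss trains V(s) toward the TD target.
    """
    n = len(advantages)
    actor_loss = -sum(lp * a for lp, a in zip(log_probs, advantages)) / n
    critic_loss = sum(a * a for a in advantages) / n
    return actor_loss, critic_loss

advs = a2c_advantages([1.0, 0.0], [0.5, 0.2], [0.2, 0.0], [False, True], gamma=0.9)
print(advs)                            # roughly [0.68, -0.2]
print(a2c_losses([-0.5, -1.0], advs))  # roughly (0.07, 0.2512)
```

Because the critic's baseline is subtracted, actions that merely did as well as expected get an advantage near zero, which is what lowers the variance compared with plain REINFORCE.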

Getting Started with Roboflow: Annotate Your Dataset and Train Models All in One Place. A Hands-On Tutorial for Building a Coin-Detection App.
Computer Vision Workshop for ADSP 32023 IP01: Advanced Computer Vision with Deep Learning

Roboflow: Build A Coins Detection App

Several ways to extract high-quality embeddings.

Deep Dive into LLM Training for Embedding Extraction

Fine-tuning LLMs for Prediction/Classification

My personal archive of papers, notes, concepts, and insights.

My Paper Library & Concept Notes

Computer Vision Workshop Collection

Reinforcement Learning Workshop Collection

RAG Workshop Collection

Recommendation System Workshop Collection

Recommendation System Collection

Useful App or Code Library Collection

Blockchain Collection

LLM Fine-tune Collection

Reinforcement Learning


AI Agents

Multimodal


🚀 RL

RLHF = Reinforcement Learning = Alignment tuning?

How they relate—and why they’re not identical

What is the ‘Aha Moment’ phenomenon in R1-Zero’s training?

What are the four phases of the DeepSeek R1 training process?

What is Group Relative Policy Optimization (GRPO)?

How does GRPO improve upon PPO for language model training?
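The main change GRPO makes relative to PPO is how the advantage is estimated: instead of training a separate critic, it samples a group of responses for the same prompt and scores each response against its siblings. A minimal sketch of that normalization step, assuming one scalar reward per response; it uses the population standard deviation plus a small `eps`, whereas real implementations differ in these details, and it omits the clipped policy objective and KL penalty that GRPO also uses. The function name is hypothetical.

```python
def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantages for G responses sampled from the same prompt.

    A_i = (r_i - mean(r)) / (std(r) + eps): each response is scored against its
    siblings, so no learned value function (critic) is needed.
    """
    n = len(group_rewards)
    mean = sum(group_rewards) / n
    std = (sum((r - mean) ** 2 for r in group_rewards) / n) ** 0.5  # population std
    return [(r - mean) / (std + eps) for r in group_rewards]

# Four sampled answers to one prompt, rewarded 1 if correct and 0 otherwise:
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # close to [1.0, -1.0, -1.0, 1.0]
print(grpo_advantages([1.0, 1.0, 1.0, 1.0]))  # all zeros: no learning signal
```

Dropping the critic removes a second large model from training, which is a large part of why this approach is attractive for LLM fine-tuning.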