Senior Director & Head of AMAP-ML, Alibaba Group
I lead AMAP-ML at Alibaba AMAP, a 100+ member product-facing AI team building foundation systems for spatial intelligence, generative intelligence, reasoning agents, and world models. My work connects academic research, open-source systems, and large-scale AMAP products serving 300M+ users every day.
14,000+ Total Citations
6,000+ As First Author
110+ Publications
300M+ Daily Users
Updated May 2026 · Citation metrics from Google Scholar.
Recent Updates
- 2026.05.18: MobilityBench accepted to KDD 2026, benchmarking route-planning agents in real-world mobility scenarios.
- 2026.05.11: DreamX-World released the 5B-Cam model and inference code for interactive world simulation.
- 2026.04.10: SkillClaw released an agentic evolver that turns real interaction traces into reusable skill libraries.
Featured Projects
Current Focus
LLM Reasoning (GPG): A simple and strong reinforcement learning baseline for model reasoning, with no critic, no reference model, and no KL penalty. Adopted by ByteDance's VERL framework as an official algorithm.
World Model (DreamX-World): A general-purpose interactive world model that creates diverse, high-fidelity virtual environments with camera-controlled navigation and prompt-driven world events.
Generation (USP): Unified self-supervised pretraining that bridges image generation and understanding within a single framework.
Agent (SkillClaw): Agentic skill evolution from real interaction traces, turning reusable skills into collective libraries across sessions, devices, and agents.
Earlier Impact
Detection (YOLOv6): Industrial-grade real-time object detection framework. Widely deployed in production across the industry, with a full training-to-deployment toolchain.
Vision-Language (MobileVLM): The first vision-language assistant that runs in real time on mobile devices (Snapdragon 888). The 1B/3B models benchmark against Gemini Nano.
Architecture (Twins): Revisiting spatial attention in Vision Transformers. Outperforms Swin Transformer with a simpler design and better deployment properties.
Foundation (VisionLLaMA): A unified LLaMA-style backbone for vision tasks. Pioneered auto-scaling 2D RoPE for multimodal Transformers; the approach was later adopted by Qwen-VL and others. Surpasses ViT across generation, classification, segmentation, and detection.
Research Journey
2024 – Present · Alibaba AMAP
Spatial Intelligence, Generative Intelligence, Reasoning Agents & World Models
Leading a 100+ member product-facing AI team across two AMAP product anchors: spatial intelligence and generative intelligence. The technical stack spans LLM reasoning (GPG, adopted by ByteDance's VERL; Tree-GRPO; CoEvolve), world models (DreamX-World, Code2World), AI agents (SkillClaw), generative AI (DCW, S2-Guidance, FluxText), multimodal understanding, and intelligent mobility (MobilityBench, GenMRP). Research ships directly to production: multimodal technology powers the Saojie Bang (扫街榜) pipeline, and a large-scale industrial agent system drives AMAP's AI Companion (AI 伴行), both contributing to AMAP products serving 300M+ users every day. Published 45+ papers at top venues; open-sourced 30+ AMAP-ML projects.
2020 – 2024 · Meituan
Vision Transformers, Multimodal Models & Industrial AI
Built the Visual Intelligence team from scratch. Created Twins (NeurIPS'21), CPVT (ICLR'23), VisionLLaMA (ECCV'24); reproduced LLaMA 7B and built MobileVLM for on-device deployment; open-sourced YOLOv6; shipped autonomous delivery and drone perception systems.
2017 – 2020 · Xiaomi
Neural Architecture Search & AutoML
Founded Xiaomi's AutoML team. Produced a series of influential NAS works — FairNAS (ICCV'21), FairDARTS (ECCV'20), DARTS- (ICLR'21), FALSR — establishing new standards for fair and robust architecture search. Featured by Lei Jun and major AI media.
2013 – 2017 · KingStar
Power Grid AI & Reinforcement Learning
Core contributor to the "Complex Power Grid Autonomous-Collaborative Automatic Voltage Control" project. Contributed 20 invention patents. Awarded the National Science and Technology Progress First Prize (2018).
2012 – 2013 · IBM Research China
Large-Scale Data Analytics
Research scientist working on large-scale data analytics and machine learning solutions.
Recognition
- Top 100 AI Scholars, AMiner 2023 — selected from hundreds of thousands of AI researchers worldwide
- National Science and Technology Progress First Prize, 2018 — contributed 20 invention patents
- 3 first-authored papers on PaperDigest's Most Influential Paper List: FairNAS, Twins, CPVT
- Area Chair: ICLR, NeurIPS | Senior Program Committee: AAAI, IJCAI
- 40+ domestic and 7 international invention patents
Core Technical Directions
Spatial Intelligence: Route-planning agents (MobilityBench), map-augmented geolocalization (Thinking-with-Map), autonomous-driving VLA reasoning (AutoDrive-R2), urban scene understanding, and industrial mobility systems
Generative Intelligence: Scene-text editing (FluxText), diffusion-model optimization (DCW, S2-Guidance), video virtual try-on (Eevee), 3D editing (RL3DEdit), and controllable visual effects (Omni-Effects)
Reasoning Agents: Reinforcement learning for LLM reasoning (GPG, MathForge), tree-search agent training (Tree-GRPO), agent-data co-evolution (CoEvolve), and collective skill evolution (SkillClaw)
World Models & Interactive AI: Interactive world simulation (DreamX-World), GUI world models (Code2World), and benchmarks for dynamic 4D response capabilities (Omni-WorldBench)
Multimodal Understanding: Vision-language reasoning, visual policy optimization, spatial intelligence evaluation (SpatialGenEval), and multimodal in-context learning (STV)
Foundation Architectures: Frequency-aware sparse attention (FASA), unified pretraining for generation and understanding (USP), end-to-end pixel generation without VAE (EPG), and diffusion LLMs (AR-MAP)
Team & Opportunities
I lead the AMAP-ML team at Alibaba Group, a 100+ member product-facing AI team with over half recruited from top AI labs globally, including multiple Google PhD Fellowship recipients.
Our philosophy: We build systems where research quality, engineering discipline, open-source reproducibility, and product deployment reinforce each other. Every core paper ships with reproducible code, and our work directly powers AMAP products serving 300M+ users every day.
Open Source
We maintain 30+ projects on GitHub spanning spatial intelligence, generative intelligence, reasoning agents, world models, and multimodal AI.
Hiring
We are always looking for talented interns, full-time researchers, and AI engineers in LLM agents, reinforcement learning, world models, multimodal learning, spatial intelligence, and generative AI. Drop me an email if interested.
Education
- M.S. in Electrical Engineering, Tsinghua University, 2012
- B.S. in Electrical Engineering, Southeast University, 2010