Senior Director & Head of AMAP-ML, Alibaba Group
I lead AMAP-ML at Alibaba AMAP, a 100+ member product-facing AI team building foundation systems for spatial intelligence, generative intelligence, reasoning agents, and world models. My work connects academic research, open-source systems, and large-scale AMAP products serving 300M+ users every day.
14,000+ Total Citations
6,000+ As First Author
120+ Publications
300M+ Daily Users
Updated May 2026 · Citation metrics from Google Scholar.
Recent Updates
- 2026.05.18 MobilityBench accepted to KDD 2026, benchmarking route-planning agents in real-world mobility scenarios.
- 2026.05.11 DreamX-World released the 5B-Cam model and inference code for interactive world simulation.
- 2026.04.10 SkillClaw released an agentic evolver that turns real interaction traces into reusable skill libraries.
Featured Projects
Current Focus
LLM Reasoning (GPG): A simple and strong reinforcement learning baseline for model reasoning, with no critic, no reference model, and no KL penalty. Adopted by ByteDance's VERL framework as an official algorithm.
World Model (DreamX-World): A general-purpose interactive world model that creates diverse, high-fidelity virtual environments with camera-controlled navigation and prompt-driven world events.
Generation (USP): Unified self-supervised pretraining that bridges image generation and understanding within a single framework.
Agent (SkillClaw): Agentic skill evolution that turns real interaction traces into reusable skills and pools them into collective libraries shared across sessions, devices, and agents.
Earlier Impact
Detection (YOLOv6): Industrial-grade real-time object detection framework with a full training-to-deployment toolchain and broad open-source adoption.
Vision-Language (MobileVLM): A compact vision-language assistant designed for real-time on-device deployment, with 1B/3B models evaluated on mobile hardware such as the Snapdragon 888.
Architecture (Twins): Revisits the design of spatial attention in Vision Transformers, outperforming Swin Transformer with a simpler design and better deployment properties.
Foundation (VisionLLaMA): A unified LLaMA-style backbone for vision tasks that introduces auto-scaling 2D RoPE for multimodal Transformers and reports strong results across generation, classification, segmentation, and detection.
Research Journey
2024 – Present · Alibaba AMAP
Spatial Intelligence, Generative Intelligence, Reasoning Agents & World Models
Leading a 100+ member product-facing AI team across two AMAP product anchors: spatial intelligence and generative intelligence. The technical stack spans LLM reasoning (GPG, adopted by ByteDance's VERL; Tree-GRPO; CoEvolve), world models (DreamX-World, Code2World), AI agents (SkillClaw), generative AI (DCW, S2-Guidance, FluxText), multimodal understanding, and intelligent mobility (MobilityBench, GenMRP). Several research lines feed into production systems: multimodal technology supports the Saojie Bang (扫街榜) pipeline, and large-scale industrial agent work contributes to AMAP's AI Companion (AI 伴行), both part of AMAP products serving 300M+ users every day. Published 45+ papers at top venues and open-sourced 30+ AMAP-ML projects.
2020 – 2024 · Meituan
Vision Transformers, Multimodal Models & Industrial AI
Built the Visual Intelligence team from scratch. Created Twins (NeurIPS 2021), CPVT (ICLR 2023), VisionLLaMA (ECCV 2024); reproduced LLaMA 7B and built MobileVLM for on-device deployment; open-sourced YOLOv6; shipped autonomous delivery and drone perception systems.
2017 – 2020 · Xiaomi
Neural Architecture Search & AutoML
Founded Xiaomi's AutoML team. Produced a series of influential NAS works, including FairNAS (ICCV 2021), FairDARTS (ECCV 2020), DARTS- (ICLR 2021), and FALSR, that advanced fair and robust architecture search. Featured by Lei Jun and major AI media.
2013 – 2017 · KingStar
Power Grid AI & Reinforcement Learning
Core contributor to the "Complex Power Grid Autonomous-Collaborative Automatic Voltage Control" project. Contributed 20 invention patents. Awarded the National Science and Technology Progress First Prize (2018).
2012 – 2013 · IBM Research China
Large-Scale Data Analytics
Research scientist working on large-scale data analytics and machine learning solutions.
Recognition
- Top 100 AI Scholars, AMiner 2023 — selected from hundreds of thousands of AI researchers worldwide
- National Science and Technology Progress First Prize, 2018 — contributed 20 invention patents
- 3 first-authored papers on PaperDigest's Most Influential Paper List: FairNAS, Twins, CPVT
- Area Chair: ICLR, NeurIPS | Senior Program Committee: AAAI, IJCAI
- 40+ domestic and 7 international invention patents
Core Technical Directions
Spatial Intelligence — Route-planning agents (MobilityBench), map-augmented geolocalization (Thinking-with-Map), autonomous-driving VLA reasoning (AutoDrive-R2), urban scene understanding, and industrial mobility systems
Generative Intelligence — Scene-text editing (FluxText), diffusion-model optimization (DCW, S2-Guidance), video virtual try-on (Eevee), 3D editing (RL3DEdit), and controllable visual effects (Omni-Effects)
Reasoning Agents — Reinforcement learning for LLM reasoning (GPG, MathForge), tree-search agent training (Tree-GRPO), agent-data co-evolution (CoEvolve), and collective skill evolution (SkillClaw)
World Models & Interactive AI — Interactive world simulation (DreamX-World), GUI world models (Code2World), and benchmarks for dynamic 4D response capabilities (Omni-WorldBench)
Multimodal Understanding — Vision-language reasoning, visual policy optimization, spatial intelligence evaluation (SpatialGenEval), and multimodal in-context learning (STV)
Foundation Architectures — Frequency-aware sparse attention (FASA), unified pretraining for generation and understanding (USP), end-to-end pixel generation without VAE (EPG), and diffusion LLMs (AR-MAP)
Team & Opportunities
I lead the AMAP-ML team at Alibaba Group, a 100+ member product-facing AI team with strong research and engineering backgrounds across foundation models, agents, multimodal learning, spatial intelligence, and generative AI.
Our philosophy: We build systems where research quality, engineering discipline, open-source reproducibility, and product deployment reinforce each other. Many core projects ship with reproducible code, and our work contributes to AMAP products serving 300M+ users every day.
Open Source
We maintain 30+ projects on GitHub spanning spatial intelligence, generative intelligence, reasoning agents, world models, and multimodal AI.
Hiring
We are always looking for talented interns, full-time researchers, and AI engineers in LLM agents, reinforcement learning, world models, multimodal learning, spatial intelligence, and generative AI. Drop me an email if interested.
Education
- M.S. in Electrical Engineering, Tsinghua University, 2012
- B.S. in Electrical Engineering, Southeast University, 2010