CV
I lead a 100+ member product-facing AI team at Alibaba AMAP, building foundation systems for spatial intelligence and generative intelligence. My research traces an arc from neural architecture search through Vision Transformer design and multimodal foundation models to LLM reasoning, world models, agent systems, and large-scale AMAP products serving 300M+ users every day. I have authored 120+ research papers and preprints, including publications at top venues, with 15,000+ citations (6,000+ from first-authored works), alongside open-source projects.
Awards & Recognition
- Top 100 AI Scholars, AMiner 2023 — selected from hundreds of thousands of AI researchers worldwide
- 3 first-authored papers on PaperDigest's Most Influential Paper List: FairNAS, Twins, CPVT
- 2nd Place, Xiaomi "Million Dollar Prize" — Automated Neural Network Design
Professional Experience
Leading a 100+ member product-facing AI team across spatial intelligence, generative intelligence, reasoning agents, world models, foundation architectures, and multimodal understanding.
- Published 50+ papers at top venues (ICLR, CVPR, ICML, ACL, KDD, ICCV, ECCV, NeurIPS, AAAI, EMNLP, SIGGRAPH) and open-sourced 30+ AMAP-ML projects
- Key first-author works: GPG (ICLR 2026, adopted by ByteDance’s VERL framework), USP (ICCV 2025); key team works: SkillClaw, DreamX-World, Tree-GRPO, FASA, CoEvolve
- Multimodal technology supports AMAP’s Saojie Bang (扫街榜) pipeline, and large-scale industrial agent work contributes to AI Companion (AI 伴行); both sit alongside AMAP products serving 300M+ users every day
Built the Visual Intelligence team from scratch. Directed research in Vision Transformers, multimodal large models, and industrial AI systems.
- Created Twins (NeurIPS 2021), CPVT (ICLR 2023), VisionLLaMA (ECCV 2024) — influential Vision Transformer architectures; VisionLLaMA introduced auto-scaling 2D RoPE for LLaMA-style vision backbones
- Built MobileVLM, a compact VLM designed for real-time on-device deployment; reproduced LLaMA 7B from scratch
- Open-sourced YOLOv6, a widely used industrial object detection framework; developed QARepVGG to address quantization challenges in RepVGG-style deployment
- Shipped 3D perception for autonomous delivery vehicles and drones, reducing annotation and serving costs
Founded Xiaomi’s AutoML team and produced a series of influential neural architecture search works.
- FairNAS (ICCV 2021), FairDARTS (ECCV 2020), DARTS- (ICLR 2021), FALSR — establishing new standards for fair and robust architecture search
- Won 2nd place in Xiaomi’s first “Million Dollar Prize” (Automated Neural Network Design)
- FALSR super-resolution algorithm personally endorsed by CEO Lei Jun
- Core contributor to “Complex Power Grid Autonomy — Collaborative Automatic Voltage Control” project
- Contributed 20 invention patents; awarded National Science and Technology Progress First Prize (2018)
- Large-scale data analytics and machine learning solutions at IBM China Research Lab
Selected Publications
LLM Reasoning
- GPG: Simple & Strong RL for Reasoning — ICLR 2026 · 1st Author
- Tree-GRPO: Tree Search for Agent RL — ICLR 2026
- CoEvolve: Agent-Data Co-Evolution — ACL 2026
- MathForge: Difficulty-Aware GRPO — ICLR 2026
- AutoDrive-R2: Reasoning VLA for Driving — ICLR 2026
Generative AI & World Models
- USP: Unified Pretraining for Gen & Understanding — ICCV 2025 · 1st Author
- DCW: SNR-t Bias of Diffusion Models — CVPR 2026
- S2-Guidance: Training-Free Diffusion Enhancement — ICLR 2026
- EPG: End-to-End Pixel Generation without VAE — ICLR 2026
- DreamX-World: Interactive World Model
AI Agents & Intelligent Mobility
- SkillClaw: Collective Skill Evolution
- Code2World: GUI World Model via Renderable Code
- MobilityBench: Route-Planning Agent Benchmark — KDD 2026 Oral
Foundation Architectures
- VisionLLaMA: Unified LLaMA for Vision — ECCV 2024 · 1st Author
- Twins: Spatial Attention in ViTs — NeurIPS 2021 · 1st Author · Most Influential
- CPVT: Conditional Positional Encodings — ICLR 2023 · 1st Author · Most Influential
- FASA: Frequency-Aware Sparse Attention — ICLR 2026
- QARepVGG: Quantization-Aware RepVGG — AAAI 2024 · 1st Author
Vision-Language & Detection
- MobileVLM: Real-Time Mobile Vision-Language Model · 1st Author
- YOLOv6: Industrial Object Detection
- SpatialGenEval: Spatial Intelligence Benchmark — ICLR 2026
- PromptDet: Open-Vocabulary Detection — ECCV 2022
AutoML & Neural Architecture Search
→ Full publication list (120+ papers)
Professional Service
Education
- M.S. in Electrical Engineering, Tsinghua University, 2012
- B.S. in Electrical Engineering, Southeast University, 2010
Patents
- 40+ domestic invention patents
- 7 international invention patents
