CV
I lead a 100+ member product-facing AI team at Alibaba AMAP, building foundation systems for spatial intelligence and generative intelligence. My research traces an arc from neural architecture search through Vision Transformer design and multimodal foundation models to LLM reasoning, world models, agent systems, and large-scale AI products that ship to hundreds of millions of users. I have published 110+ papers at top venues, with 14,000+ citations and 10,000+ GitHub stars across open-source projects.
Awards & Recognition
- Top 100 AI Scholars, AMiner 2023 — selected from hundreds of thousands of AI researchers worldwide
- 3 first-authored papers on PaperDigest's Most Influential Paper List: FairNAS, Twins, CPVT
- 2nd Place, Xiaomi "Million Dollar Prize" — Automated Neural Network Design
Professional Experience
Lead a 100+ member product-facing AI team, over 50% of it recruited from top AI labs worldwide, spanning spatial intelligence, generative intelligence, reasoning agents, world models, foundation architectures, and multimodal understanding.
- Published 45+ papers at top venues (ICLR, CVPR, ACL, ICCV, NeurIPS, AAAI, EMNLP) and open-sourced 30+ AMAP-ML projects with 10,000+ cumulative GitHub stars
- Key first-author works: GPG (ICLR’26, adopted by ByteDance’s VERL framework), USP (ICCV’25); key team works: SkillClaw (1,300+ Stars), DreamX-World, Tree-GRPO, FASA, CoEvolve
- Multimodal technology powers AMAP’s Saojie Bang (扫街榜) pipeline, and a large-scale industrial agent system drives AI Companion (AI 伴行); both serve hundreds of millions of users
Built the Visual Intelligence team from scratch. Directed research in Vision Transformers, multimodal large models, and industrial AI systems.
- Created Twins (NeurIPS’21), CPVT (ICLR’23), VisionLLaMA (ECCV’24) — widely adopted Vision Transformer architectures; VisionLLaMA pioneered auto-scaling 2D RoPE, later adopted by Qwen-VL and others
- Built MobileVLM, the first real-time mobile VLM; reproduced LLaMA 7B from scratch
- Open-sourced YOLOv6 (5,700+ Stars), a widely deployed industrial object detection framework
- Shipped 3D perception for autonomous delivery vehicles and drones; saved millions in annotation and serving costs annually
Founded Xiaomi’s AutoML team and produced a series of influential neural architecture search works.
- FairNAS (ICCV’21), FairDARTS (ECCV’20), DARTS- (ICLR’21), FALSR — establishing new standards for fair and robust architecture search
- Won 2nd place in Xiaomi’s first “Million Dollar Prize” (Automated Neural Network Design)
- FALSR super-resolution algorithm personally endorsed by CEO Lei Jun
- Core contributor to “Complex Power Grid Autonomy — Collaborative Automatic Voltage Control” project
- Contributed 20 invention patents; awarded National Science and Technology Progress First Prize (2018)
- Worked on large-scale data analytics and machine learning solutions at IBM China Research Lab
Selected Publications
LLM Reasoning
- GPG: Simple & Strong RL for Reasoning — ICLR’26 · 1st Author
- Tree-GRPO: Tree Search for Agent RL — ICLR’26
- CoEvolve: Agent-Data Co-Evolution — ACL’26
- MathForge: Difficulty-Aware GRPO — ICLR’26
- AutoDrive-R2: Reasoning VLA for Driving — ICLR’26
Generative AI & World Models
- USP: Unified Pretraining for Gen & Understanding — ICCV’25 · 1st Author
- DCW: SNR-t Bias of Diffusion Models — CVPR’26
- S2-Guidance: Training-Free Diffusion Enhancement — ICLR’26
- EPG: End-to-End Pixel Generation without VAE — ICLR’26
- DreamX-World: Interactive World Model
AI Agents & Intelligent Mobility
- SkillClaw: Collective Skill Evolution — 1,300+ Stars
- Code2World: GUI World Model via Renderable Code
- MobilityBench: Route-Planning Agent Benchmark
Foundation Architectures
- VisionLLaMA: Unified LLaMA for Vision — ECCV’24 · 1st Author
- Twins: Spatial Attention in ViTs — NeurIPS’21 · 1st Author · Most Influential
- CPVT: Conditional Positional Encodings — ICLR’23 · 1st Author · Most Influential
- FASA: Frequency-Aware Sparse Attention — ICLR’26
- QARepVGG: Quantization-Aware RepVGG — AAAI’24 · 1st Author
Vision-Language & Detection
- MobileVLM: First Real-Time Mobile VLM · 1st Author
- YOLOv6: Industrial Object Detection — 5,700+ Stars
- SpatialGenEval: Spatial Intelligence Benchmark — ICLR’26
- PromptDet: Open-Vocabulary Detection — ECCV’22
AutoML & Neural Architecture Search
→ Full publication list (110+ papers)
Professional Service
Education
- M.S. in Electrical Engineering, Tsinghua University, 2012
- B.S. in Electrical Engineering, Southeast University, 2010
Patents
- 40+ domestic invention patents
- 7 international invention patents
