About me
Senior Director, AMAP, Alibaba Group
I am a senior director at AMAP, leading a team that delivers intelligent route navigation and implements AIGC solutions for core business scenarios. Before that, I was a senior manager at Meituan. I received a Bachelor of Science in Electrical Engineering from Southeast University in 2010 and a Master of Science in Electrical Engineering from Tsinghua University in 2012. I was selected as one of the top 100 AI scholars by AMiner in 2023.
Research and Innovation: Published 40+ papers at AI conferences, 20 of them first-authored; 3 first-authored papers (FairNAS, Twins, CPVT) were selected for PaperDigest’s most influential paper lists; holds 40+ domestic and 7 international invention patents.
Research Directions: LLM pre-training, multimodal large models, reinforcement learning and generative models, foundation model design, NAS, model compression, self-supervised learning, 2D perception, 3D detection, etc., with publications or production deployments in each of these areas.
Achievements: Extensive deployment experience across AI middle platforms, autonomous driving, and edge-cloud computing; excels at creating value for the company and its customers through technology, especially at tackling hard technical problems; skilled at coordinating resources for win-win collaboration.
Team Building and Leadership: Experience managing a technical team of 50+ people; excels at building influential technical teams (over 50% of members come from well-known domestic and international AI labs); skilled at attracting top industry talent; open-sourced a detection framework widely recognized in the industry.
Influence: Multiple invited industry talks, top-conference presentations, technology media coverage, and repeated public endorsements from Lei Jun and Xiaomi.
We are always looking for talented interns/full-time researchers with strong coding skills and research experience. Please drop me an email if interested.
Work experience
- Mar 2024 - present, Alibaba Group, AMAP, Senior Director
- Team direction: Multi-modal large models, generative models, route navigation, recommendation, spatio-temporal data mining.
- May 2020 - Mar 2024, Meituan, Visual Intelligence Department, Senior Technical Manager
- Team direction: Multi-modal large models, large language models, perception (2D+3D), foundation models, self-supervised pre-training, model compression, MLOps.
- Large foundation models (LLM and multi-modal):
- Responsible for building a general multi-modal understanding large model with open-set understanding, and deploying it in Meituan business scenarios.
- Compression and efficient deployment of the company’s trillion-parameter LLM, reaching an industry-leading level.
- Training and deployment of the LLM base model for general intelligent robot (embodied intelligence) scenarios.
- Reproduced the Meta LLaMA 7B base model and SFT from scratch, and independently developed the 1B and 3B MobileVLM models, benchmarked against Gemini Nano; MobileVLM V2 outperforms many popular larger models.
- Foundation models: VisionLLaMA unifies the architectures of vision and language and outperforms ViT in both understanding and generation tasks; Twins outperforms Swin and offers more deployment advantages in production environments.
- Core technology breakthroughs in autonomous driving and drones:
- Built the 3D offline system for autonomous delivery vehicles, saving the business millions in annotation costs annually.
- Improved online perception algorithms (3D obstacle detection and tracking), helping the autonomous vehicle team achieve industry-leading real-time perception capabilities; improved the efficiency of building high-precision maps.
- Several core perception modules have been deployed on the third- and fourth-generation drones.
- Broad AI business support and MLOps construction:
- Built the vision code base and AutoML tools; model compression and deployment across device, edge, and cloud, covering most vision applications such as face recognition, OCR, content moderation, and image understanding, saving millions in vision service costs annually.
- Team building:
- Built the team from 0 to 1, interviewing multiple Google PhD Fellowship recipients and attracting top talent recognized in the industry.
- Influence building:
- Open-sourcing and maintaining the well-known detection library YOLOv6.
- As a project backbone, undertook subproject 3.3, “Research on Key Technologies of Artificial Intelligence Foundation Models,” under the National Science and Technology Innovation 2030 major program.
- March 2017 - May 2020, Xiaomi, Artificial Intelligence Department, Senior Technical Manager
- Team direction: AutoML, base model design, machine learning.
- Initiated the AutoML project and built the AutoML team from 0 to 1; responsible for end-to-end technical research, algorithm development, and productionization. Deployed in multiple businesses such as scene recognition, segmentation, acoustic scene classification, and recommendation.
- Led the team to second place (“Automated Neural Network Design”) in Xiaomi’s first “Million Dollar Prize” competition.
- June 2013 - March 2017, Beijing KingStar System Control Co., Ltd., Deputy Director
- As a technical backbone, participated in the “Complex Power Grid Autonomy-Collaborative Automatic Voltage Control Key Technology, System Development and Engineering Application” project, which won the 2018 National Science and Technology Progress First Prize; contributed 20 invention patents.
- July 2012 - May 2013, IBM Research of China (CRL), Research Scientist
Publications
First-author papers
- Twins: Revisiting the design of spatial attention in vision transformers, NeurIPS 2021
- Conditional Positional Encodings for Vision Transformers, ICLR 2023
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks, ECCV 2024
- MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
- FairNAS: Rethinking evaluation fairness of weight sharing neural architecture search, ICCV 2021
- Fair DARTS: Eliminating unfair advantages in differentiable architecture search, ECCV 2020
- DARTS-: Robustly stepping out of performance collapse without indicators, ICLR 2021
- ROME: Robustifying memory-efficient NAS via topology disentanglement and gradients accumulation, ICCV 2023
- Make RepVGG Greater Again: A Quantization-aware Approach, AAAI 2024
- MixPath: A unified approach for one-shot neural architecture search, ICCV 2023
- Noisy differentiable architecture search, BMVC 2021
- A Unified Mixture-View Framework for Unsupervised Representation Learning, BMVC 2022
- Multi-objective reinforced evolution in mobile neural architecture search, ECCV Workshops 2020
- Fast, accurate and lightweight super-resolution with neural architecture search, ICPR 2020
- MoGA: Searching beyond MobileNetV3, ICASSP 2020
- Scarlet-NAS: Bridging the gap between stability and scalability in weight-sharing neural architecture search, ICCV Workshops 2021
- Parameter sharing deep deterministic policy gradient for cooperative multi-agent reinforcement learning
- Improved crowding distance for NSGA-II
- Policy optimization with penalized point probability distance: An alternative to proximal policy optimization
Other Collaborative Papers
- Dyn-Adapter: Towards Disentangled Representation for Efficient Visual Recognition, ECCV 2024
- Revealing the Dark Secrets of Extremely Large Kernel ConvNets on Robustness, ICML 2024
- PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution, CVPR 2024
- LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection, ICLR 2024
- Norm Tweaking: High-performance Low-bit Quantization of Large Language Models, AAAI 2024
- YOLOv6: A single-stage object detection framework for industrial applications (5.3k GitHub stars)
- A Speed Odyssey for Deployable Quantization of LLMs
- FPTQ: Fine-grained Post-Training Quantization for Large Language Models
- Lenna: Language Enhanced Reasoning Detection Assistant
- SCTNet: Single Branch CNN with Transformer Semantic Information for Real-time Segmentation, AAAI 2024
- PromptDet: Towards open-vocabulary detection using uncurated images, ECCV 2022
- SegViT: Semantic segmentation with plain vision transformers, NeurIPS 2022
- Fully convolutional one-stage 3D object detection on LiDAR range images, NeurIPS 2022
- Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation, CVPR 2022
- AeDet: Azimuth-invariant multi-view 3D object detection, CVPR 2023
- EAPruning: Evolutionary Pruning for Vision Transformers and CNNs, BMVC 2022
- AutoKWS: Keyword Spotting with Differentiable Architecture Search, ICASSP 2021
- Neural Architecture Search on Acoustic Scene Classification, Interspeech 2020
- Accurate and efficient single image super-resolution with matrix channel attention network, ACCV 2020
- STRETCH meat grinder with ICCOS, IEEE Transactions on Plasma Science
- Comparisons of three inductive pulse power supplies, IEEE Transactions on Plasma Science
- FastPillars: A Deployment-friendly Pillar-based 3D Detector
- LogicalDefender: Discovering, Extracting, and Utilizing Common-Sense Knowledge
Selected Related Reports
- Comprehensively surpassing ViT: Meituan, Zhejiang University, et al. propose VisionLLaMA, a unified architecture for vision tasks
- Real-time on-device inference, 3B rivaling 7B: Meituan, Zhejiang University, et al. propose MobileVLM V2, a faster and stronger on-device vision language model
- Running in real time on the Snapdragon 888: Meituan, Zhejiang University, et al. build MobileVLM, a full-pipeline mobile multi-modal large model
- Meituan proposes a Transformer with implicit conditional positional encodings, outperforming ViT and DeiT
- Twins: Rethinking the design of efficient visual attention models
- A more accurate and faster YOLOv6 arrives, built and open-sourced by Meituan
- Quick updates on Xiaomi AI Lab results
- Strongly endorsed by Lei Jun: Xiaomi builds a top super-resolution algorithm, now open-sourced
- Surpassing MnasNet and Proxyless: Xiaomi proposes FairNAS, a new neural architecture search algorithm
- Three results in two months, benchmarking against Google: an exclusive interview with Xiaomi’s AutoML team on making model search fairer