About me

Short Bio

He received a Bachelor of Science in Electrical Engineering from Southeast University in 2010 and a Master of Science in Electrical Engineering from Tsinghua University in 2012. He was named one of the top 100 AI scholars by AMiner in 2023.

  1. Research and Innovation: Published 40+ papers at AI conferences, 20 of them first-authored; 3 first-authored papers (FairNAS, Twins, CPVT) were selected for PaperDigest’s most influential papers list; holds over 40 domestic and 7 international invention patents.

  2. Research Directions: LLM pre-training, multimodal large models, reinforcement learning and generative models, foundation model design, NAS, model compression, self-supervised learning, 2D perception, 3D detection, etc., with publications or production deployments in each of these directions.

  3. Achievements: Extensive deployment experience across AI middle platforms, autonomous driving, and edge cloud; excels at creating value for the company and its customers through technology, especially at overcoming hard technical problems; skilled at coordinating resources for win-win collaboration.

  4. Team Building and Leadership: Experience managing a technical team of more than 30 people; excels at building influential technical teams, with over 50% of members coming from well-known domestic and international AI labs; good at attracting top industry talent; open-sourced a detection framework widely known in the industry.

  5. Influence: Multiple invited industry talks, presentations at top conferences, coverage in the technology media, and repeated public praise from Lei Jun and Xiaomi.

We are always looking for talented interns/full-time researchers with strong coding skills and research experience. Please drop me an email if interested.

Work experience

  • May 2020 - Present, Meituan, Visual Intelligence Department, Senior Technical Manager
    • Team direction: Multimodal large models, large language models, perception (2D+3D), foundation models, self-supervised pre-training, model compression, MLOps.
    • Large foundation models (LLM and multi-modal):
      • Responsible for building a general-purpose multimodal understanding large model with open-set understanding, and for deploying it in Meituan business scenarios.
      • Compression and efficient deployment of the company’s trillion-parameter LLM, reaching an industry-leading level.
      • Training and deployment of the LLM base model for general intelligent robot (embodied intelligence) scenarios.
      • Reproduced Meta’s LLaMA 7B base model and SFT from scratch, and independently developed the 1B and 3B MobileVLM models, benchmarked against Gemini Nano; MobileVLM V2 outperforms many popular larger models.
      • Foundation models: VisionLLaMA unifies the vision and language architectures and outperforms ViT on both understanding and generation tasks; Twins outperforms Swin and offers greater deployment advantages in production environments.
    • Core technology breakthroughs in autonomous driving and drones:
      • Built the offline 3D system for autonomous delivery vehicles, saving the business millions in annotation costs annually.
      • Improved online perception algorithms (3D obstacle detection and tracking), helping the autonomous vehicle team achieve industry-leading real-time perception, and improved the efficiency of building high-precision maps.
      • Shipped several core perception modules on the third- and fourth-generation drones.
    • Broad AI business support and MLOps construction:
      • Built the vision code base and AutoML tools, along with model compression and deployment (device, edge, cloud), covering most vision applications such as face recognition, OCR, content security review, and image understanding, and saving millions in vision service costs annually.
    • Team building:
      • Built the team from 0 to 1, interviewing multiple Google PhD Fellowship winners and attracting top talent recognized in the industry.
    • Influence building:
      • Open-sourced and maintain YOLOv6, a widely known detection library.
      • As a core project member, undertook Project 3.3, “Research on Key Technologies of Artificial Intelligence Foundation Models,” under the National Science and Technology Innovation 2030 Major Program.
  • March 2017 - May 2020, Xiaomi, Artificial Intelligence Department, Senior Technical Manager
    • Team direction: AutoML, base model design, machine learning.
    • Initiated the AutoML project and built the AutoML team from 0 to 1; responsible for end-to-end technical research, algorithm development, and deployment. The work landed in multiple businesses such as scene recognition, segmentation, acoustic scene classification, and recommendation.
    • Led the team to second place (“Automated Neural Network Design”) in Xiaomi’s first “Million Dollar Prize” competition.
  • June 2013 - March 2017, Beijing KingStar System Control Co., Ltd., Deputy Director
    • As a technical backbone, participated in the “Complex Power Grid Autonomous-Collaborative Automatic Voltage Control: Key Technologies, System Development, and Engineering Application” project, which won the 2018 National Science and Technology Progress First Prize (contributing 20 invention patents).
  • July 2012 - May 2013, IBM China Research Laboratory (CRL), Research Scientist

Publications

First-author papers

  1. Twins: Revisiting the design of spatial attention in vision transformers, NeurIPS21
  2. Conditional Positional Encodings for Vision Transformers, ICLR23
  3. VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
  4. MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices
  5. MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  6. FairNAS: Rethinking evaluation fairness of weight sharing neural architecture search, ICCV21
  7. Fair DARTS: Eliminating unfair advantages in differentiable architecture search, ECCV20
  8. DARTS-: robustly stepping out of performance collapse without indicators, ICLR21
  9. ROME: Robustifying memory-efficient NAS via topology disentanglement and gradient accumulation, ICCV23
  10. Make RepVGG Greater Again: A Quantization-aware Approach, AAAI24
  11. MixPath: A unified approach for one-shot neural architecture search, ICCV23
  12. Noisy differentiable architecture search, BMVC21
  13. A Unified Mixture-View Framework for Unsupervised Representation Learning, BMVC22
  14. Multi-objective reinforced evolution in mobile neural architecture search, ECCVW20
  15. Fast, accurate and lightweight super-resolution with neural architecture search, ICPR20
  16. MoGA: Searching beyond MobileNetV3, ICASSP20
  17. Scarlet-NAS: bridging the gap between stability and scalability in weight-sharing neural architecture search, ICCVW21
  18. Parameter sharing deep deterministic policy gradient for cooperative multi-agent reinforcement learning
  19. Improved crowding distance for NSGA-II
  20. Policy optimization with penalized point probability distance: An alternative to proximal policy optimization

Other Collaborative Papers

  1. PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution, CVPR24
  2. LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection, ICLR24
  3. Norm Tweaking: High-performance Low-bit Quantization of Large Language Models, AAAI24
  4. YOLOv6: A single-stage object detection framework for industrial applications (5.3k GitHub stars)
  5. A Speed Odyssey for Deployable Quantization of LLMs
  6. FPTQ: Fine-grained Post-Training Quantization for Large Language Models
  7. Lenna: Language Enhanced Reasoning Detection Assistant
  8. SCTNet: Single Branch CNN with Transformer Semantic Information for Real-time Segmentation, AAAI24
  9. PromptDet: Towards open-vocabulary detection using uncurated images, ECCV22
  10. SegViT: Semantic segmentation with plain vision transformers, NeurIPS22
  11. Fully convolutional one-stage 3D object detection on LiDAR range images, NeurIPS22
  12. Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation, CVPR22
  13. AeDet: Azimuth-invariant multi-view 3D object detection, CVPR23
  14. EAPruning: Evolutionary Pruning for Vision Transformers and CNNs, BMVC22
  15. AutoKWS: Keyword Spotting with Differentiable Architecture Search, ICASSP21
  16. Neural Architecture Search on Acoustic Scene Classification, InterSpeech20
  17. Accurate and efficient single image super-resolution with matrix channel attention network, ACCV20
  18. STRETCH meat grinder with ICCOS, IEEE Transactions on Plasma Science
  19. Comparisons of three inductive pulse power supplies, IEEE Transactions on Plasma Science
  20. FastPillars: A Deployment-friendly Pillar-based 3D Detector
  21. LogicalDefender: Discovering, Extracting, and Utilizing Common-Sense Knowledge

Selected Related Reports

  1. Comprehensively surpassing ViT: Meituan, Zhejiang University, et al. propose VisionLLaMA, a unified architecture for vision tasks
  2. Real-time on-device inference, with 3B rivaling 7B! Meituan, Zhejiang University, et al. propose MobileVLM V2, a faster and stronger on-device vision-language model
  3. Running in real time on a Snapdragon 888: Meituan, Zhejiang University, et al. build MobileVLM, an end-to-end mobile multimodal large model
  4. Meituan proposes a Transformer based on implicit conditional positional encodings, outperforming ViT and DeiT
  5. Twins: Rethinking the design of efficient vision attention models
  6. A more accurate and faster YOLOv6 arrives, developed and open-sourced by Meituan
  7. Research highlights from the Xiaomi AI Lab
  8. Strongly endorsed by Lei Jun: Xiaomi builds a top super-resolution algorithm, now open-sourced
  9. Surpassing MnasNet and Proxyless: Xiaomi proposes FairNAS, a new neural architecture search algorithm
  10. Three results in two months, benchmarking against Google! An exclusive conversation with Xiaomi’s AutoML team on making model search fairer