Lin Chen

Differentiable Rendering · SLAM · Robotics · Vision-Language-Action Models
🔬 ORCID
📍 Shaanxi, China

I am currently a fourth-year Ph.D. candidate in Aeronautical and Astronautical Science and Technology at Northwestern Polytechnical University under the supervision of Prof. Shuhui Bu, expected to graduate in 2026.

My research began with UAV rapid mapping, focusing on the real-time generation of georeferenced orthoimages, elevation maps, and dense point clouds; this system has been commercialized and deployed at multiple universities and research institutes. I then worked on single-agent SLAM based on 3D Gaussian Splatting (3DGS) and on multi-agent SLAM based on 2D surfels. Within multi-agent systems, I explored UWB-based mutual localization, independently developed a decentralized mesh-networked VTOL fixed-wing cluster flight system, investigated agent-driven operations for fixed-wing clusters, and studied how vision foundation models benefit 3DGS scene construction. My current research interests are open-vocabulary environment representation and Vision-Language-Action (VLA) models.

Beyond academia, I serve as Technical Lead at several startups, leading the development of commercial SLAM and 3D reconstruction hardware and software for cluster systems. This dual role lets me bridge theoretical research and industry practice, turning academic results into deployed products.

Education

Northwestern Polytechnical University, Xi'an, China 985 211 Double First-Class

2022 - 2026

Ph.D. • Aeronautical and Astronautical Science and Technology • School of Aeronautics

Northwestern Polytechnical University, Xi'an, China 985 211 Double First-Class

2019 - 2022

M.S. • Vehicle Operation Engineering • School of Aeronautics

Admitted via postgraduate recommendation (exam-exempt)

Northwestern Polytechnical University, Xi'an, China 985 211 Double First-Class

2015 - 2019

B.E. • Aircraft Design and Engineering • School of Aeronautics

西安视野慧图智能科技有限公司 - Technical Lead

March 2022 - Present

Leading the development of real-time mapping solutions based on SLAM and SfM technologies that generate orthoimages, elevation maps, and 3D point clouds in real time. The system supports Linux and Windows on x64/aarch64 architectures and serves multiple universities and research institutes.

SIBITU Real-time Mapping System

A comprehensive real-time mapping solution that generates high-quality orthoimages (DOM) and digital elevation models (DEM) from UAV imagery. The system processes sequential aerial images in real-time, providing instant mapping results for various applications.

Cluster Real-time Reconstruction

Multi-UAV collaborative reconstruction system that enables real-time 3D mapping across large areas. The cluster system coordinates multiple drones to capture overlapping imagery, processing data in real-time to generate comprehensive 3D models and point clouds.

Self-organizing UAV Cluster Formation

An autonomous UAV swarm system that enables self-organizing formation flight patterns. The system allows multiple drones to coordinate their movements, maintain formation, and adapt to environmental changes without centralized control.


Agent-driven UAV Formation

An intelligent multi-agent system that enables autonomous mission planning and execution for UAV clusters. The system uses AI agents to coordinate complex flight missions, optimize resource allocation, and ensure mission success through adaptive decision-making.


上海司岚博科技有限公司 - Technical Lead

August 2024 - Present

In collaboration with the Computer Vision Life platform, I was responsible for combining 3DGS with LIVO, exploring depth constraints, 2D Gaussian representations, and pose re-optimization to enhance reconstruction quality. I later led the development of the SLAMiBot laser-vision product: an integrated LiDAR+Camera+RTK data acquisition device with internal hardware-synchronized triggering, accompanied by a mobile application, scanning gimbals, and LIVO-3DGS post-processing software.
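
As a flavor of the depth constraints explored here, below is a minimal sketch of a joint photometric and LiDAR-depth loss for optimizing a Gaussian map; the rasterizer outputs, masking, and weighting are illustrative assumptions, not the product's implementation.

```python
import torch

def depth_color_loss(rendered_rgb, rendered_depth,
                     gt_rgb, lidar_depth, lambda_d=0.5):
    """Joint photometric + LiDAR-depth loss (illustrative sketch).
    rendered_rgb/rendered_depth come from a differentiable Gaussian
    rasterizer; lidar_depth is the projected LIVO depth map, 0 where
    the LiDAR has no return."""
    color_loss = (rendered_rgb - gt_rgb).abs().mean()
    valid = lidar_depth > 0                      # only constrain observed pixels
    depth_loss = (rendered_depth[valid] - lidar_depth[valid]).abs().mean()
    return color_loss + lambda_d * depth_loss
```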

SLAMiBot Hardware System

Multi-sensor time synchronization system with hardware triggering, featuring customizable exposure control and white balance adjustment. The integrated LiDAR+Camera+RTK device ensures precise temporal alignment across all sensors for accurate data fusion.


Real-time Mobile Application

Real-time display of colorized point clouds constructed by LiDAR, with comprehensive system status monitoring capabilities. The mobile app provides intuitive control and visualization for field operations.


Adaptive Scanning Gimbal

Efficient sweep mapping mode that automatically adjusts the sweep angle based on flight altitude and speed, keeping the strip overlap rate constant regardless of motion. The gimbal system optimizes scanning coverage and data quality.
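
To illustrate the geometry behind this behavior, the sketch below solves for the sweep half-angle that maintains a target strip overlap at a given altitude and speed; the flat-ground footprint model and the parameter names are simplifying assumptions, not the shipped control law.

```python
import math

def sweep_half_angle(altitude, speed, sweep_period, overlap):
    """Sweep half-angle (radians) that keeps a target strip overlap,
    from flat-ground footprint geometry (illustrative model).
      strip width   W = 2 * altitude * tan(theta)
      strip spacing d = speed * sweep_period
      overlap       r = 1 - d / W   =>   theta = atan(d / (2h(1-r)))
    """
    d = speed * sweep_period
    return math.atan(d / (2.0 * altitude * (1.0 - overlap)))

# e.g. 80 m altitude, 8 m/s, 1.5 s sweep period, 30% target overlap
print(math.degrees(sweep_half_angle(80.0, 8.0, 1.5, 0.30)))  # ~6.1 deg
```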

Quantitative Investment - Independent Trader

February 2023 - Present

Developed and deployed a fully automated quantitative trading system for live trading, covering the complete workflow: automated data acquisition, strategy development, backtesting, live execution, and performance analysis.
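
As a minimal sketch of the backtesting stage in such a workflow, the snippet below runs a vectorized backtest assuming bar-close signals and a simple proportional fee; the data source and strategy are placeholders.

```python
import pandas as pd

def backtest(prices: pd.Series, signal: pd.Series, fee: float = 0.0005):
    """Vectorized backtest skeleton (illustrative). `signal` is the
    target position (+1/0/-1) decided on each bar's close and applied
    to the *next* bar's return to avoid look-ahead bias."""
    rets = prices.pct_change().fillna(0.0)
    pos = signal.shift(1).fillna(0.0)            # act on the next bar
    costs = fee * pos.diff().abs().fillna(0.0)   # pay fees on turnover
    pnl = pos * rets - costs
    return (1.0 + pnl).cumprod()                 # equity curve

# usage: equity = backtest(close, (close > close.rolling(20).mean()) * 1.0)
```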

2025

CoMA-SLAM

CoMA-SLAM: Collaborative Multi-Agent Gaussian SLAM With Geometric Consistency

Lin Chen, Yongxin Su, Jvboxi Wang, Kun Li, Shuhui Bu, Guangming Wang, Pengcheng Han, Zhenyu Xia, Boni Hu, Shengqi Meng

Although Gaussian scene representation has achieved remarkable success in tracking and mapping, most existing methods are confined to single-agent systems. Current multi-agent solutions typically rely on centralized architectures, which struggle to account for communication bandwidth constraints. Furthermore, the inherent depth ambiguity of 3D Gaussian splatting poses notable challenges in maintaining geometric consistency. To address these challenges, we introduce CoMA-SLAM, the first distributed multi-agent Gaussian SLAM framework. By leveraging 2D Gaussian surfels and a robust initialization strategy, CoMA-SLAM enhances tracking accuracy and geometric consistency. It efficiently manages communication bandwidth while dynamically scaling with the number of agents. Through the integration of intra- and inter-loop closure, distributed keyframe optimization, and submap-centric updates, our framework ensures global consistency and robust alignment. Synthetic and real-world experiments demonstrate that CoMA-SLAM outperforms state-of-the-art methods in pose accuracy, rendering fidelity, and geometric consistency while maintaining competitive efficiency across distributed multi-agent systems. Notably, by avoiding data transmission to a centralized server, our method reduces communication bandwidth by 99.8% compared to centralized approaches.

AAAI Conference on Artificial Intelligence CCF A 2025
GauS-SLAM

GauS-SLAM: Dense RGB-D SLAM with Gaussian Surfels

Yongxin Su, Lin Chen, Kaiting Zhang, Zhongliang Zhao, Chenfeng Hou, Ziping Yu

We propose GauS-SLAM, a dense RGB-D SLAM system that leverages 2D Gaussian surfels to achieve robust tracking and high-fidelity mapping. Our investigations reveal that Gaussian-based scene representations exhibit geometry distortion under novel viewpoints, which significantly degrades the accuracy of Gaussian-based tracking methods. These geometric inconsistencies arise primarily from the depth modeling of Gaussian primitives and the mutual interference between surfaces during depth blending. To address these issues, we propose a 2D Gaussian-based incremental reconstruction strategy coupled with a Surface-aware Depth Rendering mechanism, which significantly enhances geometric accuracy and multi-view consistency. Additionally, the proposed local map design dynamically isolates visible surfaces during tracking, mitigating misalignment caused by occluded regions in global maps while maintaining computational efficiency as Gaussian density increases. Extensive experiments across multiple datasets demonstrate that GauS-SLAM outperforms comparable methods, delivering superior tracking precision and rendering fidelity.

arXiv 2025
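
To make the depth-blending issue concrete, here is a simplified stand-in for surface-aware depth rendering: plain alpha blending mixes depths from occluded surfaces into the estimate, whereas cutting fragments once transmittance drops below a threshold keeps only the first surface. This is an illustrative simplification, not the paper's exact mechanism.

```python
import torch

def blend_depth(depths, alphas, surface_eps=0.5):
    """Per-pixel depth from Gaussian fragments sorted front to back
    (illustrative). Weights past the first surface are zeroed once
    transmittance T falls below `surface_eps`."""
    T = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alphas[:-1]]), dim=0)
    w = T * alphas                                   # standard blending weights
    w = torch.where(T > surface_eps, w, torch.zeros_like(w))
    return (w * depths).sum() / w.sum().clamp_min(1e-8)
```
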
Cluster-ALIV

Cluster-ALIV: Aerial LiDAR-Inertia-Visual Dense Reconstruction for Cluster UAV

Xiaohan Li, Jie Zhang, Shuhui Bu, Lin Chen, Kun Li, Zhenyu Xia

Uncrewed Aerial Vehicles (UAVs) equipped with LiDAR, camera, and Inertial Measurement Unit sensors are increasingly utilized for real-time dense reconstruction in large-scale rescue operations, environmental monitoring, and other applications. However, achieving algorithmic robustness remains challenging due to the UAVs' high-speed flight and rapid pose changes. Additionally, energy constraints on individual UAVs can be mitigated through multi-UAV collaboration, improving operational efficiency. Nevertheless, when faced with unknown environments or the loss of Global Navigation Satellite System signal, most multi-UAV dense reconstruction systems cannot operate, making it hard to construct a globally consistent map. In this letter, we propose Cluster-ALIV, a real-time dense reconstruction system for multiple UAVs that effectively supports aerial, large-scale scenarios with lost global positioning and weak co-visibility of LiDAR or vision. The system integrates LiDAR-Inertial-Visual odometry through multi-sensor fusion to generate accurate, gravity-aligned, colorized LiDAR point clouds and visual information with scale. In Cluster-ALIV, each UAV executes LiDAR-Inertial-Visual odometry and transmits point cloud and visual data to a ground server, where multi-UAV joint optimization is performed through LiDAR post-processing, visual post-processing, and normal distributions transform refinement. Extensive experiments demonstrate that our system can efficiently construct large-scale dense maps in real time with high accuracy and robustness.

IEEE Robotics and Automation Letters SCI Q2 2025
CODE

CODE: COllaborative Visual-UWB SLAM for Online Large-Scale Metric DEnse Mapping

Lin Chen, Xuan Jia, Shuhui Bu*, Guangming Wang*, Kun Li, Zhenyu Xia, Xiaohan Li, Pengcheng Han, Xuefeng Cao

This paper presents a novel collaborative online dense mapping system for multiple Unmanned Aerial Vehicles (UAVs). The system confers two primary benefits: it facilitates simultaneous UAV co-localization and real-time dense map reconstruction, and it recovers the metric scale even in GNSS-denied conditions. To achieve these advantages, Ultra-wideband (UWB) measurements, monocular Visual Odometry (VO), and co-visibility observations are jointly employed to recover both relative positions and global UAV poses, thereby ensuring optimality at both local and global scales. In the proposed methodology, a two-stage optimization strategy is introduced to reduce the optimization burden. Initially, relative Sim3 transformations among UAVs are swiftly estimated, with UWB measurements facilitating metric scale recovery in the absence of GNSS. Subsequently, a global pose optimization is performed to effectively mitigate cumulative drift. By integrating UWB, VO, and co-visibility data within this framework, both local geometric consistency and global pose accuracy are robustly maintained. Through comprehensive simulation and real-world testing, we demonstrate that our system not only improves UAV positioning accuracy in challenging scenarios but also facilitates high-quality, online integration of dense point clouds in large-scale areas. This research offers valuable contributions and practical techniques for precise, real-time map reconstruction using an autonomous UAV fleet, particularly in GNSS-denied environments.

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) CCF C 2025
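
As a sketch of how UWB ranging can pin down metric scale in the first optimization stage, the residual below ties per-agent VO scale factors to a measured inter-UAV distance; the per-agent scale parameterization and names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def uwb_scale_residual(p_i, p_j, s_i, s_j, range_ij):
    """Residual tying monocular VO scale to a UWB range (illustrative).
    p_i, p_j: up-to-scale VO positions of UAVs i and j in a common
    frame; s_i, s_j: per-agent metric scale factors being estimated;
    range_ij: measured inter-UAV UWB distance in meters."""
    return np.linalg.norm(s_i * p_i - s_j * p_j) - range_ij
```
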
G2-Mapping

G2-Mapping: General Gaussian Mapping for Monocular, RGB-D, and LiDAR-Inertial-Visual Systems

Lin Chen, Boni Hu, Jvboxi Wang, Shuhui Bu, Guangming Wang, Pengcheng Han

In this paper, we introduce G2-Mapping, a novel method that comprehensively supports online monocular, RGB-D, and LiDAR-Inertial-Visual systems, employing 3D Gaussian points as the scene representation. Several issues arise when applying 3D Gaussian Splatting (3DGS) techniques to simultaneous localization and mapping (SLAM): 1) for monocular input, the lack of depth information makes scene initialization difficult and large-baseline positioning challenging; 2) differentiable rendering with respect to depth and pose has not been implemented in 3DGS, making it difficult to apply directly to SLAM; 3) a strategy for updating the scene with incoming online frames is absent, which may lead to memory overflow. To overcome these problems, we formulate a mathematical derivation and propose a differentiable rendering approach that leverages both depth and color to optimize the scene and the pose. We introduce a simplified odometry that provides metric depth estimation for monocular input and improves availability in low-overlap scenes. A scale-consistency and uncertainty-weighted optimization is further proposed to eliminate the impact of inaccurate depth predictions. Our proposed scene-updating strategy effectively prevents rapid memory growth. Tracking and mapping are performed alternately to achieve precise localization and synchronous high-fidelity map reconstruction. Extensive experiments demonstrate that G2-Mapping surpasses feature-based SLAM in localization precision and exceeds state-of-the-art neural SLAM methods in the fidelity of view synthesis.

IEEE Transactions on Automation Science and Engineering CCF B 2025
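
A minimal sketch of pose tracking by differentiable rendering in this style, with an uncertainty-weighted depth term; `render_fn`, the optimizer choice, and the weighting are assumptions for illustration.

```python
import torch

def track_frame(render_fn, init_pose, gt_rgb, gt_depth, depth_var, iters=50):
    """Track one frame by gradient descent on a rendering loss
    (illustrative). `render_fn(pose)` rasterizes the Gaussian map to
    (rgb, depth) and is assumed differentiable w.r.t. the 6-DoF pose."""
    pose = init_pose.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([pose], lr=1e-3)
    for _ in range(iters):
        rgb, depth = render_fn(pose)
        w = 1.0 / (depth_var + 1e-6)             # down-weight uncertain depth
        loss = (rgb - gt_rgb).abs().mean() + (w * (depth - gt_depth) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return pose.detach()
```
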
OriLoc

OriLoc: Unlimited-FoV and Orientation-Free Cross-View Geolocalization

Boni Hu, Haowei Li, Shuhui Bu, Lin Chen, Pengcheng Han

Cross-view image-based geolocalization enables accurate, drift-free navigation without external positioning signals, crucial for UAV delivery and disaster relief. However, existing research primarily focuses on ground panoramic images with known orientations, while real-world scenarios involve unknown orientations and limited field of view (FoV), creating a research-application gap. We introduce OriLoc, an innovative cross-view geolocalization method integrating sophisticated orientation estimation for limited FoV and arbitrary orientation scenarios. Our approach employs a dual-weighted soft-margin triplet loss with hard sample mining to extract discriminative features. Additionally, we develop an orientation estimation module using convolution-based sliding windows to assess similarity between satellite-view and query embeddings. The method demonstrates superior performance on three challenging datasets spanning commercial, residential, urban, and suburban areas across two continents. Results show that hard sample mining combined with appropriate learning objectives significantly enhances geolocalization for limited FoV and orientation-free images. Our orientation estimation module achieves remarkable accuracy when integrated with attention embeddings prior to polar transformation.

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing SCI Q2 2025
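
The orientation-estimation idea can be pictured as circular sliding-window matching between the limited-FoV query embedding and the full 360° satellite embedding; the sketch below is a naive correlation version, with shapes and scoring as assumptions.

```python
import numpy as np

def estimate_orientation(sat_emb, query_emb):
    """Slide the query over the satellite embedding circularly and
    return the best-matching azimuth bin (illustrative).
    sat_emb: (W, D) azimuth-by-feature; query_emb: (w, D), w <= W."""
    W, w = sat_emb.shape[0], query_emb.shape[0]
    scores = [
        float((sat_emb.take(range(s, s + w), axis=0, mode='wrap')
               * query_emb).sum())
        for s in range(W)                        # wrap around 360 degrees
    ]
    return int(np.argmax(scores))
```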

2024

AutoFusion

AutoFusion: Autonomous Visual Geolocation and Online Dense Reconstruction for UAV Cluster

Yizhu Zhang, Shuhui Bu, Yifei Dong, Yu Zhang, Kun Li, Lin Chen

Real-time dense reconstruction using Unmanned Aerial Vehicles (UAVs) is becoming increasingly popular in large-scale rescue and environmental monitoring tasks. However, due to the energy constraints of a single UAV, efficiency can be greatly improved through the collaboration of multiple UAVs. Nevertheless, when faced with unknown environments or the loss of Global Navigation Satellite System (GNSS) signal, most multi-UAV SLAM systems cannot operate, making it hard to construct a globally consistent map. In this paper, we propose a real-time dense reconstruction system called AutoFusion for multiple UAVs, which robustly supports scenarios with lost global positioning and weak co-visibility. We propose a Visual Geolocation and Matching Network (VGMN), built on a graph convolutional neural network as its feature extractor, which can acquire geographic location information solely from images. We also present a real-time dense reconstruction framework for multiple UAVs with autonomous visual geolocation. UAV agents send images and relative positions to the ground server, which processes the data using VGMN for multi-agent geolocation optimization, including initialization, pose graph optimization, and map fusion. Extensive experiments demonstrate that our system can efficiently and stably construct large-scale dense maps in real time with high accuracy and robustness.

IEEE International Conference on Robotics and Automation (ICRA) CCF B 2024
CurriculumLoc

CurriculumLoc: Enhancing Cross-Domain Geolocalization Through Multistage Refinement

Boni Hu, Lin Chen, Runjian Chen, Shuhui Bu, Pengcheng Han, Haowei Li

Cross-domain geolocalization, which aims to estimate the geographical location of a query image based on a reference image from a different domain, is a challenging task due to the significant domain gap between the reference and query images. Existing methods typically focus on aligning the visual features of the two images, but overlook the domain-specific characteristics that can be exploited to enhance the geolocalization performance. In this paper, we propose CurriculumLoc, a multistage refinement framework for cross-domain geolocalization. Our method first aligns the visual features of the two images using a feature alignment network. Then, we introduce a curriculum learning strategy to progressively refine the geolocalization results. Specifically, we first train a coarse geolocalization model using a small dataset with limited domain gap. Subsequently, we fine-tune the model using a larger dataset with a larger domain gap. Finally, we perform a post-processing step to further improve the geolocalization accuracy. Extensive experiments demonstrate that our method achieves state-of-the-art performance on the challenging cross-domain geolocalization task.

IEEE Transactions on Geoscience and Remote Sensing SCI Q1 2024

2023

HybridFusion

HybridFusion: LiDAR and Vision Cross-Source Point Cloud Fusion

Yu Wang, Shuhui Bu, Lin Chen, Yifei Dong, Kun Li, Xuefeng Cao, Ke Li, Jie Jin

Recently, cross-source point cloud registration from different sensors has become a significant research focus. Although current methods have advanced homogenous point cloud registration, challenges persist in the cross-source domain due to varying point cloud densities from different sensors and missing points caused by different viewing angles, which have hindered its development. To address these issues, we propose HybridFusion, a novel algorithm designed specifically for cross-source point cloud registration in outdoor large-scale scenes, accommodating various sensors and viewing angles. Due to the unique characteristics of point clouds, it is not a singular module, but rather a coarse-to-fine process. To extract similarity information from cross-source point clouds, local patches of the point cloud are subjected to similarity matching. Subsequently, precise alignment is performed using their distinctive features, including 2D boundary points. Finally, the poses obtained from multiple patches are fused to achieve the final registration. Our proposed approach is extensively evaluated through both qualitative and quantitative experiments with existing methods. Additionally, a novel metric for point cloud completion is introduced. The results establish our method as the state-of-the-art solution for cross-source point cloud registration, with a remarkable 70% increase in accuracy compared to recent approaches.

IEEE Robotics and Automation Letters SCI Q2 2023
Fast Tree Detection

Fast Tree Detection and Counting on UAVs for Sequential Aerial Images with Generating Orthophoto Mosaicing

Pengcheng Han, Cunbao Ma, Jian Chen, Lin Chen, Shuhui Bu, Shibiao Xu, Yong Zhao, Chenhua Zhang, Tatsuya Hagino

Individual tree counting (ITC) is a popular topic in the remote sensing application field. The number and planting density of trees are significant for estimating yield and for further planning. Although existing studies have achieved great performance on tree detection with satellite imagery, the quality is often negatively affected by clouds and heavy fog, which limits high-frequency inventory applications. Nowadays, with ultra-high spatial resolution and convenient usage, Unmanned Aerial Vehicles (UAVs) have become promising tools for obtaining statistics from plantations. However, for large-scale areas, a UAV cannot capture the whole region of interest in one photo session. In this paper, a real-time orthophoto mosaicing-based tree counting framework is proposed to detect trees using sequential aerial images, which is very effective for fast detection over large areas. First, to guarantee speed and accuracy, a multi-planar assumption constrained graph optimization algorithm is proposed to estimate the camera pose and generate the orthophoto mosaic simultaneously. Second, to avoid time-consuming box or mask annotations, a point-supervised method is designed for the tree counting task, which greatly speeds up the entire workflow. We demonstrate the effectiveness of our method through extensive experiments on oil-palm and acacia trees. To avoid the delay between data acquisition and processing, the proposed framework is embedded on the UAV to complete tree counting tasks onboard, which also reduces the amount of data transmitted from the UAV system to the ground station.

Remote Sensing SCI Q2 2023
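
Point supervision for counting is commonly implemented by regressing a density map whose integral equals the object count; the sketch below builds such a target from point annotations. This is a standard construction shown for illustration, and the paper's point-supervised design may differ.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_target(points, shape, sigma=8.0):
    """Density-map target from point annotations (illustrative).
    Splat one unit per annotated tree and blur, so the map integrates
    to the tree count; a network regresses this map and the predicted
    count is predicted_map.sum()."""
    dm = np.zeros(shape, dtype=np.float32)
    for x, y in points:                          # (col, row) tree centers
        dm[int(y), int(x)] += 1.0
    return gaussian_filter(dm, sigma=sigma)
```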

2021

RTSfM

RTSfM: Real-Time Structure From Motion for Mosaicing and DSM Mapping of Sequential Aerial Images With Low Overlap

Yong Zhao, Lin Chen, Xishan Zhang, Shibiao Xu, Shuhui Bu, Hongkai Jiang, Pengcheng Han, Ke Li, Gang Wan

Inspired by a simultaneous localization and mapping (SLAM) style workflow, this article presents an online sequential structure from motion (SfM) solution for high-frequency video and large-baseline high-resolution aerial images with high efficiency and precision. First, as traditional SLAM systems do not handle low-overlap images well, we propose a robust tracking method built on a novel hierarchical feature matching paradigm with multi-homography and BoW, where the relative pose and its scale are estimated separately and then jointly optimized under both perspective-n-point (PnP) and epipolar constraints. Second, to further optimize the camera poses for sparse map and dense point cloud reconstruction, we provide a graph-based optimization with reprojection and GPS constraints, which makes the camera trajectory and map georeferenced. We also incrementally generate the dense point cloud in real time from keyframes after local mapping optimization. Finally, we use a publicly available aerial image dataset with sequences from different environments to evaluate the effectiveness of the proposed method; the robust performance of our solution is demonstrated with applications to high-quality aerial image mosaicing and digital surface model (DSM) reconstruction in real time.

IEEE Transactions on Geoscience and Remote Sensing SCI Q1 2021
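
To illustrate the joint PnP + epipolar refinement described above, the sketch below stacks both residual types for a nonlinear least-squares solver; the matrix shapes and the normalized-coordinate epipolar form are assumptions for illustration.

```python
import numpy as np

def joint_residuals(K, R, t, pts3d, kps_cur, kps_prev, E):
    """Stacked reprojection (PnP) + epipolar residuals (illustrative).
    K: intrinsics; R, t: current camera pose; pts3d: (N, 3) map points
    observed at kps_cur; kps_prev/kps_cur: (N, 2) matched pixels."""
    proj = (K @ (R @ pts3d.T + t[:, None])).T        # project map points
    r_pnp = (proj[:, :2] / proj[:, 2:3] - kps_cur).ravel()
    Kinv = np.linalg.inv(K)
    x1 = np.c_[kps_prev, np.ones(len(kps_prev))] @ Kinv.T   # normalized coords
    x2 = np.c_[kps_cur,  np.ones(len(kps_cur))] @ Kinv.T
    r_epi = np.einsum('ij,jk,ik->i', x2, E, x1)      # x2^T E x1, ideally 0
    return np.concatenate([r_pnp, r_epi])
```
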
Svar

Svar: A Tiny C++ Header Brings Unified Interface for Multiple Programming Languages

Yong Zhao, Pengcheng Zhao, Shibiao Xu, Lin Chen, Pengcheng Han, Shuhui Bu, Hongkai Jiang

Numerous programming languages have been developed in recent decades, and most of them provide an interface to call C++ or C for high-efficiency implementation. The motivation of Svar is to design an efficient, lightweight, and general middleware for multiple languages that also brings the dynamism of scripting languages to C++ in a straightforward way. First, a Svar class with a JSON-like data structure is designed to hold everything that exists in C++, including basic values, functions, and user-defined classes and objects. Second, arguments are automatically cast to and from Svar efficiently, with compile-time detection of pointers, references, and shared_ptr. Third, classes and functions are bound with string names to support reflection, meaning all functions and classes in a shared library can be exported to a Svar object, which is also called a Svar module. Svar modules can be accessed by different languages, and this paper demonstrates how to import and use a Svar module in Python and Node.js. Moreover, Svar modules, or even a Python module, can be imported by C++ at runtime, which makes C++ easier to compile and use since headers are no longer required. We compare the performance of Svar with two state-of-the-art binding tools for Python and Node.js, and the results demonstrate that Svar is efficient, elegant, and general.

arXiv 2021
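
Following the paper's description of importing a Svar module in Python, a hypothetical usage might look like the snippet below; the module and member names are placeholders, and the actual API may differ.

```python
# Hypothetical usage based on the paper's description; the real Svar
# Python API and module names may differ.
import svar                              # Python bindings for Svar

sample = svar.load("sample_module")      # load a C++ shared library
                                         # exported as a Svar module
print(sample.add(2, 3))                  # call a bound C++ function
obj = sample.DemoClass("hello")          # construct a bound C++ class
```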

2020

DenseFusion

DenseFusion: Large-Scale Online Dense Pointcloud and DSM Mapping for UAVs

Lin Chen, Yong Zhao, Shibiao Xu, Shuhui Bu, Pengcheng Han, Gang Wan

Large-scale online dense point cloud and digital surface model (DSM) mapping is a challenging task for unmanned aerial vehicles (UAVs). Existing methods either focus on single-view reconstruction or require a large number of images, making them unsuitable for UAVs with limited onboard storage and computational resources. In this paper, we propose DenseFusion, a large-scale online dense point cloud and DSM mapping method for UAVs. Our method is evaluated on public datasets, and the results show that it achieves high-quality mapping from a small number of images.

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) CCF C 2020