RS-Agent: Automating Remote Sensing Tasks through Intelligent Agents

State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications
Aerospace Information Research Institute, Chinese Academy of Sciences
Department of Computer Science, City University of Hong Kong
*Equal Contribution

With the recent advancements in Large Language Models (LLMs) and Multi-modal Large Language Models (MLLMs), an increasing number of models have demonstrated impressive performance in various remote sensing tasks. However, these models are constrained to basic vision and language instruction-tuning tasks, facing challenges in complex remote sensing applications. Additionally, these models lack specialized expertise in professional domains. To address these limitations, we propose RS-Agent, an intelligent agent for remote sensing applications. Firstly, RS-Agent is powered by an LLM that acts as its "Central Controller", enabling it to understand and respond to various problems intelligently. Secondly, our RS-Agent integrates existing high-performance remote sensing image processing tools, facilitating multi-tool and multi-turn conversations that enhance its capability to tackle more complex applications. Thirdly, our RS-Agent leverages a knowledge graph-enhanced Retrieval-Augmented Generation (RAG) framework to access and utilize professional remote sensing knowledge, enabling accurate responses to expert-level queries. We conducted experiments on multiple datasets, and the results show that RS-Agent achieves over 95% task planning accuracy. It also demonstrates strong domain-specific knowledge retrieval capabilities and delivers outstanding performance across a wide range of tasks.

🏆 Contributions

  1. We present RS-Agent, a novel architecture designed to interpret user queries and orchestrate diverse tools for accurate and efficient remote sensing task execution. Its four core components—Central Controller, Toolkits, Solution Space, and Knowledge Space—work in concert, seamlessly interacting and complementing one another to enable robust, adaptive performance across a wide range of applications.

  2. To enhance the agent’s task planning accuracy, we propose an innovative Task-Aware Retrieval method. By retrieving and understanding expert-level task solutions, RS-Agent is able to emulate the decision-making and tool selection processes of professional remote sensing analysts.

  3. To strengthen RS-Agent’s domain-specific knowledge, we propose DualRAG, a retrieval-augmented generation method that assigns weights to extracted keywords and performs dual-path retrieval, thereby enhancing the accuracy and relevance of knowledge retrieval.

  4. Extensive experiments demonstrate that RS-Agent consistently surpasses previous state-of-the-art Multimodal Large Language Models across a range of remote sensing applications and significantly improves task planning accuracy. These results establish RS-Agent as a major step forward in adapting AI agents to the remote sensing field and, for the first time, present a comprehensive and modular architecture tailored for remote sensing applications.
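The keyword weighting and dual-path retrieval named in contribution 3 can be illustrated with a toy example. This is only a minimal sketch under stated assumptions, not the DualRAG implementation: this page does not say what the two paths are, so the sketch assumes one path scores documents by the weights of matched keywords and the other by bag-of-words cosine similarity with the full query. The knowledge base `DOCS`, the function names, and the mixing parameter `alpha` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base; the entries are illustrative only.
DOCS = {
    "sar": "synthetic aperture radar images ships through clouds at night",
    "ndvi": "ndvi measures vegetation health from red and near infrared bands",
    "pan": "pansharpening fuses panchromatic and multispectral images",
}


def tokenize(text):
    """Lowercase whitespace tokenizer, enough for this toy example."""
    return text.lower().split()


def keyword_path(weighted_keywords, doc_tokens):
    """Path 1 (assumed): sum the weights of the keywords found in the document."""
    present = set(doc_tokens)
    return sum(w for kw, w in weighted_keywords.items() if kw in present)


def similarity_path(query_tokens, doc_tokens):
    """Path 2 (assumed): cosine similarity between bag-of-words count vectors."""
    q, d = Counter(query_tokens), Counter(doc_tokens)
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


def dual_path_retrieve(query, weighted_keywords, docs, alpha=0.5):
    """Rank documents by a convex combination of the two path scores."""
    q_tokens = tokenize(query)
    total_weight = sum(weighted_keywords.values()) or 1.0
    scores = {}
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        kw_score = keyword_path(weighted_keywords, tokens) / total_weight
        sim_score = similarity_path(q_tokens, tokens)
        scores[doc_id] = alpha * kw_score + (1 - alpha) * sim_score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Keyword weights would come from the keyword extractor; here they are hand-set.
ranking = dual_path_retrieve(
    "how is vegetation health measured",
    {"vegetation": 0.7, "health": 0.3},
    DOCS,
)
print(ranking[0][0])
```

In this toy setup the NDVI entry ranks first because it matches both weighted keywords and shares the most words with the query. A real system would likely replace both scoring functions with learned embeddings or graph lookups.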

Figure: Schematic diagram of RS-Agent.

RS-Agent employs an LLM to understand the user's requirements.

RS-Agent can utilize multiple tools and engage in multi-turn conversations.

RS-Agent is capable of answering questions in specialized fields.

RS-Agent integrates existing high-performance remote sensing tools. Its Central Controller interprets user intentions and addresses user needs through planning, reasoning, and action. It is also capable of handling professional and technical knowledge in remote sensing.

RS-Agent: Architecture

When ${M}_{c}$ receives query $Q$ and image $I$, it transmits the solution requirement ${r}_{s}$ to ${M}_{s}$. ${M}_{s}$ employs the FAISS algorithm to derive solution guidance ${g}_{s}$, which assists ${M}_{c}$ in selecting the appropriate tools $\hat{T}$ after dispatching the tool requirement ${r}_{t}$ to the tool space $T$. If ${M}_{c}$ requires additional knowledge guidance ${g}_{k}$, ${M}_{k}$ provides it from ${D}_{k}$ according to the knowledge requirement ${k}_{s}$. ${M}_{c}$ then invokes $\hat{T}$ and produces the final answer $A$ along with the processed image $\hat{I}$.
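The data flow above can be read as a small control loop. The following is a minimal sketch under stated assumptions, not the RS-Agent code: every name is hypothetical, the tools are stubs that return canned strings, a brute-force word-overlap match stands in for the similarity search that ${M}_{s}$ performs, and the optional knowledge path (${M}_{k}$, ${g}_{k}$) is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


# Hypothetical stub tools: each maps an image handle to
# (text result, processed image handle).
def detect_objects(image: str) -> Tuple[str, str]:
    return "3 airplanes detected", image + "+boxes"


def classify_scene(image: str) -> Tuple[str, str]:
    return "scene: airport", image


# Tool space T: tool name -> callable.
TOOLS: Dict[str, Callable[[str], Tuple[str, str]]] = {
    "detect_objects": detect_objects,
    "classify_scene": classify_scene,
}

# Solution space M_s: example queries paired with expert tool plans (illustrative).
SOLUTIONS: List[Tuple[str, List[str]]] = [
    ("count the airplanes in this image", ["detect_objects"]),
    ("what kind of scene is this", ["classify_scene"]),
    ("describe the scene and count the airplanes",
     ["classify_scene", "detect_objects"]),
]


def solution_guidance(requirement: str) -> List[str]:
    """Return g_s: the plan of the stored query sharing the most words with r_s."""
    words = set(requirement.lower().split())

    def overlap(entry: Tuple[str, List[str]]) -> int:
        return len(words & set(entry[0].split()))

    return max(SOLUTIONS, key=overlap)[1]


@dataclass
class Answer:
    text: str   # final answer A
    image: str  # processed image I-hat


def central_controller(query: str, image: str) -> Answer:
    """Toy M_c: fetch guidance, select tools from T, invoke them in order."""
    plan = solution_guidance(query)                             # r_s -> M_s -> g_s
    selected = [TOOLS[name] for name in plan if name in TOOLS]  # r_t -> T -> T-hat
    outputs = []
    for tool in selected:
        text, image = tool(image)  # each tool may update the image
        outputs.append(text)
    return Answer("; ".join(outputs), image)


result = central_controller("how many airplanes are in this image", "img001")
print(result.text, result.image)
```

In the described system an LLM plays the role of `central_controller`, using the retrieved guidance as context when deciding which tools to call, rather than executing the retrieved plan verbatim as this sketch does.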


Demo Video

The video below demonstrates how RS-Agent automates remote sensing tasks.

BibTeX


    @misc{xu2024rsagentautomatingremotesensing,
        title={RS-Agent: Automating Remote Sensing Tasks through Intelligent Agents},
        author={Wenjia Xu and Zijian Yu and Yixu Wang and Jiuniu Wang and Mugen Peng},
        year={2024},
        eprint={2406.07089},
        archivePrefix={arXiv},
        primaryClass={cs.CV},
        url={https://arxiv.org/abs/2406.07089},
    }

Acknowledgement

This website is adapted from Nerfies, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. We are thankful to LLaVA, Qwen, DeepSeek, GeoChat and LHRS-Bot for releasing their models and code as open-source contributions.
