RS-Agent: Automating Remote Sensing Tasks through Intelligent Agents

🏆 Contributions

We present RS-Agent, a novel architecture designed to interpret user queries and orchestrate diverse tools for accurate and efficient remote sensing task execution. Its four core components (Central Controller, Toolkits, Solution Space, and Knowledge Space) complement one another to enable robust, adaptive performance across a wide range of applications.

To enhance the agent’s task planning accuracy, we propose an innovative Task-Aware Retrieval method. By retrieving and understanding expert-level task solutions, RS-Agent is able to emulate the decision-making and tool selection processes of professional remote sensing analysts.

To strengthen RS-Agent’s domain-specific knowledge, we propose DualRAG, a retrieval-augmented generation method that assigns weights to extracted keywords and performs dual-path retrieval, thereby enhancing the accuracy and relevance of knowledge retrieval.

Extensive experiments demonstrate that RS-Agent consistently surpasses previous state-of-the-art (SOTA) Multimodal Large Language Models across a range of remote sensing applications and significantly improves task planning accuracy. These results establish RS-Agent as a major step forward in adapting AI agents to the remote sensing field and as the first comprehensive, modular architecture tailored for remote sensing applications.
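The DualRAG idea described above (weighted keywords plus dual-path retrieval) can be illustrated with a minimal sketch. This is not the authors' implementation: the two paths chosen here (one scoring documents against the full query, one scoring them against weighted keywords), the bag-of-words cosine similarity, the mixing factor `alpha`, and the function name `dual_path_retrieve` are all assumptions made for illustration. Keywords are assumed to be single lowercase tokens that were already extracted and weighted upstream.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace, stripping simple punctuation."""
    return text.lower().replace(",", " ").replace(".", " ").split()


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dual_path_retrieve(query, weighted_keywords, docs, top_k=2, alpha=0.5):
    """Rank docs by a blend of two retrieval paths (hypothetical design).

    Path 1 compares each document with the whole query.
    Path 2 sums the weights of the keywords that occur in the document.
    """
    q_vec = Counter(tokenize(query))
    total_weight = sum(weighted_keywords.values()) or 1.0
    scored = []
    for doc in docs:
        d_vec = Counter(tokenize(doc))
        query_score = cosine(q_vec, d_vec)
        keyword_score = sum(
            w for kw, w in weighted_keywords.items() if kw in d_vec
        ) / total_weight
        scored.append((alpha * query_score + (1 - alpha) * keyword_score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


docs = [
    "SAR imagery uses microwave backscatter and works through cloud cover",
    "Optical sensors record reflected sunlight in visible and infrared bands",
    "Object detection models locate ships and aircraft in images",
]
best = dual_path_retrieve(
    "Why can SAR see through cloud cover", {"sar": 0.7, "cloud": 0.3}, docs, top_k=1
)
```

In this toy corpus the first document wins on both paths, so `best` contains only the SAR passage. A real system would presumably replace the bag-of-words scoring with dense embeddings and a vector index.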

The schematic diagram of the RS-Agent.

RS-Agent employs an LLM to understand the user's requirements.

RS-Agent can utilize multiple tools and engage in multi-turn conversations.

RS-Agent is capable of answering questions in specialized fields.

RS-Agent integrates existing high-performance remote sensing tools. Its Central Controller understands user intent and addresses user needs through planning, reasoning, and action. It is also capable of handling professional and technical knowledge in remote sensing.

RS-Agent: Architecture

When the Central Controller ${M}_{c}$ receives query $Q$ and image $I$, it transmits the solution requirement ${r}_{s}$ to the Solution Space ${M}_{s}$. ${M}_{s}$ employs FAISS similarity search to derive solution guidance ${g}_{s}$, which assists ${M}_{c}$ in selecting the appropriate tools $\hat{T}$ after dispatching the tool requirement ${r}_{t}$ to the tool space $T$. If ${M}_{c}$ requires additional knowledge guidance ${g}_{k}$, the Knowledge Space ${M}_{k}$ provides it from ${D}_{k}$ according to the knowledge requirement ${k}_{s}$. ${M}_{c}$ then invokes $\hat{T}$ and produces the final answer $A$ along with the processed image $\hat{I}$.
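The control flow in the paragraph above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the released RS-Agent code: the class name `RSAgentSketch`, its methods, and the example tools are hypothetical. The query itself stands in for the requirements that the controller would normally derive with an LLM, and simple substring lookups stand in for the vector similarity search.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# A tool maps an image handle to (text result, processed image handle).
Tool = Callable[[str], Tuple[str, str]]


@dataclass
class RSAgentSketch:
    solution_space: Dict[str, List[str]]  # stands in for M_s: task -> tool names
    knowledge_base: Dict[str, str]        # stands in for D_k: topic -> snippet
    toolkit: Dict[str, Tool]              # stands in for T: tool name -> callable

    def solution_guidance(self, r_s: str) -> List[str]:
        """Return g_s, an ordered list of tool names, for requirement r_s."""
        for task, tool_names in self.solution_space.items():
            if task in r_s.lower():
                return tool_names
        return []

    def knowledge_guidance(self, requirement: str) -> Optional[str]:
        """Return g_k if any stored topic is mentioned in the requirement."""
        for topic, snippet in self.knowledge_base.items():
            if topic in requirement.lower():
                return snippet
        return None

    def run(self, query: str, image: str) -> Tuple[str, str]:
        """Play the role of M_c: plan, select tools, invoke them, answer."""
        g_s = self.solution_guidance(query)
        selected = [self.toolkit[n] for n in g_s if n in self.toolkit]  # T_hat
        g_k = self.knowledge_guidance(query)
        outputs = []
        for tool in selected:
            text, image = tool(image)  # each tool may update the image
            outputs.append(text)
        if g_k is not None:
            outputs.append(g_k)
        answer = " ".join(outputs) if outputs else "No suitable tool or knowledge found."
        return answer, image  # (A, I_hat)


agent = RSAgentSketch(
    solution_space={"count": ["detect", "count"]},
    knowledge_base={"sar": "SAR is an active microwave sensor."},
    toolkit={
        "detect": lambda img: ("Detected 3 ships.", img + "+boxes"),
        "count": lambda img: ("Count: 3.", img),
    },
)
answer, processed = agent.run("Count the ships in this SAR image", "scene.png")
```

Here the query matches the stored "count" task, so the detector and counter run in order, and the "sar" topic adds one knowledge snippet to the answer. Keeping the three stores as plain dictionaries is only a convenience for the sketch; the point is the separation between planning guidance, domain knowledge, and executable tools.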

Demo Video

Here is a demonstration of the RS-Agent in action.

Watch the video below to see how RS-Agent automates remote sensing tasks.

BibTeX


    @misc{xu2024rsagentautomatingremotesensing,
        title={RS-Agent: Automating Remote Sensing Tasks through Intelligent Agents},
        author={Wenjia Xu and Zijian Yu and Yixu Wang and Jiuniu Wang and Mugen Peng},
        year={2024},
        eprint={2406.07089},
        archivePrefix={arXiv},
        primaryClass={cs.CV},
        url={https://arxiv.org/abs/2406.07089},
    }

Acknowledgement

This website is adapted from Nerfies, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. We are thankful to LLaVA, Qwen, DeepSeek, GeoChat and LHRS-Bot for releasing their models and code as open-source contributions.