AgentD is a daemon that makes a desktop OS accessible to AI agents. It exposes an HTTP API through which AI-driven applications and scripts can interact with the desktop environment.

Features

  • Mouse and Keyboard Control: Simulate mouse movements, clicks, and keyboard inputs.
  • Web Browser Control: Open URLs and interact with web content through a Chromium-based browser.
  • Screen Capture: Take screenshots of your desktop for analysis or record-keeping.
  • Session Recording: Record and replay desktop sessions to capture workflows or for debugging purposes.

Getting Started

To get started with AgentD, follow these simple steps:

  1. Installation:

    • For a quick start, we recommend using one of our pre-configured VMs, which come with AgentD pre-installed. This is the easiest way to get up and running without worrying about dependencies or configuration.
    • Alternatively, to install AgentD on your own Ubuntu VM, use our remote installation script. This suits users who want more control over the installation process or need to integrate AgentD into an existing setup.
    • See the Installation section for details.
  2. Usage:

    • Once AgentD is installed and the VM is launched, you can interact with its desktop through the HTTP API. The API lets you control the mouse and keyboard, open URLs in the browser, capture screenshots, and record sessions.
    • To check that AgentD is running correctly, send a request to the /health endpoint. A successful response indicates that AgentD is ready to accept commands.
  3. API Endpoints:

    • AgentD provides API endpoints for interacting with the desktop. Some of the key endpoints, grouped by feature:
      • Mouse and Keyboard Control: /move_mouse, /click, /type_text, etc.
      • Web Browser Control: /open_url
      • Screen Capture: /screenshot
      • Session Recording: /recordings, /recordings/{session_id}/stop, etc.
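
As a rough illustration of the health check described under Usage, the sketch below polls /health using only the Python standard library. Several details are assumptions rather than documented behavior: the base URL http://localhost:8000 is a placeholder for wherever your AgentD VM is reachable, the request is assumed to be a plain GET, and "successful" is taken to mean any 2xx status. The helper names endpoint_url and is_agentd_ready are hypothetical and not part of AgentD.

```python
import urllib.error
import urllib.request

# Assumption: placeholder address; replace with the host and port of your AgentD VM.
BASE_URL = "http://localhost:8000"


def endpoint_url(base_url: str, path: str) -> str:
    """Join a base URL and an endpoint path with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def is_agentd_ready(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Return True if /health answers with a 2xx status (assumed to be a GET)."""
    url = endpoint_url(base_url, "/health")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and non-2xx responses (HTTPError).
        return False


if __name__ == "__main__":
    print("AgentD ready:", is_agentd_ready())
```

The same endpoint_url helper could build URLs for the other endpoints listed above, but their HTTP methods and request bodies are not given here, so consult the API documentation before calling them.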

For more detailed information on how to use AgentD and its API, please refer to the full API documentation and examples provided in our GitHub repository.