AgentDesk provides full-featured desktop environments that AI agents can control programmatically.

Features

  • Built on AgentD – a runtime daemon which exposes a REST API for interacting with the desktop.

  • Implements the DeviceBay Protocol.

  • Provides a CLI and a Python library.

  • The Desktops can be run locally or in the cloud.

Motivation

Why do we want this? Simple. APIs are not always available, and they can be expensive to use. Agents that can use GUIs with ease have a major advantage when operating mobile phones, desktops, and SaaS applications: they can work with these interfaces just as a human would.

GUI navigation makes any program accessible and programmable to an agent, which opens up the potential to gather information, automate complex, open-ended tasks, and control your desktop. Most work in this area currently focuses on helping agents operate in browsers, but many apps aren’t available on the web.

That’s why we created AgentDesk. It lets you run VMs locally or in the cloud and control them through a Python SDK and a CLI, which gives you a solid foundation for building advanced GUI-controlling agents.

Check out an example of a complex GUI-based agent here. Read on to learn how to use AgentDesk.

Installation

pip install agentdesk

If you run local VMs, you need QEMU.

You also need Docker Desktop to run the containers that provide the desktop GUI.
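
Before creating a desktop, it can help to confirm that both prerequisites are on your PATH. The sketch below is not part of AgentDesk; it only uses the Python standard library. The executable names `qemu-system-x86_64` and `docker` are assumptions and may differ on your platform or architecture.

```python
import shutil
from typing import Iterable, List


def missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the executables from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Assumed executable names; adjust them for your OS and CPU architecture.
REQUIRED_TOOLS = ["qemu-system-x86_64", "docker"]

missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("QEMU and Docker were both found on PATH.")
```

A found executable only shows that the tool is installed, not that the Docker daemon is actually running.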

Quick Start: local run

from agentdesk import Desktop

# Create a local VM
desktop = Desktop.local()

# Launch the UI for it
desktop.view(background=True)

# Open a browser to Google
desktop.open_url("https://google.com")

# Take actions on the desktop
desktop.move_mouse(500, 500)
desktop.click()
img = desktop.take_screenshot()
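
The move, click, and screenshot calls above tend to appear together in agent code, so one option is to wrap them in a small helper. This is a minimal sketch that assumes only the three method names shown in the quick start. `DesktopLike`, `click_and_capture`, and `RecordingDesktop` are hypothetical names introduced for illustration, and the stand-in class lets the example run without a VM.

```python
from typing import Any, List, Protocol, Tuple


class DesktopLike(Protocol):
    """The subset of desktop methods this helper relies on."""

    def move_mouse(self, x: int, y: int) -> Any: ...
    def click(self) -> Any: ...
    def take_screenshot(self) -> Any: ...


def click_and_capture(desktop: DesktopLike, x: int, y: int) -> Any:
    """Move the pointer to (x, y), click, and return a screenshot taken afterwards."""
    desktop.move_mouse(x, y)
    desktop.click()
    return desktop.take_screenshot()


class RecordingDesktop:
    """Hypothetical stand-in that records calls instead of driving a real desktop."""

    def __init__(self) -> None:
        self.calls: List[Tuple[Any, ...]] = []

    def move_mouse(self, x: int, y: int) -> None:
        self.calls.append(("move_mouse", x, y))

    def click(self) -> None:
        self.calls.append(("click",))

    def take_screenshot(self) -> str:
        self.calls.append(("take_screenshot",))
        return "fake-screenshot"


fake = RecordingDesktop()
result = click_and_capture(fake, 500, 500)
print(fake.calls)
```

With AgentDesk installed, the same helper should accept the object returned by `Desktop.local()` in place of the stand-in, since it exposes the same three methods.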

Running in GCP and in AWS

# Create a desktop on Google Compute Engine (GCP)
desktop = Desktop.gce()
# Or create one on AWS instead
desktop = Desktop.aws()
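
If the same script should run against a local VM during development and a cloud VM later, the choice of constructor can be driven by a single string. The sketch below assumes only the `local()`, `gce()`, and `aws()` constructors shown in this document, called without arguments. `create_desktop` and `StubDesktop` are hypothetical names, and the stub exists only so the example runs without provisioning anything.

```python
from typing import Any

# Constructor names taken from the examples in this document.
SUPPORTED_PROVIDERS = ("local", "gce", "aws")


def create_desktop(desktop_cls: Any, provider: str = "local") -> Any:
    """Call the constructor on `desktop_cls` that matches `provider`."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(
            f"Unknown provider {provider!r}; expected one of {SUPPORTED_PROVIDERS}"
        )
    return getattr(desktop_cls, provider)()


class StubDesktop:
    """Hypothetical stand-in for the Desktop class that creates no VM."""

    def __init__(self, provider: str) -> None:
        self.provider = provider

    @classmethod
    def local(cls) -> "StubDesktop":
        return cls("local")

    @classmethod
    def gce(cls) -> "StubDesktop":
        return cls("gce")

    @classmethod
    def aws(cls) -> "StubDesktop":
        return cls("aws")


desktop = create_desktop(StubDesktop, "gce")
print(desktop.provider)
```

In real use you would pass the `Desktop` class imported from `agentdesk` instead of the stub, for example `create_desktop(Desktop, "aws")`.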

Explore Further