Why do we want this? Simple. APIs are not always available, and they can be incredibly expensive to use. Agents that can use GUIs with ease have a massive advantage when operating mobile phones, desktops, and SaaS applications. They can work with any application just like a human would.

GUI navigation makes any program accessible and programmable to an agent, which offers tremendous potential to gather information, automate complex, open-ended tasks, and control your desktop. Almost all the work in this area is currently focused on helping agents work in browsers, but many apps aren’t available on the web.

That’s why we created AgentDesk. It allows you to run VMs locally and in the cloud, and to control them using a Python SDK and CLI. This gives you a solid foundation for advanced GUI-controlling agents.

Check out an example of a complex GUI-based agent here. Read on to learn how to use AgentDesk.
pip install agentdesk

If you run local VMs, you need Docker to run the containers that provide the desktop GUI. You also need QEMU if you are creating QEMU desktops instead of Docker desktops.
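
Before creating a local desktop, it can help to confirm that these prerequisites are on your PATH. The following is a minimal sketch and not part of AgentDesk: `missing_tools` is a hypothetical helper, and the executable names `docker` and `qemu-system-x86_64` are assumptions about a typical install (AgentDesk may look for a different QEMU binary on your platform).

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names that cannot be found on PATH.

    `which` is injectable so the check can be exercised without touching
    the real environment.
    """
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    # Assumed executable names; adjust them to match your setup.
    required = ["docker", "qemu-system-x86_64"]
    missing = missing_tools(required)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

If you only use Docker desktops, drop the QEMU entry from the list.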
from agentdesk import Desktop

# Create a local VM
desktop = Desktop.local()

# Launch the UI for it
desktop.view(background=True)

# Open a browser to Google
desktop.open_url("https://google.com")

# Take actions on the desktop
desktop.move_mouse(500, 500)
desktop.click()
img = desktop.take_screenshot()
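
The calls above compose naturally into a small action-replay helper, which is roughly the shape of an agent's act-then-observe loop. The sketch below is an illustration and not part of AgentDesk: `DesktopLike`, `RecordingDesktop`, `click_at`, and `replay` are hypothetical names, and the code assumes only the `open_url`, `move_mouse`, `click`, and `take_screenshot` methods shown in the example. The recording stand-in lets you exercise the logic without starting a VM.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, List, Protocol, Tuple


class DesktopLike(Protocol):
    """Hypothetical interface: only the methods used in the example above."""

    def open_url(self, url: str) -> Any: ...
    def move_mouse(self, x: int, y: int) -> Any: ...
    def click(self) -> Any: ...
    def take_screenshot(self) -> Any: ...


@dataclass
class RecordingDesktop:
    """Stand-in that logs calls instead of driving a VM (illustration only)."""

    calls: List[Tuple[Any, ...]] = field(default_factory=list)

    def open_url(self, url: str) -> None:
        self.calls.append(("open_url", url))

    def move_mouse(self, x: int, y: int) -> None:
        self.calls.append(("move_mouse", x, y))

    def click(self) -> None:
        self.calls.append(("click",))

    def take_screenshot(self) -> str:
        self.calls.append(("take_screenshot",))
        # A real desktop would return image data; this returns a label.
        return f"screenshot-{len(self.calls)}"


def click_at(desktop: DesktopLike, x: int, y: int) -> Any:
    """Move to (x, y), click, and return a screenshot of the result."""
    desktop.move_mouse(x, y)
    desktop.click()
    return desktop.take_screenshot()


def replay(
    desktop: DesktopLike, url: str, points: Iterable[Tuple[int, int]]
) -> List[Any]:
    """Open `url`, click each point in order, and collect one screenshot per click."""
    desktop.open_url(url)
    return [click_at(desktop, x, y) for x, y in points]
```

Assuming a real desktop object exposes these methods as in the example, passing `Desktop.local()` in place of `RecordingDesktop()` should drive an actual VM. An agent would then inspect each returned screenshot to choose its next action.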