Introduction
MLLM is a library that simplifies communication with Large Language Models (LLMs) and multi-modal LLMs (MLLMs) such as OpenAI GPT, Anthropic Claude, and Google Gemini.
It lets you:
- Create a router to communicate with multiple models (LLMs/MLLMs)
- Configure your preference order for the models
- Communicate with the models by sending your message thread to the models’ completion endpoints
- Retry communication upon failure
- Enforce expected response formats
MLLM uses LiteLLM under the hood.
Installation
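MLLM can be installed from PyPI (assuming the package is published under the same name as the library):

```bash
pip install mllm
```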
Basic Example
Provide the API keys for the different models (LLMs / MLLMs) that you would like to use:
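A minimal sketch using environment variables; the variable names follow LiteLLM's conventions, and the values are placeholders to replace with real keys:

```python
import os

# Set keys only for the providers you plan to route to.
os.environ["OPENAI_API_KEY"] = "sk-..."        # placeholder
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..." # placeholder
os.environ["GEMINI_API_KEY"] = "..."           # placeholder
```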
Create a Router object that can access all the desired models:
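A minimal sketch; the `preference` list below uses illustrative LiteLLM-style model identifiers:

```python
from mllm import Router

# Models are tried in this order of preference.
router = Router(
    preference=["gpt-4-turbo", "anthropic/claude-3-opus-20240229", "gemini/gemini-pro"]
)
```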
Create a message thread using ThreadMem. MLLM also re-exports `RoleMessage` and `RoleThread` from ThreadMem, so you can create the thread directly through MLLM:
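For example (the owner ID and image payload below are placeholders):

```python
from mllm import RoleThread

thread = RoleThread(owner_id="user@example.com")  # hypothetical owner ID
thread.post(
    role="user",
    msg="Describe the image",
    images=["data:image/jpeg;base64,..."],  # placeholder base64 payload
)
```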
Now, use the router to chat with the models by sending it a message thread. The router tries your most preferred model first and falls back to less preferred models only if it fails to respond.
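A sketch of the call, assuming the response carries the model's reply as a message you can append back to the thread:

```python
response = router.chat(thread)

# Append the model's reply so the thread stays up to date.
thread.add_msg(response.msg)
```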
Retry communication with model
The router retries communication upon failure, and you can configure the maximum number of retries allowed.
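For example, assuming a `max_retries` keyword on `router.chat()`:

```python
# Allow up to three attempts before falling back to the next model.
response = router.chat(thread, max_retries=3)
```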
Enforce response format
To enforce an expected response format, first create a class describing the response format:
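For example, as a Pydantic model (the fields here are illustrative):

```python
from pydantic import BaseModel

class Animal(BaseModel):
    species: str
    color: str
```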
Now, use the `expect` parameter in the `router.chat()` call:
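A sketch, assuming the validated object is exposed on the response:

```python
response = router.chat(thread, expect=Animal)

# The reply, parsed and validated against the Animal schema.
animal = response.parsed
```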
This is handy when you’re working with multiple models and want to ensure consistent outputs.