
VertexAI [Anthropic, Gemini, Model Garden]

Overview​

| Property | Details |
|----------|---------|
| Description | Vertex AI is a fully-managed AI development platform for building and using generative AI. |
| Provider Route on LiteLLM | vertex_ai/ |
| Link to Provider Doc | Vertex AI ↗ |
| Base URL | https://{vertex_location}-aiplatform.googleapis.com/ |
| Supported Operations | /chat/completions, /completions, /embeddings, /audio/speech, /fine_tuning, /batches, /files, /images |



vertex_ai/ route​

The vertex_ai/ route uses Vertex AI's REST API.

from litellm import completion
import json

## GET CREDENTIALS
## RUN ##
# !gcloud auth application-default login - run this to add vertex credentials to your env
## OR ##
file_path = 'path/to/vertex_ai_service_account.json'

# Load the JSON file
with open(file_path, 'r') as file:
    vertex_credentials = json.load(file)

# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)

## COMPLETION CALL
response = completion(
model="vertex_ai/gemini-pro",
messages=[{ "content": "Hello, how are you?","role": "user"}],
vertex_credentials=vertex_credentials_json
)
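
The credential-loading steps above recur in several examples on this page. A minimal sketch of wrapping them in one helper follows; the name load_vertex_credentials is hypothetical and not part of LiteLLM, and it only assumes the service-account file is valid JSON.

```python
import json
from pathlib import Path


def load_vertex_credentials(path):
    """Read a service-account JSON file and return its contents as a JSON string.

    Hypothetical helper for illustration; it mirrors the load + json.dumps
    steps shown in the examples on this page.
    """
    with Path(path).open("r", encoding="utf-8") as fh:
        credentials = json.load(fh)  # raises json.JSONDecodeError if the file is malformed
    return json.dumps(credentials)
```

The returned string can then be passed as vertex_credentials in a completion call.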

System Message​

from litellm import completion
import json

## GET CREDENTIALS
file_path = 'path/to/vertex_ai_service_account.json'

# Load the JSON file
with open(file_path, 'r') as file:
    vertex_credentials = json.load(file)

# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)


response = completion(
model="vertex_ai/gemini-pro",
messages=[{"content": "You are a good bot.","role": "system"}, {"content": "Hello, how are you?","role": "user"}],
vertex_credentials=vertex_credentials_json
)

Function Calling​

Force Gemini to make tool calls with tool_choice="required".

from litellm import completion
import json

## GET CREDENTIALS
file_path = 'path/to/vertex_ai_service_account.json'

# Load the JSON file
with open(file_path, 'r') as file:
    vertex_credentials = json.load(file)

# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)


messages = [
{
"role": "system",
"content": "Your name is Litellm Bot, you are a helpful assistant",
},
# User asks for their name and weather in San Francisco
{
"role": "user",
"content": "Hello, what is your name and can you tell me the weather?",
},
]

tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
}
},
"required": ["location"],
},
},
}
]

data = {
"model": "vertex_ai/gemini-1.5-pro-preview-0514",
"messages": messages,
"tools": tools,
"tool_choice": "required",
"vertex_credentials": vertex_credentials_json
}

## COMPLETION CALL
print(completion(**data))

JSON Schema​

From v1.40.1+ LiteLLM supports sending response_schema as a param for Gemini-1.5-Pro on Vertex AI. For other models (e.g. gemini-1.5-flash or claude-3-5-sonnet), LiteLLM adds the schema to the message list with a user-controlled prompt.

Response Schema

from litellm import completion 
import json

## SETUP ENVIRONMENT
# !gcloud auth application-default login - run this to add vertex credentials to your env

messages = [
{
"role": "user",
"content": "List 5 popular cookie recipes."
}
]

response_schema = {
"type": "array",
"items": {
"type": "object",
"properties": {
"recipe_name": {
"type": "string",
},
},
"required": ["recipe_name"],
},
}


response = completion(
    model="vertex_ai/gemini-1.5-pro",
    messages=messages,
    response_format={"type": "json_object", "response_schema": response_schema} # 👈 KEY CHANGE
)

print(json.loads(response.choices[0].message.content))

Validate Schema

To validate the response against the response_schema, set "enforce_validation": True in response_format.

from litellm import completion, JSONSchemaValidationError

try:
    completion(
        model="vertex_ai/gemini-1.5-pro",
        messages=messages,
        response_format={
            "type": "json_object",
            "response_schema": response_schema,
            "enforce_validation": True # 👈 KEY CHANGE
        }
    )
except JSONSchemaValidationError as e:
    print("Raw Response: {}".format(e.raw_response))
    raise e

LiteLLM will validate the response against the schema, and raise a JSONSchemaValidationError if the response does not match the schema.

JSONSchemaValidationError inherits from openai.APIError

Access the raw response with e.raw_response
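
To make the idea of schema validation concrete, here is a minimal sketch of checking a raw JSON response against a schema like the one above. It is not LiteLLM's implementation: SchemaMismatch and check_against_schema are hypothetical names, and only a small subset of JSON Schema (type, items, properties, required) is handled.

```python
import json


class SchemaMismatch(ValueError):
    """Hypothetical error type for this sketch (not a LiteLLM class)."""


def check_against_schema(raw_content, schema):
    """Parse raw_content as JSON and check it against a small JSON Schema subset."""

    def _check(value, node, path):
        expected = node.get("type")
        python_type = {"array": list, "object": dict, "string": str}.get(expected)
        if python_type is not None and not isinstance(value, python_type):
            raise SchemaMismatch(f"{path}: expected {expected}")
        if expected == "array" and "items" in node:
            for i, item in enumerate(value):
                _check(item, node["items"], f"{path}[{i}]")
        if expected == "object":
            for key in node.get("required", []):
                if key not in value:
                    raise SchemaMismatch(f"{path}: missing required key {key!r}")
            for key, sub_schema in node.get("properties", {}).items():
                if key in value:
                    _check(value[key], sub_schema, f"{path}.{key}")

    data = json.loads(raw_content)
    _check(data, schema, "$")
    return data
```

With the recipe schema from this section, a response such as [{"recipe_name": "Snickerdoodle"}] passes, while an item missing recipe_name raises SchemaMismatch.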

Add to prompt yourself

from litellm import completion
import json

## GET CREDENTIALS
file_path = 'path/to/vertex_ai_service_account.json'

# Load the JSON file
with open(file_path, 'r') as file:
    vertex_credentials = json.load(file)

# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)

messages = [
{
"role": "user",
"content": """
List 5 popular cookie recipes.

Using this JSON schema:

Recipe = {"recipe_name": str}

Return a `list[Recipe]`
"""
}
]

completion(model="vertex_ai/gemini-1.5-flash-preview-0514", messages=messages, response_format={ "type": "json_object" })

Grounding​

Add Google Search grounding to Vertex AI calls.

Relevant VertexAI Docs

See the grounding metadata with response_obj._hidden_params["vertex_ai_grounding_metadata"]

from litellm import completion 

## SETUP ENVIRONMENT
# !gcloud auth application-default login - run this to add vertex credentials to your env

tools = [{"googleSearchRetrieval": {}}] # 👈 ADD GOOGLE SEARCH

resp = completion(
model="vertex_ai/gemini-1.0-pro-001",
messages=[{"role": "user", "content": "Who won the world cup?"}],
tools=tools,
)

print(resp)
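
A minimal sketch of reading the grounding metadata defensively, assuming the response exposes a dict-like _hidden_params attribute as described above. The helper name get_grounding_metadata is hypothetical.

```python
def get_grounding_metadata(response):
    """Return the Vertex AI grounding metadata from a response, or None if absent.

    Assumes `response._hidden_params` is a dict-like object, as in the
    access pattern shown in this section.
    """
    hidden = getattr(response, "_hidden_params", None) or {}
    return hidden.get("vertex_ai_grounding_metadata")
```

Returning None instead of raising keeps the caller simple when a call was made without the grounding tool.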

Moving from Vertex AI SDK to LiteLLM (GROUNDING)​

If this was your initial VertexAI Grounding code,

import vertexai
from vertexai.generative_models import GenerationConfig, GenerativeModel, Tool, grounding

vertexai.init(project=project_id, location="us-central1")

model = GenerativeModel("gemini-1.5-flash-001")

# Use Google Search for grounding
tool = Tool.from_google_search_retrieval(grounding.GoogleSearchRetrieval(disable_attributon=False))

prompt = "When is the next total solar eclipse in US?"
response = model.generate_content(
prompt,
tools=[tool],
generation_config=GenerationConfig(
temperature=0.0,
),
)

print(response)

then, this is what it looks like now

from litellm import completion 


# !gcloud auth application-default login - run this to add vertex credentials to your env

tools = [{"googleSearchRetrieval": {"disable_attributon": False}}] # 👈 ADD GOOGLE SEARCH

resp = completion(
model="vertex_ai/gemini-1.0-pro-001",
messages=[{"role": "user", "content": "Who won the world cup?"}],
tools=tools,
vertex_project="project-id"
)

print(resp)

Context Caching​

Vertex AI context caching is supported by calling the provider API directly. (Unified endpoint support coming soon.)

Go straight to provider

Pre-requisites​

  • pip install google-cloud-aiplatform (pre-installed on proxy docker image)

  • Authentication:

    • run gcloud auth application-default login See Google Cloud Docs

    • Alternatively you can set GOOGLE_APPLICATION_CREDENTIALS

      Here's how: Jump to Code

      • Create a service account on GCP
      • Export the credentials as a json
      • load the json and serialize it to a string with json.dumps
      • store the json string in your environment as GOOGLE_APPLICATION_CREDENTIALS

Sample Usage​

import litellm
litellm.vertex_project = "hardy-device-38811" # Your Project ID
litellm.vertex_location = "us-central1" # proj location

response = litellm.completion(model="gemini-pro", messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}])

Usage with LiteLLM Proxy Server​

Here's how to use Vertex AI with the LiteLLM Proxy Server

  1. Modify the config.yaml

    Use this when you need to set a different location for each vertex model

    model_list:
      - model_name: gemini-vision
        litellm_params:
          model: vertex_ai/gemini-1.0-pro-vision-001
          vertex_project: "project-id"
          vertex_location: "us-central1"
      - model_name: gemini-vision
        litellm_params:
          model: vertex_ai/gemini-1.0-pro-vision-001
          vertex_project: "project-id2"
          vertex_location: "us-east"
  2. Start the proxy

    $ litellm --config /path/to/config.yaml
  3. Send Request to LiteLLM Proxy Server

    import openai
    client = openai.OpenAI(
    api_key="sk-1234", # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000" # litellm-proxy-base url
    )

    response = client.chat.completions.create(
    model="gemini-vision", # model_name from config.yaml
    messages = [
    {
    "role": "user",
    "content": "what llm are you"
    }
    ],
    )

    print(response)

Authentication - vertex_project, vertex_location, etc.​

Set your vertex credentials via:

  • dynamic params OR
  • env vars

Dynamic Params​

You can set:

  • vertex_credentials (str) - can be a json string or filepath to your vertex ai service account.json
  • vertex_location (str) - place where vertex model is deployed (us-central1, asia-southeast1, etc.)
  • vertex_project Optional[str] - use if vertex project different from the one in vertex_credentials

as dynamic params for a litellm.completion call.

from litellm import completion
import json

## GET CREDENTIALS
file_path = 'path/to/vertex_ai_service_account.json'

# Load the JSON file
with open(file_path, 'r') as file:
    vertex_credentials = json.load(file)

# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)


response = completion(
model="vertex_ai/gemini-pro",
messages=[{"content": "You are a good bot.","role": "system"}, {"content": "Hello, how are you?","role": "user"}],
vertex_credentials=vertex_credentials_json,
vertex_project="my-special-project",
vertex_location="my-special-location"
)

Environment Variables​

You can set:

  • GOOGLE_APPLICATION_CREDENTIALS - store the filepath for your service_account.json in here (used by vertex sdk directly).
  • VERTEXAI_LOCATION - place where vertex model is deployed (us-central1, asia-southeast1, etc.)
  • VERTEXAI_PROJECT - Optional[str] - use if vertex project different from the one in vertex_credentials
  1. GOOGLE_APPLICATION_CREDENTIALS
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service_account.json"
  2. VERTEXAI_LOCATION
export VERTEXAI_LOCATION="us-central1" # can be any vertex location
  3. VERTEXAI_PROJECT
export VERTEXAI_PROJECT="my-test-project" # ONLY use if model project is different from service account project
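
The environment variables and the dynamic params name the same settings. As a rough sketch of that correspondence, the hypothetical helper below (not part of LiteLLM) reads VERTEXAI_PROJECT and VERTEXAI_LOCATION and returns them under the dynamic-param names, skipping any that are unset.

```python
import os

# Dynamic param name -> environment variable name, as listed in this section.
ENV_FOR_PARAM = {
    "vertex_project": "VERTEXAI_PROJECT",
    "vertex_location": "VERTEXAI_LOCATION",
}


def vertex_params_from_env(environ=None):
    """Collect Vertex settings from the environment as dynamic-param kwargs."""
    environ = os.environ if environ is None else environ
    return {
        param: environ[name]
        for param, name in ENV_FOR_PARAM.items()
        if environ.get(name)  # skip unset or empty values
    }
```

The resulting dict could be unpacked into a completion call, for example completion(model=..., messages=..., **vertex_params_from_env()).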

Specifying Safety Settings​

In certain use cases you may need to call the models with safety settings different from the defaults. To do so, simply pass the safety_settings argument to completion or acompletion. For example:

Set per model/request​

from litellm import completion

response = completion(
    model="vertex_ai/gemini-pro",
    messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}],
    safety_settings=[
        {
            "category": "HARM_CATEGORY_HARASSMENT",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_HATE_SPEECH",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
            "threshold": "BLOCK_NONE",
        },
    ]
)
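
Since the list repeats one entry per harm category, it can be generated. The sketch below uses a hypothetical build_safety_settings helper (not a LiteLLM function); the category names are the four shown above, and the default threshold matches the example.

```python
HARM_CATEGORIES = (
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
)


def build_safety_settings(threshold="BLOCK_NONE", overrides=None):
    """Build a safety_settings list with one entry per harm category.

    `overrides` optionally maps a category name to a different threshold.
    """
    overrides = overrides or {}
    return [
        {"category": category, "threshold": overrides.get(category, threshold)}
        for category in HARM_CATEGORIES
    ]
```

The result can be passed as safety_settings per request or assigned to litellm.vertex_ai_safety_settings.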

Set Globally​

import litellm
from litellm import completion

litellm.set_verbose = True # 👈 See RAW REQUEST/RESPONSE

litellm.vertex_ai_safety_settings = [
{
"category": "HARM_CATEGORY_HARASSMENT",
"threshold": "BLOCK_NONE",
},
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"threshold": "BLOCK_NONE",
},
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"threshold": "BLOCK_NONE",
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"threshold": "BLOCK_NONE",
},
]
response = completion(
model="vertex_ai/gemini-pro",
messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}]
)

Set Vertex Project & Vertex Location​

All calls using Vertex AI require the following parameters:

  • Your Project ID
import os, litellm 

# set via env var
os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811" # Your Project ID

### OR ###

# set directly on module
litellm.vertex_project = "hardy-device-38811" # Your Project ID
  • Your Project Location
import os, litellm 

# set via env var
os.environ["VERTEXAI_LOCATION"] = "us-central1" # Your Location

### OR ###

# set directly on module
litellm.vertex_location = "us-central1" # Your Location

Anthropic​

| Model Name | Function Call |
|------------|---------------|
| claude-3-opus@20240229 | completion('vertex_ai/claude-3-opus@20240229', messages) |
| claude-3-5-sonnet@20240620 | completion('vertex_ai/claude-3-5-sonnet@20240620', messages) |
| claude-3-sonnet@20240229 | completion('vertex_ai/claude-3-sonnet@20240229', messages) |
| claude-3-haiku@20240307 | completion('vertex_ai/claude-3-haiku@20240307', messages) |

Usage​

from litellm import completion
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ""

model = "claude-3-sonnet@20240229"

vertex_ai_project = "your-vertex-project" # can also set this as os.environ["VERTEXAI_PROJECT"]
vertex_ai_location = "your-vertex-location" # can also set this as os.environ["VERTEXAI_LOCATION"]

response = completion(
model="vertex_ai/" + model,
messages=[{"role": "user", "content": "hi"}],
temperature=0.7,
vertex_ai_project=vertex_ai_project,
vertex_ai_location=vertex_ai_location,
)
print("\nModel Response", response)

Llama 3 API​

| Model Name | Function Call |
|------------|---------------|
| meta/llama3-405b-instruct-maas | completion('vertex_ai/meta/llama3-405b-instruct-maas', messages) |

Usage​

from litellm import completion
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ""

model = "meta/llama3-405b-instruct-maas"

vertex_ai_project = "your-vertex-project" # can also set this as os.environ["VERTEXAI_PROJECT"]
vertex_ai_location = "your-vertex-location" # can also set this as os.environ["VERTEXAI_LOCATION"]

response = completion(
model="vertex_ai/" + model,
messages=[{"role": "user", "content": "hi"}],
vertex_ai_project=vertex_ai_project,
vertex_ai_location=vertex_ai_location,
)
print("\nModel Response", response)

Mistral API​

Supported OpenAI Params

| Model Name | Function Call |
|------------|---------------|
| mistral-large@latest | completion('vertex_ai/mistral-large@latest', messages) |
| mistral-large@2407 | completion('vertex_ai/mistral-large@2407', messages) |
| mistral-nemo@latest | completion('vertex_ai/mistral-nemo@latest', messages) |
| codestral@latest | completion('vertex_ai/codestral@latest', messages) |
| codestral@2405 | completion('vertex_ai/codestral@2405', messages) |

Usage​

from litellm import completion
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ""

model = "mistral-large@2407"

vertex_ai_project = "your-vertex-project" # can also set this as os.environ["VERTEXAI_PROJECT"]
vertex_ai_location = "your-vertex-location" # can also set this as os.environ["VERTEXAI_LOCATION"]

response = completion(
model="vertex_ai/" + model,
messages=[{"role": "user", "content": "hi"}],
vertex_ai_project=vertex_ai_project,
vertex_ai_location=vertex_ai_location,
)
print("\nModel Response", response)

Usage - Codestral FIM​

Call Codestral on VertexAI via the OpenAI /v1/completions endpoint for FIM tasks.

Note: You can also call Codestral via /chat/completions.

from litellm import text_completion
import os

# os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ""
# OR run `!gcloud auth print-access-token` in your terminal

model = "codestral@2405"

vertex_ai_project = "your-vertex-project" # can also set this as os.environ["VERTEXAI_PROJECT"]
vertex_ai_location = "your-vertex-location" # can also set this as os.environ["VERTEXAI_LOCATION"]

response = text_completion(
model="vertex_ai/" + model,
vertex_ai_project=vertex_ai_project,
vertex_ai_location=vertex_ai_location,
prompt="def is_odd(n): \n return n % 2 == 1 \ndef test_is_odd():",
suffix="return True", # optional
temperature=0, # optional
top_p=1, # optional
max_tokens=10, # optional
min_tokens=10, # optional
seed=10, # optional
stop=["return"], # optional
)

print("\nModel Response", response)

AI21 Models​

| Model Name | Function Call |
|------------|---------------|
| jamba-1.5-mini@001 | completion('vertex_ai/jamba-1.5-mini@001', messages) |
| jamba-1.5-large@001 | completion('vertex_ai/jamba-1.5-large@001', messages) |

Usage​

from litellm import completion
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ""

model = "jamba-1.5-mini@001"

vertex_ai_project = "your-vertex-project" # can also set this as os.environ["VERTEXAI_PROJECT"]
vertex_ai_location = "your-vertex-location" # can also set this as os.environ["VERTEXAI_LOCATION"]

response = completion(
model="vertex_ai/" + model,
messages=[{"role": "user", "content": "hi"}],
vertex_ai_project=vertex_ai_project,
vertex_ai_location=vertex_ai_location,
)
print("\nModel Response", response)

Model Garden​

tip

All OpenAI compatible models from Vertex Model Garden are supported.

Using Model Garden​

Almost all Vertex Model Garden models are OpenAI compatible.

| Property | Details |
|----------|---------|
| Provider Route | vertex_ai/openai/{MODEL_ID} |
| Vertex Documentation | Vertex Model Garden - OpenAI Chat Completions, Vertex Model Garden |
| Supported Operations | /chat/completions, /embeddings |

from litellm import completion
import os

## set ENV variables
os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811"
os.environ["VERTEXAI_LOCATION"] = "us-central1"

response = completion(
model="vertex_ai/openai/<your-endpoint-id>",
messages=[{ "content": "Hello, how are you?","role": "user"}]
)

Gemini Pro​

| Model Name | Function Call |
|------------|---------------|
| gemini-pro | completion('gemini-pro', messages), completion('vertex_ai/gemini-pro', messages) |

Fine-tuned Models​

Fine tuned models on vertex have a numerical model/endpoint id.

from litellm import completion
import os

## set ENV variables
os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811"
os.environ["VERTEXAI_LOCATION"] = "us-central1"

response = completion(
model="vertex_ai/<your-finetuned-model>", # e.g. vertex_ai/4965075652664360960
messages=[{ "content": "Hello, how are you?","role": "user"}],
base_model="vertex_ai/gemini-1.5-pro" # the base model - used for routing
)

Gemini Pro Vision​

| Model Name | Function Call |
|------------|---------------|
| gemini-pro-vision | completion('gemini-pro-vision', messages), completion('vertex_ai/gemini-pro-vision', messages) |

Gemini 1.5 Pro (and Vision)​

| Model Name | Function Call |
|------------|---------------|
| gemini-1.5-pro | completion('gemini-1.5-pro', messages), completion('vertex_ai/gemini-1.5-pro', messages) |
| gemini-1.5-flash-preview-0514 | completion('gemini-1.5-flash-preview-0514', messages), completion('vertex_ai/gemini-1.5-flash-preview-0514', messages) |
| gemini-1.5-pro-preview-0514 | completion('gemini-1.5-pro-preview-0514', messages), completion('vertex_ai/gemini-1.5-pro-preview-0514', messages) |

Using Gemini Pro Vision​

Call gemini-pro-vision in the same input/output format as OpenAI gpt-4-vision

LiteLLM Supports the following image types passed in url

Example Request - image url

import litellm

response = litellm.completion(
model = "vertex_ai/gemini-pro-vision",
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": "Whats in this image?"
},
{
"type": "image_url",
"image_url": {
"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
}
}
]
}
],
)
print(response)

Usage - Function Calling​

LiteLLM supports Function Calling for Vertex AI gemini models.

from litellm import completion
import os
# set env
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ".."
os.environ["VERTEXAI_PROJECT"] = ".."
os.environ["VERTEXAI_LOCATION"] = ".."

tools = [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
},
"required": ["location"],
},
},
}
]
messages = [{"role": "user", "content": "What's the weather like in Boston today?"}]

response = completion(
model="vertex_ai/gemini-pro-vision",
messages=messages,
tools=tools,
)
# Add any assertions, here to check response args
print(response)
assert isinstance(response.choices[0].message.tool_calls[0].function.name, str)
assert isinstance(
response.choices[0].message.tool_calls[0].function.arguments, str
)
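
The assertions above confirm that function.arguments arrives as a JSON string. A minimal sketch of what a caller might do next is shown below, assuming each tool call exposes .function.name and .function.arguments as in those assertions. The get_current_weather body, LOCAL_TOOLS registry, and run_tool_calls name are stand-ins for illustration.

```python
import json


def get_current_weather(location, unit="fahrenheit"):
    """Stand-in implementation; a real one would query a weather service."""
    return {"location": location, "unit": unit, "forecast": "unknown"}


# Registry of local Python functions keyed by the tool name sent to the model.
LOCAL_TOOLS = {"get_current_weather": get_current_weather}


def run_tool_calls(tool_calls):
    """Execute each tool call locally and return OpenAI-style tool messages."""
    messages = []
    for call in tool_calls or []:
        func = LOCAL_TOOLS[call.function.name]
        kwargs = json.loads(call.function.arguments)  # arguments is a JSON string
        messages.append(
            {
                "role": "tool",
                "tool_call_id": getattr(call, "id", None),
                "name": call.function.name,
                "content": json.dumps(func(**kwargs)),
            }
        )
    return messages
```

The returned messages can be appended to the conversation and sent back in a follow-up completion call so the model can use the tool output.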

Usage - PDF / Videos / etc. Files​

Pass any file supported by Vertex AI, through LiteLLM.

Using gs://​

from litellm import completion

response = completion(
model="vertex_ai/gemini-1.5-flash",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "You are a very professional document summarization specialist. Please summarize the given document."},
{
"type": "image_url",
"image_url": "gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf", # 👈 PDF
},
],
}
],
max_tokens=300,
)

print(response.choices[0])

Using base64​

from litellm import completion
import base64
import requests

# URL of the file
url = "https://storage.googleapis.com/cloud-samples-data/generative-ai/pdf/2403.05530.pdf"

# Download the file
response = requests.get(url)
file_data = response.content

encoded_file = base64.b64encode(file_data).decode("utf-8")

response = completion(
model="vertex_ai/gemini-1.5-flash",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "You are a very professional document summarization specialist. Please summarize the given document."},
{
"type": "image_url",
"image_url": f"data:application/pdf;base64,{encoded_file}", # 👈 PDF
},
],
}
],
max_tokens=300,
)

print(response.choices[0])
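
The base64 example builds a data URL inline. A small sketch of the same step as a reusable function follows; to_data_url is a hypothetical name, and the default MIME type matches the PDF used above.

```python
import base64


def to_data_url(file_bytes, mime_type="application/pdf"):
    """Encode raw bytes as a data URL suitable for the image_url field."""
    encoded = base64.b64encode(file_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

For other file types, pass the matching MIME type, for example to_data_url(data, "image/jpeg").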

Chat Models​

| Model Name | Function Call |
|------------|---------------|
| chat-bison-32k | completion('chat-bison-32k', messages) |
| chat-bison | completion('chat-bison', messages) |
| chat-bison@001 | completion('chat-bison@001', messages) |

Code Chat Models​

| Model Name | Function Call |
|------------|---------------|
| codechat-bison | completion('codechat-bison', messages) |
| codechat-bison-32k | completion('codechat-bison-32k', messages) |
| codechat-bison@001 | completion('codechat-bison@001', messages) |

Text Models​

| Model Name | Function Call |
|------------|---------------|
| text-bison | completion('text-bison', messages) |
| text-bison@001 | completion('text-bison@001', messages) |

Code Text Models​

| Model Name | Function Call |
|------------|---------------|
| code-bison | completion('code-bison', messages) |
| code-bison@001 | completion('code-bison@001', messages) |
| code-gecko@001 | completion('code-gecko@001', messages) |
| code-gecko@latest | completion('code-gecko@latest', messages) |

Embedding Models​

Usage - Embedding​

import litellm
from litellm import embedding
litellm.vertex_project = "hardy-device-38811" # Your Project ID
litellm.vertex_location = "us-central1" # proj location

response = embedding(
model="vertex_ai/textembedding-gecko",
input=["good morning from litellm"],
)
print(response)

Supported Embedding Models​

All models listed here are supported

| Model Name | Function Call |
|------------|---------------|
| text-embedding-004 | embedding(model="vertex_ai/text-embedding-004", input) |
| text-multilingual-embedding-002 | embedding(model="vertex_ai/text-multilingual-embedding-002", input) |
| textembedding-gecko | embedding(model="vertex_ai/textembedding-gecko", input) |
| textembedding-gecko-multilingual | embedding(model="vertex_ai/textembedding-gecko-multilingual", input) |
| textembedding-gecko-multilingual@001 | embedding(model="vertex_ai/textembedding-gecko-multilingual@001", input) |
| textembedding-gecko@001 | embedding(model="vertex_ai/textembedding-gecko@001", input) |
| textembedding-gecko@003 | embedding(model="vertex_ai/textembedding-gecko@003", input) |
| text-embedding-preview-0409 | embedding(model="vertex_ai/text-embedding-preview-0409", input) |
| text-multilingual-embedding-preview-0409 | embedding(model="vertex_ai/text-multilingual-embedding-preview-0409", input) |
| Fine-tuned OR Custom Embedding models | embedding(model="vertex_ai/<your-model-id>", input) |

Supported OpenAI (Unified) Params​

| param | type | vertex equivalent |
|-------|------|-------------------|
| input | string or List[string] | instances |
| dimensions | int | output_dimensionality |
| input_type | Literal["RETRIEVAL_QUERY","RETRIEVAL_DOCUMENT", "SEMANTIC_SIMILARITY", "CLASSIFICATION", "CLUSTERING", "QUESTION_ANSWERING", "FACT_VERIFICATION"] | task_type |

Usage with OpenAI (Unified) Params​

response = litellm.embedding(
model="vertex_ai/text-embedding-004",
input=["good morning from litellm", "gm"],
input_type = "RETRIEVAL_DOCUMENT",
dimensions=1,
)
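
As an illustration of the parameter table in this section, the sketch below renames unified (OpenAI-style) embedding params to their Vertex equivalents. It only renames keys; the way LiteLLM actually shapes the request body is not shown here, and to_vertex_param_names is a hypothetical helper.

```python
# Unified param name -> Vertex equivalent, taken from the table in this section.
OPENAI_TO_VERTEX = {
    "input": "instances",
    "dimensions": "output_dimensionality",
    "input_type": "task_type",
}


def to_vertex_param_names(params):
    """Return a copy of params with unified names replaced by Vertex names.

    Keys without a listed equivalent (e.g. auto_truncate) are passed through unchanged.
    """
    return {OPENAI_TO_VERTEX.get(key, key): value for key, value in params.items()}
```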

Supported Vertex Specific Params​

| param | type |
|-------|------|
| auto_truncate | bool |
| task_type | Literal["RETRIEVAL_QUERY","RETRIEVAL_DOCUMENT", "SEMANTIC_SIMILARITY", "CLASSIFICATION", "CLUSTERING", "QUESTION_ANSWERING", "FACT_VERIFICATION"] |
| title | str |

Usage with Vertex Specific Params (Use task_type and title)​

You can pass any vertex specific params to the embedding model. Just pass them to the embedding function like this:

Relevant Vertex AI doc with all embedding params

response = litellm.embedding(
model="vertex_ai/text-embedding-004",
input=["good morning from litellm", "gm"],
task_type = "RETRIEVAL_DOCUMENT",
title = "test",
dimensions=1,
auto_truncate=True,
)

Multi-Modal Embeddings​

Usage

Using GCS Images

response = await litellm.aembedding(
model="vertex_ai/multimodalembedding@001",
input="gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png" # will be sent as a gcs image
)

Using base 64 encoded images

response = await litellm.aembedding(
model="vertex_ai/multimodalembedding@001",
input="data:image/jpeg;base64,..." # will be sent as a base64 encoded image
)

Image Generation Models​

Usage

response = await litellm.aimage_generation(
prompt="An olympic size swimming pool",
model="vertex_ai/imagegeneration@006",
vertex_ai_project="adroit-crow-413218",
vertex_ai_location="us-central1",
)

Generating multiple images

Use the n parameter to pass how many images you want generated

response = await litellm.aimage_generation(
prompt="An olympic size swimming pool",
model="vertex_ai/imagegeneration@006",
vertex_ai_project="adroit-crow-413218",
vertex_ai_location="us-central1",
n=1,
)

Supported Image Generation Models​

| Model Name | Usage |
|------------|-------|
| imagen-3.0-generate-001 | litellm.image_generation('vertex_ai/imagen-3.0-generate-001', prompt) |
| imagen-3.0-fast-generate-001 | litellm.image_generation('vertex_ai/imagen-3.0-fast-generate-001', prompt) |
| imagegeneration@006 | litellm.image_generation('vertex_ai/imagegeneration@006', prompt) |
| imagegeneration@005 | litellm.image_generation('vertex_ai/imagegeneration@005', prompt) |
| imagegeneration@002 | litellm.image_generation('vertex_ai/imagegeneration@002', prompt) |

Text to Speech APIs​

info

LiteLLM supports calling Vertex AI Text to Speech API in the OpenAI text to speech API format

Usage - Basic​

Vertex AI does not support passing a model param - so passing model=vertex_ai/ is the only required param

Sync Usage

from pathlib import Path
import litellm

speech_file_path = Path(__file__).parent / "speech_vertex.mp3"
response = litellm.speech(
model="vertex_ai/",
input="hello what llm guardrail do you have",
)
response.stream_to_file(speech_file_path)

Async Usage

speech_file_path = Path(__file__).parent / "speech_vertex.mp3"
response = await litellm.aspeech( # aspeech is a coroutine; await it inside an async function
model="vertex_ai/",
input="hello what llm guardrail do you have",
)
response.stream_to_file(speech_file_path)

Usage - ssml as input​

Pass your ssml as input to the input param, if it contains <speak>, it will be automatically detected and passed as ssml to the Vertex AI API

If you need to force your input to be passed as ssml, set use_ssml=True

Vertex AI does not support passing a model param - so passing model=vertex_ai/ is the only required param

speech_file_path = Path(__file__).parent / "speech_vertex.mp3"


ssml = """
<speak>
<p>Hello, world!</p>
<p>This is a test of the <break strength="medium" /> text-to-speech API.</p>
</speak>
"""

response = litellm.speech(
input=ssml,
model="vertex_ai/test",
voice={
    "languageCode": "en-US",
    "name": "en-US-Studio-O",
},
audioConfig={
    "audioEncoding": "LINEAR16",
    "speakingRate": 1.0,
},
)
response.stream_to_file(speech_file_path)

Forcing SSML Usage​

You can force the use of SSML by setting the use_ssml parameter to True. This is useful when you want to ensure that your input is treated as SSML, even if it doesn't contain the <speak> tags.

Here are examples of how to force SSML usage:

Vertex AI does not support passing a model param - so passing model=vertex_ai/ is the only required param

speech_file_path = Path(__file__).parent / "speech_vertex.mp3"


ssml = """
<speak>
<p>Hello, world!</p>
<p>This is a test of the <break strength="medium" /> text-to-speech API.</p>
</speak>
"""

response = litellm.speech(
input=ssml,
use_ssml=True,
model="vertex_ai/test",
voice={
    "languageCode": "en-US",
    "name": "en-US-Studio-O",
},
audioConfig={
    "audioEncoding": "LINEAR16",
    "speakingRate": 1.0,
},
)
response.stream_to_file(speech_file_path)
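
The two SSML sections describe a simple rule: input is sent as SSML when it contains <speak>, or when use_ssml=True forces it. A minimal sketch of that decision is below; should_send_as_ssml is a hypothetical name, and LiteLLM's internal check may differ in detail.

```python
def should_send_as_ssml(text, use_ssml=False):
    """Return True if the input should be sent to the API as SSML.

    Mirrors the rule described above: forced via use_ssml, or
    auto-detected when the text contains a <speak> tag.
    """
    return bool(use_ssml) or "<speak>" in text
```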

Batch APIs​

Just add the following Vertex env vars to your environment.

# GCS Bucket settings, used to store batch prediction files in
export GCS_BUCKET_NAME="litellm-testing-bucket" # the bucket you want to store batch prediction files in
export GCS_PATH_SERVICE_ACCOUNT="/path/to/service_account.json" # path to your service account json file

# Vertex /batch endpoint settings, used for LLM API requests
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service_account.json" # path to your service account json file
export VERTEXAI_LOCATION="us-central1" # can be any vertex location
export VERTEXAI_PROJECT="my-test-project"

Usage​

1. Create a file of batch requests for vertex​

LiteLLM expects the file to follow the OpenAI batches files format

Each body in the file should be an OpenAI API request

Create a file called vertex_batch_completions.jsonl in the current working directory. The model in each request body should be the Vertex AI model name.

{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash-001", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 10}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash-001", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 10}}
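
Writing the JSONL file by hand is error-prone, so it can help to generate it. The sketch below is one way to do that with the standard library, assuming one user message per request; write_batch_file is a hypothetical helper, and the request fields mirror the two lines shown above.

```python
import json
from pathlib import Path


def write_batch_file(path, model, prompts, max_tokens=10):
    """Write one OpenAI-style batch request per prompt, one JSON object per line."""
    lines = []
    for i, prompt in enumerate(prompts, start=1):
        request = {
            "custom_id": f"request-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,  # the Vertex AI model name, e.g. gemini-1.5-flash-001
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": max_tokens,
            },
        }
        lines.append(json.dumps(request))
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

For example, write_batch_file("vertex_batch_completions.jsonl", "gemini-1.5-flash-001", ["Hello world!"]) produces a one-line file in the expected format.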

2. Upload a File of batch requests​

For vertex_ai, LiteLLM uploads the file to the GCS bucket named in GCS_BUCKET_NAME.

import os
from openai import OpenAI

oai_client = OpenAI(
    api_key="sk-1234", # litellm proxy API key
    base_url="http://localhost:4000" # litellm proxy base url
)
file_name = "vertex_batch_completions.jsonl"
_current_dir = os.path.dirname(os.path.abspath(__file__))
file_path = os.path.join(_current_dir, file_name)
file_obj = oai_client.files.create(
    file=open(file_path, "rb"),
    purpose="batch",
    extra_body={"custom_llm_provider": "vertex_ai"}, # tell litellm to use vertex_ai for this file upload
)

Expected Response

{
"id": "gs://litellm-testing-bucket/litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001/d3f198cd-c0d1-436d-9b1e-28e3f282997a",
"bytes": 416,
"created_at": 1733392026,
"filename": "litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001/d3f198cd-c0d1-436d-9b1e-28e3f282997a",
"object": "file",
"purpose": "batch",
"status": "uploaded",
"status_details": null
}
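
The returned file id is a gs:// URI. If the bucket and object path are needed separately (for example, to inspect the upload with a GCS client), a small sketch using only the standard library is shown below; split_gcs_uri is a hypothetical helper.

```python
from urllib.parse import urlparse


def split_gcs_uri(uri):
    """Split a gs://bucket/object/path URI into (bucket, object_path)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")
```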

3. Create a batch​

batch_input_file_id = file_obj.id # use `file_obj` from step 2
create_batch_response = oai_client.batches.create(
completion_window="24h",
endpoint="/v1/chat/completions",
input_file_id=batch_input_file_id, # example input_file_id = "gs://litellm-testing-bucket/litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001/c2b1b785-252b-448c-b180-033c4c63b3ce"
extra_body={"custom_llm_provider": "vertex_ai"}, # tell litellm to use `vertex_ai` for this batch request
)

Expected Response

{
"id": "3814889423749775360",
"completion_window": "24hrs",
"created_at": 1733392026,
"endpoint": "",
"input_file_id": "gs://litellm-testing-bucket/litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001/d3f198cd-c0d1-436d-9b1e-28e3f282997a",
"object": "batch",
"status": "validating",
"cancelled_at": null,
"cancelling_at": null,
"completed_at": null,
"error_file_id": null,
"errors": null,
"expired_at": null,
"expires_at": null,
"failed_at": null,
"finalizing_at": null,
"in_progress_at": null,
"metadata": null,
"output_file_id": "gs://litellm-testing-bucket/litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001",
"request_counts": null
}

4. Retrieve a batch​

retrieved_batch = oai_client.batches.retrieve(
    batch_id=create_batch_response.id,
    extra_body={"custom_llm_provider": "vertex_ai"},  # tell litellm to use `vertex_ai` for this batch request
)

Expected Response

{
    "id": "3814889423749775360",
    "completion_window": "24hrs",
    "created_at": 1736500100,
    "endpoint": "",
    "input_file_id": "gs://example-bucket-1-litellm/litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001/7b2e47f5-3dd4-436d-920f-f9155bbdc952",
    "object": "batch",
    "status": "completed",
    "cancelled_at": null,
    "cancelling_at": null,
    "completed_at": null,
    "error_file_id": null,
    "errors": null,
    "expired_at": null,
    "expires_at": null,
    "failed_at": null,
    "finalizing_at": null,
    "in_progress_at": null,
    "metadata": null,
    "output_file_id": "gs://example-bucket-1-litellm/litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001",
    "request_counts": null
}
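A batch moves from `validating` to `completed` over time, so in practice the retrieve call is usually repeated until the batch finishes. The sketch below shows one way to poll, assuming the proxy reports OpenAI-style batch statuses. `wait_for_batch` and its parameters are hypothetical names for illustration; the function takes any zero-argument callable that returns an object with a `status` attribute.

```python
import time
from typing import Any, Callable

# Statuses after which an OpenAI-style batch no longer changes (assumed here).
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(
    retrieve: Callable[[], Any],
    poll_interval_s: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call `retrieve()` until the returned batch reaches a terminal status."""
    for _ in range(max_polls):
        batch = retrieve()
        if batch.status in TERMINAL_STATUSES:
            return batch
        # Not finished yet: wait before asking again.
        sleep(poll_interval_s)
    raise TimeoutError(f"batch still not finished after {max_polls} polls")
```

With the client from the steps above, `retrieve` could be `lambda: oai_client.batches.retrieve(batch_id=create_batch_response.id, extra_body={"custom_llm_provider": "vertex_ai"})`. Passing `sleep` as a parameter keeps the loop easy to test without real delays.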

Fine Tuning APIs​

| Property | Details |
|---|---|
| Description | Create Fine Tuning Jobs in Vertex AI (/tuningJobs) using the OpenAI Python SDK |
| Vertex Fine Tuning Documentation | Vertex Fine Tuning |

Usage​

1. Add finetune_settings to your config.yaml​

model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/fake
      api_key: fake-key
      api_base: https://exampleopenaiendpoint-production.up.railway.app/

# 👇 Key change: For /fine_tuning/jobs endpoints
finetune_settings:
  - custom_llm_provider: "vertex_ai"
    vertex_project: "adroit-crow-413218"
    vertex_location: "us-central1"
    vertex_credentials: "/Users/ishaanjaffer/Downloads/adroit-crow-413218-a956eef1a2a8.json"

2. Create a Fine Tuning Job​

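# `client` is assumed to be an `openai.AsyncOpenAI` client pointed at the LiteLLM proxy,
# and this call runs inside an async function; with a synchronous `OpenAI` client, drop `await`.
ft_job = await client.fine_tuning.jobs.create(
    model="gemini-1.0-pro-002",  # Vertex model you want to fine-tune
    training_file="gs://cloud-samples-data/ai-platform/generative_ai/sft_train_data.jsonl",  # file_id from create file response
    extra_body={"custom_llm_provider": "vertex_ai"},  # tell litellm proxy which provider to use
)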

Advanced use case - Passing adapter_size to the Vertex AI API

Set hyperparameters such as n_epochs, learning_rate_multiplier, and adapter_size through the hyperparameters argument. See Vertex Advanced Hyperparameters.


ft_job = client.fine_tuning.jobs.create(
    model="gemini-1.0-pro-002",  # Vertex model you want to fine-tune
    training_file="gs://cloud-samples-data/ai-platform/generative_ai/sft_train_data.jsonl",  # file_id from create file response
    hyperparameters={
        "n_epochs": 3,  # epoch_count on Vertex
        "learning_rate_multiplier": 0.1,  # learning_rate_multiplier on Vertex
        "adapter_size": "ADAPTER_SIZE_ONE"  # type: ignore, vertex specific hyperparameter
    },
    extra_body={
        "custom_llm_provider": "vertex_ai",
    },
)
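The comments in the example above describe how the OpenAI-style hyperparameter names correspond to Vertex names. The sketch below spells that correspondence out as a plain dictionary translation. It is an illustration only and not LiteLLM's actual translation code; `OPENAI_TO_VERTEX` and `to_vertex_hyperparameters` are hypothetical names, and the Vertex-side keys are taken from the comments above.

```python
from typing import Any, Dict

# OpenAI-style key -> Vertex-style key, following the comments in the example above.
OPENAI_TO_VERTEX = {
    "n_epochs": "epoch_count",
    "learning_rate_multiplier": "learning_rate_multiplier",
    "adapter_size": "adapter_size",  # Vertex-specific, passed through unchanged
}


def to_vertex_hyperparameters(hyperparameters: Dict[str, Any]) -> Dict[str, Any]:
    """Rename OpenAI-style hyperparameter keys to their Vertex-style names."""
    unknown = sorted(set(hyperparameters) - set(OPENAI_TO_VERTEX))
    if unknown:
        # Fail loudly instead of silently dropping a setting.
        raise ValueError(f"unsupported hyperparameters: {unknown}")
    return {OPENAI_TO_VERTEX[key]: value for key, value in hyperparameters.items()}


vertex_hp = to_vertex_hyperparameters(
    {"n_epochs": 3, "learning_rate_multiplier": 0.1, "adapter_size": "ADAPTER_SIZE_ONE"}
)
```

Rejecting unknown keys is a design choice of this sketch: a misspelled hyperparameter surfaces as an error before any tuning job is submitted.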

Extra​

Using GOOGLE_APPLICATION_CREDENTIALS​

The following code writes your service account key to a temporary file and sets the GOOGLE_APPLICATION_CREDENTIALS environment variable to that file's path:

import json
import os
import tempfile

def load_vertex_ai_credentials():
    # Define the path to the vertex_key.json file (next to this script)
    print("loading vertex ai credentials")
    filepath = os.path.dirname(os.path.abspath(__file__))
    vertex_key_path = os.path.join(filepath, "vertex_key.json")

    # Read the existing content of the file or create an empty dictionary
    try:
        with open(vertex_key_path, "r") as file:
            # Read the file content
            print("Read vertexai file path")
            content = file.read()

            # If the file is empty or contains only whitespace, create an empty dictionary
            if not content or not content.strip():
                service_account_key_data = {}
            else:
                # Attempt to load the existing JSON content
                file.seek(0)
                service_account_key_data = json.load(file)
    except FileNotFoundError:
        # If the file doesn't exist, create an empty dictionary
        service_account_key_data = {}

    # Create a temporary file
    with tempfile.NamedTemporaryFile(mode="w+", delete=False) as temp_file:
        # Write the key data to the temporary file
        json.dump(service_account_key_data, temp_file, indent=2)

    # Export the temporary file as GOOGLE_APPLICATION_CREDENTIALS
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.abspath(temp_file.name)
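Once the variable is set, a quick sanity check can confirm that it points at readable JSON before any Vertex AI call is made. The sketch below uses only the standard library; `read_application_credentials` is a hypothetical helper, and the key data in the demo is a placeholder, not a real service account key.

```python
import json
import os
import tempfile


def read_application_credentials() -> dict:
    """Load the JSON file that GOOGLE_APPLICATION_CREDENTIALS points to."""
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    with open(path, "r") as f:
        return json.load(f)


# Demo: write placeholder key data to a temp file and point the variable at it.
with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as tmp:
    json.dump({"type": "service_account", "project_id": "my-project"}, tmp)
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = tmp.name

creds = read_application_credentials()
```

A missing variable, a missing file, or invalid JSON each raises an exception here, which is easier to diagnose than an authentication failure later in a request.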

Using GCP Service Account​

Info: Trying to deploy LiteLLM on Google Cloud Run? Tutorial here

  1. Figure out the Service Account bound to the Google Cloud Run service.
  2. Get the FULL EMAIL address of the corresponding Service Account.
  3. Next, go to IAM & Admin > Manage Resources, select the top-level project that houses your Google Cloud Run service, and click Add Principal.
  4. Specify the Service Account as the principal and Vertex AI User as the role.

Once that's done, when you deploy the new container in the Google Cloud Run service, LiteLLM will have automatic access to all Vertex AI endpoints.
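The same role grant can also be made from the command line instead of the console. The sketch below only builds the `gcloud projects add-iam-policy-binding` command as an argument list; `roles/aiplatform.user` is the role ID behind Vertex AI User. The function name, project ID, and service account email are placeholders for illustration.

```python
from typing import List


def build_grant_vertex_ai_user_command(project_id: str, service_account_email: str) -> List[str]:
    """Build the gcloud command that grants Vertex AI User to a service account."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member=serviceAccount:{service_account_email}",
        "--role=roles/aiplatform.user",
    ]


# Placeholder values; substitute your own project and service account email.
cmd = build_grant_vertex_ai_user_command(
    "my-project",
    "litellm-run@my-project.iam.gserviceaccount.com",
)
# To apply it, run the command, e.g. subprocess.run(cmd, check=True)
```

Building the command as a list rather than a shell string avoids quoting problems if it is later passed to `subprocess.run`.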

s/o @Darien Kindlund for this tutorial