
Mistral_7B_Vertex AI Inference_with_dynamic_LORA_adopters#4441

Open
shamika wants to merge 11 commits into GoogleCloudPlatform:main from shamika:main

Conversation

@shamika

@shamika shamika commented Feb 13, 2026


This PR adds samples demonstrating two deployment approaches for Mistral 7B with dynamically loaded LoRA adapters on Vertex AI using vLLM, supporting both prebuilt and custom container strategies.

Key Differentiators:

  • Mistral 7B Instruct v0.3 as the base model
  • Dynamic LoRA adapter loading from Google Cloud Storage
  • vLLM serving engine with an OpenAI-compatible API
  • Two deployment methods: Prebuilt container vs. Custom container with Cloud Build

Why This Change is Needed

LoRA (Low-Rank Adaptation) is a popular parameter-efficient fine-tuning technique, but there is currently no example in vertex-ai-samples showing how to deploy LoRA-adapted models on Vertex AI using vLLM. Existing deployment patterns don't address:

  • Deploying Mistral 7B (or other large language models) with vLLM on Vertex AI
  • Loading base models and LoRA adapters from GCS at container startup
  • Configuring vLLM to serve multiple LoRA adapters alongside a base model
  • Trade-offs between using prebuilt vLLM containers vs. custom containers with enhanced GCS loading capabilities
  • End-to-end workflow from model preparation to vLLM endpoint testing

This sample fills that gap by providing complete, reproducible examples for both vLLM deployment strategies.
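As an illustration of the endpoint-testing step at the end of that workflow, here is a hedged sketch of querying the deployed vLLM server through its OpenAI-compatible chat API. The endpoint URL, project/region placeholders, and the adapter name `my-adapter` are hypothetical, not values from this PR; the adapter name must match whatever was registered via `--lora-modules`.

```shell
#!/bin/bash
# Hedged sketch: calling a deployed vLLM endpoint's OpenAI-compatible chat API.
# ENDPOINT_URL and "my-adapter" are placeholders, not values from this PR.
ENDPOINT_URL="https://REGION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/REGION/endpoints/ENDPOINT_ID"

# Build an OpenAI-style chat-completions payload. Only the "model" field
# decides whether the base model or a registered LoRA adapter serves the request.
make_payload() {
  local model="$1" prompt="$2"
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}], "max_tokens": 128}' \
    "${model}" "${prompt}"
}

# Base model vs. LoRA adapter: same prompt, different "model" value.
make_payload "mistralai/Mistral-7B-Instruct-v0.3" "Summarize LoRA in one sentence."
echo
make_payload "my-adapter" "Summarize LoRA in one sentence."
echo

# To actually call the endpoint (requires authentication):
# curl -s -X POST "${ENDPOINT_URL}:rawPredict" \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -d "$(make_payload my-adapter 'Summarize LoRA in one sentence.')"
```

The commented-out `curl` call uses the endpoint's `:rawPredict` method, which forwards the request body to the custom container unchanged.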

What Was Changed

Added Notebooks (/prediction/vertexai_serving_vllm/):

  1. vertexai_serving_vllm_mistral_7b_with_lora_adopters_prebuilt_container.ipynb -
    Method 1: Prebuilt vLLM Container Approach
    - Uses Vertex AI Model Garden's prebuilt vLLM container (pytorch-vllm-serve)
    - Deploys Mistral 7B Instruct v0.3 with LoRA adapter via vLLM
    - Simplest deployment path with minimal custom code
    - LoRA adapters dynamically loaded from GCS paths using vLLM's --lora-modules parameter
    - Demonstrates downloading Mistral 7B and adapters from HuggingFace and uploading to GCS
    - Shows inference comparison between base Mistral model and LoRA adapter using vLLM's API
  2. vertexai_serving_vllm_mistral_7b_with_lora_adopters_custom_container.ipynb -
    Method 2: Custom vLLM Container Approach
    - Full control over vLLM container build and model loading logic
    - Custom entrypoint with explicit GCS download handling for Mistral 7B and adapters before vLLM startup
    - Cloud Build integration for automated container builds
    - Supports both base models and LoRA adapters from arbitrary GCS locations
    - Demonstrates advanced vLLM configuration options
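The flag wiring behind both notebooks can be sketched as follows; the bucket path and adapter name below are placeholders (not values from this PR), and `--enable-lora` / `--lora-modules name=path` are vLLM's standard LoRA serving flags.

```shell
#!/bin/bash
# Hedged sketch of the vLLM serving flags described above.
# ADAPTER_URI and the adapter name are hypothetical placeholders.
MODEL_ID="mistralai/Mistral-7B-Instruct-v0.3"
ADAPTER_URI="gs://your-bucket/lora/my-adapter"   # placeholder GCS path

SERVE_ARGS=(
  --model="${MODEL_ID}"
  --enable-lora                                  # turn on LoRA support in vLLM
  --lora-modules "my-adapter=${ADAPTER_URI}"     # name=path pairs, repeatable
  --max-lora-rank=32
)
echo "vllm args: ${SERVE_ARGS[*]}"
```

At request time, clients select the adapter by passing its registered name (`my-adapter` here) as the `model` field of the OpenAI-compatible request.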

Added Custom Container Implementation (custom_container/):

  • Dockerfile - Extends vLLM base image (vllm/vllm-openai) with Google Cloud SDK for GCS access
  • entrypoint.sh - Smart entrypoint that automatically downloads models (Mistral 7B) and LoRA adapters from GCS paths (e.g., gs://bucket/model) to local
    disk at startup, then launches vLLM with support for --lora-modules parameter
  • cloudbuild.yaml - Automated build configuration supporting both GPU and CPU vLLM variants
  • ReadMe.md - Architecture documentation explaining the vLLM container design




  1. If you are opening a PR for Official Notebooks under the notebooks/official folder, follow this mandatory checklist:
  • [x] Use the notebook template as a starting point.
  • [x] Follow the style and grammar rules outlined in the above notebook template.
  • [x] Verify the notebook runs successfully in Colab, since the automated tests cannot guarantee this even when they pass.
  • [x] Passes all the required automated checks. You can locally test for formatting and linting with these instructions.
  • [x] This notebook has been added to the CODEOWNERS file under the Official Notebooks section, pointing to the author or the author's team.
  • [x] The Jupyter notebook cleans up any artifacts it has created (datasets, ML models, endpoints, etc.) so as not to eat up unnecessary resources.

  2. If you are opening a PR for Community Notebooks under the notebooks/community folder:
  • This notebook has been added to the CODEOWNERS file under the Community Notebooks section, pointing to the author or the author's team.
  • Passes all the required formatting and linting checks. You can locally test with these instructions.

  3. If you are opening a PR for Community Content under the community-content folder:
  • Make sure your main Content Directory Name is descriptive, informative, and includes some of the key products and attributes of your content, so that it is differentiable from other content.
  • The main content directory has been added to the CODEOWNERS file under the Community Content section, pointing to the author or the author's team.
  • Passes all the required formatting and linting checks. You can locally test with these instructions.

@shamika shamika requested a review from a team as a code owner February 13, 2026 00:15
@gemini-code-assist
Contributor

Summary of Changes

Hello @shamika, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Vertex AI samples by providing robust and flexible methods for deploying large language models with LoRA adapters. It addresses a critical gap by offering examples for serving Mistral 7B using vLLM, with options for leveraging either prebuilt containers for simplicity or custom containers for advanced control over model loading and environment configuration. This enables users to efficiently deploy and manage LoRA-adapted models, optimizing for both ease of use and customization.

Highlights

  • New Deployment Samples: Introduced two new sample notebooks for deploying Mistral 7B with dynamic LoRA adapters on Vertex AI using vLLM.
  • Flexible Deployment Approaches: Demonstrated deployment using both a prebuilt vLLM container for ease of use and a custom vLLM container for advanced control and customization.
  • Dynamic LoRA Adapter Loading: Implemented dynamic loading of LoRA adapters and base models directly from Google Cloud Storage (GCS) at container startup for the custom container approach.
  • End-to-End Workflow: Provided a comprehensive end-to-end workflow, covering model preparation, container building (for custom), deployment to Vertex AI endpoints, and inference testing for both methods.
Changelog
  • notebooks/official/CODEOWNERS
    • Added ownership for two new Mistral 7B LoRA notebooks to @shamikaa.
  • notebooks/official/prediction/vertexai_serving_vllm/custom_container/Dockerfile
    • Defined a Dockerfile to build a custom vLLM container image, including Google Cloud SDK for GCS access and a custom entrypoint script.
  • notebooks/official/prediction/vertexai_serving_vllm/custom_container/ReadMe.md
    • Provided an architectural overview of the custom container, detailing the Dockerfile, entrypoint script, and Cloud Build configuration.
  • notebooks/official/prediction/vertexai_serving_vllm/custom_container/cloudbuild.yaml
    • Configured Cloud Build to automate the custom vLLM container image build and push process, supporting both GPU and CPU variants.
  • notebooks/official/prediction/vertexai_serving_vllm/custom_container/entrypoint.sh
    • Implemented a shell script to intelligently intercept --model and --lora-modules arguments, downloading specified models and LoRA adapters from GCS to local disk before launching vLLM.
  • notebooks/official/prediction/vertexai_serving_vllm/custom_container/local_build.sh
    • Added a local script for building the custom Docker container, providing command-line options for device type, base image, repository, project ID, and location.
  • notebooks/official/prediction/vertexai_serving_vllm/vertexai_serving_vllm_mistral_7b_with_lora_adopters_custom_container.ipynb
    • Introduced a Jupyter notebook demonstrating the deployment of Mistral 7B with LoRA adapters on Vertex AI using the newly defined custom vLLM container.
  • notebooks/official/prediction/vertexai_serving_vllm/vertexai_serving_vllm_mistral_7b_with_lora_adopters_prebuilt_container.ipynb
    • Introduced a Jupyter notebook demonstrating the deployment of Mistral 7B with LoRA adapters on Vertex AI using a prebuilt vLLM container from Model Garden.
Activity
  • Used the official notebook template as a starting point.
  • Followed style and grammar rules outlined in the notebook template.
  • Verified that the notebook runs successfully in Colab.
  • Ensured all required automated checks passed.
  • Added the new notebooks to the CODEOWNERS file.
  • Confirmed that the Jupyter notebooks clean up any created artifacts.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request adds valuable examples for deploying Mistral 7B with dynamic LoRA adapters on Vertex AI, covering both prebuilt and custom container strategies. The code is generally well-structured and the notebooks are comprehensive. However, I've identified several critical issues that will prevent the notebooks from executing successfully, including incorrect service account configuration for Cloud Build and an erroneous path construction for the LoRA adapter. I have also pointed out some high-severity issues related to build reproducibility and unsafe default settings for resource cleanup. Additionally, there are some broken links and documentation inconsistencies that should be addressed. My review includes specific suggestions to resolve these problems.

shamika and others added 2 commits February 12, 2026 17:22
…tainer/local_build.sh

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@gericdong
Contributor

@shamika thanks for the contribution. Can you please check the failing checks?

@shamika
Author

shamika commented Feb 18, 2026 via email

@shamika
Author

shamika commented Feb 19, 2026

Fixed the lint errors. Please let me know if there is anything pending on this PR.

@gericdong
Contributor

@shamika: please fix the CI errors. Thanks.
