
Mistral_7B_Vertex AI Inference_with_dynamic_LORA_adopters#4441

Open
shamika wants to merge 11 commits into GoogleCloudPlatform:main from shamika:main

Conversation

@shamika

@shamika shamika commented Feb 13, 2026


This PR adds samples demonstrating two deployment approaches for Mistral 7B with dynamically loaded LoRA adapters on Vertex AI using vLLM, supporting both prebuilt and custom container strategies.

Key Differentiators:

  • Mistral 7B Instruct v0.3 as the base model
  • Dynamic LoRA adapter loading from Google Cloud Storage
  • vLLM serving engine with an OpenAI-compatible API
  • Two deployment methods: Prebuilt container vs. Custom container with Cloud Build

Why This Change is Needed

LoRA (Low-Rank Adaptation) is a popular parameter-efficient fine-tuning technique, but there is currently no example in vertex-ai-samples showing how to deploy LoRA-adapted models on Vertex AI using vLLM. Existing deployment patterns don't address:

  • Deploying Mistral 7B (or other large language models) with vLLM on Vertex AI
  • Loading base models and LoRA adapters from GCS at container startup
  • Configuring vLLM to serve multiple LoRA adapters alongside a base model
  • Trade-offs between using prebuilt vLLM containers vs. custom containers with enhanced GCS loading capabilities
  • End-to-end workflow from model preparation to vLLM endpoint testing

This sample fills that gap by providing complete, reproducible examples for both vLLM deployment strategies.
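As an illustration of the endpoint-testing step at the end of that workflow, here is a hedged sketch of querying the deployed vLLM server through its OpenAI-compatible chat API. The endpoint URL, project/region placeholders, and the adapter name `my-adapter` are hypothetical, not values from this PR; the adapter name must match whatever was registered via `--lora-modules`.

```shell
#!/bin/bash
# Hedged sketch: calling a deployed vLLM endpoint's OpenAI-compatible chat API.
# ENDPOINT_URL and "my-adapter" are placeholders, not values from this PR.
ENDPOINT_URL="https://REGION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/REGION/endpoints/ENDPOINT_ID"

# Build an OpenAI-style chat-completions payload. Only the "model" field
# decides whether the base model or a registered LoRA adapter serves the request.
make_payload() {
  local model="$1" prompt="$2"
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}], "max_tokens": 128}' \
    "${model}" "${prompt}"
}

# Base model vs. LoRA adapter: same prompt, different "model" value.
make_payload "mistralai/Mistral-7B-Instruct-v0.3" "Summarize LoRA in one sentence."
echo
make_payload "my-adapter" "Summarize LoRA in one sentence."
echo

# To actually call the endpoint (requires authentication):
# curl -s -X POST "${ENDPOINT_URL}:rawPredict" \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -d "$(make_payload my-adapter 'Summarize LoRA in one sentence.')"
```

The commented-out `curl` call uses the endpoint's `:rawPredict` method, which forwards the request body to the custom container unchanged.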

What Was Changed

Added Notebooks (/prediction/vertexai_serving_vllm/):

  1. vertexai_serving_vllm_mistral_7b_with_lora_adopters_prebuilt_container.ipynb -
    Method 1: Prebuilt vLLM Container Approach
    - Uses Vertex AI Model Garden's prebuilt vLLM container (pytorch-vllm-serve)
    - Deploys Mistral 7B Instruct v0.3 with LoRA adapter via vLLM
    - Simplest deployment path with minimal custom code
    - LoRA adapters dynamically loaded from GCS paths using vLLM's --lora-modules parameter
    - Demonstrates downloading Mistral 7B and adapters from HuggingFace and uploading to GCS
    - Shows inference comparison between base Mistral model and LoRA adapter using vLLM's API
  2. vertexai_serving_vllm_mistral_7b_with_lora_adopters_custom_container.ipynb -
    Method 2: Custom vLLM Container Approach
    - Full control over vLLM container build and model loading logic
    - Custom entrypoint with explicit GCS download handling for Mistral 7B and adapters before vLLM startup
    - Cloud Build integration for automated container builds
    - Supports both base models and LoRA adapters from arbitrary GCS locations
    - Demonstrates advanced vLLM configuration options
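The flag wiring behind both notebooks can be sketched as follows; the bucket path and adapter name below are placeholders (not values from this PR), and `--enable-lora` / `--lora-modules name=path` are vLLM's standard LoRA serving flags.

```shell
#!/bin/bash
# Hedged sketch of the vLLM serving flags described above.
# ADAPTER_URI and the adapter name are hypothetical placeholders.
MODEL_ID="mistralai/Mistral-7B-Instruct-v0.3"
ADAPTER_URI="gs://your-bucket/lora/my-adapter"   # placeholder GCS path

SERVE_ARGS=(
  --model="${MODEL_ID}"
  --enable-lora                                  # turn on LoRA support in vLLM
  --lora-modules "my-adapter=${ADAPTER_URI}"     # name=path pairs, repeatable
  --max-lora-rank=32
)
echo "vllm args: ${SERVE_ARGS[*]}"
```

At request time, clients select the adapter by passing its registered name (`my-adapter` here) as the `model` field of the OpenAI-compatible request.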

Added Custom Container Implementation (custom_container/):

  • Dockerfile - Extends vLLM base image (vllm/vllm-openai) with Google Cloud SDK for GCS access
  • entrypoint.sh - Smart entrypoint that automatically downloads models (Mistral 7B) and LoRA adapters from GCS paths (e.g., gs://bucket/model) to local
    disk at startup, then launches vLLM with support for --lora-modules parameter
  • cloudbuild.yaml - Automated build configuration supporting both GPU and CPU vLLM variants
  • ReadMe.md - Architecture documentation explaining the vLLM container design




  1. If you are opening a PR for Official Notebooks under the notebooks/official folder, follow this mandatory checklist:
  • [x] Use the notebook template as a starting point.
  • [x] Follow the style and grammar rules outlined in the above notebook template.
  • [x] Verify the notebook runs successfully in Colab, since the automated tests cannot guarantee this even when they pass.
  • [x] Passes all the required automated checks. You can locally test for formatting and linting with these instructions.
  • [x] This notebook has been added to the CODEOWNERS file under the Official Notebooks section, pointing to the author or the author's team.
  • [x] The Jupyter notebook cleans up any artifacts it has created (datasets, ML models, endpoints, etc.) so as not to eat up unnecessary resources.

  2. If you are opening a PR for Community Notebooks under the notebooks/community folder:
  • This notebook has been added to the CODEOWNERS file under the Community Notebooks section, pointing to the author or the author's team.
  • Passes all the required formatting and linting checks. You can locally test with these instructions.

  3. If you are opening a PR for Community Content under the community-content folder:
  • Make sure your main Content Directory Name is descriptive, informative, and includes some of the key products and attributes of your content, so that it is differentiable from other content.
  • The main content directory has been added to the CODEOWNERS file under the Community Content section, pointing to the author or the author's team.
  • Passes all the required formatting and linting checks. You can locally test with these instructions.

@shamika shamika requested a review from a team as a code owner February 13, 2026 00:15
@gemini-code-assist
Contributor

Summary of Changes

Hello @shamika, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Vertex AI samples by providing robust and flexible methods for deploying large language models with LoRA adapters. It addresses a critical gap by offering examples for serving Mistral 7B using vLLM, with options for leveraging either prebuilt containers for simplicity or custom containers for advanced control over model loading and environment configuration. This enables users to efficiently deploy and manage LoRA-adapted models, optimizing for both ease of use and customization.

Highlights

  • New Deployment Samples: Introduced two new sample notebooks for deploying Mistral 7B with dynamic LoRA adapters on Vertex AI using vLLM.
  • Flexible Deployment Approaches: Demonstrated deployment using both a prebuilt vLLM container for ease of use and a custom vLLM container for advanced control and customization.
  • Dynamic LoRA Adapter Loading: Implemented dynamic loading of LoRA adapters and base models directly from Google Cloud Storage (GCS) at container startup for the custom container approach.
  • End-to-End Workflow: Provided a comprehensive end-to-end workflow, covering model preparation, container building (for custom), deployment to Vertex AI endpoints, and inference testing for both methods.
Changelog
  • notebooks/official/CODEOWNERS
    • Added ownership for two new Mistral 7B LoRA notebooks to @shamikaa.
  • notebooks/official/prediction/vertexai_serving_vllm/custom_container/Dockerfile
    • Defined a Dockerfile to build a custom vLLM container image, including Google Cloud SDK for GCS access and a custom entrypoint script.
  • notebooks/official/prediction/vertexai_serving_vllm/custom_container/ReadMe.md
    • Provided an architectural overview of the custom container, detailing the Dockerfile, entrypoint script, and Cloud Build configuration.
  • notebooks/official/prediction/vertexai_serving_vllm/custom_container/cloudbuild.yaml
    • Configured Cloud Build to automate the custom vLLM container image build and push process, supporting both GPU and CPU variants.
  • notebooks/official/prediction/vertexai_serving_vllm/custom_container/entrypoint.sh
    • Implemented a shell script to intelligently intercept --model and --lora-modules arguments, downloading specified models and LoRA adapters from GCS to local disk before launching vLLM.
  • notebooks/official/prediction/vertexai_serving_vllm/custom_container/local_build.sh
    • Added a local script for building the custom Docker container, providing command-line options for device type, base image, repository, project ID, and location.
  • notebooks/official/prediction/vertexai_serving_vllm/vertexai_serving_vllm_mistral_7b_with_lora_adopters_custom_container.ipynb
    • Introduced a Jupyter notebook demonstrating the deployment of Mistral 7B with LoRA adapters on Vertex AI using the newly defined custom vLLM container.
  • notebooks/official/prediction/vertexai_serving_vllm/vertexai_serving_vllm_mistral_7b_with_lora_adopters_prebuilt_container.ipynb
    • Introduced a Jupyter notebook demonstrating the deployment of Mistral 7B with LoRA adapters on Vertex AI using a prebuilt vLLM container from Model Garden.
Activity
  • Used the official notebook template as a starting point.
  • Followed style and grammar rules outlined in the notebook template.
  • Verified that the notebook runs successfully in Colab.
  • Ensured all required automated checks passed.
  • Added the new notebooks to the CODEOWNERS file.
  • Confirmed that the Jupyter notebooks clean up any created artifacts.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request adds valuable examples for deploying Mistral 7B with dynamic LoRA adapters on Vertex AI, covering both prebuilt and custom container strategies. The code is generally well-structured and the notebooks are comprehensive. However, I've identified several critical issues that will prevent the notebooks from executing successfully, including incorrect service account configuration for Cloud Build and an erroneous path construction for the LoRA adapter. I have also pointed out some high-severity issues related to build reproducibility and unsafe default settings for resource cleanup. Additionally, there are some broken links and documentation inconsistencies that should be addressed. My review includes specific suggestions to resolve these problems.

shamika and others added 2 commits February 12, 2026 17:22
…tainer/local_build.sh

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@gericdong
Contributor

@shamika thanks for the contribution. Can you please check the failing checks?

@shamika
Author

shamika commented Feb 18, 2026 via email

@shamika
Author

shamika commented Feb 19, 2026

Fixed the lint errors. Please let me know if there is anything pending on this PR.

@gericdong
Contributor

@shamika: please fix the CI errors. Thanks.
