Mistral 7B Vertex AI Inference with dynamic LoRA adapters #4441
shamika wants to merge 11 commits into GoogleCloudPlatform:main
Conversation
Summary of Changes
Hello @shamika, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request enhances the Vertex AI samples by providing robust and flexible methods for deploying large language models with LoRA adapters. It addresses a gap by offering examples for serving Mistral 7B using vLLM, with options for leveraging either prebuilt containers for simplicity or custom containers for advanced control over model loading and environment configuration. This enables users to efficiently deploy and manage LoRA-adapted models, optimizing for both ease of use and customization.
Code Review
This pull request adds valuable examples for deploying Mistral 7B with dynamic LoRA adapters on Vertex AI, covering both prebuilt and custom container strategies. The code is generally well-structured and the notebooks are comprehensive. However, I've identified several critical issues that will prevent the notebooks from executing successfully, including incorrect service account configuration for Cloud Build and an erroneous path construction for the LoRA adapter. I have also pointed out some high-severity issues related to build reproducibility and unsafe default settings for resource cleanup. Additionally, there are some broken links and documentation inconsistencies that should be addressed. My review includes specific suggestions to resolve these problems.
…tainer/local_build.sh Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@shamika thanks for the contribution. Can you please check the failing checks?
Working on it. Will send the revised version today.
Fixed the lint errors. Please let me know if there is anything pending on this PR.
@shamika: please fix the CI errors. Thanks.
This PR adds samples demonstrating two deployment approaches for Mistral 7B with dynamically loaded LoRA adapters on Vertex AI using vLLM, covering both prebuilt and custom container strategies.
Why This Change is Needed
LoRA (Low-Rank Adaptation) is a popular parameter-efficient fine-tuning technique, but there is currently no example in vertex-ai-samples showing how to deploy LoRA-adapted models on Vertex AI using vLLM, and existing deployment patterns do not cover dynamic adapter loading. This sample fills that gap by providing complete, reproducible examples for both vLLM deployment strategies.
What Was Changed
Added Notebooks (/prediction/vertexai_serving_vllm/):
Method 1: Prebuilt vLLM Container Approach
- Uses Vertex AI Model Garden's prebuilt vLLM container (pytorch-vllm-serve)
- Deploys Mistral 7B Instruct v0.3 with LoRA adapter via vLLM
- Simplest deployment path with minimal custom code
- LoRA adapters dynamically loaded from GCS paths using vLLM's --lora-modules parameter
- Demonstrates downloading Mistral 7B and adapters from HuggingFace and uploading to GCS
- Shows inference comparison between base Mistral model and LoRA adapter using vLLM's API
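To make the prebuilt-container flow concrete, here is a minimal sketch of how the vLLM serving arguments might be assembled, with a LoRA adapter registered dynamically via `--lora-modules`. The function name, bucket, and adapter names are hypothetical, and the exact flag syntax accepted by the container may differ from this illustration:

```python
# Hypothetical sketch: build the vLLM serving args used by the prebuilt
# Vertex AI vLLM container, loading a LoRA adapter from a GCS path.
# All names and paths below are illustrative, not taken from the PR.
def build_vllm_args(base_model: str, lora_name: str, lora_path: str,
                    max_lora_rank: int = 16) -> list[str]:
    return [
        "python", "-m", "vllm.entrypoints.openai.api_server",
        f"--model={base_model}",
        "--enable-lora",
        f"--max-lora-rank={max_lora_rank}",
        # vLLM takes name=path pairs for dynamically served adapters
        f"--lora-modules={lora_name}={lora_path}",
    ]

args = build_vllm_args(
    "mistralai/Mistral-7B-Instruct-v0.3",
    "my-adapter",
    "gs://my-bucket/adapters/my-adapter",  # hypothetical bucket
)
print(args[-1])
```

At inference time, requests against the OpenAI-compatible API can then select either the base model or the adapter by name in the `model` field, which is how the base-vs-LoRA comparison above would be driven.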
Method 2: Custom vLLM Container Approach
- Full control over vLLM container build and model loading logic
- Custom entrypoint with explicit GCS download handling for Mistral 7B and adapters before vLLM startup
- Cloud Build integration for automated container builds
- Supports both base models and LoRA adapters from arbitrary GCS locations
- Demonstrates advanced vLLM configuration options
Added Custom Container Implementation (custom_container/):
- Entrypoint script that downloads the base model and LoRA adapters from GCS to local disk at startup, then launches vLLM with support for the --lora-modules parameter
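The custom entrypoint logic described above can be sketched roughly as follows. This is an illustrative outline, not the actual script from the PR; the bucket names, local paths, and helper functions are all assumptions:

```python
# Hypothetical sketch of the custom-container entrypoint: copy the base
# model and LoRA adapter from GCS to local disk, then launch vLLM.
def gcs_download_cmd(gcs_uri: str, local_dir: str) -> list[str]:
    # gsutil -m cp -r: parallel, recursive copy from GCS to local disk
    return ["gsutil", "-m", "cp", "-r", gcs_uri, local_dir]

def vllm_launch_cmd(model_dir: str, adapter_name: str,
                    adapter_dir: str) -> list[str]:
    return [
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", model_dir,
        "--enable-lora",
        "--lora-modules", f"{adapter_name}={adapter_dir}",
    ]

if __name__ == "__main__":
    # In the real container these URIs would come from environment
    # variables, and each command would run via subprocess.run(..., check=True).
    for cmd in (
        gcs_download_cmd("gs://my-bucket/mistral-7b", "/models/base"),
        gcs_download_cmd("gs://my-bucket/adapters/my-adapter", "/models/adapter"),
    ):
        print(" ".join(cmd))
    print(" ".join(vllm_launch_cmd("/models/base", "my-adapter", "/models/adapter")))
```

Downloading to local disk before startup is the key design difference from the prebuilt-container path: it gives the container full control over retries, caching, and layout, at the cost of a longer cold start.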
Checklists:
- Official Notebooks (under the notebooks/official folder): follow the mandatory checklist and add an entry to the Official Notebooks section, pointing to the author or the author's team.
- Community Notebooks (under the notebooks/community folder): add an entry to the Community Notebooks section, pointing to the author or the author's team.
- Community Content (under the community-content folder): ensure the Content Directory Name is descriptive, informative, and includes some of the key products and attributes of the content so that it is differentiable from other content; add an entry to the Community Content section, pointing to the author or the author's team.