This project is an iOS application built on the MNN engine that supports multimodal conversations with large models running locally.
It operates fully offline, preserving privacy: once the models are downloaded to the device, all conversations take place locally, with nothing uploaded to or processed on a network.
- Local Models
  - Display locally downloaded models
  - Support custom pinning
- Model Market
  - Get the list of models supported by MNN
  - Model management: download and delete models
  - Support switching between the Hugging Face, ModelScope, and Modeler download sources
  - Model search: support keyword search and tag search
- Benchmark Testing
  - Support automated benchmark testing, reporting prefill speed, decode speed, and memory usage
  - Support batch testing for text, image, and audio inputs
- Multimodal Chat: supports full Markdown output
  - Text-to-text
  - Audio-to-text (supports audio output for Omni models)
  - Image-to-text: images can be captured or selected from the gallery
  - Video-to-text: supports video input processing
  - Sana Diffusion: supports image style transfer (e.g., Ghibli style)
- Model Configuration
  - mmap
  - Sampling strategy
  - Diffusion settings
  - Backend type (CPU/Metal)
  - Precision (low/normal/high)
  - Thread count
  - Multimodal prompt API
  - Audio output for Omni models
- Chat History
  - Per-model conversation history list, with restoration of past conversation contexts
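The prefill and decode figures reported by the benchmark are essentially tokens-per-second ratios over the two inference phases. A minimal Python sketch of that arithmetic (the `BenchmarkRun` type is invented here for illustration; the app's actual benchmark code is Swift):

```python
# Illustrative sketch of benchmark-metric arithmetic; names are hypothetical.
from dataclasses import dataclass

@dataclass
class BenchmarkRun:
    prompt_tokens: int      # tokens processed during prefill
    prefill_seconds: float  # wall time of the prefill phase
    output_tokens: int      # tokens generated during decode
    decode_seconds: float   # wall time of the decode phase

    @property
    def prefill_speed(self) -> float:
        """Prefill throughput in tokens/second."""
        return self.prompt_tokens / self.prefill_seconds

    @property
    def decode_speed(self) -> float:
        """Decode throughput in tokens/second."""
        return self.output_tokens / self.decode_seconds

run = BenchmarkRun(prompt_tokens=128, prefill_seconds=0.5,
                   output_tokens=256, decode_seconds=4.0)
print(f"prefill: {run.prefill_speed:.1f} tok/s")  # 256.0 tok/s
print(f"decode:  {run.decode_speed:.1f} tok/s")   # 64.0 tok/s
```

Batch testing then amounts to collecting one such record per input (text, image, or audio) and aggregating.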
Click here to download the original resolution introduction video
Demo screenshots (images omitted): Text To Text, Image To Text, Audio To Text, Model Filter, Local Model, Model Market, Benchmark, History.
Additionally, the app supports on-device use of DeepSeek with its Think mode.

1. Clone the repository:

   ```shell
   git clone https://github.com/alibaba/MNN.git
   ```

2. Build MNN.framework:

   ```shell
   sh package_scripts/ios/buildiOS.sh " -DMNN_ARM82=ON -DMNN_LOW_MEMORY=ON -DMNN_SUPPORT_TRANSFORMER_FUSE=ON -DMNN_BUILD_LLM=ON -DMNN_CPU_WEIGHT_DEQUANT_GEMM=ON -DMNN_METAL=ON -DMNN_BUILD_DIFFUSION=ON -DMNN_OPENCL=OFF -DMNN_SEP_BUILD=OFF -DLLM_SUPPORT_AUDIO=ON -DMNN_BUILD_AUDIO=ON -DLLM_SUPPORT_VISION=ON -DMNN_BUILD_OPENCV=ON -DMNN_IMGCODECS=ON -DMNN_BUILD_LLM_OMNI=ON "
   ```

3. Copy the framework into the iOS project:

   ```shell
   mv MNN-iOS-CPU-GPU/Static/MNN.framework apps/iOS/MNNLLMChat
   ```

   Ensure the `Link Binary With Libraries` section includes `MNN.framework`. If it is missing, add it manually.

4. Update iOS signing and build the project:

   ```shell
   cd apps/iOS/MNNLLMChat
   open MNNLLMiOS.xcodeproj
   ```

   In Xcode, go to `Signing & Capabilities > Team` and enter your Apple ID and Bundle Identifier. Wait for the Swift Packages to finish downloading before building.
Due to memory limitations on iPhones, it is recommended to use models with 7B parameters or fewer to avoid memory-related crashes.
For local debugging, simply drag the model files into the LocalModel folder and run the project:

1. Download the MNN-related models from Hugging Face or ModelScope.

2. Drag the downloaded model folder into the project's LocalModel folder.

3. For models placed in the bundle root, configure them in `ModelListViewModel.swift`, for example to declare whether the model supports thinking mode:

   ```swift
   // MARK: Config the Local Model here
   let modelName = "Qwen3-0.6B-MNN-Inside" // Model name
   let localModel = ModelInfo(
       modelName: modelName,
       tags: [
           // MARK: if you know the model supports think mode, uncomment the next line
           // NSLocalizedString("tag.deepThinking", comment: "Deep thinking tag for local model"),
           NSLocalizedString("tag.localModel", comment: "Local model inside the app")],
       categories: ["Local Models"],
       vendor: "Local",
       sources: ["local": "bundle_root/\(modelName)"],
       isDownloaded: true
   )
   localModels.append(localModel)
   ModelStorageManager.shared.markModelAsDownloaded(modelName)
   ```

4. Run the project, navigate to the chat page, and interact with and debug the model.
The app will automatically detect and load models from the LocalModel folder without requiring additional configuration.
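For the download step, MNN-converted weights can also be fetched programmatically. A hedged sketch using the `huggingface_hub` Python package; the org/repo id and the helper names below are illustrative, not part of the project:

```python
# Sketch (assumption): fetching MNN-converted weights with the
# `huggingface_hub` package. The repo id below is illustrative;
# browse Hugging Face or ModelScope for the actual model list.
from pathlib import Path

def local_model_dir(repo_id: str, base: str = "LocalModel") -> Path:
    """Target folder inside the iOS project, named after the repo."""
    return Path(base) / repo_id.split("/")[-1]

def download_model(repo_id: str) -> None:
    """Download a model snapshot into the project's LocalModel folder."""
    # Imported here so the path helper above stays dependency-free.
    from huggingface_hub import snapshot_download
    snapshot_download(repo_id=repo_id, local_dir=str(local_model_dir(repo_id)))

# Example (requires network and huggingface_hub installed):
# download_model("taobao-mnn/Qwen3-0.6B-MNN")
```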
- Added Sana Diffusion support for image style transfer (e.g., Ghibli style)
- Added audio output support for Omni models
- Added video input support for multimodal conversations
- Added multimodal prompt API configuration
- Added backend type configuration (CPU/Metal)
- Added precision configuration (low/normal/high)
- Added thread count configuration
- Added batch testing support for text, image, and audio inputs
- Added three major project modules: Local Models, Model Market, and Benchmark Testing
- Added benchmark testing to test different model performance
- Added settings page, accessible from the history sidebar
- Added Ali CDN for getting model lists
- Added model market filtering functionality
- Added support for model parameter configuration

New Features:
- Added support for downloading from the Modeler source
- Added support for Stable Diffusion text-to-image generation

New Features:
- Added support for mmap configuration and manual cache clearing
- Added support for downloading models from the ModelScope source




