[Gap Analysis] Implement Comprehensive Model Hub Ecosystem #310

### User Story

> As a developer, I want to easily download and use popular, pre-trained models directly within AiDotNet, so I can quickly build powerful applications with state-of-the-art performance without needing to train models from scratch.

---

### **Problem & Proposed Solution**

Modern AI development heavily relies on leveraging pre-trained models. Currently, AiDotNet lacks a built-in mechanism for this, forcing users to manually download and convert weights. Hugging Face Hub is the gold standard, but we can do more than just download files.

This plan outlines a **Comprehensive Model Hub Ecosystem** for AiDotNet. It includes not only a robust client for downloading and caching models (`FromPretrainedAsync`) but also defines a standard "model package" format and specifies the creation of **Python-based conversion tools**. These tools are a critical accelerator, enabling the community to port existing PyTorch models to the AiDotNet Hub.

---

### **Phase 1: Core Hub Client and Caching System**

**Goal:** Build the foundational client for resolving, downloading, and caching model packages from a central repository.

#### **AC 1.1: Define the "Model Package" Standard (3 points)**
**Requirement:** Specify a clear, structured format for a model on the hub.

-   [ ] A model package will be a directory identified by a unique `modelId` (e.g., `ooples/mamba-2.9b`).
-   [ ] The directory must contain:
    -   `model_weights.bin`: The raw model parameters in a standardized binary format.
    -   `model_config.json`: A JSON file defining the model's architecture, layers, dimensions, and all other parameters needed to construct the model class in AiDotNet.
    -   `preprocessor_config.json` (Optional): Configuration for any required data preprocessing (e.g., image normalization stats, tokenizer vocabulary).
    -   `README.md`: A markdown file with model details, usage examples, and license information.
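
The file list above can be checked mechanically before a package is published or loaded. The following is a minimal sketch, assuming a hypothetical helper class `ModelPackageLayout` that is not part of AiDotNet today; only the file names come from this issue.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper (not an existing AiDotNet type): checks a model package
// directory against the file list in AC 1.1.
public static class ModelPackageLayout
{
    // Files every package must contain, per AC 1.1.
    public static readonly IReadOnlyList<string> RequiredFiles = new[]
    {
        "model_weights.bin",
        "model_config.json",
        "README.md"
    };

    // Optional file; its absence is not treated as an error.
    public const string OptionalPreprocessorConfig = "preprocessor_config.json";

    // Returns the required files that are missing.
    // An empty result means the directory has the expected layout.
    public static IReadOnlyList<string> FindMissingFiles(string packageDirectory)
    {
        return RequiredFiles
            .Where(name => !File.Exists(Path.Combine(packageDirectory, name)))
            .ToList();
    }
}
```

A caller could reject a package when `FindMissingFiles` returns anything, and report the missing names in the error message.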

#### **AC 1.2: Implement the `ModelHub` Caching Service (8 points)**
**Requirement:** Create a static class to manage the download and caching of model packages.

-   [ ] Create a new project: `src/AiDotNet.Hub`.
-   [ ] Create a file: `src/Hub/ModelHub.cs`.
-   [ ] The service will manage a local cache directory under the user's profile (e.g., `C:\Users\username\.aidotnet\hub` on Windows).
-   [ ] Implement a method: `public static async Task<string> ResolveAsync(string modelId)`.
    -   **Logic:** Given a `modelId`, it checks whether the corresponding package is already in the cache. If it is not, it downloads the package from a central repository (e.g., a designated public URL).
    -   After download, it must verify file integrity with a hash check (e.g., SHA-256).
    -   It returns the absolute local path to the cached model package directory.
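
The cache-path and integrity-check parts of `ResolveAsync` can be sketched without any networking. This is a minimal sketch under stated assumptions: the class `HubCacheSketch` and its method names are hypothetical, the `owner/name` shape of `modelId` is inferred from the example in AC 1.1, and the expected hash is assumed to be published by the repository alongside each file (this issue does not say where it comes from).

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical sketch of the non-network parts of ModelHub.ResolveAsync.
public static class HubCacheSketch
{
    // Assumed default: <user profile>/.aidotnet/hub, matching the example in AC 1.2.
    public static string GetDefaultCacheRoot()
    {
        string home = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);
        return Path.Combine(home, ".aidotnet", "hub");
    }

    // Maps "owner/name" to an absolute directory under the cache root.
    // Rejects empty, ".", and ".." segments so a modelId cannot escape the cache.
    public static string GetPackageDirectory(string cacheRoot, string modelId)
    {
        if (string.IsNullOrWhiteSpace(modelId))
            throw new ArgumentException("modelId must not be empty.", nameof(modelId));

        string[] parts = modelId.Split('/');
        if (parts.Length != 2)
            throw new ArgumentException("modelId must have the form 'owner/name'.", nameof(modelId));

        foreach (string part in parts)
        {
            if (part.Length == 0 || part == "." || part == ".." ||
                part.IndexOfAny(Path.GetInvalidFileNameChars()) >= 0)
            {
                throw new ArgumentException("modelId contains an invalid segment.", nameof(modelId));
            }
        }

        return Path.GetFullPath(Path.Combine(cacheRoot, parts[0], parts[1]));
    }

    // Computes the SHA-256 digest of a file as lowercase hex.
    public static string ComputeSha256Hex(string filePath)
    {
        using (SHA256 sha = SHA256.Create())
        using (FileStream stream = File.OpenRead(filePath))
        {
            byte[] hash = sha.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", string.Empty).ToLowerInvariant();
        }
    }

    // True when the file's digest equals the expected hex string (case-insensitive).
    public static bool MatchesExpectedHash(string filePath, string expectedSha256Hex)
    {
        return string.Equals(ComputeSha256Hex(filePath), expectedSha256Hex,
            StringComparison.OrdinalIgnoreCase);
    }
}
```

One possible design choice is to download into a temporary directory, call `MatchesExpectedHash` on each file, and only then move the package into the directory returned by `GetPackageDirectory`, so a failed or corrupted download never appears as a cache hit.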

---

### **Phase 2: The `FromPretrained` API**

**Goal:** Create a simple, powerful API for users to instantiate AiDotNet models directly from the hub.

#### **AC 2.1: Implement `FromPretrainedAsync` in a Factory (13 points)**
**Requirement:** Add a static factory method that orchestrates the download and instantiation process.

-   [ ] Locate or create an appropriate factory class (e.g., `src/Factories/NeuralNetworkFactory.cs`).
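-   [ ] Add a static method: `public static async Task<IModel<T>> FromPretrainedAsync<T>(string modelId)` (the type parameter `T` must be declared on the method unless the factory class itself is generic).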
-   [ ] **Method Logic:**
    1.  [ ] Call `ModelHub.ResolveAsync(modelId)` to download/cache the model package and get its local path.
    2.  [ ] Read and deserialize `model_config.json`.
    3.  [ ] Use a factory pattern based on an `architecture_type` field in the config to instantiate the correct AiDotNet model class (e.g., `new Mamba<T>(config)` or `new Transformer<T>(config)`).
    4.  [ ] Load the weights from `model_weights.bin` into the newly created model instance.
    5.  [ ] Return the fully-loaded, ready-to-use model.
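
Step 3 above, dispatching on `architecture_type`, can be sketched as a small registry of builder delegates. The following is a sketch under assumptions, not the project's implementation: `ArchitectureRegistry<TModel>` is a hypothetical name, `TModel` stands in for `IModel<T>`, and `System.Text.Json` is used only for illustration (the project may prefer a different serializer).

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical sketch: maps the `architecture_type` string in model_config.json
// to a delegate that builds the model. TModel stands in for IModel<T>.
public sealed class ArchitectureRegistry<TModel>
{
    private readonly Dictionary<string, Func<JsonElement, TModel>> _builders =
        new Dictionary<string, Func<JsonElement, TModel>>(StringComparer.Ordinal);

    // Associates an architecture name with a builder that receives the parsed config.
    public void Register(string architectureType, Func<JsonElement, TModel> builder)
    {
        if (string.IsNullOrWhiteSpace(architectureType))
            throw new ArgumentException("Architecture type must not be empty.", nameof(architectureType));
        if (builder == null)
            throw new ArgumentNullException(nameof(builder));
        _builders[architectureType] = builder;
    }

    // Parses the config JSON, reads `architecture_type`, and invokes the matching builder.
    public TModel Create(string modelConfigJson)
    {
        using (JsonDocument document = JsonDocument.Parse(modelConfigJson))
        {
            JsonElement root = document.RootElement;
            if (root.ValueKind != JsonValueKind.Object ||
                !root.TryGetProperty("architecture_type", out JsonElement typeElement) ||
                typeElement.ValueKind != JsonValueKind.String)
            {
                throw new InvalidOperationException(
                    "model_config.json must be an object with a string 'architecture_type' field.");
            }

            string architectureType = typeElement.GetString() ?? string.Empty;
            if (!_builders.TryGetValue(architectureType, out var builder))
                throw new NotSupportedException("Unknown architecture_type: " + architectureType);

            // Clone so the builder may keep the element after the document is disposed.
            return builder(root.Clone());
        }
    }
}
```

With a registry like this, supporting a new architecture means adding one `Register` call (for example, a builder that wraps `new Mamba<T>(config)`) rather than extending a `switch` inside `FromPretrainedAsync`.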

---

### **Phase 3: Ecosystem Expansion - Conversion and Discovery**

**Goal:** Build tools that make the hub useful and empower the community to contribute.

#### **AC 3.1: Create PyTorch-to-AiDotNet Converter Script (13 points)**
**Requirement:** A Python script to convert Hugging Face PyTorch models to the AiDotNet model package format. This is the most critical "expanded feature".

-   [ ] Create a new directory: `tools/converters`.
-   [ ] Develop a Python script (`convert_hf_to_adn.py`) that uses `transformers` and `torch`.
-   [ ] **Script Logic:**
    -   [ ] Takes a Hugging Face `model_id` as input.
    -   [ ] Downloads the PyTorch model.
    -   [ ] Extracts the `state_dict` and saves the weights in the `model_weights.bin` format (e.g., a simple flat binary file).
    -   [ ] Reads the model's `config.json`, transforms it, and saves it as our `model_config.json`.
    -   [ ] Creates a basic `README.md`.
-   [ ] Add clear documentation in the `tools` directory explaining how to use the script.
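
Because the Python converter writes `model_weights.bin` and the C# loader reads it, both sides need to agree on a byte layout, which this issue leaves open. The sketch below shows one hypothetical layout from the C# side, purely as an assumption for discussion: a little-endian `int32` tensor count, then for each tensor a length-prefixed UTF-8 name, an `int32` element count, and that many `float32` values. `FlatWeightsSketch` is not an existing type, and tensor shapes are assumed to come from `model_config.json`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical layout for model_weights.bin (an assumption, not a decided format):
//   int32 tensorCount
//   repeated: int32 nameByteLength, UTF-8 name bytes, int32 valueCount, float32 values
// BinaryWriter and BinaryReader use little-endian byte order.
public static class FlatWeightsSketch
{
    public static void Write(Stream output, IDictionary<string, float[]> tensors)
    {
        using (var writer = new BinaryWriter(output, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write(tensors.Count);
            foreach (KeyValuePair<string, float[]> tensor in tensors)
            {
                byte[] nameBytes = Encoding.UTF8.GetBytes(tensor.Key);
                writer.Write(nameBytes.Length);
                writer.Write(nameBytes);
                writer.Write(tensor.Value.Length);
                foreach (float value in tensor.Value)
                    writer.Write(value);
            }
        }
    }

    public static Dictionary<string, float[]> Read(Stream input)
    {
        var tensors = new Dictionary<string, float[]>();
        using (var reader = new BinaryReader(input, Encoding.UTF8, leaveOpen: true))
        {
            int tensorCount = reader.ReadInt32();
            if (tensorCount < 0) throw new InvalidDataException("Negative tensor count.");
            for (int i = 0; i < tensorCount; i++)
            {
                int nameLength = reader.ReadInt32();
                if (nameLength < 0) throw new InvalidDataException("Negative name length.");
                string name = Encoding.UTF8.GetString(reader.ReadBytes(nameLength));

                int valueCount = reader.ReadInt32();
                if (valueCount < 0) throw new InvalidDataException("Negative value count.");
                var values = new float[valueCount];
                for (int j = 0; j < valueCount; j++)
                    values[j] = reader.ReadSingle();

                // A generic model would convert each value to T, e.g. via NumOps.FromDouble.
                tensors[name] = values;
            }
        }
        return tensors;
    }
}
```

The on-disk element type is fixed to `float32` here because a file format needs a concrete width; the loaded values would still be converted to the model's generic `T` when they are copied into the model.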

#### **AC 3.2: Implement Hub Search Functionality (5 points)**
**Requirement:** Allow users to discover available models.

-   [ ] The central model repository will host a master `index.json` file. This file will be an array of objects, each containing a `modelId` and a short `description`.
-   [ ] Implement a method: `public static async Task<IEnumerable<ModelInfo>> SearchAsync(string query)` in `ModelHub.cs`.
-   [ ] This method will download the `index.json`, cache it for a short period, and perform a simple case-insensitive search on the `modelId` and `description` fields.
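
The matching step of `SearchAsync` is independent of downloading and caching `index.json`, so it can be sketched on its own. In this sketch the property names on `ModelInfo` and the helper `HubSearchSketch.Filter` are assumptions, as is the choice to return the full index for an empty query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One entry of index.json. Property names are assumed from the
// `modelId` and `description` fields named in AC 3.2.
public sealed class ModelInfo
{
    public string ModelId { get; set; } = string.Empty;
    public string Description { get; set; } = string.Empty;
}

// Hypothetical helper holding only the filtering logic of SearchAsync.
public static class HubSearchSketch
{
    // Returns entries whose ModelId or Description contains the query, ignoring case.
    // Assumption: an empty or whitespace query returns every entry.
    public static IReadOnlyList<ModelInfo> Filter(IEnumerable<ModelInfo> index, string query)
    {
        if (index == null) throw new ArgumentNullException(nameof(index));
        if (string.IsNullOrWhiteSpace(query)) return index.ToList();

        string term = query.Trim();
        return index
            .Where(m => ContainsIgnoreCase(m.ModelId, term) || ContainsIgnoreCase(m.Description, term))
            .ToList();
    }

    private static bool ContainsIgnoreCase(string text, string term)
    {
        return text != null && text.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

Keeping the filter separate from the download makes it straightforward to unit test against an in-memory list.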

---

### **Phase 4: Integration and Testing**

**Goal:** Verify the entire ecosystem, from conversion to instantiation.

#### **AC 4.1: End-to-End Integration Test (8 points)**
**Requirement:** A test that verifies the entire workflow: convert a model, host it, and load it with `FromPretrainedAsync`.

-   [ ] **Setup:**
    1.  [ ] Use the `convert_hf_to_adn.py` script to convert a very small, simple Hugging Face model (e.g., `google/bert_uncased_L-2_H-128_A-2`).
    2.  [ ] Place the resulting model package in a test directory that can be served by a local test server.
-   [ ] **Test Logic:**
    1.  [ ] In a C# test, point the hub client at the local test server and call `NeuralNetworkFactory.FromPretrainedAsync<float>("test-org/tiny-bert")`.
    2.  [ ] Assert that the model package is downloaded to the cache correctly.
    3.  [ ] Assert that the returned model object is of the correct type and is not null.
    4.  [ ] Run a sample prediction and assert that the output matches the original PyTorch model's output for the same input, within a small floating-point tolerance.
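
Outputs from two frameworks rarely match bit for bit, so the numeric assertion usually needs a tolerance. The helper below is a minimal sketch with a hypothetical name (`NumericAssertSketch.AllClose`); it applies the same combined absolute and relative bound that `torch.allclose` documents, and the default tolerance values are assumptions to be tuned for the model under test.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical test helper: element-wise comparison with a combined tolerance.
public static class NumericAssertSketch
{
    // True when every pair satisfies |actual - expected| <= absTol + relTol * |expected|.
    // Default tolerances are assumptions, not project standards.
    public static bool AllClose(
        IReadOnlyList<double> actual,
        IReadOnlyList<double> expected,
        double relativeTolerance = 1e-4,
        double absoluteTolerance = 1e-5)
    {
        if (actual == null) throw new ArgumentNullException(nameof(actual));
        if (expected == null) throw new ArgumentNullException(nameof(expected));
        if (actual.Count != expected.Count) return false;

        for (int i = 0; i < actual.Count; i++)
        {
            double difference = Math.Abs(actual[i] - expected[i]);
            double allowed = absoluteTolerance + relativeTolerance * Math.Abs(expected[i]);

            // Comparisons with NaN are false, so a NaN on either side fails the check.
            if (!(difference <= allowed)) return false;
        }
        return true;
    }
}
```

The expected values could be exported once from the PyTorch model for a fixed input and stored next to the test, so the C# test does not need Python at run time.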

---

### **Definition of Done**

-   [ ] All checklist items are complete.
-   [ ] A developer can discover and load a pre-trained model from the hub with a single line of code: `var model = await NeuralNetworkFactory.FromPretrainedAsync<float>("org/model-name");`.
-   [ ] A robust caching system is in place to avoid re-downloading models.
-   [ ] A documented Python script exists to convert Hugging Face models to the AiDotNet Hub format.

---

## ⚠️ CRITICAL ARCHITECTURAL REQUIREMENTS

**Before implementing this user story, you MUST review:**
- **📋 Full Requirements:** [`.github/USER_STORY_ARCHITECTURAL_REQUIREMENTS.md`](../.github/USER_STORY_ARCHITECTURAL_REQUIREMENTS.md)
- **📐 Project Rules:** [`.github/PROJECT_RULES.md`](../.github/PROJECT_RULES.md)

### Mandatory Implementation Checklist

#### 1. INumericOperations<T> Usage (CRITICAL)
- [ ] Include `protected static readonly INumericOperations<T> NumOps = MathHelper.GetNumericOperations<T>();` in base class
- [ ] NEVER hardcode `double`, `float`, or specific numeric types - use generic `T`
- [ ] NEVER use `default(T)` - use `NumOps.Zero` instead
- [ ] Use `NumOps.Zero`, `NumOps.One`, `NumOps.FromDouble()` for values
- [ ] Use `NumOps.Add()`, `NumOps.Multiply()`, etc. for arithmetic
- [ ] Use `NumOps.LessThan()`, `NumOps.GreaterThan()`, etc. for comparisons

#### 2. Inheritance Pattern (REQUIRED)
- [ ] Create `I{FeatureName}.cs` in `src/Interfaces/` (root level, NOT subfolders)
- [ ] Create `{FeatureName}Base.cs` in `src/{FeatureArea}/` implementing the interface
- [ ] Create concrete classes inheriting from the Base class (NOT implementing the interface directly)

#### 3. PredictionModelBuilder Integration (REQUIRED)
- [ ] Add private field: `private I{FeatureName}<T>? _{featureName};` to `PredictionModelBuilder.cs`
- [ ] Add a Configure method taking ONLY the interface (no additional parameters):
  ```csharp
  public IPredictionModelBuilder<T, TInput, TOutput> Configure{FeatureName}(I{FeatureName}<T> {featureName})
  {
      _{featureName} = {featureName};
      return this;
  }
  ```
- [ ] Use feature in `Build()` with default: `var {featureName} = _{featureName} ?? new Default{FeatureName}<T>();`
- [ ] Verify feature is ACTUALLY USED in execution flow

#### 4. Beginner-Friendly Defaults (REQUIRED)
- [ ] Constructor parameters with defaults from research/industry standards
- [ ] Document WHY each default was chosen (cite papers/standards)
- [ ] Validate parameters and throw `ArgumentException` for invalid values

#### 5. Property Initialization (CRITICAL)
- [ ] NEVER use `default!` operator
- [ ] String properties: `= string.Empty;`
- [ ] Collections: `= new List<T>();` or `= new Vector<T>(0);`
- [ ] Numeric properties: appropriate default or `NumOps.Zero`

#### 6. Class Organization (REQUIRED)
- [ ] One class/enum/interface per file
- [ ] ALL interfaces in `src/Interfaces/` (root level)
- [ ] Namespace mirrors folder structure (e.g., `src/Regularization/` → `namespace AiDotNet.Regularization`)

#### 7. Documentation (REQUIRED)
- [ ] XML documentation for all public members
- [ ] `<b>For Beginners:</b>` sections with analogies and examples
- [ ] Document all `<param>`, `<returns>`, `<exception>` tags
- [ ] Explain default value choices

#### 8. Testing (REQUIRED)
- [ ] Minimum 80% code coverage
- [ ] Test with multiple numeric types (double, float)
- [ ] Test default values are applied correctly
- [ ] Test edge cases and exceptions
- [ ] Integration tests for PredictionModelBuilder usage

---

**⚠️ Failure to follow these requirements will result in repeated implementation mistakes and PR rejections.**

**See full details:** [`.github/USER_STORY_ARCHITECTURAL_REQUIREMENTS.md`](../.github/USER_STORY_ARCHITECTURAL_REQUIREMENTS.md)
