GEMCAT GUI

GEMCAT GUI is a user-friendly graphical application for GEMCAT (Gene Expression-based Metabolite Centrality Analysis Tool), which performs gene-expression to metabolite-concentration analysis. It maps gene expression data (RNA sequencing or proteomics) onto genome-scale metabolic models and predicts the resulting changes in metabolite concentrations.

Features

  • 🧬 Gene Expression Analysis - Load and analyze RNA-seq or proteomics data
  • 🗺️ Metabolic Model Mapping - Support for Human (Recon3D), Mouse, and Rat models
  • 📊 Interactive Visualization - Plot and explore top metabolite changes
  • 💾 Export Results - Save analysis results to CSV or Excel format
  • 🎨 Modern Interface - Clean PyQt5-based GUI with intuitive workflow
  • 📈 Pathway Filtering - Filter metabolites by specific metabolic pathways

Installation

Prerequisites

  • Python 3.10 or higher
  • pip package manager

Install from Source

# Clone the repository
git clone https://github.com/MolecularBioinformatics/gemcat_gui.git
cd gemcat_gui

# Install dependencies
pip install -r requirements.txt

# Install the package in development mode
pip install -e .

Run the Application

After installation, you can launch GEMCAT GUI in several ways:

# Method 1: Using the command-line entry point
gemcat-gui

# Method 2: Using Python module
python -m gemcat_gui.gui.main_window

# Method 3: Using the convenience script
python run_gemcat.py

Quick Start Guide

1. Launch the Application

Run the application using one of the methods above. The GUI has three main tabs:

  • Configuration - Load data, select models, and configure analysis
  • Results - View and export analysis results
  • Help - Access in-app documentation

2. Load Your Data

  1. Navigate to the Configuration tab
  2. Click Load RNA Data and select your gene expression file (CSV, TSV, or XLSX)
  3. Select the appropriate Data Separator (comma, tab, etc.); this option appears after checking Show Advanced Settings
  4. Choose the RNA Index Column containing gene identifiers

3. Configure GEMCAT Analysis

  1. Select a COBRA Model (Human_Recon3D, Mouse, Rat)
  2. Choose the Mapping ID Column that matches your gene identifiers
  3. Click Map Genes to link expression data to the metabolic model

4. Run Analysis

  1. Select Comparison Column (treatment/experimental group)
  2. Select Baseline Column (control group)
  3. Click Run GEMCAT to perform the analysis
  4. View results in the Results tab

User Guide

Configuration Tab

This tab is where you prepare and configure your data for GEMCAT analysis.

1. Gene Expression Data Loading

This section is for loading your RNA gene expression data.

  • Load RNA Data: Click this button to open a file dialog. Select your gene expression data file (e.g., CSV, TSV, XLSX).
    • Note: Your RNA data file should have gene identifiers (e.g., Ensembl IDs, Gene Symbols) in one column and expression values for different samples/conditions in other columns. Make sure the gene identifiers are consistent and accurate, as they are crucial for successful gene mapping.
  • Show Advanced Settings: Check this box to reveal advanced options, including the data separator.
  • Data Separator: Select the delimiter used in your data file (e.g., comma , for CSV, tab \t for TSV). This setting is essential for correctly parsing your input file.
  • RNA Index Column: After loading your RNA data, this dropdown will populate with all columns from your file. Select the column that contains your gene identifiers (e.g., 'Gene_ID', 'Ensembl_ID').
    • Hint: This column typically contains unique identifiers for each gene. Common examples include gene symbols (e.g., 'GAPDH'), Ensembl IDs (e.g., 'ENSG00000111640'), or Entrez IDs. Ensure the identifiers in this column are clean and consistent, as they will be used to match with the gene mapping file.
  • RNA Data Preview (First 10 Rows): This table will display the first 10 rows of your loaded RNA data, allowing you to verify that it has been loaded correctly.
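
The loading steps above can be sketched outside the GUI with pandas. This is a minimal illustration, not the application's actual code: the function name `load_expression_table`, the sample gene IDs, and the column names `control` and `treated` are assumptions chosen for the example.

```python
import io

import pandas as pd


def load_expression_table(source, separator=",", index_column=None):
    """Read a delimited expression table and optionally index it by gene ID.

    Hypothetical helper for illustration; not part of GEMCAT GUI.
    """
    table = pd.read_csv(source, sep=separator)
    if index_column is not None:
        if index_column not in table.columns:
            # A wrong separator usually collapses everything into one column.
            raise KeyError(
                f"Column {index_column!r} not found; check the data separator"
            )
        table = table.set_index(index_column)
    return table


# Hypothetical tab-separated input; a real file path works the same way.
example = io.StringIO(
    "Gene_ID\tcontrol\ttreated\n"
    "ENSG00000111640\t120.0\t240.0\n"
    "ENSG00000141510\t35.5\t30.1\n"
)
rna = load_expression_table(example, separator="\t", index_column="Gene_ID")
preview = rna.head(10)  # comparable to the 10-row preview table
print(preview)
```

If the preview shows a single wide column, the separator most likely does not match the file.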

2. GEMCAT Configuration

In this section, you will configure the COBRA model and gene mapping settings.

  • COBRA Model: Select the COBRA metabolic model you wish to use for the GEMCAT analysis. The available models are pre-configured within the application.
    • Note: Selecting a model will automatically load the corresponding gene mapping file required for that model. If a mapping file is not found, an error message will appear. The gene number column used for mapping (e.g., 'gene_number' for Human_Recon3D, 'SYMBOL' for Rat and Mouse) is automatically determined by the selected model and does not require manual selection.
  • Mapping ID Column: After selecting a COBRA model, this dropdown will populate with columns from the loaded gene mapping file. Select the column in the mapping file that contains gene identifiers matching those in your RNA data (e.g., 'Ensembl_ID', 'Locus_Tag'). This column will be used to link your RNA data to the metabolic model.
    • Hint: This column should contain gene identifiers that correspond exactly to the identifiers in your 'RNA Index Column' (e.g., if your RNA data uses Ensembl IDs, select the Ensembl ID column from the mapping file). Mismatches here will lead to failed gene mapping.
  • Map Genes: Click this button after you have loaded RNA data, selected an RNA Index Column, chosen a COBRA model, and specified the Mapping ID Column. This step performs the gene mapping, translating your gene expression data into a format compatible with the GEMCAT model.
    • Important: You must successfully map genes before you can run the GEMCAT analysis.
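
Conceptually, gene mapping is a join between the expression table and the mapping file. The sketch below shows that idea with pandas under stated assumptions: `map_genes` is a hypothetical helper, the `gene_number` values are illustrative, and averaging rows that map to the same model gene is a choice made for this example, not necessarily what the application does.

```python
import pandas as pd


def map_genes(rna, mapping, mapping_id_column, model_gene_column):
    """Re-index expression rows by model gene IDs using a mapping table.

    Hypothetical helper for illustration; not part of GEMCAT GUI.
    """
    lookup = (
        mapping[[mapping_id_column, model_gene_column]]
        .dropna()
        .drop_duplicates()
    )
    # Move the gene-ID index into a column so it can be joined on.
    left = rna.rename_axis("_source_id").reset_index()
    merged = left.merge(
        lookup, left_on="_source_id", right_on=mapping_id_column, how="inner"
    )
    if merged.empty:
        raise ValueError("No identifiers matched; check both ID columns")
    # Assumption: rows mapping to the same model gene are averaged.
    return (
        merged.drop(columns=["_source_id", mapping_id_column])
        .groupby(model_gene_column)
        .mean()
    )


rna = pd.DataFrame(
    {"control": [120.0, 35.5, 10.0], "treated": [240.0, 30.1, 12.0]},
    index=["ENSG00000111640", "ENSG00000141510", "ENSG_UNMAPPED"],
)
mapping = pd.DataFrame(
    {
        "Ensembl_ID": ["ENSG00000111640", "ENSG00000141510"],
        "gene_number": ["2597", "7157"],  # illustrative model gene IDs
    }
)
mapped = map_genes(rna, mapping, "Ensembl_ID", "gene_number")
print(mapped)
```

Identifiers without a counterpart in the mapping table (here `ENSG_UNMAPPED`) are dropped by the inner join, which is why exact agreement between the two ID columns matters.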

3. Run Analysis

Once all configurations are complete, you can initiate the GEMCAT analysis.

  • Comparison Column: Select the column from your RNA data that represents your experimental or 'treatment' group.
  • Baseline Column: Select the column from your RNA data that represents your control or 'baseline' group. GEMCAT will calculate changes relative to this column.
    • Warning: The Comparison and Baseline columns cannot be the same.
  • Run GEMCAT: Click this button to start the GEMCAT analysis. This process can take some time, depending on the size of your data and the complexity of the model.
    • Note: The analysis runs in a separate thread, but the GUI may still appear temporarily unresponsive while the calculation is in progress. A wait cursor indicates that a process is running.
    • Note: Upon completion, the application will automatically switch to the 'Results' tab.
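
The column checks described above amount to a small validation step before the analysis starts. The following is a minimal sketch of such a check; `check_comparison_columns` is a hypothetical name, and only the first message is taken from this documentation.

```python
def check_comparison_columns(columns, comparison, baseline):
    """Return an error message for an unusable selection, or None if it is fine.

    Hypothetical helper for illustration; not part of GEMCAT GUI.
    """
    if comparison == baseline:
        return "Comparison and baseline columns cannot be the same."
    missing = [name for name in (comparison, baseline) if name not in columns]
    if missing:
        return "Column(s) not found in RNA data: " + ", ".join(missing)
    return None


available = ["control", "treated"]  # hypothetical sample columns
print(check_comparison_columns(available, "treated", "control"))  # None
print(check_comparison_columns(available, "treated", "treated"))
```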

Results Tab

This tab displays the computed GEMCAT results and allows for visualization.

GEMCAT Analysis Results Table

  • This table will display the predicted metabolite changes resulting from the GEMCAT analysis. The primary column will show the predicted change (e.g., fold change) for each metabolite.
  • Search: Use the search box above the table to instantly filter results by any value or keyword.
  • Sort: Click any column header to sort the table. Numeric columns (including log2FoldChange) sort by value rather than as text.
  • Save Results: Click this button to save the entire results table to a CSV (.csv) or Excel (.xlsx) file.

Top N Metabolite Change Plot

  • Pathway: Use this dropdown to filter the metabolites displayed in the plot by a specific metabolic pathway. Select "All Pathways" to view metabolites from all pathways.
  • Top N: Use the spin box to specify how many of the top metabolites (based on absolute predicted change) you want to display in the plot.
  • Plot Top N Metabolites: Click this button to generate or update the bar plot.
    • Red bars indicate a predicted decrease in metabolite concentration.
    • Green bars indicate a predicted increase in metabolite concentration.
  • Plot Navigation Toolbar: Below the plot, you'll find a toolbar with standard matplotlib navigation controls (Pan, Zoom, Home, Save, etc.).
  • Save Plot: The plot can be saved as an image (e.g., PNG, JPG, PDF) directly from the plot navigation toolbar by clicking the disk icon.
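
The selection behind the plot (optional pathway filter, ranking by absolute change, colouring by sign) can be sketched with pandas. This example assumes a signed change column such as log2FoldChange; the `pathway` column name, the metabolite IDs, the helper `top_changes`, and the grey colour for a change of exactly zero are assumptions made for illustration.

```python
import pandas as pd


def top_changes(results, value_column, n=10, pathway=None,
                pathway_column="pathway"):
    """Pick the n rows with the largest absolute change and assign bar colours.

    Hypothetical helper for illustration; not part of GEMCAT GUI.
    """
    subset = results
    if pathway is not None and pathway != "All Pathways":
        subset = subset[subset[pathway_column] == pathway]
    # Rank by magnitude so strong decreases and increases compete equally.
    top = subset.sort_values(
        value_column, key=lambda s: s.abs(), ascending=False
    ).head(n).copy()

    def colour(value):
        if value > 0:
            return "green"  # predicted increase
        if value < 0:
            return "red"    # predicted decrease
        return "grey"       # assumption: no predicted change

    top["color"] = [colour(v) for v in top[value_column]]
    return top


results = pd.DataFrame(
    {
        "metabolite": ["atp_c", "glc_D_c", "pyr_c", "lac_L_c"],
        "pathway": ["Glycolysis", "Glycolysis", "Glycolysis",
                    "Pyruvate metabolism"],
        "log2FoldChange": [0.4, -1.5, 0.9, 2.0],
    }
)
overall = top_changes(results, "log2FoldChange", n=2)
glycolysis = top_changes(results, "log2FoldChange", n=2, pathway="Glycolysis")
print(overall[["metabolite", "log2FoldChange", "color"]])
```

The resulting table can be passed to a matplotlib bar plot, using the `color` column for the bar colours.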

Help Tab

This tab provides access to this help documentation directly within the application. You can also use the Open Log File button to view the application's log file (gemcat_gui.log) in your default text editor for troubleshooting.

Troubleshooting and Tips

  • "Error loading RNA data..." / "Error mapping genes..."
    • Check your file path and ensure the file exists.
    • Verify the selected 'Data Separator' matches your file.
    • Ensure the 'RNA Index Column' and 'Mapping ID Column' accurately reflect your gene identifiers.
    • Confirm your data is correctly formatted and not empty.
  • "Comparison and baseline columns cannot be the same."
    • Select different columns for your 'Comparison' and 'Baseline' conditions.
  • "No results to display or dataframe is empty."
    • Ensure GEMCAT analysis completed successfully. Check the log file (gemcat_gui.log) for errors.
  • Log File: A log file named gemcat_gui.log is generated in the application's directory. You can open it directly from the Help tab using the "Open Log File" button for detailed information about application events, warnings, and errors.
  • Model Files and Gene Mappings: Ensure that the necessary COBRA model files and their corresponding gene mapping files (as defined in config.py) are correctly placed and accessible by the application.

Project Structure

gemcat_gui/
├── src/
│   └── gemcat_gui/          # Main package
│       ├── config/          # Configuration and constants
│       ├── core/            # Core data processing logic
│       └── gui/             # GUI components and widgets
├── tests/                   # Test suite
├── examples/                # Example scripts and notebooks
│   ├── notebooks/           # Jupyter notebooks
│   └── data/                # Sample datasets
├── src/models/              # Metabolic model files (139 MB)
├── README.md                # This file
├── setup.py                 # Package setup
├── pyproject.toml           # Modern Python packaging config
└── requirements.txt         # Dependencies

Supported Models

Model      Organism  Reactions  Metabolites  Genes
Recon3D    Human     13,543     4,140        2,248
Rat-GEM    Rat       8,104      2,773        1,855
Mouse-GEM  Mouse     8,104      2,773        1,835

Development

Running Tests

# Run all tests
pytest tests/ -v

# Run specific test suite
pytest tests/test_basic.py -v

# Run with coverage
pytest tests/ --cov=src/gemcat_gui --cov-report=html

Package Information

Dependencies

  • pandas - Data manipulation
  • numpy - Numerical computations
  • cobra - Metabolic modeling
  • gemcat - Core GEMCAT analysis
  • PyQt5 - GUI framework
  • matplotlib - Plotting and visualization
  • seaborn - Enhanced visualizations
  • openpyxl - Excel file support
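
A quick way to see which of these packages are importable in the current environment is sketched below. It uses only the standard library and assumes that each package's import name matches the name listed above; `missing_packages` is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumption: import names are identical to the package names listed above.
REQUIRED = [
    "pandas", "numpy", "cobra", "gemcat",
    "PyQt5", "matplotlib", "seaborn", "openpyxl",
]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed dependencies are importable.")
```

Anything reported as missing can be installed with pip install -r requirements.txt.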

Troubleshooting

Common Issues

Import Errors

# Ensure the package is installed
pip install -e .

Missing Dependencies

# Reinstall all dependencies
pip install -r requirements.txt

PyQt5 Issues

# Install PyQt5 explicitly
pip install PyQt5

Model Files Not Found

  • Ensure src/models/ directory contains the metabolic model files
  • Check that model paths in configuration are correct

Log Files

The application generates a log file gemcat_gui.log in the working directory. Check this file for detailed error messages and debugging information.
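
For long logs it can help to pull out only the problem entries. The sketch below assumes that log lines contain standard Python logging level names (ERROR, WARNING, CRITICAL); the actual log format is not documented here, and `problem_lines` is a hypothetical helper.

```python
from pathlib import Path


def problem_lines(lines, levels=("ERROR", "WARNING", "CRITICAL")):
    """Return the lines that mention one of the given level names."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(level in line for level in levels)
    ]


log_file = Path("gemcat_gui.log")
if log_file.exists():
    with log_file.open(encoding="utf-8", errors="replace") as handle:
        for entry in problem_lines(handle):
            print(entry)
else:
    print(f"{log_file} not found in {Path.cwd()}")
```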

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

Citation

If you use GEMCAT GUI in your research, please cite:

[Citation information to be added]

Contact and Support

Maintainer: Suraj Sharma
Email: suraj.sharma@uib.no
Institution: University of Bergen

For bug reports and feature requests, please use the GitHub Issues page.

License

This project is licensed under the MIT License - see the LICENSE file for details.

About

GUI for GEMCAT Algorithm

GEMCAT: https://pypi.org/project/gemcat/
