Skip to content

helianfeixing/lab-assistant-ai

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Server Assistant AI Agent for Bio Lab

[ English | 简体中文 ]

An intelligent server management assistant based on DeepSeek model and OpenAI SDK, specifically designed for biology laboratories.

Python Version Platform OpenAI DeepSeek License

An intelligent server management assistant specifically designed for bioinformatics laboratories, developed based on the DeepSeek model. Capable of answering questions about server software paths, genome locations, server configurations based on the knowledge base, and providing analysis code as needed. Supports natural language interaction, suitable for beginners to quickly get started with bioinformatics analysis.

Core functionality

  • Intelligent Q&A System - Natural language queries based on knowledge base
  • Structured Knowledge Base Query - Intelligent retrieval of multiple information categories including software details, genomic data, server configurations, etc.
  • Cross-Platform Compatibility - Supports macOS and Ubuntu systems
  • AI-Enhanced Responses - Integrated DeepSeek model for professional answers
  • Modular Design - Easily extensible and maintainable code structure

Currently Supported Knowledge Base Types

  • Software Information - Paths, versions, descriptions
  • Genomic Data - Reference genome locations, versions, file paths
  • Dataset Information - Storage locations, sample information
  • Server Configuration - Server information, storage capacity
  • Can add knowledge base types as needed

Project Structure

├── api/ # API client modules
├── config/ # Configuration files
├── core/ # Core functionality modules
├── handlers/ # Knowledge base handlers
├── knowledge/ # Knowledge base data files, modify according to lab specific needs
├── utils/ # Utility functions
├── main.py # Main program entry
└── requirements.txt # Dependency list

Quick Start

Environment Requirements

  • Python 3.7+
  • Ubuntu or macOS
  • Deepseek API key

Apply for Deepseek API key

https://api-docs.deepseek.com

Installation

  1. Clone the Project
git clone https://github.com/helianfeixing/lab-assistant-ai.git
cd lab-assistant-ai
  1. Install Dependencies
pip install openai requests
  1. Set API Key
# For Ubuntu
echo 'export DEEPSEEK_API_KEY="your_Deepseek_API_key"' >> ~/.bashrc
source ~/.bashrc

# For macOS
echo 'export DEEPSEEK_API_KEY="your_Deepseek_API_key"' >> ~/.zshrc
source ~/.zshrc
  1. Run the Program
python main.py

Operation Interface

🤖 === 欢迎使用 ** Lab服务器助手 ===
正在初始化AI助手...
✅ 初始化完成!

🤖  您好,我是 ** Lab服务器助手 ===
我可以帮您处理以下类型的问题:

🔧 **软件管理**
  - '如何运行samtools?'

🧬 **基因组数据**
  - '人类参考基因组存放在哪里?'

📊 **分析工具**
  - 'star比对索引在哪里?'

📋 **服务器规则**
  - '数据应该存放在哪里?'
  - 'home目录的使用限制'
  - '下载文件的存储规范'

💬 **对话命令**:
  - reset/重置: 开始新的对话
  - status/状态: 查看对话状态
  - help/帮助: 显示此帮助
  - quit/退出: 退出程序

💡 提示:可以用自然语言提问,如有疑问请联系管理员

💭 您有什么问题?

Usage Example

💭 您有什么问题? 我希望比对人类RNA-seq数据,提供代码
🤔 正在思考...

当然可以!针对人类RNA-seq数据的比对,我推荐使用 **STAR** 或 **HISAT2**,这两个工具在RNA-seq比对中都很常用。以下是基于你的需求的代码示例和说明:

---

### 1. **使用 STAR 进行比对**
STAR 适合处理剪接位点,速度快且准确。索引路径为 `/data/data/index/STAR_index/hg38`。

**示例代码:**
STAR --genomeDir /data/data/index/STAR_index/hg38 \
     --readFilesIn sample_R1.fastq.gz sample_R2.fastq.gz \
     --readFilesCommand zcat \
     --outFileNamePrefix sample_ \
     --outSAMtype BAM SortedByCoordinate \
     --runThreadN 8

**参数说明:**
- `--genomeDir`:指定人类基因组索引路径。
- `--readFilesIn`:输入测序文件(如果是单端数据,只用一个文件)。
- `--readFilesCommand zcat`:如果输入是 `.gz` 压缩文件,用 `zcat` 解压。
- `--outSAMtype BAM SortedByCoordinate`:输出排序后的BAM文件。
- `--runThreadN`:线程数,根据服务器资源调整。

**注意事项:**
- 确保输入文件路径正确,如果是单端数据,移除 `sample_R2.fastq.gz`。
- 输出文件会以 `sample_` 为前缀,包括BAM文件和日志。

---

### 2. **使用 HISAT2 进行比对**
HISAT2 内存占用较小,适合资源有限的环境。索引路径为 `/data/data/index/hisat2_index/hg38`。

**示例代码:**
hisat2 -x /data/data/index/hisat2_index/hg38 \
       -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz \
       -S sample.sam \
       --threads 8

**参数说明:**
- `-x`:指定索引路径。
- `-1` 和 `-2`:双端测序文件。
- `-S`:输出SAM文件。
- `--threads`:线程数。

**后续处理(将SAM转为BAM并排序):**
samtools view -bS sample.sam | samtools sort -o sample_sorted.bam

---

### 选择建议:
- 如果数据量大或需要高精度剪接位点检测,用 **STAR**。
- 如果服务器内存有限或数据量较小,用 **HISAT2**。

### 需要确认的细节:
- 你的数据是单端还是双端测序?
- 文件是否是压缩格式(如 `.gz`)?
- 是否需要其他输出格式(如BAM索引)?

如果有其他需求或细节,随时告诉我,我可以帮你调整代码!

Contact

Yunlong Zhang 12131328@mail.sustech.edu.cn

About

A server assistant AI agent for Bio Lab.

Topics

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages