docs: GPU architecture docs (STATE.md, RAGFLOW.md)

- STATE.md: hybrid architecture, LLM routing, VRAM usage
- RAGFLOW.md: warning that qwen2.5:14b evicts the main model
This commit is contained in:
parent 0b99490909
commit 07b785ece1
2 changed files with 18 additions and 9 deletions
STATE.md
@@ -1,6 +1,6 @@
# Hausmeister Bot - STATE
-**As of:** 21.03.2026
-**Status:** Production, clean, local-first architecture
+**As of:** 25.03.2026
+**Status:** Production, hybrid architecture (GPU text + cloud vision)
---
@@ -54,13 +54,20 @@ delivers structured reports with sources in 75s.
---
-## KI-Server (RTX 3090, Muldenstein, 100.84.255.83)
+## KI-Server (RTX 3090, ki-server Windows, 100.84.255.83)

-| Model | Type | Size | Purpose |
-|-------|------|------|---------|
-| qwen3:30b-a3b | Text, MoE | 18.5 GB | Standard + tools |
-| qwen3-vl:32b | Vision + text | 20.9 GB | Images, OCR, documents |
-| qwen2.5:14b | Text | 9 GB | Timeout fallback |
+GPU architecture: text + embeddings stay in VRAM permanently, vision runs via the cloud.
+Warm-up at bot start via warmup_ollama() with keep_alive=-1.
+
+| Model | Type | VRAM | Status | Purpose |
+|-------|------|------|--------|---------|
+| qwen3:30b-a3b | Text, MoE | 22.0 GB | PERMANENT | Standard + tools, all services |
+| nomic-embed-text | Embedding | 0.6 GB | PERMANENT | RAGFlow, vector search |
+| qwen2.5:14b | Text | 17.8 GB | on-demand | Timeout fallback (evicts the main model!) |
+| Total | | 22.6 / 24 GB | | |
+
+Previously: qwen3-vl:32b (vision) ran locally and competed with the text model for the GPU.
+Now: vision via openai/gpt-4o-mini (OpenRouter cloud).
---
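The warm-up mechanism in the added lines above (warmup_ollama() with keep_alive=-1) is only referenced, not shown. Below is a minimal sketch of what such a call sequence could look like against Ollama's HTTP API, assuming the default port 11434 on the KI-Server; the prompt text and timeouts are illustrative.

```python
import requests

OLLAMA_URL = "http://100.84.255.83:11434"  # assumption: default Ollama port on the KI-Server

def warmup_ollama() -> None:
    """Load the text and embedding models once at bot start and pin them in VRAM.

    A negative keep_alive asks Ollama to keep the model loaded indefinitely
    instead of unloading it after the default idle timeout.
    """
    # Text model: a tiny generate request forces the weights into VRAM.
    requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={"model": "qwen3:30b-a3b", "prompt": "ping", "stream": False, "keep_alive": -1},
        timeout=300,
    ).raise_for_status()

    # Embedding model: one embedding request does the same for nomic-embed-text.
    requests.post(
        f"{OLLAMA_URL}/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": "ping", "keep_alive": -1},
        timeout=60,
    ).raise_for_status()
```

This is what keeps qwen3:30b-a3b and nomic-embed-text resident at the 22.6 GB shown in the table.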
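The Previously/Now lines describe the routing split between local text and cloud vision. A hedged sketch of that decision, assuming the bot uses OpenAI-compatible clients (Ollama exposes one under /v1, OpenRouter at openrouter.ai/api/v1); the function name, key handling, and has_images flag are illustrative.

```python
import os
from openai import OpenAI

# Local Ollama handles text (qwen3:30b-a3b stays in VRAM), OpenRouter handles vision.
ollama = OpenAI(base_url="http://100.84.255.83:11434/v1", api_key="ollama")
openrouter = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

def chat(messages: list[dict], has_images: bool) -> str:
    """Route vision requests to the cloud model, everything else to the local GPU."""
    client, model = (
        (openrouter, "openai/gpt-4o-mini") if has_images else (ollama, "qwen3:30b-a3b")
    )
    resp = client.chat.completions.create(model=model, messages=messages)
    return resp.choices[0].message.content
```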
RAGFLOW.md
@@ -30,7 +30,9 @@
- **Docker Compose** in `/opt/ragflow/docker/`
- **Elasticsearch** (Vector-DB), MySQL, MinIO, Redis
-- **Ollama** (KI-Server 100.84.255.83): nomic-embed-text (embeddings), qwen2.5:14b (chat)
+- **Ollama** (KI-Server 100.84.255.83): nomic-embed-text (embeddings, PERMANENT in VRAM), qwen2.5:14b (chat)
+- **IMPORTANT**: qwen3:30b-a3b + nomic-embed-text are loaded permanently (keep_alive=-1).
+  A RAGFlow chat using qwen2.5:14b evicts the main model! Recommendation: switch the chat model to qwen3:30b-a3b.
- **Synology SMB** mounted: `/mnt/synology/Seafile/Nextcloud-Migration/` (~13k PDFs)

## Important commands
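The numbers behind the warning: with qwen3:30b-a3b (22.0 GB) and nomic-embed-text (0.6 GB) pinned, roughly 1.4 GB of the 24 GB card remains, so an on-demand load of qwen2.5:14b (17.8 GB) can only succeed by evicting the main model. A sketch of a check against Ollama's /api/ps endpoint; the 24 GB budget constant and the size_vram field fallback are assumptions.

```python
import requests

OLLAMA_URL = "http://100.84.255.83:11434"
VRAM_BUDGET = 24 * 1024**3  # RTX 3090: 24 GB

def resident_models() -> dict[str, int]:
    """Return currently loaded models and their VRAM usage in bytes (GET /api/ps)."""
    resp = requests.get(f"{OLLAMA_URL}/api/ps", timeout=10)
    resp.raise_for_status()
    return {
        m["name"]: m.get("size_vram", m.get("size", 0))
        for m in resp.json().get("models", [])
    }

def would_evict(extra_bytes: int) -> bool:
    """True if loading another model of the given size would exceed the VRAM budget."""
    return sum(resident_models().values()) + extra_bytes > VRAM_BUDGET

if __name__ == "__main__":
    print(resident_models())
    # qwen2.5:14b needs ~17.8 GB according to the table above.
    print("would evict main model:", would_evict(int(17.8 * 1024**3)))
```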