docs: GPU architecture docs (STATE.md, RAGFLOW.md)
- STATE.md: hybrid architecture, LLM routing, VRAM allocation
- RAGFLOW.md: warning that qwen2.5:14b evicts the main model
parent 0b99490909
commit 07b785ece1
2 changed files with 18 additions and 9 deletions
@@ -1,6 +1,6 @@
 # Hausmeister Bot - STATE
-**As of:** 21.03.2026
-**Status:** Production, clean, local-first architecture
+**As of:** 25.03.2026
+**Status:** Production - hybrid architecture (GPU text + cloud vision)
 
 ---
 
@@ -54,13 +54,20 @@ delivers structured reports with sources in 75s.
 
 ---
 
-## AI server (RTX 3090, Muldenstein, 100.84.255.83)
+## AI server (RTX 3090, ki-server Windows, 100.84.255.83)
 
-| Model | Type | Size | Purpose |
-|--------|-----|---------|-------|
-| qwen3:30b-a3b | Text, MoE | 18.5 GB | Default + tools |
-| qwen3-vl:32b | Vision+Text | 20.9 GB | Images, OCR, documents |
-| qwen2.5:14b | Text | 9 GB | Timeout fallback |
+GPU architecture: text + embeddings stay permanently in VRAM, vision runs via the cloud.
+Warmup at bot start via warmup_ollama() with keep_alive=-1.
+
+| Model | Type | VRAM | Status | Purpose |
+|--------|-----|------|--------|-------|
+| qwen3:30b-a3b | Text, MoE | 22.0 GB | PERMANENT | Default + tools, all services |
+| nomic-embed-text | Embedding | 0.6 GB | PERMANENT | RAGFlow, vector search |
+| qwen2.5:14b | Text | 17.8 GB | on-demand | Timeout fallback (evicts the main model!) |
+| Total | | 22.6 / 24 GB | | |
+
+Previously: qwen3-vl:32b (vision) ran locally and competed with the text model for the GPU.
+Now: vision via openai/gpt-4o-mini (OpenRouter cloud).
 
 ---
 
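The warmup step mentioned in the STATE.md change (warmup_ollama() with keep_alive=-1) could look roughly like the sketch below. This is not the bot's actual implementation, which the diff does not show. It assumes the server is reachable on Ollama's default port 11434, and `build_warmup_requests` is a hypothetical helper introduced only for illustration. It relies on two documented Ollama behaviours: a generate request without a prompt only loads the model, and `keep_alive: -1` keeps a model loaded indefinitely.

```python
import json
import urllib.request

# Assumption: Ollama listens on its default port on the AI server.
OLLAMA_URL = "http://100.84.255.83:11434"


def build_warmup_requests(text_model="qwen3:30b-a3b",
                          embed_model="nomic-embed-text"):
    """Return (path, payload) pairs that load both models and pin them."""
    return [
        # A generate request without a prompt only loads the model.
        ("/api/generate",
         {"model": text_model, "keep_alive": -1, "stream": False}),
        # Embedding models are loaded through the embed endpoint.
        ("/api/embed",
         {"model": embed_model, "input": "warmup", "keep_alive": -1}),
    ]


def warmup_ollama(base_url=OLLAMA_URL, timeout=300):
    """Send the warmup requests; raises on network or HTTP errors."""
    for path, payload in build_warmup_requests():
        req = urllib.request.Request(
            base_url + path,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            resp.read()  # response body is not needed, only the side effect
```

Calling `warmup_ollama()` once at bot start would then load both permanent models; the generous timeout is there because the first load of a 22 GB model can take a while.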
@@ -30,7 +30,9 @@
 
 - **Docker Compose** in `/opt/ragflow/docker/`
 - **Elasticsearch** (vector DB), MySQL, MinIO, Redis
-- **Ollama** (AI server 100.84.255.83): nomic-embed-text (embeddings), qwen2.5:14b (chat)
+- **Ollama** (AI server 100.84.255.83): nomic-embed-text (embeddings, PERMANENT in VRAM), qwen2.5:14b (chat)
+- **IMPORTANT**: qwen3:30b-a3b + nomic-embed-text are permanently loaded (keep_alive=-1).
+RAGFlow chat with qwen2.5:14b evicts the main model! Recommendation: switch the chat model to qwen3:30b-a3b.
 - **Synology SMB** mounted: `/mnt/synology/Seafile/Nextcloud-Migration/` (~13k PDFs)
 
 ## Important commands
 
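The eviction warning follows from simple VRAM arithmetic, which the sketch below reproduces using the figures from the STATE.md table. It is a deliberately simplified model: it only sums the listed sizes against the 24 GB budget, whereas Ollama's real scheduler makes the actual load and unload decisions. The function names are hypothetical.

```python
VRAM_TOTAL_GB = 24.0  # RTX 3090

# VRAM figures as listed in the STATE.md table.
MODEL_VRAM_GB = {
    "qwen3:30b-a3b": 22.0,
    "nomic-embed-text": 0.6,
    "qwen2.5:14b": 17.8,
}


def vram_needed(models):
    """Sum the listed VRAM footprint of the given models in GB."""
    return round(sum(MODEL_VRAM_GB[m] for m in models), 1)


def fits(models, total=VRAM_TOTAL_GB):
    """True if all given models fit into VRAM at the same time."""
    return vram_needed(models) <= total


permanent = ["qwen3:30b-a3b", "nomic-embed-text"]
with_fallback = permanent + ["qwen2.5:14b"]

print(vram_needed(permanent), fits(permanent))          # 22.6 True
print(vram_needed(with_fallback), fits(with_fallback))  # 40.4 False
```

Since the three models together need 40.4 GB, loading qwen2.5:14b for RAGFlow chat cannot coexist with the permanent pair, which is why the commit recommends pointing the chat model at qwen3:30b-a3b instead.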