refactor(llm): Local-first routing with Sonar web search

- Base: restored 981118f9 (local Qwen3 30B)
- Three paths: local (qwen3:30b-a3b), vision (qwen3-vl:32b), Sonar (perplexity/sonar)
- _route_model() for clean routing (web keywords -> Sonar, everything else -> local)
- /no_think for Ollama, timeout fallback to qwen2.5:14b
- Passthrough tools for Grafana data
- deep_research TOOLS re-enabled
- Removed the tangled price-handling logic ("Preis-Spaghetti-Logik")
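
The routing rule summarized above can be sketched in isolation. This is a minimal illustration, not the committed code: the trigger lists are shortened, and `route_model` is a hypothetical stand-in for `_route_model()` in `llm.py`. It assumes the same order of checks as the diff (deep research first, then web keywords, then local).

```python
# Sketch only: shortened trigger lists; constant names mirror llm.py in this commit.
MODEL_LOCAL = "qwen3:30b-a3b"
MODEL_ONLINE = "perplexity/sonar"
_WEB_TRIGGERS = ["recherche", "preis", "news", "kurs", "was kostet"]
_DEEP_TRIGGERS = ["deep research", "tiefenrecherche"]


def route_model(question: str) -> str:
    """Return the route for a question using plain substring matching."""
    q = question.lower()
    # Deep research is checked first, because "tiefenrecherche"
    # also contains the web trigger "recherche".
    if any(t in q for t in _DEEP_TRIGGERS):
        return "deep_research"
    if any(t in q for t in _WEB_TRIGGERS):
        return MODEL_ONLINE
    return MODEL_LOCAL
```

Because the match is substring-based, a question such as "Zeig mir den Exkurs im Log" would also be routed to Sonar (it contains "kurs"). Word-boundary matching would avoid that, at the cost of missing compounds like "Goldpreis".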
2026-03-21 12:06:00 +01:00 · 2026-03-21 12:06:00 +01:00 · 36d708bee1
commit 36d708bee1
parent bfb4c385c2
4 changed files with 134 additions and 146 deletions
--- a/homelab-ai-bot/STATE.md
+++ b/homelab-ai-bot/STATE.md
@@ -1,78 +1,34 @@
 # Hausmeister Bot - STATE
 **Stand:** 21.03.2026
-**Status:** laeuft, aber in inkonsistentem Umbauzustand
+**Status:** Saubere Local-First Architektur mit Sonar-Websuche
-## Kurzfassung
+## Architektur (3 Pfade)
-Der Bot ist aktuell nicht in einem sauberen Zielzustand.
-Er wurde von `local-only` auf ein teilweises Hybrid-Routing umgebaut, ohne die Gesamtarchitektur sauber abzuschliessen.
-Dadurch funktioniert ein Teil der Anfragen besser, aber der Systemzustand ist inkonsistent und nicht final.
-## Aktueller Live-Zustand
+| Pfad | Modell | Endpoint | Zweck |
- `hausmeister-bot.service` ist aktiv.
+|------|--------|----------|-------|
- `llm.py` hat uncommittete Live-Aenderungen.
+| Text + Tools | qwen3:30b-a3b | Ollama lokal (RTX 3090) | Alle Homelab-Tools |
- Standardmodell in `llm.py`: `qwen3-vl:32b`
+| Vision | qwen3-vl:32b | Ollama lokal (RTX 3090) | Bilderkennung, OCR |
- Online-Textmodell in `llm.py`: `openai/gpt-4o-mini`
+| Websuche | perplexity/sonar | OpenRouter | Preise, News, Recherche |
- Auf dem Ollama-Server ist aktuell kein Modell vorgeladen (`/api/ps` leer).
+| Deep Research | CT 121 LangGraph | Direkt-API | Tiefenrecherche (explizit) |
+| Fallback | qwen2.5:14b | Ollama lokal | Bei Timeout |
-## Was aktuell geroutet wird
+## Routing (_route_model)
-### Lokal
+- Web-Keywords (preis, recherche, news, etc.) -> Sonar via OpenRouter
- normale Textaufgaben ohne Preis-/Recherche-Trigger
+- Deep Research / Tiefenrecherche -> CT 121 direkt
- normale Bildaufgaben
+- Alles andere -> qwen3:30b-a3b lokal
-- Tool-Nutzung allgemein ueber `tool_loader`
-### Online (`openai/gpt-4o-mini`)
+## Features
- Preisfragen
+- /no_think fuer Ollama-Modelle (schnellere Antworten)
- Web-/Recherchefragen anhand einfacher Keyword-Heuristik in `llm.py`
+- Timeout-Fallback auf qwen2.5:14b
- Bildanfragen mit Preisbezug
+- Passthrough-Tools (Grafana-Daten direkt durchreichen)
+- Memory-System + Session-History
+- 19 Tool-Module (auto-discovery via tool_loader)
-## Was daran kaputt / unsauber ist
+## Was funktioniert
-1. Das System ist nicht mehr rein `local-first`.
+- Lokale KI steuert alle Homelab-Dienste (RSS, Proxmox, Loki, etc.)
-   Standardziel war: Standardaufgaben lokal, online nur als klarer Sonderfall.
+- Websuche laeuft ueber Perplexity Sonar (kein Tool-Calling, ein API-Call)
-   Aktuell entscheidet eine einfache Triggerliste in `llm.py` ueber Online-Routing.
+- Vision lokal via qwen3-vl:32b
+- Deep Research via CT 121
-2. `deep_research` ist faktisch deaktiviert.
+## Git-Stand
-   In `tools/deep_research.py` steht `TOOLS = []`.
+Committed und nach Forgejo gepusht. Auto-Sync laeuft.
-   Der Handler existiert noch, aber das LLM sieht das Tool nicht und kann es nicht normal aufrufen.
-3. Es gibt gewachsene Sonderlogik in `llm.py`.
-   Darin stecken u.a. Preis-/Einheitenregeln, Routing-Heuristiken und Bild-Sonderfaelle.
-   Das ist funktional entstanden, aber architektonisch nicht sauber getrennt.
-4. Der aktuelle Zustand ist nicht sauber versioniert.
-   `homelab-ai-bot/llm.py` ist lokal geaendert, aber nicht committed.
-   Der laufende Zustand und der Git-Stand sind also aktuell nicht identisch.
-5. Das Vision-Standardmodell ist derzeit `qwen3-vl:32b`.
-   Dieses Modell war auf der 3090 fuer Bot-Nutzung spuerbar zu langsam.
-   Das Routing kompensiert das aktuell durch Online-Ausnahmen, loest aber nicht die Grundarchitektur.
-## Was weiterhin funktioniert
-- Tool-Loader und Handler-System funktionieren grundsaetzlich.
-- Die meisten Tools sind modellunabhaengige Python-Handler und bleiben nutzbar:
-  - `web_search`
-  - `memory_*`
-  - `get_feed_stats`
-  - Proxmox / Loki / Grafana / Prometheus / Mail / SaveTV / Seafile / Tailscale / PBS / WordPress
-- Goldpreis-Test ueber `web_search` + `gpt-4o-mini` lieferte plausibles Ergebnis statt Gramm/Unze-Verwechslung.
-## Was aktuell nicht als stabil gelten darf
-- `deep_research`
-- sauberes `local-first` Routing
-- Preis-/Recherche-Routing als finale Architektur
-- Bot-Verhalten bei weiteren Sonderfaellen ausserhalb des bisher getesteten Bereichs
-## Eigentliches Zielbild
-- Standardaufgaben lokal
-- Bild-/OCR-/Scraper-Aufgaben lokal
-- Online nur fuer klar definierte Ausnahmen:
-  - Preisfragen
-  - Web-Recherche
-  - Deep Research
-- Routing zentral und explizit im Code, nicht ueber gewachsene Prompt-Sonderregeln
-## Naechster sinnvoller Schritt
-Kein weiterer Quick-Fix.
-Stattdessen sauberer Umbau von `llm.py` in eine klare Routing-Architektur mit drei expliziten Pfaden:
-1. lokaler Standardpfad
-2. lokaler Vision-Pfad
-3. Online-Recherchepfad
--- a/homelab-ai-bot/pycache/llm.cpython-311.pyc
+++ b/homelab-ai-bot/pycache/llm.cpython-311.pyc
--- a/homelab-ai-bot/llm.py
+++ b/homelab-ai-bot/llm.py
@@ -18,14 +18,26 @@ log = logging.getLogger('llm')
 OLLAMA_BASE = "http://100.84.255.83:11434"
 OPENROUTER_BASE = "https://openrouter.ai/api/v1"
-MODEL = "openai/gpt-4o-mini"
+MODEL_LOCAL = "qwen3:30b-a3b"
-VISION_MODEL = "qwen3-vl:32b"
+MODEL_VISION = "qwen3-vl:32b"
-FALLBACK_MODEL = "qwen3:30b-a3b"
+MODEL_ONLINE = "perplexity/sonar"
+FALLBACK_MODEL = "qwen2.5:14b"
 MAX_TOOL_ROUNDS = 3
-OLLAMA_MODELS = {VISION_MODEL, FALLBACK_MODEL}
+OLLAMA_MODELS = {MODEL_LOCAL, MODEL_VISION, FALLBACK_MODEL}
 PASSTHROUGH_TOOLS = {"get_temperaturen", "get_energie", "get_heizung"}
+_WEB_TRIGGERS = [
+    "recherche", "recherchiere", "suche im internet", "web search",
+    "preis", "preise", "kostet", "kosten", "price",
+    "news", "nachrichten", "aktuell", "aktuelle",
+    "google", "finde heraus", "finde raus",
+    "gold", "silber", "kurs", "kurse",
+    "vergleich", "vergleiche",
+    "was kostet", "wie teuer", "wie viel",
+]
+_DEEP_TRIGGERS = ["deep research", "tiefenrecherche"]
 import datetime as _dt
 _TODAY = _dt.date.today()
 _3M_AGO = (_TODAY - _dt.timedelta(days=90))
@@ -80,10 +92,6 @@ SESSION-RUECKBLICK:
 - Optional kurz erwaehnen was sonst noch Thema war.
 - session_search nur fuer Stichwort-Suche in ALTEN Sessions (nicht aktuelle).
-TOOL-ERGEBNISSE:
-- Tool-Ausgaben sind bereits fertig formatiert (Umlaute, Einheiten, Struktur).
-- Gib sie 1:1 wieder. NICHT umformulieren, kuerzen oder Umlaute ersetzen.
 BILDERKENNUNG — ALLGEMEIN:
 Wenn der User ein Bild schickt das KEIN kritisches Dokument ist (z.B. Foto, Screenshot, Landschaft):
 - Beschreibe strukturiert was du siehst.
@@ -170,7 +178,6 @@ PREISRECHERCHE (PFLICHT):
 Wenn der User nach Preisen, Kosten oder Preisentwicklung fragt:
 - Nutze IMMER Tools statt Allgemeinwissen.
 - Fuer schnelle Preisabfrage: web_search.
-- Auch wenn ein Bild mitgeschickt wird: Preise IMMER per web_search verifizieren — Bilder koennen veraltet sein.
 - Mache 2-3 gezielte web_search Aufrufe mit verschiedenen Suchbegriffen.
 - deep_research NUR wenn User explizit 'deep research' oder 'tiefenrecherche' sagt.
 - Gib konkrete Zahlen aus (EUR), nicht nur Tendenzen.
@@ -194,8 +201,18 @@ def _get_api_key() -> str:
    return cfg.api_keys.get("openrouter_key", "")
+def _route_model(question: str) -> str:
+    """Entscheidet ob lokal, online (Sonar) oder deep_research."""
+    q = question.lower()
+    if any(t in q for t in _DEEP_TRIGGERS):
+        return "deep_research"
+    if any(t in q for t in _WEB_TRIGGERS):
+        return MODEL_ONLINE
+    return MODEL_LOCAL
 def _ollama_timeout_for(model: str) -> int:
-    if model == VISION_MODEL:
+    if model == MODEL_VISION:
        return 240
    if model == FALLBACK_MODEL:
        return 90
@@ -203,6 +220,7 @@ def _ollama_timeout_for(model: str) -> int:
 def _add_no_think(messages: list) -> None:
+    """Haengt /no_think an die letzte User-Nachricht fuer Ollama."""
    for msg in reversed(messages):
        if msg.get("role") != "user":
            continue
@@ -210,17 +228,17 @@ def _add_no_think(messages: list) -> None:
        if isinstance(content, str) and "/no_think" not in content:
            msg["content"] = content + " /no_think"
        elif isinstance(content, list):
-            for item in content:
+            for part in content:
-                if item.get("type") == "text" and "/no_think" not in item.get("text", ""):
+                if part.get("type") == "text" and "/no_think" not in part.get("text", ""):
-                    item["text"] = item["text"] + " /no_think"
+                    part["text"] = part["text"] + " /no_think"
                    break
        break
-def _call_openrouter(messages: list, api_key: str, use_tools: bool = True,
+def _call_api(messages: list, api_key: str, use_tools: bool = True,
-                     model: str = None, max_tokens: int = 4000,
+              model: str = None, max_tokens: int = 4000,
-                     allow_fallback: bool = True) -> dict:
+              allow_fallback: bool = True) -> dict:
-    chosen = model or MODEL
+    chosen = model or MODEL_LOCAL
    use_ollama = chosen in OLLAMA_MODELS
    log.info("LLM-Call: model=%s ollama=%s max_tokens=%d", chosen, use_ollama, max_tokens)
@@ -248,19 +266,14 @@ def _call_openrouter(messages: list, api_key: str, use_tools: bool = True,
        r.raise_for_status()
        return r.json()
    except requests.exceptions.ReadTimeout:
-        if use_ollama and allow_fallback and chosen == MODEL and FALLBACK_MODEL and FALLBACK_MODEL != chosen:
+        if use_ollama and allow_fallback and FALLBACK_MODEL and chosen != FALLBACK_MODEL:
            log.warning(
-                "Ollama timeout for %s after %ss, retrying with fallback model %s",
-                chosen,
-                timeout,
-                FALLBACK_MODEL,
+                "Ollama timeout for %s after %ss, retrying with %s",
+                chosen, timeout, FALLBACK_MODEL,
            )
-            return _call_openrouter(
-                messages,
-                api_key,
-                use_tools=use_tools,
-                model=FALLBACK_MODEL,
-                max_tokens=max_tokens,
+            return _call_api(
+                messages, api_key, use_tools=use_tools,
+                model=FALLBACK_MODEL, max_tokens=max_tokens,
                allow_fallback=False,
            )
        raise
@@ -279,22 +292,38 @@ def ask(question: str, context: str) -> str:
        {"role": "user", "content": f"Kontext (Live-Daten):\n{context}\n\nFrage: {question}"},
    ]
    try:
-        data = _call_openrouter(messages, api_key, use_tools=False)
+        data = _call_api(messages, api_key, use_tools=False)
        return data["choices"][0]["message"]["content"]
    except Exception as e:
        return f"LLM-Fehler: {e}"
 def ask_with_tools(question: str, tool_handlers: dict, session_id: str = None) -> str:
-    """Freitext-Frage mit automatischem Tool-Calling.
+    """Freitext-Frage mit automatischem Routing und Tool-Calling.
-    tool_handlers: dict von tool_name -> callable(**kwargs) -> str
+    Routing:
-    session_id: aktive Session fuer Konversations-History
+    - deep_research / tiefenrecherche -> Deep Research Handler direkt
+    - Web/Preis/Recherche -> Perplexity Sonar (kein Tool-Calling)
+    - Alles andere -> Lokales Modell mit allen Tools
    """
    api_key = _get_api_key()
    if not api_key:
        return "OpenRouter API Key fehlt in homelab.conf"
+    route = _route_model(question)
+    # --- Deep Research: direkt aufrufen, kein LLM noetig ---
+    if route == "deep_research":
+        log.info("Route: deep_research")
+        try:
+            from tools import deep_research
+            return deep_research.handle_deep_research(query=question)
+        except Exception as e:
+            return f"Deep Research Fehler: {e}"
+    log.info("Route: %s", route)
+    # --- Memory + Prompt aufbauen ---
    try:
        import memory_client
        memory_items = memory_client.get_relevant_memory(question, top_k=10)
@@ -340,17 +369,35 @@ def ask_with_tools(question: str, tool_handlers: dict, session_id: str = None) -
    messages.append({"role": "user", "content": question})
+    # --- Online (Sonar): kein Tool-Calling, Sonar sucht selbst ---
+    if route == MODEL_ONLINE:
+        try:
+            data = _call_api(messages, api_key, use_tools=False, model=MODEL_ONLINE)
+            content = data["choices"][0]["message"].get("content", "")
+            if session_id:
+                try:
+                    memory_client.log_message(session_id, "user", question)
+                    memory_client.log_message(session_id, "assistant", content)
+                except Exception:
+                    pass
+            return content or "Keine Antwort von Sonar."
+        except Exception as e:
+            return f"Online-Suche Fehler: {e}"
+    # --- Lokal: Tool-Calling mit allen Tools ---
    passthrough_result = None
    try:
        for _round in range(MAX_TOOL_ROUNDS):
-            data = _call_openrouter(messages, api_key, use_tools=True)
+            data = _call_api(messages, api_key, use_tools=True, model=MODEL_LOCAL)
            choice = data["choices"][0]
            msg = choice["message"]
            tool_calls = msg.get("tool_calls")
            if not tool_calls:
                content = msg.get("content") or ""
+                if not content and msg.get("reasoning"):
+                    content = msg.get("reasoning", "")
                if passthrough_result:
                    return passthrough_result
                return content or "Keine Antwort vom LLM."
@@ -377,7 +424,7 @@ def ask_with_tools(question: str, tool_handlers: dict, session_id: str = None) -
                result_str = str(result)[:3000]
                if fn_name in PASSTHROUGH_TOOLS and not result_str.startswith(("Fehler", "Keine")):
-                    log.info("Passthrough-Tool %s: Ergebnis wird direkt weitergegeben", fn_name)
+                    log.info("Passthrough-Tool %s", fn_name)
                    passthrough_result = result_str
                messages.append({
@@ -388,7 +435,7 @@ def ask_with_tools(question: str, tool_handlers: dict, session_id: str = None) -
        if passthrough_result:
            return passthrough_result
-        data = _call_openrouter(messages, api_key, use_tools=False)
+        data = _call_api(messages, api_key, use_tools=False, model=MODEL_LOCAL)
        return data["choices"][0]["message"]["content"]
    except Exception as e:
@@ -396,7 +443,7 @@ def ask_with_tools(question: str, tool_handlers: dict, session_id: str = None) -
 def ask_with_image(image_base64: str, caption: str, tool_handlers: dict, session_id: str = None) -> str:
-    """Bild-Analyse mit optionalem Text und Tool-Calling via Vision-faehigem Modell."""
+    """Bild-Analyse via lokalem Vision-Modell mit Tool-Calling."""
    api_key = _get_api_key()
    if not api_key:
        return "OpenRouter API Key fehlt in homelab.conf"
@@ -418,41 +465,6 @@ def ask_with_image(image_base64: str, caption: str, tool_handlers: dict, session
        "Wenn es ein normales Bild ist: Beschreibe strukturiert was du siehst."
    )
    prompt_text = caption if caption else default_prompt
-    _price_kw = ["preis", "kostet", "kosten", "price", "teuer", "guenstig", "billig",
-                 "bestpreis", "angebot", "euro", "eur", "kaufen", "gold", "silber",
-                 "unze", "ounce", "kurs", "wert", "ram", "ddr"]
-    _check_text = (caption or "").lower()
-    if not _check_text and session_id:
-        try:
-            import memory_client as _mc
-            _recent = _mc.get_session_messages(session_id, limit=3)
-            for _m in reversed(_recent):
-                if _m.get("role") == "user" and _m.get("content"):
-                    _check_text = _m["content"].lower()
-                    break
-        except Exception:
-            pass
-    _is_price_q = any(kw in _check_text for kw in _price_kw)
-    if _is_price_q:
-        prompt_text = (
-            "WICHTIG: Es geht um aktuelle Preise/Kurse. "
-            "Du MUSST ZUERST web_search aufrufen (kurze Keywords, z.B. goldpreis euro unze heute). "
-            "Fordere MINDESTENS 5 Ergebnisse an (max_results=5). "
-            "Das Bild ist NUR Kontext — Preise daraus NIEMALS als Antwort verwenden. "
-            "EINHEITEN-FALLE: goldpreis.de zeigt Preise PRO GRAMM, nicht pro Unze! "
-            "1 troy ounce = 31,103 Gramm. Wenn eine Quelle ~125 EUR zeigt und eine andere ~3.900 EUR, "
-            "dann ist 125 EUR der GRAMM-Preis und 3.900 EUR der UNZEN-Preis. "
-            "Nutze Quellen die explizit pro Unze oder per ounce schreiben (z.B. finanzen.net, boerse.de). "
-            "Erst NACH der web_search darfst du antworten.\n\n"
-            + prompt_text
-        )
-    else:
-        prompt_text += (
-            "\n\nHinweis: Wenn im Bild Preise oder Kurse sichtbar sind und der User "
-            "danach fragt, nutze web_search fuer aktuelle Werte statt die Bild-Daten."
-        )
    user_content = [
        {"type": "text", "text": prompt_text},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_base64}", "detail": "high"}},
@@ -481,14 +493,16 @@ def ask_with_image(image_base64: str, caption: str, tool_handlers: dict, session
    try:
        for _round in range(MAX_TOOL_ROUNDS):
-            data = _call_openrouter(messages, api_key, use_tools=True,
+            data = _call_api(messages, api_key, use_tools=True,
-                                    model=VISION_MODEL, max_tokens=4000)
+                             model=MODEL_VISION, max_tokens=4000)
            choice = data["choices"][0]
            msg = choice["message"]
            tool_calls = msg.get("tool_calls")
            if not tool_calls:
                content = msg.get("content") or ""
+                if not content and msg.get("reasoning"):
+                    content = msg.get("reasoning", "")
                return content or "Keine Antwort vom LLM."
            messages.append(msg)
@@ -515,8 +529,8 @@ def ask_with_image(image_base64: str, caption: str, tool_handlers: dict, session
                    "content": str(result)[:3000],
                })
-        data = _call_openrouter(messages, api_key, use_tools=False,
+        data = _call_api(messages, api_key, use_tools=False,
-                               model=VISION_MODEL, max_tokens=4000)
+                         model=MODEL_VISION, max_tokens=4000)
        return data["choices"][0]["message"]["content"]
    except Exception as e:
--- a/homelab-ai-bot/tools/deep_research.py
+++ b/homelab-ai-bot/tools/deep_research.py
@@ -26,7 +26,25 @@ QUALITAET BEI PREISFRAGEN:
 - Zeige Zeitraum, Preis damals/heute, Delta in % und Quellen.
 - Wenn keine belastbaren Daten vorhanden sind, sage es explizit."""
-TOOLS = []  # removed from auto-discovery; use HANDLERS directly
+TOOLS = [
+    {
+        "type": "function",
+        "function": {
+            "name": "deep_research",
+            "description": "KI-gestuetzte Tiefenrecherche (20-30 Quellen, 2-5 Min). NUR wenn User explizit deep research oder tiefenrecherche sagt.",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "query": {
+                        "type": "string",
+                        "description": "Die Recherche-Frage"
+                    }
+                },
+                "required": ["query"]
+            },
+        },
+    },
+]
 def _create_thread():