refactor(llm): local-first routing with Sonar web search

- Base: restored 981118f9 (local Qwen3 30B)
- Three paths: local (qwen3:30b-a3b), vision (qwen3-vl:32b), Sonar (perplexity/sonar)
- _route_model() for clean routing (web keywords -> Sonar, everything else -> local)
- /no_think for Ollama, timeout fallback to qwen2.5:14b
- Passthrough tools for Grafana data
- deep_research TOOLS re-enabled
- Removed the price special-case spaghetti logic
2026-03-21 12:06:00 +01:00 · 2026-03-21 12:06:00 +01:00 · 36d708bee1
commit 36d708bee1
parent bfb4c385c2
4 changed files with 134 additions and 146 deletions
--- a/homelab-ai-bot/STATE.md
+++ b/homelab-ai-bot/STATE.md
@@ -1,78 +1,34 @@
 # Hausmeister Bot - STATE
 **As of:** 21.03.2026
-**Status:** running, but in an inconsistent, half-rebuilt state
+**Status:** clean local-first architecture with Sonar web search

-## Summary
-The bot is currently not in a clean target state.
-It was converted from `local-only` to partial hybrid routing without properly completing the overall architecture.
-As a result, some requests work better, but the system state is inconsistent and not final.
+## Architecture (3 paths)

-## Current live state
-- `hausmeister-bot.service` is active.
-- `llm.py` has uncommitted live changes.
-- Default model in `llm.py`: `qwen3-vl:32b`
-- Online text model in `llm.py`: `openai/gpt-4o-mini`
-- No model is currently preloaded on the Ollama server (`/api/ps` is empty).
+| Path | Model | Endpoint | Purpose |
+|------|-------|----------|---------|
+| Text + tools | qwen3:30b-a3b | Ollama, local (RTX 3090) | All homelab tools |
+| Vision | qwen3-vl:32b | Ollama, local (RTX 3090) | Image recognition, OCR |
+| Web search | perplexity/sonar | OpenRouter | Prices, news, research |
+| Deep research | CT 121 LangGraph | Direct API | In-depth research (explicit request only) |
+| Fallback | qwen2.5:14b | Ollama, local | On timeout |

-## What is currently routed where
-### Local
-- regular text tasks without price/research triggers
-- regular image tasks
-- general tool use via `tool_loader`
+## Routing (`_route_model`)
+- Web keywords (`preis`, `recherche`, `news`, etc.) -> Sonar via OpenRouter
+- Deep research keywords (`deep research`, `tiefenrecherche`) -> CT 121 directly
+- Everything else -> qwen3:30b-a3b locally

-### Online (`openai/gpt-4o-mini`)
-- price questions
-- web/research questions, based on a simple keyword heuristic in `llm.py`
-- image requests related to prices
+## Features
+- /no_think for Ollama models (faster responses)
+- Timeout fallback to qwen2.5:14b
+- Passthrough tools (Grafana data is passed through directly)
+- Memory system + session history
+- 19 tool modules (auto-discovery via tool_loader)

-## What is broken / unclean about this
-1. The system is no longer purely `local-first`.
-   The original goal was: standard tasks run locally, online only as a clearly defined special case.
-   Currently a simple trigger list in `llm.py` decides what gets routed online.
+## What works
+- The local AI controls all homelab services (RSS, Proxmox, Loki, etc.)
+- Web search runs via Perplexity Sonar (no tool calling, a single API call)
+- Vision runs locally via qwen3-vl:32b
+- Deep research runs via CT 121

-2. `deep_research` is effectively disabled.
-   `tools/deep_research.py` contains `TOOLS = []`.
-   The handler still exists, but the LLM cannot see the tool and therefore cannot call it normally.
-
-3. Special-case logic has accumulated in `llm.py`.
-   It includes, among other things, price/unit rules, routing heuristics and image special cases.
-   It grew out of functional needs but is not cleanly separated architecturally.
-
-4. The current state is not cleanly versioned.
-   `homelab-ai-bot/llm.py` is modified locally but not committed.
-   The running state and the Git state are therefore currently not identical.
-
-5. The default vision model is currently `qwen3-vl:32b`.
-   On the 3090 this model was noticeably too slow for bot use.
-   The routing currently compensates with online exceptions but does not fix the underlying architecture.
-
-## What still works
-- The tool loader and handler system work in principle.
-- Most tools are model-independent Python handlers and remain usable:
-  - `web_search`
-  - `memory_*`
-  - `get_feed_stats`
-  - Proxmox / Loki / Grafana / Prometheus / Mail / SaveTV / Seafile / Tailscale / PBS / WordPress
-- A gold price test via `web_search` + `gpt-4o-mini` returned a plausible result instead of confusing grams and ounces.
-
-## What must not be considered stable yet
-- `deep_research`
-- clean `local-first` routing
-- price/research routing as the final architecture
-- bot behavior in other edge cases beyond what has been tested so far
-
-## Intended target state
-- Standard tasks run locally
-- Image/OCR/scraper tasks run locally
-- Online only for clearly defined exceptions:
-  - Price questions
-  - Web research
-  - Deep research
-- Routing is central and explicit in code, not driven by accumulated special rules in the prompt
-
-## Sensible next step
-No more quick fixes.
-Instead, a clean rework of `llm.py` into a clear routing architecture with three explicit paths:
-1. local default path
-2. local vision path
-3. online research path
+## Git state
+Committed and pushed to Forgejo. Auto-sync is running.
--- a/homelab-ai-bot/__pycache__/llm.cpython-311.pyc
+++ b/homelab-ai-bot/__pycache__/llm.cpython-311.pyc
--- a/homelab-ai-bot/llm.py
+++ b/homelab-ai-bot/llm.py
@@ -18,14 +18,26 @@ log = logging.getLogger('llm')
 OLLAMA_BASE = "http://100.84.255.83:11434"
 OPENROUTER_BASE = "https://openrouter.ai/api/v1"

-MODEL = "openai/gpt-4o-mini"
-VISION_MODEL = "qwen3-vl:32b"
-FALLBACK_MODEL = "qwen3:30b-a3b"
+MODEL_LOCAL = "qwen3:30b-a3b"
+MODEL_VISION = "qwen3-vl:32b"
+MODEL_ONLINE = "perplexity/sonar"
+FALLBACK_MODEL = "qwen2.5:14b"
 MAX_TOOL_ROUNDS = 3
-OLLAMA_MODELS = {VISION_MODEL, FALLBACK_MODEL}
+OLLAMA_MODELS = {MODEL_LOCAL, MODEL_VISION, FALLBACK_MODEL}

 PASSTHROUGH_TOOLS = {"get_temperaturen", "get_energie", "get_heizung"}

+_WEB_TRIGGERS = [
+    "recherche", "recherchiere", "suche im internet", "web search",
+    "preis", "preise", "kostet", "kosten", "price",
+    "news", "nachrichten", "aktuell", "aktuelle",
+    "google", "finde heraus", "finde raus",
+    "gold", "silber", "kurs", "kurse",
+    "vergleich", "vergleiche",
+    "was kostet", "wie teuer", "wie viel",
+]
+_DEEP_TRIGGERS = ["deep research", "tiefenrecherche"]
+
 import datetime as _dt
 _TODAY = _dt.date.today()
 _3M_AGO = (_TODAY - _dt.timedelta(days=90))
@@ -80,10 +92,6 @@ SESSION-RUECKBLICK:
 - Optional kurz erwaehnen was sonst noch Thema war.
 - session_search nur fuer Stichwort-Suche in ALTEN Sessions (nicht aktuelle).

-TOOL-ERGEBNISSE:
- Tool-Ausgaben sind bereits fertig formatiert (Umlaute, Einheiten, Struktur).
- Gib sie 1:1 wieder. NICHT umformulieren, kuerzen oder Umlaute ersetzen.
-
 BILDERKENNUNG — ALLGEMEIN:
 Wenn der User ein Bild schickt das KEIN kritisches Dokument ist (z.B. Foto, Screenshot, Landschaft):
 - Beschreibe strukturiert was du siehst.
@@ -170,7 +178,6 @@ PREISRECHERCHE (PFLICHT):
 Wenn der User nach Preisen, Kosten oder Preisentwicklung fragt:
 - Nutze IMMER Tools statt Allgemeinwissen.
 - Fuer schnelle Preisabfrage: web_search.
- Auch wenn ein Bild mitgeschickt wird: Preise IMMER per web_search verifizieren — Bilder koennen veraltet sein.
 - Mache 2-3 gezielte web_search Aufrufe mit verschiedenen Suchbegriffen.
 - deep_research NUR wenn User explizit 'deep research' oder 'tiefenrecherche' sagt.
 - Gib konkrete Zahlen aus (EUR), nicht nur Tendenzen.
@@ -194,8 +201,18 @@ def _get_api_key() -> str:
    return cfg.api_keys.get("openrouter_key", "")


+def _route_model(question: str) -> str:
+    """Decide whether a question goes local, online (Sonar) or to deep_research."""
+    q = question.lower()
+    if any(t in q for t in _DEEP_TRIGGERS):
+        return "deep_research"
+    if any(t in q for t in _WEB_TRIGGERS):
+        return MODEL_ONLINE
+    return MODEL_LOCAL
+
+
 def _ollama_timeout_for(model: str) -> int:
-    if model == VISION_MODEL:
+    if model == MODEL_VISION:
        return 240
    if model == FALLBACK_MODEL:
        return 90
@@ -203,6 +220,7 @@ def _ollama_timeout_for(model: str) -> int:


 def _add_no_think(messages: list) -> None:
+    """Append /no_think to the last user message (for Ollama)."""
    for msg in reversed(messages):
        if msg.get("role") != "user":
            continue
@@ -210,17 +228,17 @@ def _add_no_think(messages: list) -> None:
        if isinstance(content, str) and "/no_think" not in content:
            msg["content"] = content + " /no_think"
        elif isinstance(content, list):
-            for item in content:
-                if item.get("type") == "text" and "/no_think" not in item.get("text", ""):
-                    item["text"] = item["text"] + " /no_think"
+            for part in content:
+                if part.get("type") == "text" and "/no_think" not in part.get("text", ""):
+                    part["text"] = part["text"] + " /no_think"
                    break
        break


-def _call_openrouter(messages: list, api_key: str, use_tools: bool = True,
+def _call_api(messages: list, api_key: str, use_tools: bool = True,
              model: str = None, max_tokens: int = 4000,
              allow_fallback: bool = True) -> dict:
-    chosen = model or MODEL
+    chosen = model or MODEL_LOCAL
    use_ollama = chosen in OLLAMA_MODELS
    log.info("LLM-Call: model=%s ollama=%s max_tokens=%d", chosen, use_ollama, max_tokens)

@@ -248,19 +266,14 @@ def _call_openrouter(messages: list, api_key: str, use_tools: bool = True,
        r.raise_for_status()
        return r.json()
    except requests.exceptions.ReadTimeout:
-        if use_ollama and allow_fallback and chosen == MODEL and FALLBACK_MODEL and FALLBACK_MODEL != chosen:
+        if use_ollama and allow_fallback and FALLBACK_MODEL and chosen not in (FALLBACK_MODEL, MODEL_VISION):
            log.warning(
-                "Ollama timeout for %s after %ss, retrying with fallback model %s",
-                chosen,
-                timeout,
-                FALLBACK_MODEL,
+                "Ollama timeout for %s after %ss, retrying with %s",
+                chosen, timeout, FALLBACK_MODEL,
            )
-            return _call_openrouter(
-                messages,
-                api_key,
-                use_tools=use_tools,
-                model=FALLBACK_MODEL,
-                max_tokens=max_tokens,
+            return _call_api(
+                messages, api_key, use_tools=use_tools,
+                model=FALLBACK_MODEL, max_tokens=max_tokens,
                allow_fallback=False,
            )
        raise
@@ -279,22 +292,38 @@ def ask(question: str, context: str) -> str:
        {"role": "user", "content": f"Kontext (Live-Daten):\n{context}\n\nFrage: {question}"},
    ]
    try:
-        data = _call_openrouter(messages, api_key, use_tools=False)
+        data = _call_api(messages, api_key, use_tools=False)
        return data["choices"][0]["message"]["content"]
    except Exception as e:
        return f"LLM-Fehler: {e}"


 def ask_with_tools(question: str, tool_handlers: dict, session_id: str = None) -> str:
-    """Free-text question with automatic tool calling.
+    """Free-text question with automatic routing and tool calling.

-    tool_handlers: dict of tool_name -> callable(**kwargs) -> str
-    session_id: active session for conversation history
+    Routing:
+    - deep_research / tiefenrecherche -> deep research handler, called directly
+    - Web/price/research -> Perplexity Sonar (no tool calling)
+    - Everything else -> local model with all tools
    """
    api_key = _get_api_key()
    if not api_key:
        return "OpenRouter API Key fehlt in homelab.conf"

+    route = _route_model(question)
+
+    # --- Deep research: call directly, no LLM needed ---
+    if route == "deep_research":
+        log.info("Route: deep_research")
+        try:
+            from tools import deep_research
+            return deep_research.handle_deep_research(query=question)
+        except Exception as e:
+            return f"Deep Research Fehler: {e}"
+
+    log.info("Route: %s", route)
+
+    # --- Build memory + prompt ---
    try:
        import memory_client
        memory_items = memory_client.get_relevant_memory(question, top_k=10)
@@ -340,17 +369,35 @@ def ask_with_tools(question: str, tool_handlers: dict, session_id: str = None) -

    messages.append({"role": "user", "content": question})

+    # --- Online (Sonar): no tool calling, Sonar does its own search ---
+    if route == MODEL_ONLINE:
+        try:
+            data = _call_api(messages, api_key, use_tools=False, model=MODEL_ONLINE)
+            content = data["choices"][0]["message"].get("content", "")
+            if session_id:
+                try:
+                    memory_client.log_message(session_id, "user", question)
+                    memory_client.log_message(session_id, "assistant", content)
+                except Exception:
+                    pass
+            return content or "Keine Antwort von Sonar."
+        except Exception as e:
+            return f"Online-Suche Fehler: {e}"
+
+    # --- Local: tool calling with all tools ---
    passthrough_result = None

    try:
        for _round in range(MAX_TOOL_ROUNDS):
-            data = _call_openrouter(messages, api_key, use_tools=True)
+            data = _call_api(messages, api_key, use_tools=True, model=MODEL_LOCAL)
            choice = data["choices"][0]
            msg = choice["message"]

            tool_calls = msg.get("tool_calls")
            if not tool_calls:
                content = msg.get("content") or ""
+                if not content and msg.get("reasoning"):
+                    content = msg.get("reasoning", "")
                if passthrough_result:
                    return passthrough_result
                return content or "Keine Antwort vom LLM."
@@ -377,7 +424,7 @@ def ask_with_tools(question: str, tool_handlers: dict, session_id: str = None) -
                result_str = str(result)[:3000]

                if fn_name in PASSTHROUGH_TOOLS and not result_str.startswith(("Fehler", "Keine")):
-                    log.info("Passthrough-Tool %s: Ergebnis wird direkt weitergegeben", fn_name)
+                    log.info("Passthrough-Tool %s", fn_name)
                    passthrough_result = result_str

                messages.append({
@@ -388,7 +435,7 @@ def ask_with_tools(question: str, tool_handlers: dict, session_id: str = None) -

        if passthrough_result:
            return passthrough_result
-        data = _call_openrouter(messages, api_key, use_tools=False)
+        data = _call_api(messages, api_key, use_tools=False, model=MODEL_LOCAL)
        return data["choices"][0]["message"]["content"]

    except Exception as e:
@@ -396,7 +443,7 @@ def ask_with_tools(question: str, tool_handlers: dict, session_id: str = None) -


 def ask_with_image(image_base64: str, caption: str, tool_handlers: dict, session_id: str = None) -> str:
-    """Image analysis with optional text and tool calling via a vision-capable model."""
+    """Image analysis via the local vision model, with tool calling."""
    api_key = _get_api_key()
    if not api_key:
        return "OpenRouter API Key fehlt in homelab.conf"
@@ -418,41 +465,6 @@ def ask_with_image(image_base64: str, caption: str, tool_handlers: dict, session
        "Wenn es ein normales Bild ist: Beschreibe strukturiert was du siehst."
    )
    prompt_text = caption if caption else default_prompt
-
-    _price_kw = ["preis", "kostet", "kosten", "price", "teuer", "guenstig", "billig",
-                 "bestpreis", "angebot", "euro", "eur", "kaufen", "gold", "silber",
-                 "unze", "ounce", "kurs", "wert", "ram", "ddr"]
-    _check_text = (caption or "").lower()
-    if not _check_text and session_id:
-        try:
-            import memory_client as _mc
-            _recent = _mc.get_session_messages(session_id, limit=3)
-            for _m in reversed(_recent):
-                if _m.get("role") == "user" and _m.get("content"):
-                    _check_text = _m["content"].lower()
-                    break
-        except Exception:
-            pass
-    _is_price_q = any(kw in _check_text for kw in _price_kw)
-    if _is_price_q:
-        prompt_text = (
-            "WICHTIG: Es geht um aktuelle Preise/Kurse. "
-            "Du MUSST ZUERST web_search aufrufen (kurze Keywords, z.B. goldpreis euro unze heute). "
-            "Fordere MINDESTENS 5 Ergebnisse an (max_results=5). "
-            "Das Bild ist NUR Kontext — Preise daraus NIEMALS als Antwort verwenden. "
-            "EINHEITEN-FALLE: goldpreis.de zeigt Preise PRO GRAMM, nicht pro Unze! "
-            "1 troy ounce = 31,103 Gramm. Wenn eine Quelle ~125 EUR zeigt und eine andere ~3.900 EUR, "
-            "dann ist 125 EUR der GRAMM-Preis und 3.900 EUR der UNZEN-Preis. "
-            "Nutze Quellen die explizit pro Unze oder per ounce schreiben (z.B. finanzen.net, boerse.de). "
-            "Erst NACH der web_search darfst du antworten.\n\n"
-            + prompt_text
-        )
-    else:
-        prompt_text += (
-            "\n\nHinweis: Wenn im Bild Preise oder Kurse sichtbar sind und der User "
-            "danach fragt, nutze web_search fuer aktuelle Werte statt die Bild-Daten."
-        )
-
    user_content = [
        {"type": "text", "text": prompt_text},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_base64}", "detail": "high"}},
@@ -481,14 +493,16 @@ def ask_with_image(image_base64: str, caption: str, tool_handlers: dict, session

    try:
        for _round in range(MAX_TOOL_ROUNDS):
-            data = _call_openrouter(messages, api_key, use_tools=True,
-                                    model=VISION_MODEL, max_tokens=4000)
+            data = _call_api(messages, api_key, use_tools=True,
+                             model=MODEL_VISION, max_tokens=4000)
            choice = data["choices"][0]
            msg = choice["message"]

            tool_calls = msg.get("tool_calls")
            if not tool_calls:
                content = msg.get("content") or ""
+                if not content and msg.get("reasoning"):
+                    content = msg.get("reasoning", "")
                return content or "Keine Antwort vom LLM."

            messages.append(msg)
@@ -515,8 +529,8 @@ def ask_with_image(image_base64: str, caption: str, tool_handlers: dict, session
                    "content": str(result)[:3000],
                })

-        data = _call_openrouter(messages, api_key, use_tools=False,
-                               model=VISION_MODEL, max_tokens=4000)
+        data = _call_api(messages, api_key, use_tools=False,
+                         model=MODEL_VISION, max_tokens=4000)
        return data["choices"][0]["message"]["content"]

    except Exception as e:
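The timeout handling in `_call_api` retries exactly once with `FALLBACK_MODEL`. Below is a minimal sketch of that retry-once pattern with the HTTP call abstracted as a callable so it can be exercised offline. `ModelTimeout` is a stand-in for `requests.exceptions.ReadTimeout`, and `call_with_fallback` and `slow_primary` are hypothetical names. Skipping the fallback for the vision model is an assumption, based on qwen2.5:14b being a text-only model that cannot read the image.

```python
from typing import Callable

MODEL_LOCAL = "qwen3:30b-a3b"
MODEL_VISION = "qwen3-vl:32b"
FALLBACK_MODEL = "qwen2.5:14b"


class ModelTimeout(Exception):
    """Stand-in for requests.exceptions.ReadTimeout in this sketch."""


def call_with_fallback(call: Callable[[str], dict], model: str) -> dict:
    """Call `model`; on a timeout, retry exactly once with FALLBACK_MODEL."""
    try:
        return call(model)
    except ModelTimeout:
        # No retry if we already were on the fallback, or if the request
        # needs vision (the text-only fallback cannot read the image)
        if model in (FALLBACK_MODEL, MODEL_VISION):
            raise
        return call(FALLBACK_MODEL)


def slow_primary(model: str) -> dict:
    """Fake backend: the primary model times out, the fallback answers."""
    if model == MODEL_LOCAL:
        raise ModelTimeout(model)
    return {"model": model}


print(call_with_fallback(slow_primary, MODEL_LOCAL))  # {'model': 'qwen2.5:14b'}
```

Because the fallback call sits outside the `try`, a second timeout simply propagates, which mirrors `allow_fallback=False` on the recursive call in the diff.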
--- a/homelab-ai-bot/tools/deep_research.py
+++ b/homelab-ai-bot/tools/deep_research.py
@@ -26,7 +26,25 @@ QUALITAET BEI PREISFRAGEN:
 - Zeige Zeitraum, Preis damals/heute, Delta in % und Quellen.
 - Wenn keine belastbaren Daten vorhanden sind, sage es explizit."""

-TOOLS = []  # removed from auto-discovery; use HANDLERS directly
+TOOLS = [
+    {
+        "type": "function",
+        "function": {
+            "name": "deep_research",
+            "description": "KI-gestuetzte Tiefenrecherche (20-30 Quellen, 2-5 Min). NUR wenn User explizit deep research oder tiefenrecherche sagt.",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "query": {
+                        "type": "string",
+                        "description": "Die Recherche-Frage"
+                    }
+                },
+                "required": ["query"]
+            },
+        },
+    },
+]


 def _create_thread():
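With `deep_research` back in `TOOLS`, the model can emit a tool call for it. The sketch below shows one way such a call could be checked against the schema's `required` list and dispatched to a handler dict. `dispatch_tool_call` and `fake_deep_research` are hypothetical names. Accepting `arguments` either as a JSON string (as in the OpenAI chat format) or as an already-decoded dict is a defensive assumption. The 3000-character cap mirrors `str(result)[:3000]` in `llm.py`.

```python
import json
from typing import Any, Callable

# Same shape as the deep_research entry in TOOLS (description shortened)
DEEP_RESEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "deep_research",
        "description": "In-depth research; only on explicit request.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}


def dispatch_tool_call(tool_call: dict, tools: list[dict],
                       handlers: dict[str, Callable[..., Any]]) -> str:
    """Validate one tool call against its schema and run the matching handler."""
    fn = tool_call["function"]
    name = fn["name"]
    schema = next((t["function"] for t in tools if t["function"]["name"] == name), None)
    if schema is None or name not in handlers:
        return f"Unknown tool: {name}"
    # Arguments may arrive as a JSON string or as an already-decoded dict
    raw = fn.get("arguments") or {}
    args = json.loads(raw) if isinstance(raw, str) else raw
    missing = [p for p in schema["parameters"].get("required", []) if p not in args]
    if missing:
        return f"Missing arguments for {name}: {', '.join(missing)}"
    return str(handlers[name](**args))[:3000]


def fake_deep_research(query: str) -> str:
    """Hypothetical stand-in for the real CT 121 handler."""
    return f"Report: {query}"


call = {"type": "function",
        "function": {"name": "deep_research", "arguments": '{"query": "RAM-Preise"}'}}
# prints: Report: RAM-Preise
print(dispatch_tool_call(call, [DEEP_RESEARCH_TOOL], {"deep_research": fake_deep_research}))
```

Returning an error string instead of raising keeps the same convention as the bot's handlers, whose results are fed back to the model as tool messages.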