Compare commits
No commits in common. "b091649e6a4f611269ee828dd51c8a01157c442f" and "b1aaaa9d57250bcffe53587b3a5e043711296138" have entirely different histories.
b091649e6a
...
b1aaaa9d57
5 changed files with 207 additions and 526 deletions
329
STATE.md
|
|
@ -1,5 +1,5 @@
# STATE: Flugpreisscanner
**As of: 25.03.2026**
**As of: 26.02.2026**

---
|
||||
|
|
@ -11,9 +11,10 @@
|------------|--------|
| flugscanner-hub | ✅ Running (Docker: web + scheduler) |
| flugscanner-asia | ✅ Running (Docker: agent + noVNC) |
| flugscanner-mu | ⏸️ Disabled (DB): scraping via Asia only |
| flugscanner-mu | ✅ Running (Docker: agent + noVNC) |
| Forgejo-Repo | ✅ http://100.89.246.60:3000/orbitalo/flugpreisscanner |
| Dashboard | ✅ http://100.92.161.97:8080 |
| Telegram Bot | ✅ @CX_HKG_Alert_bot |
| Telegram Bot | ✅ @CX_HKG_Alert_bot: alerts + /preis + /best + /status |

---
|
||||
|
|
@ -23,152 +24,205 @@ Täglich günstigste Flüge **FRA → KTI (Frankfurt → Phnom Penh)** automatis
|
|||
Cabin: **Economy** · Baggage: 1 checked bag + carry-on · Stay: ~2 months
Focus: **Cathay Pacific (CX) via Hong Kong**, the best value for money in Economy.
AI evaluates: book now or wait?
Scraping deliberately runs from residential IPs, not from Hetzner (datacenter IPs get blocked).

**Route: 🇭🇰 HKG Stopover**: Multi-City FRA→HKG (1–2 nights) → KTI → FRA.
Realistic price: **900–1,050 EUR** roundtrip Economy.

---

## Containers & Access
## Containers
|
||||
| CT | Name | Server | Tailscale-IP | Access |
|
||||
|----|------|--------|--------------|--------|
|
||||
| 115 | `flugscanner-hub` | pve-hetzner | 100.92.161.97 | `ssh root@100.88.230.59` PW: Astral-Proxmox!2026 → `pct exec 115` |
|
||||
| 115 | `flugscanner-asia` | pve-ka-1 (Kambodscha) | 100.112.190.22 | `sshpass -p astral66 ssh root@100.122.56.60` → `pct exec 115` |
|
||||
| 145 | `flugscanner-mu` | helmut-pve (Muldenstein) | 100.75.182.15 | `sshpass -p astral66 ssh root@100.75.182.15` (direkt) |
|
||||
| CT | Name | Server | LAN-IP | Tailscale-IP | Role |
|----|------|--------|--------|--------------|---------|
| 115 | `flugscanner-hub` | pve-hetzner | 10.10.10.115 | 100.92.161.97 | Brain: dashboard + scheduler + AI evaluation (OpenRouter) + DB + job coordination |
| 115 | `flugscanner-asia` | pve1 Cambodia | 192.168.0.131 | 100.112.190.22 | Scraping node A: SeleniumBase CDP + noVNC, residential IP in Asia |
| 145 | `flugscanner-mu` | helmut-pve Muldenstein | 192.168.178.130 | 100.75.182.15 | Scraping node B: SeleniumBase CDP + noVNC, residential IP in Germany |

**Access:**
- Hub (pve-hetzner): `ssh root@100.88.230.59` PW: Astral-Proxmox!2026 → `pct exec 115`
|
||||
- Asia (pve1): `ssh root@192.168.0.197` PW: astral66 → `pct exec 115`
|
||||
- Muldenstein: `ssh root@100.75.182.15` PW: astral66 (direkt, kein pct nötig)
|
||||
- helmut-pve: `ssh root@100.87.235.11` PW: astral66
|
||||
|
||||
**Important:**
- Scraping NEVER runs from CT 115 / Hetzner
- The Cambodia node skips Momondo/Traveloka (geo-block)
- Muldenstein = German IP (best KAYAK/Momondo results)
- CT 115 only coordinates; the nodes do the work
- Muldenstein = German IP (best results for Kayak, Momondo)
- Cambodia = Asian IP (Momondo/Traveloka are skipped because of the geo-block)
- **Tailscale on all containers** for secure communication over the tailnet

---
|
||||
## Paths Hub (CT 115, pve-hetzner)
## CT 115: Flugpreisscanner Hub

**Coordination, evaluation, and dashboard only. NO scraping, NO noVNC here.**

### Services (Docker)

| Service | Container | Port | Role |
|---------|-----------|------|---------|
| web | `flugscanner-web` | 8080 | Flask dashboard |
| scheduler | `flugscanner-scheduler` | - | Distribute jobs, trigger AI, Telegram bot |

### Paths
|
||||
```
/opt/flugscanner/hub/
/opt/flugscanner/
├── hub/
│   ├── docker-compose.yml
│   ├── .env
│   ├── Dockerfile
│   ├── data/
│   │   └── flugscanner.db      ← SQLite database
│   └── src/
│       ├── web.py              ← Flask dashboard + API
│       ├── scheduler.py        ← Job coordination + Telegram bot
│       ├── ki.py               ← OpenRouter evaluation + plausibility
│       ├── db.py               ← DB access + init
│       └── requirements.txt
└── node/                       ← (checked out on nodes)
```
|
||||
---

## Scraping Nodes (asia + mu)

### Services (Docker)

| Service | Container | Port | Role |
|---------|-----------|------|---------|
| agent | `flugscanner-agent` | 5010 | Receive jobs, start Selenium |
| novnc | `flugscanner-novnc` | 6080 | Watch Chrome live in the browser |

### Paths
|
||||
```
/opt/flugscanner/node/
├── docker-compose.yml
├── .env                    ← OPENROUTER_API_KEY, TELEGRAM_BOT_TOKEN
├── data/
│   └── flugscanner.db      ← SQLite database
├── .env                    ← NODE_NAME=flugscanner-asia/mu
├── Dockerfile
└── src/
    ├── web.py              ← Flask dashboard + API
    ├── scheduler.py        ← Job coordination + vision AI + Telegram bot
    ├── ki.py               ← OpenRouter AI plausibility
    └── db.py               ← DB access + init
    ├── agent.py            ← Flask API (POST /job, GET /status)
    ├── worker.py           ← SeleniumBase CDP scraper
    └── requirements.txt
```
|
||||
## Paths Nodes (asia + mu)
### Communication

```
/opt/flugscanner/node/src/
├── worker.py               ← SeleniumBase CDP scraper (all scanners)
└── agent.py                ← Flask API (POST /job, GET /status)
Hub Scheduler → POST http://[Node-Tailscale-IP]:5010/job
{ "scanner": "kayak_multicity", "von": "FRA", "nach": "KTI", "kabine": "economy", ... }

Node responds:
{ "results": [...], "node": "flugscanner-mu", "count": 10, "screenshot_b64": "..." }
```
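
The request/response exchange above could be driven from the hub roughly as follows. This is only a sketch using Python's standard library: `build_job_payload`, `post_job`, and the timeout value are hypothetical names and settings, and only the JSON fields shown above come from the document.

```python
import json
import urllib.request


def build_job_payload(scanner: str, von: str, nach: str, kabine: str = "economy") -> dict:
    """Assemble the JSON body for POST /job (only the fields shown above)."""
    return {"scanner": scanner, "von": von, "nach": nach, "kabine": kabine}


def post_job(node_ip: str, payload: dict, timeout: float = 300.0) -> dict:
    """Send the job to a node's agent on port 5010 and return the parsed reply.

    The timeout is an assumption; a browser-based scrape can take minutes.
    """
    req = urllib.request.Request(
        f"http://{node_ip}:5010/job",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the payload needs no network access; post_job() would send it.
payload = build_job_payload("kayak_multicity", "FRA", "KTI")
```

Keeping payload construction separate from the HTTP call makes the job format easy to test without a reachable node.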
|
||||
|
||||
---

## Scanner

| Scanner | Status | Nodes | Notes |
|---------|--------|-------|-----------|
| Kayak Roundtrip | ✅ Active | both | Best data source |
| Kayak Multi-City CX via HKG | ✅ Active | both | FRA→HKG→KTI→FRA |
| Trip.com | ✅ Active | both | flightType=RT fix 21.03.2026 |
| Momondo | ✅ Active | mu only | Geo-block from Asia |
| Traveloka | ⚠ mu only | mu only | Geo-block from Asia |
| Google Flights | ⚠ Limited | both | Consent issues |
| Wego | ❌ Disabled | - | |
| Skyscanner | ❌ Disabled | - | Bot detection |
| Scanner | Status | Notes |
|---------|--------|-----------|
| Kayak (Roundtrip) | ✅ Active | Best data source, GDPR consent automated |
| **Kayak Multi-City CX via HKG** | ✅ Active | Primary scanner: FRA→HKG→KTI→FRA |
| Trip.com | ✅ Active | Good complement, also with CX filter |
| Momondo | ✅ Active | Muldenstein only (geo-block from Asia) |
| Google Flights | ⚠ Limited | Few results, consent issues |
| Traveloka | ⚠ Muldenstein only | Geo-block from Asia |
| Wego | ❌ Disabled | |
| Skyscanner | ❌ Disabled | Bot detection |

### Node-specific restrictions

Momondo and Traveloka are skipped automatically on `flugscanner-asia` (geo-block).
Configuration: `NODE_SCANNER_SKIP` in scheduler.py.
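
The real layout of `NODE_SCANNER_SKIP` is not shown in this diff. One plausible shape, given purely as an assumption, is a mapping from node name to the scanners that node must skip; the scanner keys and the helper `scanners_for_node` are hypothetical.

```python
# Assumed shape: node name -> set of scanner keys that must not run there.
NODE_SCANNER_SKIP = {
    "flugscanner-asia": {"momondo", "traveloka"},  # geo-blocked from Asia
}


def scanners_for_node(node_name, scanners):
    """Return the scanners allowed on a node, preserving their order."""
    skip = NODE_SCANNER_SKIP.get(node_name, set())
    return [s for s in scanners if s not in skip]


all_scanners = ["kayak", "kayak_multicity", "tripcom", "momondo", "traveloka"]
asia_scanners = scanners_for_node("flugscanner-asia", all_scanners)
mu_scanners = scanners_for_node("flugscanner-mu", all_scanners)
```

A node without an entry runs every scanner, so a new node needs no configuration until a block is observed.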
|
||||
|
||||
---

## AI Pipeline
## Anti-Bot Strategy

### 1. AI Eyes (OpenRouter gpt-4o-mini)
Screenshot analysis after every scan: PRICES_FOUND / COOKIE_BANNER / CAPTCHA / ERROR_PAGE

### 2. Cabin Class Detection (OpenRouter gpt-4o-mini)
Vision classifies Economy / Economy Light / Premium Economy / Business

### 3. Local Vision Price (Ollama qwen3-vl:32b, free of charge)
After every scan: reads the cheapest **roundtrip** price from the screenshot.
- Deviation ≤15% → `ki_verified=1`, `ki_preis_visual` set
- Deviation >15% → original gets `plausibel=0`, new row `scanner=*_ki` with the AI price
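
The 15% rule can be illustrated with a small sketch. The deviation is measured relative to the vision price, as in `vision_verifiziere_preise`; the function name `vision_decision` and its return labels are hypothetical.

```python
def vision_decision(scraped: float, vision: float, tolerance: float = 0.15) -> str:
    """Classify a scraped price against the price read from the screenshot."""
    if vision <= 0:
        # No usable vision price: nothing can be confirmed or rejected.
        return "unverified"
    deviation = abs(scraped - vision) / vision
    return "verified" if deviation <= tolerance else "artifact"


close_match = vision_decision(880.0, 872.0)   # deviation below 1%
sidebar_hit = vision_decision(450.0, 900.0)   # deviation of 50%
no_reading = vision_decision(900.0, 0.0)
```

In the pipeline described above, an "artifact" result would mark the original row `plausibel=0` and add a `*_ki` row carrying the vision price.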
|
||||
|
||||
### 4. AI Plausibility (OpenRouter gpt-4o-mini)
Batch check of all new prices against empirical values.

### OpenRouter
| Variable | Value |
|----------|------|
| OPENROUTER_API_KEY | sk-or-v1-3c3... (aktualisiert 21.03.2026) |
|
||||
| AI_MODEL | openai/gpt-4o-mini |

---

## Database: prices table (key fields)

| Field | Meaning |
|------|-----------|
| preis | Raw scraper price |
| plausibel | 1=ok, 0=artifact/one-way/implausible, NULL=unchecked |
| plausi_grund | Reason |
| ki_preis_visual | Price read by the vision AI (qwen3-vl) |
| ki_verified | 1 = checked by the vision AI |
| ki_verified_at | Timestamp of the verification |
| preis_korrigiert | Price + baggage surcharge if Economy Light |

---

## Price Reference (as of 21.03.2026)

**FRA → KTI roundtrip Economy, ~2 months stay**

| Metric | Value |
|--------|------|
| Cheapest confirmed roundtrip | **870 EUR** (KAYAK, 01.03.2026) |
| Realistic average | 900–1,050 EUR |
| Upper plausible bound | 1,400 EUR |
| Below 870 EUR | suspicious (artifact or one-way) |

**Data basis (as of 21.03.2026):**
- 2,107 plausible prices in total
- 407 sidebar artifacts cleaned up
- 252 Trip.com one-way prices cleaned up

---
|
||||
## Known Bugs & Fixes

| Date | Bug | Fix |
|-------|-----|-----|
| 21.03.2026 | Trip.com returned one-way instead of roundtrip | Added `flightType=RT` to the URL |
| 21.03.2026 | KAYAK extracted sidebar filter prices | `_filter_sidebar_preise()` + anchor price filter |
| 21.03.2026 | Momondo Opodo popup blocked screenshots | `_dismiss_comparison_popup()` |
| 21.03.2026 | Portainer logs showed up as false positives in Loki | Loki filter extended |
| 21.03.2026 | OpenRouter API key expired | New key in .env |

---

## Scan Times (Hub)

| Time | What |
|------|-----|
| 06:30 / 12:30 / 18:30 | Standard scan (all jobs, all nodes) |
| 08:00 | Lead-time scan (45/60/84 days ahead) |
| 07:30 | Morning report via Telegram |
| 20:00 | Daily summary via Telegram |
- Scan interval: random **25–45 minutes** (not regular)
- SeleniumBase **UC/CDP Mode** (undetected Chromium)
- GDPR consent dismissed automatically (Kayak, Momondo)
- **Two different geo-locations** (Cambodia + Germany)
- Scrape URL (.de) kept separate from booking URL (.com), so the user sees international prices
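
The randomized scan interval from the list above might be computed as in the following sketch. The helper `next_scan_time` and its parameters are hypothetical; the scheduler's actual mechanism is not shown in this diff.

```python
import random
from datetime import datetime, timedelta


def next_scan_time(now: datetime, rng=random,
                   min_minutes: float = 25, max_minutes: float = 45) -> datetime:
    """Pick the next scan time a random 25 to 45 minutes after `now`."""
    delay = rng.uniform(min_minutes, max_minutes)
    return now + timedelta(minutes=delay)


start = datetime(2026, 3, 25, 6, 30)
# A seeded generator keeps the example reproducible; production code
# would simply use the module-level `random`.
following = next_scan_time(start, rng=random.Random(0))
```

Drawing a fresh delay after every scan avoids the fixed cadence that bot detection tends to pick up on.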
|
||||
|
||||
---
|
||||
|
||||
## Telegram Bot
|
||||
|
||||
**Bot:** @CX_HKG_Alert_bot · **Chat-ID:** 674951792
|
||||
**Bot:** @CX_HKG_Alert_bot
|
||||
**Token:** `8693839370:AAEPG0t2gA5jkLFH3J8UmstZMkHPdp0aTG4`
|
||||
**Chat-ID:** 674951792
|
||||
|
||||
| Alert | Trigger |
|-------|----------|
| Price alert | CX < 900 EUR |
| Price increase | > 50 EUR increase |
| Scanner problem | 3x zero results in a row |
### Commands
| Command | Function |
|--------|----------|
| /preis | Current CX price via HKG |
| /best | Top 3 cheapest today |
| /status | System status (nodes, last scan time) |

### Automatic Messages
| When | What |
|------|-----|
| Daily 07:00 | Morning report with price overview |
| When CX < 900€ | Price alert |
| On increase > 50€ | Price increase warning |
| After 3x zero results | Scanner problem alert (per node) |
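
The three alert triggers can be summarized in a small decision function. This is a sketch under stated assumptions: `ALERT_SCHWELLE_EUR = 900` matches the constant in scheduler.py, while `ANSTIEG_SCHWELLE_EUR`, `NULL_SCAN_LIMIT`, and `alerts_for_scan` are hypothetical names.

```python
ALERT_SCHWELLE_EUR = 900    # price alert threshold (same value as in scheduler.py)
ANSTIEG_SCHWELLE_EUR = 50   # hypothetical name for the price-increase threshold
NULL_SCAN_LIMIT = 3         # hypothetical name: empty scans before a scanner alert


def alerts_for_scan(cx_price, previous_price, empty_scans_in_a_row):
    """Return the names of the alerts that the documented rules would trigger."""
    alerts = []
    if cx_price is not None and cx_price < ALERT_SCHWELLE_EUR:
        alerts.append("price_alert")
    if (cx_price is not None and previous_price is not None
            and cx_price - previous_price > ANSTIEG_SCHWELLE_EUR):
        alerts.append("price_increase")
    if empty_scans_in_a_row >= NULL_SCAN_LIMIT:
        alerts.append("scanner_problem")
    return alerts


cheap = alerts_for_scan(880, 870, 0)
jump = alerts_for_scan(1000, 940, 0)
dead_scanner = alerts_for_scan(None, None, 3)
```

Per the changelog, the empty-scan counter is kept per node, so a geo-blocked scanner on one node does not trigger alerts for the other.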
|
||||
|
||||
---

## Database (SQLite on CT 115)

Path: `/opt/flugscanner/hub/data/flugscanner.db`

| Table | Contents |
|---------|--------|
| jobs | Scheduled scraping jobs (route, provider, interval, airline filter) |
| prices | Raw price data (price, date, provider, node, booking URL, plausibel flag) |
| screenshots | Vision AI screenshots with cabin class detection |
| analyses | AI evaluations with timestamp |
| prompts | Editable AI prompts |
| nodes | Registered scraping nodes + status |
| logs | System logs |

---

## AI Evaluation

- Runs automatically after every scraping pass
- **Vision AI**: screenshots are classified via gpt-4o-mini (Economy/PE/Business)
- **Plausibility check**: prices 700–12,000€ for Economy roundtrip
- **Market analysis**: prompt editable in the dashboard
- OpenRouter balance is shown in the dashboard

### OpenRouter

| Variable | Value |
|----------|------|
| OPENROUTER_API_KEY | `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` |
|
||||
| AI_MODEL | `openai/gpt-4o-mini` |

---

## Price Expectation (as of 26.02.2026)

**FRA → HKG → Phnom Penh → FRA, Cathay Pacific Economy roundtrip**

| Metric | Value |
|--------|------|
| Cheapest | ~726 EUR |
| Realistic average | **900–1,050 EUR** |
| Good airlines (CX/SQ/TG) average | ~1,030 EUR |
| For comparison: travel agency VA PE | ~2,000 EUR |

---

## Repo
|
||||
`git.orbitalo.net/orbitalo/flugpreisscanner`
|
||||
API-Token (cursor-deploy-3): `a6dd1ee58e091c894169c5ae15f6b74bb9461c56`
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -177,38 +231,13 @@ Batch-Prüfung aller neuen Preise gegen Erfahrungswerte.
|
|||
| Date | What |
|-------|-----|
| 25.02.2026 | System went live |
| 26.02.2026 | Switch PE → Economy, CX via HKG as main route |
| 26.02.2026 | Telegram bot set up, docs in CT999 |
| 21.03.2026 | OpenRouter API key renewed |
| 21.03.2026 | Trip.com one-way bug fixed (flightType=RT) |
| 21.03.2026 | KAYAK sidebar price filter implemented |
| 21.03.2026 | Momondo popup dismisser implemented |
| 21.03.2026 | Vision AI pipeline (qwen3-vl:32b, local) added |
| 21.03.2026 | DB: ki_preis_visual + ki_verified columns |
| 21.03.2026 | Web UI: AI-verified badge (green/blue) |
| 21.03.2026 | 407 sidebar + 252 one-way prices cleaned up |
| 21.03.2026 | Docs for flugscanner-asia access (via pve-ka-1) |

---

## Cambodia-Node-Only Operation (25.03.2026)

**Background:** Unstable internet connection to the scraping node in Germany (Muldenstein).

**Change:** In `hub/data/flugscanner.db` → table `nodes`: **`flugscanner-mu`** set to `status='disabled'`.
All active scans now run only via **`flugscanner-asia`** (CT 115 on **pve-ka-1**, Tailscale **100.112.190.22**) in the Cambodia / Phnom Penh region.

**Impact:**
- **Momondo** and **Traveloka** are no longer run (geo-block from Asia; previously only useful via a German IP).
- **KAYAK**, **Trip.com**, and, to a limited extent, **Google Flights** remain active on the Asia node.

**Re-enable the MU node** (once the connection is stable):
```bash
cd /opt/flugscanner/hub && docker compose stop
sqlite3 data/flugscanner.db "UPDATE nodes SET status='online' WHERE name='flugscanner-mu';"
docker compose start
```
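
The same status change could also be made from Python with the standard `sqlite3` module. The sketch below runs against an in-memory stand-in for the `nodes` table; only the `name` and `status` columns come from the SQL above, the rest of the real schema is unknown, and `set_node_status` is a hypothetical helper.

```python
import sqlite3


def set_node_status(conn: sqlite3.Connection, name: str, status: str) -> int:
    """Set a node's status and return how many rows were changed."""
    cur = conn.execute(
        "UPDATE nodes SET status=? WHERE name=?",  # parameterized, no string building
        (status, name),
    )
    conn.commit()
    return cur.rowcount


# In-memory stand-in; the real file would be data/flugscanner.db on the hub.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (name TEXT PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO nodes VALUES ('flugscanner-mu', 'disabled')")

changed = set_node_status(conn, "flugscanner-mu", "online")
current = conn.execute(
    "SELECT status FROM nodes WHERE name='flugscanner-mu'"
).fetchone()[0]
```

As with the shell variant, the hub containers should be stopped while the database file is edited so the scheduler does not write to it at the same time.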
|
||||
|
||||
| Date | What |
|-------|-----|
| 25.03.2026 | flugscanner-mu disabled: scraping now only via Asia (Cambodia) |
| 25.02.2026 | Cookie banner fix + screenshot improvements |
| 26.02.2026 | Switch PE → Economy, CX via HKG as main route |
| 26.02.2026 | Telegram bot @CX_HKG_Alert_bot set up |
| 26.02.2026 | SeleniumBase 4.34 → 4.47 (CDP improvements) |
| 26.02.2026 | _scrape_url / _booking_url split (scrape .de, booking .com) |
| 26.02.2026 | GDPR consent handling for Kayak/Momondo |
| 26.02.2026 | NODE_SCANNER_SKIP: Momondo/Traveloka disabled on Asia |
| 26.02.2026 | Alert counter now per node (no spam from geo-blocks) |
| 26.02.2026 | SSH fix Muldenstein (PermitRootLogin yes) |
| 26.02.2026 | Docs added in CT999 (ct-145-flugscanner-mu.md + index.md) |
|
|
|
|||
|
|
@ -6,7 +6,6 @@ import threading
|
|||
import requests
|
||||
import schedule
|
||||
from datetime import datetime, timedelta
|
||||
from typing import Optional
|
||||
from db import (init_db, get_conn, log, source_health_update,
|
||||
source_health_ist_pausiert, source_health_reset_daily,
|
||||
source_health_get_all, scan_result_save)
|
||||
|
|
@ -18,8 +17,6 @@ _vision_client = OpenAI(
|
|||
base_url="https://openrouter.ai/api/v1",
|
||||
api_key=os.environ.get("OPENROUTER_API_KEY")
|
||||
)
|
||||
OLLAMA_VISION_URL = "http://100.84.255.83:11434"
|
||||
|
||||
|
||||
# ── Telegram ──────────────────────────────────────────────────────────────────
|
||||
TELEGRAM_TOKEN = os.environ.get("TELEGRAM_BOT_TOKEN", "")
|
||||
|
|
@ -197,129 +194,6 @@ def klassifiziere_screenshot(screenshot_b64: str) -> str:
|
|||
return "Unbekannt"
|
||||
|
||||
|
||||
|
||||
def vision_preis_lokal(screenshot_b64: str) -> float | None:
    """Vision AI (gpt-4o-mini via OpenRouter) reads the cheapest roundtrip price from the screenshot.
    Previously local (qwen3-vl:32b), now cloud-based, which frees the GPU for the coding agent."""
    if not screenshot_b64:
        return None
    try:
        prompt = (
            "Look at this flight search screenshot. "
            "I need the cheapest ROUNDTRIP (Hin- und Rueckflug) price in EUR from the search results. "
            "IMPORTANT: Ignore one-way (Hinflug only) prices. Ignore sidebar filters. Ignore ads. "
            "If the page shows roundtrip results: answer with the cheapest roundtrip price as a number only, e.g.: 872 "
            "If the page shows only one-way results or no roundtrip prices: answer 0"
        )
        response = _vision_client.chat.completions.create(
            model="openai/gpt-4o-mini",
            max_tokens=30,
            temperature=0,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {
                        "url": f"data:image/jpeg;base64,{screenshot_b64}"
                    }}
                ]
            }]
        )
        txt = response.choices[0].message.content.strip()
        m = re.search(r'\d{3,5}', txt)
        if m:
            v = float(m.group(0))
            if 600 <= v <= 2500:
                log(f"Vision-Preis: {v:.0f}\u20ac erkannt (OpenRouter)")
                return v
            if v == 0:
                return None
    except Exception as e:
        log(f"Vision-Preis Fehler: {e}", "WARN")
    return None
|
||||
def vision_verifiziere_preise(screenshot_b64: str, screenshot_id: int) -> Optional[float]:
    """Verifies stored prices via the vision AI (see vision_preis_lokal, now gpt-4o-mini via OpenRouter).

    Flow:
    - The AI reads the cheapest price from the screenshot
    - Matches the stored price (<=15% deviation): set ki_verified=1
    - Deviates by >15% (sidebar artifact): original gets plausibel=0 AND a new row with
      scanner='[scanner]_ki' and the AI price is created → shows up as its own entry in the UI
    """
    if not screenshot_b64 or not screenshot_id:
        return None
    ki_preis = vision_preis_lokal(screenshot_b64)
    if ki_preis is None:
        return None
    try:
        from datetime import datetime as _dt
        conn = get_conn()
        # Fetch all prices for this screenshot (with metadata for the new row)
        rows = conn.execute("""
            SELECT id, preis, scanner, node, job_id, waehrung, airline, abflug, ankunft,
                   von, nach, booking_url, kabine_erkannt, preis_korrigiert, korrektur_grund
            FROM prices WHERE screenshot_id=?
        """, (screenshot_id,)).fetchall()

        if not rows:
            conn.close()
            log(f"Vision-Check screenshot {screenshot_id}: keine Preiszeilen zum Abgleich")
            return None

        now_str = _dt.now().isoformat()
        korrigiert = 0
        bestaetigt = 0

        for row in rows:
            (price_id, raw_preis, scanner, node, job_id, waehrung, airline,
             abflug, ankunft, von, nach, booking_url, kabine, preis_korr, korr_grund) = row

            diff_pct = abs(raw_preis - ki_preis) / ki_preis if ki_preis > 0 else 1

            if diff_pct > 0.15:
                # Mark the original as an artifact
                conn.execute("""
                    UPDATE prices SET ki_preis_visual=?, ki_verified=0, ki_verified_at=NULL,
                        plausibel=0,
                        plausi_grund='Vision-KI: ' || ROUND(?) || 'EUR vs scraped ' || ROUND(?) || 'EUR (Artefakt — nicht verifiziert)'
                    WHERE id=?
                """, (ki_preis, ki_preis, raw_preis, price_id))

                # New verified row (scanner + '_ki')
                conn.execute("""
                    INSERT OR IGNORE INTO prices
                        (job_id, scanner, node, preis, waehrung, airline, abflug, ankunft,
                         von, nach, booking_url, screenshot_id, kabine_erkannt,
                         plausibel, plausi_grund, preis_korrigiert, korrektur_grund,
                         ki_preis_visual, ki_verified, ki_verified_at)
                    VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 1,
                            'Vision-KI verifiziert', ?, ?, ?, 1, ?)
                """, (
                    job_id, scanner + '_ki', node, ki_preis, waehrung or 'EUR',
                    airline, abflug, ankunft, von, nach, booking_url, screenshot_id,
                    kabine, preis_korr, korr_grund, ki_preis, now_str,
                ))
                conn.commit()
                log(f"Vision-Check: {raw_preis:.0f}€ → Artefakt; neuer Eintrag {ki_preis:.0f}€ ({scanner}_ki)")
                korrigiert += 1
            else:
                # Confirmed: only update the fields, no new row needed
                conn.execute("""
                    UPDATE prices SET ki_preis_visual=?, ki_verified=1, ki_verified_at=?
                    WHERE id=?
                """, (ki_preis, now_str, price_id))
                conn.commit()
                bestaetigt += 1

        conn.close()
        log(f"Vision-Check screenshot {screenshot_id}: {bestaetigt} bestaetigt, {korrigiert} korrigiert (KI: {ki_preis:.0f}€)")
        return ki_preis
    except Exception as e:
        log(f"Vision-Verifizierung Fehler: {e}", "WARN")
        return None
|
||||
|
||||
# ── Cleanup ───────────────────────────────────────────────────────────────────
|
||||
def cleanup_alte_screenshots(tage=30):
|
||||
    """Deletes screenshots that are older than `tage` days."""
|
||||
|
|
@ -501,29 +375,9 @@ def dispatch_job(node, job, tage_override=None):
|
|||
log(f"👁 KI-Fallback: {dropped} Preise verworfen (außerhalb {KI_FALLBACK_MIN}-{KI_FALLBACK_MAX}€ — vermutlich One-Way)")
|
||||
|
||||
    try:
        pruefe_preis_alert(results, job)
        pruefe_preisanstieg(results, job)
        speichere_preise(results, node["name"], job, screenshot_id, kabine_erkannt)
        ki_vis = vision_verifiziere_preise(screenshot_b64, screenshot_id)
        # Telegram only AFTER the vision check: the number is what the AI reads in the screenshot
        if screenshot_id and screenshot_b64:
            alert_rows = _alert_results_aus_db(screenshot_id, job_id)
            if alert_rows and _vision_hat_preise_verifiziert(screenshot_id):
                pruefe_preis_alert(alert_rows, job)
                pruefe_preisanstieg(alert_rows, job)
            elif alert_rows and ki_vis is None:
                log(
                    f"📵 Preis-Alert übersprungen — Vision lieferte keinen Preis "
                    f"({node['name']}/{job['scanner']})",
                    "WARN",
                )
            elif alert_rows and not _vision_hat_preise_verifiziert(screenshot_id):
                log(
                    f"📵 Preis-Alert übersprungen — keine ki_verified-Zeilen "
                    f"(Screenshot {screenshot_id})",
                    "WARN",
                )
        else:
            pruefe_preis_alert(results, job)
            pruefe_preisanstieg(results, job)
    except Exception as e:
        log(f"Speicher-Fehler {node['name']}/{job['scanner']}: {e}", "ERROR")
    return True
|
|
@ -554,49 +408,6 @@ def speichere_screenshot(screenshot_b64, node_name, job):
|
|||
return None
|
||||
|
||||
|
||||
|
||||
def _vision_hat_preise_verifiziert(screenshot_id: int) -> bool:
    """At least one row of this screenshot was matched against vision (ki_verified=1)."""
    conn = get_conn()
    row = conn.execute(
        "SELECT COUNT(*) AS c FROM prices WHERE screenshot_id=? AND ki_verified=1",
        (screenshot_id,),
    ).fetchone()
    conn.close()
    return bool(row and row["c"] and row["c"] > 0)


def _alert_results_aus_db(screenshot_id: int, job_id: int) -> list:
    """Prices for the Telegram alert after vision: plausible rows only (NULL/1), using the corrected price where set."""
    conn = get_conn()
    rows = conn.execute(
        """
        SELECT preis, preis_korrigiert, abflug, ankunft, booking_url, scanner
        FROM prices
        WHERE screenshot_id = ? AND job_id = ?
          AND (plausibel IS NULL OR plausibel = 1)
        ORDER BY COALESCE(preis_korrigiert, preis) ASC
        """,
        (screenshot_id, job_id),
    ).fetchall()
    conn.close()
    out = []
    for r in rows:
        raw = float(r["preis"] or 0)
        pk = r["preis_korrigiert"]
        eff = float(pk) if pk is not None else raw
        if eff <= 0:
            continue
        out.append({
            "preis": eff,
            "abflug": r["abflug"] or "",
            "ankunft": r["ankunft"] or "",
            "booking_url": r["booking_url"] or "",
            "scanner": r["scanner"] or "",
        })
    return out
|
||||
|
||||
ALERT_SCHWELLE_EUR = 900 # Telegram-Alert wenn CX unter diesen Preis fällt
|
||||
|
||||
def pruefe_preis_alert(results, job):
|
||||
|
|
@ -616,23 +427,6 @@ def pruefe_preis_alert(results, job):
|
|||
f"⚠️ Sofort auf Buchungsseite prüfen — Preise ändern sich schnell."
|
||||
)
|
||||
log(f"💰 PREIS-ALERT: {preis:.0f}EUR {scanner} — Telegram gesendet")
|
||||
# memory_service_event
|
||||
try:
|
||||
import requests as _rq, json as _json
|
||||
_rq.post("http://100.121.192.94:8400/events", json={
|
||||
"source": "flugscanner",
|
||||
"event_type": "price_alert",
|
||||
"object_key": f"{scanner}_{abflug}_{preis:.0f}",
|
||||
"payload_json": _json.dumps({
|
||||
"scanner": scanner,
|
||||
"preis_eur": preis,
|
||||
"abflug": abflug,
|
||||
"booking_url": url,
|
||||
"schwelle": ALERT_SCHWELLE_EUR,
|
||||
}, ensure_ascii=False),
|
||||
}, headers={"Authorization": "Bearer Ai8eeQibV6Z1RWc7oNPim4PXB4vILU1nRW2-XgRcX2M"}, timeout=3)
|
||||
except Exception:
|
||||
pass
|
||||
break
|
||||
|
||||
|
||||
|
|
@ -701,7 +495,7 @@ def speichere_preise(results, node_name, job, screenshot_id=None, kabine_erkannt
|
|||
continue
|
||||
|
||||
conn.execute("""
|
||||
INSERT OR IGNORE INTO prices
|
||||
INSERT INTO prices
|
||||
(job_id, scanner, node, preis, waehrung, airline, abflug, ankunft,
|
||||
von, nach, booking_url, screenshot_id, kabine_erkannt,
|
||||
plausibel, plausi_grund, preis_korrigiert, korrektur_grund)
|
||||
|
|
|
|||
|
|
@ -338,28 +338,14 @@ async function ladeUebersicht() {
|
|||
const buchBtn = p.booking_url
|
||||
? `<a href="${p.booking_url}" target="_blank" class="btn btn-sm" style="text-decoration:none">Öffnen ↗</a>`
|
||||
: '—';
|
||||
// AI verification: either a _ki entry (corrected) or ki_verified=1 (confirmed)
|
||||
const isKiKorrigiert = p.scanner && p.scanner.endsWith('_ki');
|
||||
const isKiBestaetigt = !isKiKorrigiert && p.ki_verified == 1;
|
||||
const isKiVerified = isKiKorrigiert || isKiBestaetigt;
|
||||
const scannerBase = isKiKorrigiert ? p.scanner.slice(0, -3) : p.scanner;
|
||||
const kiLabel = isKiKorrigiert
|
||||
? `<br><span title="Preis durch Vision-KI korrigiert (war Sidebar-Artefakt)" style="background:#1e3a5f;color:#60a5fa;padding:0.1rem 0.4rem;border-radius:3px;font-size:0.7rem;cursor:help">👁 KI-korrigiert</span>`
|
||||
: isKiBestaetigt
|
||||
? `<br><span title="Preis durch lokale Vision-KI (qwen3-vl) bestätigt${p.ki_preis_visual ? ': KI sieht ' + p.ki_preis_visual + '€' : ''}" style="background:#052e16;color:#4ade80;padding:0.1rem 0.4rem;border-radius:3px;font-size:0.7rem;cursor:help">👁 KI ✓</span>`
|
||||
: '';
|
||||
const scannerLabel = isMulticity
|
||||
? `<strong style="color:#818cf8">🇭🇰 HKG Stopover</strong><br><span style="font-size:0.72rem;color:#64748b">+~${HOTEL_HKG}€ Hotel</span>`
|
||||
: (scannerBase + kiLabel);
|
||||
: p.scanner;
|
||||
const verdaechtig = (ps === 0);
|
||||
const preisFarbe = verdaechtig ? '#ef4444' : (isMulticity ? '#a78bfa' : isKiKorrigiert ? '#60a5fa' : '#34d399');
|
||||
// For confirmed entries: show ki_preis_visual if present and different
|
||||
const kiPreisHinweis = (isKiBestaetigt && p.ki_preis_visual && Math.abs(p.ki_preis_visual - p.preis) > 5)
|
||||
? `<br><span style="font-size:0.7rem;color:#94a3b8">KI sieht: ${p.ki_preis_visual}€</span>`
|
||||
: '';
|
||||
const preisFarbe = verdaechtig ? '#ef4444' : (isMulticity ? '#a78bfa' : '#34d399');
|
||||
const gesamtHtml = isMulticity
|
||||
? `<strong style="color:${preisFarbe}">${p.preis} €</strong><br><span style="font-size:0.75rem;color:#64748b">∑ ~${Math.round(p.preis)+HOTEL_HKG} € inkl. Hotel</span>`
|
||||
: `<strong style="color:${preisFarbe}">${p.preis} €</strong>${kiPreisHinweis}`;
|
||||
: `<strong style="color:${preisFarbe}">${p.preis} €</strong>`;
|
||||
const ssBtn = p.screenshot_id
|
||||
? `<button onclick="zeigeScreenshot(${p.screenshot_id},'${p.scanner} · ${p.node} · ${p.abflug||''}')"
|
||||
style="background:#1e3a5f;border:1px solid #2563eb;color:#93c5fd;padding:0.2rem 0.5rem;border-radius:5px;cursor:pointer;font-size:0.8rem">
|
||||
|
|
|
|||
|
|
@ -20,7 +20,7 @@ def job():
|
|||
tage = data.get("tage", 30)
|
||||
aufenthalt = data.get("aufenthalt_tage", 60)
|
||||
trip_type = data.get("trip_type", "roundtrip")
|
||||
kabine = data.get("kabine", "economy")
|
||||
kabine = data.get("kabine", "premium_economy")
|
||||
gepaeck = data.get("gepaeck", "1koffer+handgepaeck")
|
||||
airline_filter = data.get("airline_filter", "")
|
||||
layover_min = data.get("layover_min", 120)
|
||||
|
|
|
|||
|
|
@ -36,8 +36,8 @@ def _validate_results(results, scanner_name, kabine="economy"):
|
|||
return results
|
||||
|
||||
|
||||
def _check_cabin_on_page(body, title, kabine="economy"):
|
||||
    """Checks whether the page roughly confirms the requested cabin class."""
|
||||
def _check_cabin_on_page(body, title, kabine="premium_economy"):
|
||||
    """Checks whether the page confirms the requested cabin class."""
|
||||
text = (title + " " + body[:3000]).lower()
|
||||
if kabine == "premium_economy":
|
||||
pe_keywords = ["premium economy", "premium eco", "premiumeconomy",
|
||||
|
|
@ -49,9 +49,6 @@ def _check_cabin_on_page(body, title, kabine="economy"):
|
|||
if eco_only[0]:
|
||||
print("[QC] WARNUNG: Seite zeigt 'Economy' OHNE 'Premium' — möglicherweise falsche Kabine!")
|
||||
return False
|
||||
elif kabine == "economy":
|
||||
if "business" in text and "economy" not in text[:800]:
|
||||
print("[QC] WARNUNG: Seite evtl. nur Business sichtbar — prüfen")
|
||||
return True
|
||||
|
||||
|
||||
|
|
@ -82,7 +79,7 @@ def _filter_roundtrip_only(results):
|
|||
|
||||
|
||||
def scrape(scanner, von, nach, tage=30, aufenthalt_tage=60,
|
||||
trip_type="roundtrip", kabine="economy",
|
||||
trip_type="roundtrip", kabine="premium_economy",
|
||||
gepaeck="1koffer+handgepaeck", airline_filter="",
|
||||
layover_min=120, layover_max=300,
|
||||
max_flugzeit_h=22, max_stops=2,
|
||||
|
|
@ -137,68 +134,6 @@ def _dismiss_cookie_banner(sb):
|
|||
return False
|
||||
|
||||
|
||||
|
||||
def _dismiss_comparison_popup(sb):
|
||||
"""Vergleichs-Popups (Opodo, Skyscanner etc.) wegklicken bevor Screenshot gemacht wird."""
|
||||
# Try Escape first (works for most modals)
|
||||
try:
|
||||
sb.driver.execute_script("document.dispatchEvent(new KeyboardEvent('keydown', {key: 'Escape', keyCode: 27, bubbles: true}));")
|
||||
sb.sleep(0.5)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
# Then look specifically for close buttons
|
||||
for sel in [
|
||||
'button[aria-label*="lose"]',
|
||||
'button[aria-label*="chließen"]',
|
||||
'button[aria-label*="Schließen"]',
|
||||
'[class*="modal"] button[class*="close"]',
|
||||
'[class*="dialog"] button[class*="close"]',
|
||||
'[class*="overlay"] button[class*="close"]',
|
||||
'[class*="popup"] button[class*="close"]',
|
||||
'button[class*="dismiss"]',
|
||||
'[data-testid*="close"]',
|
||||
'//button[contains(@aria-label, "lose")]',
|
||||
'//button[contains(., "Schließen")]',
|
||||
'//button[contains(., "Nein")]',
|
||||
'//button[contains(., "Nicht jetzt")]',
|
||||
'//button[contains(., "Vielleicht später")]',
|
||||
]:
|
||||
try:
|
||||
sb.click(sel, timeout=1)
|
||||
print(f"[Popup] Geschlossen: {sel[:60]}")
|
||||
sb.sleep(0.8)
|
||||
return True
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
# JavaScript fallback: remove all visible modals/overlays
|
||||
try:
|
||||
removed = sb.driver.execute_script("""
|
||||
var removed = 0;
|
||||
var selectors = ['[class*="modal"]', '[class*="overlay"]', '[class*="dialog"]',
|
||||
'[class*="popup"]', '[role="dialog"]'];
|
||||
selectors.forEach(function(sel) {
|
||||
document.querySelectorAll(sel).forEach(function(el) {
|
||||
var style = window.getComputedStyle(el);
|
||||
if (style.display !== 'none' && style.visibility !== 'hidden'
|
||||
&& el.offsetHeight > 100) {
|
||||
el.remove();
|
||||
removed++;
|
||||
}
|
||||
});
|
||||
});
|
||||
return removed;
|
||||
""")
|
||||
if removed:
|
||||
print(f"[Popup] JS: {removed} Elemente entfernt")
|
||||
sb.sleep(0.5)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
return False
|
||||
|
||||
|
||||
def _take_screenshot(sb):
|
||||
"""Full-page screenshot via CDP (JPEG 55%, max 3000px). Returns a base64 string."""
|
||||
try:
|
||||
|
|
@ -281,9 +216,7 @@ def _booking_url_momondo(von, nach, abflug, rueck, kc, bags=1,
|
|||
|
||||
|
||||
def _booking_url_trip(von, nach, abflug_fmt, rueck_fmt, kc, von_name, nach_name, airline=""):
|
||||
# flightType=RT forces a round-trip search on Trip.com
|
||||
flight_type = "RT" if rueck_fmt else "OW"
|
||||
params = f"DDate1={abflug_fmt}&class={kc}&curr=EUR&flightType={flight_type}"
|
||||
params = f"DDate1={abflug_fmt}&class={kc}&curr=EUR"
|
||||
if rueck_fmt:
|
||||
params += f"&DDate2={rueck_fmt}"
|
||||
if airline:
|
||||
|
|
@ -334,54 +267,6 @@ def _preise_aus_body(body, scanner, abflug):
|
|||
return results[:10]
|
||||
|
||||
|
||||
|
||||
def _kayak_header_preis(sb) -> float | None:
|
||||
"""Reads the 'Günstigste Option' (cheapest option) price from the KAYAK summary header.
|
||||
This value is the most reliable anchor, since it comes directly from the search results."""
|
||||
try:
|
||||
# JavaScript: look for the summary-bar elements
|
||||
price = sb.driver.execute_script("""
|
||||
// KAYAK shows "Günstigste Option" + price in a summary container
|
||||
var containers = document.querySelectorAll('[class*="rec-col"], [class*="recommended"], [class*="summary"], [class*="option-header"]');
|
||||
for (var c of containers) {
|
||||
var txt = c.innerText || '';
|
||||
var m = txt.match(/(\d[\d.]{1,6})\s?€|€\s?(\d[\d.]{1,6})/);
|
||||
if (m) {
|
||||
var raw = (m[1] || m[2]).replace('.','').replace(',','.');
|
||||
var v = parseFloat(raw);
|
||||
if (v > 300 && v < 5000) return v;
|
||||
}
|
||||
}
|
||||
// Fallback: search the page title / h1
|
||||
var h = document.querySelector('h1, [class*="title"]');
|
||||
if (h) {
|
||||
var m2 = (h.innerText||'').match(/(\d[\d.]{2,6})\s?€/);
|
||||
if (m2) return parseFloat(m2[1].replace('.',''));
|
||||
}
|
||||
return null;
|
||||
""")
|
||||
if price:
|
||||
print(f"[KY] Header-Preis: {price} EUR")
|
||||
return float(price)
|
||||
except Exception as e:
|
||||
print(f"[KY] Header-Preis Fehler: {e}")
|
||||
return None
|
||||
|
||||
|
||||
def _filter_sidebar_preise(results: list, anker: float | None, scanner: str) -> list:
|
||||
"""Filters out sidebar prices (airline filter, price slider).
|
||||
Keeps only prices that are >= 80% of the anchor price (sidebar prices are much lower)."""
|
||||
if not anker or not results:
|
||||
return results
|
||||
min_valid = anker * 0.80
|
||||
filtered = [r for r in results if r["preis"] >= min_valid]
|
||||
removed = len(results) - len(filtered)
|
||||
if removed:
|
||||
print(f"[{scanner}] {removed} Sidebar-Preise entfernt (unter {min_valid:.0f} EUR)")
|
||||
return filtered if filtered else results  # Fallback: keep everything if all were filtered out
|
||||
|
||||
|
||||
|
||||
def _consent_google(sb):
|
||||
"""Handle the Google consent page (GDPR)."""
|
||||
if "consent" in sb.get_current_url() or "Bevor Sie" in sb.get_title():
|
||||
|
|
@ -433,7 +318,7 @@ def _gf_fill_field(sb, selectors, text, field_name):
|
|||
|
||||
|
||||
def scrape_google_flights(von, nach, tage=30, aufenthalt_tage=60,
|
||||
trip_type="roundtrip", kabine="economy",
|
||||
trip_type="roundtrip", kabine="premium_economy",
|
||||
gepaeck="1koffer+handgepaeck", airline_filter="",
|
||||
layover_min=120, layover_max=300,
|
||||
max_flugzeit_h=22, max_stops=2):
|
||||
|
|
@ -441,7 +326,7 @@ def scrape_google_flights(von, nach, tage=30, aufenthalt_tage=60,
|
|||
abflug_de = (datetime.now() + timedelta(days=tage)).strftime("%d.%m.%Y")
|
||||
rueck = (datetime.now() + timedelta(days=tage + aufenthalt_tage)).strftime("%Y-%m-%d") \
|
||||
if trip_type == "roundtrip" else ""
|
||||
kc = KABINE_GOOGLE.get(kabine, "e")
|
||||
kc = KABINE_GOOGLE.get(kabine, "w")
|
||||
booking_url = _booking_url_google(von, nach, abflug, rueck, kc)
|
||||
|
||||
stadtname = {"FRA": "Frankfurt", "HAN": "Hanoi", "KTI": "Phnom Penh",
|
||||
|
|
@ -669,19 +554,18 @@ def scrape_google_flights(von, nach, tage=30, aufenthalt_tage=60,
|
|||
results = dedup
|
||||
|
||||
print(f"[GF] Ergebnis: {[r['preis'] for r in results[:5]]}")
|
||||
_dismiss_comparison_popup(sb)
|
||||
screenshot_b64 = _take_screenshot(sb)
|
||||
return results[:10], screenshot_b64
|
||||
|
||||
|
||||
def scrape_kayak(von, nach, tage=30, aufenthalt_tage=60,
|
||||
trip_type="roundtrip", kabine="economy",
|
||||
trip_type="roundtrip", kabine="premium_economy",
|
||||
gepaeck="1koffer+handgepaeck", airline_filter="",
|
||||
layover_min=120, layover_max=300,
|
||||
max_flugzeit_h=22, max_stops=2):
|
||||
abflug = (datetime.now() + timedelta(days=tage)).strftime("%Y-%m-%d")
|
||||
rueck = (datetime.now() + timedelta(days=tage + aufenthalt_tage)).strftime("%Y-%m-%d") if trip_type == "roundtrip" else ""
|
||||
kc = KABINE_KAYAK.get(kabine, "e")
|
||||
kc = KABINE_KAYAK.get(kabine, "w")
|
||||
bags = 1 if "koffer" in gepaeck else 0
|
||||
booking_url = _booking_url_kayak(von, nach, abflug, rueck, kc, bags,
|
||||
layover_min, layover_max, airline_filter,
|
||||
|
|
@ -727,24 +611,20 @@ def scrape_kayak(von, nach, tage=30, aufenthalt_tage=60,
|
|||
results.append(r)
|
||||
|
||||
# Cabin verification: check whether "Premium Economy" appears on the page
|
||||
pe_confirmed = _check_cabin_on_page(body, title, kabine)
|
||||
pe_confirmed = _check_cabin_on_page(body, title, "premium_economy")
|
||||
if not pe_confirmed:
|
||||
print(f"[KY{airline_label}] WARNUNG: Premium Economy nicht auf Seite bestätigt!")
|
||||
|
||||
# Filter out sidebar prices: fetch the header price as the anchor value
|
||||
anker = _kayak_header_preis(sb)
|
||||
results = _filter_sidebar_preise(results, anker, f"kayak{airline_label}")
|
||||
results = _validate_results(results, f"kayak{airline_label}", kabine)
|
||||
print(f"[KY{airline_label}] Ergebnis: {[r['preis'] for r in results[:5]]}")
|
||||
_dismiss_cookie_banner(sb)
|
||||
sb.sleep(3)
|
||||
_dismiss_comparison_popup(sb)
|
||||
screenshot_b64 = _take_screenshot(sb)
|
||||
return results[:10], screenshot_b64
|
||||
|
||||
|
||||
def scrape_trip(von, nach, tage=30, aufenthalt_tage=60,
|
||||
trip_type="roundtrip", kabine="economy",
|
||||
trip_type="roundtrip", kabine="premium_economy",
|
||||
gepaeck="1koffer+handgepaeck", airline_filter="",
|
||||
layover_min=120, layover_max=300,
|
||||
max_flugzeit_h=22, max_stops=2):
|
||||
|
|
@ -752,7 +632,7 @@ def scrape_trip(von, nach, tage=30, aufenthalt_tage=60,
|
|||
rueck_fmt = (datetime.now() + timedelta(days=tage + aufenthalt_tage)).strftime("%Y%m%d") if trip_type == "roundtrip" else ""
|
||||
abflug_iso = (datetime.now() + timedelta(days=tage)).strftime("%Y-%m-%d")
|
||||
rueck_iso = (datetime.now() + timedelta(days=tage + aufenthalt_tage)).strftime("%Y-%m-%d") if trip_type == "roundtrip" else ""
|
||||
kc = KABINE_TRIP.get(kabine, "Y")
|
||||
kc = KABINE_TRIP.get(kabine, "W")
|
||||
|
||||
stadtname = {"FRA": "frankfurt", "HAN": "hanoi", "KTI": "phnom-penh",
|
||||
"PNH": "phnom-penh", "BKK": "bangkok", "SGN": "ho-chi-minh-city"}
|
||||
|
|
@ -804,7 +684,7 @@ def scrape_trip(von, nach, tage=30, aufenthalt_tage=60,
|
|||
r["booking_url"] = booking_url
|
||||
results.append(r)
|
||||
|
||||
pe_confirmed = _check_cabin_on_page(body, title, kabine)
|
||||
pe_confirmed = _check_cabin_on_page(body, title, "premium_economy")
|
||||
if not pe_confirmed:
|
||||
print("[TR] WARNUNG: Premium Economy nicht auf Seite bestätigt!")
|
||||
|
||||
|
|
@ -812,7 +692,6 @@ def scrape_trip(von, nach, tage=30, aufenthalt_tage=60,
|
|||
print(f"[TR] Ergebnis: {[r['preis'] for r in results[:5]]}")
|
||||
_dismiss_cookie_banner(sb)
|
||||
sb.sleep(2)
|
||||
_dismiss_comparison_popup(sb)
|
||||
screenshot_b64 = _take_screenshot(sb)
|
||||
return results[:10], screenshot_b64
|
||||
|
||||
|
|
@ -837,7 +716,7 @@ def _booking_url_kayak_multicity(von, nach, via, abflug, via_datum, rueck, kc, b
|
|||
|
||||
|
||||
def scrape_kayak_multicity(von, nach, tage=30, aufenthalt_tage=60,
|
||||
kabine="economy",
|
||||
kabine="premium_economy",
|
||||
gepaeck="1koffer+handgepaeck",
|
||||
airline_filter="",
|
||||
via="HKG", stopover_min_h=20, stopover_max_h=30):
|
||||
|
|
@ -848,7 +727,7 @@ def scrape_kayak_multicity(von, nach, tage=30, aufenthalt_tage=60,
|
|||
abflug = (datetime.now() + timedelta(days=tage)).strftime("%Y-%m-%d")
|
||||
via_datum = (datetime.now() + timedelta(days=tage + 1)).strftime("%Y-%m-%d")
|
||||
rueck = (datetime.now() + timedelta(days=tage + 1 + aufenthalt_tage)).strftime("%Y-%m-%d")
|
||||
kc = KABINE_KAYAK.get(kabine, "e")
|
||||
kc = KABINE_KAYAK.get(kabine, "w")
|
||||
bags = 1 if "koffer" in gepaeck else 0
|
||||
airline_label = f" [{airline_filter}]" if airline_filter else ""
|
||||
|
||||
|
|
@ -904,13 +783,12 @@ def scrape_kayak_multicity(von, nach, tage=30, aufenthalt_tage=60,
|
|||
print(f"[MC{airline_label}] Ergebnis: {[r['preis'] for r in results[:5]]}")
|
||||
_dismiss_cookie_banner(sb)
|
||||
sb.sleep(3)
|
||||
_dismiss_comparison_popup(sb)
|
||||
screenshot_b64 = _take_screenshot(sb)
|
||||
return results[:10], screenshot_b64
|
||||
|
||||
|
||||
def scrape_momondo(von, nach, tage=30, aufenthalt_tage=60,
|
||||
trip_type="roundtrip", kabine="economy",
|
||||
trip_type="roundtrip", kabine="premium_economy",
|
||||
gepaeck="1koffer+handgepaeck", airline_filter="",
|
||||
layover_min=120, layover_max=300,
|
||||
max_flugzeit_h=22, max_stops=2):
|
||||
|
|
@ -918,7 +796,7 @@ def scrape_momondo(von, nach, tage=30, aufenthalt_tage=60,
|
|||
abflug = (datetime.now() + timedelta(days=tage)).strftime("%Y-%m-%d")
|
||||
rueck = (datetime.now() + timedelta(days=tage + aufenthalt_tage)).strftime("%Y-%m-%d") \
|
||||
if trip_type == "roundtrip" else ""
|
||||
kc = KABINE_KAYAK.get(kabine, "e")
|
||||
kc = KABINE_KAYAK.get(kabine, "w")
|
||||
bags = 1 if "koffer" in gepaeck else 0
|
||||
booking_url = _booking_url_momondo(von, nach, abflug, rueck, kc, bags,
|
||||
layover_min, layover_max, airline_filter,
|
||||
|
|
@ -977,24 +855,20 @@ def scrape_momondo(von, nach, tage=30, aufenthalt_tage=60,
|
|||
r["airline"] = airline_filter or ""
|
||||
results.append(r)
|
||||
|
||||
pe_confirmed = _check_cabin_on_page(body, title, kabine)
|
||||
pe_confirmed = _check_cabin_on_page(body, title, "premium_economy")
|
||||
if not pe_confirmed:
|
||||
print(f"[MO{airline_label}] WARNUNG: Premium Economy nicht auf Seite bestätigt!")
|
||||
|
||||
# Filter out sidebar prices
|
||||
anker_mo = _kayak_header_preis(sb)  # Momondo uses the same layout as Kayak
|
||||
results = _filter_sidebar_preise(results, anker_mo, f"momondo{airline_label}")
|
||||
results = _validate_results(results, f"momondo{airline_label}", kabine)
|
||||
print(f"[MO{airline_label}] Ergebnis: {[r['preis'] for r in results[:5]]}")
|
||||
_dismiss_cookie_banner(sb)
|
||||
sb.sleep(2)
|
||||
_dismiss_comparison_popup(sb)
|
||||
screenshot_b64 = _take_screenshot(sb)
|
||||
return results[:10], screenshot_b64
|
||||
|
||||
|
||||
def scrape_wego(von, nach, tage=30, aufenthalt_tage=60,
|
||||
trip_type="roundtrip", kabine="economy",
|
||||
trip_type="roundtrip", kabine="premium_economy",
|
||||
gepaeck="1koffer+handgepaeck", airline_filter="",
|
||||
layover_min=120, layover_max=300,
|
||||
max_flugzeit_h=22, max_stops=2):
|
||||
|
|
@ -1005,7 +879,7 @@ def scrape_wego(von, nach, tage=30, aufenthalt_tage=60,
|
|||
|
||||
KABINE_WEGO = {"economy": "economy", "premium_economy": "premiumEconomy",
|
||||
"business": "business", "first": "first"}
|
||||
kc = KABINE_WEGO.get(kabine, "economy")
|
||||
kc = KABINE_WEGO.get(kabine, "premiumEconomy")
|
||||
|
||||
stadtname_wego = {"FRA": "frankfurt", "KTI": "phnom-penh", "HAN": "hanoi",
|
||||
"BKK": "bangkok", "SGN": "ho-chi-minh-city", "HKG": "hong-kong"}
|
||||
|
|
@ -1057,7 +931,6 @@ def scrape_wego(von, nach, tage=30, aufenthalt_tage=60,
|
|||
results.append(r)
|
||||
|
||||
print(f"[WG] Ergebnis: {[r['preis'] for r in results[:5]]}")
|
||||
_dismiss_comparison_popup(sb)
|
||||
screenshot_b64 = _take_screenshot(sb)
|
||||
return results[:10], screenshot_b64
|
||||
|
||||
|
|
@ -1081,7 +954,7 @@ def _parse_preis_usd(text):
|
|||
|
||||
|
||||
def scrape_traveloka(von, nach, tage=30, aufenthalt_tage=60,
|
||||
trip_type="roundtrip", kabine="economy",
|
||||
trip_type="roundtrip", kabine="premium_economy",
|
||||
gepaeck="1koffer+handgepaeck", airline_filter="",
|
||||
layover_min=120, layover_max=300,
|
||||
max_flugzeit_h=22, max_stops=2):
|
||||
|
|
@ -1136,13 +1009,12 @@ def scrape_traveloka(von, nach, tage=30, aufenthalt_tage=60,
|
|||
results.sort(key=lambda x: x["preis"])
|
||||
results = _validate_results(results, "traveloka", "premium_economy")
|
||||
print(f"[TV] Ergebnis: {[r['preis'] for r in results[:5]]}")
|
||||
_dismiss_comparison_popup(sb)
|
||||
screenshot_b64 = _take_screenshot(sb)
|
||||
return results[:10], screenshot_b64
|
||||
|
||||
|
||||
def scrape_skyscanner(von, nach, tage=30, aufenthalt_tage=60,
|
||||
trip_type="roundtrip", kabine="economy",
|
||||
trip_type="roundtrip", kabine="premium_economy",
|
||||
gepaeck="1koffer+handgepaeck", airline_filter="",
|
||||
layover_min=120, layover_max=300,
|
||||
max_flugzeit_h=22, max_stops=2):
|
||||
|
|
|
|||
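The `_filter_sidebar_preise` helper that appears in this diff keeps only prices at or above 80% of an anchor price and falls back to the unfiltered list when nothing survives. The following is a minimal standalone sketch of that rule with a small worked example. The function name `filter_sidebar_prices`, the `ratio` parameter, and the sample prices are illustrative assumptions and not part of the repository.

```python
def filter_sidebar_prices(results, anchor, ratio=0.80):
    """Keep only results priced at or above ratio * anchor.

    Hypothetical re-statement of the rule in the diff; `results` is a list
    of dicts with a numeric "preis" key, `anchor` is the header price.
    """
    # Without an anchor (or without results) there is nothing to compare against.
    if not anchor or not results:
        return results
    min_valid = anchor * ratio
    kept = [r for r in results if r["preis"] >= min_valid]
    # Fall back to the unfiltered list if every entry was removed.
    return kept if kept else results


# Made-up sample data: one suspiciously low "sidebar" price and two plausible fares.
results = [{"preis": 450.0}, {"preis": 980.0}, {"preis": 1010.0}]
filtered = filter_sidebar_prices(results, anchor=1000.0)
print([r["preis"] for r in filtered])  # [980.0, 1010.0]
```

With an anchor of 1000 EUR the threshold is 800 EUR, so the 450 EUR entry is dropped. The fallback means a badly scraped anchor cannot empty the result list, at the cost of letting sidebar prices through in that case.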