How to Optimize Plesk Billing and Cost Attribution for AI Workloads in 2026: Dynamic Resource Limits, Multi-Tenant Chargeback, and FinOps (Shared VPS Case Study)


In my experience over the past few months managing shared VPS infrastructure with Plesk, I have seen demand for AI workloads grow sharply: customers running LLM inference, fine-tuning models, and generating content through APIs. The problem? Costs climb steeply, and without granular cost attribution it becomes impossible to tell who is consuming what and, above all, to bill correctly.

Plesk pricing rose by 26% on average in January 2026, with no structural change to the multi-tenant billing model. Providers and hosting managers end up paying more, without native tools to track CPU, GPU, memory, and API-call consumption per tenant. This is where FinOps applied to Plesk comes in: an approach that combines dynamic Kubernetes resource limits, granular tagging, and chargeback to recover real costs from customers without eroding margins.

In this article I show how I implemented multi-tenant cost attribution in Plesk on a shared VPS, with dynamic container resource limits, token-based monitoring for LLMs, and a showback/chargeback system that allocates shared costs fairly. From my personal lab to your servers.

The Problem: Plesk 2026 and the Invisibility of AI Costs on a Shared VPS

In January 2026, Plesk rolled out an average price increase of 26% and moved to a monthly rather than annual billing structure. For anyone managing 50-100 sites on a shared VPS with Plesk, that means tighter budgets.

But the real pain is not the cost of the Plesk license: it is that I have no visibility into who consumes the resources. Two real scenarios from my lab:

Scenario 1: Customer A runs batch inference against the Claude API (10,000 tokens/hour). The API bill is mine, but I do not know how much to allocate to Customer A because Plesk does not track API calls per tenant.
Scenario 2: Customer B fine-tunes Llama 3.5 locally in their container, using 4 vCores and 8 GB of RAM for 2 hours. How do I split the VM cost between Customer B and the other 80 sites?
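
Scenario 1 comes down to recording which tenant each API call belongs to at the moment the call is made. Here is a minimal sketch, assuming a hypothetical gateway that logs one record per call with a tenant ID and token counts. The record fields and the per-token prices are placeholders I made up for illustration, not real provider pricing:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ApiCall:
    tenant: str          # hypothetical tenant ID attached by the gateway
    input_tokens: int
    output_tokens: int


# Placeholder prices in USD per 1,000 tokens (assumptions, not real pricing).
PRICE_PER_1K_INPUT = 0.003
PRICE_PER_1K_OUTPUT = 0.015


def cost_per_tenant(calls):
    """Sum the API cost of every logged call, grouped by tenant."""
    totals = defaultdict(float)
    for call in calls:
        totals[call.tenant] += (
            call.input_tokens / 1000 * PRICE_PER_1K_INPUT
            + call.output_tokens / 1000 * PRICE_PER_1K_OUTPUT
        )
    return dict(totals)


calls = [
    ApiCall("cliente-a", input_tokens=8000, output_tokens=2000),
    ApiCall("cliente-b", input_tokens=1000, output_tokens=500),
    ApiCall("cliente-a", input_tokens=2000, output_tokens=0),
]
for tenant, cost in sorted(cost_per_tenant(calls).items()):
    print(f"{tenant}: ${cost:.4f}")
```

The point of the sketch is the grouping key: once every call carries a tenant ID, the API bill can be split the same way as CPU and memory.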

Without granular tagging, Kubernetes quotas, and proper chargeback, I end up subsidizing heavy AI workloads with revenue from lightweight WordPress sites. That is unacceptable.

The Solution: A Multi-Tenant FinOps Architecture in Plesk

I built a four-layer system:

Visibility Layer: Consistent tagging of every workload, sending container metadata to a Kubernetes-native cost-tracking system.
Allocation Layer: Namespace-based cost attribution, paired with resource quotas and limit ranges to prevent the “noisy neighbor” effect.
Chargeback Layer: Monthly calculation of actual usage (CPU, memory, API tokens), with proportional allocation of shared costs.
Enforcement Layer: Dynamic resource limits that scale based on the previous month's accumulated usage.
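
The enforcement layer can be pictured as a small policy function. The sketch below is an illustration under assumed rules, not the exact policy running on my cluster: the 720-hour month, the 2x headroom, and the plan floor and ceiling are all hypothetical parameters.

```python
def next_month_cpu_quota(prev_core_hours, hours_in_month=720,
                         headroom=2.0, plan_floor=1.0, plan_ceiling=8.0):
    """Derive next month's CPU quota (in cores) from last month's usage.

    The accumulated core-hours are turned into an average core count,
    padded by `headroom`, then clamped between the plan's floor and ceiling.
    All default values are illustrative assumptions.
    """
    average_cores = prev_core_hours / hours_in_month
    target = average_cores * headroom
    return round(min(max(target, plan_floor), plan_ceiling), 2)


print(next_month_cpu_quota(2160))  # tenant that averaged 3 cores last month
```

The returned value would then be written into the tenant's ResourceQuota for the new month.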

1. Visibility: Granular Tagging in Plesk + Kubernetes

First of all, Plesk's native tagging is not enough on its own. Plesk knows about domains, not containers. So I added custom Kubernetes labels to every pod that hosts a customer site or workload.

Alongside Plesk I run Kubernetes on-premises (via kubeadm on Ubuntu 24.04). Each Plesk subscription maps to a dedicated namespace:

apiVersion: v1
kind: Namespace
metadata:
  name: cliente-a
  labels:
    billing.darioiannascoli.it/tenant-id: "cliente-a"
    billing.darioiannascoli.it/account-type: "premium"
    billing.darioiannascoli.it/workspace: "production"

Pods do not inherit namespace labels automatically, so every pod in that namespace carries its own billing labels:

apiVersion: v1
kind: Pod
metadata:
  namespace: cliente-a
  name: wordpress-frontend-xyz
  labels:
    app: wordpress
    tenant: cliente-a
    workload-type: web
    billing.cost-center: "CC-CLIENTE-A"
    billing.service: "wordpress-hosting"
spec:
  containers:
  - name: wordpress
    image: wordpress:latest
    resources:
      requests:
        cpu: 200m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi

The crucial part is requests and limits. As the official Kubernetes documentation explains, limits are the upper bound on the resources a container can consume: a container that exceeds its CPU limit is throttled, and one that exceeds its memory limit can be terminated (OOM-killed).

2. Allocation: Resource Quotas and Dynamic Scaling

The second step is to apply a ResourceQuota at the namespace level. Kubernetes provides ResourceQuotas and LimitRanges to define and enforce resource constraints per namespace, so you can cap the resources each tenant may consume and prevent resource hogging.

In my setup:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: cliente-a-quota
  namespace: cliente-a
spec:
  hard:
    requests.cpu: "4"
    requests.memory: "8Gi"
    limits.cpu: "8"
    limits.memory: "16Gi"
    pods: "50"
    persistentvolumeclaims: "10"
  # No scopeSelector here: the quota must count every pod in the namespace,
  # and a PriorityClass scope does not allow the persistentvolumeclaims count.

Customer A cannot exceed 8 CPUs or 16 GB of RAM in total limits. Period. If they try to launch a workload that would break the quota, the API server rejects it.
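
To make the admission rule concrete, here is a simplified sketch of the check in Python. It is not the Kubernetes implementation: it only handles the CPU and memory keys of the quota above, and the helper names are mine.

```python
MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_cpu(value):
    """Convert a Kubernetes CPU quantity ('200m' or '4') to millicores."""
    if value.endswith("m"):
        return int(value[:-1])
    return int(float(value) * 1000)


def parse_memory(value):
    """Convert a binary-suffixed memory quantity ('256Mi', '8Gi') to bytes."""
    for suffix, factor in MEMORY_UNITS.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    return int(value)  # plain bytes


def fits_quota(used, new_pod, hard):
    """Return True if adding new_pod keeps every CPU/memory key within hard."""
    parsers = {"cpu": parse_cpu, "memory": parse_memory}
    for key, limit in hard.items():
        parse = parsers[key.split(".")[-1]]
        total = parse(used.get(key, "0")) + parse(new_pod.get(key, "0"))
        if total > parse(limit):
            return False
    return True


hard = {"requests.cpu": "4", "requests.memory": "8Gi",
        "limits.cpu": "8", "limits.memory": "16Gi"}
used = {"requests.cpu": "3800m", "requests.memory": "6Gi",
        "limits.cpu": "7", "limits.memory": "12Gi"}
small_pod = {"requests.cpu": "200m", "requests.memory": "256Mi",
             "limits.cpu": "500m", "limits.memory": "512Mi"}
big_pod = {"requests.cpu": "500m", "requests.memory": "256Mi",
           "limits.cpu": "500m", "limits.memory": "512Mi"}

print(fits_quota(used, small_pod, hard))
print(fits_quota(used, big_pod, hard))
```

The check works on declared requests and limits, not on live usage, which is why every pod in a quota-managed namespace needs both set.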

For AI-intensive workloads, I enable the HPA (Horizontal Pod Autoscaler) on CPU and memory utilization:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-workload-hpa
  namespace: cliente-a
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: inference-server
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 15
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15

When average CPU utilization rises above the 70% target, the HPA adds pods. When it stays well below the target for the 300-second stabilization window, it removes them. Either way, the cost of scaling is tracked and billed.
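
The scaling decision follows the formula in the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance of 1 (0.1 by default). A minimal sketch of that formula for a single metric, using the min/max values of the manifest above:

```python
import math


def desired_replicas(current, utilization, target=70,
                     min_replicas=1, max_replicas=5, tolerance=0.1):
    """Apply the HPA formula for one metric and clamp to the replica bounds."""
    ratio = utilization / target
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to the target: no scaling
    wanted = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, wanted))


for cpu in (30, 72, 90, 140):
    print(cpu, desired_replicas(2, cpu))
```

With several metrics, as in the manifest above, the HPA computes a value per metric and uses the largest one. The sketch leaves out the stabilization window and the scaling policies.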

3. Cost Attribution: Kubecost + Tagging Strategy

Now comes the hard part. Kubernetes cost allocation is one of the most complex problems in FinOps: unlike traditional cloud resources, Kubernetes clusters do not map cleanly to cost owners, because multiple workloads from different teams share the same nodes.

I installed Kubecost (open source) in my cluster:

helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost --create-namespace \
  --set kubecostModel.warmCache=true \
  --set kubecostModel.warmSavingsCache=true

Kubecost reads the namespace labels, the requests/limits, and the actual consumption, and computes a cost per tenant. The solution is namespace-level attribution combined with proportional allocation for shared cluster infrastructure. Namespaces are the most natural unit for Kubernetes cost attribution: when teams own their namespaces, allocating compute and memory costs at the namespace level provides the basis for accurate showback and chargeback.

I reach the Kubecost API locally through a port-forward:

kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090 &
# Then query http://localhost:9090/model/allocation?window=7d

The JSON output gives the cost breakdown per namespace, pod, and workload.

4. Chargeback: Monthly Allocation and Dynamic Rates

The raw Kubecost data is not enough. I need to implement true FinOps chargeback. Showback and chargeback are different concepts: showback displays costs per product or department but keeps the spending in a centralized budget, while chargeback actually transfers the costs to the P&L budgets of a team or product.

In my case I use a hybrid model: soft chargeback. Customers see a dashboard (showback) with their own costs, but face no immediate financial consequences. At the end of the month, however, I check the overage and apply a true-up chargeback to anyone who exceeded their plan.
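
The true-up itself is simple arithmetic: bill only the part of the actual cost that exceeds what the plan already includes. A minimal sketch, with hypothetical figures for the plan allowance and the measured cost:

```python
def true_up(actual_cost, plan_included):
    """Return the extra amount to invoice: only the cost above the plan."""
    return round(max(0.0, actual_cost - plan_included), 2)


# Hypothetical example: the plan includes $480 of usage per month.
print(true_up(840.0, 480.0))  # tenant went over the plan
print(true_up(100.0, 480.0))  # tenant stayed within the plan
```

Customers who stay within their plan get a true-up of zero, so the dashboard remains pure showback for them.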

I wrote a Python script that:

Reads the allocation data from the Kubecost API.
Classifies costs as direct (namespace CPU/memory) or shared (cluster control plane, networking).
Applies proportional allocation rules to the shared costs.
Generates a monthly chargeback report for each customer.

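#!/usr/bin/env python3
# chargeback-calculator.py

import json
import os
import sys
from datetime import datetime

import requests

# Kubecost endpoint and report directory (override via environment variables)
KUBECOST_URL = os.environ.get("KUBECOST_URL", "http://localhost:9090")
REPORT_DIR = os.environ.get("REPORT_DIR", ".")

# Namespaces billed as shared infrastructure rather than as a tenant
SHARED_NAMESPACES = {"kubecost"}

# Fetch allocation data for the previous calendar month: the job runs on
# the 1st, when "month" (month-to-date) would be almost empty
response = requests.get(
    f"{KUBECOST_URL}/model/allocation",
    params={
        "window": "lastmonth",
        "aggregate": "namespace",
        "accumulate": "true",  # one summed set for the whole window
        "includeIdle": "false"
    },
    timeout=60,
)
response.raise_for_status()

allocation_data = response.json()

# "data" is a list of allocation sets; guard against an empty response
allocation_sets = allocation_data.get("data") or [{}]
namespaces = allocation_sets[0] or {}
if not namespaces:
    sys.exit("No allocation data returned by Kubecost")

chargeback = {}
total_direct = 0.0
total_shared = 0.0

for namespace, costs in namespaces.items():
    if namespace.startswith("kube-") or namespace in SHARED_NAMESPACES:
        # Shared costs (cluster management)
        total_shared += float(costs["totalCost"])
    else:
        # Direct costs (tenant)
        tenant_cost = float(costs["totalCost"])
        chargeback[namespace] = {
            "direct_cost": tenant_cost,
            "cpu_cost": float(costs.get("cpuCost", 0)),
            # Kubecost reports memory cost under the "ramCost" field
            "memory_cost": float(costs.get("ramCost", 0)),
            "shared_allocation": 0  # Computed below
        }
        total_direct += tenant_cost

# Allocate shared costs in proportion to each tenant's direct cost
for namespace in chargeback:
    direct = chargeback[namespace]["direct_cost"]
    proportional = direct / total_direct if total_direct else 0.0
    chargeback[namespace]["shared_allocation"] = total_shared * proportional
    chargeback[namespace]["total_cost"] = (
        chargeback[namespace]["direct_cost"] +
        chargeback[namespace]["shared_allocation"]
    )

# Export report, named after the month in which it is generated
report_path = os.path.join(
    REPORT_DIR, f"chargeback-{datetime.now().strftime('%Y-%m')}.json"
)
with open(report_path, "w") as f:
    json.dump(chargeback, f, indent=2)

print(f"Chargeback report written to {report_path}")
for ns, costs in chargeback.items():
    print(f"{ns}: ${costs['total_cost']:.2f}")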

I run it on the 1st of every month:

0 0 1 * * /usr/local/bin/chargeback-calculator.py >> /var/log/chargeback.log 2>&1

Customers can read the report through a custom Grafana dashboard in Plesk.

Case Study: An AI-Intensive Customer with Dynamic Containers

The customer “DataMill Labs” fine-tunes LLMs 2-3 times a month. Over the last 3 months, their untracked costs were:

January: 4 GPU-hours (an estimated $120, unallocated).
February: 12 GPU-hours ($360).
March: 28 GPU-hours ($840).

Without FinOps, I would have absorbed $1,320 in costs. With my system:

Tagging: Every fine-tuning job is labeled with tenant=datamill-labs, workload-type=training.
Quota enforcement: I assigned DataMill Labs a budget of 16 GPU-hours/month. Beyond that, jobs are queued at low priority.
Chargeback: In March, Kubecost shows 28 GPU-hours @ $30/hour = $840. The customer receives a detailed report and an additional invoice.
Feedback loop: DataMill Labs sees the rising costs and decides to optimize. They use model tiering: routing simple requests to lighter, cheaper models and reserving frontier models for the tasks that really need them. This is the single most effective lever for LLM cost optimization. April costs: 12 GPU-hours (-57%).
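
The case-study figures can be checked with a few lines of Python. The rate, the budget, and the hours are the ones quoted above; only the variable names are mine:

```python
GPU_RATE = 30.0      # USD per GPU-hour, the rate used in the case study
BUDGET_HOURS = 16    # monthly GPU-hour budget assigned to DataMill Labs

usage = {"January": 4, "February": 12, "March": 28}

# Cost of each month at the flat GPU rate
costs = {month: hours * GPU_RATE for month, hours in usage.items()}
total_unallocated = sum(costs.values())

# Hours above the budget in March, and the April change relative to March
march_over_budget_hours = max(0, usage["March"] - BUDGET_HOURS)
april_hours = 12
april_change_pct = (april_hours - usage["March"]) / usage["March"] * 100

print(costs)
print(f"Total: ${total_unallocated:,.0f}")
print(f"March hours over budget: {march_over_budget_hours}")
print(f"April vs March: {april_change_pct:.0f}%")
```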

Technical Implementation: Step by Step

Step 1: Prepare the Kubernetes Cluster in Plesk

I assume you have Plesk Obsidian on Ubuntu 24.04 with Kubernetes already running. If not, see my article on Plesk Container AI-Native Workloads.

Verify the cluster:

kubectl cluster-info
kubectl get nodes
kubectl get namespaces

Step 2: Install Kubecost

helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update

helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set kubecostModel.warmCache=true \
  --set prometheus.server.global.external_labels.cluster_id="plesk-shared-vps" \
  --values - <<EOF
persistence:
  enabled: true
  size: 10Gi
prometheus:
  server:
    retention: 30d
EOF

Verify the installation:

kubectl get pods -n kubecost
kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090
# Open http://localhost:9090

Step 3: Configure ResourceQuotas and LimitRanges

For each customer, create a namespace with a quota:

#!/bin/bash
# create-tenant-namespace.sh
set -euo pipefail

TENANT=${1:?tenant name required}
CPU_LIMIT=${2:-"4"}      # whole cores
MEM_LIMIT=${3:-"8Gi"}    # whole Gi, e.g. 8Gi

# Limits are set to twice the requests
CPU_MAX=$((CPU_LIMIT * 2))
MEM_GI=${MEM_LIMIT%Gi}
MEM_MAX="$((MEM_GI * 2))Gi"

kubectl create namespace "$TENANT"

# Label values cannot contain ':', so the timestamp uses a compact format
kubectl label namespace "$TENANT" \
  billing.tenant="$TENANT" \
  billing.created="$(date -u +'%Y%m%dT%H%M%SZ')"

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ${TENANT}-quota
  namespace: $TENANT
spec:
  hard:
    requests.cpu: "${CPU_LIMIT}"
    limits.cpu: "${CPU_MAX}"
    requests.memory: "${MEM_LIMIT}"
    limits.memory: "${MEM_MAX}"
    pods: "50"
EOF

echo "Namespace $TENANT created with quota CPU=${CPU_LIMIT}, MEM=${MEM_LIMIT}"

Usage:

./create-tenant-namespace.sh datamill-labs 8 16Gi
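
The script above creates the quota but no LimitRange. Once a quota on requests and limits is active, pods that omit them are rejected, so a LimitRange with container defaults is a useful companion. Here is a sketch that generates such a manifest as JSON. The function name is mine, and the default values simply reuse the requests and limits of the WordPress pod shown earlier:

```python
import json


def limit_range_manifest(tenant,
                         default_request=("200m", "256Mi"),
                         default_limit=("500m", "512Mi")):
    """Build a LimitRange that gives containers default requests and limits."""
    return {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": f"{tenant}-limits", "namespace": tenant},
        "spec": {
            "limits": [
                {
                    "type": "Container",
                    # Applied when a container declares no requests
                    "defaultRequest": {
                        "cpu": default_request[0],
                        "memory": default_request[1],
                    },
                    # Applied when a container declares no limits
                    "default": {
                        "cpu": default_limit[0],
                        "memory": default_limit[1],
                    },
                }
            ]
        },
    }


print(json.dumps(limit_range_manifest("datamill-labs"), indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed output can be piped straight into kubectl apply -f -.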

Step 4: Call the Kubecost API for Chargeback Automation

Kubecost exposes an allocation API. I use a Kubernetes CronJob to call it monthly:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: monthly-chargeback-job
  namespace: kubecost
spec:
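  schedule: "0 0 1 * *"  # 1st of every month, 00:00
  timeZone: "Etc/UTC"    # Without this, the controller manager's time zone applies
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: chargeback-sa
          containers:
          - name: chargeback-calculator
            image: python:3.11-slim
            env:
            # Inside the cluster, Kubecost is reached through its Service, not localhost
            - name: KUBECOST_URL
              value: "http://kubecost-cost-analyzer.kubecost.svc:9090"
            - name: REPORT_DIR
              value: "/reports"
            command:
            - /bin/bash
            - -c
            - |
              pip install requests
              python /scripts/chargeback-calculator.py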
            volumeMounts:
            - name: scripts
              mountPath: /scripts
            - name: reports
              mountPath: /reports
          volumes:
          - name: scripts
            configMap:
              name: chargeback-scripts
          - name: reports
            persistentVolumeClaim:
              claimName: chargeback-reports-pvc
          restartPolicy: OnFailure

To recap: the job calls the Kubecost API every month, processes the data, and writes the reports to a PVC. The chargeback-sa ServiceAccount, the chargeback-scripts ConfigMap, and the chargeback-reports-pvc claim must exist before the job runs. Customers can download their report from a personalized web link.

Step 5: Plesk Integration + Billing Dashboard

Finally, show the costs to customers through a Plesk extension or a webhook API.

Create a simple API that Plesk calls to update the balances:

#!/usr/bin/env python3
# billing-api.py (Flask)

from flask import Flask, jsonify
import json
from datetime import datetime

app = Flask(__name__)

@app.route('/api/tenant/<tenant>/costs')
def get_tenant_costs(tenant):
    """Return the tenant's costs from the chargeback report."""
    try:
        report_file = f"/reports/chargeback-{datetime.now().strftime('%Y-%m')}.json"
        with open(report_file) as f:
            data = json.load(f)
    except FileNotFoundError:
        return jsonify({"error": "Report not available yet"}), 404
    if tenant not in data:
        # Unknown tenant: answer 404 instead of 200 with an error body
        return jsonify({"error": "Tenant not found"}), 404
    return jsonify(data[tenant])

@app.route('/api/status')
def status():
    return jsonify({"status": "OK", "version": "1.0"})

if __name__ == '__main__':
    app.run(host='127.0.0.1', port=5000, debug=False)

Plesk periodically calls GET /api/tenant/datamill-labs/costs and shows the data in the client area.

FAQ

1. How do I handle the noisy neighbor effect if a customer launches a heavy workload?

Quotas prevent a single tenant from consuming more than its resource allocation, which minimizes the “noisy neighbor” issue, where one tenant degrades the performance of other tenants' workloads. In my setup, if Customer A tries to schedule pods requesting 10 CPUs and their quota is 4, the pod is rejected at admission. Period. No impact on Customer B.

2. Can I use this with Plesk without Kubernetes?

Partially. If you do not have Kubernetes, you can use Docker Compose + cgroups for resource limits, and a cgroup monitoring script for cost tracking. It is not as elegant as Kubecost, but it works. I still recommend migrating to Kubernetes to manage AI workloads properly.

3. How do I split costs if two customers share a GPU?

AI costs (GPU compute, inference API calls, model usage) introduce billing dimensions that do not exist in traditional infrastructure: tokens, context length, model version, fine-tuning runs. Allocating these costs requires tagging at the model and workload level. The most effective approach combines resource-level tags for dedicated GPU instances, workload labels for shared inference infrastructure, and external tracking of API usage per team or product. Framing it as unit economics (cost per inference, cost per query) is essential.

Concretely: if Customer A runs 5,000 inferences on a shared GPU in 1 hour and Customer B runs 3,000, then A pays 5000/(5000+3000) = 62.5% of the GPU cost for that hour. Kubecost can aggregate costs by workload-level labels; the inference counts themselves have to come from the inference service's own metrics.
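
The proportional split can be sketched in a few lines. The function name is mine, and the $30 hourly GPU cost is the rate from the case study, used here only as an example:

```python
def split_shared_cost(total_cost, usage_by_tenant):
    """Split a shared cost in proportion to each tenant's usage count."""
    total_usage = sum(usage_by_tenant.values())
    if total_usage == 0:
        return {tenant: 0.0 for tenant in usage_by_tenant}
    return {
        tenant: total_cost * usage / total_usage
        for tenant, usage in usage_by_tenant.items()
    }


# One GPU-hour at $30, shared by two tenants
shares = split_shared_cost(30.0, {"cliente-a": 5000, "cliente-b": 3000})
for tenant, cost in shares.items():
    print(f"{tenant}: ${cost:.2f}")
```

The same function works with tokens or GPU-seconds as the usage unit, as long as every tenant is measured in the same one.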

4. Which FinOps metrics should I track monthly?

I recommend monitoring these KPIs:

Cost per Pod-Hour: Total cost / total pod-hours. Tracks cluster efficiency.
Tenant Cost Distribution: % of spend per tenant. Identifies the “whale customers” that justify investment in optimization.
Resource Utilization Rate: (CPU/memory actually used) / (requested). Below 50%, there is waste.
Chargeback Accuracy Ratio: (Kubecost allocation) / (actual invoiced amount). It should be ≥ 95%.
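
These KPIs are plain ratios, so they are easy to compute from the monthly report. A minimal sketch with made-up input figures; the function names are mine:

```python
def cost_per_pod_hour(total_cost, pod_hours):
    """Total cluster cost divided by total pod-hours."""
    return total_cost / pod_hours if pod_hours else 0.0


def utilization_rate(used, requested):
    """Share of the requested resources that was actually used."""
    return used / requested if requested else 0.0


def chargeback_accuracy(allocated_cost, invoiced):
    """Kubecost allocation divided by the amount actually invoiced."""
    return allocated_cost / invoiced if invoiced else 0.0


# Made-up monthly figures, for illustration only
print(f"Cost per pod-hour: ${cost_per_pod_hour(900.0, 36000):.3f}")
rate = utilization_rate(used=1.5, requested=4.0)
print(f"Utilization: {rate:.0%}", "(waste)" if rate < 0.5 else "")
print(f"Chargeback accuracy: {chargeback_accuracy(950.0, 1000.0):.0%}")
```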

5. How long before AI costs become profitable under the chargeback model?

It depends on your margins. If you sell AI hosting at $0.30 per GPU-hour and your cost is $0.20, you have a margin of $0.10/hour. With 100 hours/month of load, that is $10 of AI profit; reaching $1,000/month at that margin takes 10,000 GPU-hours. So you need volume and good chargeback.

Conclusion: FinOps in Plesk Is Now Mandatory

In 2026, FinOps for AI means new metrics (cost per inference, training run, token), new allocation models (request-level attribution instead of resource-level tagging), and new governance mechanisms (per-team AI spend budgets, model usage policies, inference cost thresholds).

I have shown how to implement an end-to-end system in Plesk on a shared VPS:

Visibility: Kubernetes labels + Kubecost.
Allocation: Namespace quotas + LimitRanges.
Chargeback: Python script + monthly API.
Enforcement: HPA + dynamic limits.

Today my AI hosting is profitable, traceable, and fair to all customers. You can do the same. Start with a pilot: 2-3 AI-intensive customers, one namespace each, and Kubecost. Measure for a month. The numbers will speak for themselves.

And if you want to dig into other cost attribution strategies, read my article on AI Cost Management and Anomaly Detection in FinOps 2026.