Zaptec Portal delay

Incident Report for Zaptec

Postmortem

On October 8, 2025, our customer portal began showing stale data, followed by disruptions to charging operations. This postmortem provides a transparent overview of what happened, how we responded, and the steps we are taking to prevent a recurrence.

## Timeline of Events

13:45: Replication issues detected between database instances. The replication lag grew slowly but steadily and was initially treated as a transient issue, so it did not draw attention.

14:27: Our engineering team started recycling the most heavily loaded backend systems to clear potentially hanging database connections. No improvement observed.

14:40: High-traffic API endpoints temporarily disabled to reduce load. No improvement observed.

15:00: Web services temporarily shut down. The underlying problem persisted despite reduced load.

15:00: Charging control plane restarted. Load decreased, but the event backlog continued to grow.

15:10: Web services restored. Event backlog remained unchanged.

15:37: Portal services shut down. Event backlog remained unchanged.

16:03: Incident escalated to emergency status.

16:15: Online detection services temporarily shut down. Event backlog cleared and returned to normal levels.

16:30: All systems stabilized and resumed normal operations.

## Root Cause Analysis

Our cloud database service stopped shipping transaction logs, causing the secondary database replica to fall behind. This occurred even though adequate resources were available to handle the incoming transaction volume; the stall is clearly visible in the corresponding server metrics.

We are still investigating the underlying cause of the log shipping failure. Current areas of investigation include:

- Cloud infrastructure networking issues
- Long-running database transactions caused by application behavior

We are actively working to reproduce the issue to better understand the failure conditions.

## Preventative Measures and Follow-up

To prevent similar incidents and improve our response capabilities, we are implementing the following measures:

Infrastructure optimization: Reducing workload on our database infrastructure through architectural improvements.

Enhanced monitoring: Deploying additional observability tools to detect replication issues earlier.

Automated alerting: New alerts configured for replication lag and transaction log queue sizes to enable faster detection and response.
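A minimal sketch of how a replication-lag alert of this kind could be evaluated, so that slow but steady growth is flagged rather than dismissed as transient. This is an illustration only, not the production configuration: the names `LagAlertConfig` and `evaluate_replication_lag` and all threshold values are hypothetical. The same check could in principle be applied to transaction log queue sizes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LagAlertConfig:
    # All values below are illustrative assumptions, not real thresholds.
    max_lag_seconds: float = 60.0        # absolute lag that always alerts
    min_growth_per_minute: float = 1.0   # sustained growth rate that warns
    window_samples: int = 5              # number of recent samples inspected


def evaluate_replication_lag(
    samples: List[Tuple[float, float]], config: LagAlertConfig
) -> str:
    """Classify replication lag as "ok", "warning", or "critical".

    samples holds (timestamp_seconds, lag_seconds) pairs, oldest first.
    """
    if not samples:
        return "ok"

    # Absolute threshold: large lag is critical regardless of trend.
    if samples[-1][1] >= config.max_lag_seconds:
        return "critical"

    window = samples[-config.window_samples:]
    if len(window) < config.window_samples:
        return "ok"  # not enough history to judge a trend

    # Trend check: lag rose on every consecutive sample in the window.
    rising = all(later[1] > earlier[1] for earlier, later in zip(window, window[1:]))

    elapsed_minutes = (window[-1][0] - window[0][0]) / 60.0
    if elapsed_minutes <= 0:
        return "ok"
    growth_per_minute = (window[-1][1] - window[0][1]) / elapsed_minutes

    if rising and growth_per_minute >= config.min_growth_per_minute:
        return "warning"
    return "ok"
```

Combining an absolute threshold with a trend check means a lag that is still small but climbing on every sample raises a warning early, while a lag that merely fluctuates does not.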

We are committed to continuous improvement and will provide updates as our investigation progresses.

Posted Oct 09, 2025 - 13:03 CEST

Resolved

This incident has been resolved. A postmortem will be posted later today.
Posted Oct 09, 2025 - 08:01 CEST

Update

We are continuing to monitor for any further issues.
Posted Oct 09, 2025 - 07:07 CEST

Update

We are continuing to monitor for any further issues.
Posted Oct 08, 2025 - 22:28 CEST

Update

We are continuing to monitor for any further issues.
Posted Oct 08, 2025 - 16:53 CEST

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Oct 08, 2025 - 16:45 CEST

Identified

The issue has been identified and a fix is being implemented.
Posted Oct 08, 2025 - 16:42 CEST

Update

We’re seeing some improvements, but the issues are still under investigation.
Posted Oct 08, 2025 - 16:33 CEST

Update

We are continuing to investigate this issue.
Posted Oct 08, 2025 - 16:18 CEST

Update

There are still delays in the Zaptec Portal. This affects adding new chargers to installations and also affects the Zaptec App.
We are continuing to investigate the issue.
Posted Oct 08, 2025 - 16:07 CEST

Update

Portal is up again, but we are still seeing delays. We are continuing to investigate the issue.
Posted Oct 08, 2025 - 15:56 CEST

Update

The portal will be unavailable for approximately 15 minutes.
Posted Oct 08, 2025 - 15:40 CEST

Update

There are still delays in the Zaptec Portal after the restart, and we are continuing to investigate the issue.
Posted Oct 08, 2025 - 15:16 CEST

Update

The portal will be temporarily unavailable due to a restart.
Posted Oct 08, 2025 - 15:03 CEST

Update

We are continuing to investigate this issue.
Posted Oct 08, 2025 - 14:53 CEST

Update

We’re currently experiencing issues with activating and deactivating chargers in the portal.
There’s also some delay in allocating current to the chargers.
We are investigating the issue!
Posted Oct 08, 2025 - 14:24 CEST

Investigating

We are currently investigating this issue.
Posted Oct 08, 2025 - 14:18 CEST
This incident affected: Zaptec Cloud Services (Portal, API, Charger backend, OCPP) and API, Portal.