Downtime

Some services are down

Jan 04 at 07:01pm CET

Affected services

Website

API

Resolved
Jan 04 at 07:40pm CET

Incident Postmortem: Domain Accessibility Outage

Date: January 4, 2025

Duration: 29 minutes (07:01pm - 07:30pm CET)

Severity: High - Complete service unavailability

Affected Services: Website, API

Summary

On January 4th at 07:01pm CET, all services became inaccessible through our primary domain due to a configuration error introduced during routine maintenance. While the application backend remained operational throughout the incident, users were unable to access any services via the domain. Full service was restored at 07:30pm CET.

Timeline (CET)

07:01pm - Incident detected: Website and API reported as inaccessible
07:01pm - Incident created, investigation started
07:20pm - Root cause identified: Configuration changes affecting domain routing
07:20pm - Status updated: Confirmed application running, issue isolated to domain access layer
07:30pm - Configuration corrected, services restored
07:30pm - Incident resolved, accessibility confirmed

Root Cause

During maintenance to fix access to the Grafana monitoring dashboard, configuration changes were applied to the routing layer. These changes unintentionally altered the routing rules for the primary application domain, blocking all external access to services while the application itself continued running normally.

Impact

User Impact: All users unable to access website and API services for 29 minutes
Data Impact: None - no data loss or corruption occurred
System Impact: Application backend remained operational throughout

Resolution

The faulty routing configuration changes were identified and corrected, restoring domain routing to all services. Access was verified and confirmed operational at 07:30pm CET.

Action Items

Immediate

Services restored to full operation

Short-term

Review and document proper configuration change procedures for routing rules
Implement configuration validation checks before applying changes
Establish rollback procedures for routing configuration changes
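
As an illustration of the validation check listed above, here is a minimal sketch in Python. It assumes the routing rules can be loaded as a mapping from hostname to backend address. The report does not name the routing layer or its config format, so that representation, the hostnames, and the function name validate_routes are all hypothetical.

```python
from typing import Dict, List

# Hypothetical hostnames: the real domain names are not given in this report.
REQUIRED_HOSTS = ["www.example.com", "api.example.com"]


def validate_routes(routes: Dict[str, str], required_hosts: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the config passes."""
    problems = []
    for host in required_hosts:
        backend = routes.get(host)
        if backend is None:
            # The host has no routing rule at all.
            problems.append(f"missing route for {host}")
        elif not backend.strip():
            # The rule exists but points nowhere.
            problems.append(f"empty backend for {host}")
    return problems
```

A deployment script could call validate_routes on the proposed configuration and refuse to apply it whenever the returned list is non-empty, so that a change intended for one service cannot silently drop the routes for the website or API.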

Long-term

Consider implementing a staging environment for testing configuration changes
Add automated monitoring alerts for domain accessibility issues
Create runbook for domain routing troubleshooting
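
The monitoring alert item above could be approached with an external probe that requests the public domain, since in this incident the backend stayed healthy while the domain was unreachable. The following is a rough sketch using only the Python standard library. The function names, the timeout, and the three-failure threshold are assumptions for illustration, not an existing setup.

```python
import urllib.error
import urllib.request
from typing import Callable, List


def is_reachable(url: str, timeout: float = 5.0,
                 opener: Callable = urllib.request.urlopen) -> bool:
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections, and timeouts.
        return False


def should_alert(history: List[bool], consecutive_failures: int = 3) -> bool:
    """Alert only when the most recent N probes have all failed."""
    recent = history[-consecutive_failures:]
    return len(recent) == consecutive_failures and not any(recent)
```

Requiring several consecutive failures before alerting is a design choice that trades a slightly slower alert for fewer false alarms from a single dropped request. The opener parameter exists so the probe can be tested without network access.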

Lessons Learned

What Went Well

Issue was detected immediately
Root cause was identified within 20 minutes
Clear communication maintained via status page
Application stability was maintained throughout

What Could Be Improved

Configuration changes should be tested in isolation before production deployment
Need better safeguards to prevent routing changes from affecting multiple services simultaneously
Pre-deployment validation could have caught the misconfiguration
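
One possible shape for the safeguard mentioned above is a scope check that compares the routing rules before and after a change and flags any host that changed but was not meant to. This sketch again assumes rules can be represented as a hostname-to-backend mapping. The function name unexpected_changes and the hostnames are hypothetical.

```python
from typing import Dict, Iterable, List


def unexpected_changes(old: Dict[str, str], new: Dict[str, str],
                       intended: Iterable[str]) -> List[str]:
    """Return hosts whose route changed but were not in the intended set."""
    all_hosts = old.keys() | new.keys()
    # A host counts as changed if it was added, removed, or re-pointed.
    changed = {host for host in all_hosts if old.get(host) != new.get(host)}
    return sorted(changed - set(intended))


# Example: a change meant only for the Grafana dashboard that also
# re-points the main site would be reported.
before = {"www.example.com": "app:8080", "grafana.example.com": "grafana:3000"}
after = {"www.example.com": "grafana:3000", "grafana.example.com": "grafana:3001"}
flagged = unexpected_changes(before, after, intended=["grafana.example.com"])
```

In this example flagged contains only www.example.com, which is the kind of side effect that a Grafana-only maintenance change should never produce.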

Updated
Jan 04 at 07:30pm CET

All services have been restored and are now fully accessible through our domain. The DNS/routing issue has been resolved and connectivity has been confirmed.

Updated
Jan 04 at 07:20pm CET

We are currently experiencing connectivity issues with our domain. The application itself is running normally, but users are unable to access services through our primary domain name.

Created
Jan 04 at 07:01pm CET

We are currently experiencing an outage affecting both our website and API services. Our team has been notified and is actively investigating the root cause.