Downtime

Some services are down

Jan 04 at 07:01pm CET
Affected services
Website
API

Resolved
Jan 04 at 07:40pm CET

Incident Postmortem: Domain Accessibility Outage

Date: January 4, 2025

Duration: 29 minutes (07:01pm - 07:30pm CET)

Severity: High - Complete service unavailability

Affected Services: Website, API


Summary

On January 4th at 07:01pm CET, all services became inaccessible through our primary domain due to a configuration error introduced during routine maintenance. While the application backend remained operational throughout the incident, users were unable to access any services via the domain. Full service was restored at 07:30pm CET.


Timeline (CET)

  • 07:01pm - Incident detected: Website and API reported as inaccessible
  • 07:01pm - Incident created, investigation started
  • 07:20pm - Root cause identified: Configuration changes affecting domain routing
  • 07:20pm - Status updated: Confirmed application running, issue isolated to domain access layer
  • 07:30pm - Configuration corrected, services restored
  • 07:30pm - Incident resolved, accessibility confirmed

Root Cause

During maintenance work to fix access problems with the Grafana monitoring dashboard, configuration changes were applied to the routing layer. These changes unintentionally altered the routing rules for the primary application domain, blocking all external access to services while the application itself continued running normally.


Impact

  • User Impact: All users were unable to access the website and API for 29 minutes
  • Data Impact: None - no data loss or corruption occurred
  • System Impact: Application backend remained operational throughout

Resolution

The offending configuration changes were identified and either reverted or corrected, restoring proper domain routing to all services. Access was verified and confirmed operational at 07:30pm CET.


Action Items

Immediate

  • Services restored to full operation

Short-term

  • Review and document proper configuration change procedures for routing rules
  • Implement configuration validation checks before applying changes
  • Establish rollback procedures for routing configuration changes
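
As a rough illustration of the validation and rollback items above, the sketch below rejects a proposed routing configuration when a required hostname has no backend, and restores the previous configuration if a post-change check fails. It is not our actual tooling: the `Routes` mapping, the function names, and the `example.com` hostnames are all hypothetical, and a real routing layer would supply its own `apply` and `is_healthy` hooks.

```python
from typing import Callable, Dict, List

# Hypothetical representation of routing rules: hostname -> backend address.
Routes = Dict[str, str]


def validate_routes(proposed: Routes, required_hosts: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the config passed."""
    problems = []
    for host in required_hosts:
        if not proposed.get(host):
            problems.append(f"no backend configured for required host {host!r}")
    return problems


def apply_with_rollback(
    current: Routes,
    proposed: Routes,
    required_hosts: List[str],
    apply: Callable[[Routes], None],
    is_healthy: Callable[[], bool],
) -> bool:
    """Apply `proposed`; restore `current` if the post-change check fails."""
    if validate_routes(proposed, required_hosts):
        return False  # rejected before touching production
    apply(proposed)
    if is_healthy():
        return True
    apply(current)  # roll back to the last known-good configuration
    return False


# Example: a change that adds a Grafana route but drops the primary domain.
live = {"app.example.com": "10.0.0.5:8080"}
broken = {"grafana.example.com": "10.0.0.9:3000"}
print(validate_routes(broken, ["app.example.com"]))
```

Keeping validation separate from the apply step means the same check could run in a pre-merge pipeline as well as at deploy time.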

Long-term

  • Consider implementing a staging environment for testing configuration changes
  • Add automated monitoring alerts for domain accessibility issues
  • Create a runbook for domain routing troubleshooting
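
One possible shape for the domain accessibility alert is an external probe that requests each public URL and reports anything that does not answer with a 2xx or 3xx status. The sketch below uses only the Python standard library. The function names and URLs are placeholders, and treating every other status as a failure is an assumption that suits dedicated health endpoints rather than arbitrary pages.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status for `url`, or 0 if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # DNS failure, refused connection, timeout, and similar


def find_unreachable(
    urls: Iterable[str],
    fetch: Callable[[str], int] = http_status,
) -> Dict[str, int]:
    """Map each failing URL to the status observed (0 means no response)."""
    failures = {}
    for url in urls:
        status = fetch(url)
        if not 200 <= status < 400:
            failures[url] = status
    return failures


if __name__ == "__main__":
    # Placeholder URLs; a scheduler would run this every minute or so
    # and page the on-call engineer when the result is non-empty.
    print(find_unreachable(["https://example.com/"]))
```

Because the probe goes through the public domain rather than calling the backend directly, it would catch an outage like this one, where the application was healthy but unreachable.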

Lessons Learned

What Went Well

  • Issue was detected immediately
  • Root cause was identified within 20 minutes
  • Clear communication maintained via status page
  • Application stability was maintained throughout

What Could Be Improved

  • Configuration changes should be tested in isolation before production deployment
  • Better safeguards are needed to prevent routing changes from affecting multiple services simultaneously
  • Pre-deployment validation could have caught the misconfiguration
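
A safeguard against changes spilling over into other services could be as simple as diffing the routing rules before and after a change and flagging any hostname outside the intended scope. The sketch below assumes routing rules can be flattened into a hostname-to-backend mapping. The names and hostnames are illustrative only.

```python
from typing import Dict, Set

# Hypothetical representation of routing rules: hostname -> backend address.
Routes = Dict[str, str]


def changed_hosts(before: Routes, after: Routes) -> Set[str]:
    """Hosts that were added, removed, or pointed at a different backend."""
    all_hosts = before.keys() | after.keys()
    return {h for h in all_hosts if before.get(h) != after.get(h)}


def out_of_scope_changes(before: Routes, after: Routes, intended: Set[str]) -> Set[str]:
    """Hosts touched by the change that the operator did not mean to touch."""
    return changed_hosts(before, after) - intended


# Example: a change meant only for Grafana that also drops the primary domain.
before = {"app.example.com": "app:8080", "grafana.example.com": "grafana:3000"}
after = {"grafana.example.com": "grafana:3001"}
print(out_of_scope_changes(before, after, intended={"grafana.example.com"}))
```

A non-empty result would block the deployment until the operator either narrows the change or explicitly widens the intended scope.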

Updated
Jan 04 at 07:30pm CET

All services have been restored and are now fully accessible through our domain. The DNS/routing issue has been resolved and connectivity has been confirmed.

Updated
Jan 04 at 07:20pm CET

We are currently experiencing connectivity issues with our domain. The application itself is running normally, but users are unable to access services through our primary domain name.

Created
Jan 04 at 07:01pm CET

We are currently experiencing an outage affecting both our website and API services. Our team has been notified and is actively investigating the root cause.