Slow server response times & timeouts

Incident Report for Uptick

Postmortem

On Monday 21 October 2019, some of our enterprise customers experienced heavily degraded server performance.

The outage lasted approximately two hours, during which servers responded erratically and returned intermittent 502 errors.

The issue was caused by one of our cloud database servers, relied on by our high-volume customers, becoming overloaded and unable to respond to higher-than-usual traffic. Our monitoring systems failed to report any anomalies, which delayed our diagnosis of the cause of the degraded performance.

We’ve substantially upgraded the resources available to this database server, which has restored operations.

We will immediately investigate our monitoring system and put measures in place to prevent this type of failure from recurring. We will also implement a higher level of isolation, so that issues of this nature cannot spread as broadly.

To those affected, we apologise for this significant window of downtime and degraded performance. We’ll be contacting you directly with follow-up measures and apologies.

Thank you for your patience.

Posted Oct 21, 2019 - 13:00 AEDT

Resolved

All systems operational. Stay tuned for a postmortem running through the specifics of the recent outage.

Posted Oct 21, 2019 - 12:44 AEDT

Monitoring

Servers appear to be running smoothly again. We'll continue monitoring closely for the next hour.

Posted Oct 21, 2019 - 11:38 AEDT

Identified

We're performing an emergency database upgrade now. There will be complete server downtime for affected customers for several minutes. We'll be back soon.

Posted Oct 21, 2019 - 11:32 AEDT

Update

We are continuing to investigate this issue.

Posted Oct 21, 2019 - 11:30 AEDT

Update

We believe the cause of this morning's outages is related to our database infrastructure, possibly linked to a routine maintenance upgrade that occurred over the weekend. We've applied some fixes, and are seeing some improved performance, but we're continuing to investigate.

Posted Oct 21, 2019 - 10:49 AEDT

Update

We are continuing to investigate this issue.

Posted Oct 21, 2019 - 09:52 AEDT

Investigating

We're experiencing some shaky server stability this morning. Looking into it as a matter of urgency. Will keep you posted!

Posted Oct 21, 2019 - 09:52 AEDT

This incident affected: Uptick Web, Uptick Mobile App, and Uptick Customer Portal.