Slow server response times & timeouts
Incident Report for Uptick
Postmortem

On Monday 21 October 2019, some of our enterprise customers experienced heavily degraded server performance.

The outage lasted approximately 2 hours, during which servers responded erratically and returned intermittent 502 errors.

The issue was caused by one of our cloud database servers, relied on by our high-volume customers, becoming overloaded and unable to keep up with higher-than-usual traffic. Our monitoring systems failed to report any anomalies, which meant it took us longer to identify the cause of the degraded performance.

We’ve substantially upgraded the resources available to this database server, which has restored operations.

We will immediately investigate our monitoring system and put measures in place to prevent this type of failure from happening again. We will also implement a higher level of isolation, so that issues of this nature cannot manifest as broadly.

To those affected, we apologise for this significant window of downtime and degraded performance. We’ll be contacting you directly with follow-up measures and apologies.

Thank you for your patience.

Posted Oct 21, 2019 - 13:00 AEDT

Resolved
All systems operational. Stay tuned for a postmortem running through the specifics of the recent outage.
Posted Oct 21, 2019 - 12:44 AEDT
Monitoring
Servers appear to be running smoothly again. We'll continue monitoring more closely for the next hour.
Posted Oct 21, 2019 - 11:38 AEDT
Identified
We're performing an emergency database upgrade now. There will be complete server downtime for affected customers for several minutes. We'll be back soon.
Posted Oct 21, 2019 - 11:32 AEDT
Update
We are continuing to investigate this issue.
Posted Oct 21, 2019 - 11:30 AEDT
Update
We believe the cause of this morning's outages is related to our database infrastructure, possibly linked to a routine maintenance upgrade that occurred over the weekend. We've applied some fixes and are seeing some improvement in performance, but we're continuing to investigate.
Posted Oct 21, 2019 - 10:49 AEDT
Update
We are continuing to investigate this issue.
Posted Oct 21, 2019 - 09:52 AEDT
Investigating
We're experiencing some shaky server stability this morning. Looking into it as a matter of urgency. Will keep you posted!
Posted Oct 21, 2019 - 09:52 AEDT
This incident affected: Uptick Web, Uptick Mobile App, and Uptick Customer Portal.