Cloud services are experiencing performance issues
Incident Report for Think Cloud
Postmortem

Root Cause Analysis (RCA) Report
Date: October 22, 2023
To: Our Valued Clients
Subject: Service Outage Report and Root Cause Analysis

Dear Clients,

We would like to provide you with a detailed report on the recent service outage that occurred on October 17th, 2023, and the subsequent steps taken to resolve the issue. We understand the inconvenience this may have caused and are committed to transparency in addressing the situation.

Timeline of Events:

Tuesday, October 17th, 2023, 13:00: An issue was initially diagnosed as a core switch problem within our cloud network.

Tuesday, October 17th, 2023, 16:30: Our data centre engineers replaced the core switch and reconfigured it; however, the problem persisted.

Tuesday, October 17th, 2023, 23:00: Further investigation revealed issues within the cloud storage system, which our engineers began diagnosing.

Wednesday, October 18th, 2023, 03:00: Our data centre engineers applied various fixes to the cloud storage, successfully restoring its functionality.

Wednesday, October 18th, 2023, 09:00: Engineers began addressing configuration issues within the cloud network, including VLANs and port configurations.

Wednesday, October 18th, 2023, 16:00: Configuration changes were tested and implemented on the network, resulting in a temporary service restoration.

Thursday, October 19th, 2023: The service was operational, albeit with occasional instability as our technicians continued to apply various stability fixes.

Root Cause Analysis:

The root cause of the outage was a combination of factors:

Core Switch Failure: The initial diagnosis of a core switch malfunction led to the replacement and reconfiguration of the switch. However, this did not resolve the issue.

Cloud Storage Issues: Subsequent investigation revealed problems within our cloud storage system, which were rectified by applying appropriate fixes.

Resolution and Preventative Measures:

To reduce the risk of a similar outage in the future, we are taking the following measure:

Redundancy: We are working on implementing further redundancy in our core switches and cloud storage systems to minimise the impact of hardware failures.

Should you have any questions or concerns regarding this outage or our preventative measures, please do not hesitate to contact us. We value your business and your trust in our services, and we remain dedicated to meeting your needs.

Sincerely,

Cloudspace

Posted Oct 22, 2023 - 20:43 BST

Resolved
This incident has been resolved.
Posted Oct 19, 2023 - 23:18 BST
Identified
We are aware that some of our cloud services are experiencing performance issues, and we are actively working to resolve them.

Our technical team has identified the root cause of the performance problems, and it is related to hardware within our cloud network infrastructure.

To address this issue, we are in the process of implementing new hardware and making necessary adjustments to restore optimal performance to our cloud services.

We are committed to resolving this matter as swiftly as possible to ensure the highest level of service quality.
Posted Oct 18, 2023 - 21:34 BST
This incident affected: Cloud Network.