RFO Report for Network Service Incident
Summary
At approximately 17:00 on Tuesday 9 August 2022, our monitoring alerted us to a network-affecting issue resulting in a loss of all connectivity to our network.
Upon investigation, we identified that we had lost communication with both core switches, resulting in a network-wide outage.
Our network operations team attempted to hard-reboot the switches remotely to restore service; however, these attempts were unsuccessful.
At 17:40 our field engineer arrived at the rack and rebooted the affected devices. This restored network operation, and traffic levels began to recover.
At 18:15 we identified that traffic levels had recovered to only 30% of their pre-incident levels. Deeper investigation indicated this was likely caused by DNS caching on our internal resolvers. Flushing the DNS cache resolved the issue, and traffic returned to normal levels.
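For reference, a cache flush of this kind can be performed with commands such as the following. This is a sketch only, assuming BIND-based resolvers managed via rndc; the resolver software actually in use here is not stated, and other resolvers (e.g. Unbound, dnsmasq) use different commands.

```shell
# Flush the entire DNS cache on a BIND resolver (assumes rndc is configured)
rndc flush

# Confirm the resolver is answering with fresh records
# (example.com is a placeholder query name)
dig @127.0.0.1 example.com +short
```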
At 18:50 our network engineering team performed previously scheduled emergency maintenance to replace a routing engine on our core router.
Investigation and Root Cause Analysis
Following further investigation and consultation with the vendor's technical assistance centre, we identified that this issue was caused by a known manufacturer firmware issue affecting our current switch configuration.