Connectivity Issues
Incident Report for 3116 Digital
Postmortem

On October 22nd, 2020, at approximately 6:00 PM EDT, our Atlanta data center experienced a service degradation. As stated on the status page, the initial cause was reported as a fiber cut. Our Atlanta data center has two diverse dark fiber spans (referred to here as the A side and the B side) providing connectivity to our transit providers and to public and private peering points in an active/active model. This model is implemented in many of our data centers across the world. Fiber cuts are fairly common, and because of this, each span has more than enough capacity to sustain the data center in the event of a cut or maintenance. In nearly every instance, this failover is instantaneous and seamless. Unfortunately, in this case, it was not.
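
To make the redundancy model concrete, here is a minimal sketch of the capacity rule it relies on: either span alone must be able to carry the site's full peak load, so losing one span should be invisible to customers. The names and figures below are illustrative assumptions, not our actual capacities or tooling.

# Hedged sketch of the active/active capacity rule; numbers are hypothetical.
PEAK_TRAFFIC_GBPS = 400           # assumed peak demand for the data center
SPANS = {"A": 800, "B": 800}      # assumed capacity of each dark fiber span

def survives_span_loss(remaining_spans, peak_gbps):
    """Return True if the surviving spans can still carry the full peak load."""
    return sum(remaining_spans.values()) >= peak_gbps

for failed in SPANS:
    remaining = {name: cap for name, cap in SPANS.items() if name != failed}
    ok = survives_span_loss(remaining, PEAK_TRAFFIC_GBPS)
    print(f"Span {failed} cut: {'seamless' if ok else 'oversubscribed'}")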

3116 Digital is in the process of transitioning to an upgraded data center, and some of our internet connectivity has already been moved. When the A side fiber went down, traffic over the recently moved circuits did not fail over as expected, causing some blackholing of customer traffic. The NetOps team quickly found this and turned down those connections, taking the entire A side of the network hard down. This put all data center connectivity solely on the B side, restoring all services. NetOps engaged 3116 Digital's fiber provider to investigate and repair the outage. The provider reported a fiber cut in the area and attached our ticket to that issue. The repair took many hours and was not completed until late the next morning.
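
The blackholing above follows directly from the routing state: once the A side is withdrawn, any prefix whose only remaining paths pointed at the A side has nowhere to go. The sketch below illustrates that check; the route data, prefixes, and next-hop names are illustrative assumptions, not our actual routing tables or tooling.

# Hedged sketch of detecting blackholed prefixes after a span failure.
# prefix -> list of (side, next_hop) paths learned via BGP; data is hypothetical.
routes = {
    "198.51.100.0/24": [("A", "transit-1"), ("B", "transit-2")],
    "203.0.113.0/24": [("A", "moved-transit")],  # assumed moved circuit with no B side path
}

def blackholed_prefixes(routes, failed_side):
    """Return prefixes with no remaining path once failed_side is withdrawn."""
    return [
        prefix
        for prefix, paths in routes.items()
        if all(side == failed_side for side, _ in paths)
    ]

print(blackholed_prefixes(routes, "A"))  # -> ['203.0.113.0/24']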

10-23-2020, 12:45 PM EDT: Full data center outage

When our fiber provider reported the cut repaired, our A side was still down. We continued to work with them to find the cause. The provider determined that our ticket had been attached to the larger fiber cut in error and that we were not affected by that outage. At approximately 12:45 PM EDT, we lost all connectivity to the Atlanta data center: while investigating why the A side was down, the local field technician mistakenly disconnected our B side, cutting off the data center completely. The field technician quickly reconnected the fiber, restoring connectivity to the data center, and resumed the effort to restore the A side. Shortly afterward, the cause of the A side being down was found: our fiber pair had been disconnected from the provider's panel. Reconnecting it restored connectivity on the A side. The accidental disconnection of the B side was due to an incorrect label on the provider's panel, and the fiber provider is investigating how the A side came to be disconnected, as 3116 did not request it.

As for the initial service degradation on the A side, the cause was determined to be BGP not failing over as expected on the transit circuits that had already been moved to the upgraded data center. This has already been corrected, and in the event of another fiber cut or outage, failover will happen as expected.

Posted Nov 09, 2020 - 11:01 CST

Resolved
We haven’t observed any additional issues in the Atlanta data center and will now consider this incident resolved. If you continue to experience problems, please open a Support ticket for assistance.
Posted Oct 24, 2020 - 09:11 CDT
Monitoring
At this time we have been able to normalize connectivity in our Atlanta data center. Because this issue is related to an ongoing fiber cut with our upstream provider, the possibility of further connectivity issues remains. We will be monitoring this issue to ensure that connectivity remains stable, and we'll keep this incident updated and open until we've confirmed the fiber cut is fully repaired. If you are still experiencing issues, please open a support ticket for assistance.
Posted Oct 23, 2020 - 14:10 CDT
Update
Our team continues to work quickly to implement a fix, and we will provide an update as soon as the solution is in place. All site access has been restored, but there may be intermittent access issues while we complete this fix.
Posted Oct 23, 2020 - 13:33 CDT
Identified
Our team has identified the issue affecting connectivity in our Atlanta data center. We are working quickly to implement a fix, and we will provide an update as soon as the solution is in place.
Posted Oct 23, 2020 - 12:28 CDT
Investigating
Our team is investigating a connectivity issue in our Atlanta data center. During this time, users may experience connection timeouts and errors for all services deployed in this data center. We will share additional updates as we have more information.
Posted Oct 23, 2020 - 12:09 CDT
Update
The cause of this outage was related to a fiber cut near our Atlanta data center. Connectivity has been restored, and we're continuing to monitor the situation to ensure there are no additional impacts to customer services. This status page will be updated once the fiber cut has been repaired.
Posted Oct 23, 2020 - 10:14 CDT
Monitoring
At this time we have been able to correct the issues affecting connectivity in our Atlanta data center. We will be monitoring this to ensure that it remains stable. If you are still experiencing issues, please open a Support ticket for assistance.
Posted Oct 23, 2020 - 06:00 CDT
Identified
Our team has identified the issue affecting connectivity in our Atlanta data center. We are working quickly to implement a fix, and we will provide an update as soon as the solution is in place.
Posted Oct 22, 2020 - 15:30 CDT
Investigating
Our team is investigating a connectivity issue in our Atlanta data center. During this time, users may experience connection timeouts and errors for all services deployed in this data center. We will share additional updates as we have more information.
Posted Oct 22, 2020 - 10:13 CDT
This incident affected: DataCenters (US-Southeast (Atlanta)) and Backups (US-Southeast (Atlanta) Backups).