christopher-mcmullan asked ·

Question about structural improvements in the wake of Canadian toll-free outage on May 9th

There was an outage on May 9th affecting Canadian toll-free traffic into RingCentral's system from a large national telco.

Due to the way we configured our test of RingCentral's services (call forwarding all calls into our local DID to our main toll-free number, 844-825-5226), this affected all incoming calls to our organization. Anecdotally from my own testing during the outage, only 1 out of every 5 to 10 calls completed.

However, this only seemed to affect calls originating from the PSTN: cell phones, landlines, and even calls from PSTN sources to RingCentral's own Canadian toll-free support number. Calls originating from VoIP were unaffected. I could use Google Hangouts, my own private VoIP service, or a number of other alternatives to successfully call our company's number.

We lost a good deal of trust and goodwill, both internally towards RingCentral, and externally towards our organization, as a result of the outage. At least one donor threatened to cancel their support if they had any more problems getting through to us. Issues on this scale, lasting for as long as they did (18 hours), cannot happen again. 

If it were to come to an internal vote at this moment, I would need to cancel our RingCentral service because of the mistrust with which it is viewed by our staff, who are justifiably concerned about our reputation and their ability to do their jobs. Part of the mistrust is born of other technical issues we have experienced, which I am not nearly as concerned about; they can be addressed as tickets elsewhere.

My main concern in this post is to set proper expectations for myself and for Catholic Christian Outreach going forward, so I have a series of questions related to the outage which I hope you can help me address. Marie, a support representative I spoke with yesterday, suggested this forum as the best place to find the answers I am seeking.

1. How often have outages of this nature occurred in the past?
2. Is the issue solely related to the telco's equipment, or are any of the factors that led to the outage within reach of RingCentral's ability to fix?
3. How likely is it, theoretically, that we should expect similar outages in the future?
4. If it is an issue of missing redundancy, can the problem be resolved far enough upstream that, by working with a second telco, RingCentral could make the structural changes to call routing between Canada and its systems needed to prevent problems like this?

Specifically for the last question, I raise it because every other cloud-based VoIP vendor I spoke with, when they found out we had chosen RingCentral for further testing and a new contract, voiced major concerns about your ability to handle the amount of traffic, or the complexity of the setup that we were proposing. I don't mind mentioning the companies by name: Vonage, Versature, and NuEdge Communications (reseller of Avaya solutions out of Ottawa). 

Versature specifically differentiated themselves by stating they had redundant connections to telcos which, in my mind, would seem to avoid the problem that led to the outage in May that we suffered from.

I don't know enough of the technical details to say either way whether they are correct, so I am asking directly: is the issue leading to the outage of May 9th completely outside of your control, and common to all VoIP vendors based in the U.S. who would have to work with the one telco to receive toll-free traffic, or is it specific to RingCentral? Could it be solved through seeking a redundant contract with a second telco? If so, is this planned in the aftermath of the outage?

We will decide whether or not to renew our RingCentral service solely based on the answers I receive to my questions here, before ever dealing with the other technical issues we have encountered over two months of testing. Obviously, I am not asking for a level of detail that would compromise the security of your internal network, but rather some assurance through additional information that the problem is not impossible to solve.

I appreciate your assistance. Good alternate ways to reach me outside this post are by email at christopher.mcmullan@cco.ca, or by cell phone at 613-231-3408. My name is Chris McMullan, and I am the Systems Administrator for Catholic Christian Outreach.

  1. Account type
    Canada
  2. Related case number (if applicable)
    N/A
  3. Detailed description of problem
    Described above.
  4. Goal
    Further information about the outage on May 9th, incident ID INC-17370
  5. Previous troubleshooting steps taken
    Work during the outage to narrow down the source of affected calls (VoIP vs. PSTN), and two support calls to verify the incident ID and follow up with the questions I pose here.
  6. Software version
    N/A
  7. If desk-phone related, have you rebooted phone?
    N/A

mike answered ·
Hello Christopher McMullan,

I'll look into this and see what information we can provide. 

Mike 

christopher-mcmullan answered ·
Hello Mike,

Were you able to find out any more information?

Chris
4 comments

Thanks for your patience. We have a group of people working on this. Sorry I can't offer you specifics at this time but someone should be getting back to you. 

Mike 

Mike,

Thank you for your continued help relaying information as it becomes available to you. Could you please find out an ETA for when I should expect someone to contact me about this, even if just to say there is nothing new to report but that they will call again in a certain number of days?

Chris

Yes. I'll be happy to follow up again, as I know someone is working on this.

Mike

Mike,

I've been given an internal deadline to figure everything out by the end of business on Friday.

Thank you for whatever new information you can gather by then.

Chris

mike answered ·
Hello Christopher,   

Sorry for the delay on this. Please see comments from our management team below.

On May 9, 2016, a portion of inbound toll-free calls were not completing for customers calling out of Canada. The impact lasted 18 hours and 8 minutes, starting on May 9th at 8:19 a.m. PST. The affected calls were not being routed to RingCentral from an underlying carrier's network. There was no equipment failure within the RingCentral network, so this incident was not within RingCentral's span of control. The impact was due to a combination of a fiber cut and an equipment failure at one of RingCentral's underlying carriers, which was unable to route traffic to the RingCentral network. During the incident, RingCentral was in constant contact with this carrier and was also troubleshooting any possible solution that could have been executed on our end. Because the issue was located outside of our network, restoration of service for these customers depended solely on the underlying carrier repairing its fiber cut and replacing the faulty equipment.

An incident of this nature is not unheard of in this industry; however, in most cases the impact would not be comparable to the severity of this particular issue. The series of issues with the underlying carrier (the fiber cut and the equipment failure) is unlikely to occur again. Unfortunately, it is not possible for multiple carriers to own or route the same telephone numbers at any given time, so phone calls must be routed through a single carrier's network first in order to reach our network. With proper redundancies in place, like what's in place in our RingCentral network, an incident of this duration and severity is a rare occurrence, and it is likely the carrier is working on a more effective redundancy plan, if it has not already done so.


 