With so many BlackBerry users wondering what caused yesterday's network outage, and whether it will happen again, TMCnet took some time to speak with Zenprise, a provider of automated troubleshooting software, for a closer look.
 
Zenprise helps solve complex e-mail problems quickly with real-time, automated diagnostic and resolution tools.
 
What occurred to cause the BlackBerry outage yesterday?
Zenprise software runs diagnostic tests to isolate the root cause of BlackBerry mail delivery problems. These tests revealed that one of the IP addresses used by the RIM network was not accepting any connections, while the other IP address was accepting connections. The specific reason why that IP address was not accepting connections is something that RIM would need to identify.
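
A connectivity test of the kind described above can be sketched in a few lines of Python. This is only an illustration, not Zenprise's actual diagnostic: the function names are hypothetical, and the addresses and port in the usage comment are placeholders rather than real RIM endpoints.

```python
import socket


def accepts_connections(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Probe each (host, port) pair and map it to True (accepting) or False."""
    return {
        endpoint: accepts_connections(endpoint[0], endpoint[1], timeout)
        for endpoint in endpoints
    }


# Hypothetical usage with placeholder addresses (documentation range) and port:
#   status = check_endpoints([("192.0.2.10", 4000), ("192.0.2.20", 4000)])
#   for endpoint, ok in status.items():
#       print(endpoint, "accepting" if ok else "NOT accepting")
```

A result in which one entry is True and the other is False corresponds to the situation described here: one path accepting connections and the other not.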
 
What was the impact of the disruption?
--BlackBerry subscribers may have been unable to send or receive messages. Subscribers may also have been unable to register their device, roam in another location, or use other services such as Internet browsing.

--BlackBerry Internet Service subscribers may have been unable to use the BlackBerry Internet Service web site or perform activities such as creating new accounts, accessing their Internet mailbox, integrating third-party email accounts, or viewing email attachments.

--Devices may not have received new service books.
 
--BlackBerry Connect and BlackBerry-enabled devices that require a new PIN may have been unable to receive the PIN.

--BlackBerry Enterprise Servers may have been unable to connect to the BlackBerry Infrastructure.
 
--Wireless service providers and device resellers may have been unable to use BlackBerry administration web sites or perform activities such as creating subscriber accounts or provisioning services for subscribers.
 
How complex is it to isolate and fix problems with this type of environment?
It can be rather time-consuming to isolate these types of problems specifically to the RIM network. IT is often “alerted” to the problem when end users call support to complain that new mail is not arriving on their BlackBerry devices. The natural reaction is to check the company’s infrastructure to make sure that the BlackBerry server, the mail server, the network, etc. are all running correctly. That investigative process alone can take anywhere from 30 minutes to several hours, depending on the size of the organization. The RIM network is generally highly available, so it can be the last thing that the administrator checks.
 
What does Zenprise offer for BlackBerry environments?
Software applications are distributed and depend heavily on other applications to operate successfully. For BlackBerry, enterprise customers typically install the BlackBerry Enterprise Server (BES) software behind their firewall and link it to the corporate email system, usually Microsoft Exchange. The BES constantly checks with Exchange, asking whether new e-mails have been delivered to the users’ inboxes. If there is new e-mail, it’s handed over to the BES, which then sends it through the corporate firewall to the RIM NOC, handing it over to BlackBerry servers there. Those systems connect with the appropriate cellular carrier network to deliver the e-mail to the recipient’s BlackBerry device.
 
Zenprise automatically monitors and troubleshoots system data to tell administrators three things:
1. Which application caused the problem
2. What caused the problem
3. What to do to fix the problem
                                               
How was Zenprise able to so quickly respond to the outage and notify customers yesterday?
Our software monitors the health of the RIM network 24x7. The moment one of our connectivity tests to the RIM network failed, our software generated a notification to our customers, alerting them that their users would be unable to send or receive messages because the RIM network was down.
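
The monitor-and-notify pattern described above might look roughly like the following sketch. It is an assumption-laden illustration rather than Zenprise's implementation: `OutageMonitor`, `probe`, and `notify` are hypothetical names, and the probe is any callable that returns True when the network is reachable.

```python
class OutageMonitor:
    """Run a connectivity probe repeatedly and notify only on state changes."""

    def __init__(self, probe, notify):
        self.probe = probe      # callable returning True if reachable
        self.notify = notify    # callable taking a message string
        self.is_up = True       # assume healthy until a probe fails

    def poll(self):
        """Run one probe; send a message if the state just changed."""
        up = self.probe()
        if self.is_up and not up:
            self.notify("Network is down: users cannot send or receive messages.")
        elif not self.is_up and up:
            self.notify("Network connectivity has been restored.")
        self.is_up = up
        return up
```

Notifying on the transition, rather than on every failed probe, means administrators get one alert when the outage starts and one when it ends, instead of a stream of duplicates. In practice `poll()` would be called on a timer, and `notify` would send an e-mail or page.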
 
Has Zenprise reached any conclusions about yesterday's outage?
Basically, there are two paths (IP addresses) in North America for connecting to the RIM network. According to diagnostic tests run by Zenprise software, one IP address was refusing connections, which affected the enterprise users routed to it. A few points to note:
 
* Any users on the working IP address experienced little to no service interruption
* Organizations that reported intermittent email activity were seeing the result of switching between the two IP addresses
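
The two points above can be illustrated with a toy model. The sketch below assumes, purely for illustration, that an organization's connection attempts either stay on one path or alternate strictly between the two; the interview does not say how the switching actually happened, and the names `path_a` and `path_b` are hypothetical.

```python
from itertools import cycle, islice


def delivery_outcomes(path_choices, path_is_up):
    """Map each connection attempt to True (delivered) or False (failed)."""
    return [path_is_up[path] for path in path_choices]


def success_rate(outcomes):
    """Fraction of attempts that succeeded."""
    return sum(outcomes) / len(outcomes)


# One path accepts connections, the other refuses them.
path_is_up = {"path_a": True, "path_b": False}

scenarios = {
    "pinned to working path": ["path_a"] * 10,
    "pinned to failed path": ["path_b"] * 10,
    # Assumed strict alternation between the two paths.
    "switching between paths": list(islice(cycle(["path_a", "path_b"]), 10)),
}

rates = {
    name: success_rate(delivery_outcomes(choices, path_is_up))
    for name, choices in scenarios.items()
}
```

Under these assumptions the organization on the working path sees every attempt succeed, the one on the failed path sees none succeed, and the one that switches sees only some attempts succeed, which users would experience as intermittent e-mail.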
 
Are there ways to safeguard against this in the future?
There is not much that IT can do to safeguard against a RIM outage. However, they can establish clear internal processes to be able to detect the outage early on and proactively notify users about the issue. Doing so allows them to avoid frustrated users calling IT complaining about BlackBerry problems.
 
 
Stefania Viscusi is an established writer and avid reader. To see more of her articles, please visit Stefania Viscusi’s columnist page.
 

