Big Data Explosion - Overwhelming Burden or Competitive Asset?

By TMCnet Special Guest Susan Davis | July 23, 2012

This article originally appeared in the July/August issue of INTERNET TELEPHONY

Today’s communications service providers (CSPs) are no strangers to big data. The explosive growth of mobile technologies, applications and related services over the last decade has prompted an equally explosive increase in the amount of telecommunications data being generated by billions of users around the world. Call detail records, network logs, and device, location and session information are quickly adding to the terabytes, petabytes and soon-to-be exabytes of data that CSPs now need to capture, track and analyze on behalf of their customers. In fact, Cisco’s recently announced forecast for global mobile data growth predicts an 18-fold increase in traffic over the next five years, reaching 10.8 exabytes per month by 2016.

Managed efficiently, this type of machine-generated data can offer a treasure trove of intelligence that businesses can use to gain insights into things like subscriber behavior and customer churn, and to improve billing accuracy and service quality. But if your organization can’t get a handle on the volume, dealing with telecommunications data can instead prove to be an expensive, resource-draining burden. A smart data management approach, especially when it comes to database selection, can have a big impact on the ability of a CSP to compete and thrive in the fast-paced and continually evolving telecommunications market. Here are several key considerations and trends worth following.

Row is not the only way to go. For a long time, the row-based database has been the standard approach to organizing data. Common examples include solutions from Oracle, MS SQL Server, DB2 and MySQL. For transactional requirements such as billing, invoicing or inventory management, these databases are a good fit. But they run into trouble when it comes to high-volume, high-speed analytics, especially when intelligence demands are dynamic and unpredictable. Why? Well, it’s not what they were designed for. Because a row-oriented database stores all of a row’s columns together, every column of each row being analyzed has to be read from disk to run a specific query, whether or not the query uses it.

Say, for example, you want to find out which subscribers on the friends and family plan are Android users. Using a row database, you’d have to read every column of data associated with each subscriber to run the query, even if many (or probably most) of the columns aren’t relevant. In this example, that means that in addition to reading data on the type of plan, device and subscriber identifiers like name and location, you’d also probably need to load data on the number of calls made by the subscriber, the number of calls received, the average length of call, data usage, apps purchased, billing information, etc. If you only have a few extra columns of data, this may not be a big problem. But what if you have hundreds? Multiply those 100-plus columns by millions of rows, and disk I/O becomes a substantial limiting factor.
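To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The row count, table width and 8-byte value size are assumptions chosen purely for illustration, and the estimate ignores indexes, caching and compression. It simply compares how many bytes an unindexed full scan would read when every column comes along for the ride versus when only the two columns the query needs are read.

```python
def bytes_scanned(rows, columns_read, bytes_per_value=8):
    """Estimate bytes read for a full scan that touches `columns_read` columns."""
    return rows * columns_read * bytes_per_value

# All figures below are hypothetical, picked only to illustrate the arithmetic.
ROWS = 10_000_000      # subscribers in the table
TOTAL_COLUMNS = 100    # width of the subscriber table
QUERY_COLUMNS = 2      # plan type and device type

row_store = bytes_scanned(ROWS, TOTAL_COLUMNS)     # every column is read
column_store = bytes_scanned(ROWS, QUERY_COLUMNS)  # only the needed columns

print(f"row store:    {row_store / 1e9:.1f} GB")     # 8.0 GB
print(f"column store: {column_store / 1e9:.2f} GB")  # 0.16 GB
print(f"ratio:        {row_store // column_store}x") # 50x
```

Under these assumptions the gap is simply the ratio of total columns to queried columns, which is why wide tables suffer most.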

All too often, organizations either throw money (and hardware) at this challenge through expanded disk storage subsystems or more servers, or they throw human resources (i.e., database administrators) at it by creating indexes or partitioning data to optimize queries. But when the questions stakeholders are asking of the data are constantly changing (and time-sensitive), manual data configuration is simply not practical. This means that while row is fine for static, pre-defined reports and transactional environments, analytic data challenges require different, more flexible technologies.

Within a column-oriented database environment, the overall use case fundamentally shifts from transactional to analytic. As the name implies, columnar databases store data column-by-column rather than row-by-row, enabling the delivery of faster query responses against large amounts of data. This is especially important for CSPs that are being bombarded by rapid streams of data, from multiple sources, that their customers need quickly to ensure optimal service or to laser-target marketing efforts.

Most analytic queries only involve a subset of the columns in a table, so a columnar database has to retrieve much less data to answer a query than a row database. This simple pivot in perspective, looking down rather than across, has profound implications for analytic speed and efficiency. Most columnar databases also provide data compression, so in addition to improved query performance, they also require less storage hardware, which is important as data infrastructure becomes increasingly costly to scale, house, power and maintain. Depending on the solution chosen and the mix of capabilities required, there are technologies out there that can achieve data compression of 20:1 or more. In particular, technologies that use knowledge about the data itself to intelligently isolate relevant information are especially worth looking into, as, combined with column-orientation, they can significantly accelerate analytic performance without requiring database administrators to create and maintain indexes, partition data, or build cubes or projections. Infobright’s solution is based on columnar technologies, but it is not the only one. HP, ParAccel, and SAP/Sybase IQ also offer column-based analytic databases.
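As a toy illustration of both ideas, the Python sketch below stores a handful of made-up subscriber records column by column, answers the friends-and-family Android question by reading only the columns it needs, and applies run-length encoding to one column. The records and field names are hypothetical, and run-length encoding is just one simple compression scheme that suits columns with repeated values; it is not a description of how Infobright or any other vendor compresses data, and a five-row example says nothing about real-world compression ratios.

```python
from itertools import groupby

# Hypothetical subscriber records; names and values are invented for illustration.
rows = [
    {"id": 1, "plan": "friends_family", "device": "android", "calls": 42},
    {"id": 2, "plan": "friends_family", "device": "ios", "calls": 17},
    {"id": 3, "plan": "individual", "device": "android", "calls": 8},
    {"id": 4, "plan": "individual", "device": "android", "calls": 23},
    {"id": 5, "plan": "friends_family", "device": "android", "calls": 5},
]

# Columnar layout: one list per column instead of one record per row.
columns = {name: [r[name] for r in rows] for name in rows[0]}

# The query reads the id, plan and device columns; "calls" is never touched.
matches = [
    sid
    for sid, plan, device in zip(columns["id"], columns["plan"], columns["device"])
    if plan == "friends_family" and device == "android"
]

def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, count in runs for _ in range(count)]

plan_runs = rle_encode(columns["plan"])

print(matches)    # [1, 5]
print(plan_runs)  # [('friends_family', 2), ('individual', 2), ('friends_family', 1)]
```

Because a column holds values of a single type, often with many repeats, it tends to compress better than a row that mixes names, numbers and dates; sorting or clustering the data first would make the runs longer still.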

There are also a growing number of distributed approaches to querying large volumes of data, such as the Hadoop framework with MapReduce, which can be used independently of, or in conjunction with, columnar technologies. The net-net is that there are a lot of options now available (including open source solutions that are either free or available at a fraction of the cost of traditional tools) that can help CSPs analyze more data, much faster, with far fewer resources and less infrastructure required.

There’s a reason that businesses use purpose-built technologies to address specific information challenges. Trying to force a solution to work for a problem it wasn’t originally intended for can end up being a big drain on time, money and resources in the long run. When it comes to telecommunication analytics, your database needs to be fast, flexible and both cost- and resource-efficient, especially if your company is serving the needs of multiple intelligence consumers.

Consider the example of Bango, a mobile billing and analytic service provider for mobile carriers and content providers. Bango offers data collection, campaign tracking, page tracking and other services so that its customers can better understand subscriber behavior, optimize mobile marketing campaigns and drive higher advertising rates. More than 1 billion user IDs and over 100 million authenticated mobile subscribers worldwide are identified by the Bango system, and as the company’s business has grown, so has the amount of machine-generated data that needs to be stored, loaded and made available to its customers for analysis. In addition to online data such as page views and clicks, Bango tracks mobile-specific information such as device model, manufacturer, and user identity data. After signing one of its largest customers, the company sought a solution that could help it quickly and cost-effectively scale its business while accommodating demand for fast, up-to-the-minute data analysis.

Bango’s existing row-oriented database required that indexes be custom-tuned to specific queries to deliver fast performance. Those indexes more than doubled the size of the stored data relative to the raw data, which in turn required more storage. My company, Infobright, worked with Bango to address both its data overload and query performance challenges.

Infobright’s analytic database combines column-orientation with data compression capabilities, giving Bango the ability to quickly load massive amounts of mobile data and enable complex, ad-hoc queries in seconds without indexes, manual configuration or complex administration. By taking this approach, the company can support the analytic needs of its customers, both large and small, as the data keeps rolling in. For example, Bango was able to run a particular report in just 22 seconds vs. the five minutes the same analytic query used to take using MS SQL Server. Even more valuable, Bango is now able to analyze data volumes that its previous database wasn’t able to query against at all, opening the door to insights that previously were impossible to obtain.

The challenges that Bango has overcome to provide its client base with timely, actionable intelligence in the face of the big data explosion parallel those faced by a number of CSPs. Like Bango, these businesses will increasingly turn to database solutions that are designed to handle the volume, speed and variability that epitomize information analysis in the telecommunications environment. 

Susan Davis is vice president of marketing at Infobright.

Edited by Stefania Viscusi