Data compression has always been an important technology in
computing, and its use has sped up the adoption of the fax machine, the
modem, digital music distribution, digital photography, video, and countless
other markets. You may have thought you heard it all when it comes to
compression. I certainly did. You can imagine my surprise when I came across
a company called Peribit Networks (www.peribit.com)
that claims to use compression algorithms based on human genome DNA research
to compress corporate data speeding across networks by an average of 70 to
75 percent. This technology is built into their SR series of
sequence reducers.
Many people in the industry tell me that bandwidth is
essentially free and that there is no need for compression technologies in
such an environment. These people are generally wrong, especially when you
consider the international market: broadband access is still expensive, and
in some countries only a limited amount of bandwidth is even attainable.
Worse yet, even if bandwidth is available, it can take months to provision
new service in certain parts of the world! It is in scenarios like these
that network compression can reduce costs and increase
convenience while simultaneously freeing up bandwidth for other applications
such as video and telephony.
The general principle behind lossless data compression is as follows:
repeating patterns are found in the data, and each pattern is assigned a
symbol or sequence number. In the phrase "I used a tart as a model for the
art that I will start to sell on my cart," every "art" can be replaced by a
symbol. Decompression merely replaces the symbols with the appropriate data.
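The idea can be sketched in a few lines of Python. This is a toy illustration of pattern substitution only, not Peribit's algorithm: real lossless compressors such as LZ77 or LZW discover their dictionaries automatically and use bit-level encodings.

```python
# Toy dictionary-style lossless compression: a repeated pattern is
# replaced by a short symbol, and decompression substitutes it back.

def compress(text, pattern, symbol):
    """Replace every occurrence of `pattern` with `symbol`."""
    assert symbol not in text  # the symbol must not collide with real data
    return text.replace(pattern, symbol)

def decompress(text, pattern, symbol):
    """Reverse the substitution, restoring the original text exactly."""
    return text.replace(symbol, pattern)

phrase = ("I used a tart as a model for the art "
          "that I will start to sell on my cart")
packed = compress(phrase, "art", "\x01")
assert decompress(packed, "art", "\x01") == phrase  # lossless round trip
print(len(phrase), "characters shrink to", len(packed))
```

Each three-character "art" becomes a one-byte symbol, so the packed string is shorter, and decompression restores the phrase byte for byte.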
As you may have inferred, there is also something known as lossy
compression, which is generally used for audio and video; as you have
probably surmised, it is a compression scheme in which some data loss is
tolerated.
At a higher level but in a similar spirit, a concept called proxy caching
allows networks to reduce data transmission requirements. Most current proxy
servers support proxy caching. A brief primer: if you want to connect
two subnets (or two organizations that are both connected to the Internet), you
generally use gateways at both locations to seamlessly connect them. One of
the functions performed by a gateway is network address translation, or NAT.
In effect, NAT allows multiple "internal" IP addresses (addresses
not visible on the Internet) to masquerade as the Internet IP address assigned
to the NAT device. This is the technology behind most broadband sharing
devices, such as the Linksys gateways used in homes and small offices.
A home user with a cable modem is assigned one IP address by the
cable company. A NAT device allows that one IP address to be shared by
multiple computers by assigning each computer an internal IP address not
visible from the Internet. Whenever an internal computer tries to access the
Internet, the NAT box "translates" packets in such a way that the packets
returning to the NAT box from the Internet are sent to the appropriate, or
"requesting," computer. A proxy server generally functions as a gateway
that provides network address translation with added intelligence for
better packet management. One of the more popular proxy servers is
Squid (http://www.squid-cache.org/).
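The translation table at the heart of a NAT box can be sketched as a simple mapping. This is an illustrative model of port-based NAT, with made-up class and method names, not any vendor's implementation:

```python
import itertools

class NatBox:
    """Toy port-based NAT: many internal hosts share one public IP."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self._ports = itertools.count(40000)  # pool of external ports
        self._table = {}    # (internal_ip, internal_port) -> external_port
        self._reverse = {}  # external_port -> (internal_ip, internal_port)

    def outbound(self, internal_ip, internal_port):
        """Rewrite an outgoing packet's source to the public address."""
        key = (internal_ip, internal_port)
        if key not in self._table:
            port = next(self._ports)
            self._table[key] = port
            self._reverse[port] = key
        return (self.public_ip, self._table[key])

    def inbound(self, external_port):
        """Send a returning packet back to the requesting computer."""
        return self._reverse[external_port]

nat = NatBox("203.0.113.5")
src = nat.outbound("192.168.1.10", 5000)  # two internal hosts, one public IP
nat.outbound("192.168.1.11", 5000)
assert nat.inbound(src[1]) == ("192.168.1.10", 5000)
```

Both internal machines use source port 5000, but each gets its own external port, so replies from the Internet can be routed back unambiguously.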
Proxy caching is a high-level and scalable yet inefficient way of
compressing data because it is object-based: if an object's
attributes and name are the same, the object does not need to be retransmitted. If
an object changes even slightly, however, it does need to be resent in full. It is
these kinds of issues that make proxy caching an effective, albeit
less than ideal, way to wring the most compression out of a network.
Peribit uses what they call MSR, or Molecular Sequence Reduction, technology
(derived, once again, from their background in DNA research), which they say
combines the best of both proxy caching and traditional compression techniques.
Their compression technology is also very scalable, allowing 45 megabits per
second to be processed at greater than 70 percent compression. This
bandwidth number is slated to increase going forward.
Logic tells you that the only way to wring more compression out of a network
is to look at large amounts of data and cache as much of the repetitive
information as you can. Every application has a different level of potential
compression available to it. Peribit tells me that Web data can be
compressed by over 90 percent, e-mail by around 75 percent, SQL
data by around 80 percent, and image servers by around 70 percent. Voice, as you
might imagine, is not very compressible: 40 percent is the maximum
compression you will see. This is because the headers can be
compressed but not the voice payloads, which rarely exhibit high levels of
redundancy.
Are you skeptical? Well, Sears saw a reduction in data traffic of 92
percent. Certain U.S. military departments are using the technology in
Middle Eastern countries that have less than ideal amounts of bandwidth
available to them. Peribit tells me of a major pharmaceutical
company that is saving $800,000 a month in service provider costs because of
this technology! The best way to find out how MSR technology will work in
your organization is to contact the company directly at
www.peribit.com; they can send you a
CD containing a program that will monitor your network and tell you
what the reduction in network traffic would be if you used Peribit's
products.
When you think about it, it becomes clear why there is so much redundancy to be
wrung out of corporate networks. This is especially true when you realize
that CRM applications can add up to 32 Kbps of bandwidth per user. Keep in mind
that as bandwidth demands increase rapidly, routers get stressed, leading to packet
loss, which only worsens the congestion problem. Adding more bandwidth takes
time to provision and, worse yet, can affect load balancing and possibly lead
to the need to purchase higher-capacity routers and other equipment.
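The arithmetic using the 32 Kbps-per-CRM-user figure above is sobering. The user count and link size here are illustrative assumptions, not figures from Peribit:

```python
# Back-of-the-envelope WAN sizing with the 32 Kbps-per-user CRM figure.
T1_KBPS = 1544                    # a T1 line carries about 1.544 Mbps
users = 200                       # assumed head count at a branch office
crm_demand_kbps = users * 32      # 6,400 Kbps of CRM traffic alone

# Ceiling division: how many T1s would that demand fill?
t1s_needed = -(-crm_demand_kbps // T1_KBPS)
print(crm_demand_kbps, "Kbps of CRM traffic needs", t1s_needed, "T1 lines")

# At 75 percent reduction, only a quarter of the traffic crosses the WAN.
compressed_kbps = crm_demand_kbps * 0.25
print("after compression:", compressed_kbps, "Kbps, or roughly one T1")
```

Two hundred users saturate five T1 lines uncompressed, but at a 75 percent reduction the same traffic nearly fits in one, which is why the provisioning and cost argument is so compelling.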
The SR series of devices are customized Intel boxes with two Ethernet
ports; they are designed to pass data through in the case of a power failure
so as not to interrupt transfers. The entry cost is $2,900 for 128 Kbps
of bandwidth, and prices are tiered by WAN speed. Version 2.0 of SRS will
add functionality allowing applications to be given different
priority levels and capping the maximum portion of the data pipe an
application can take up.
A software product called Central Management System (CMS) lets users
monitor the inner workings of multiple sequence reducers, viewing the
compression achieved by device and packet type. CMS can also be used for
remote upgrades.
Compression is a fascinating science -- always has been and always will be.
For years, I have heard people make wild claims about compression ratios
that seem unfathomable and are proven untrue. Peribit seems to be onto
something here. They have real and huge customers. It is intuitive to almost
any network manager that their networks contain high levels of redundancy.
If even a small part of what Peribit tells me is true, this will be one of
the most successful tech companies of the decade. It should be an
interesting IPO if Cisco doesn't buy them. Let me know what you think at
[email protected], and if you are
interested in the nitty-gritty of data compression, check out
http://datacompression.info/ for
more information.
[ Return To The April 2003 Table Of Contents ]