September 2011

High-Performance Cloud Infrastructure

Building High-Performance Cloud Telecom Infrastructures

By: Darren Schreiber, 2600hz

It’s no secret that businesses are moving to the cloud. Services such as Google Apps and Microsoft SkyDrive are well-known offerings from bigger players, but smaller providers are also getting in on the action. As the cloud becomes more popular with businesses, opportunities exist to capitalize on the trend. If you’re a current telco or ISP providing telecommunications services, now is the time to invest in building your own service offering. How do you get started?

The problem most existing service providers face in moving to cloud-based services is two-fold – understanding what the cloud really is and investing in the right technologies to provide cloud services. In this article we’ll explain what cloud telecom entails and dive into selecting the right technologies for offering cloud telecommunications services.

What is the Cloud?
Today, the term “cloud” is used loosely to describe a hosted or Internet-based service. The word is really a metaphor for the Internet. That said, users who refer to a cloud-based service in today’s computing environment generally have preconceived expectations regarding pricing, reliability and general user experience. It’s important to understand your audience before examining available tools.

So, who are you targeting? You may have multiple audiences, ranging from basic users to advanced users to developers and systems integrators. One common user group includes customers looking to save money and offload maintenance and operational headaches to someone else. These customers are generally looking for a “hosted” cloud service. Another common audience includes developers who are looking to create API-based “mash-ups” which combine two or more providers into a unique or niche product offering, often combined with a simplified user interface. These developers are generally looking for an “API-enabled” cloud service. The differences between these two types of offerings may be subtle at first glance, but they have a massive impact on design decisions and infrastructure requirements under the hood.

Audiences differ in their feature requirements, but they generally share the same expectations for reliability and uptime. Cloud service providers face a unique challenge from customers who assume that the provider has anticipated, and can react to, every possible operational incident. The reality is that if clients can’t see the failure with their own eyes, they often don’t understand (or care) why it happened. A good example is a power loss. In the event of a power loss at someone’s office, people expect their computers, phones and other devices to be offline and, despite their complaining, generally write off the time or find other things to do. But a power loss in a remote data center that the customer never sees is viewed by those same folks as a completely unacceptable and preventable event. Clients impacted by a data center power outage may request to terminate their contract after just one incident. In this example, the client clearly expects the cloud service to be far more resilient to failure than an operation they hosted themselves, even though they may be paying a lower price for the same service. This puts additional pressure on service providers to build resilient, high-performance architectures right out of the gate, regardless of expected capacities or investment costs.

What customers won’t usually ask for in a cloud solution, but absolutely demand, is the same reliability they get from having a solution on-site. This places a series of responsibilities on the cloud provider that are often overlooked, specifically in telecom. For any hosted solution to truly be successful, it is critical that:

  • Bandwidth to/from the office is guaranteed (usually via QoS enabled circuits and bandwidth utilization monitoring)
  • Circuits (and potentially individual devices) are monitored for uptime and low latency
  • Automatic failover of both power and network are available at any time
  • Data storage is reliable
  • Servers are accessible and available 24/7/365

The above requirements are weakest-link scenarios, meaning that achieving four out of five of the requirements isn’t good enough. The client will usually perceive the one item that doesn’t work well as a failure of the entire system, as they’ll be unable to differentiate between the items at issue (or will only care about the end result).
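The weakest-link effect can be illustrated with simple arithmetic. The sketch below assumes that the five requirements fail independently and that the service is up only when all of them are up, so the individual availabilities multiply. The availability figures and the names `component_availability`, `end_to_end_availability` and `downtime_hours_per_year` are hypothetical and chosen only for illustration.

```python
from math import prod

# Hypothetical availability figures, one per requirement in the list above.
# These are illustrative assumptions, not measurements.
component_availability = {
    "guaranteed_bandwidth": 0.9999,
    "circuit_monitoring": 0.9999,
    "power_and_network_failover": 0.9999,
    "data_storage": 0.9999,
    "server_access": 0.99,  # the single weak link
}


def end_to_end_availability(components):
    """Availability of a chain in which every component must be up.

    Assumes failures are independent, so the availabilities multiply.
    """
    return prod(components.values())


def downtime_hours_per_year(availability):
    """Expected hours of downtime in a 365-day year."""
    return (1.0 - availability) * 365 * 24


overall = end_to_end_availability(component_availability)
print(f"End-to-end availability: {overall:.4%}")
print(f"Expected downtime: {downtime_hours_per_year(overall):.1f} hours/year")
```

Under these assumptions the end-to-end figure can never be better than the weakest component, which is why four strong components out of five still leave the client with a service that feels unreliable.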

Building Your Infrastructure
So now that you’ve thought about your audience a bit and you know that reliability is a top concern, how do you decide what to build? Let’s take a look at two architectural scenarios that exist today and what they offer (and don’t offer).

Scenario 1 – Virtualize Existing Products
One of the easiest and fastest ways to replicate technology a client is familiar with, while providing the benefits of the cloud, is to simply take copies of existing software and virtualize them. Virtualization tools allow you to deal with hardware failures, migrations, rollbacks and other common operational issues and can provide some automated redundancy, ensuring the customer’s experience is akin to having the server in their own datacenter (or better). In this manner, each client can maintain their own copy of the software. Fault tolerance can be achieved by moving virtualized containers around in case of failure. Storage can be unified on a SAN or other large storage device and backed up collectively across all clients.

This is perhaps the most common and most popular form of hosting today, but it has a variety of drawbacks that creep in over time. The first and most obvious one is the responsibility to maintain individual copies of software for every client. If these copies get too far out of date, the client may get upset as they will be missing features or security patches and there may be no easy upgrade path. There is usually no trivial way to upgrade all copies of the software across all clients at the same time. This can create complex procedures and upgrade policies that often must be done at off-hours to avoid major service outages. The expenses for these types of operations tend to mount over time and must be rolled into operational costs for the service provider and ultimately passed on to clients.

The most common telecom software products virtualized today as standalone servers for a single customer are Microsoft’s OCS and the open-source Asterisk. Both products can be difficult to manage at scale with many customers but ultimately can provide powerful communications services to clients.

The main drawback of this model is that it moves the operational costs of servers that were really designed to be single-site installs onto the shoulders of a hosting team’s operational staff. This usually doesn’t scale once you’ve hit hundreds or thousands of clients. The lack of scalability and of effective resource utilization (i.e., storage or servers that sit idle) in this approach is something that distributed computing environments attempt to resolve architecturally.

Scenario 2 – Distributed Software Products
Distributed computing is not a new idea, but doing it on commodity, virtualized hardware over the Internet is fairly recent. In telecommunications, virtualization has been seen as difficult to reconcile with the accuracy and quality requirements of carrying audio. But as virtualization gets better at dealing with timing and clock cycles (which are used to ensure quality audio), it is becoming more common and acceptable to virtualize telecom systems.

Newer telecommunications software aims to address operational headaches for providers by assuming all clients will share the same set of resources and that maintenance operations are the responsibility of the software itself. In many cases, redundancy and upgrades can be achieved in place while the software is running. Workloads can tolerate spikes, and security and other protections exist across the entire architecture and can be upgraded or modified in one place. This has the benefit of keeping all users on the same feature set and platform, simplifying support and operational tasks. Usually a cost advantage is gained as well, as each customer does not require their own set of hardware and signing up new clients can be done instantaneously. The disadvantage, however, can be widespread outages or a lack of customization abilities, since hiccups or changes can affect all clients attached to the same architecture.
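One way to picture an in-place upgrade on a shared platform is a rolling upgrade, in which nodes are taken out of service one at a time so the rest keep carrying traffic. The following is a minimal sketch of that general idea, not the mechanism used by any specific product. The node dictionaries, the `rolling_upgrade` function and the `min_in_service` parameter are hypothetical.

```python
def rolling_upgrade(nodes, new_version, min_in_service):
    """Upgrade nodes one at a time while keeping enough capacity online.

    `nodes` is a list of dicts with "name", "version" and "in_service" keys
    (a hypothetical structure used only for this sketch).
    """
    for node in nodes:
        in_service = sum(1 for n in nodes if n["in_service"])
        # Refuse to drain a node if doing so would leave too little capacity.
        if in_service - 1 < min_in_service:
            raise RuntimeError(
                f"cannot drain {node['name']}: only {in_service} nodes in service"
            )
        node["in_service"] = False     # drain: stop sending new calls here
        node["version"] = new_version  # upgrade while the node is idle
        node["in_service"] = True      # return the node to the pool
    return nodes


cluster = [
    {"name": "node-1", "version": "1.0", "in_service": True},
    {"name": "node-2", "version": "1.0", "in_service": True},
    {"name": "node-3", "version": "1.0", "in_service": True},
]
rolling_upgrade(cluster, "1.1", min_in_service=2)
```

The capacity check is the important design choice here: because all clients share the same nodes, the upgrade stops rather than risk the widespread outage described above.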

In addition, the Internet is allowing telecommunications companies to distribute their systems across multiple datacenters, providing strong protection against failures. This strategy also makes scaling easier – when you run out of resources, you just “throw up another box.” These problems are hard to solve on older software, where disk and memory access are assumed to be fast, shared and plentiful. In a distributed environment, computers in one datacenter must be able to operate without knowledge of what is happening in an alternate datacenter, and must be smart enough to automatically request data or circuits from multiple datacenters in the event of a failure while distributing storage and workload across the Internet.
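The failover behavior described above can be sketched as a client that tries each datacenter in turn and uses the first one that answers. This is a simplified illustration under stated assumptions: the datacenter names, the `DatacenterUnavailable` exception and the `make_datacenter` and `fetch_with_failover` helpers are all hypothetical, and in-memory stubs stand in for real network calls.

```python
class DatacenterUnavailable(Exception):
    """Raised when a datacenter cannot serve a request."""


def make_datacenter(records, online=True):
    """Build a stub fetch function standing in for a real datacenter."""
    def fetch(key):
        if not online:
            raise DatacenterUnavailable("datacenter is offline")
        return records[key]
    return fetch


def fetch_with_failover(key, datacenters):
    """Try each (name, fetch) pair in order; return the first success."""
    errors = {}
    for name, fetch in datacenters:
        try:
            return name, fetch(key)
        except DatacenterUnavailable as exc:
            errors[name] = str(exc)  # remember the failure, try the next site
    raise DatacenterUnavailable(f"all datacenters failed for {key!r}: {errors}")


# Hypothetical sites: the first is down, the second holds a replica.
datacenters = [
    ("west", make_datacenter({"account-42": "route-a"}, online=False)),
    ("east", make_datacenter({"account-42": "route-a"})),
]
served_by, value = fetch_with_failover("account-42", datacenters)
print(served_by, value)
```

Note that the second site can answer only because it already holds its own copy of the data, which is the storage distribution the paragraph above calls for.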

In telecom, truly distributed software is still somewhat limited. Companies like Voxeo offer hosted services that can be resold, but to truly manage your own service you’ll need to invest in products from companies like PacWest with their Telastic product or 2600hz with their Whistle platform. These products are designed to break telecommunications software into tiny parts that work together but can be distributed over hardware that is located across the Internet.

Thanks to services from companies like Google, which are highly distributed and rarely go down, users have come to expect distributed software and services as the norm on the Internet. Even a thirty-minute outage can have repercussions across social networks like Twitter that damage a company’s reputation. By investing in distributed VoIP products, telecom companies can gain a strategic advantage in distributing workload and preventing failures by utilizing the power of the global Internet.

Open Standards and Open Source Telecommunications Products
Once seen as hobbyist toys or unreliable systems, open-source products and open-standards applications now quietly power many production applications – from consumer electronics purchased at Best Buy to carrier platforms responsible for switching and routing millions of calls per day. Even the most recent version of the Android mobile operating system quietly introduced a SIP stack into major providers’ devices, and carriers have announced all-IP networks due by 2013.

The major players in the open-source field aim to replace components that once cost thousands or millions of dollars and served specific purposes in telecommunications infrastructures. They include:

  • OpenSIPS / Kamailio – A powerful SBC and/or load balancer that sits at the front of your network to handle VoIP calls
  • FreeSWITCH – A powerful media soft-switch for handling millions of media calls per day on commodity hardware
  • WebRTC – A powerful real-time communication project for enabling web video and media communications
  • 2600hz – A collection of open-source products to serve as platforms for PBX, Voice and Video services
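To make the load balancer role in the list above concrete, the sketch below spreads calls across a pool of media servers by hashing each call’s identifier. It is a generic illustration of hash-based dispatch and is not the configuration or API of OpenSIPS, Kamailio or FreeSWITCH. The server names and the `pick_media_server` function are hypothetical.

```python
import hashlib


def pick_media_server(call_id, servers):
    """Map a call identifier to one server in the pool.

    The same call_id always maps to the same server as long as the
    pool is unchanged, so every message for a call lands in one place.
    """
    if not servers:
        raise ValueError("no media servers available")
    digest = hashlib.sha256(call_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


# Hypothetical pool of media servers sitting behind the load balancer.
media_servers = ["media-1", "media-2", "media-3"]
chosen = pick_media_server("a84b4c76e66710@example.invalid", media_servers)
print(chosen)
```

Simple modulo hashing reshuffles most calls whenever the pool size changes, so a production deployment would likely need a scheme that remaps fewer calls when a server is added or removed.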

There are many other products not listed here, but the field is booming and is finally reaching feature parity with, and the reliability of, carrier-grade switches.

The main thing to note about open-source products is less about their reliability and more about their availability. Rather than needing to invest in a million-dollar soft-switch just to get your product offering started, these open-source projects allow a service provider to get started with just a few virtualized computers running on the Internet. This frees companies to either customize their products further or focus more on building a client base prior to investing in expensive resources. The catch is, of course, that your clients still expect 100% uptime and reliability, which you can’t skimp on just because you’re using open-source products or a limited footprint of hardware. Careful planning is still required to ensure that, no matter what the incident, you’re prepared for the unexpected.

Regardless of the path you choose, now is the time to get in on hosted, virtualized and distributed telecommunications systems. The network is open and clients are ready to experiment with these core services, and once they find a product they like they are generally unwilling to leave for many years.

About 2600hz
Headquartered in San Francisco, California, 2600hz specializes in open-source communications software and cloud computing telephony services. The company is turning 100+ years of hardware-based communications technology into a scalable, open-source VoIP platform that supports hosted PBX services, conferencing environments, call-center environments and open API mash-up technology. With fluency in hosting and managed services, carrier-grade termination, software design and professional services, the 2600hz team is building the next generation of telecom: a platform that supports multiple environments and programming languages. For more information, visit