Scaling Up ARI

In my recent post I shared my excitement about ARI, the new Asterisk API. Today I am going to show a real-life application of this technology: using it as the foundation for integrated call center software.

Requirements

The existing setup spans two floors, each housing 150 agents. The office is currently expanding and will cover 9 floors with 2000 stations operating on a two-shift schedule.

There are inbound and outbound departments, and QAs with live wireless access to ongoing calls as well as call recordings. Calls are often forwarded between departments, but can also be addressed to a specific extension.

The dialer is tightly integrated into a product management platform, which in this case is a financial backoffice for a consumer credit facility. There are approximately 150 TVs displaying live stats which need to be updated as frequently as possible.

Security of ARI

Because of the close proximity of the banking platform and the need to maximize staff performance by keeping data handling overhead to an absolute minimum, the ARI-based service has been implemented as a bridge between the Asterisk clusters and the fintech systems.

ARI’s access to the CMS database has been restricted to make sure no financial information is exposed. It is also worth noting that it is the ARI application that connects to the PBX servers, and not the other way round. This allows for complete firewall isolation of the VoIP endpoints from the in-house infrastructure.
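As a rough illustration of that connection direction, the sketch below builds the WebSocket URLs an ARI application would dial out to, one per PBX node. The `/ari/events` endpoint with its `app` and `api_key` query parameters is part of ARI itself; the host names, application name and credentials are made up for the example, and 8088 is simply Asterisk's default HTTP port.

```python
from urllib.parse import urlencode


def ari_events_url(host, app, user, password, port=8088, tls=False):
    """Build the ARI event WebSocket URL for one PBX node.

    The ARI service opens this connection itself, so the PBX never
    needs to initiate traffic towards the in-house network.
    """
    scheme = "wss" if tls else "ws"
    query = urlencode({"app": app, "api_key": f"{user}:{password}"})
    return f"{scheme}://{host}:{port}/ari/events?{query}"


# Hypothetical node names, application name and credentials.
PBX_NODES = ["pbx-01.internal", "pbx-02.internal"]
urls = [ari_events_url(node, "callcenter", "ari_user", "secret") for node in PBX_NODES]
```

With this layout the firewall only has to allow outbound connections from the ARI hosts to the HTTP port of each PBX node.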

Environmental Factors

It would be meaningless to discuss ARI scalability without looking at the big picture, which in this case includes all of the systems it interacts with.

On one side, there are data sources. Since we’re already using a high-level API, it’s safe to assume most implementations (like ours) will benefit from the ORMs available with their frameworks.

In our scenario, ARI is going to interact with an SQL database to read and write persistent information and with a Redis backend for volatile data (mostly related to agent availability).

SQL scalability is a big subject, and I presented my approach to it in a separate article on risk mitigation networks. For our use case (up to 2000 agents in a single deployment), Redis performance is considered a non-issue.

On the other side, there are Asterisk servers. We have designed and implemented a diskless distribution, where scaling up the PBX cluster is as easy as adding a new server and booting it via PXE. The deployment engine boots a stripped-down Linux from a network image, configures Asterisk on the fly and then re-assigns agents to ensure equal distribution across the whole cluster.
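The re-assignment step can be pictured with a minimal sketch. This is not the deployment engine's actual logic, just a plain round-robin split under the assumption that every agent and every node is interchangeable; the agent and node names are hypothetical.

```python
from collections import Counter


def assign_agents(agents, nodes):
    """Spread agents across PBX nodes so node sizes differ by at most one."""
    if not nodes:
        raise ValueError("at least one PBX node is required")
    # Sorting makes the result deterministic for a given set of agents.
    return {
        agent: nodes[index % len(nodes)]
        for index, agent in enumerate(sorted(agents))
    }


def load_per_node(assignment):
    """Count how many agents ended up on each node."""
    return dict(Counter(assignment.values()))


# Hypothetical identifiers: 2000 agents spread over 25 PBX nodes.
agents = [f"agent-{n:04d}" for n in range(2000)]
nodes = [f"pbx-{n:02d}" for n in range(1, 26)]

assignment = assign_agents(agents, nodes)
load = load_per_node(assignment)

# Booting one more node via PXE simply means re-running the assignment.
rebalanced = load_per_node(assign_agents(agents, nodes + ["pbx-26"]))
```

A real engine would likely also try to minimize how many agents move when a node is added; plain round-robin makes no attempt at that.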

To support multiple SIP servers, we have OpenSIPS dispatching inbound calls across designated nodes. And for a truly integrated environment, all Asterisk nodes need to share their extensions/AORs database by talking to the same SQL backoffice. It is crucial to closely monitor the performance and locks of this database, as its health is essential for smooth operation.

ARI Under the Hood

Our ARI service has been developed using the flexible class structure, which I described in detail in another post. This makes it easier to adjust the actual hardware resource allocation and maximize your ROI.

Also, make sure you do not transcode.

What Scales

If we remove the limit of 2000 agents per deployment (in a single shift), it is quite easy to identify the factors we would have to address to sustain the performance of our operation.

There are two SQL databases in play: our business data storage and the shared PBX backoffice.

The first is much easier to control. From our experience, ARI introduces around 15% extra load on the SQL cluster (assuming a similar level of complexity between ARI and the business layer). You should be able to spot performance issues long before they affect the call flow. In addition, if you implement ARI using any of the high-level SDKs, chances are you will talk to your database using a mature ORM with all its perks, like optimized queries, no unnecessary locks and improved stability.

With the PBX database, on the other hand, you’re on your own. There are no abstraction layers to benefit from and even the smallest drop in performance will immediately affect the calls. From my experience, it’s much better to use a dedicated cluster for this particular database:

  • your CMS systems won’t act like noisy neighbors during peak hours,
  • you can fine tune its performance without affecting any other parts of your setup,
  • it can always reside as close to your actual PBX cluster as possible; there will never be a need to replicate it to another location.

A friendly warning: should you decide to use PgBouncer, pay special attention to its configuration. If you’re careless with buffering, you’re going to have a bad time.

With the databases and Asterisk servers covered, all that’s left is the ARI service itself. At this point, the task is trivial: as the number of PBX servers increases, so does the number of ARI daemons, proportionally.

To benefit from this structure, your CMS must mirror the Agent-to-PBX assignments when sending instructions to ARI. For a given agent, their web interface should only ever interact with the ARI daemon dedicated to their Asterisk server.
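The routing rule above can be sketched as a simple lookup. Everything named here (the `AriDaemon` and `AriRouter` classes, the URLs, the agent IDs) is hypothetical and only meant to show the shape of the mapping, with one ARI daemon per PBX node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AriDaemon:
    pbx_node: str   # the Asterisk server this daemon is dedicated to
    base_url: str   # where the CMS sends instructions for that server


class AriRouter:
    """Resolve which ARI daemon the CMS must use for a given agent."""

    def __init__(self, daemons, agent_to_pbx):
        self._by_node = {daemon.pbx_node: daemon for daemon in daemons}
        self._agent_to_pbx = dict(agent_to_pbx)

    def daemon_for(self, agent_id):
        # Raises KeyError for an unassigned agent instead of guessing a node.
        node = self._agent_to_pbx[agent_id]
        return self._by_node[node]


# Hypothetical deployment with two PBX nodes and their ARI daemons.
daemons = [
    AriDaemon("pbx-01", "http://ari-01.internal:9000"),
    AriDaemon("pbx-02", "http://ari-02.internal:9000"),
]
router = AriRouter(daemons, {"agent-0001": "pbx-01", "agent-0002": "pbx-02"})
target = router.daemon_for("agent-0002")
```

Failing loudly on an unknown agent is a deliberate choice in this sketch: sending an instruction to the wrong daemon would act on a server that does not hold the agent's call.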

Multi Tenants & Multi Locations

With this design we managed to increase the number of SIP servers and ARI daemons from 1 to over 25 and experienced no problems whatsoever. I am confident it could easily be scaled up to hundreds of thousands of extensions used concurrently in a single-tenant / single-deployment setup before further optimization is required. Any improvements will have to do with the databases, as neither ARI nor the Asterisk servers are directly affected by the volume growth.

Our solution has been designed for use in a call center. That said, it wouldn’t be too difficult to apply similar concepts to VoIP setups of a different nature. In my R&D labs, we’ve put together a working model which scales itself up and down automatically (using AWS resources) depending on the current volume of calls. For residential service providers, a similar approach could be used to further reduce their barrier to entry.

But what about other call-center setups, different from the one described in this post? Let’s have a look.

Multitenancy makes things easier, much easier. Each tenant gets their own set of business & PBX databases, smaller in size and therefore easier to manage. The best part is – ARI and Asterisk nodes do not need to be re-deployed or even restarted. All that’s needed is to point them to a different set of SQL backoffices. This can be done in seconds (kill the services, change DB configuration, start the services) and means that agent stations and server resources can be re-assigned easily without any service interruptions.
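To make the "point them to a different set of SQL backoffices" step concrete, here is a minimal sketch. The tenant names, connection strings and configuration keys are all invented for the example; the actual services will have their own configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantBackoffice:
    business_dsn: str  # the tenant's business data storage
    pbx_dsn: str       # the tenant's shared PBX backoffice


# Hypothetical tenants, each with its own pair of databases.
TENANTS = {
    "tenant-a": TenantBackoffice("postgresql://db-a/business", "postgresql://db-a/pbx"),
    "tenant-b": TenantBackoffice("postgresql://db-b/business", "postgresql://db-b/pbx"),
}


def reassign_node(node_config, tenant):
    """Return a copy of a node's config pointing at another tenant's databases."""
    backoffice = TENANTS[tenant]
    return {
        **node_config,
        "tenant": tenant,
        "business_dsn": backoffice.business_dsn,
        "pbx_dsn": backoffice.pbx_dsn,
    }


current = {
    "node": "pbx-07",
    "tenant": "tenant-a",
    "business_dsn": TENANTS["tenant-a"].business_dsn,
    "pbx_dsn": TENANTS["tenant-a"].pbx_dsn,
}
moved = reassign_node(current, "tenant-b")
```

Only the database settings change; everything else about the node stays as it was, which is what keeps the switch down to stopping the services, writing the new configuration and starting them again.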

For a truly distributed application, it is recommended to host the VoIP servers in a remote dedicated location, which can then support multiple satellite offices of the same company.
