Amruta Agnihotri, Software Architect

WebRTC media servers are at the core of many of today's online collaboration and conferencing platforms. They are the multimedia middleware that carries video, voice, and generic data between peers, allowing developers to build powerful voice and video communication applications. Kurento Media Server (KMS) is one such open-source media server, supporting a wide range of real-time communication features.

Online collaboration has become an essential part of everyday life, making the quality of collaboration platforms and applications a critical consideration. One important measure of a platform's quality is its performance under load and at peak usage. For smooth, uninterrupted audio/video sessions under heavy load, the scalability of Kurento Media Server plays an important role.


Scalability and load balancing for standard web applications backed by RESTful server endpoints are usually managed based on the following parameters:

Minimum and maximum number of instances to scale between

Average CPU utilization that triggers scaling up or down

Average network in/out that triggers scaling up or down

Average memory usage

Health of the instances
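For contrast, the rule a generic metric-driven autoscaler applies can be sketched in a few lines. This is a minimal illustration, not any cloud provider's actual policy engine: `ScalingPolicy`, `desired_instances` and the threshold values are hypothetical, and network or memory thresholds would be handled the same way as CPU. Note that nothing in it knows which meetings are running on an instance.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # Hypothetical values for the standard parameters listed above.
    min_instances: int = 2
    max_instances: int = 10
    scale_out_cpu: float = 70.0  # average CPU % above which an instance is added
    scale_in_cpu: float = 30.0   # average CPU % below which an instance is removed


def desired_instances(policy: ScalingPolicy, current: int, avg_cpu: float) -> int:
    """Instance count a generic, metric-only autoscaler would aim for."""
    if avg_cpu > policy.scale_out_cpu:
        target = current + 1
    elif avg_cpu < policy.scale_in_cpu:
        target = current - 1
    else:
        target = current
    # Clamp the result to the configured minimum and maximum.
    return max(policy.min_instances, min(policy.max_instances, target))
```

With these defaults, a fleet of three instances at 10% average CPU would shrink to two, even if one of them is hosting a live meeting.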

However, these parameters don’t provide adequate support for load balancing media servers. The most important and challenging considerations missing from this list relate to the context of the collaboration meetings themselves. Standard, readily available load-balancing approaches cannot take into account data about ongoing meetings, the geographic locations of users joining them, the health of the media service, and so on. This creates a need for custom mechanisms to handle the horizontal scalability of KMS.

A typical auto-scaling KMS deployment would look like this, and it would need the custom mechanisms discussed below to address these challenges.



1. Number of meetings in progress

It may be tempting to think that an instance's CPU utilization is proportional to the number of sessions hosted on it, but we need to be cautious. The relationship between the number of meetings and resource utilization depends on the instance type and hardware configuration, as well as on the features used by the meetings in progress. For example, users who actually publish audio/video in a meeting affect CPU utilization far more than users who only subscribe. Thresholds for scaling out need to be defined based on a thorough understanding of use cases and user behavior.

Similarly, CPU utilization dropping below a certain threshold does not automatically mean that the instance can be scaled in, because some meetings in progress may need additional resources later.
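One way to act on this is to estimate load from the meeting mix rather than the raw meeting count, and to refuse scale-in while meetings are still running. The sketch below assumes a hypothetical weighting in which one publisher costs as much as five subscribers; the weights, the capacity figure and all names are placeholders that would have to come from load testing.

```python
from dataclasses import dataclass, field

# Hypothetical weights; real values depend on instance type and features used.
PUBLISHER_WEIGHT = 5.0
SUBSCRIBER_WEIGHT = 1.0


@dataclass
class Meeting:
    publishers: int   # participants sending audio/video
    subscribers: int  # participants only receiving streams


@dataclass
class KmsInstance:
    name: str
    meetings: list[Meeting] = field(default_factory=list)


def load_units(instance: KmsInstance) -> float:
    """Estimate load from the meeting mix instead of the raw meeting count."""
    return sum(PUBLISHER_WEIGHT * m.publishers + SUBSCRIBER_WEIGHT * m.subscribers
               for m in instance.meetings)


def needs_scale_out(instances: list[KmsInstance], capacity_units: float,
                    threshold: float = 0.8) -> bool:
    """Scale out when every instance uses at least `threshold` of its capacity."""
    return all(load_units(i) >= threshold * capacity_units for i in instances)


def can_scale_in(instance: KmsInstance) -> bool:
    """Low CPU alone is not enough: only instances with no meetings are removed."""
    return not instance.meetings
```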

2. Number of participants in each meeting

The number of participants in each session contributes directly to the instance's total resource utilization. When deciding thresholds, the number of participants in each session plays an important role alongside the total number of sessions. This parameter is challenging to handle because the system may not know participant counts before allocating resources, and may need to cope with additional demand as people join.
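Because the final size of a meeting is often unknown at allocation time, an allocator can reserve headroom for late joiners. This sketch assumes each instance reports a hypothetical count of free participant slots and uses a best-fit choice; the 50% headroom is a placeholder, and returning `None` is the signal to scale out.

```python
import math
from typing import Optional


def pick_instance(free_capacity: dict[str, int], expected_participants: int,
                  headroom: float = 0.5) -> Optional[str]:
    """Choose an instance for a new meeting, reserving room for late joiners.

    free_capacity maps instance name -> participant slots still available.
    The meeting reserves expected_participants * (1 + headroom) slots.
    Returns None when no instance fits, which should trigger a scale-out.
    """
    needed = math.ceil(expected_participants * (1 + headroom))
    candidates = [(free, name) for name, free in free_capacity.items() if free >= needed]
    if not candidates:
        return None
    # Best fit: the instance with the least spare room that still fits,
    # which keeps larger gaps available for bigger meetings.
    return min(candidates)[1]
```

Best fit packs meetings tightly, which also leaves more instances empty and therefore eligible for scale-in.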

3. Geographic locations of meeting participants

Participants' locations can affect the latency introduced into audio/video streams, and thereby the quality of the session. Advanced scaling algorithms also take this into account when allocating media servers and other resources to a session.
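As a rough illustration, a region can be chosen by minimizing the average great-circle (haversine) distance to the participants. Distance is only a proxy for network latency, and measured round-trip times would be more accurate; the region names and coordinates used with it are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def best_region(regions: dict[str, tuple[float, float]],
                participants: list[tuple[float, float]]) -> str:
    """Pick the region whose media servers are, on average, closest to participants."""
    def avg_distance(region: str) -> float:
        return sum(haversine_km(regions[region], p) for p in participants) / len(participants)
    return min(regions, key=avg_distance)
```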

4. Health check of the media server

In addition to the health of the underlying hardware and software packages, which standard load-balancing algorithms already monitor, it is possible to monitor the health of the services offered by the media server itself. This helps ensure that allocated servers are fully functional and deliver optimal performance.
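KMS is controlled through a JSON-RPC API over WebSocket (by default at `ws://<host>:8888/kurento`), and the Kurento Protocol describes a `ping` keep-alive request that the server answers with `pong`. The sketch below uses that exchange to check the media service itself rather than just the VM. The exact message shape is an assumption to verify against the KMS version in use, and `send` is a hypothetical transport hook so the example stays independent of any particular WebSocket library. A deeper probe could also create and release a media pipeline.

```python
import json
from typing import Callable


def build_ping(request_id: int, interval_ms: int = 240000) -> str:
    """JSON-RPC 2.0 'ping' request (shape assumed from the Kurento keep-alive)."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id,
                       "method": "ping", "params": {"interval": interval_ms}})


def is_pong(raw_response: str, request_id: int) -> bool:
    """True if the reply is a well-formed 'pong' result for this request id."""
    try:
        msg = json.loads(raw_response)
    except json.JSONDecodeError:
        return False
    return (isinstance(msg, dict) and msg.get("id") == request_id
            and isinstance(msg.get("result"), dict)
            and msg["result"].get("value") == "pong")


def healthy_servers(servers: list[str], send: Callable[[str, str], str]) -> list[str]:
    """Keep only servers whose media service answers the ping.

    `send(url, message)` is a hypothetical hook that sends one message over the
    server's WebSocket and returns the reply text; any exception it raises
    (refused connection, timeout, ...) marks the server as unhealthy.
    """
    healthy = []
    for request_id, url in enumerate(servers, start=1):
        try:
            if is_pong(send(url, build_ping(request_id)), request_id):
                healthy.append(url)
        except Exception:
            pass  # treat any transport failure as a failed health check
    return healthy
```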

5. Cost implications of scaling in and out decisions

Standard scale-out policies can incur higher costs, since we may not always control how many new instances are spawned at a time. Custom algorithms allow better cost control because scale-in and scale-out decisions can be aligned with the pricing policies of the underlying cloud service.
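As one example of aligning scale-in with pricing, the sketch below assumes a hypothetical provider that bills per started hour: an idle instance is kept until close to the end of the hour it has already paid for, since terminating it earlier would save nothing. Billing granularity differs between providers and instance types, so `BILLING_PERIOD_S` and `SCALE_IN_WINDOW_S` are placeholders.

```python
from dataclasses import dataclass

BILLING_PERIOD_S = 3600   # hypothetical: billed per started hour
SCALE_IN_WINDOW_S = 300   # only terminate within the last 5 minutes of a period


@dataclass
class Instance:
    name: str
    uptime_s: int
    active_meetings: int


def instances_to_terminate(instances: list[Instance], min_instances: int) -> list[str]:
    """Pick idle instances whose already-paid billing period is nearly used up."""
    removable = []
    remaining_fleet = len(instances)
    for inst in instances:
        if remaining_fleet <= min_instances:
            break  # never shrink below the configured minimum
        seconds_left = BILLING_PERIOD_S - (inst.uptime_s % BILLING_PERIOD_S)
        if inst.active_meetings == 0 and seconds_left <= SCALE_IN_WINDOW_S:
            removable.append(inst.name)
            remaining_fleet -= 1
    return removable
```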

Along with in-depth knowledge of Kurento Media Server and of the architectural alternatives offered by cloud services such as AWS, Azure and GCP, developers also need rigorous load-testing capabilities. Load testing is required to determine and validate thresholds for the parameters under consideration, to benchmark the numbers, and to verify that the scale-in and scale-out algorithms work well while ensuring that media quality is not compromised under load.
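Load-test results can then be turned into a scaling threshold. The sketch assumes each test run records the applied load, the average CPU on the KMS instance and a client-side packet-loss figure as the media-quality indicator; these fields and names are hypothetical. It returns the highest load level at which both limits held, stopping at the first failing level.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoadTestSample:
    participants: int            # load level applied during the test run
    cpu_percent: float           # average CPU observed on the KMS instance
    packet_loss_percent: float   # media-quality indicator measured at the clients


def safe_capacity(samples: list[LoadTestSample], max_cpu: float,
                  max_loss: float) -> Optional[int]:
    """Highest tested load at which both CPU and media quality stayed within limits.

    Levels above the first failing one are ignored, so a lucky run at a higher
    load cannot raise the threshold past a known failure.
    """
    best = None
    for sample in sorted(samples, key=lambda x: x.participants):
        if sample.cpu_percent > max_cpu or sample.packet_loss_percent > max_loss:
            break
        best = sample.participants
    return best
```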