Proof of Concept about Stream Analytics for Churn Prediction for Operators
Churning (customers leaving a company for a competitor) is one of the most pressing problems faced by mobile phone operators because of its direct impact on revenue and market share.
Churn prediction systems attempt to identify subscribers with a sufficiently high probability of leaving in the near future so that, taking their predicted business value into account, the operator can decide which retention actions (if any) are justified. This problem can be addressed, among other ways, with well-established machine learning techniques: using historical data (such as users' past activity and whether and when they later churned), machine learning methods “learn” so-called predictive models, which can then be used to assign a churn probability to any given user in the future.
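The decision step described above, combining a churn probability with a predicted business value, can be illustrated with a simple expected-value rule. The following Python sketch is our own assumption, not the logic of the PoC: the offer cost, the probability that a retention offer succeeds, and the `Subscriber` fields are hypothetical parameters chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Subscriber:
    subscriber_id: str
    churn_probability: float  # output of some predictive model, in [0, 1]
    predicted_value: float    # hypothetical: predicted business value if the subscriber stays


def expected_gain(sub: Subscriber, offer_cost: float, offer_success_rate: float) -> float:
    """Expected net gain of a retention offer under a simple hypothetical model.

    Assumes the offer only changes the outcome for subscribers who would otherwise
    churn, retains them with probability offer_success_rate, and always costs offer_cost.
    """
    saved_value = sub.churn_probability * offer_success_rate * sub.predicted_value
    return saved_value - offer_cost


def select_targets(subscribers, offer_cost: float, offer_success_rate: float):
    """Ids of subscribers whose offer has positive expected gain, highest gain first."""
    scored = sorted(
        ((expected_gain(s, offer_cost, offer_success_rate), s.subscriber_id) for s in subscribers),
        reverse=True,
    )
    return [sid for gain, sid in scored if gain > 0]


subscribers = [
    Subscriber("A", churn_probability=0.8, predicted_value=300.0),
    Subscriber("B", churn_probability=0.9, predicted_value=20.0),
    Subscriber("C", churn_probability=0.1, predicted_value=500.0),
]
print(select_targets(subscribers, offer_cost=30.0, offer_success_rate=0.5))  # ['A']
```

With these made-up numbers, B is the most likely to churn but is worth too little to justify the offer, and C is valuable but unlikely to leave; only A combines both, which is why the churn probability alone is not enough to decide on an action.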
However, the telecommunications market is highly and unpredictably dynamic. Subscribers can choose among multiple service providers and actively exercise their right to switch from one provider to another. The ability to detect such a trend immediately, even in a small segment of subscribers, may be essential to retaining many more subscribers who would otherwise leave in the near future. Moreover, the reasons why subscribers churn may change suddenly. For many reasons, often hard to define a priori, subscribers who were not considering leaving may suddenly find another company very compelling and decide to leave overnight.
Ericsson Research, in collaboration with a UPC Barcelona Tech team led by Prof. Ricard Gavaldà, has developed a Proof of Concept (PoC) showing how Stream Analytics technology can be used for churn prediction by telecom operators. The PoC aims to build a real-time, fully adaptive solution for churn detection. Instead of building the predictive models offline and applying them online, it constructs and updates them continuously and online. Stream mining enables near-real-time identification of, and adaptation to, emerging subscriber-behavior patterns. Compared with solutions where such patterns are revised offline or periodically, it has the potential to prevent a large number of subscribers from leaving as a sudden, unpredictable reaction to, e.g., competitor moves, publicity campaigns, or billing changes.
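To make the contrast between offline and continuous model construction concrete, here is a minimal Python sketch of an online learner evaluated with the usual "test-then-train" (prequential) scheme: each labeled record is first used to test the current model and then to update it. Logistic regression trained by stochastic gradient descent is our choice for illustration only; the post does not say which learner the PoC uses, and the single feature, the learning rate, and the synthetic stream are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogisticRegression:
    """Logistic regression updated with one SGD step per labeled example."""

    def __init__(self, n_features: int, learning_rate: float = 0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return sigmoid(z)

    def update(self, x, y: int):
        # Gradient of the log-loss for a single example is (p - y) * x.
        error = self.predict_proba(x) - y
        self.weights = [w - self.learning_rate * error * xi for w, xi in zip(self.weights, x)]
        self.bias -= self.learning_rate * error


def prequential_accuracy(model, stream):
    """Test-then-train: predict each example before learning from its label."""
    correct = total = 0
    for x, y in stream:
        predicted = int(model.predict_proba(x) >= 0.5)
        correct += int(predicted == y)
        total += 1
        model.update(x, y)
    return correct / total if total else 0.0


# Hypothetical stream: one normalized feature (e.g. complaints this month),
# labeled 1 (churned) when the feature exceeds 0.5.
rng = random.Random(42)
stream = [([c], int(c > 0.5)) for c in (rng.random() for _ in range(5000))]
model = OnlineLogisticRegression(n_features=1, learning_rate=0.5)
print(f"prequential accuracy: {prequential_accuracy(model, stream):.3f}")
```

Because the model never stops learning, there is no separate "training phase": the same loop that scores incoming records keeps refining the model as labels arrive.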
Architecture of the proposed platform
The generic architecture for churn prediction based on Stream Analytics technology is presented in Figure 1. The main components are described next:

Figure 1: Architecture and Design of a Platform for Adaptive, Real-time Churn Prediction using Stream Mining
- The system takes as input a number of streams carrying various kinds of simulated information: for example, a stream of call records, or a stream of billing actions by the company and bill payments by the subscribers. All the streams concerning a particular subscriber are integrated into a single logical stream of items, each of which is called here an Event. There are several types of Events, including at least: a subscriber joining the company, calls and SMSes, complaints, bills issued and bills paid, tweets by a subscriber, and “churn” Events (e.g., a user has explicitly left the company or, for a prepaid user, has been declared a churner according to the company's criteria).
- The customer information database contains basic information about our subscribers (age, address, type of contract, etc.) as well as highly dynamic information (e.g., the last numbers called and the most frequently called numbers).
- The Record Generator receives the stream of Events produced by the integrator and uses it for two purposes. First, it updates the customer information database according to each Event. Second, it generates one or more Records from each Event, also using information from the customer information database. It thus creates a stream of Records that is passed downstream. A Record is a vector of features: the first is a subscriber identifier, and the rest contain all the information about that subscriber's state that is considered relevant for prediction. For instance, “number of calls this week”, “average call duration”, or “the subscriber defaulted and returned 2 times in the past” may be such features.
- The Record Processor is the heart of the system: it builds, maintains, and applies the prediction models. When a record not indicating churn arrives, it passes the record through the current model and makes a churn prediction for it. The record with its prediction is queued into a PendingPredictions queue. When a record indicating churn arrives, records for that subscriber are searched in the PendingPredictions queue and, if found, passed to the model builder as positive instances of churning. Expired records in PendingPredictions (corresponding to subscribers that did not churn within a specified time) are passed to the model builder as negative instances of churn. All records (describing current states of subscribers) are passed to the clustering method to build subscriber profiles.
- The predicted churners and their current profiles are also passed to the user interface or other parts of the customer management system so that adequate actions can be assessed and taken.
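The path from source streams to Events to Records can be sketched in Python as follows. This is a simplification under our own assumptions, not the PoC's code: the Event types, the field names, the in-memory dictionary standing in for the customer information database, and the three features are all hypothetical, chosen to mirror the examples in the list. Merging time-ordered source streams with `heapq.merge` is just one simple way to integrate them, and this sketch emits exactly one Record per Event.

```python
import heapq
from collections import Counter, deque
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    # Events are ordered by timestamp only; the other fields are excluded from comparisons.
    timestamp: float
    subscriber_id: str = field(compare=False)
    kind: str = field(compare=False)  # hypothetical types: "join", "call", "bill_paid", "churn", ...
    payload: dict = field(compare=False, default_factory=dict)


@dataclass
class CustomerInfo:
    # Hypothetical stand-in for one entry of the customer information database.
    contract_type: str = "unknown"                                        # basic information
    last_numbers: deque = field(default_factory=lambda: deque(maxlen=5))  # dynamic: last numbers called
    number_counts: Counter = field(default_factory=Counter)               # dynamic: calls per number
    calls: int = 0
    total_call_seconds: float = 0.0


@dataclass
class Record:
    subscriber_id: str       # first feature: the subscriber identifier
    calls: int               # e.g. "number of calls"
    avg_call_seconds: float  # e.g. "average call duration"
    is_churn: bool           # True if the Event was a churn Event


def integrate(*streams):
    """Merge source streams (each already sorted by time) into one Event stream."""
    return heapq.merge(*streams)


class RecordGenerator:
    def __init__(self):
        self.db = {}  # subscriber_id -> CustomerInfo (in-memory stand-in for the database)

    def process(self, event: Event) -> Record:
        info = self.db.setdefault(event.subscriber_id, CustomerInfo())
        # 1) Update the customer information according to the Event.
        if event.kind == "join":
            info.contract_type = event.payload.get("contract", "unknown")
        elif event.kind == "call":
            callee = event.payload["callee"]
            info.calls += 1
            info.total_call_seconds += event.payload["seconds"]
            info.last_numbers.append(callee)
            info.number_counts[callee] += 1
        # 2) Emit a Record built from the updated customer information.
        avg = info.total_call_seconds / info.calls if info.calls else 0.0
        return Record(event.subscriber_id, info.calls, avg, event.kind == "churn")


calls = [Event(1.0, "s1", "call", {"callee": "555-0101", "seconds": 60}),
         Event(3.0, "s1", "call", {"callee": "555-0102", "seconds": 120})]
joins = [Event(0.0, "s1", "join", {"contract": "prepaid"})]
generator = RecordGenerator()
records = [generator.process(e) for e in integrate(calls, joins)]
print(records[-1])  # Record(subscriber_id='s1', calls=2, avg_call_seconds=90.0, is_churn=False)
```

Keeping the database update and the Record construction in the same step means every Record reflects the subscriber's state right after the Event that produced it.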
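The Record Processor's labeling logic with its PendingPredictions queue can be sketched in Python as below. This is a minimal illustration under our own assumptions, not the PoC's implementation: the model is any object with hypothetical `predict_proba(features)` and `update(features, label)` methods, records arrive in timestamp order, the expiry `horizon` is a parameter we introduce, and the clustering step is omitted.

```python
from collections import deque
from typing import Optional, Protocol, Sequence


class Model(Protocol):
    # Hypothetical interface: any incremental learner with these two methods.
    def predict_proba(self, features: Sequence[float]) -> float: ...
    def update(self, features: Sequence[float], label: int) -> None: ...


class RecordProcessor:
    def __init__(self, model: Model, horizon: float):
        self.model = model
        self.horizon = horizon  # how long to wait for a churn before labeling a record "no churn"
        # PendingPredictions: (timestamp, subscriber_id, features), oldest first.
        # Assumes records arrive in timestamp order.
        self.pending = deque()

    def _expire(self, now: float) -> None:
        # Subscribers that did not churn within the horizon: negative instances.
        while self.pending and now - self.pending[0][0] > self.horizon:
            _, _, features = self.pending.popleft()
            self.model.update(features, 0)

    def process(self, subscriber_id: str, timestamp: float,
                features: Sequence[float], is_churn: bool) -> Optional[float]:
        self._expire(timestamp)
        if is_churn:
            # Pending records of the churned subscriber become positive instances.
            remaining = deque()
            for item in self.pending:
                if item[1] == subscriber_id:
                    self.model.update(item[2], 1)
                else:
                    remaining.append(item)
            self.pending = remaining
            return None
        # Non-churn record: predict with the current model and queue the prediction.
        probability = self.model.predict_proba(features)
        self.pending.append((timestamp, subscriber_id, features))
        return probability


class RecordingModel:
    """Stub model for the example: constant prediction, remembers its training labels."""

    def __init__(self):
        self.updates = []

    def predict_proba(self, features):
        return 0.5

    def update(self, features, label):
        self.updates.append((tuple(features), label))


model = RecordingModel()
processor = RecordProcessor(model, horizon=30.0)
processor.process("s1", 0.0, [1.0], False)
processor.process("s2", 1.0, [2.0], False)
processor.process("s1", 10.0, [3.0], False)
processor.process("s1", 20.0, [], True)      # s1 churns: its two pending records become positives
processor.process("s3", 40.0, [4.0], False)  # s2's record (t=1.0) has expired: a negative
print(model.updates)  # [((1.0,), 1), ((3.0,), 1), ((2.0,), 0)]
```

The key design point is that labels are never known when a prediction is made; they are assigned later, either by a churn Event or by the passage of time, and only then fed back to the model builder.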
The accompanying video explains all the details and shows how to run the demo; during the video, you will be asked to open the accompanying slides.
What the demo shows is how Stream Analytics technology is able to recover the accuracy of the churn prediction model by reacting immediately to changes that may occur in the distribution of the input data (e.g., changes in competition, deficiencies in customer service, etc.). This happens because the predictor is learning from recent experience, and changing the prediction rules to make them accurate again in the new situation.
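One common way for a stream learner to notice that its accuracy has dropped, and that it should rebuild its model from recent data, is an error-rate drift detector. The Python sketch below implements the Drift Detection Method (DDM) of Gama et al. as an illustration of the idea; the post does not say that the PoC uses DDM. The warm-up length of 30 and the 2σ/3σ thresholds follow the usual DDM formulation, and the simulated error stream is hypothetical.

```python
import math


class DDM:
    """Drift Detection Method (Gama et al., 2004) on a stream of 0/1 prediction errors."""

    def __init__(self, min_instances: int = 30, warning_level: float = 2.0, drift_level: float = 3.0):
        self.min_instances = min_instances
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def add(self, error: int) -> str:
        """Feed 1 if the model's prediction was wrong, 0 if right; return the state."""
        self.n += 1
        self.errors += error
        p = self.errors / self.n                 # running error rate
        s = math.sqrt(p * (1 - p) / self.n)      # its standard deviation
        if self.n < self.min_instances:
            return "stable"
        if p + s < self.p_min + self.s_min:      # remember the best point seen so far
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()                         # start monitoring the new concept
            return "drift"
        if p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


# Hypothetical error stream: 10% errors for 1000 predictions, then 50% after
# subscriber behavior changes (e.g. a competitor campaign).
errors = [int(i % 10 == 0) for i in range(1000)] + [int(i % 2 == 0) for i in range(1000)]
detector = DDM()
states = [detector.add(e) for e in errors]
print("drift detected at prediction", states.index("drift"))
```

When `add` returns `"drift"`, a stream learner would typically discard or retrain its model on the most recent records; in this synthetic example the drift is flagged shortly after the change, within the first 200 post-change predictions.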
-- David Manzano Macho, Ericsson Research
-- Ricard Gavaldà, UPC Barcelona Tech.



Comments
> This happens because the predictor is learning from recent experience, and changing the prediction rules to make them accurate again in the new situation.
How do you guarantee that recent experience is the correct one to follow/learn from?
Machine learning approaches have their inherent defects in terms of accuracy (especially when the data is not clean).
Great demo. I'd like to know which algorithms you're using.