ACCUVER | leading wireless test and measurement solutions

[White Papers] AI-powered RCA

Jun 05, 2024

Table of contents

1. Introduction

2. What is AI Powered RCA?

3. Technology applied to AI Model

4. Learning and evaluation

5. XCAP-Cloud with AI Powered RCA

6. Use Cases

7. Future Directions

*Under R&D collaboration with Korean MNOs

Introduction

With the rapid development of wireless communication technology, data transmission speeds are rising and ever more kinds of devices are being connected, so the communication environment is becoming more diverse and complex. Accurate and rapid response to the communication system failures that result from this diversity and complexity is essential. To address these market demands, we provide automated wireless network optimization testing solutions as well as logic-based RCA solutions that identify the causes of various defects in mobile communication networks and suggest appropriate remedies.

Our logic-based RCA solution uses wireless protocol transmission/reception information and terminal status information to identify and resolve problems accurately through structured data and rule-based analysis. However, in today's advanced communication systems, the parameter settings for each analysis rule are complex, and it is difficult to account for the characteristics of each field situation, which limits how quickly problems can be addressed.

To overcome these limitations, we developed a machine learning-based RCA solution. It applies machine learning to learn the subtle differences in the network environment hidden in the vast amount of data collected from mobile communication networks. As a result, large-scale data can be analyzed and diagnosed quickly on the basis of the data itself, without relying on individual subjectivity. This is expected to contribute to improving the stability of communication systems.

What is AI Powered RCA?

Our solution is a machine learning-based RCA solution. Using our automated testing solution, root causes are labeled on raw data obtained from network access failure and service interruption log samples. The training dataset contains the network's signal level and quality indicators, as well as performance metrics such as data throughput, latency, and packet loss rate for each layer.

Figure1. AI RCA concept diagram

This training dataset consists of approximately 1 million log records, including network issues that occurred in various environments. The data is collected through a variety of methods, including field testing, simulation testing, and laboratory testing, to reflect the complexity of network problems. Key indicators are strongly correlated with each other, which makes root causes hard to separate. To address this, our model minimizes similarities between data characteristics and learns each root cause individually to enable accurate classification and understanding.

Machine learning is also capable of continuous learning and improvement. This means our solution can keep optimizing and improving its models to respond to new challenges that arise during real-world operation, which is a great advantage for maintaining reliability in a rapidly changing communications environment.

In addition, AI's automated decision-making capabilities help process and diagnose large amounts of data quickly. Our model supports efficient and accurate problem solving in communication systems while minimizing individual subjectivity through data-based judgment.

These technical advantages allow our solution to apply the analytical power of machine learning to improve network reliability, availability, and performance.

Figure2. Examples of data set

Technology applied to AI Model

We adopted a gradient boosting ensemble model using XGBoost and developed a powerful tool for effective root cause diagnosis. XGBoost is applicable to both classification and regression problems and is particularly characterized by excellent performance for a variety of data sizes and types.

Figure3. XGBoost concept diagram

Because XGBoost performs well across a variety of data sizes and types, it suits the large amounts of network issue data we collect from a variety of environments. The model learns useful patterns from this data and can effectively identify root causes. XGBoost also uses parallel processing and optimized data structures to provide fast training and prediction, which is a major advantage when responding quickly to the various problems that occur in large-scale networks.
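To make the idea of gradient boosting concrete, the following is a minimal sketch written from scratch in plain Python. It is not XGBoost and not our production model: it fits one-level trees (stumps) to the residuals of a squared-error loss, whereas XGBoost adds regularization, second-order gradient information, and parallelized tree construction. The two features (RSRP and SINR), the sample values, and all function names are assumptions made for illustration.

```python
from typing import List, Tuple

# (feature index, threshold, value if feature <= threshold, value otherwise)
Stump = Tuple[int, float, float, float]


def fit_stump(X: List[List[float]], residuals: List[float]) -> Stump:
    """Pick the one-level split that minimizes squared error on the residuals."""
    mean = sum(residuals) / len(residuals)
    best: Stump = (0, float("inf"), mean, mean)  # fallback: no split, predict the mean
    best_err = sum((r - mean) ** 2 for r in residuals)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not right:  # splitting at the maximum value separates nothing
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if err < best_err:
                best, best_err = (j, t, lv, rv), err
    return best


def stump_value(stump: Stump, row: List[float]) -> float:
    j, t, lv, rv = stump
    return lv if row[j] <= t else rv


def fit_boosting(X, y, n_rounds: int = 20, learning_rate: float = 0.3):
    """Each round fits a stump to the current residuals and adds a damped copy of it."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_value(stump, row) for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def predict(model, row: List[float]) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_value(s, row) for s in stumps)


# Illustrative samples: [RSRP in dBm, SINR in dB]; label 1 = failure, 0 = normal.
X = [[-110.0, 2.0], [-105.0, 5.0], [-90.0, 15.0], [-85.0, 20.0]]
y = [1.0, 1.0, 0.0, 0.0]
model = fit_boosting(X, y)
print(predict(model, [-108.0, 3.0]), predict(model, [-88.0, 18.0]))
```

Each round corrects part of the error left by the previous rounds, which is why an ensemble of very weak learners can end up with strong predictive ability.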

For root cause diagnosis, features are prioritized and a separate XGBoost model is created according to each priority. This increases the interpretability of the model and gives a clear picture of the characteristics of each root cause. Through these feature priorities, the model learns how important each feature is for each root cause, which helps it cope with the complexity of the problem.

In this way, our model using XGBoost combines strong prediction ability with fast response speed to achieve efficient root cause diagnosis.

Learning and evaluation process

Figure 4 shows the learning and evaluation process. First, the data is cleaned through preprocessing. Duplicate data is removed, and outliers and missing data are replaced with the most prevalent value in each data set. Finally, to address class imbalance, the data is balanced by down-sampling based on Euclidean distance and up-sampling with SMOTE.
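The cleaning steps described above can be sketched as follows. This is a minimal illustration in plain Python, not our production pipeline: the two columns (RSRP and SINR), the validity bounds used to flag outliers, and the function names are assumptions.

```python
from collections import Counter

# Hypothetical validity bounds per column: (RSRP in dBm, SINR in dB).
BOUNDS = [(-140, -44), (-20, 40)]


def is_valid(col, value):
    """A value is usable if it is present and inside the assumed bounds."""
    low, high = BOUNDS[col]
    return value is not None and low <= value <= high


def remove_duplicates(rows):
    """Drop exact duplicate rows while keeping the original order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(list(row))
    return unique


def impute_with_mode(rows):
    """Replace missing values and outliers with the most prevalent valid value per column."""
    modes = []
    for col in range(len(rows[0])):
        valid = [row[col] for row in rows if is_valid(col, row[col])]
        modes.append(Counter(valid).most_common(1)[0][0] if valid else None)
    return [
        [v if is_valid(col, v) else modes[col] for col, v in enumerate(row)]
        for row in rows
    ]


raw = [[-100, 10], [-100, 10], [-95, None], [-100, 12], [999, 10]]
cleaned = impute_with_mode(remove_duplicates(raw))
print(cleaned)  # [[-100, 10], [-95, 10], [-100, 12], [-100, 10]]
```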

In the Euclidean distance calculation method, the center point of the distribution of the root cause label is used as a reference, and the distance from that center to each point that does not carry the root cause label is computed one by one. Points judged to be too far from the distribution of the root cause label are removed, leaving only the points that lie as close to the boundary as possible. The SMOTE method synthesizes new minority class samples by interpolating between an existing minority class sample and its nearest minority class neighbors. This increases the number of minority class samples, helping the model learn and recognize minority classes better.
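Both balancing steps can be illustrated with a short sketch in plain Python. It rests on some assumptions: "too far" is expressed here as keeping a fixed number of nearest points (`keep`), and the SMOTE step is the basic interpolation scheme rather than any particular library implementation. The function names and sample points are hypothetical.

```python
import math
import random


def centroid(points):
    """Center point of a set of feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def downsample_by_distance(target_points, other_points, keep):
    """Keep the `keep` non-target points closest to the center of the target label."""
    center = centroid(target_points)
    ranked = sorted(other_points, key=lambda p: math.dist(p, center))
    return ranked[:keep]


def smote(minority, n_new, k=2, rng=None):
    """Create n_new synthetic minority samples by interpolating toward near neighbors."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


target = [[0.0, 0.0], [2.0, 0.0]]  # samples with the root cause label
others = [[1.0, 1.0], [5.0, 5.0], [1.5, 0.5], [10.0, 0.0]]
kept = downsample_by_distance(target, others, keep=2)
print(kept)  # [[1.5, 0.5], [1.0, 1.0]]

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_samples = smote(minority, n_new=5)
print(len(new_samples))  # 5
```

Because every synthetic sample lies on a segment between two real minority samples, the up-sampled class stays inside the region the minority class already occupies.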

Figure4. Training process and evaluation process

After preprocessing, an XGBoost model is created for each priority, then trained and validated. During validation, grid search is used to find the most appropriate hyperparameters for the model. When a test data set is fed into the trained RCA solution, the root cause label predicted by the model is compared with the actual root cause label, and a prediction counts as a success when the two match.

The evaluation results are as follows. Accuracy was derived as the average accuracy value of each model.
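Grid search itself is simple to sketch: every combination of candidate hyperparameter values is tried, and the combination with the best validation score is kept. In the example below, `max_depth` and `learning_rate` are real XGBoost hyperparameter names, but the candidate values and the scoring function are placeholders. In practice the scoring function would train a model with the given parameters and return its validation accuracy.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid and return the best (params, score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_score(params):
    """Stand-in for 'train a model and measure validation accuracy' (not real data)."""
    return (
        0.9
        - 0.01 * abs(params["max_depth"] - 6)
        - 0.1 * abs(params["learning_rate"] - 0.1)
    )


param_grid = {"max_depth": [4, 6, 8], "learning_rate": [0.05, 0.1, 0.3]}
best_params, best_score = grid_search(param_grid, toy_validation_score)
print(best_params)  # {'max_depth': 6, 'learning_rate': 0.1}
```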

	Total count	Success count	Fail count	Accuracy
5G-NR PS	27365	24243	3122	88.59%
5G-NR Voice	29903	25376	4527	84.86%
LTE PS	90505	82813	7692	91.50%
LTE Voice	29501	26958	2543	91.38%
Total	177274	159390	17884	89.91%

Figure5. AI model evaluation results
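The figures in the table can be checked with a few lines of Python. The sketch below recomputes each accuracy as success count divided by total count, using only the counts listed above. The Total row is consistent with the pooled ratio of all successes to all samples.

```python
# (total count, success count) per test type, copied from the table above.
results = {
    "5G-NR PS": (27365, 24243),
    "5G-NR Voice": (29903, 25376),
    "LTE PS": (90505, 82813),
    "LTE Voice": (29501, 26958),
}


def accuracy_percent(total, success):
    """Share of samples whose predicted root cause matched the actual label."""
    return 100.0 * success / total


per_type = {
    name: round(accuracy_percent(total, success), 2)
    for name, (total, success) in results.items()
}
total_count = sum(total for total, _ in results.values())
success_count = sum(success for _, success in results.values())
pooled = round(accuracy_percent(total_count, success_count), 2)

print(per_type)
print(total_count, success_count, total_count - success_count, pooled)
# 177274 159390 17884 89.91
```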

XCAP-Cloud with AI-powered RCA

AI-powered RCA is provided through XCAP-Cloud, a cloud-based mobile network analysis solution. Test equipment collects data generated on the telecommunications carrier's mobile communication network, and the collected data is uploaded to the server in log form. Users can define rules that identify specific patterns in the logs, and when a pattern is found, the matching log is sent to the AI model. Before being fed into the AI model, logs must be interpreted, which means understanding the contents of the log and extracting KPIs. Interpreted logs and extracted KPIs are sent to the AI model over gRPC, a remote procedure call (RPC) framework designed for efficient data transmission. ARCA, the AI-powered RCA model, then infers the root cause from the received data.
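The flow from rule matching to KPI extraction can be sketched as follows. Everything specific here is hypothetical: the log line format, the event names in the rule, the KPI names, and the function names are invented for illustration and do not describe the actual XCAP-Cloud log format or API. The gRPC call is left out, so the sketch stops at building the request that would be sent to the model.

```python
import re

# Hypothetical user-defined rule: events that should trigger AI inference.
RULE = re.compile(r"CallSetupFailure|CallDrop")
# Hypothetical KPI syntax inside a log line: name=value.
KPI = re.compile(r"(\w+)=(-?\d+(?:\.\d+)?)")


def extract_kpis(line):
    """Interpret one log line and return its KPIs as a name-to-value mapping."""
    return {name: float(value) for name, value in KPI.findall(line)}


def select_for_inference(lines, window=1):
    """For each line matching the rule, collect KPIs from it and the preceding lines."""
    requests = []
    for i, line in enumerate(lines):
        match = RULE.search(line)
        if match is None:
            continue
        kpis = {}
        for context_line in lines[max(0, i - window): i + 1]:
            kpis.update(extract_kpis(context_line))
        requests.append({"event": match.group(0), "kpis": kpis})
    return requests


log_lines = [
    "10:00:00 MEAS rsrp=-95 sinr=12",
    "10:00:01 MEAS rsrp=-118 sinr=-3",
    "10:00:02 EVENT CallDrop rtp_loss=35.5",
    "10:00:03 MEAS rsrp=-90 sinr=15",
]
requests = select_for_inference(log_lines)
print(requests)
```

Only the log lines around a matched event are interpreted and forwarded, so the model receives a compact set of KPIs rather than the full log.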

Inference results can be checked using various visual tools provided by XCAP-Cloud. Visual tools help you intuitively understand inference results.

Figure6. XCAP-Cloud

The accuracy of AI prediction results extracted in real time from the XCAP-Cloud system equipped with an AI-powered RCA solution was observed to be 97%.

Figure7. Real-time AI prediction accuracy

Use Cases

All issues that may arise in the 5G/LTE environment are categorized and managed through RCA, and each event is labeled to help easily identify the root cause.

VoNR Call Setup Failure Case

Voice services using 5G RAN, 5G Core, and IMS are called Voice over New Radio (VoNR). NR UEs can perform voice services directly on the NR network without falling back to the LTE network. VoNR Call Setup Failure can occur for a variety of reasons. In the initial network construction stage, the main causes are Cell Search failure, PDCCH decoding errors, and IMS Registration failure, which prevent the terminal from connecting to the network or registering with the IMS server. This solution can quickly classify the problem by extracting the causes for cells that produce many setup failures.

VoNR Call Drop Case

In a mobile communication system, even if the initial setup succeeds and the call is connected normally, a call drop may occur when the terminal enters the cell edge, when a handover is triggered by RF deterioration, or when the settings of the source cell and target cell cannot maintain connectivity. In addition, even though information about neighboring cells is searched periodically, if a call must continue without a suitable cell being found, a large number of RTP packets may be lost and the network may reclaim radio link resources, causing a call drop. This solution can help you analyze call drops accurately. By additionally checking the packet data and Layer 3 messages provided within XCAP-Cloud, you can quickly identify problems and take appropriate action.

Figure8. RCA Workflow

NR FTP Low Throughput Case

After a normal call setup in 5G NR, data calls such as FTP and HTTP may still suffer quality issues in which throughput is lower than expected.

Typically, low throughput under degraded RF conditions points to the radio environment as the cause, while low throughput under normal RF conditions points to parameters related to throughput and capacity. These include UL/DL bandwidth, MCS, the number of layers, and Rank Index.
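To show how these parameters bound the achievable rate, the sketch below evaluates the approximate single-carrier peak data rate formula from 3GPP TS 38.306. It gives an upper limit under ideal conditions, not a prediction of measured FTP throughput. The function and argument names are our own, and the example values (100 MHz carrier with 30 kHz subcarrier spacing, 273 resource blocks, 4 layers, 256QAM, FR1 downlink overhead of 0.14) are one illustrative configuration.

```python
def nr_peak_rate_mbps(layers, mod_order, n_prb, numerology, overhead,
                      scaling=1.0, r_max=948 / 1024):
    """Approximate NR peak data rate for one carrier, in Mbps (3GPP TS 38.306).

    layers     -- number of MIMO layers
    mod_order  -- bits per symbol (6 for 64QAM, 8 for 256QAM)
    n_prb      -- allocated resource blocks (12 subcarriers each)
    numerology -- mu: 0 for 15 kHz, 1 for 30 kHz, 2 for 60 kHz subcarrier spacing
    overhead   -- control and reference signal overhead (0.14 for FR1 downlink)
    """
    symbol_time = 1e-3 / (14 * 2 ** numerology)  # average OFDM symbol duration in s
    bits_per_second = (
        layers * mod_order * scaling * r_max * n_prb * 12 / symbol_time * (1 - overhead)
    )
    return bits_per_second * 1e-6


full = nr_peak_rate_mbps(layers=4, mod_order=8, n_prb=273, numerology=1, overhead=0.14)
half_rb = nr_peak_rate_mbps(layers=4, mod_order=8, n_prb=136, numerology=1, overhead=0.14)
print(round(full, 1))  # about 2337.0 Mbps
print(round(half_rb / full, 3))
```

The rate scales linearly with the number of layers, the modulation order, and the number of allocated resource blocks, so a drop in any one of them lowers throughput even when RF conditions are good.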

This solution supports accurate analysis by extracting the causes of low RB allocation in the network and of the low-throughput cases that result from it.

Future Directions

We aim to provide intuitive and versatile solutions to diagnose wireless network problems quickly and accurately.

A successful AI model must not only achieve high accuracy, precision, and recall, but also ensure reliable prediction performance at a level that actual customers can use as wireless network analysis indicators. To achieve this, AI models must learn the know-how of highly skilled wireless network analysis experts and continuously evolve.

Our goal for the wireless network analysis know-how training system is to build a customer-tailored AI model learning infrastructure: a system that allows customers to discover data sets and upgrade AI models directly. Through this, concerns about personal information and data leaks can be resolved, and AI models that meet customer needs can be built more effectively.

In addition, if only data from the radio access section between the mobile device and the base station is used, the root cause of a failure due to problems at the upper-layer access probe or core probe may be unclear. The model therefore needs to be developed into a comprehensive end-to-end learning model, and we will continue to take on this challenge.

Figure9. Customer-tailored AI model learning system
