Table of contents
1. Introduction
2. What is AI RCA?
3. Technology applied to model
4. Learning and evaluation
5. XCAP-Cloud with AI Powered RCA
6. Use Cases
7. Future Directions
*Under R&D collaboration with Korean MNOs
Introduction
As wireless communication technology develops rapidly, bringing ultra-high
data rates and connections to an ever wider range of devices, the communication
environment is becoming more diverse and complex. Responding accurately and
rapidly to communication system failures that result from this diversity and
complexity is essential. To address these market demands, we provide automated
wireless network optimization testing solutions as well as logic-based root
cause analysis (RCA) solutions that identify the causes of various defects that
occur in mobile communication networks and provide appropriate solutions.
Our logic-based RCA solution uses the protocol messages transmitted and
received over the wireless network, together with terminal status information,
to identify and resolve problems accurately through structured data and
rule-based analysis. However, in today's advanced communication systems, the
parameter settings for each analysis rule are complex, and it is difficult to
account for the characteristics of each field situation, which limits rapid
response.
To overcome these limitations, we developed a machine learning-based RCA
solution. Machine learning picks up subtle differences in the network
environment that are hidden in the vast amount of data collected from mobile
communication networks, so large-scale data can be analyzed and diagnosed
quickly, based on the data itself rather than on individual subjective
judgment. This is expected to contribute to improving the stability of
communication systems.
What is AI Powered RCA?
Our solution is a machine learning-based RCA solution. It uses our automated testing solution to collect raw data from log samples of network access failures and service interruptions, and labels each sample with its root cause. The training dataset contains the network's signal level and signal quality indicators, as well as per-layer network quality metrics such as data throughput, latency, and packet loss rate.
Figure1. AI RCA concept diagram
This training dataset consists of approximately 1 million log records,
including network issues that occurred in various environments during field
testing. The data is collected through a variety of methods, including field
testing, simulation testing, and laboratory testing, to reflect the complexity
of network problems. Because key indicators are strongly correlated, root
causes are hard to separate. Our model therefore minimizes the similarity
between data characteristics and learns each root cause individually, which
enables accurate classification and understanding.
Machine learning models can also be retrained and improved continuously. This
means our solution can keep optimizing its models to respond to new challenges
that arise during real-world operation, which is a major advantage for
maintaining reliability in a rapidly changing communications environment.
In addition, AI's automated decision-making helps process and diagnose large
amounts of data quickly. Our model supports efficient and accurate
troubleshooting of communication systems while minimizing individual
subjectivity through data-based judgment.
These technical advantages allow our solution to apply the analytical power of machine learning to improve network reliability, availability, and performance.
Figure2. Examples of data set
Technology applied to AI Model
We adopted a gradient boosting ensemble model using XGBoost and developed a
powerful tool for effective root cause diagnosis. XGBoost is applicable to both
classification and regression problems and performs well across a variety of
data sizes and types.
This is very useful because we deal with large amounts of network issue data
from a variety of environments. The model learns useful patterns from large
amounts of data and can effectively identify root causes. XGBoost also uses
parallel processing and optimized data structures to provide fast training and
prediction, which makes it possible to respond quickly to the various problems
that occur in large-scale networks.
For root cause diagnosis, we prioritize the features for each root cause and
build a separate XGBoost model on that basis. This increases the
interpretability of the model and gives a clear understanding of the
characteristics of each root cause. Through these feature priorities, each
model learns which features matter for its root cause, which helps it cope
with the complexity of the problem.
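One possible reading of "one model per root cause" is a one-vs-rest arrangement: each model sees only its own prioritized features, and the highest-scoring model decides the label. The sketch below illustrates that reading only. The root cause names, KPI field names, and thresholds are hypothetical, and plain Python functions stand in for trained XGBoost models.

```python
from typing import Callable, Dict, List, Tuple

Sample = Dict[str, float]
# Stand-in for a trained binary model: returns a score in [0, 1].
Scorer = Callable[[Sample], float]


def clamp01(x: float) -> float:
    """Clip a value into the [0, 1] range."""
    return max(0.0, min(1.0, x))


def diagnose(sample: Sample,
             models: Dict[str, Tuple[List[str], Scorer]]) -> Tuple[str, float]:
    """Score the sample with every per-root-cause model; return the best one."""
    scores = {}
    for cause, (features, scorer) in models.items():
        # Each model only sees the features prioritized for its root cause.
        view = {name: sample[name] for name in features}
        scores[cause] = scorer(view)
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical stand-ins for two trained models (illustrative thresholds).
models = {
    "weak_coverage": (
        ["rsrp_dbm"],
        lambda s: clamp01((-100.0 - s["rsrp_dbm"]) / 20.0),
    ),
    "interference": (
        ["rsrp_dbm", "sinr_db"],
        lambda s: clamp01((5.0 - s["sinr_db"]) / 20.0)
        if s["rsrp_dbm"] > -100.0 else 0.0,
    ),
}

print(diagnose({"rsrp_dbm": -118.0, "sinr_db": 2.0}, models))
print(diagnose({"rsrp_dbm": -85.0, "sinr_db": -5.0}, models))
```

Restricting each model to its own feature list is what keeps the per-cause feature importances easy to read.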
In this way, our
model using XGBoost combines strong prediction ability with fast response speed
to achieve efficient root cause diagnosis.
Learning and evaluation process
Figure 4 shows the learning and evaluation process. First, the data is cleaned
through preprocessing: duplicate data is removed, and outliers and missing
values are replaced with the most prevalent value in each data set. Then, to
correct data imbalance, down-sampling using Euclidean distance calculation and
up-sampling using SMOTE are performed to balance the data.
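The cleaning step can be sketched as follows. This is a minimal illustration, not the production pipeline: the source does not say how outliers are detected, so the sketch assumes a fixed valid range per column, and the column meanings (RSRP, SINR) and bounds are illustrative assumptions.

```python
from collections import Counter
from typing import List, Optional, Tuple

Row = Tuple[Optional[float], ...]
Bounds = List[Tuple[float, float]]  # assumed (low, high) valid range per column


def drop_duplicates(rows: List[Row]) -> List[Row]:
    """Remove exact duplicate rows while keeping first-seen order."""
    return list(dict.fromkeys(rows))


def impute_with_mode(rows: List[Row], bounds: Bounds) -> List[Row]:
    """Replace missing (None) or out-of-range values with the column mode."""
    cleaned_columns = []
    for column, (low, high) in zip(zip(*rows), bounds):
        valid = [v for v in column if v is not None and low <= v <= high]
        if not valid:
            raise ValueError("column has no valid values to take a mode from")
        mode = Counter(valid).most_common(1)[0][0]
        cleaned_columns.append(
            [v if v is not None and low <= v <= high else mode for v in column]
        )
    return [tuple(row) for row in zip(*cleaned_columns)]


# Columns: (rsrp_dbm, sinr_db); values and bounds are made up for illustration.
raw = [(-95.0, 12.0), (-95.0, 12.0), (None, 12.0), (-95.0, 999.0), (-101.0, 8.0)]
bounds = [(-140.0, -44.0), (-20.0, 40.0)]

deduplicated = drop_duplicates(raw)
cleaned = impute_with_mode(deduplicated, bounds)
print(cleaned)
```

Using the mode rather than the mean keeps imputed values on levels that were actually observed in the data.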
In the Euclidean distance calculation method, each point that does not carry the root cause label is compared, one by one, against the center point of the distribution of points that do carry the label. Points judged to be too far from that distribution are removed, leaving only the points that lie as close to its boundary as possible. SMOTE (Synthetic Minority Over-sampling Technique) synthesizes new minority class samples by interpolating between existing minority class samples and their nearest minority class neighbors. This increases the number of minority class samples, helping the model learn and recognize minority classes better.
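Both balancing steps can be sketched in a few lines of standard-library Python. This is an illustration under assumptions, not the authors' implementation: "too far" is expressed here as keeping a fixed number of nearest points, and the SMOTE function is a simplified version of the technique with hypothetical parameter names.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Mean position of a set of points."""
    dims = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dims)]


def downsample_by_distance(target: List[Point], others: List[Point],
                           keep: int) -> List[Point]:
    """Keep the `keep` non-target points nearest the target-class center."""
    center = centroid(target)
    return sorted(others, key=lambda p: math.dist(p, center))[:keep]


def smote(minority: List[Point], n_new: int, k: int = 2,
          seed: int = 0) -> List[List[float]]:
    """Interpolate between minority samples and their k nearest neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(p, base))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


target = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
others = [(1.5, 1.5), (10.0, 10.0), (3.0, 1.0), (-8.0, 4.0)]
kept = downsample_by_distance(target, others, keep=2)
print(kept)

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote(minority, n_new=5)
print(len(new_points))
```

Because every synthetic point lies on a segment between two real minority samples, the new samples stay inside the region the minority class already occupies.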
Figure4. Training process and evaluation process
After preprocessing, an XGBoost model is created for each priority and then
trained and validated. During validation, a grid search is used to find the
most appropriate hyperparameters for the model. When a test data set is fed
into the trained RCA solution, the root cause label predicted by the model is
compared against the actual root cause label.
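Grid search itself is simple to sketch: try every combination in a parameter grid and keep the one with the best validation score. In the example below, `max_depth` and `learning_rate` are real XGBoost parameter names, but the grid values are assumptions and `toy_validation_score` is a made-up stand-in for "train a model with these parameters and measure validation accuracy".

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def grid_search(grid: Dict[str, List[float]],
                score: Callable[[Params], float]) -> Tuple[Params, float]:
    """Evaluate every parameter combination and return the best one."""
    names = list(grid)
    best_params: Params = {}
    best_score = float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_validation_score(params: Params) -> float:
    """Hypothetical scorer; a real one would train and validate a model."""
    return (1.0
            - abs(params["max_depth"] - 6) * 0.01
            - abs(params["learning_rate"] - 0.1))


grid = {"max_depth": [4, 6, 8], "learning_rate": [0.05, 0.1, 0.3]}
best_params, best_score = grid_search(grid, toy_validation_score)
print(best_params, best_score)
```

The cost grows with the product of the grid sizes, so in practice the grid is kept small for each per-root-cause model.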
The evaluation results are as follows. Accuracy was derived as the average
accuracy of the individual models.
| Service | Total count | Success count | Fail count | Accuracy |
| 5G-NR PS | 27365 | 24243 | 3122 | 88.59% |
| 5G-NR Voice | 29903 | 25376 | 4527 | 84.86% |
| LTE PS | 90505 | 82813 | 7692 | 91.50% |
| LTE Voice | 29501 | 26958 | 2543 | 91.38% |
| Total | 177274 | 159390 | 17884 | 89.91% |
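The figures in the table can be reproduced directly from the counts. The short check below recomputes each accuracy as success count divided by total count, and shows that the Total row matches the pooled ratio over all samples (89.91%) rather than the unweighted mean of the four per-service accuracies.

```python
# (total count, success count) per service, copied from the table above.
results = {
    "5G-NR PS": (27365, 24243),
    "5G-NR Voice": (29903, 25376),
    "LTE PS": (90505, 82813),
    "LTE Voice": (29501, 26958),
}


def accuracy_pct(total: int, success: int) -> float:
    """Accuracy as a percentage, rounded to two decimals."""
    return round(100.0 * success / total, 2)


per_service = {name: accuracy_pct(t, s) for name, (t, s) in results.items()}

total_count = sum(t for t, _ in results.values())
success_count = sum(s for _, s in results.values())
pooled = accuracy_pct(total_count, success_count)

# Unweighted mean of the four per-service accuracies, for comparison.
unweighted = round(
    sum(100.0 * s / t for t, s in results.values()) / len(results), 2
)

print(per_service)
print(total_count, success_count, pooled, unweighted)
```

The pooled figure is higher than the unweighted mean because LTE PS, the largest and most accurate category, carries more weight.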
XCAP-Cloud with AI-powered RCA
AI-powered RCA is provided by XCAP-Cloud, a cloud-based mobile network analysis
solution. We use test equipment to collect data generated from the
telecommunications carrier's mobile communication network, and the collected
data is uploaded to the server in log form. Users can define rules that
identify specific patterns in the logs and send the matching logs to the AI
model. Logs must be interpreted before being fed into the AI model;
interpretation involves parsing the contents of the log and extracting KPIs.
Interpreted logs and extracted KPIs are sent to the AI model through gRPC, a
remote procedure call framework for efficient and reliable data transmission.
ARCA infers the root cause based on the received data.
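The rule matching and KPI extraction steps can be sketched as below. Everything specific here is hypothetical: the log line format, the rule name, the KPI names, and the shape of the request payload are assumptions for illustration, not XCAP-Cloud's actual formats. The gRPC call is left out; the sketch stops at building the payloads that would be sent.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rule:
    """A user-defined rule: a name plus a pattern to look for in log lines."""
    name: str
    pattern: re.Pattern


# Assumed KPI syntax inside a log line: key=value with a numeric value.
KPI_RE = re.compile(r"(\w+)=(-?\d+(?:\.\d+)?)")


def extract_kpis(line: str) -> Dict[str, float]:
    """Pull every key=value KPI out of one log line."""
    return {key: float(value) for key, value in KPI_RE.findall(line)}


def build_requests(lines: List[str], rules: List[Rule]) -> List[dict]:
    """Return one payload per log line that matches a rule."""
    requests = []
    for line in lines:
        for rule in rules:
            if rule.pattern.search(line):
                requests.append({"rule": rule.name, "kpis": extract_kpis(line)})
                break  # first matching rule wins
    return requests


# Hypothetical log lines and rule.
log_lines = [
    "12:00:01 RRC SetupComplete rsrp=-95 sinr=14.5",
    "12:00:07 IMS REGISTER 408 Timeout rsrp=-97 sinr=12",
]
rules = [Rule("ims_registration_failure", re.compile(r"IMS REGISTER [45]\d\d"))]

requests = build_requests(log_lines, rules)
print(requests)
```

Filtering with rules before inference means only the log segments of interest are sent to the model.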
Inference results can be checked
using various visual tools provided by XCAP-Cloud. Visual tools help you
intuitively understand inference results.
Figure6. XCAP-Cloud
The accuracy of AI prediction results extracted in real time from the XCAP-Cloud system equipped with an AI-powered RCA solution was observed to be 97%.
Figure7. Real-time AI prediction accuracy
Use Cases
All issues that may arise in the
5G/LTE environment are categorized and managed through RCA, and each event is
labeled to help easily identify the root cause.
VoNR Call Setup Failure Case
Voice services using 5G RAN, 5G
Core, and IMS are called Voice over New Radio-VoNR. NR UEs can perform voice
services directly on the NR network without falling back to the LTE network.
VoNR Call Setup Failure can occur for a variety of reasons. In the initial
network construction stage, Cell Search failure, PDCCH Decoding error, IMS
Registration failure, etc. are the main causes that can cause problems in which
the terminal cannot connect to the network or register with the IMS server.
This solution can quickly classify the problem by extracting the cause of cells
that cause many setup failures.
VoNR Call Drop Case
In a mobile communication system, even if the initial setup is successful and
the call is connected normally, a call drop can occur when the UE enters the
cell edge, when a handover is triggered by RF deterioration, or when the
settings of the source cell and target cell cannot maintain connectivity. In
addition, even though information about neighboring cells is searched
periodically, when a call has to continue without a suitable cell being found,
a large number of RTP packets are lost and the network reclaims the radio link
resources, which can cause a call drop. This solution can help you accurately
analyze call drops. By additionally checking the packet data and Layer3
messages provided within XCAP-Cloud, you can quickly identify problems and take
appropriate action.
Figure8. RCA Workflow
NR FTP Low Throughput Case
After the normal call setup process is completed in 5G NR, you may experience
quality issues in data calls such as FTP and HTTP, with throughput lower than
expected.
Typically, degraded RF performance by itself explains low throughput, whereas
low throughput under normal RF performance points to parameters related to
throughput and capacity. It can be caused by various factors such as UL/DL
bandwidth, MCS, number of layers, and Rank Index.
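To see how these parameters bound throughput, the sketch below uses the approximate peak data rate formula from 3GPP TS 38.306 for a single carrier. This is a reference calculation, not part of the RCA model. The defaults (maximum code rate 948/1024 and the 0.14 overhead for FR1 downlink) come from that specification, and the example configuration (273 RBs at 30 kHz subcarrier spacing, 4 layers, 256QAM) is an assumed 100 MHz carrier.

```python
def nr_peak_rate_mbps(n_prb: int, layers: int, mod_order: int, scs_khz: int,
                      scaling: float = 1.0, code_rate: float = 948 / 1024,
                      overhead: float = 0.14) -> float:
    """Approximate single-carrier NR peak data rate in Mbps (TS 38.306)."""
    mu = {15: 0, 30: 1, 60: 2, 120: 3}[scs_khz]   # numerology index
    symbol_time_s = 1e-3 / (14 * 2 ** mu)         # average OFDM symbol duration
    subcarriers = n_prb * 12                      # 12 subcarriers per RB
    return (1e-6 * layers * mod_order * scaling * code_rate
            * (subcarriers / symbol_time_s) * (1 - overhead))


# Assumed example: 100 MHz carrier, 30 kHz SCS, 4 layers, 256QAM (8 bits/symbol).
full = nr_peak_rate_mbps(n_prb=273, layers=4, mod_order=8, scs_khz=30)
# Same carrier limited to rank 2 and 64QAM (6 bits/symbol).
reduced = nr_peak_rate_mbps(n_prb=273, layers=2, mod_order=6, scs_khz=30)

print(round(full, 1), round(reduced, 1))
```

The rate scales linearly with allocated RBs, number of layers, and modulation order, so a drop in any one of them lowers the achievable throughput even when RF conditions are good.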
This solution can help with accurate analysis by extracting the causes of low
RB allocation in the network and the low throughput cases that result from it.
Future Directions
We aim to provide intuitive and
versatile solutions to diagnose wireless network problems quickly and
accurately.
A successful AI model must not
only achieve high accuracy, precision, and recall, but also ensure reliable
prediction performance at a level that actual customers can use as wireless
network analysis indicators. To achieve this, AI models must learn the know-how
of highly skilled wireless network analysis experts and continuously evolve.
The goal of our wireless network analysis know-how training system is to build
a customer-tailored AI model learning infrastructure, that is, a system that
allows customers to discover data sets and upgrade AI models directly. This
resolves concerns about personal information and data leaks, and makes it
possible to build AI models that meet customer needs more effectively.
In addition, if only data from the radio access section between the mobile device and the base station is used, the root cause of failures caused by problems at the upper-layer access probe or core probe may be unclear. The model therefore needs to be developed into a comprehensive, end-to-end learning model. We will continue to pursue this challenge.
Figure9. Customer-tailored AI model learning system