Modeling Customer Experience in a Contact Center through Process Log Mining

The use of data mining and modeling methods in the service industry is a promising avenue for optimizing current processes in a targeted manner, ultimately reducing costs and improving customer experience. However, tools introduced into established pipelines must often adapt to the way data are sampled and to their content. In this study, we tackle the challenge of characterizing and predicting customer experience when only process log data with time-stamp information are available, without any ground truth feedback from the customers. As a case study, we consider a contact center managed by TeleWare and analyze phone call logs covering a two-month span. We develop an approach to interpret the phone call process events registered in the logs and to infer concrete points of improvement in service management. Our approach is based on latent tree modeling and multi-class Naïve Bayes classification, which jointly allow us to infer a spectrum of customer experiences and to test their predictability under the current data sampling strategy. Moreover, the approach can overcome limitations in collecting and sharing customer feedback across organizations; it is thus widely applicable and complementary to tools that rely on more heavily constrained data.


INTRODUCTION
The intense competition in present-day business systems requires efficient and adaptive customer relationship management (CRM) strategies that can cope with the challenges of the current information era. Customer experience is now recognized as a dynamic and complex phenomenon, partly owing to the vastly increased rate of information exchange worldwide [24]. Customers' perceptions and evaluations change over time and are influenced by a variety of factors, including the experience of other customers. Organizations are therefore increasingly interested in analytical tools that can routinely monitor their interactions with customers and predict future trends.
Technological development has made it possible to collect large amounts of information, opening the door to quantitative CRM data science [14,21]. In this context, contact centers are one of the main channels for customer-to-organization interactions and thus hold key importance for CRM. At the same time, customer experience is becoming a core reference for companies and service providers, yet its prediction and interpretation remain a complex problem [35]. This complexity arises from the multiple factors that jointly affect customer experience, such as the dialog with the agent or the communication capacity of the contact center, and from the varying importance of these factors across organizations. For this reason, artificial intelligence is regarded as an essential component for process maintenance and optimization in contact centers, with great emphasis on natural language modeling [18]. As a result, recent research has focused largely on oral and written conversations between customers and agents [4,7,23,28,37,38].
However, although spoken and written language analysis is highly attractive for organizations at large, in practice organizations face many challenges before adopting it, including resource, legislative, and organizational barriers. For instance, growing concerns over privacy protection may seriously hinder data collection and management even within a single organization. Even when direct customer data (e.g., voice recordings or feedback) can be collected and analyzed by an organization, middleware service providers can only access this information at the discretion of the client organization. Moreover, from an individual caller's point of view, the feedback he/she provides mostly reflects the service received from the call center of the service provider (i.e., how professionally the call center agent handles his/her problem), not the service quality of the infrastructure service provider. On the other hand, while direct customer feedback is increasingly used as an indicator of experience quality, indirect forms of feedback can also be gained from data. In principle, it is thus possible to infer the degree of satisfaction even if the customer does not provide it directly. For instance, log-based process performance analysis has recently gained momentum through various studies that characterized different aspects of process performance, also in relation to external factors and collaborative processes [27].
In this context, process logs represent an attractive resource, as they are a ubiquitous output of any communication channel and may often be the only data available for systematic investigation. Even if this type of data provides only an indirect measurement of customer experience, its large-scale availability could allow infrastructure service providers to deploy informative systems that better support the analytics strategies of client organizations. If infrastructure service providers offer explainable, data-driven models enriched with domain knowledge, end service providers could use those models as a supporting component of their own data analytics systems, thus maximizing the value of contact center data. However, log-based modeling of customer experience remains largely unexplored.
The central objective of this study was to develop a methodology for interpreting contact center log data and for predicting the associated customer experience. The principal challenge was the lack of knowledge about the true satisfaction associated with the available process realizations, which prevents direct supervised analysis. The final goal was to understand how customer experience information can be collected and managed across touch points and over time. In particular, we focused on telephone call logs, as phone calls remain the primary form of communication in contact centers [35]. Together, these elements constitute an unexplored setting in which traditional approaches are not directly applicable. To tackle the problem, we combined unsupervised and supervised data mining and modeling techniques, seeking complementary indicators to formulate a consistent final interpretation. Our approach adopts the standpoint of a middleware service provider, which enabled us to collect a large number of customer interactions from clients. The resulting customer experience estimates have the potential to give an overall picture of the customer experience being delivered, both in real time and historically. From an application point of view, such analysis can also allow service providers to perform inference on the underlying layers of the data and to suggest where customer experience can be improved, whether for middleware service providers or for individual organizations.
Our work reveals that popular data mining methods such as k-means clustering and elastic net fail to provide meaningful interpretations of log-based customer experience. In contrast, latent tree modeling allowed us to characterize the log process in terms of different aspects of customer experience, here termed facets [31]. This modeling framework was chosen for its interpretative advantages and its greater flexibility compared with ordinary Bayesian networks. By reconstructing and linking together latent facets in the log data, we could identify concrete points of intervention for improving customer experience. Moreover, we verified the predictability of the different aspects of customer experience via a multi-class supervised approach, using the extracted facet qualities as target labels. In summary, our contributions are the following:
• We build and validate a methodology for log-based modeling of customer experience that can cope with the absence of ground truth feedback from customers.
• We demonstrate the utility of log data analysis and modeling in supporting contact center management, suggesting that practical advantages could be obtained by deploying online prediction tools as a component of existing call management systems.
• From an application point of view, we introduce an approach that can be implemented by middleware service providers, is widely applicable to a variety of contact center settings, and can be integrated with client service providers' tools.
This article is organized as follows. In Section 2, we review previous studies that investigate customer experience by means of data mining and modeling approaches. In Section 3, we outline the approach proposed in this work and describe the computational techniques used; experimental outcomes are presented and discussed in Section 4. Lastly, in Section 5, we summarize our achievements and the significance of our work for CRM research.

RELATED WORK
A vast literature exists on data mining strategies for CRM, dedicated to identifying, attracting, maintaining, and developing customers, with the largest fraction focused on customer retention [19,21,29]. In contact centers, communication can take place through a range of channels such as telephone, e-mail, forums, or online live chats. Phone call conversations constitute the main source of information, and approaches for predicting customer experience from them have recently been investigated. Most studies concentrate on automatic customer satisfaction analysis, emotion detection, problematic call detection, and call segmentation [35]. Unstructured text or conversation data present the additional challenge of having to be converted into processable information. To that end, embedding semantic knowledge in sequential word data based on information gain has been shown to provide competitive and interpretable models while alleviating the burden of data preparation [7]. Raw conversation data have also been used successfully to predict customer satisfaction by means of artificial neural networks [37]. More recently, the same task was investigated by combining entire phone call conversations with turn-level features [4,23]. The ideal end stage for all these methods is application in end-to-end systems [33]. In parallel, phone call data were used to automate and optimize other tasks, including customer recognition [34], customer routing [17], pairing customers with suitable agents based on historical data [26], and evaluating the work carried out by agents [15]. For other channels, data mining has been studied, for instance, in the context of chat routing modeling [20] and emotion prediction from chats [28] and video chats [38].
However, conversation data may not be available, depending on the organization. In some cases, only call logs in the form of time-stamp entries may be provided, as contact centers are always supported by IT systems that log their execution. In these cases, contact center tasks and goals intersect with process mining techniques, which have been used to evaluate the time consumed by process stages such as specific activities or waits, to gain insights into resource allocation and workload management, or to assess service quality [27].
A further possibility is to combine heterogeneous data from different communication channels, when available. For example, Bockhorst et al. [6] recently developed a comprehensive system that aggregates heterogeneous data sources such as dialog transcriptions, call logs, and client and policy data in order to predict customer satisfaction. That study could also rely on self-reported satisfaction scores obtained from a third-party company.
In general, all of the previous approaches for predicting customer experience require ground truth feedback for evaluation and implementation. In the frequent scenario in which such feedback is not available, a principled approach could be adopted instead. For instance, Fiedler et al. [11] proposed a general relationship linking quality of experience and quality of service in communication services. However, this method overlooks the specific aspects of the communication process, which are often critical for its optimization. To the best of our knowledge, no study has addressed data-driven modeling of customer experience starting solely from log data, which is the challenge targeted here. The setting of our case study is thus distinct from previous works and required the development of novel solutions, which are described in the following section.

METHODS
In this section, we describe the ideas underlying our work and the computational techniques composing our customer experience modeling approach, which include a specific type of latent tree modeling and Naïve Bayes classification. Finally, in Sections 3.5 and 3.6, we briefly describe k-means clustering and elastic net regression, which are used as benchmark methods.

Proposed Approach
The central goal of this work is to develop an approach to predict the quality of customer experience in a contact center starting solely from phone call logs. In ideal settings, organizations can directly or indirectly sample satisfaction rates to improve their services, possibly through the data mining techniques reviewed in Section 2. We instead concentrated on the situation where no customer feedback is obtainable and investigated how to infer customer experience from the available log data. This task is relevant to a number of scenarios, yet it is generally overlooked in the literature. Here, we concentrated on log data generated in a contact center during phone call processes.
We devised a strategy that can recognize and describe the various sub-processes underlying phone calls, in order to extract hidden quality levels of customer experience from the data itself. The central step is the construction of a pouch latent tree model (PLTM) [31] of phone calls in terms of their temporal properties. Rather than identifying a single global clustering across all input (manifest) variables, a PLTM embeds alternative data clustering solutions, each associated with a disjoint subset of these variables. As explained in detail in Section 3.2, each of these clusterings is represented by a discrete latent variable and defines a Gaussian mixture model (GMM) over its subset of manifest variables. Within a PLTM, manifest and latent variables are linked by conditional dependencies encoded in a Bayesian tree, with latent variables at the internal nodes and manifest variables at the terminal nodes. A PLTM thus provides alternative data groupings and their probabilistic relationships, reflecting multiple data partitions examined at the same time. In other words, this model identifies different facets that jointly characterize a dataset, allowing the domain expert to evaluate all the solutions provided. By considering complementary data facets, PLTMs therefore leave ample room for expert interpretation and evaluation. In the case of process logs, some facets may expose issues or critical points hidden across multiple sub-processes. Additionally, PLTMs are more flexible than ordinary Bayesian networks, as they can support real-valued input features such as event durations.
Once the classes defining the spectrum of likely customer experience quality levels have been reconstructed, classification models can be built to predict these classes within a supervised framework. Combined with expert knowledge about the business domain, the PLTM thus constitutes a tool to inform classification models and to overcome the lack of customer feedback. In this second step, we considered the Naïve Bayes classifier, which is suited to a multi-class classification task with multiple customer quality levels, is computationally efficient, and provides probabilistic confidence. Given these properties, this kind of model is reasonably suited for implementation in a real-time call management system, and we therefore adopted it in our approach.

Pouch Latent Tree Modeling
A PLTM is a rooted tree model in which leaf nodes represent input (or manifest) variables and internal nodes represent latent variables underlying the observable data distribution [31]. Manifest variables X can take discrete or continuous values x, whereas all latent variables Y are assumed to be discrete and represent data cluster labels y. Links in the tree represent the conditional dependence of each variable on its parent. Additionally, leaf nodes can contain more than one input variable and are therefore referred to as pouch nodes. The latent variable connected to each pouch thus identifies a data partition based on the associated manifest variables.
PLTMs are closely related to both GMMs and Gaussian Bayesian networks. On the one hand, the graphical relationships in a PLTM are almost identical to those of a Gaussian network, the only difference being that multiple manifest variables are allowed in a single node. On the other hand, PLTMs generalize GMMs in that they incorporate multiple latent variables and their probabilistic relationships. In a GMM, the input variables $\mathbf{x}$ follow a mixture of Gaussian distributions, each conditioned on a cluster value $y$:

$$P(\mathbf{x}) = \sum_{y} P(y)\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_y, \boldsymbol{\Sigma}_y). \tag{1}$$

In a PLTM, every pouch node is mapped to a disjoint subset of input features. Let us denote with $\Pi(V)$ and $\pi(V)$ the parent of a variable $V$ and its value, respectively, and with $W$ and $\mathbf{w}$ the variables inside a pouch and their values, respectively. The dependency of any discrete latent variable $Y$ on its parent $\Pi(Y)$ is described by a conditional discrete distribution $P(y \mid \pi(Y))$, whereas the manifest variables $W$ follow a conditional Gaussian distribution $P(\mathbf{w} \mid y) = \mathcal{N}(\mathbf{w} \mid \boldsymbol{\mu}_y, \boldsymbol{\Sigma}_y)$, with mean $\boldsymbol{\mu}_y$ and covariance matrix $\boldsymbol{\Sigma}_y$, where $y$ is the value of the latent parent of the pouch.
48:6 T. Fu et al.
The probability distribution over manifest variables defined by a PLTM is therefore the following:

$$P(\mathbf{x}) = \sum_{\mathbf{y}} \prod_{Y \in \mathbf{Y}} P\big(y \mid \pi(Y)\big) \prod_{W} \mathcal{N}\big(\mathbf{w} \mid \boldsymbol{\mu}_{\pi(W)}, \boldsymbol{\Sigma}_{\pi(W)}\big), \tag{2}$$

where the sum runs over all joint configurations $\mathbf{y}$ of the latent variables and the second product runs over all pouch nodes $W$. We can thus see that a PLTM is a GMM, but it also provides richer information. First, it embeds alternative data clusterings based on different subsets of the input variables and identifies facets of the dataset. Second, it describes conditional dependencies among these facets. Data clustering based on a PLTM entails two main steps: estimating a set of parameters $\theta$ and learning a tree structure $m$. The former task can be performed by computing the maximum likelihood estimate $\theta^*$ through the expectation maximization algorithm [9], assuming a given model structure $m$. The optimal PLTM structure $m^*$ is learned by maximizing the Bayesian information criterion (BIC), defined as follows [36]:

$$\mathrm{BIC}(m \mid D) = \log P(D \mid m, \theta^*) - \frac{d(m)}{2} \log N, \tag{3}$$

where $D$ is the dataset of $N$ samples and $d(m)$ is the number of independent parameters in $m$. In this equation, the first term rewards good data fit, whereas the second is a regularization term penalizing complex models. Model parameters and structure are determined via a hill-climbing algorithm that iteratively generates new candidate models until the BIC no longer improves [31]. As introduced in Section 3.1, in order to extract maximal value from a dataset lacking ground truth customer feedback, a model with high explanatory power is the ideal choice for our research purposes. Therefore, although other models such as artificial neural networks have recently gained popularity in the field, they were not considered in this work because of the difficulty of interpreting them. Graphical models such as Bayesian networks, which capture probabilistic relationships among variables, thus became the main focus of this study. Given the multi-faceted nature of customer experience, we prioritized methods according to their flexibility and suitability for multiple variable types, hence converging on PLTMs.
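To make the PLTM density and the BIC concrete, the following sketch evaluates both for a hypothetical toy tree with hand-picked parameters: a root latent Y1 with a child latent Y2, a two-dimensional pouch under Y2, and a one-dimensional pouch under Y1. It does not run expectation maximization or the hill-climbing structure search of [31]; the BIC is computed by plugging the illustrative parameters in place of the maximum likelihood estimate. It also checks that the PLTM density coincides with a GMM having one component per joint latent state and block-diagonal covariances. NumPy and SciPy are assumed.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical toy PLTM (all numbers are illustrative, not from the paper):
#   Y1 (root, 2 states) -> Y2 (2 states)
#   pouch A = (x1, x2) is a child of Y2; pouch B = (x3,) is a child of Y1.
p_y1 = np.array([0.6, 0.4])                      # P(y1)
p_y2_given_y1 = np.array([[0.9, 0.1],            # P(y2 | y1), one row per y1
                          [0.2, 0.8]])
mu_a = np.array([[0.0, 0.0], [3.0, 3.0]])        # pouch A mean for each y2
cov_a = np.array([np.eye(2), 0.5 * np.eye(2)])   # pouch A covariance for each y2
mu_b = np.array([[1.0], [-1.0]])                 # pouch B mean for each y1
cov_b = np.array([[[1.0]], [[2.0]]])             # pouch B covariance for each y1


def pltm_density(x):
    """Sum over joint latent states of P(y1) P(y2|y1) N(w_A|y2) N(w_B|y1)."""
    w_a, w_b = x[:2], x[2:]
    total = 0.0
    for y1 in range(2):
        for y2 in range(2):
            total += (p_y1[y1] * p_y2_given_y1[y1, y2]
                      * multivariate_normal.pdf(w_a, mu_a[y2], cov_a[y2])
                      * multivariate_normal.pdf(w_b, mu_b[y1], cov_b[y1]))
    return total


def as_gmm_density(x):
    """Same model read as a 4-component GMM with block-diagonal covariances."""
    total = 0.0
    for y1 in range(2):
        for y2 in range(2):
            mean = np.concatenate([mu_a[y2], mu_b[y1]])
            cov = np.zeros((3, 3))
            cov[:2, :2], cov[2:, 2:] = cov_a[y2], cov_b[y1]
            total += (p_y1[y1] * p_y2_given_y1[y1, y2]
                      * multivariate_normal.pdf(x, mean, cov))
    return total


def n_free_params():
    """d(m): free probabilities plus Gaussian means and covariance entries."""
    def gauss(dim):
        return dim + dim * (dim + 1) // 2    # mean + symmetric covariance
    k1, k2 = p_y2_given_y1.shape
    return (k1 - 1) + k1 * (k2 - 1) + k2 * gauss(2) + k1 * gauss(1)


def bic(data):
    """Log-likelihood minus d(m)/2 * log(N), using the fixed toy parameters."""
    log_lik = sum(np.log(pltm_density(x)) for x in data)
    return log_lik - 0.5 * n_free_params() * np.log(len(data))


rng = np.random.default_rng(0)
data = rng.normal(size=(50, 3))
print(n_free_params(), bic(data))
```

Comparing several candidate trees by this score (with parameters refitted for each) is the role the BIC plays in the structure search; the sketch only shows how one score is assembled.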

Sensitivity Analysis
We evaluated the strength of the relationships inside the PLTM by means of a sensitivity analysis of its probabilistic tree. We used an approach designed for discrete Bayesian networks to determine how the probability of a variable of interest (the hypothesis variable) is affected by a variation in the value of a single parameter in the network [8]. The sensitivity is computed by taking the value of a second node as evidence, set according to the data. Given a hypothesis node $h$, an evidence node $e$, and a node parameter $\theta$, we assumed a linear sensitivity function of the following form:

$$f(\theta) = P(h \mid e)(\theta) = c_1 \theta + c_0, \tag{4}$$

where $c_1$ and $c_0$ are constants. The sensitivity thus describes linear changes in the output probability $P(h \mid e)$ for infinitesimal variations in the parameter $\theta$. It is calculated as the derivative of the sensitivity function at the original parameter value $\theta_0$:

$$S(\theta_0) = \left.\frac{\partial f(\theta)}{\partial \theta}\right|_{\theta = \theta_0}. \tag{5}$$

We performed this analysis both with and without assuming evidence based on PLTM parameters [2].
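A minimal numerical sketch of parameter sensitivity, assuming a hypothetical two-node binary network H → E with illustrative probabilities (this is neither the PLTM of this study nor the tool cited as [2]). The parameter θ is P(H = 1), and the sensitivity is estimated as a central finite difference at the original value θ0. Without evidence, P(E = 1) is exactly linear in θ; conditioned on the evidence E = 1, P(H = 1 | E = 1) is in general not linear in θ, so its derivative is a local slope around θ0.

```python
# Hypothetical network H -> E, both binary; numbers are illustrative only.
P_E1_GIVEN_H = {1: 0.8, 0: 0.3}   # P(E=1 | H=h), kept fixed


def p_e1(theta):
    """P(E=1) as a function of the parameter theta = P(H=1)."""
    return theta * P_E1_GIVEN_H[1] + (1 - theta) * P_E1_GIVEN_H[0]


def p_h1_given_e1(theta):
    """P(H=1 | E=1) as a function of theta, via Bayes' theorem."""
    return theta * P_E1_GIVEN_H[1] / p_e1(theta)


def sensitivity(f, theta0, eps=1e-6):
    """Central-difference estimate of df/dtheta at the original value theta0."""
    return (f(theta0 + eps) - f(theta0 - eps)) / (2 * eps)


theta0 = 0.4
s_prior = sensitivity(p_e1, theta0)              # ≈ 0.8 - 0.3 = 0.5
s_evidence = sensitivity(p_h1_given_e1, theta0)  # ≈ 0.24 / (0.5*0.4 + 0.3)**2 = 0.96
```

The two values illustrate the distinction drawn above between running the analysis with and without evidence: the same parameter can have a different influence once a second node is observed.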

Naïve Bayes Classification
The Naïve Bayes classifier (NBC) is a popular multi-class classification method that relies on a probabilistic framework based on Bayes' theorem [12]. Under the assumption that all features are conditionally independent given the class label, it assigns each observed data sample $\mathbf{x}$ the label $y$ that maximizes the posterior probability. Formally, the goal is to solve the following problem:

$$\hat{y} = \arg\max_{y} P(y) \prod_{i=1}^{d} P(x_i \mid y), \tag{6}$$

where $d$ is the number of features and the conditional probabilities $P(x_i \mid y)$ are assumed to be Gaussian with mean $\mu_y$ and standard deviation $\sigma_y$:

$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\!\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right). \tag{7}$$

Parameters $\mu_y$ and $\sigma_y$ are computed via maximum likelihood estimation. The main advantages of this method over other models are its probabilistic output and its computational efficiency. First, unlike linear modeling, the probabilistic framework used by NBC to learn predictive patterns may ease its integration into a usable classification system. Second, NBC requires only a small amount of training data to estimate the necessary parameters, which makes it faster than more sophisticated methods. Decoupling the class-conditional feature dependencies in fact allows each of them to be estimated independently as a one-dimensional distribution. For these reasons, we employed NBC for facet-based classification.

k-means Clustering
Along with the methods presented above, which are part of the approach introduced in this work, we applied other machine learning methods in an attempt to extract useful insights from log data. The k-means algorithm is one of the most widespread clustering methods, so we chose it as a baseline for the task of identifying customer experience classes.
Given a set of $n$ observations $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}$, the k-means clustering method aims to partition these observations into $k$ ($\leq n$) disjoint sets $S = \{S_1, S_2, \ldots, S_k\}$ by minimizing the intra-cluster variance in the feature space [22]. The task is mathematically formulated as follows:

$$\arg\min_{S} \sum_{i=1}^{k} \sum_{\mathbf{x} \in S_i} \lVert \mathbf{x} - \boldsymbol{\mu}_i \rVert^2, \tag{8}$$

where $\boldsymbol{\mu}_i$ represents the mean coordinates of the observations in the set $S_i$. The problem is solved by means of Elkan's algorithm [10] using the k-means++ initialization method [5].
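Assuming scikit-learn, the configuration described above (Elkan's algorithm with k-means++ seeding) can be sketched as follows on synthetic blobs. The clustering objective is recomputed by hand and compared with the `inertia_` attribute, which scikit-learn reports as the sum of squared distances of samples to their closest cluster centre. The data and the choice k = 4 are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative data; k = 4 is an arbitrary choice for the sketch.
X, _ = make_blobs(n_samples=300, centers=4, n_features=3, random_state=0)

km = KMeans(n_clusters=4, init="k-means++", algorithm="elkan",
            n_init=10, random_state=0).fit(X)

# Within-cluster sum of squared distances to each cluster centre mu_i.
objective = sum(
    np.sum((X[km.labels_ == i] - km.cluster_centers_[i]) ** 2)
    for i in range(km.n_clusters)
)
```

Elkan's variant uses the triangle inequality to skip distance computations; it changes the cost of each iteration, not the objective being minimized.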

Elastic Net Regression
Elastic net is a least squares method for feature selection that uses a combination of $L_1$ and $L_2$ regularization [40]. It is defined on $n$ observations $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}$, stored in the matrix $X$, and on a vector of associated targets $\mathbf{y} = \{y_1, y_2, \ldots, y_n\}$. The problem solved is the following:

$$\min_{\boldsymbol{\beta}} \frac{1}{2n} \lVert \mathbf{y} - X\boldsymbol{\beta} \rVert_2^2 + \alpha l \lVert \boldsymbol{\beta} \rVert_1 + \frac{\alpha (1 - l)}{2} \lVert \boldsymbol{\beta} \rVert_2^2, \tag{9}$$

where $\alpha$ and $l$ are hyper-parameters controlling the joint and relative contributions of the $L_1$ and $L_2$ penalties, respectively, while $\boldsymbol{\beta}$ is a vector of linear regression coefficients. This technique generalizes both the least absolute shrinkage and selection operator (LASSO) and ridge regression. Compared with these methods, it allows more effective feature selection and takes into account the correlation among independent variables, which gives elastic net greater flexibility and stability than other linear models. For these reasons, it was selected in this work as a baseline model for predicting and characterizing early call termination.
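The sketch below writes the elastic net objective in the form used by scikit-learn's `ElasticNet` (with `alpha` playing the role of α and `l1_ratio` the role of l) and checks, on synthetic data, that setting l = 1 recovers the LASSO solution. The data and penalty values are illustrative assumptions, and the intercept is disabled so that the objective matches the formula exactly.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso

X, y = make_regression(n_samples=200, n_features=8, noise=5.0, random_state=0)


def enet_objective(beta, X, y, alpha, l1_ratio):
    """1/(2n)||y - X beta||^2 + alpha*l*||beta||_1 + alpha*(1-l)/2*||beta||_2^2."""
    n = X.shape[0]
    resid = y - X @ beta
    return (resid @ resid / (2 * n)
            + alpha * l1_ratio * np.abs(beta).sum()
            + 0.5 * alpha * (1 - l1_ratio) * beta @ beta)


# Mixed L1/L2 penalty (intercept omitted to keep the objective exactly as above).
enet = ElasticNet(alpha=0.5, l1_ratio=0.5, fit_intercept=False).fit(X, y)

# With l = 1 the L2 term vanishes and elastic net reduces to the LASSO.
enet_l1 = ElasticNet(alpha=0.5, l1_ratio=1.0, fit_intercept=False).fit(X, y)
lasso = Lasso(alpha=0.5, fit_intercept=False).fit(X, y)
```

Moving `l1_ratio` from 1 towards 0 shifts the penalty from the sparsity-inducing $L_1$ term towards the $L_2$ term, which tends to spread weight across groups of correlated predictors rather than picking one of them.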

RESULTS
In this section, we first describe the data used in our work, which were entirely sampled from a real contact center. Next, we report the results of our initial efforts to model and interpret customer experience using popular data mining methods. Finally, we present the experimental work supporting our proposed approach.

Data
In this work, we focused on data provided by TeleWare, a middleware service provider involved in telecommunication development that works with organizations from various sectors [3]. Here, we considered a dataset relative to a single organization over the period between October and November 2016. The original dataset comprises time-stamp log recordings for 8,353 phone calls, covering the following events: call start, queue start, customer dialing, agent answer (successful or missed), and call termination. The full set of data entries included in the log is shown in Table 1.
The structure of the phone calls considered in this article is displayed in Figure 1. As soon as a call reaches the internal system, it is processed and assigned to a queue. This step typically takes on the order of seconds, but occasionally requires longer. Once the call is queued by the system, the customer has to select the desired service by dialing its corresponding code. Next, the system attempts to connect the customer to an agent, depending on the selected service. However, the connection may fail as a result of a missed answer by the agent, which can happen when the agent is already busy with another call or does not answer. In that case, the system tries to connect the customer to another agent, and if necessary to a third one and so forth, until a connection is successfully established. The actual conversation can then take place, until either the customer or the agent terminates the call. However, the customer may also end the call before any conversation can be initiated.
Throughout the whole process, the occurrence of these events and their time stamps are recorded and stored in a database. It is therefore possible to characterize each call completely in terms of time-stamp information. Importantly, no information about actual customer satisfaction or feedback is available, as it was not sampled by the client company. This is the main challenge in our study, as it prevents the use of supervised data analysis techniques for directly predicting customer experience.
Fig. 1. Panel (a) illustrates the phone call schema for the considered contact center. Time points for each event in the boxes are automatically registered by the system as shown in Table 1 and make up our initial dataset. Panels (b) to (i) show distributions of the variables in Table 2 for the engineered dataset.

Table 2. Engineered features.

| Feature | Interpretation |
|---|---|
| CallStartTime | Instant at which the phone call connects to the system and its processing starts |
| ProcessingDuration | Time needed by the system to assign the incoming call to a queue |
| QueueDuration | Overall queue time |
| PreDialDuration | Queue time before the customer specifies the desired service |
| PostDialDuration | Queue time after the customer has specified the desired service |
| TalkDuration | Duration of the conversation between customer and agent |
| NumberOfDials | Number of times the system tries to connect the customer to an agent |
| CallClass | Categorical variable indicating whether the connection between customer and agent occurs before the end of the call (1 = yes, 2 = no) |

All variables are real-valued except NumberOfDials and CallClass, which are integers.
To obtain a constant number of meaningful features for each sample, we engineered the original data, obtaining the set of features displayed in Table 2 along with their interpretation. A single variable, CallStartTime, remains in the form of a time stamp, which was converted into a single numerical value expressed in seconds. Five other variables represent event durations within the call, corresponding to the rectangular boxes in Figure 1: ProcessingDuration, QueueDuration, PreDialDuration, PostDialDuration, and TalkDuration. Lastly, two further variables were introduced to characterize events that may differ from call to call: the number of dials to an agent and whether or not the call terminates before an agent can respond. As a result, a log sequence such as the one displayed in Table 1 is converted into the process data sample shown in Table 3.

Table 3. Example of an engineered process data sample.

| CallID | CallStartTime | ProcessingDuration | QueueDuration | PreDialDuration | PostDialDuration | TalkDuration | NumberOfDials | CallClass |
|---|---|---|---|---|---|---|---|---|
| 116539218 | 47091 | 12 | 282 | 276 | 6 | 81 | 1 | 1 |

These features were used in all the subsequent analyses, described in the following sections. Unless otherwise stated, the analyses were carried out in Python using the following packages: Pandas [25], Scikit-Learn [30], Matplotlib [16], and Seaborn [39].
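To illustrate the feature engineering step, the sketch below converts one call's event log into the Table 2 features. The event labels (`CallStart`, `QueueStart`, `CustomerDial`, `AgentAnswer`, `AgentMissed`, `CallEnd`) and the function name `engineer_features` are hypothetical, since the exact log schema of Table 1 is not reproduced here. CallStartTime is assumed to be seconds since midnight, and unanswered calls are assumed to stay queued until the call ends. The example timestamps are chosen so that the output reproduces the sample values 47091, 12, 282, 276, 6, 81, 1, 1.

```python
from datetime import datetime

ANSWERED, EARLY_TERMINATION = 1, 2   # CallClass coding from Table 2


def seconds_between(a, b):
    """Elapsed seconds from datetime a to datetime b."""
    return (b - a).total_seconds()


def engineer_features(events):
    """Turn one call's (event_name, timestamp) log into Table 2 features.

    Event names are hypothetical placeholders for the real log entries.
    """
    first = {}
    for name, ts in events:
        first.setdefault(name, ts)   # keep the first occurrence of each event
    start, queue_start, end = first["CallStart"], first["QueueStart"], first["CallEnd"]
    dial = first.get("CustomerDial", end)   # caller may hang up before dialing
    answer = first.get("AgentAnswer")       # None if no agent ever answered
    queue_end = answer if answer is not None else end
    midnight = start.replace(hour=0, minute=0, second=0, microsecond=0)
    return {
        "CallStartTime": seconds_between(midnight, start),
        "ProcessingDuration": seconds_between(start, queue_start),
        "QueueDuration": seconds_between(queue_start, queue_end),
        "PreDialDuration": seconds_between(queue_start, dial),
        "PostDialDuration": seconds_between(dial, queue_end),
        "TalkDuration": seconds_between(answer, end) if answer is not None else 0.0,
        # Connection attempts = answered plus missed agent events.
        "NumberOfDials": sum(n in ("AgentAnswer", "AgentMissed") for n, _ in events),
        "CallClass": ANSWERED if answer is not None else EARLY_TERMINATION,
    }


day = datetime(2016, 10, 12)   # arbitrary illustrative date
log = [
    ("CallStart", day.replace(hour=13, minute=4, second=51)),
    ("QueueStart", day.replace(hour=13, minute=5, second=3)),
    ("CustomerDial", day.replace(hour=13, minute=9, second=39)),
    ("AgentAnswer", day.replace(hour=13, minute=9, second=45)),
    ("CallEnd", day.replace(hour=13, minute=11, second=6)),
]
features = engineer_features(log)
```

By construction, QueueDuration equals PreDialDuration plus PostDialDuration, which is consistent with the sample row (276 + 6 = 282).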

Preliminary Data Exploration
Telecommunication, as a middleware service, is part of a network infrastructure, which means that the concepts and assumptions of network measurement can usually be applied to telecommunication. In a previous study [11], the authors proposed an intuitive hypothesis mapping the general relationship between quality of experience (QoE) and quality of service (QoS) into a logarithmic relationship. Applied to a call center scenario, customer experience can be represented by the QoE, while features such as QueueDuration could represent the QoS. Embracing this idea, we initially formulated the following hypothesis:

$$CX = -a \log(QD) + b,$$

where CX and QD denote customer experience and QueueDuration, respectively. Applying this hypothesis to our dataset yields a curve with parameters a = 0.11 and b = 0.95, which reflects the intuitive assumption that the longer a customer waits in the queue, the worse his/her experience. The obtained relation captures this tendency reasonably, but in practice the hypothesis has the important drawback of focusing only on a single global duration of the process, overlooking the relationships among multiple features within the dataset (e.g., QueueDuration and TalkDuration). For example, consider a customer calling a credit card company about a card theft. The customer may wait in the queue for a relatively long time, but if the agent reacts quickly and professionally, the customer could still be satisfied. This scenario cannot be captured by the logarithmic model.
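The logarithmic hypothesis can be evaluated directly with the reported parameters. The sketch assumes the natural logarithm and queue durations in seconds, since neither the base nor the unit is stated above; the function name `cx_log` and the sample durations are illustrative.

```python
import numpy as np

A, B = 0.11, 0.95   # parameters reported in the text for CX = -a*log(QD) + b


def cx_log(queue_duration):
    """Logarithmic QoE-style hypothesis; natural log and seconds are assumptions."""
    return -A * np.log(queue_duration) + B


qd = np.array([10.0, 60.0, 282.0, 900.0])   # illustrative queue durations (s)
cx = cx_log(qd)
```

Whatever the base, the score depends on QueueDuration alone: two calls with the same queue time but very different TalkDuration receive the same value, which is exactly the limitation discussed above.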
Next, we tried to identify groups of phone calls in our dataset that may be associated with different quality levels of customer experience, using data mining algorithms that can leverage the detailed log-based information. The idea is that properties of the clusters may suggest novel indicators through their association with factors known to positively or negatively influence customer satisfaction.
To this end, we partitioned our dataset by k-means clustering, one of the most popular cluster analysis methods [22]. We assessed the quality of the partition for a range of cluster numbers k in terms of the silhouette coefficient, which expresses how similar observations are to others in the same cluster compared with observations in other clusters [32]. Let $a(x)$ be the average Euclidean distance between observation $x$ and all other samples in the same cluster, and $b(x)$ the smallest average distance between $x$ and the samples of any other cluster. The silhouette coefficient $s(x)$ is defined as:

$$s(x) = \frac{b(x) - a(x)}{\max\{a(x), b(x)\}}, \tag{10}$$

and ranges in the interval $[-1, 1]$, with 1 indicating maximal consistency with the clustering. As shown in Figure 2, the average silhouette across all samples is considerably low for every value of k up to 20, with only k = 2 lying above 0.30. Visual inspection of silhouette profiles and data separation confirmed that the overall quality of the clustering remains modest across all values of k. For instance, Figure 2 shows that, for k = 2, a large cluster dominates over a smaller and less defined one, while Figure 3 displays the poor separation obtained.
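A sketch of the silhouette-based selection of k, assuming scikit-learn and synthetic data (the real call features are not reproduced here): k-means with Elkan's algorithm and k-means++ seeding is run for k = 2, ..., 20, the mean silhouette is recorded for each k, and the coefficient of a single sample is recomputed by hand from the definitions of a(x) and b(x). The helper `silhouette_of` is illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

X, _ = make_blobs(n_samples=400, centers=3, n_features=5, random_state=1)


def silhouette_of(i, X, labels):
    """s(x) = (b - a) / max(a, b) for sample i, using Euclidean distances."""
    d = np.linalg.norm(X - X[i], axis=1)
    same = labels == labels[i]
    same[i] = False                              # a(x) excludes x itself
    a = d[same].mean()
    b = min(d[labels == c].mean() for c in np.unique(labels) if c != labels[i])
    return (b - a) / max(a, b)


mean_silhouette = {}
for k in range(2, 21):
    labels = KMeans(n_clusters=k, init="k-means++", algorithm="elkan",
                    n_init=10, random_state=0).fit_predict(X)
    mean_silhouette[k] = silhouette_score(X, labels)

labels_3 = KMeans(n_clusters=3, algorithm="elkan", n_init=10,
                  random_state=0).fit_predict(X)
best_k = max(mean_silhouette, key=mean_silhouette.get)
```

On well-separated synthetic blobs the mean silhouette is typically high at the true number of clusters; uniformly low values across all k, as reported above for the call data, indicate that no clear cluster structure is present.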
Moreover, a binary separation is not ideal for practical interpretation, as customer experience rarely reduces to a sharp positive/negative response. The case with the second-highest average silhouette, k = 8, would appear more meaningful if the associated clustering quality were not so poor. Additionally, the obtained models do not incorporate any domain knowledge, which hampers their interpretation. Overall, this analysis did not provide useful insights into the various forms of customer experience delivered.
As a second task, we sought to build a predictive model for early call termination in order to characterize its relationships with all previous process events. We employed elastic net, a regularized linear regression method that performs variable selection and whose coefficients indicate how strongly each variable is associated with the target [40]. In elastic net, the weight of each variable is controlled by both L1 and L2 penalties, whose relative contribution is balanced through two hyper-parameters α and l, as shown in Equation 9. We thus built a series of elastic net models using CallClass as the target variable and studied the influence of different L1/L2 ratios over the interval [0.1, 1]. We optimized the models by tuning the hyper-parameter α through 20-fold cross-validation based on the mean squared error (MSE), defined as

$$ \mathrm{MSE} = \frac{1}{n_{test}} \sum_{i=1}^{n_{test}} \left( y_i - \hat{y}_i \right)^2 $$

where y_i and ŷ_i are the true and predicted CallClass values for observation i, respectively, and n_test is the number of observations in each test split. Figure 4 shows the results for the best-performing model, with an MSE of roughly 0.11. In panel (b), the associated model parameters indicate which variables are most associated with early call termination.
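For readers unfamiliar with elastic net, the fitting and the cross-validated MSE described above can be sketched in plain Python. The sketch assumes the parameterization popularized by scikit-learn, (1/2n)‖y − b − Xw‖² + α·l·‖w‖₁ + (α(1 − l)/2)‖w‖², where l is the L1 ratio; Equation 9 may be written differently. The solver is plain cyclic coordinate descent, the folds are interleaved rather than shuffled, and all function names are hypothetical.

```python
def soft_threshold(z, gamma):
    """Proximal operator of gamma*|w|: shrinks z toward zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, alpha, l1_ratio, n_iter=200):
    """Cyclic coordinate descent for
    (1/2n)||y - b - Xw||^2 + alpha*l1_ratio*||w||_1 + alpha*(1 - l1_ratio)/2*||w||_2^2.
    Returns (weights, intercept)."""
    n, p = len(X), len(X[0])
    # Center features and target so the intercept is not penalized.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    r = [v - y_mean for v in y]  # residual for w = 0
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant feature: weight stays at zero
            # Correlation of feature j with the partial residual (its own term added back).
            rho = sum(Xc[i][j] * r[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_wj = soft_threshold(rho, alpha * l1_ratio) / (col_sq[j] + alpha * (1 - l1_ratio))
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= Xc[i][j] * delta
                w[j] = new_wj
    b = y_mean - sum(x_mean[j] * w[j] for j in range(p))
    return w, b


def predict(X, w, b):
    return [b + sum(wj * xj for wj, xj in zip(w, row)) for row in X]


def cv_mse(X, y, alpha, l1_ratio, k=20):
    """Average test MSE over k interleaved folds (fold f holds indices f, f+k, f+2k, ...)."""
    n = len(X)
    scores = []
    for f in range(k):
        test_idx = list(range(f, n, k))
        if not test_idx:
            continue
        train_idx = [i for i in range(n) if i % k != f]
        w, b = elastic_net([X[i] for i in train_idx], [y[i] for i in train_idx], alpha, l1_ratio)
        pred = predict([X[i] for i in test_idx], w, b)
        scores.append(sum((y[i] - yh) ** 2 for i, yh in zip(test_idx, pred)) / len(test_idx))
    return sum(scores) / len(scores)
```

The tuning loop would then pick, for each ratio l in [0.1, 1], the α with the lowest `cv_mse`, e.g. `min(alphas, key=lambda a: cv_mse(X, y, a, l))` over a hypothetical grid `alphas`.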
As could be expected, QueueingDuration is one of the variables most correlated with CallClass. Moreover, QBeforeDialingDuration appears much more relevant than QAfterDialingDuration. This suggests that the early stage of the call strongly determines the customer's willingness to keep waiting. Conversely, CallStartTime affects CallClass only very weakly, and NumberOfDials does not influence it at all. The time of day therefore does not emerge as a factor in premature call termination. Altogether, these results indicate that the service provider needs to improve the initial handling of calls inside the system. This analysis could thus reveal expected and unexpected patterns in the data, but its scope is limited to whether a conversation between customer and agent takes place. In particular, it does not capture the overall quality of service experienced or the degree of satisfaction. We therefore moved on to a different approach, presented in the following section.

Pouch Latent Tree Modeling of Log-based Processes
To achieve a more meaningful process characterization, we next applied the approach proposed in Section 3.1. We built a PLTM of our phone call log dataset in terms of the variables described in Table 2, except CallClass, which was dropped. For this purpose, we used the PLTM Java implementation made available by Poon et al. [31] through the JPype interface [1]. The resulting structure is illustrated in Figure 5. As shown, four latent variables represent alternative clustering solutions, each one corresponding to a complementary data facet. Variable Y3 is linked only to CallStartTime, so it represents a grouping of the data based on this feature alone. It has five states, corresponding to the main time intervals into which incoming calls are allocated. Variables Y1, Y2, and Y4 are each assigned a pouch with two manifest features and nine data clusters.

Figure 5. Structure of the obtained phone call PLTM. Each non-terminal node corresponds to a discrete latent variable, representing a data clustering solution based on the manifest variables associated with its child pouch node. Relations among latent variables constitute a linear Bayesian tree broadly reflecting the temporal evolution of a phone call.

Table 4 displays the mean of each input feature conditional on the assigned clustering variable. Inside Y1, state 0 emerges as the most numerous cluster and corresponds to very short global and pre-dial queue times. Along with it, five states of decreasing probability (in order: 8, 1, 7, 5, and 6) describe increasingly long queue durations. Three other states identify waiting times below average (state 2) or long queue durations with very quick pre-dial times (states 3 and 4). The hidden variable Y1 thus appears to be associated with the overall system efficiency in quickly providing information to the customer and handling his/her request.

Table 4. Columns relative to manifest attributes show the average within each cluster corresponding to a given internal state. The last row represents the average over the complete data distribution.

Inside Y2, by contrast, the two manifest variables agree more consistently, being jointly above or jointly below average. Clusters are divided between states with post-dial queue time and number of dials above average (states 0, 2, 3, and 7) and below average (states 1, 4, 5, and 8). Only state 6 is characterized by a PostDialDuration slightly above average and a NumberOfDials just below average, which may point to an anomalous duration of each connection attempt. Variable Y2 is arguably related to customer patience or queue behavior, as its pouch contains features belonging to the end-stage queue.
Next, Y3 is evidently related to the initiation time of the call, which is not directly associated with any other feature. It is simply divided into clusters where the average CallStartTime is low (states 0, 3, and 4) or high (states 1 and 2).
Lastly, Y4 is more evenly distributed across clusters with different combinations of high and low feature values, as expected from the heterogeneous semantics of its features. Two of the most likely states (states 2 and 4) are associated with low ProcessingDuration and TalkDuration, whereas clusters 3, 5, 6, and 8 present mixed tendencies. Such a data partition suggests that Y4 may represent specific services provided by the company. For example, state 4 might be related to appointment scheduling and state 5 to information requests, as the latter has a higher TalkDuration.
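The layout of Table 4 (per-state feature means plus an overall row) can be reproduced with a few lines of Python once each call has been assigned to a latent state. The sketch assumes hard assignments, for example the most probable state of a facet variable for each call, which simplifies the probabilistic output of the PLTM; the function name and the toy values in any usage are hypothetical.

```python
from collections import defaultdict


def conditional_means(rows, states, features):
    """Mean of each feature within each latent state, plus an 'all' row over the full data."""
    sums = defaultdict(lambda: [0.0] * len(features))
    counts = defaultdict(int)
    for row, state in zip(rows, states):
        for j, feature in enumerate(features):
            sums[state][j] += row[feature]
        counts[state] += 1
    table = {state: [total / counts[state] for total in sums[state]] for state in sorted(counts)}
    # Last row of the table: averages over the complete data distribution.
    table["all"] = [sum(row[f] for row in rows) / len(rows) for f in features]
    return table
```

Comparing each state's row with the `"all"` row gives the above/below-average reading used above for Y1, Y2, and Y4.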
Overall, the identified process facets appear reasonable, as queue-related attributes are grouped together. It is however interesting to notice that CallStartTime is alone in its pouch and that ProcessingDuration and TalkDuration are grouped together. The initiation time therefore has no evident correlation with queue behavior or dialog duration. Moreover, as clearly visible from Figure 5, the obtained PLTM has a flat structure in which the latent part of the tree forms a chain. This indicates that simple relationships exist among the different components of a phone call. Reasonably, the sequence of latent variables largely reflects the temporal order of events during a call, with Y1 as the first node, followed by Y2, and Y4 as the last node. The structure thus supports the general soundness of the PLTM.
To evaluate the robustness of the PLTM and gain further insights from it, we applied a sensitivity estimation method for Bayesian networks [8], described in Section 3.3 and available in Bayes Server [2]. To this end, we transformed the PLTM into a purely discrete tree by pruning the pouch nodes and keeping each latent node as a discrete variable whose states correspond to the previously identified clusters. As a result, the new tree is a chain composed only of discrete latent variables, each one with as many states as there are clusters in the corresponding facet. Nodes Y1, Y2, and Y4 thus have nine states, while Y3 has five. Assuming linear sensitivity relationships as in Equation (4), we estimated the impact of each latent variable on the others using Equation (5). At this stage, we concentrated on a one-way sensitivity analysis, in which a single parameter is varied at a time, and we considered two types of perturbation: without and with evidence. The results are summarized in Tables 5 and 6, respectively.
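The idea behind one-way sensitivity analysis can be illustrated on a toy chain. The sketch below uses neither Bayes Server's API nor necessarily the exact formulation of Equations (4) and (5): it varies a single parameter x = P(Y1 = a), rescales the other states of Y1 proportionally (proportional co-variation), recomputes a query by brute-force enumeration, and estimates the sensitivity as the derivative of the query with respect to x by central differences. Without evidence the query is linear in x; with evidence it becomes a ratio of two linear functions. For brevity only root-prior parameters are varied, node indices are 0-based (0 = Y1), and all names are hypothetical.

```python
from itertools import product


def chain_joint(prior, cpts):
    """Yield (assignment, probability) for a discrete chain Y1 -> Y2 -> ... -> Yn.

    prior[s] = P(Y1 = s); cpts[i][a][b] = P(Y_(i+2) = b | Y_(i+1) = a).
    """
    sizes = [len(prior)] + [len(cpt[0]) for cpt in cpts]
    for assignment in product(*(range(n) for n in sizes)):
        p = prior[assignment[0]]
        for i, cpt in enumerate(cpts):
            p *= cpt[assignment[i]][assignment[i + 1]]
        yield assignment, p


def query(prior, cpts, target, state, evidence=None):
    """P(Y_target = state | evidence) by enumeration; evidence maps node index -> state."""
    evidence = evidence or {}
    num = den = 0.0
    for assignment, p in chain_joint(prior, cpts):
        if all(assignment[node] == s for node, s in evidence.items()):
            den += p
            if assignment[target] == state:
                num += p
    return num / den


def vary_prior(prior, a, x):
    """Set P(Y1 = a) = x and rescale the other states so the distribution still sums to 1."""
    rest = 1.0 - prior[a]  # assumed > 0
    return [x if i == a else p * (1.0 - x) / rest for i, p in enumerate(prior)]


def sensitivity(prior, cpts, a, target, state, evidence=None, h=1e-4):
    """d P(Y_target = state | evidence) / d P(Y1 = a), via central differences."""
    x0 = prior[a]
    up = query(vary_prior(prior, a, x0 + h), cpts, target, state, evidence)
    down = query(vary_prior(prior, a, x0 - h), cpts, target, state, evidence)
    return (up - down) / (2 * h)
```

Enumeration is exponential in the number of nodes, so this only suits toy chains; a real tool would use junction-tree inference and analytic sensitivity coefficients.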
In the scenario where evidence is ignored, low average sensitivity values can be observed across all node combinations, remaining at or below 0.14 for non-trivial pairs. These results thus indicate that the three main call components (call start time, queue phase, and conversation/service characteristics) do not strongly influence each other. The highest sensitivity was registered for Y2 with respect to Y1, consistent with the contribution of the total queue duration to PostDialDuration. Furthermore, despite its position in the tree, TalkDuration is on average the least sensitive variable. This seems to conflict with the result of the elastic net feature selection in Section 4.2, but the PLTM interpretation makes sense in practical situations. In the real world, the conversation duration depends on the requested service, so it would not be appropriate to simply associate a sample with a longer conversation time with low customer satisfaction. Conversely, CallStartTime shows a relatively stronger sensitivity with respect to the global and early queuing features, denoting a non-negligible, even if weak, relationship between them. Furthermore, the obtained sensitivities display symmetry between positive and negative values among long- and short-duration states. This symmetry indicates that call stages tend to concentrate either around short durations (abandoned calls or quickly deliverable services) or around long durations, due to both complicated customer requests and system or agent inefficiency. In fact, long conversations are more associated with long queuing times, as can be seen from the higher sensitivity value between Y1 and Y4 compared with the other variables (although the value is still low: 0.05).
When evidence nodes are taken into account, global sensitivity increases, although it remains moderately low. In particular, average sensitivity values are considerably higher for a few specific combinations of hypothesis, evidence, and parameter, shown in Table 6. For instance, variable Y3 becomes more sensitive to Y1 and Y4 only when evidence is set on Y4. This suggests that the specific service required by the customer may influence when he/she makes the call and how long he/she is willing to wait for the connection to an agent and for the fulfillment of his/her request. Y1 instead shows a strong influence on the other variables, regardless of which variable is set as the evidence node. This is probably because Y1 is the root node of the structure. Finally, Y4 only affects itself, independently of the evidence node. This suggests that the conversation span depends more consistently on external variables not considered here (booking, technical problems, etc.).
According to the results presented above, the main factors that affect customer experience include the following:
(1) Call incoming time: related to latent variable Y3;
(2) Customers' patience and willingness to wait while queuing: related to Y1 and Y2;
(3) Services provided by the company: related to Y4, since specific services may intrinsically require more time;
(4) Call center and system handling capacity: related to all latent variables.
Clearly, factors on the customer side (1 and 2) are subject to intrinsic uncertainty and randomness, while factors from the company side (3 and 4) can be controlled and improved.

Facet-based Classification of Customer Experience
From the perspective of a predictive system used by the company, customer experience must not only be estimated correctly but also be estimated quickly. For instance, the system could provide a response to contact center agents immediately after a call ends, as feedback on the delivered service. In such a case, predictive algorithms need to be efficient enough not to delay the start of the following tasks. However, building a PLTM is too expensive in these situations, since both its parameters and its structure have to be recomputed at every change in the training set. Here, we considered a simple process, but computation time can increase quickly as the process branches into sub-processes. First, the PLTM is determined using a greedy search strategy that evaluates every candidate tree derived from the seed structure generated in the previous iteration. The search algorithm performs operations such as adding, removing, and relocating states and nodes, and recomputes all the associated parameters through a computationally expensive expectation-maximization procedure. Second, PLTM inference is based on the junction tree algorithm for Bayesian networks, which operates on every clique of the tree structure, further increasing the algorithmic complexity. When the number of variables is large, the cost in terms of computational resources (CPU) and time can therefore be very high.
For these reasons, we tested the transformation of the PLTM clustering into a classification task solved by a more efficient algorithm. Building on the results of the previous sections, we translated each of the clustering solutions given by the PLTM into labels for a separate classification task. Every latent state of a given facet was thereby taken as a target class for the task relative to that facet. We then simulated customer satisfaction predictions within the existing queue and data collection system, in a scenario where true labels are available. In this phase, we employed naïve Bayes classifier (NBC) models, which assume that features are independent given the class, as explained in Section 3.4.
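A multi-class naïve Bayes classifier is cheap to train and to apply, which is what makes it attractive here. The sketch below assumes Gaussian class-conditional densities for the continuous log durations; Section 3.4 may use a different likelihood, and a library implementation such as scikit-learn's `GaussianNB` would normally be used instead. The class and parameter names are illustrative.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Multi-class naive Bayes with one Gaussian per (class, feature) pair."""

    def __init__(self, var_floor=1e-9):
        self.var_floor = var_floor  # avoids zero variance for constant features
        self.params = {}

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        for label, rows in groups.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            variances = [
                sum((v - m) ** 2 for v in c) / len(c) + self.var_floor
                for c, m in zip(cols, means)
            ]
            log_prior = math.log(len(rows) / len(X))
            self.params[label] = (log_prior, means, variances)
        return self

    def log_joint(self, row, label):
        """log P(label) + sum_j log N(x_j; mean_j, var_j): features independent given the class."""
        log_prior, means, variances = self.params[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )

    def predict(self, X):
        # Maximum a posteriori class for each row.
        return [max(self.params, key=lambda label: self.log_joint(row, label)) for row in X]
```

In the setting described above, one such classifier would be trained per facet, with that facet's latent states as class labels.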
To evaluate the efficacy of each NBC, we employed 3-fold cross-validation as follows. The entire dataset was split into three folds of equal size and with approximately the same class label ratios. In turn, two data folds (∼67% of the overall dataset) were used for training and the remaining fold (∼33% of the overall dataset) for testing and validation. We repeated this procedure three times, rotating the test fold at each iteration. In this way, we evaluated the NBC models in the scenario relative to each clustering facet on three test sets of roughly 2,800 samples each. Prediction performance was assessed with two popular measures for multi-class classification: accuracy and the Matthews correlation coefficient. Assuming that n_test observations are classified into K classes, we denote by C the confusion matrix, where each entry C_kl is the number of observations belonging to class k and assigned to class l by the model. In a multi-class scenario, the accuracy is defined as the global ratio of correctly assigned observations, as follows:

$$ \mathrm{Acc} = \frac{1}{n_{test}} \sum_{k=1}^{K} C_{kk} \tag{12} $$

The accuracy thus reflects the ratio of correct predictions regardless of the number and proportions of the considered classes. The Matthews correlation coefficient is instead a measure of agreement between real and predicted values that takes class cardinalities into account. Generalizing the binary case, the multi-class Matthews coefficient R_K is defined as follows [13]:

$$ R_K = \frac{\sum_{k,l,m} \left( C_{kk} C_{lm} - C_{kl} C_{mk} \right)}{\sqrt{\sum_{k} \left( \sum_{l} C_{kl} \right) \left( \sum_{k' \neq k} \sum_{l'} C_{k'l'} \right)} \, \sqrt{\sum_{k} \left( \sum_{l} C_{lk} \right) \left( \sum_{k' \neq k} \sum_{l'} C_{l'k'} \right)}} \tag{13} $$

Based on this equation, R_K ∈ [−1, 1], and values close to 1 are obtained only if elements from all classes are recognized correctly. It is therefore a precise indicator of good prediction that balances the contribution of every possible misclassification.

Table 7. For each scenario, average accuracy and Matthews correlation coefficient R_K in 3-fold cross-validation, as defined in Equations (12) and (13).
Accuracy: 0.92 ± 6.9·10⁻³ | R_K: 0.90 ± 8.0·10⁻³
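The evaluation protocol and the two metrics can be sketched in plain Python. The fold assignment below deals each class round-robin across the folds, one simple way (assumed here, not necessarily the authors') to obtain folds of near-equal size with matching class ratios. `matthews_rk` uses the closed form (c·s − Σ_k t_k p_k) / √((s² − Σ_k p_k²)(s² − Σ_k t_k²)), with s the total count, c the trace of C, t_k the row sums, and p_k the column sums; expanding the sums shows it is algebraically equal to the triple-sum form of R_K commonly given in the literature. Function names are hypothetical.

```python
import math
from collections import defaultdict


def stratified_folds(labels, k=3):
    """Split sample indices into k folds with near-equal sizes and matching class ratios."""
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for lab in sorted(by_class):
        for j, idx in enumerate(by_class[lab]):
            folds[(offset + j) % k].append(idx)
        offset += len(by_class[lab])  # continue dealing where the previous class stopped
    return folds


def confusion_matrix(y_true, y_pred, classes):
    """C[k][l] = number of samples of class k predicted as class l."""
    pos = {c: i for i, c in enumerate(classes)}
    C = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        C[pos[t]][pos[p]] += 1
    return C


def accuracy(C):
    return sum(C[k][k] for k in range(len(C))) / sum(map(sum, C))


def matthews_rk(C):
    K = len(C)
    s = sum(map(sum, C))                                    # total samples
    c = sum(C[k][k] for k in range(K))                      # correctly classified
    t = [sum(C[k]) for k in range(K)]                       # true counts (row sums)
    p = [sum(C[l][k] for l in range(K)) for k in range(K)]  # predicted counts (column sums)
    num = c * s - sum(tk * pk for tk, pk in zip(t, p))
    den = math.sqrt(s * s - sum(pk * pk for pk in p)) * math.sqrt(s * s - sum(tk * tk for tk in t))
    return num / den if den else 0.0  # 0 by convention when a marginal is degenerate
```

A full evaluation would loop over the folds, train on the other two, and average `accuracy` and `matthews_rk` across the three test folds, as reported in Table 7.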
Together, accuracy and R_K provide a complementary overview of classification performance. Overall, classification performance remains high across all scenarios, as summarized in Table 7. Both accuracy and the Matthews coefficient are consistently close to 0.90 across test sets, with deviations within 1%. Figure 6 illustrates the global counts of correctly and incorrectly classified observations across all classes. Manual inspection of the misclassified samples allowed us to verify that they belong to statistically similar clusters. These results therefore support the use of automatic prediction methods within the call management system, possibly as online classification tools.

CONCLUSIONS
In this study, we investigated how to leverage process log data to characterize and gain insights into customer experience within a contact center. To develop and validate our methodology, we applied data mining and modeling techniques to real-world call center data obtained from the TeleWare database, consisting of log timing information for thousands of calls and lacking customer feedback on the service received. Since traditional clustering algorithms such as k-means could not provide reasonable and meaningful interpretations, we used a PLTM to determine alternative data partitions reflecting complementary aspects of the process. The obtained PLTM, combined with domain knowledge, allowed us to characterize the variation in log-based processes and hypothesize different degrees of customer experience. Subsequent classification experiments showed that customer experience facets can be predicted efficiently from log data, potentially as part of a support system.
Customer experience is a complex and multifaceted phenomenon that requires non-obvious strategies to be understood and managed. Our findings suggest that taking into account the different aspects of a process can provide more meaningful insights than mining process data as a whole. Moreover, given the wide and diversified range of business processes, it is essential to develop and validate computational techniques that can leverage the available data. Our results demonstrate that PLTM can help identify concrete points of improvement in CRM in contexts where self-reported customer feedback is not available. We therefore envisage that our approach can be extended to other log-based processes.
Although the obtained results demonstrate the value of a PLTM in the considered industrial environment, some issues remain to be solved before it can be used in a realistic setting. The main obstacle is the computational cost of the algorithms for PLTM construction. For this reason, we studied the possibility of performing customer experience prediction through a more efficient supervised method. Our results support the development of a two-stage strategy, with a preliminary phase of facet label identification and a real-time phase of multi-label classification. In future research, we will concentrate on redeveloping the PLTM to achieve higher computational efficiency and to reduce the time cost of model training. To address this issue, we see artificial neural networks as a promising option.