XLOG: Analysis of large and complex logs

XLOG Publications and Supporting Materials

ASE'14: The Confidence in Our k-Tails
ASE'15: Have We Seen Enough Traces?
ICSE'16: Behavioral Log Analysis with Statistical Guarantees
FSE'18: Using Finite-State Models for Log Differencing
ASE'19: Size and Accuracy in Model Inference
ASE'19: Statistical Log Differencing

The Confidence in Our k-Tails

k-Tails is a popular algorithm for extracting a candidate behavioral model from a log of execution traces. The usefulness of k-Tails depends on the quality of its input log, which may include too few traces to build a representative model, or too many traces, whose analysis is a waste of resources. Given a set of traces, how can one be confident that it includes enough, but not too many, traces? While many have used the k-Tails algorithm, no previous work has yet investigated this question.

In this paper we address this question by proposing a novel notion of log completeness. Roughly, a log of traces, extracted from a given system, is k-complete, iff adding any new trace to the log will not change the resulting model kTails would build for it. Since the system and its full set of traces is unknown, we cannot know whether a given log is k-complete. However, we can estimate its k-completeness. We call this estimation k-confidence.

We formalize the notion of k-confidence and implement its computation. Preliminary experiments show that k-confidence can be efficiently computed and is a highly reliable estimator for k-completeness.

H. Cohen and S. Maoz , The confidence in our k-Tails. Proc. of ASE 2014.

Supporting materials

CVS client logs

Have we seen enough traces?

Dynamic specification mining extracts candidate specifications from logs of execution traces. Existing algorithms differ in the kinds of traces they take as input and in the kinds of candidate specification they present as output. One challenge common to all approaches relates to the faithfulness of the mining results: how can we be confident that the extracted specifications faithfully characterize the program we investigate? Since producing and analyzing traces is costly, how would we know we have seen enough traces? And, how would we know we have not wasted resources and seen too many of them?

In this paper we address these important questions by presenting a novel, black box, probabilistic framework based on a notion of log completeness, and by applying it to three different well-known specification mining algorithms from the literature: k-Tails, Synoptic, and mining of scenario-based triggers and effects. Extensive evaluation over 24 models taken from 9 different sources shows the soundness, generalizability, and usefulness of the framework and its contribution to the state-of-the-art in dynamic specification mining.

H. Cohen and S. Maoz , Have we seen enough traces?. Proc. of ASE 2015.

Supporting materials

Shopping cart logs Example 1
CrossFTP server logs Example 2
ZipOutputStream model and log Running example
Evaluation models
Evaluation models documentation [pdf]
Evaluation results
Expectation and convergence of log confidence [pdf]
Log completeness framework binaries + evaluation source code

Behavioral Log Analysis with Statistical Guarantees

Scalability is a major challenge for existing behavioral log analysis algorithms, which extract nite-state automaton models or temporal properties from logs generated by running systems.

In this paper we present statistical log analysis, which addresses scalability using statistical tools. The key to our approach is to consider behavioral log analysis as a statistical experiment. Rather than analyzing the entire log, we suggest to analyze only a sample of traces from the log and, most importantly, provide means to compute statistical guarantees for the correctness of the analysis result. We present the theoretical foundations of our approach and describe two example applications, to the classic k-Tails algorithm and to the recently presented BEAR algorithm. Finally, based on experiments with logs generated from real-world models and with real-world logs provided to us by our industrial partners, we present extensive evidence for the need for scalable log analysis and for the effectiveness of statistical log analysis.

N. Busany and S. Maoz , Behavioral Log Analysis with Statistical Guarantees. Proc. of ICSE 2016.

Using Finite-State Models for Log Differencing

Much work has been published on extracting various kinds of models from logs that document the execution of running systems. In many cases, however, for example in the context of evolution, testing, or malware analysis, engineers are interested not only in a single log but in a set of several logs, each of which originated from a different set of runs of the system at hand. Then, the difference between the logs is the main target of interest.

In this work we investigate the use of finite-state models for log differencing. Rather than comparing the logs directly, we generate concise models to describe and highlight their differences. Specifically, we present two algorithms based on the classic k-Tails algorithm: 2KDiff, which computes and highlights simple traces containing sequences of k events that belong to one log but not the other, and nKDiff, which extends k-Tails from one to many logs, and distinguishes the sequences of length k that are common to all logs from the ones found in only some of them, all on top of a single, rich model. Both algorithms are sound and complete modulo the abstraction defined by the use of k-Tails.

We implemented both algorithms and evaluated their performance on mutated logs that we generated based on models from the literature. We conducted a user study including 60 participants demonstrating the effectiveness of the approach in log differencing tasks. We have further performed a case study to examine the use of our approach in malware analysis. Finally, we have made our work available in a prototype web-application, for experiments.

H. Amar, L. Bao, N. Busany, D. Lo, and S. Maoz, Using Finite-State Models for Log Differencing. Proc. of FSE 2018.

Supporting materials

Prototype tool, proofs, and additional evaluation data

Size and Accuracy in Model Inference

Many works infer finite-state models from execution logs. Large models are more accurate but also more difficult to present and understand. Small models are easier to present and understand but are less accurate.

In this work we investigate the tradeoff between model size and accuracy in the context of the classic k-Tails model inference algorithm. First, we define mk-Tails, a generalization of k-Tails from one to many parameters, which enables fine-grained control over the tradeoff. Second, we extend mk-Tails with a reduction based on past-equivalence, which effectively reduces the size of the model without decreasing its accuracy.

We implemented our work and evaluated its performance and effectiveness on real-world logs as well as on models and generated logs from the literature.

N. Busany, S. Maoz, and Y. Yulazari, Size and Accuracy in Model Inference. Proc. of ASE 2019. To appear.

Supporting materials

Prototype tool, proofs, and additional evaluation data

Statistical Log Differencing

Recent works have considered the problem of log differencing: given two or more system's execution logs, output a model of their differences. Log differencing has potential applications in software evolution, testing, and security.

In this paper we present statistical log differencing, which accounts for frequencies of behaviors found in the logs. We present two algorithms, s2KDiff for differencing two logs, and snKDiff, for differencing of many logs at once, both presenting their results over a single inferred model. A unique aspect of our algorithms is their use of statistical hypothesis testing: we let the engineer control the sensitivity of the analysis by setting the target distance between probabilities and the statistical significance value, and report only (and all) the statistically significant differences.

Our evaluation shows the effectiveness of our work in terms of soundness, completeness, and performance. It also demonstrates its effectiveness compared to previous work via a user-study and its potential applications via a case study using real-world logs.

L. Bao, N. Busany, D. Lo, and S. Maoz, Statistical Log Differencing. Proc. of ASE 2019. To appear.

Supporting materials

Prototype tool, proofs, and additional evaluation data