
S3M: Siamese Stack (Trace) Similarity Measure

Aleksandr Khvorov
JetBrains, ITMO University
Saint-Petersburg, Russia
aleksandr.khvorov@jetbrains.com

Roman Vasiliev
JetBrains
Saint-Petersburg, Russia
roman.vasiliev@jetbrains.com

George Chernishev
Saint-Petersburg State University
Saint-Petersburg, Russia
g.chernyshev@spbu.ru

Irving Muller Rodrigues
Polytechnique Montreal
Montreal, Canada
irving.muller-rodrigues@polymtl.ca

Dmitrij Koznov
Saint-Petersburg State University
Saint-Petersburg, Russia
d.koznov@spbu.ru

Nikita Povarov
JetBrains
Saint-Petersburg, Russia
nikita.povarov@jetbrains.com

arXiv:2103.10526v1 [cs.SE] 18 Mar 2021

Abstract—Automatic crash reporting systems have become a de-facto standard in software development. These systems monitor target software, and if a crash occurs they send details to a backend application. Later on, these reports are aggregated and used in the development process to 1) understand whether it is a new or an existing issue, 2) assign these bugs to appropriate developers, and 3) gain a general overview of the application's bug landscape. The efficiency of report aggregation and subsequent operations heavily depends on the quality of the report similarity metric. However, a distinctive feature of this kind of report is that no textual input from the user (i.e., bug description) is available: it contains only stack trace information.

In this paper, we present S3M ("extreme") — the first approach to computing stack trace similarity based on deep learning. It is based on a siamese architecture that uses a biLSTM encoder and a fully-connected classifier to compute similarity. Our experiments demonstrate the superiority of our approach over the state-of-the-art on both open-sourced data and a private JetBrains dataset. Additionally, we review the impact of stack trace trimming on the quality of the results.

Index Terms—Crash Report, Stack Trace, Deduplication, Automatic Crash Reporting, Deep Learning.

I. INTRODUCTION

Collection of bug reports is an essential part of the software development process. Bug trackers allow programmers to efficiently concentrate their efforts by prioritizing urgent bugs, discern between new and old bugs, select an appropriate developer for a specific bug, and so on. Overall, bug trackers are a crucial software development tool that greatly improves the efficiency of bug fixing.

Usually, a bug report contains a textual description of the bug and some categorical metadata such as product version, OS, severity, status, etc. Classic bug reports are filled in manually, either by users or product testers. A different approach to bug reporting is an automatic crash reporting system that monitors the target software. If a crash occurs, this system forms and sends a report to a backend application. Unlike classic bug reporting tools, such systems require no input from a user: a report contains only a stack trace.

Such an approach allows developers to drastically increase the amount of bug feedback at hand, but it is more prone to producing duplicates, thus making deduplication a problem of high priority. Since such reports lack textual content, conventional deduplication techniques (for example, see the surveys in studies [1]–[4]) are inapplicable. Thus, the stack trace-based report deduplication problem gives rise to a separate class of approaches that address it with a similarity measure; and having a good similarity measure has a massive impact on the quality of report bucketing. This directly affects the quality of decision-making in the software development process.

This problem has been intensively studied for more than a decade, spawning a significant number of papers. Existing approaches rely on two crucial ideas: string similarity (e.g. stack frame-based Levenshtein distance) [5]–[8] and information retrieval approaches (e.g. TF-IDF applied to stack frames) [9]–[11].

Recently, Deep Learning (DL) techniques have been successfully adopted for the classic report deduplication problem [12]–[15]. However, to the best of our knowledge, there are still no approaches that apply DL to stack trace-based report deduplication. At the same time, DL techniques look promising for three reasons:

1) While processing a stack trace, DL techniques can take into account frame context (other frames of the stack trace). This means that neural networks can, for example, "assess" the fifth frame while "thinking" of the first one. Existing (classic) methods for stack trace-based report deduplication are not capable of this, as they process stack frames independently.
2) Embeddings can allow neural networks to infer the semantic similarity of methods in stack traces. For example, a model can "understand" that the methods get() and getAll() are related to each other.
3) Neural networks do not require manual feature engineering; instead, they are able to perform feature extraction in an automatic and trainable way. Therefore, classic approaches are limited to the features that were built in by their designers, while DL approaches can potentially learn more complex ones, thus resulting in increased performance and reduced dataset dependence.

In this paper we present our model — S3M (pronounced
Figure 1. Approach Architecture.

(The original diagram depicts the following pipeline: StackTrace1 and StackTrace2 are each tokenized and trimmed into frame sequences frame1, frame2, ..., framen; the same biLSTM network encodes both sequences into vector representations v1 and v2; a feature vector constructor computes features(v1, v2) = (|v1 − v2|, (v1 + v2)/2, v1 ⊙ v2); and a Linear(ReLU(Linear)) block outputs the similarity. The figure legend distinguishes actions, data, and components.)
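As a rough illustration of the pipeline in Figure 1, the sketch below implements its non-recurrent pieces in plain Python/NumPy: frame trimming, the feature vector constructor, and the Linear(ReLU(Linear)) head. The biLSTM encoder is stubbed out (v1 and v2 are random stand-ins for its outputs), and all dimensions and weights here are illustrative assumptions, not the trained model.

```python
import numpy as np

def trim_frame(frame: str, level: int) -> str:
    """Trim a fully qualified frame by `level` trailing components:
    level 0 keeps the full frame (function), 1 trims to class,
    2 trims to package, etc. (illustrative helper)."""
    parts = frame.split(".")
    if level >= len(parts):
        return parts[0]
    return ".".join(parts[: len(parts) - level])

def features(v1: np.ndarray, v2: np.ndarray) -> np.ndarray:
    # (|v1 - v2|, (v1 + v2) / 2, v1 ⊙ v2), concatenated into one vector
    return np.concatenate([np.abs(v1 - v2), (v1 + v2) / 2, v1 * v2])

def similarity(f: np.ndarray, W1, b1, W2, b2) -> float:
    # Linear(ReLU(Linear(features))): a 2-layer fully connected head
    h = np.maximum(W1 @ f + b1, 0.0)   # first Linear + ReLU
    return float(W2 @ h + b2)          # final Linear -> scalar score

# Example with random (untrained) weights; d is a toy hidden size.
rng = np.random.default_rng(0)
d = 8
v1, v2 = rng.normal(size=d), rng.normal(size=d)
f = features(v1, v2)                   # shape (3 * d,)
score = similarity(f, rng.normal(size=(16, 3 * d)), np.zeros(16),
                   rng.normal(size=16), 0.0)
```

For the example frame from Section III-A, `trim_frame("com.intellij.psi.impl.source.PsiFileImpl.getStubTree", 2)` yields `com.intellij.psi.impl.source`, matching the package-level trimming described in the paper.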

"extreme") — the first DL approach to calculating stack trace similarity for crash report deduplication. It is based on a siamese architecture that uses a biLSTM encoder and two fully-connected layers with ReLU activation as a classifier.

We experimentally evaluate our model on an open-source dataset and on private JetBrains data. Preliminary experiments demonstrate the superiority of our approach over classic (non-DL) approaches to stack trace-based report deduplication. Additionally, we demonstrate the beneficial impact of stack frame trimming — a technique that we originally proposed to reduce the number of tokens in the dictionary.

II. RELATED WORK

Stack trace-based report deduplication is a well-established area. The first studies, from the mid-00s, approached the problem via string matching algorithms: Brodie et al. [5] adapted a biological sequence searching algorithm; Modani et al. [8] compared edit distance, prefix match, and LCS approaches; Bartz et al. [6] presented a failure similarity classifier that takes a custom edit distance between callstacks as one of its parameters; Dhaliwal et al. [7] proposed a two-level grouping scheme with Levenshtein distance on the second level; Dang et al. [16] described the ReBucket technique, which employs a stack trace similarity measure based on string matching for agglomerative hierarchical clustering.

Later, approaches that employ information retrieval techniques appeared: Lerch and Mezini [9] proposed to adapt TF-IDF for stack frames; Campbell et al. [10] ran a comparison of various approaches to implementing TF-IDF methods; and Sabor et al. [17] presented DURFEX, whose idea is to consider stack frames as N-grams of variable length after trimming them to package level.

Finally, both Moroo et al. [18] and Vasiliev et al. [19] describe the most recent methods, which are both based on the idea of combining TF-IDF and string matching approaches.

III. APPROACH

Our approach relies on LSTM (Long Short-Term Memory) — a recurrent neural network. The idea is to represent stack traces as sequences that contain tokens from a predefined dictionary. This approach is similar to the ones used for texts and allows us to adopt many models that were developed in the natural language processing domain. We employ a siamese architecture [20] — a model in which both stack traces are encoded by a single network.

The overall scheme of the proposed algorithm is presented in Figure 1. First, we describe the data preprocessing and stack frame trimming. Then, we describe the siamese LSTM that is used to obtain the representations for a pair of stack traces. Finally, the trained features are used to compute the resulting similarity.

A. Preprocessing: Tokenization and Trimming

We have to transform stack traces into vectorized representations. In our work, a stack trace is represented as a sequence of frames ST = {fn, fn−1, ..., f1}, where fi is the i-th stack frame.

Similarly to DURFEX [17], we apply stack frame trimming as the preprocessing step of our approach. The idea is the following: suppose that we have a com.intellij.psi.impl.source.PsiFileImpl.getStubTree frame. By trimming it to, for example, package level, we obtain com.intellij.psi.impl.source. This technique allows us to reduce the number of tokens that we have to keep in the dictionary. In the current work, we have decided to try different types of trimming (e.g. function, class, package). Aside from reducing the dictionary, this approach should improve the quality of the similarity, since the neural network will encounter an individual frame more frequently, thus giving it more chances to learn it.

B. Vector Representation of Stack Traces

Our goal is to design a stack trace similarity measure that will use neural networks. This measure should be symmetric: similarity(stack1, stack2) = similarity(stack2, stack1). To ensure this property, we have decided to follow the siamese approach and use an appropriate feature vector (see below). In the siamese approach, a shared neural network independently encodes each input (stack trace) into a vector representation. The siamese approach has been successfully applied in different domains [21]–[23], and it can be efficient: instead of generating stack trace representations for each comparison, it is possible to store them and only execute the classifier. Since representation generation is the most computationally expensive part of S3M, this in turn reduces the generation time of the recommendation lists. Additionally, using a siamese network reduces the number of parameters to be set during training.

To handle sequential data, an RNN architecture called LSTM [24] is frequently used. These networks sequentially iterate over data items (tokens), and on each step they update their internal state and produce the resulting vector. Essentially, this allows the RNN to "remember" previous context and consider it while processing the next token. However, this approach has an obvious drawback: it takes into account only the tokens that were shown to the RNN before the considered term. At the same time, subsequent terms may define the correct output of the RNN.

To address this issue, we have decided to use the biLSTM [25] approach. It is, essentially, two LSTMs, one of which accepts the token sequence in direct order, and the other in reversed order. The resulting vector is obtained by concatenating their final outputs (i.e. the ones that are produced after processing the final token).

Next, we construct the feature vector that will be used by our neural network classifier. Similarly to study [23], we have decided to construct it as follows:

features(v1, v2) = (|v1 − v2|, (v1 + v2)/2, v1 ⊙ v2),

where v1 = biLSTM(stack1) and v2 = biLSTM(stack2). In this formula, component-wise subtraction, arithmetic mean computation, and multiplication are used. The thought process behind such an approach is that the features should represent differences between the original vectors as well as similarities.

C. Algorithm

Having obtained the encoded representations of both stacks, we feed them into another neural network which predicts the resulting similarity:

similarity = Linear(ReLU(Linear(features))).

Essentially, it is a fully connected 2-layer network with ReLU activation which returns the similarity of the two stack traces. Due to space constraints, we do not give detailed descriptions of the DL concepts used in this section; they can be found in [26].

D. Training

We have used the RankNet loss for training in order to better take into account the ranking nature of our problem. As a relevant answer, we took a random stack trace from the correct bucket. We generated 4 non-relevant answers from a random subset made up of stack traces that belong to the top 50 buckets selected by the method of Lerch and Mezini [27] (except the correct one). Such an approach to negative sampling can train the model to better discern between similar stack traces than just selecting a random stack trace pair. The neural network and embeddings were randomly initialized and trained in an end-to-end manner.

We have used the Adam optimizer [28] with a learning rate of 1e−4. The embedding dimension was set to 50, and the LSTM hidden size was 100. The hyperparameters were selected on the validation set for trim = 0 and fixed for the other trim levels.

IV. EXPERIMENTS

We have compared our approach with state-of-the-art methods selected from the ones mentioned in the Related Work section. We posed the following research questions (RQs):

RQ1: How well does our approach perform compared to the state-of-the-art approaches?
RQ2: How is the performance of our approach impacted by different depths (levels) of stack frame trimming?

Following the motivation presented in recent report triaging studies (e.g. [14], [17], [27]), we have adopted two ranked retrieval metrics: Recall Rate (RR@k) and the MRR metric.

Table I
DATASET DETAILS

                    JetBrains                             NetBeans
          Train       Validation  Test        Train       Validation  Test
Buckets   8769        273         916         31349       1592        5909
Reports   299444      7567        56631       39789       1976        7792
Date      08/09/2018  11/22/2018  11/29/2018  09/25/1998  03/26/2010  08/13/2010

Experiments were run on two datasets: a private JetBrains dataset and an open-sourced one (NetBeans). The JetBrains dataset was created from stack traces that emerge after every crash of a JetBrains product. The NetBeans dataset was generated from reports submitted before 2016 in the bug tracker of the NetBeans project (https://bz.apache.org/netbeans). Crash reports were created by
Table II
RESULTS: COMPARISON WITH STATE-OF-THE-ART (TOP PART) & EFFECTS OF TRIMMING (BOTTOM PART)

                          JetBrains                    NetBeans
Method                MRR   RR@1  RR@5  RR@10      MRR   RR@1  RR@5  RR@10
Prefix Match [8]      0.70  0.60  0.81  0.86       0.35  0.30  0.40  0.42
Brodie et al. [5]     0.70  0.61  0.81  0.84       0.46  0.38  0.55  0.59
ReBucket [16]         0.72  0.62  0.84  0.86       0.45  0.38  0.54  0.57
Lerch and Mezini [9]  0.76  0.65  0.90  0.92       0.51  0.41  0.64  0.69
Moroo et al. [18]     0.78  0.68  0.89  0.92       0.47  0.39  0.56  0.60
DURFEX [17]           0.79  0.70  0.92  0.94       0.56  0.44  0.72  0.77
TraceSim [19]         0.81  0.72  0.92  0.94       0.50  0.42  0.59  0.62
S3M (our method)      0.86  0.77  0.96  0.96       0.62  0.53  0.72  0.76
trim = 0 (function)   0.86  0.77  0.96  0.96       0.53  0.45  0.62  0.66
trim = 1 (class)      0.85  0.76  0.95  0.96       0.62  0.53  0.72  0.76
trim = 2 (package)    0.84  0.74  0.95  0.96       0.57  0.45  0.71  0.77
trim = 3              0.83  0.74  0.95  0.96       0.51  0.40  0.66  0.72
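For reference, the two ranked-retrieval metrics reported in Table II can be computed from per-query ranks as follows. This is a minimal generic sketch of the metric definitions, not the authors' evaluation code; `ranks` is assumed to hold the 1-based rank of the correct bucket for each query report.

```python
def recall_rate_at_k(ranks, k):
    """RR@k: the fraction of query reports whose correct bucket
    appears within the top k positions of the returned ranking."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def mean_reciprocal_rank(ranks):
    """MRR: the mean of 1/rank of the correct bucket over all queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# Toy example: the correct bucket was ranked 1st, 3rd, and 11th
# for three query reports.
ranks = [1, 3, 11]
rr10 = recall_rate_at_k(ranks, 10)   # 2 of the 3 queries hit the top 10
mrr = mean_reciprocal_rank(ranks)
```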

extracting stack traces from the description field and the attached files of the bug reports using the regular expression proposed by Lerch and Mezini [27]. To ensure the reproducibility of our results, we have released the NetBeans dataset (https://figshare.com/articles/dataset/netbeans_stacktraces_json/14135003) and the implementation of our approach (https://github.com/akhvorov/S3M).

Our experiment methodology is as follows: we selected a sequential interval of reports and performed a time-aware split [1] into three parts: train, validation, and test. The train and test parts are classic, while validation is a special fragment of the dataset that is used for tuning the hyperparameters of the algorithms. For methods that do not need hyperparameter tuning, the validation step is skipped. The train, validation, and test durations were 105, 7, and 35 days for the JetBrains dataset, and 4200, 140, and 700 days for NetBeans. We selected the dates to be multiples of a week, since the week is the basis of the work schedule, which may impact report arrival rates. We selected a smaller time frame for the JetBrains dataset since its report arrival rate is much higher. Dataset details are presented in Table I.

For RQ1, the results are presented in the top part of Table II. Our approach demonstrates superior results on the JetBrains dataset for both MRR and RR@k. For the NetBeans dataset, the outcome is a bit different: our approach wins on MRR and RR@1-5 by a significant margin, but loses considerably on RR@10. Therefore, our method is preferable for building a fully automated deduplication system, while DURFEX is more appropriate for building a recommender system.

Finally, we can see that the majority of well-performing approaches rely on TF-IDF, while string matching approaches perform considerably worse.

There is a noticeable difference in the results of all approaches, including S3M, between the JetBrains and NetBeans datasets. This happens due to the nature of the data: the JetBrains dataset has much larger buckets (35.5 stack traces on average), while NetBeans has a lot of smaller ones (1.3 stack traces on average). Therefore, NetBeans buckets are less successful in "attracting" new stack traces (i.e. this dataset is more "sparse").

Next, we addressed RQ2. For this, we ran our approach with different levels of trimming: function, class, package, and the one with depth 4. The results are presented in the bottom part of Table II. They show that trimming helps on the NetBeans dataset, but it does not improve the results on the JetBrains data. At the same time, the NetBeans dataset demonstrates that the optimal trimming value lies somewhere between the class and package levels. We suppose that there is no "universal" level of trimming which should be applied to all datasets. In other words, the trimming level is a hyperparameter of our algorithm, and it should be selected on a per-dataset basis.

V. THREATS TO VALIDITY

Our study has the following threats to validity:
• Subject selection bias. The performance of machine learning algorithms frequently depends on the data. Thus, applying our algorithm to other projects may yield different results. We tried to mitigate this by validating it on several different datasets: an open-sourced project and JetBrains product data.
• Limited scope of application. Our approach is intended for mass-deployed applications, which defines its scope of application: 1) we assume a substantial stream of reports, and 2) we assume that only stack trace information is available; there is no user input, i.e. no textual descriptions.
• Programming language bias. Our approach was evaluated on stack traces that are produced by JVM languages. Exceptions of other languages may yield different results. For example, stack trace trimming may have to be altered for C++ exceptions, since C++ stack frames are generally shorter. We leave extending our study to other languages to future work.

VI. CONCLUSIONS AND FUTURE WORK
In this paper we have presented the first approach to computing stack trace similarity using deep learning techniques. Our approach is based on a siamese architecture that uses a biLSTM encoder and two fully-connected layers with ReLU activation to compute stack trace similarity. Our experiments have demonstrated the superiority of our approach over the state-of-the-art on both a private JetBrains dataset and open-source NetBeans data. We have also studied the impact of stack trace trimming on the quality of the results.

Our future work will include: 1) a more detailed study of alternative feature representations and architectures, of the effects of trimming, and of stack traces of languages other than Java; and 2) the application of transformer networks such as GPT-3 and BERT.

ACKNOWLEDGMENTS

We would like to thank Anastasia Miller for her invaluable comments. Furthermore, we would like to thank Anna Smirnova for her extensive help in the preparation of the present article.

REFERENCES

[1] A. Sarkar, P. C. Rigby, and B. Bartalos, "Improving bug triaging with high confidence predictions at Ericsson," in 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME), 2019, pp. 81–91.
[2] A. Hindle and C. Onuczko, "Preventing duplicate bug reports by continuously querying bug reports," Empirical Software Engineering, Aug. 2018.
[3] M. S. Rakha et al., "Revisiting the performance evaluation of automated approaches for the retrieval of duplicate issue reports," IEEE Transactions on Software Engineering, vol. 44, no. 12, pp. 1245–1268, Dec. 2018.
[4] C. Sun, D. Lo, S. Khoo, and J. Jiang, "Towards more accurate retrieval of duplicate bug reports," in ASE '11, Nov. 2011, pp. 253–262.
[5] M. Brodie et al., "Quickly finding known software problems via automated symptom matching," in ICAC '05, Jun. 2005, pp. 101–110.
[6] K. Bartz et al., "Finding similar failures using callstack similarity," in SysML '08. USENIX Association, 2008, pp. 1–6. [Online]. Available: http://dl.acm.org/citation.cfm?id=1855895.1855896
[7] T. Dhaliwal, F. Khomh, and Y. Zou, "Classifying field crash reports for fixing bugs: A case study of Mozilla Firefox," in ICSM '11. IEEE Computer Society, 2011, pp. 333–342.
[8] N. Modani et al., "Automatically identifying known software problems," in ICDEW '07. IEEE Computer Society, 2007, pp. 433–441.
[9] J. Lerch and M. Mezini, "Finding duplicates of your yet unwritten bug report," in CSMR '13. IEEE Computer Society, 2013, pp. 69–78.
[10] J. C. Campbell, E. A. Santos, and A. Hindle, "The unreasonable effectiveness of traditional information retrieval in crash report deduplication," in MSR '16. ACM, 2016, pp. 269–280.
[11] R. Wu et al., "CrashLocator: Locating crashing faults based on crash stacks," in ISSTA '14, 2014, pp. 204–214.
[12] J. Deshmukh, K. M. Annervaz, S. Podder, S. Sengupta, and N. Dubash, "Towards accurate duplicate bug retrieval using deep learning techniques," in ICSME '17, 2017, pp. 115–124.
[13] S. Mani, A. Sankaran, and R. Aralikatte, "DeepTriage: Exploring the effectiveness of deep learning for bug triaging," in CoDS-COMAD '19. Association for Computing Machinery, 2019, pp. 171–179. [Online]. Available: https://doi.org/10.1145/3297001.3297023
[14] I. M. Rodrigues, D. Aloise, E. R. Fernandes, and M. Dagenais, "A soft alignment model for bug deduplication," in MSR '20. Association for Computing Machinery, 2020, pp. 43–53. [Online]. Available: https://doi.org/10.1145/3379597.3387470
[15] A. Budhiraja, K. Dutta, R. Reddy, and M. Shrivastava, "Poster: DWEN: Deep word embedding network for duplicate bug report detection in software repositories," in ICSE-Companion '18, 2018, pp. 193–194.
[16] Y. Dang, R. Wu, H. Zhang, D. Zhang, and P. Nobel, "ReBucket: A method for clustering duplicate crash reports based on call stack similarity," in ICSE '12. IEEE Press, 2012, pp. 1084–1093. [Online]. Available: http://dl.acm.org/citation.cfm?id=2337223.2337364
[17] K. K. Sabor, A. Hamou-Lhadj, and A. Larsson, "DURFEX: A feature extraction technique for efficient detection of duplicate bug reports," in QRS '17, 2017, pp. 240–250.
[18] A. Moroo et al., "Reranking-based crash report deduplication," in SEKE '17, X. He, Ed., 2017, pp. 507–510.
[19] R. Vasiliev, D. Koznov, G. Chernishev, A. Khvorov, D. Luciv, and N. Povarov, "TraceSim: A method for calculating stack trace similarity." Association for Computing Machinery, 2020, pp. 25–30. [Online]. Available: https://doi.org/10.1145/3416505.3423561
[20] S. Chopra, R. Hadsell, and Y. LeCun, "Learning a similarity metric discriminatively, with application to face verification," in CVPR '05. IEEE Computer Society, 2005, pp. 539–546. [Online]. Available: https://doi.org/10.1109/CVPR.2005.202
[21] L. Poddar, L. Neves, W. Brendel, L. Marujo, S. Tulyakov, and P. Karuturi, "Train one get one free: Partially supervised neural network for bug report duplicate detection and clustering," in NAACL-HLT 2019, Volume 2 (Industry Papers). Association for Computational Linguistics, Jun. 2019, pp. 157–165. [Online]. Available: https://www.aclweb.org/anthology/N19-2020
[22] A. Severyn and A. Moschitti, "Learning to rank short text pairs with convolutional deep neural networks," in SIGIR '15. Association for Computing Machinery, 2015, pp. 373–382. [Online]. Available: https://doi.org/10.1145/2766462.2767738
[23] Y. Homma, S. Sy, and C. Yeh, "Detecting duplicate questions with deep learning," http://stuartsy.com/quora.pdf, [Online; accessed 16-March-2021].
[24] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, Nov. 1997. [Online]. Available: https://doi.org/10.1162/neco.1997.9.8.1735
[25] A. Graves, S. Fernández, and J. Schmidhuber, "Bidirectional LSTM networks for improved phoneme classification and recognition," in Artificial Neural Networks: Formal Models and Their Applications – ICANN 2005. Springer Berlin Heidelberg, 2005, pp. 799–804.
[26] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. The MIT Press, 2016.
[27] J. Lerch and M. Mezini, "Finding duplicates of your yet unwritten bug report," in CSMR '13. IEEE Computer Society, 2013, pp. 69–78.
[28] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in ICLR 2015, 2015. [Online]. Available: http://arxiv.org/abs/1412.6980
