Bogdan Dit – How to Effectively Use Topic Models for Software Engineering Tasks? An Approach Based on Genetic Algorithms

Over the past few years, many techniques have been proposed to extract topics from documents, among them Latent Dirichlet Allocation (LDA).

The authors apply LDA to three software engineering tasks: traceability link recovery, feature location, and software artifact labeling. However, LDA used with the same settings as for natural-language text does not always work well on source code. Fortunately, source code is structured differently from (and arguably better than) natural language, and this difference can be exploited.

Therefore, the authors present LDA-GA, a tool that uses a Genetic Algorithm (GA) to find a near-optimal configuration for LDA. Calibrating LDA on a specific dataset in this way can significantly improve its performance on the task at hand.
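To make the idea concrete, here is a minimal sketch of a GA searching LDA's configuration space. Everything in it is an illustrative assumption rather than LDA-GA's actual implementation: the chromosome (number of topics `k`, priors `alpha` and `beta`, sampling `iterations`), the search bounds, the operators (tournament selection, uniform crossover, random mutation, elitism), and especially `toy_fitness`. In a real setting the fitness function would train LDA on the dataset with the candidate configuration and score the resulting topics (for instance, by clustering documents by their dominant topic and measuring cluster quality); `toy_fitness` is a hypothetical synthetic stand-in with a known optimum so the sketch runs without an LDA library.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class LdaConfig:
    """One candidate LDA configuration (the GA's chromosome)."""
    k: int           # number of topics
    alpha: float     # document-topic Dirichlet prior
    beta: float      # topic-word Dirichlet prior
    iterations: int  # sampling iterations


# Hypothetical search bounds, chosen for illustration only.
K_RANGE = (2, 50)
ALPHA_RANGE = (0.01, 1.0)
BETA_RANGE = (0.001, 0.5)
ITER_RANGE = (50, 500)


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def random_config(rng):
    return LdaConfig(
        k=rng.randint(*K_RANGE),
        alpha=rng.uniform(*ALPHA_RANGE),
        beta=rng.uniform(*BETA_RANGE),
        iterations=rng.randint(*ITER_RANGE),
    )


def toy_fitness(cfg):
    """Hypothetical stand-in for 'train LDA with cfg and score the topics'.
    Higher is better; peaks at k=10, alpha=0.1, beta=0.01 by construction.
    It ignores `iterations`, which a real evaluation would not."""
    return -((cfg.k - 10) / 10) ** 2 - (cfg.alpha - 0.1) ** 2 - (cfg.beta - 0.01) ** 2


def crossover(a, b, rng):
    # Uniform crossover: each gene is copied from a randomly chosen parent.
    return LdaConfig(
        k=rng.choice((a.k, b.k)),
        alpha=rng.choice((a.alpha, b.alpha)),
        beta=rng.choice((a.beta, b.beta)),
        iterations=rng.choice((a.iterations, b.iterations)),
    )


def mutate(cfg, rng, rate=0.2):
    # Each gene is perturbed with probability `rate`, then clamped to its bounds.
    k, alpha, beta, it = cfg.k, cfg.alpha, cfg.beta, cfg.iterations
    if rng.random() < rate:
        k = clamp(k + rng.randint(-3, 3), *K_RANGE)
    if rng.random() < rate:
        alpha = clamp(alpha + rng.gauss(0, 0.05), *ALPHA_RANGE)
    if rng.random() < rate:
        beta = clamp(beta + rng.gauss(0, 0.02), *BETA_RANGE)
    if rng.random() < rate:
        it = clamp(it + rng.randint(-50, 50), *ITER_RANGE)
    return LdaConfig(k, alpha, beta, it)


def tournament(pop, scores, rng, size=3):
    # Sample `size` individuals and return the fittest of them.
    picks = rng.sample(range(len(pop)), size)
    return pop[max(picks, key=lambda i: scores[i])]


def genetic_search(fitness, pop_size=40, generations=60, elite=2, seed=0):
    rng = random.Random(seed)
    pop = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(c) for c in pop]
        ranked = sorted(range(pop_size), key=lambda i: scores[i], reverse=True)
        # Elitism: the best configurations survive unchanged.
        next_pop = [pop[i] for i in ranked[:elite]]
        while len(next_pop) < pop_size:
            a = tournament(pop, scores, rng)
            b = tournament(pop, scores, rng)
            next_pop.append(mutate(crossover(a, b, rng), rng))
        pop = next_pop
    return max(pop, key=fitness)


if __name__ == "__main__":
    best = genetic_search(toy_fitness)
    print(best, toy_fitness(best))
```

Because of elitism, the best fitness in the population never decreases from one generation to the next. With a real LDA-based fitness, each evaluation means training a topic model, so evaluation cost would dominate the run time; caching scores per configuration would be a natural addition.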
