As data volume and complexity grow at an unprecedented rate, the performance of data analytics programs is becoming a major concern for developers. We observed that developers sometimes use alternative data analytics APIs to improve program runtime performance while preserving functional equivalence. However, little is known about the characteristics and performance attributes of alternative data analytics APIs. In this paper, we propose a novel approach to extracting alternative implementations that invoke different data analytics APIs to solve the same tasks. A key appeal of our approach is that it exploits the comparative structures in Stack Overflow discussions to discover programming alternatives. We show that our approach is promising, as 86% of the extracted code pairs were validated as true alternative implementations. In over 20% of these pairs, the faster implementation was reported to achieve a 10x or more speedup over its slower alternative. We hope that our study offers a new perspective on API recommendation and motivates future research on optimizing data analytics programs.
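To make the notion of "alternative implementations" concrete, the following is a minimal sketch of two functionally equivalent ways to solve the same small counting task through different APIs. It is an illustration only and is not taken from the paper: the function names `count_loop` and `count_counter` and the sample data are hypothetical, and the standard library stands in for the data analytics libraries the study targets. Timings depend on the machine, so none are asserted here.

```python
from collections import Counter
import timeit


def count_loop(values):
    """Count occurrences of each value with an explicit Python loop."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return counts


def count_counter(values):
    """Count occurrences of each value with collections.Counter."""
    return dict(Counter(values))


if __name__ == "__main__":
    # Hypothetical sample data: 100,000 integers drawn from 7 distinct values.
    data = [i % 7 for i in range(100_000)]

    # Functional equivalence: both implementations return the same mapping.
    assert count_loop(data) == count_counter(data)

    # Runtime comparison: report the elapsed time of each alternative.
    for fn in (count_loop, count_counter):
        elapsed = timeit.timeit(lambda: fn(data), number=10)
        print(f"{fn.__name__}: {elapsed:.4f}s for 10 runs")
```

A pair like this has the two properties the abstract describes: both implementations produce the same result, and their runtimes can be compared directly.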
|Title of host publication||Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019|
|Number of pages||4|
|Publication status||Published - 9 Jan 2020|
|Event||34th IEEE/ACM International Conference on Automated Software Engineering - San Diego, United States. Duration: 11 Nov 2019 → 15 Nov 2019. Conference number: 34|
|Name||IEEE/ACM International Conference on Automated Software Engineering (ASE)|
|Conference||34th IEEE/ACM International Conference on Automated Software Engineering|
|Abbreviated title||ASE 2019|
|Period||11/11/19 → 15/11/19|
Bibliographical note
Funding Information:
This work was partially supported by the National Natural Science Foundation of China under Grant Nos. 61772347, 61972260, 61932021, and 61802164.
© 2019 IEEE.