How Do API Selections Affect the Runtime Performance of Data Analytics Tasks?

Yida Tao, Shan Tang, Yepang Liu, Zhiwu Xu, Shengchao Qin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

As data volume and complexity grow at an unprecedented rate, the performance of data analytics programs is becoming a major concern for developers. We observed that developers sometimes use alternative data analytics APIs to improve program runtime performance while preserving functional equivalence. However, little is known about the characteristics and performance attributes of alternative data analytics APIs. In this paper, we propose a novel approach to extracting alternative implementations that invoke different data analytics APIs to solve the same tasks. A key appeal of our approach is that it exploits the comparative structures in Stack Overflow discussions to discover programming alternatives. We show that our approach is promising, as 86% of the extracted code pairs were validated as true alternative implementations. In over 20% of these pairs, the faster implementation was reported to achieve a 10x or greater speedup over its slower alternative. We hope that our study offers a new perspective on API recommendation and motivates future research on optimizing data analytics programs.
Original language: English
Title of host publication: Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019
Publisher: IEEE
Pages: 665-668
Number of pages: 4
ISBN (Electronic): 9781728125084
ISBN (Print): 9781728125084
Publication status: Published - 9 Jan 2020
Event: 34th IEEE/ACM International Conference on Automated Software Engineering - San Diego, United States
Duration: 11 Nov 2019 to 15 Nov 2019
Conference number: 34
https://2019.ase-conferences.org/

Publication series

Name: IEEE/ACM International Conference on Automated Software Engineering (ASE)
Publisher: IEEE
ISSN (Electronic): 2643-1572

Conference

Conference: 34th IEEE/ACM International Conference on Automated Software Engineering
Abbreviated title: ASE 2019
Country/Territory: United States
City: San Diego
Period: 11/11/19 to 15/11/19

Bibliographical note

Funding Information:
Acknowledgments: This work was partially supported by the National Natural Science Foundation of China under Grant Nos. 61772347, 61972260, 61932021, and 61802164.

Publisher Copyright:
© 2019 IEEE.

Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.
