Understanding Performance Concerns in the API Documentation of Data Science Libraries

Yida Tao, Jiefang Jiang, Yepang Liu, Zhiwu Xu, Shengchao Qin

Research output: Contribution to conference, Paper

Abstract

The development of efficient data science applications is often impeded by unbearably long execution time and rapid RAM exhaustion. Since API documentation is the primary information source for troubleshooting, we investigate how performance concerns are documented in popular data science libraries. Our quantitative results reveal the prevalence of data science APIs that are documented in performance-related contexts and the infrequent maintenance activities on such documentation. Our qualitative analyses further reveal that crowd documentation, such as Stack Overflow and GitHub, is highly complementary to official documentation in terms of API coverage, knowledge distribution, and the specific information conveyed through performance-related content. Data science practitioners could benefit from our findings by learning a more targeted search strategy for resolving performance issues. Researchers can be more assured of the advantages of integrating both official and crowd documentation to achieve a holistic view of the performance concerns in data science development.
Original language: English
Publication status: Published - 21 Sep 2020
Event: The 35th IEEE/ACM International Conference on Automated Software Engineering (ASE 2020) - Melbourne, Australia
Duration: 21 Sep 2020 to 25 Sep 2020
https://conf.researchr.org/home/ase-2020

Conference

Conference: The 35th IEEE/ACM International Conference on Automated Software Engineering (ASE 2020)
Abbreviated title: ASE2020
Period: 21/09/20 to 25/09/20

  • Cite this

    Tao, Y., Jiang, J., Liu, Y., Xu, Z., & Qin, S. (2020). Understanding Performance Concerns in the API Documentation of Data Science Libraries. Paper presented at The 35th IEEE/ACM International Conference on Automated Software Engineering (ASE 2020).