Type Learning for Binaries and its Applications

Zhiwu Xu, Cheng Wen, Shengchao Qin

Research output: Contribution to journal › Article


Abstract

Binary type inference is a challenging problem, in part because much type-related information is lost during compilation. Most existing work relies on program analysis techniques, which can be either too heavyweight to be viable in practice or too conservative to infer types with high accuracy. In this work, we propose a new approach to learning types for binary code. Motivated by “duck typing”, our approach learns types for recovered variables from their features and properties (e.g., related representative instructions). We first use machine learning to train a classifier, with basic types as its labels, on binaries that contain debugging information. The classifier is then used to learn types for new, unseen binaries. For composite types, such as pointers and structs, a points-to analysis is performed. Finally, we conduct several experiments to evaluate our approach. The results demonstrate that our approach is more precise, in terms of both correct types and compatible types, than the commercial tool Hex-Rays, the open-source tool Snowman, and EKLAVYA, a recent tool that uses machine learning. We also show that the type information learned by our system can help detect malware.
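To make the "duck typing" idea concrete, here is a minimal sketch of learning basic types from instruction features. The abstract does not name the classifier or the exact feature set. The opcode-style feature tokens (e.g. `movss`, `movzx_byte`), the toy training set, the functions `train` and `predict`, and the use of multinomial naive Bayes are all hypothetical stand-ins, not the authors' method. Composite types, which the abstract says are handled by a points-to analysis, are not covered here.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data: each recovered variable is described by
# tokens for the instructions that touch it, labelled with its basic type
# (in practice such labels would come from binaries with debugging info).
TRAINING = [
    (["mov_dword", "add_dword", "cmp_dword"], "int"),
    (["mov_dword", "imul_dword"], "int"),
    (["movss", "addss", "mulss"], "float"),
    (["movss", "cvtss2sd"], "float"),
    (["movsd", "addsd", "mulsd"], "double"),
    (["movsd", "divsd"], "double"),
    (["movzx_byte", "cmp_byte"], "char"),
    (["mov_byte", "movsx_byte"], "char"),
]


def train(samples):
    """Fit a multinomial naive Bayes model: label counts and per-label feature counts."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocab = set()
    for features, label in samples:
        label_counts[label] += 1
        feature_counts[label].update(features)
        vocab.update(features)
    return label_counts, feature_counts, vocab


def predict(model, features):
    """Return the basic type with the highest log-posterior (Laplace smoothing)."""
    label_counts, feature_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior of this type
        denom = sum(feature_counts[label].values()) + len(vocab)
        for f in features:
            # Add-one smoothing keeps unseen instruction tokens from zeroing the score.
            score += math.log((feature_counts[label][f] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(predict(model, ["movsd", "mulsd"]))  # double-precision SSE ops
```

On this toy data, a variable accessed with `movsd`/`mulsd` is classified as `double`, and one accessed with `movzx_byte`/`cmp_byte` as `char`. A real system would extract such features from disassembly and would likely use a richer feature set and model.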
Original language: English
Pages (from-to): 893-912
Number of pages: 20
Journal: IEEE Transactions on Reliability
Volume: 68
Issue number: 3
Publication status: Published - 25 Dec 2018

