Hengkai Ye

Ph.D. Candidate
College of Information Sciences and Technology
The Pennsylvania State University

E-mail: hengkai at psu dot edu

Short Bio: I am a second-year Ph.D. candidate at Penn State University, advised by Prof. Hong Hu. Before joining Penn State, I obtained my Bachelor's degree from Huazhong University of Science and Technology and my Master's degree from Purdue University. My research interests include software and system security.


Conference Proceedings

  1. VIPER: Spotting Syscall-Guard Variables for Data-Only Attacks Website Slides Paper
    Hengkai Ye, Song Liu, Zhechang Zhang and Hong Hu
    In Proceedings of the 32nd USENIX Security Symposium (Security 2023)
    Abstract: As control-flow protection techniques are widely deployed, it is difficult for attackers to modify control data, like function pointers, to hijack program control flow. Instead, data-only attacks corrupt security-critical non-control data (critical data), and can bypass all control-flow protections to revive severe attacks. Previous works have explored various methods to help construct or prevent data-only attacks. However, no solution can automatically detect program-specific critical data. In this paper, we identify an important category of critical data, syscall-guard variables, and propose a set of solutions to automatically detect such variables in a scalable manner. Syscall-guard variables determine whether security-related system calls (syscalls) are invoked, and altering them allows attackers to request extra privileges from the operating system. We propose branch force, which intentionally flips every conditional branch during the execution and checks whether new security-related syscalls are invoked. If so, we conduct dataflow analysis to estimate the feasibility of flipping such branches through common memory errors. We build a tool, VIPER, to implement our ideas. VIPER successfully detects 34 previously unknown syscall-guard variables from 13 programs. We build four new data-only attacks on sqlite and v8, which execute arbitrary commands or delete arbitrary files. VIPER completes its analysis within five minutes for most programs, showing its practicality for spotting syscall-guard variables.

  2. BET: Black-box Efficient Testing for Convolutional Neural Networks Paper
    Jialai Wang, Han Qiu, Yi Rong, Hengkai Ye, Qi Li, Zongpeng Li and Chao Zhang
    In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2022)
    Abstract: Testing convolutional neural networks (CNNs) to find defects (e.g., error-inducing inputs) before deploying them in security-sensitive scenarios is crucial. Although existing white-box testing methods can effectively test CNN models with high coverage, they require full knowledge of the target CNN models, which may not always be available in privacy-sensitive scenarios. In this paper, we propose a novel Black-box Efficient Testing (BET) method for CNN models. The core insight of BET is that CNNs are generally prone to be affected by continuous perturbations. Thus, by generating such continuous perturbations in a black-box manner, we design a tunable objective function to guide our testing process for thoroughly exploring defects in different decision boundaries of target CNN models. We also design an efficiency-centric policy to find more error-inducing inputs with a fixed query budget. We conduct extensive evaluations with three well-known datasets and five popular CNN structures. The results show that BET significantly outperforms existing white-box and black-box testing methods in terms of the effective error-inducing inputs found within a fixed query/inference budget. We further show that the error-inducing inputs found by BET can be used to fine-tune the target model to improve the accuracy by up to 3%.

  3. Interpreting Deep Learning-based Vulnerability Detector Predictions Based on Heuristic Searching Paper
    Deqing Zou, Yawei Zhu, Shouhuai Xu, Zhen Li, Hai Jin and Hengkai Ye
    In ACM Transactions on Software Engineering and Methodology (TOSEM), 2021
    Abstract: Detecting software vulnerabilities is an important problem, and a recent development in tackling the problem is the use of deep learning models to detect software vulnerabilities. While effective, it is hard to explain why a deep learning model predicts a piece of code as vulnerable or not because of the black-box nature of deep learning models. Indeed, the interpretability of deep learning models is a daunting open problem. In this article, we make a significant step toward tackling the interpretability of deep learning models in vulnerability detection. Specifically, we introduce a high-fidelity explanation framework, which aims to identify a small number of tokens that make significant contributions to a detector's prediction with respect to an example. Systematic experiments show that the framework indeed has a higher fidelity than existing methods, especially when features are not independent of each other (which often occurs in the real world). In particular, the framework can produce some vulnerability rules that can be understood by domain experts for accepting a detector's outputs (i.e., true positives) or rejecting a detector's outputs (i.e., false positives and false negatives). We also discuss limitations of the present study, which indicate interesting open problems for future research.


  1. One Flip is All It Takes: Identifying Syscall-Guard Variables for Data-Only Attacks
    (Industry Conference) Paper
    Hengkai Ye, Hong Hu, Song Liu and Zhechang Zhang
    Black Hat Asia Briefings (Black Hat Asia 2024)


Honors and Awards