usenixsecurity24-du
Wenlong Du and Jian Li, School of Electronic Information and Electrical Engineering,
Shanghai Jiao Tong University; Yanhao Wang, Independent Researcher; Libo Chen,
Ruijie Zhao, and Junmin Zhu, School of Electronic Information and Electrical Engineering,
Shanghai Jiao Tong University; Zhengguang Han, QI-ANXIN Technology Group;
Yijun Wang and Zhi Xue, School of Electronic Information and Electrical Engineering,
Shanghai Jiao Tong University
https://www.usenix.org/conference/usenixsecurity24/presentation/du
Wenlong Du*¹, Jian Li*¹, Yanhao Wang², Libo Chen✉¹, Ruijie Zhao¹, Junmin Zhu¹, Zhengguang Han³, Yijun Wang¹, and Zhi Xue¹
¹ School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University
² Independent Researcher
³ QI-ANXIN Technology Group
* Co-leading authors. ✉ Corresponding author.
(IoT) [7, 8] devices. Among different architectural styles for
tion for describing RESTful API interfaces. A Swagger-based specification outlines the usage guidelines for RESTful APIs, including the types of service requests accepted, the expected response format, and other relevant details.
Furthermore, these testing methods often lack a targeted strategy and approach based on specific security vulnerabilities. Instead, they primarily rely on detecting generic bugs through feedback such as “500 Internal Server Errors” from API endpoints. Consequently, testing is indiscriminate and inefficient, and the majority of detected bugs are functional issues rather than genuine security vulnerabilities.

Observation. According to the motivation sample, we observe that we can assess the functionality of APIs by examining certain feature strings commonly used to define API features in the specification, such as the presence of remote in the current example. Meanwhile, the keyword remote is also strongly associated with the type of vulnerability (SSRF). To validate our observation that the category of vulnerability present in an API interface is strongly linked to the functionality of that interface via some keywords, we gathered data from the CVE database on the most common types of revealed API vulnerabilities (e.g., Unrestricted Upload and SSRF) and selected vulnerabilities that had been disclosed in CVE records with specific details, such as the exact location of the bug within the API interface.

Based on our collected data², we first conduct a word frequency analysis on the API paths and parameters of API interfaces according to these vulnerabilities’ CVE descriptions. Then, combining this with expert knowledge (analyzing the root cause of these vulnerabilities), we extract high-frequency words that also reflect the functionality of these vulnerable APIs, such as "upload" and "submit", and map them to the functionalities (e.g., File Upload function) and vulnerabilities (e.g., Unrestricted Upload of File). Finally, we construct a mapping from each vulnerability type to the corresponding functionality and the feature strings frequently used in that context.

² All the vulnerability context information and keywords gathered during this construction process can be found at the link: https://github.com/NSSL-SJTU/VoAPI2/CVE-Information.xlsx.

Our analysis results are presented in Table 1, which shows that, on average, 57% of vulnerable API interfaces affected by a specific type of bug belonged to the same functionality. These findings confirm the validity of our intuition for these common vulnerabilities. Additionally, we identified a group of strings commonly used in specific functionality as keywords. For example, in an upload interface, we may observe frequently used strings such as “upload”, “submit”, and “import”, which can be used as keywords to identify corresponding interfaces from API specifications and support subsequent bug detection, as shown in Table 1. It is also important to note that these six types of vulnerabilities are the mainstream of web vulnerabilities, collectively accounting for over 70% of the total [38].

Our Method. Building upon our observation and verification result, we propose the core concept of our method, which involves identifying API functions associated with vulnerabilities within API specifications and conducting targeted security testing on those functions. This approach aims to achieve vulnerability-oriented API inspection, thereby facilitating the effective discovery of bugs across a wide range of RESTful APIs. Unlike previous methods, it considers the API functionality when identifying and testing for vulnerabilities.

2.3 Challenges and Our Approach

There are three challenges in realizing vulnerability-oriented API inspection in practice.

C1: How to efficiently differentiate between various functional interfaces in an API and identify functions that may have security vulnerabilities? API documentation usually contains a large number of APIs, but only a small portion of them may have security vulnerabilities. Hence, it is crucial to employ a method that can identify candidate APIs, enabling us to enhance testing efficiency and minimize ineffective test requests.

C2: How to generate test case sequences that comply with protocol states and trigger the vulnerable functional interfaces? API invocations are state-based and often require a series of pre-request actions (request sequences) to fulfill the execution
[Figure 3 diagram. Panels: Step 1: Specification Analysis; Step 2: Candidate Interface Extraction; Step 3: Test Sequence Generation; Step 4: Feedback-based Test. Labeled elements: Specification Parser, Function Vulnerability Mapper, Candidate APIs (1. /avatars/favicon, tagged_param: url; 2. /avatars/image, tagged_param: url), Validation, Verification, Feedback, Application Server.]
Figure 3: System Architecture of VOAPI2. VOAPI2 first analyzes the API specification and identifies semantic keywords associated with potentially weak functions within the API specification. It then integrates a suitable corpus and employs a stateful request sequence that aligns with the execution context of the corresponding functions. By utilizing inspection payloads tailored to different vulnerability types, VOAPI2 can effectively assess these candidate functions for potential vulnerabilities.
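To make Steps 1 and 2 of Figure 3 concrete, the following is a minimal sketch of keyword-based candidate extraction, not the authors' implementation. The keywords "upload", "submit", "import", and "remote" are taken from the text and "url" is the tagged parameter shown in Figure 3; the table layout, the function names `find_candidates` and `_entry`, the substring-matching rule, and the `example_spec` fragment (including the `/files/upload` and `/health` paths) are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Assumed keyword table. "upload", "submit", "import" and "remote" are quoted in
# the text; "url" is the tagged parameter in Figure 3. The layout is illustrative.
KEYWORDS: Dict[str, List[str]] = {
    "Unrestricted Upload": ["upload", "submit", "import"],
    "SSRF": ["remote", "url"],
}
HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def _entry(path: str, method: str, param: Optional[str], vuln: str) -> dict:
    return {"path": path, "method": method, "tagged_param": param, "vuln": vuln}


def find_candidates(spec: dict) -> List[dict]:
    """Match keywords against paths and parameter names of an OpenAPI-style spec."""
    candidates = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as path-level "parameters"
            params = [p.get("name", "") for p in operation.get("parameters", [])]
            for vuln, words in KEYWORDS.items():
                # Keyword in the path: the endpoint itself is a candidate.
                if any(w in path.lower() for w in words):
                    candidates.append(_entry(path, method, None, vuln))
                # Keyword in a parameter name: tag that parameter for payloads.
                for name in params:
                    if any(w in name.lower() for w in words):
                        candidates.append(_entry(path, method, name, vuln))
    return candidates


# Hypothetical specification fragment; the two /avatars paths mirror Figure 3.
example_spec = {
    "paths": {
        "/avatars/favicon": {"get": {"parameters": [{"name": "url", "in": "query"}]}},
        "/avatars/image": {"get": {"parameters": [{"name": "url", "in": "query"}]}},
        "/files/upload": {"post": {"parameters": [{"name": "file", "in": "formData"}]}},
        "/health": {"get": {"parameters": []}},
    }
}

for candidate in find_candidates(example_spec):
    print(candidate)
```

Only the three endpoints whose path or parameter names contain a keyword are kept, so `/health` never receives test requests. Plain substring matching is the simplest possible rule and would over-match in practice (for example, "url" inside "curl").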
[Figure 4 plot: bar-chart panels with y-axis ticks from 0 to 7 and the compared tools (VoAPI2, RESTler, MINER, RestTestGen, ZAP, Astra) on the x-axis; bar values are not recoverable from the extracted text.]
Figure 4: The vulnerabilities and their types uncovered by different tools on evaluation benchmarks.
Table 5: Comparison with three RESTful API testing tools. #500 is the number of HTTP 500 errors found by a tool. #Packet is the number of all packets sent in testing. #Ratio equals the total #500 divided by the total #Packet.

#500
Tool            Appwrite   Casdoor     Gitea  Jellyfin  Microcks  Rbaskets    GitLab    #Ratio
RESTler                5         0         1        54        19         0        32    0.015%
MINER                  4         0         1        68        13         0        13    0.023%
RestTestGen            1         0         2        63        15         0        41    0.055%
RestTestGen+V          1         0         0        55        11         0        36    0.071%
VOAPI2                 1         0         0        23         2         0         3    0.085%

#Packet
Tool            Appwrite   Casdoor     Gitea  Jellyfin  Microcks  Rbaskets    GitLab
RESTler           88,558   165,470   297,419     2,175    67,128    20,435   105,931
MINER             64,944   104,352    31,668    23,002   143,830    17,422    48,267
RestTestGen        9,030    16,340    58,200    71,440     7,340     2,140    57,550
RestTestGen+V      6,334     8,920    32,550    48,740     4,246     1,550    43,110
VOAPI2             2,123     6,987     9,137    13,578       509       238     1,558

Time
Tool            Appwrite   Casdoor     Gitea  Jellyfin  Microcks  Rbaskets    GitLab
RESTler               5h        5h        5h    12m07s        5h        5h        5h
MINER                 5h        5h        5h        5h        5h        5h        5h
RestTestGen       19m49s    34m13s     4h37m     4h43m      8m5s     1m32s     1h52m
RestTestGen+V     12m42s    13m09s     1h28m     3h57m     5m51s     1m03s     1h33m
VOAPI2             1m51s     6m27s     7m05s    10m53s       24s       13s     5m03s
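As a worked check of the #Ratio column, the short sketch below recomputes it from the per-application #500 and #Packet values in Table 5, using the definition in the caption (total #500 divided by total #Packet). The dictionary name `table5` and the function name `ratio_percent` are illustrative choices; the numbers are copied from the table.

```python
# (#500, #Packet) per application, in Table 5 column order:
# Appwrite, Casdoor, Gitea, Jellyfin, Microcks, Rbaskets, GitLab
table5 = {
    "RESTler": [(5, 88558), (0, 165470), (1, 297419), (54, 2175),
                (19, 67128), (0, 20435), (32, 105931)],
    "MINER": [(4, 64944), (0, 104352), (1, 31668), (68, 23002),
              (13, 143830), (0, 17422), (13, 48267)],
    "RestTestGen": [(1, 9030), (0, 16340), (2, 58200), (63, 71440),
                    (15, 7340), (0, 2140), (41, 57550)],
    "RestTestGen+V": [(1, 6334), (0, 8920), (0, 32550), (55, 48740),
                      (11, 4246), (0, 1550), (36, 43110)],
    "VOAPI2": [(1, 2123), (0, 6987), (0, 9137), (23, 13578),
               (2, 509), (0, 238), (3, 1558)],
}


def ratio_percent(rows):
    """Total #500 divided by total #Packet, expressed as a percentage."""
    total_500 = sum(errors for errors, _ in rows)
    total_packets = sum(packets for _, packets in rows)
    return 100.0 * total_500 / total_packets


for tool, rows in table5.items():
    print(f"{tool}: {ratio_percent(rows):.3f}%")
```

For instance, RESTler totals 111 errors over 747,116 packets and VOAPI2 totals 29 errors over 34,130 packets, which round to the 0.015% and 0.085% reported in the table.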
14 bugs that elude detection by other tools. Based on our analysis, this phenomenon can be attributed to two factors. First, RESTful API testing tools (e.g., RESTler) often lack the comprehensive testing corpus required for verifying various types of vulnerabilities (e.g., SSRF) within their corresponding endpoints. Thus, these tools often fail to detect a vulnerability if it does not lead to an HTTP 500 error. Second, scanning tools (e.g., ZAP) face challenges in constructing an appropriate sequence encompassing multiple requests that align with the data dependencies existing among API endpoints. Thus, these tools fail to identify the respective vulnerabilities (e.g., XSS) concealed behind intricate interactions, as exemplified in §5.4.

5.3 Accuracy (RQ1)

We further analyzed the accuracy of VOAPI2 and vulnerability scanners. All alerts were manually verified to determine if they were true vulnerabilities, thereby identifying false positives (FP). The false discovery rate (FDR = FP/(FP+TP)) was calculated and is presented in Table 7. It can be observed that VOAPI2 has lower false positive rates compared to the scanners, which can be attributed to VOAPI2’s vulnerability-oriented strategy and more accurate vulnerability validation strategy.

For XSS vulnerabilities, ZAP and Astra conduct tests on all APIs and determine the existence of an XSS vulnerability solely based on the presence of the XSS payload in the response. However, a significant number of these APIs do not contain display functions. Consequently, the payload would not be displayed on any particular page, which results in a high incidence of false positives. In contrast, the vulnerability-oriented testing strategy helps VOAPI2 find more API paths related to XSS, which leads to a lower false positive rate.

For path traversal vulnerabilities, ZAP lacks further validation. It solely bases its determination of a vulnerability’s existence on whether the test request returns a 2xx status code. On the other hand, VOAPI2 goes a step further by analyzing the response content, checking for the presence of content corresponding to the test payload to confirm the existence of the vulnerability. For instance, when the test payload is "/etc/passwd", VOAPI2 will match the response content for characteristic strings (e.g., root) to validate the vulnerability.

In terms of VOAPI2, false positives may occur when checking a particular XSS (i.e., stored XSS) and the implicit unrestricted upload vulnerability. In these scenarios, we need to trigger the vulnerability manually and check whether it exists. Moreover, we thoroughly discuss the root causes and the corresponding improvements in §6.

5.4 Real-world Vulnerabilities (RQ1)

We applied VOAPI2 to discover security vulnerabilities in real API applications and found various vulnerabilities. As shown in Table 6, we identified more types of vulnerabilities compared to scanning tools, especially on XSS and SSRF. The main reason is that our method can generate effective request sequences that access multiple endpoints in proper order, allowing us to trigger deeper vulnerabilities.

Case Study: XSS. The XSS vulnerability (CVE-2022-2925) was discovered in the Appwrite application. As shown in Figure 5, this bug exists in five API endpoints (/teams, /users, /functions, /database/collections, /teams/{teamId}/memberships), and the vulnerable arguments in these endpoints are "name" marked in blue color. In our experiments, both ZAP and Astra face significant difficulties. While they may encounter problems in gen-
[Figure residue: bar chart with y-axis ticks 0, 20, 40, 60 and x-axis categories Appwrite, Casdoor, Gitea, Jellyfin, Microcks, Rbaskets, GitLab, Average; the caption and the assignment of bar values to series are not recoverable from the extracted text.]
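The path traversal validation described in §5.3 and the FDR formula can be illustrated with a short sketch. It contrasts a status-only rule (a 2xx response counts as a finding, as the text attributes to ZAP) with a content check that looks for evidence of the requested file in the response body. This is a sketch under stated assumptions, not the validation code of either tool: the function names, the regular expression for an /etc/passwd root entry, the sample responses, and the FP/TP counts in the final line are all hypothetical.

```python
import re


def status_only_check(status_code: int) -> bool:
    """Report a finding whenever the test request returns a 2xx status code."""
    return 200 <= status_code < 300


# Assumed signature of an /etc/passwd root entry. The text only says that
# characteristic strings such as "root" are matched; this pattern is stricter.
PASSWD_SIGNATURE = re.compile(r"^root:[^:]*:0:0:", re.MULTILINE)


def content_check(body: str) -> bool:
    """Report a finding only if the response body contains the expected content."""
    return PASSWD_SIGNATURE.search(body) is not None


def false_discovery_rate(fp: int, tp: int) -> float:
    """FDR = FP / (FP + TP), as defined in Section 5.3."""
    if fp + tp == 0:
        raise ValueError("FDR is undefined when there are no alerts")
    return fp / (fp + tp)


# Hypothetical responses to a request carrying the payload "/etc/passwd".
benign = (200, "<html><body>document root not found</body></html>")
leaked = (200, "root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1::/usr/sbin:/bin/sh\n")

for status, body in (benign, leaked):
    print(status_only_check(status), content_check(body))

# Hypothetical counts, only to show the formula: 3 false and 7 true alerts.
print(false_discovery_rate(fp=3, tp=7))
```

The benign response passes the status-only rule but fails the content check, which is the kind of false positive that response-content validation is meant to remove.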
[3] Jellyfin. https://jellyfin.org/.
[4] Casdoor. https://casdoor.org/.
[5] Appwrite. https://appwrite.io/.
[6] Microcks. https://microcks.io/.
[7] CVE-2021-3044. https://nvd.nist.gov/vuln/detail/CVE-2021-3044.
[8] CVE-2019-12643. https://nvd.nist.gov/vuln/detail/CVE-2019-12643.
[9] Representational state transfer. https://en.wikipedia.org/wiki/Representational_state_transfer.
[10] Tomer Bar. Notifying our Developer Ecosystem about a Photo API Bug. https://developers.facebook.com/blog/post/2018/12/14/notifying-our-developer-ecosystem-about-a-photo-api-bug/, 2018.
[11] Twitter. An Incident Impacting Your Account Identity. https://privacy.twitter.com/en/blog/2020/an-incident-impacting-your-account-identity, 2020.
[12] Vaggelis Atlidakis, Patrice Godefroid, and Marina Polishchuk. RESTler: Stateful REST API Fuzzing. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), pages 748–758, Montreal, QC, Canada, 2019. IEEE.
[13] Hamza Ed-Douibi, Javier Luis Cánovas Izquierdo, and Jordi Cabot. Automatic generation of test cases for REST APIs: a specification-based approach. In International Enterprise Distributed Object Computing Conference, pages 181–190, 2018.
[17] Tobias Fertig and Peter Braun. Model-driven testing of RESTful APIs. In International Conference on World Wide Web: Companion, pages 1497–1502, 2015.
[18] Andrea Arcuri. RESTful API automated test case generation with EvoMaster. ACM Transactions on Software Engineering and Methodology (TOSEM), 28(1):1–37, 2019.
[19] Andrea Arcuri. Automated blackbox and whitebox testing of RESTful APIs with EvoMaster. IEEE Software, 2020.
[20] Andrea Arcuri and Juan P Galeotti. Handling SQL databases in automated system test generation. ACM Transactions on Software Engineering and Methodology (TOSEM), 29(4):1–31, 2020.
[21] Andrea Arcuri and Juan P Galeotti. Enhancing search-based testing with testability transformations for existing APIs. ACM Transactions on Software Engineering and Methodology (TOSEM), 31(1):1–34, 2021.
[22] Vaggelis Atlidakis, Roxana Geambasu, Patrice Godefroid, Marina Polishchuk, and Baishakhi Ray. Pythia: Grammar-Based Fuzzing of REST APIs with Coverage-guided Feedback and Learning-based Mutations. arXiv preprint arXiv:2005.11498 [cs.SE], 2020.
[23] S. T. Liu. Coverage guided fuzzing in python-based web server. Master’s thesis, Institute of Computer Science and Engineering, National Chiao Tung University, Hsinchu, Taiwan, 2019.
[24] Runtime application self-protection. https://en.wikipedia.org/wiki/Runtime_application_self-protection.
[25] Huayao Wu, Lixin Xu, Xintao Niu, and Changhai Nie. Combinatorial testing of RESTful APIs. In Proceedings