Static analysis is the process of extracting semantic information about a program at compile time [Lan92, Sec. 1].
Classic example: the live-variables problem, where a variable is live at a statement iff, on some execution, the variable is used/accessed after the statement is executed without being redefined in between [Lan92, Sec. 1; NNH99, p. 49]. This is also an example of data flow analysis (discussed below).
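The live-variables definition can be illustrated with a small backward data-flow computation. The following is a minimal sketch, not taken from the cited sources: the function name `live_variables`, the `(defs, uses)` statement encoding and the four-statement sample program are all assumptions made for illustration.

```python
def live_variables(stmts, succ):
    """Backward data-flow fixpoint for liveness.

    stmts: list of (defs, uses) pairs, one per statement, each a set of names.
    succ:  list where succ[i] holds the indices of statements that may run next.
    Returns (live_in, live_out), two lists of sets indexed by statement.
    """
    live_in = [set() for _ in stmts]
    live_out = [set() for _ in stmts]
    changed = True
    while changed:  # iterate until no set changes any further
        changed = False
        for i in reversed(range(len(stmts))):
            defs, uses = stmts[i]
            # Live after i: anything live on entry to some successor.
            out = set().union(*(live_in[j] for j in succ[i]))
            # Live before i: used here, or live afterwards and not redefined here.
            inn = uses | (out - defs)
            if out != live_out[i] or inn != live_in[i]:
                live_out[i], live_in[i] = out, inn
                changed = True
    return live_in, live_out


# Hypothetical straight-line program:
#   0: x = 1
#   1: y = x + 1
#   2: x = 5
#   3: print(x, y)
stmts = [({"x"}, set()), ({"y"}, {"x"}), ({"x"}, set()), (set(), {"x", "y"})]
succ = [[1], [2], [3], []]
live_in, live_out = live_variables(stmts, succ)
```

In this sample, `x` is live after statement 0 because statement 1 reads it, but it is not live after statement 1 because statement 2 redefines it before its next use.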
In security, static analysis is supposed to classify the potential behaviour of a program as malicious or benign without executing it [vJ11, p. 754].
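As a rough illustration of judging code without executing it, the sketch below parses a source string with Python's standard `ast` module and reports calls to a few built-ins. The policy set `SUSPICIOUS_CALLS`, the function `flag_suspicious_calls` and the sample program are hypothetical; real classifiers are far more elaborate.

```python
import ast

# Hypothetical policy: direct calls to these built-ins are reported.
SUSPICIOUS_CALLS = {"eval", "exec", "compile", "__import__"}


def flag_suspicious_calls(source):
    """Return sorted (line, name) pairs for direct calls to flagged names.

    The source is only parsed into a syntax tree; it is never executed.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in SUSPICIOUS_CALLS:
                findings.append((node.lineno, node.func.id))
    return sorted(findings)


# Hypothetical program under analysis, held as text.
SAMPLE = "import os\ndata = input()\neval(data)\nprint('done')\n"
findings = flag_suspicious_calls(SAMPLE)
```

A purely syntactic check like this only sees calls written by name, so a call reached through an indirectly constructed name would pass unnoticed.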
Static analysis algorithms historically have come from compiler research and implementations [vJ11, p. 1254], evolving from intraprocedural analysis to interprocedural analysis [Lan92, Sec. 1].
Fundamentally, no tool can decide for every program whether it terminates, because the halting problem is undecidable, a basic result of computability theory.
The halting problem aside, finding an exact solution to typical static analysis questions is almost always undecidable [vJ11, p. 1254].
As codebases grow, static analysis tools take longer to parse and traverse code because they generally operate over all possible branches of execution in a program [Tho21].
Furthermore, static analyses are inherently computationally expensive in terms of required space or time, often quadratic and sometimes even cubic [Tho21].
Consequently, static analysis tools are under constant pressure to be more efficient.
In general, there exist code obfuscation techniques that can defeat static analysis [MKK07], rendering (1) static analysis better at finding bugs in benign programs than at detecting malware, and (2) the complementary role of dynamic analysis indispensable.
Besides obfuscation, malware often employs polymorphism, such as the Shikata Ga Nai polymorphic encoding scheme [MRC19], to deter static analysis [vJ11, p. 754].
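To see why encoding frustrates static signature matching, consider the toy sketch below. It uses a plain single-byte XOR, which is a deliberate simplification and not the Shikata Ga Nai scheme; the byte string `SIGNATURE`, the functions `static_scan` and `xor_encode`, and the keys are all hypothetical.

```python
SIGNATURE = b"MALICIOUS_PAYLOAD"  # hypothetical byte pattern a scanner looks for


def static_scan(blob):
    """Toy static check: report whether the known signature occurs in the bytes."""
    return SIGNATURE in blob


def xor_encode(data, key):
    """Single-byte XOR; applying it twice with the same key restores the input."""
    return bytes(b ^ key for b in data)


payload = b"header" + SIGNATURE + b"trailer"

# Two "variants" of the same payload, encoded under different keys.
variant_a = xor_encode(payload, 0x41)
variant_b = xor_encode(payload, 0x7F)

plain_hit = static_scan(payload)        # signature visible in the clear
encoded_hit_a = static_scan(variant_a)  # signature hidden by the encoding
encoded_hit_b = static_scan(variant_b)

# Decoding, which real malware would do at run time, exposes the payload again.
decoded_hit = static_scan(xor_encode(variant_a, 0x41))
```

Each variant has different bytes on disk, so a fixed signature misses both, yet the decoded form that exists only at run time still matches.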
Tools:
Static analysis is especially important for C/C++ code.
For common programming languages, see OWASP’s extensive catalog of static analysis tools.
Regardless of the tool used, it helps to write code that facilitates checking, e.g., by adopting Holzmann’s power-of-ten rules [Hol06] for writing safety-critical code.
Dynamic application security testing (DAST) is a specialisation of dynamic (code/program) analysis.
Dynamic analysis refers to the broad class of techniques that make inferences about a program by observing its runtime execution behaviour [vJ11, p. 365].
An example of dynamic analysis is fuzz testing, which is the execution of a program-under-test (PUT) using input(s) sampled from an input space (the “fuzz input space”) that “protrudes” the expected input space of the PUT, to test if the PUT violates a correctness policy [MHH+21, Definition 1].
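The definition of fuzz testing can be made concrete with a minimal random fuzzer. This is only a sketch under stated assumptions: the program under test `parse_percentage`, its planted bug, the input alphabet and the correctness policy ("only `ValueError` may escape") are all hypothetical.

```python
import random


def parse_percentage(text):
    """Hypothetical PUT: expects strings '0'..'100'; has a latent bug."""
    if text[0] == "-":  # bug: raises IndexError when text is empty
        raise ValueError("negative value")
    value = int(text)   # int() raises ValueError on malformed input
    if value > 100:
        raise ValueError("out of range")
    return value


def fuzz(put, trials=1000, seed=0):
    """Run put on random short strings; collect violations of the policy."""
    rng = random.Random(seed)
    alphabet = "0123456789-+ a"  # deliberately wider than the expected input space
    crashes = []
    for _ in range(trials):
        length = rng.randint(0, 4)
        text = "".join(rng.choice(alphabet) for _ in range(length))
        try:
            put(text)
        except ValueError:
            pass  # rejecting bad input is allowed by the policy
        except Exception as exc:  # anything else violates the policy
            crashes.append((text, type(exc).__name__))
    return crashes


crashes = fuzz(parse_percentage)
```

Here the fuzz input space includes empty and non-numeric strings that the PUT was not designed for, and it is the empty string that exposes the planted bug.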
Another example of dynamic analysis is taint analysis, also called information flow tracking, which is the tracking of “tainted” data throughout a system while a program manipulating this data is executed [ESKK08].
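A minimal sketch of dynamic taint tracking follows. It is an illustration only, not the mechanism of any tool surveyed in [ESKK08]: the `Value` wrapper, the `source`, `concat`, `sanitize` and `sink` functions, and the quote-doubling sanitizer are all hypothetical.

```python
from dataclasses import dataclass


class TaintError(Exception):
    """Raised when tainted data reaches a sensitive sink."""


@dataclass(frozen=True)
class Value:
    data: str
    tainted: bool = False


def source(text):
    """Untrusted input enters the system marked as tainted."""
    return Value(text, tainted=True)


def concat(a, b):
    """Taint propagates: the result is tainted if either operand is."""
    return Value(a.data + b.data, a.tainted or b.tainted)


def sanitize(v):
    """Hypothetical sanitizer: doubles single quotes and clears the taint mark."""
    return Value(v.data.replace("'", "''"), tainted=False)


def sink(v):
    """Sensitive operation: refuses tainted data, otherwise returns the text."""
    if v.tainted:
        raise TaintError("tainted data reached sink: " + repr(v.data))
    return v.data


user = source("x' OR '1'='1")
prefix = Value("SELECT * FROM t WHERE name = '")
suffix = Value("'")

unsafe_query = concat(concat(prefix, user), suffix)
safe_query = concat(concat(prefix, sanitize(user)), suffix)
```

Calling `sink(unsafe_query)` raises `TaintError`, whereas `sink(safe_query)` returns the query text, because the taint mark was tracked through each operation performed while the program ran.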
Unlike static analysis, dynamic analysis is robust to malware polymorphism, including low-level obfuscations that can thwart disassembly [vJ11, p. 755].
Applications: software debugging, software profiling and host-based intrusion detection [vJ11, p. 366].
Challenges:
Dynamic analysis tools are typically strong on soundness (analysis results are consistent with the actual behaviour of the program), but weaker on completeness (the analysis can infer all behaviours of interest of the program).
In general, static analysis often suffers from a high false-positive rate, whereas dynamic analysis is limited in coverage.
There exist anti-emulation techniques that check for certain low-level processor features (e.g., undocumented instructions) or timings, enabling determination of whether the execution environment is an emulation [vJ11, p. 755].
If it detects an emulated environment, the malware can terminate without performing any malicious action, thereby avoiding detection.
Tools:
Among the open-source dynamic analysis tools for C/C++ code, Valgrind [NS07] is likely the best known.
For common programming languages, an extensive catalog of dynamic analysis tools can be found on GitHub.
References
[ACC+17]
A. Arusoaie, S. Ciobâca, V. Craciun, D. Gavrilut, and D. Lucanu, A Comparison of Open-Source Static Analysis Tools for Vulnerability Detection in C/C++ Code, in 2017 19th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), 2017, pp. 161–168. https://doi.org/10.1109/SYNASC.2017.00035.
[ESKK08]
M. Egele, T. Scholte, E. Kirda, and C. Kruegel, A survey on automated dynamic malware-analysis techniques and tools, ACM Comput. Surv. 44 no. 2 (2008). https://doi.org/10.1145/2089125.2089126.
[Hol06]
G. J. Holzmann, The power of 10: rules for developing safety-critical code, Computer 39 no. 6 (2006), 95–99. https://doi.org/10.1109/MC.2006.212.
[MHH+21]
V. J. Manès, H. Han, C. Han, S. K. Cha, M. Egele, E. J. Schwartz, and M. Woo, The art, science, and engineering of fuzzing: A survey, IEEE Transactions on Software Engineering 47 no. 11 (2021), 2312–2331. https://doi.org/10.1109/TSE.2019.2946563.
[MKK07]
A. Moser, C. Kruegel, and E. Kirda, Limits of static analysis for malware detection, in Twenty-Third Annual Computer Security Applications Conference (ACSAC 2007), 2007, pp. 421–430. https://doi.org/10.1109/ACSAC.2007.21.
[NS07]
N. Nethercote and J. Seward, Valgrind: A framework for heavyweight dynamic binary instrumentation, SIGPLAN Not. 42 no. 6 (2007), 89–100. https://doi.org/10.1145/1273442.1250746.
[SKK+16]
S. Schrittwieser, S. Katzenbeisser, J. Kinder, G. Merzdovnik, and E. Weippl, Protecting software through obfuscation: Can it keep pace with progress in code analysis?, ACM Comput. Surv. 49 no. 1 (2016). https://doi.org/10.1145/2886012.
[Tho21]
P. Thomson, Static analysis: An introduction: The fundamental challenge of software engineering is one of complexity, Queue 19 no. 4 (2021), 29–41. https://doi.org/10.1145/3487019.3487021.