Soot,1 a tool for statically analyzing Java code, has recently been supplanted by SootUp.2 SootUp is marketed as the next generation of Soot and developed by the same authors as Soot. But how exactly is SootUp different from Soot? Do the major capabilities of Soot still exist in SootUp? How easy is SootUp to understand and use compared to Soot? And lastly, is SootUp ready for real world use? In this post we look to answer these questions by exploring some of the most used features of Soot in SootUp and their documentation. These features include the ability to: 1) load and process Java bytecode for analysis, 2) create/modify existing Java code at runtime, and 3) create a call graph.
Background And Overarching Design Improvements
Soot1 was originally authored around 2000 for performing bytecode analysis on Java 1.2. In the past 24 years, there have been many additions and improvements to Soot that have been a result of the changes in Soot’s use cases and the changes to Java itself. Unfortunately, Soot has not aged well. It has a bloated code base containing many features that are no longer used, and suffers from a long history of older coding practices. In the last few years, the developers of Soot introduced SootUp2 to solve these issues. SootUp is a complete overhaul and rewrite of Soot. Whereas Soot was a command line tool that was eventually transformed into a library, SootUp is designed to be a Java static analysis library from the start.
At a high level, the most significant improvement of SootUp is that it is designed and intended to be used as a library. This design decision influences many of the other changes in SootUp and has largely helped to eliminate much of what made Soot difficult to use. Other improvements include the elimination of singleton classes, the inclusion of builders and other components of immutable classes, standard and more understandable method naming schemes, and a much easier to understand overall structure. SootUp’s own documentation3 does a decent job of highlighting and discussing these improvements. In addition, SootUp’s use of high-quality third-party libraries (e.g. JGraphT, WALA, and ASM) to handle some of the backend heavy lifting is a significant improvement over Soot’s own dated custom implementations. These libraries allow the maintainers to focus on the new and improved static analysis features while not having to maintain outdated code with similar features. Lastly, there is a noticeable performance improvement in SootUp compared to Soot, which is likely a result of its improved design and slimmer code base.
Loading Code For Analysis
A typical Soot or SootUp workflow always starts with loading code for analysis from some source. Both Soot and SootUp are able to handle a number of different forms of input for the code being analyzed. These include file extensions such as .class, .jar, .zip, .apk, .dex, and .jimple. Soot and SootUp can also load code from un-compiled .java files. However, loading from multiple interdependent .java files is still experimental in SootUp.4 Additionally, SootUp added support for loading code from .war files which Soot does not support out of the box.
Overall, SootUp’s design is similar to that of Soot except it no longer relies on monolithic singleton classes. These classes have instead been replaced with reusable classes like View. In SootUp, a View5 is a class that contains all the classes loaded from a list of AnalysisInputLocation6 objects. AnalysisInputLocation is an interface that specifies how to find and load the Java classes SootUp is to use for its analysis. Different implementations of AnalysisInputLocation are used to load the various file types mentioned above. Effectively, a View is the core data structure that stores the code SootUp is analyzing, much like the singleton Scene class of Soot.
After the supplied code sources have been parsed, both Soot and SootUp apply transformations to the code. These transformations standardize the code into a layout that is functionally equivalent but easier to process for static analysis purposes. These transformers are known as BodyTransformer (abstract class) in Soot and BodyInterceptor (interface) in SootUp. The default list of BodyInterceptor classes run by SootUp is available through BytecodeBodyInterceptors.Default.getBodyInterceptors(). It should be noted that prior to SootUp 1.3.0, no BodyInterceptor classes were applied by default to loaded bytecode. However, in version 1.3.0 this behavior was changed so that a default list is applied. This is likely because transforming loaded code is a required step in both the Soot and SootUp loading process. Without such transformations, errors can and likely will occur further down the line as assumptions made about the code layout will be wrong.
The example code below illustrates some of the available BodyInterceptor classes in SootUp. The comments in the code indicate differences in Soot and SootUp. Additionally, some of the BodyInterceptor classes are commented out because they are either not currently available, not fully implemented, or have bugs. With these restrictions in mind, the following list of SootUp BodyInterceptor classes is the closest to the default list of Soot BodyTransformer classes that SootUp can currently apply to code without error.
Creating a JavaView in SootUp
public static MutableJavaView makeJavaView(String classPath) {
    // This list was constructed and organized by combining the original Soot with SootUp.
    List<BodyInterceptor> bodyInterceptors = Collections.unmodifiableList(Arrays.asList(
        // new TrapTightener(), missing: impl not finished
        // new DuplicateCatchAllTrapRemover(), missing: does not exist in SootUp
        new UnreachableCodeEliminator(),
        ...
The list displayed in the code above is similar to the list of default BodyInterceptor classes applied by SootUp except for the inclusion of the UnusedLocalEliminator, LocalPacker, LocalNameStandardizer, and UnreachableCodeEliminator. The first three of these help to reduce the number of variables and standardize variable names. This is largely to improve readability when looking at the processed code and reduce complexity when analyzing it. The last one removes unreachable code blocks from the bodies of methods. This BodyInterceptor is critical as prior BodyInterceptor classes may introduce unreachable code blocks into a method during their processing. Additionally, the BodyInterceptor classes TypeAssigner and LocalSplitter from the default list are excluded as they were found to be causing exceptions because of unidentified bugs. Neither was critical for the purposes of simply building a call graph. However, TypeAssigner may be necessary in more complex static analysis use cases. More information about the BodyInterceptor classes mentioned and how they transform loaded code can be found by examining SootUp’s source code.2
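To make the role of the last of these transformations more concrete, here is a minimal sketch of unreachable code elimination over a toy method body. It deliberately does not use SootUp’s API: ToyStmt, ToyUnreachableCodeEliminator, and the sample statements are hypothetical names invented for this illustration, and SootUp’s own UnreachableCodeEliminator works on SootUp’s method body representation rather than on strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Toy model only: these types are hypothetical and are not SootUp's BodyInterceptor API.
final class ToyStmt {
    final String text;
    final int[] successors; // indices of the statements that may execute next

    ToyStmt(String text, int... successors) {
        this.text = text;
        this.successors = successors;
    }
}

final class ToyUnreachableCodeEliminator {
    // Keeps only the statements reachable from the entry statement (index 0).
    static List<String> apply(List<ToyStmt> body) {
        boolean[] reachable = new boolean[body.size()];
        Deque<Integer> worklist = new ArrayDeque<>();
        if (!body.isEmpty()) {
            reachable[0] = true;
            worklist.push(0);
        }
        while (!worklist.isEmpty()) {
            int current = worklist.pop();
            for (int next : body.get(current).successors) {
                if (!reachable[next]) {
                    reachable[next] = true;
                    worklist.push(next);
                }
            }
        }
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < body.size(); i++) {
            if (reachable[i]) {
                kept.add(body.get(i).text);
            }
        }
        return kept;
    }

    // A body in which an earlier (imagined) transformation left a dead statement behind.
    static List<ToyStmt> sampleBody() {
        return Arrays.asList(
            new ToyStmt("x = 1", 1),
            new ToyStmt("goto L3", 3),
            new ToyStmt("x = 2", 3), // dead: no statement jumps or falls through to it
            new ToyStmt("return x"));
    }
}
```

Calling ToyUnreachableCodeEliminator.apply(ToyUnreachableCodeEliminator.sampleBody()) keeps the three reachable statements and drops "x = 2". Running a pass like this after the other transformations is what cleans up any dead blocks they leave behind.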
On the whole, the transformations of SootUp are similar to those in Soot. Indeed, many even share the same name. However, some are still missing, have not been implemented fully, or are still buggy. This will likely be improved as development of SootUp continues. However, bugs in the transformation stage of SootUp can be significant hindrances as it is often not easy to identify the root cause or find a workaround.
Generating Code Dynamically
Both Soot and SootUp not only provide the ability to analyze existing code, but also allow for the modification and creation of new code. SootUp has simplified this process significantly through the use of modern coding patterns (e.g. builders and Java streams) in its APIs.89 This is a significant improvement to the design patterns of Soot and greatly increases the usability of SootUp.
Unfortunately, one feature omitted from SootUp’s API is the ability to generate classes at runtime. While it is possible to add and remove classes from an existing View,5 the classes themselves cannot be created without an underlying input location. SootUp supports the addition and removal of classes to an existing View through the use of implementations of the MutableView10 interface (e.g. MutableJavaView11). However, in order to create the object that holds a single class (i.e. a SootClass12) a SootClassSource13 must be provided. The creation of these SootClassSource objects requires that an AnalysisInputLocation6 be provided. For classes generated during runtime, no such input location can be provided as none exists. Additionally, there is no subtype of AnalysisInputLocation that can be used with runtime generated classes within the SootUp API.
Fortunately, it is possible to create a dummy AnalysisInputLocation for classes that only exist in memory. We created the InMemoryJavaAnalysisInputLocation14 class outlined below for this purpose. It relies on the fact that AnalysisInputLocation objects do not get used after a View has been created and all method bodies have been loaded. An example of how to use this class to create a runtime generated class can be found in the ServiceNow SecurityResearch GitHub repo.7
Dummy Input Location For Use With Runtime Created Classes
package sootup.core.inputlocation;

public class InMemoryJavaAnalysisInputLocation implements AnalysisInputLocation {
    @Nonnull final Path path = Paths.get("only-in-memory.class");
    @Nonnull final List<BodyInterceptor> bodyInterceptors;
    @Nonnull private final OverridingJavaClassSource classSource;
    @Nonnull final SourceType sourceType;
    ...
Overall, the inclusion of MutableView classes in the current API design suggests that the omission of an AnalysisInputLocation implementation providing an in-memory input location capability is likely an oversight or, more simply, yet to be implemented. After all, the ability to create classes at runtime is a core component of Soot and SootUp. We imagine it will likely be corrected in later iterations of the API.
Call Graph
One of the core features of Soot and SootUp is the ability to create a call graph of Java code that portrays the flows between methods. The discussion below outlines the differences in how Soot and SootUp generate call graphs. It focuses on the central components that comprise the call graph generation process. Namely, how a call graph is: 1) represented and stored, 2) generated using provided algorithms, 3) modified, and 4) fine tuned to fit analysis needs.
Graphing Backend and Data Structure
While Soot uses an internal data structure to house its call graph, SootUp makes use of the third-party library JGraphT.15 JGraphT is a Java library focused on graph theory, data structures, and algorithms. Its direct use in SootUp makes it much easier to define and manipulate graphs using well known graph theory techniques. These graphs can also be easily exported in formats usable by other libraries, or transformed and rendered for visualization. However, for Soot, implementing much of the functionality provided by JGraphT would require significant modification as its graph structure only supports the most basic of traversal techniques.
Unfortunately, while SootUp is using JGraphT as a backend, the JGraphT API is not easily accessed through SootUp’s API by default. This is because SootUp wraps the JGraphT graphs in its own classes, primarily the sootup.callgraph.GraphBasedCallGraph class, in order to provide additional functionality. The underlying JGraphT graph stored in GraphBasedCallGraph can only be accessed from child classes using undocumented protected methods. The code snippet below illustrates how to wrap an existing GraphBasedCallGraph object in order to access the underlying JGraphT graph through the method getGraph(). A complete version of this class can be found in the ServiceNow SecurityResearch GitHub repo.16
Wrapping GraphBasedCallGraph To Access the Underlying JGraphT Graph
package sootup.callgraph;
...
public class CallGraphWrapper extends GraphBasedCallGraph {
    private final GraphBasedCallGraph cg;

    public CallGraphWrapper(GraphBasedCallGraph cg) {
        this.cg = cg;
    }

    @Override
    public void addMethod(@Nonnull MethodSignature calledMethod) {
        cg.addMethod(calledMethod);
    }
    ...
    public void removeCall(@Nonnull MethodSignature sourceMethod, @Nonnull MethodSignature targetMethod) {
        getGraph().removeEdge(vertexOf(sourceMethod), vertexOf(targetMethod));
    }

    public void removeMethod(@Nonnull MethodSignature method) {
        if (calls(method).isEmpty()) {
            removeMethodForce(method);
        }
    }

    // will remove all edges associated with vertex as well
    public void removeMethodForce(@Nonnull MethodSignature method) {
        getGraph().removeVertex(vertexOf(method));
        getSignatureToVertex().remove(method);
    }
}
In addition to the changes mentioned above, it should also be noted that SootUp call graphs do not differentiate between two calls to the same method from within the same originating method. That is, if a method calls the same method twice, only one edge is created in the graphs SootUp produces. As such, information about the call sites in a method is also lost in SootUp’s call graphs. On the other hand, Soot creates a new edge for every call statement in a method. This is functionality that can and likely will be added in future updates to SootUp as it is necessary for more precise call graph building.
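The difference between the two edge models can be sketched with a small, library-free example. ToyCallEdges and the method names in it are hypothetical and belong to neither Soot nor SootUp. The sketch only contrasts keying an edge on the caller/callee pair with keying it on the individual call statement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: hypothetical names, not part of Soot or SootUp.
final class ToyCallEdges {
    // Each call is {caller, callee}, listed in the order the call statements appear.
    static List<String[]> sampleCalls() {
        return Arrays.asList(
            new String[] {"App.main", "Log.info"},
            new String[] {"App.main", "App.run"},
            new String[] {"App.main", "Log.info"}); // second call to the same method
    }

    // One edge per (caller, callee) pair: repeated calls collapse into a single edge.
    static Set<String> edgesPerMethodPair(List<String[]> calls) {
        Set<String> edges = new LinkedHashSet<>();
        for (String[] call : calls) {
            edges.add(call[0] + " -> " + call[1]);
        }
        return edges;
    }

    // One edge per call statement: the statement index keeps call sites apart.
    static List<String> edgesPerCallSite(List<String[]> calls) {
        List<String> edges = new ArrayList<>();
        for (int i = 0; i < calls.size(); i++) {
            String[] call = calls.get(i);
            edges.add(call[0] + " -> " + call[1] + " @stmt" + i);
        }
        return edges;
    }
}
```

For the sample calls, the pair-keyed model yields two edges while the call-site model yields three. An analysis that needs to know which of the two Log.info calls it is looking at can only recover that from the second model.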
Graph Generation Algorithms
SootUp currently provides two call graph construction algorithms: Class Hierarchy Analysis (CHA) and Rapid Type Analysis (RTA).17 CHA constructs a call graph using Java’s type hierarchy when resolving a method call. It considers all parent/child and interface classes available in the View when determining the method call resolution target. This is the most sound call graph construction algorithm in SootUp, but it can be imprecise. RTA improves the precision of CHA by considering only instantiated implementers/extenders of a class/interface when resolving a method call. However, RTA assumes the entry point for the call graph is a program’s main method. Indeed, the soundness of RTA’s call graph can be affected if classes are not instantiated somewhere within the call flows of the provided entry point. As such, when analyzing an entry point of a service with multiple entry points (e.g. APIs), or those without them (e.g. libraries), where some instantiation occurs outside a single entry point’s flow, additional code is required when using RTA to ensure its call graph remains sound.
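The trade-off between the two algorithms can be illustrated with a small sketch that resolves a single virtual call both ways. This is a toy model rather than SootUp’s implementation: ToyCallResolver, the Shape hierarchy, and the simplifying assumption that every concrete class overrides the called method are all ours.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: hypothetical names; assumes every concrete class overrides the called method.
final class ToyCallResolver {
    private final Map<String, List<String>> directSubtypes = new HashMap<>();
    private final Set<String> concreteClasses = new HashSet<>();

    ToyCallResolver addConcreteClass(String name, String superType) {
        concreteClasses.add(name);
        directSubtypes.computeIfAbsent(superType, k -> new ArrayList<>()).add(name);
        return this;
    }

    // CHA: every concrete subtype of the declared receiver type is a possible target.
    Set<String> resolveCha(String declaredType, String method) {
        Set<String> targets = new TreeSet<>();
        Set<String> seen = new HashSet<>();
        Deque<String> worklist = new ArrayDeque<>();
        worklist.push(declaredType);
        while (!worklist.isEmpty()) {
            String type = worklist.pop();
            if (!seen.add(type)) {
                continue;
            }
            if (concreteClasses.contains(type)) {
                targets.add(type + "." + method);
            }
            for (String sub : directSubtypes.getOrDefault(type, Collections.emptyList())) {
                worklist.push(sub);
            }
        }
        return targets;
    }

    // RTA: keep only the CHA targets whose declaring class is known to be instantiated.
    Set<String> resolveRta(String declaredType, String method, Set<String> instantiated) {
        Set<String> targets = new TreeSet<>();
        for (String target : resolveCha(declaredType, method)) {
            String owner = target.substring(0, target.lastIndexOf('.'));
            if (instantiated.contains(owner)) {
                targets.add(target);
            }
        }
        return targets;
    }

    static ToyCallResolver sample() {
        return new ToyCallResolver()
            .addConcreteClass("Circle", "Shape")
            .addConcreteClass("Square", "Shape")
            .addConcreteClass("Triangle", "Shape");
    }
}
```

With this hierarchy, resolving a call to Shape.area with CHA returns all three implementations. RTA with only Circle instantiated returns just Circle.area. If the set of instantiated classes is empty because the instantiation happens outside the analyzed entry point’s flow, RTA returns no targets at all, which is the soundness problem described above.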
Soot provides implementations of both CHA and RTA as well as a significantly more precise call graph algorithm using points-to analysis known as SPARK.18 Points-to analysis improves on RTA by only considering the class/interface types a variable can point to according to how the variable is defined and set. It is worth noting that SPARK is both flow and context insensitive; it does not differentiate between control flows and call state. SPARK has much of the same additional code requirements for services with multiple or no entry points as RTA. SootUp does not currently contain an equivalent points-to call graph generation algorithm. However, the most recent developer builds19 have introduced support for building call graphs using Qilin.20 Qilin is a Java points-to analysis framework for supporting context-sensitivity. This suggests SootUp will soon fully support a more precise points-to call graph generation algorithm than SPARK.
Removing Edges From Graphs
As alluded to in the above CallGraphWrapper code snippet, one of the limitations of SootUp’s current call graph data structure is the inability to remove edges and nodes without additional implementation. The GraphBasedCallGraph class wraps the underlying JGraphT graph in order to provide additional functionality. GraphBasedCallGraph also implements the interface MutableCallGraph, which suggests that SootUp’s graphs should be modifiable. Unfortunately, the interface currently only requires the implementation of methods to add nodes and edges, not remove them. Luckily, as illustrated in the CallGraphWrapper class, adding the ability to remove edges and nodes currently only requires a little additional work. Additionally, it should be fairly simple to add this functionality to SootUp out of the box in future updates.
Filtering Call Graphs
A common requirement when constructing call graphs is the ability to filter out certain classes and packages from the call graph that are known not to be important for analysis purposes. Unfortunately, Soot and SootUp are both limited in the call graph filtering capabilities they provide out of the box. However, SootUp provides even fewer capabilities than Soot. Soot provides limited filtering capabilities by supporting the inclusion and exclusion of packages.21 Additionally, entire JARs, folders, and class files can be set as library classes22 when loaded to prevent call graph edges from being generated for the contained methods. On the other hand, SootUp only provides the ability to set JARs, folders, and class files as library classes when loaded,2324 and does not otherwise provide the ability to filter classes further.
Fortunately, in SootUp, it is fairly easy to implement a custom call graph filter. The ServiceNow SecurityResearch GitHub repo provides an implementation of an example CallGraphFilter25 that can be applied to SootUp call graphs using the CallGraphWrapper class referenced above. When the CallGraphFilter is applied to a call graph, it removes the outgoing edges of methods that match entries in the filter. The CallGraphFilter currently provides the ability to match all methods in packages and classes by class name based on a provided regular expression pattern. Additionally, it is able to match methods in classes that implement interfaces or extend other classes based on the type hierarchy information SootUp generates. It can also be easily extended to provide additional entry types. CallGraphFilter effectively functions as an allow/deny list where the first match in the list, if any, determines the fate of the outgoing edges of a method for some call graph. If no match occurs, it applies the specified default policy.
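The first-match behaviour can be sketched in a few lines. This is not the CallGraphFilter from the repo: ToyCallGraphFilter, its method names, and the com.snc.secres.sample.internal sub-package are hypothetical, and only matching on class name by regular expression is modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Toy model only: a hypothetical first-match allow/deny list keyed on class name.
final class ToyCallGraphFilter {
    private static final class Entry {
        final Pattern classNamePattern;
        final boolean allow;

        Entry(Pattern classNamePattern, boolean allow) {
            this.classNamePattern = classNamePattern;
            this.allow = allow;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    private final boolean defaultAllow;

    ToyCallGraphFilter(boolean defaultAllow) {
        this.defaultAllow = defaultAllow;
    }

    // Entries are evaluated in insertion order, so more specific patterns should be added first.
    ToyCallGraphFilter add(String classNameRegex, boolean allow) {
        entries.add(new Entry(Pattern.compile(classNameRegex), allow));
        return this;
    }

    // Returns true if the outgoing edges of methods declared in the class should be kept.
    boolean keepsOutgoingEdges(String declaringClassName) {
        for (Entry entry : entries) {
            if (entry.classNamePattern.matcher(declaringClassName).matches()) {
                return entry.allow; // the first match decides
            }
        }
        return defaultAllow; // no entry matched
    }

    static ToyCallGraphFilter sample() {
        return new ToyCallGraphFilter(false) // default policy: deny
            .add("com\\.snc\\.secres\\.sample\\.internal\\..*", false)
            .add("com\\.snc\\.secres\\.sample.*", true);
    }
}
```

With the sample filter, a class such as com.snc.secres.sample.App keeps its outgoing edges, a class under the hypothetical internal sub-package loses them because the deny entry matches first, and anything else (for example java.lang.String) falls through to the default deny policy.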
CallGraphFilter does not itself provide the capability to read its structure directly from a file. However, the ServiceNow SecurityResearch GitHub repo provides a sample implementation for using YAML files.26 More information on this can be found in the readme file.27 Below is an example of how a CallGraphFilter might be specified via a YAML file.
Example Yaml Config For The CallGraphFilter
filter_default_policy: deny
filter:
  - type: class_path
    pattern: 'com\.snc\.secres\.sample.*'
    policy: allow
API Changes and Documentation
While SootUp’s documentation on how to use it and its API is significantly better than what Soot provides, it could still be improved. SootUp provides an easily consumable website28 with design overview, explanations, basic examples, and the Javadoc for its API. This is a significant improvement over Soot, which only provides an outdated wiki, information on CLI options, and almost no documentation on its API. Soot often requires a developer to look directly at its source code to determine available methods and features. However, the information provided on the SootUp website is in no way exhaustive. Much of SootUp’s API is still undocumented or has limited documentation. As such, an exploration of the source code is still required to understand many of the methods available. This is likely because SootUp’s API is still in flux. Indeed, moving from SootUp version 1.2.0 to 1.3.0 required some minor code changes as a result of changes to the API. Hopefully, future releases of SootUp will provide more in-depth documentation or at the very least a more complete Javadoc.
Conclusion
From this comparison, it is clear SootUp is still a work in progress. It contains many of the core capabilities of Soot, but some can be buggy or provide only a subset of the features that Soot offers. Additionally, while not debilitating, the API changes between versions were significant enough to require some code adjustments. However, the improvements in performance, overall design, and usability all make the switch to using SootUp over Soot worth the effort. So long as it is possible to solve any errors encountered and work around missing features or limitations in the API, SootUp is a better option for new Java static analysis efforts going forward.
- https://github.com/soot-oss/SootUp/blob/af4e7eccfea5604696ec304a87d66337d82a480e/sootup.core/src/main/java/sootup/core/views/View.java
- https://github.com/soot-oss/SootUp/blob/af4e7eccfea5604696ec304a87d66337d82a480e/sootup.core/src/main/java/sootup/core/inputlocation/AnalysisInputLocation.java
- https://github.com/ServiceNow/SecurityResearch/tree/main/rhino-tracker/tool/src/main/java/com/snc/secres/tool/passive/SootTools.java
- https://github.com/soot-oss/SootUp-Examples/blob/af4e7eccfea5604696ec304a87d66337d82a480e/MutatingSootClassExample/src/main/java/sootup/examples/MutatingSootClass.java
- https://github.com/soot-oss/SootUp/blob/af4e7eccfea5604696ec304a87d66337d82a480e/sootup.core/src/main/java/sootup/core/views/MutableView.java
- https://github.com/soot-oss/SootUp/blob/af4e7eccfea5604696ec304a87d66337d82a480e/sootup.java.core/src/main/java/sootup/java/core/views/MutableJavaView.java
- https://github.com/soot-oss/SootUp/blob/af4e7eccfea5604696ec304a87d66337d82a480e/sootup.core/src/main/java/sootup/core/model/SootClass.java
- https://github.com/soot-oss/SootUp/blob/af4e7eccfea5604696ec304a87d66337d82a480e/sootup.core/src/main/java/sootup/core/frontend/SootClassSource.java
- https://github.com/ServiceNow/SecurityResearch/blob/main/rhino-tracker/tool/src/main/java/sootup/core/inputlocation/InMemoryJavaAnalysisInputLocation.java
- https://github.com/ServiceNow/SecurityResearch/tree/main/rhino-tracker
- https://www.sable.mcgill.ca/soot/tutorial/phase/phase.html#SECTION00060000000000000000
- https://www.sable.mcgill.ca/soot/tutorial/usage/#SECTION00045000000000000000
- https://www.sable.mcgill.ca/soot/tutorial/usage/#SECTION00030000000000000000
- https://github.com/soot-oss/SootUp/blob/af4e7eccfea5604696ec304a87d66337d82a480e/sootup.core/src/main/java/sootup/core/model/SourceType.java
- https://github.com/soot-oss/SootUp/blob/af4e7eccfea5604696ec304a87d66337d82a480e/sootup.java.bytecode/src/main/java/sootup/java/bytecode/inputlocation/JavaClassPathAnalysisInputLocation.java#L76
- https://github.com/ServiceNow/SecurityResearch/tree/main/rhino-tracker/tool/src/main/java/sootup/callgraph/filter
- https://github.com/ServiceNow/SecurityResearch/tree/main/rhino-tracker/tool/src/main/java/com/snc/secres/tool/passive/Analysis.java#L171
- https://github.com/ServiceNow/SecurityResearch/tree/main/rhino-tracker/Readme.md