Knol will be unavailable during scheduled maintenance starting at Mon, 09 Nov 2009 18:30:00 GMT. We expect the maintenance to be completed at Mon, 09 Nov 2009 20:00:00 GMT.
Version: Baidi441

Program Call Graphs


Introduction


A call graph is defined as a set of directed edges connecting call sites (statements invoking method calls) to corresponding target methods [1]. It is a very powerful tool for program analysis and can be used to: help plan testing strategies, reduce program size (by eliminating sub-routines that are not invoked) and help programmers understand and debug large programs. Often the method invoked due to a specific call is determined at run time based on the context in which the call is made, hence in a call graph a single call site could have multiple target methods. This is especially evident in object oriented languages where inheritance and polymorphism make method calls highly dependent on the execution context. To get the set of target methods associated with a call site we can either observe one or more executions of the program and note all methods invoked from a call site (dynamic call graph generation) or statically determine the possible methods (static call graph construction). Dynamic call graphs tend to under-estimate the number of target methods for a call site where as static call graphs tend to over-estimate this this number. A theoretically ideal call graph is the union of the dynamic call graphs over all possible executions of the program. Dynamic call graphs are not safe and generating static call graphs is computationally expensive.



Call Graph Generation Techniques



Reachability Analysis (RA):

The most basic call graph construction scheme is called Reachability Analysis (RA) and has many variations
in literature such as [2]. RA only takes into account the name (or sometimes the signature) of a method, hence a method named ‘m’ is marked as reachable if a reachable method contains a virtual call ‘e.m () ’ where ‘e’ can be any object. This procedure is recursively applied on all methods starting with the program entry point to generate the complete call graph. The RA call graph is very conservative as it ignores class hierarchy and hence links a call site to all methods throughout the program which share a name. However it is computationally inexpensive as the only state required is a set of all methods in the program. This list can be pre-computed and stored in a hash table, mapping names to actual methods. At each call site RA needs to lookup the map and add all methods matching the call to the set of target methods for the call.


Class Hierarchy Analysis (CHA):

Note that RA generates an imprecise set of target methods because it lacks information about the class hierarchy of the object on which the method is called. CHA [3] addresses this problem, for every call ‘e.m()’ we filter the set of target methods to only those defined in static type1 of object ‘e’ or one of its subclasses. For example, Figure 1 shows a sample hierarchy. If method 2 is called on an object of static type class B then B.2() and D.2() are both target methods. If neither the class or any of its subclasses contain a method ‘m’, then the target method is the one defined in the nearest ancestor of ‘e’ that defines ‘m’. For example, if method 1 is called on an object of static type class C then only A.1() is a target method. CHA is more precise then RA but also requires more state; a representation of the class hierarchy and all methods belonging to each class. At each call site the algorithm needs to lookup the class hierarchy and use inheritance rules to determine the target
methods.


Rapid Type Analysis (RTA):

CHA is considerably more precise then RA but it still overestimates the number of target methods as it includes methods from all subclasses. Not all of the subclasses have instantiated objects in the context of the call site, The uninstantiated subclasses cannot contain target methods. RTA [4] determines the set of classes instantiated in the context of the call site and uses this information to filter the number of possible target methods further. For example in Figure 1 if method 2 is called on class C but and there is no live object of class F in the method where the call originates then F.2() is not included in the call graph.RTA is more expensive in terms of computation and memory overhead as it needs to maintain a list of all instantiated object types for each call site in addition to the state needed by CHA . However the added complexity can lead to a substantial increase in precession especially in programs with extensive use of polymorphism.

[1] Ondˇrej Lhot´ak. Comparing call graphs. In PASTE ’07: Proceedings of the 7th ACM SIGPLAN-SIGSOFT
Workshop on Program Analysis for Software Tools and Engineering, pages 37–42, New York, NY, USA, 2007. ACM Press.

[2] Amitabh Srivastava. Unreachable procedures in object-oriented programming. ACM Lett. Program. Lang. Syst., 1(4):355–364, 1992.

[3] Jeffrey Dean, David Grove, and Craig Chambers. Optimization of object-oriented programs using static class hierarchy analysis. Lecture Notes in Computer Science, 952:77–101, 1995.

[4] David F. Bacon and Peter F. Sweeney. Fast static analysis of C++ virtual function calls. pages 324–341.

Comments

Article rating:
Your rating:

Categories

Based on community consensus.

Activity for this knol

This week:

15pageviews

Totals:

1071pageviews