LibClang Tutorial
Clang is an open-source compiler built on the LLVM framework and targeting C, C++, and Objective-C (LLVM is also the JIT backend for Julia). Due to a highly modular design, Clang has in recent years become the core of a growing number of projects utilizing pieces of the compiler, such as tools for source-to-source translation, static analysis and security evaluation, and editor tools for code completion, formatting, etc.
While LLVM and Clang are written in C++, the Clang project maintains a C-exported interface called "libclang" which provides access to the abstract syntax tree and type representations. Thanks to the ubiquity of support for C calling conventions, a number of languages have utilized libclang as a basis for tooling related to C and C++.
The Clang.jl Julia package wraps libclang, provides a small convenience API for Julia-style programming, and provides a C-to-Julia wrapper generator built on libclang functionality.
Here is the header file example.h
used in the following examples:
// example.h
struct ExStruct {
int kind;
char* name;
float* data;
};
void* ExFunction (int kind, char* name, float* data) {
struct ExStruct st;
st.kind = kind;
st.name = name;
st.data = data;
}
Printing Struct Fields
To motivate the discussion with a succinct example, consider this struct:
struct ExStruct {
int kind;
char* name;
float* data;
};
Parsing and querying the fields of this struct requires just a few lines of code:
julia> using Clang
julia> trans_unit = Clang.parse_header(Index(), "example.h")
TranslationUnit(Ptr{Nothing} @0x00007fe13cdc8a00, Index(Ptr{Nothing} @0x00007fe13cc8dde0, 0, 1))
julia> root_cursor = Clang.getTranslationUnitCursor(trans_unit)
CLCursor (CLTranslationUnit) example.h
julia> struct_cursor = search(root_cursor, "ExStruct") |> only
CLCursor (CLStructDecl) ExStruct
julia> for c in children(struct_cursor) # print children
println("Cursor: ", c, "\n Kind: ", kind(c), "\n Name: ", name(c), "\n Type: ", Clang.getCursorType(c))
end
Cursor: CLCursor (CLFieldDecl) kind
Kind: CXCursor_FieldDecl(6)
Name: kind
Type: CLType (CLInt)
Cursor: CLCursor (CLFieldDecl) name
Kind: CXCursor_FieldDecl(6)
Name: name
Type: CLType (CLPointer)
Cursor: CLCursor (CLFieldDecl) data
Kind: CXCursor_FieldDecl(6)
Name: data
Type: CLType (CLPointer)
AST Representation
Let's examine the example above, starting with the variable trans_unit
:
julia> trans_unit
TranslationUnit(Ptr{Nothing} @0x00007fa9ac6a9f90, Index(Ptr{Nothing} @0x00007fa9ac6b4080, 0, 1))
A TranslationUnit
is the entry point to the libclang AST. In the example above, trans_unit
is a TranslationUnit
for the parsed file example.h
. The libclang AST is represented as a directed acyclic graph of cursor nodes carrying three pieces of essential information:
- Kind: purpose of cursor node
- Type: type of the object represented by cursor
- Children: list of child nodes
julia> root_cursor
CLCursor (CLTranslationUnit) example.h
root_cursor
is the root cursor node of the TranslationUnit
.
In Clang.jl the cursor type is encapsulated by a Julia type deriving from the abstract type CLCursor. Under the hood, libclang represents each cursor (CXCursor) kind and type (CXType) as an enum value. These enum values are used to automatically map all CXCursor and CXType objects to Julia types. Thus, it is possible to write multiple-dispatch methods against CLCursor or CLType variables.
julia> dump(root_cursor)
CLTranslationUnit
cursor: Clang.LibClang.CXCursor
kind: Clang.LibClang.CXCursorKind CXCursor_TranslationUnit(300)
xdata: Int32 0
data: Tuple{Ptr{Nothing},Ptr{Nothing},Ptr{Nothing}}
1: Ptr{Nothing} @0x00007fe13b3552e8
2: Ptr{Nothing} @0x0000000000000001
3: Ptr{Nothing} @0x00007fe13cdc8a00
Under the hood, libclang represents each cursor kind and type as an enum value. These enums are translated into Julia as a subtype of Cenum
:
julia> dump(Clang.LibClang.CXCursorKind)
Clang.LibClang.CXCursorKind <: Clang.LibClang.CEnum.Cenum{UInt32}
The example demonstrates two different ways of accessing child nodes of a given cursor. Here, the children function returns an iterator over the child nodes of the given cursor:
julia> children(struct_cursor)
3-element Array{CLCursor,1}:
CLCursor (CLFieldDecl) kind
CLCursor (CLFieldDecl) name
CLCursor (CLFieldDecl) data
And here, the search function returns a list of child node(s) matching the given name:
julia> search(root_cursor, "ExStruct")
1-element Array{CLCursor,1}:
CLCursor (CLStructDecl) ExStruct
Type representation
The above example also demonstrates querying of the type associated with a given cursor using the helper function type
. In the output:
Cursor: CLCursor (CLFieldDecl) kind
Kind: CXCursor_FieldDecl(6)
Name: kind
Type: CLType (CLInt)
Cursor: CLCursor (CLFieldDecl) name
Kind: CXCursor_FieldDecl(6)
Name: name
Type: CLType (CLPointer)
Cursor: CLCursor (CLFieldDecl) data
Kind: CXCursor_FieldDecl(6)
Name: data
Type: CLType (CLPointer)
Each CLFieldDecl
cursor has an associated CLType
object, with an identity reflecting the field type for the given struct member. It is critical to note the difference between the representation for the kind field and the name and data fields. kind is represented directly as an CLInt
object, but name and data are represented as CLPointer
CLTypes. As explored in the next section, the full type of the CLPointer
can be queried to retrieve the full char *
and float *
types of these members. User-defined types are captured using a similar scheme.
Function Arguments and Types
To further explore type representations, consider the following function (included in example.h):
void* ExFunction (int kind, char* name, float* data) {
struct ExStruct st;
st.kind = kind;
st.name = name;
st.data = data;
}
To find the cursor for this function declaration, we use function search
to retrieve nodes of kind CXCursor_FunctionDecl
, and select the final one in the list:
julia> using Clang.LibClang # CXCursor_FunctionDecl is exposed from LibClang
julia> fdecl = search(root_cursor, CXCursor_FunctionDecl) |> only
CLCursor (CLFunctionDecl) ExFunction(int, char *, float *)
julia> fdecl_children = [c for c in children(fdecl)]
4-element Array{CLCursor,1}:
CLCursor (CLParmDecl) kind
CLCursor (CLParmDecl) name
CLCursor (CLParmDecl) data
CLCursor (CLCompoundStmt)
The first three children are CLParmDecl
cursors with the same name as the arguments in the function signature. Checking the types of the CLParmDecl
cursors indicates a similarity to the function signature:
julia> [Clang.getCursorType(t) for t in fdecl_children[1:3]]
3-element Array{CLType,1}:
CLType (CLInt)
CLType (CLPointer)
CLType (CLPointer)
And, finally, retrieving the target type of each CLPointer
argument confirms that these cursors represent the function argument type declaration:
julia> [Clang.getPointeeType(Clang.getCursorType(t)) for t in fdecl_children[2:3]]
2-element Array{CLType,1}:
CLType (CLChar_S)
CLType (CLFloat)
Printing Indented Cursor Hierarchy
As a closing example, here is a simple, indented AST printer using CLType
- and CLCursor
-related functions, and utilizing various aspects of Julia's type system.
printind(ind::Int, st...) = println(join([repeat(" ", 2*ind), st...]))
printobj(cursor::CLCursor) = printobj(0, cursor)
printobj(t::CLType) = join(typeof(t), " ", spelling(t))
printobj(t::CLInt) = t
printobj(t::CLPointer) = Clang.getPointeeType(t)
printobj(ind::Int, t::CLType) = printind(ind, printobj(t))
function printobj(ind::Int, cursor::Union{CLFieldDecl, CLParmDecl})
printind(ind+1, typeof(cursor), " ", printobj(Clang.getCursorType(cursor)), " ", name(cursor))
end
function printobj(ind::Int, node::Union{CLCursor, CLStructDecl, CLCompoundStmt,
CLFunctionDecl, CLBinaryOperator})
printind(ind, " ", typeof(node), " ", name(node))
for c in children(node)
printobj(ind + 1, c)
end
end
julia> printobj(root_cursor)
CLTranslationUnit example.h
CLStructDecl ExStruct
CLFieldDecl CLType (CLInt) kind
CLFieldDecl CLType (CLChar_S) name
CLFieldDecl CLType (CLFloat) data
CLFunctionDecl ExFunction(int, char *, float *)
CLParmDecl CLType (CLInt) kind
CLParmDecl CLType (CLChar_S) name
CLParmDecl CLType (CLFloat) data
CLCompoundStmt
CLDeclStmt
CLVarDecl st
CLTypeRef struct ExStruct
CLBinaryOperator
CLMemberRefExpr kind
CLDeclRefExpr st
CLUnexposedExpr kind
CLDeclRefExpr kind
CLBinaryOperator
CLMemberRefExpr name
CLDeclRefExpr st
CLUnexposedExpr name
CLDeclRefExpr name
CLBinaryOperator
CLMemberRefExpr data
CLDeclRefExpr st
CLUnexposedExpr data
CLDeclRefExpr data
Note that a generic printobj
function has been defined for the abstract CLType
and CLCursor
types, and multiple dispatch is used to define the printers for various specific types needing custom behavior. In particular, the following function handles all cursor types for which recursive printing of child nodes is required:
function printobj(ind::Int, node::Union{CLCursor, CLStructDecl, CLCompoundStmt, CLFunctionDecl})
Now, printobj
has been moved into Clang.jl with a new name: dumpobj
.
Parsing Summary
As discussed above, there are several key aspects of the Clang.jl/libclang API:
- tree of Cursor nodes representing the AST, notes have unique children.
- each Cursor node has a Julia type identifying the syntactic construct represented by the node.
- each node also has an associated CLType referencing either intrinsic or user-defined datatypes.
There are a number of details omitted from this post, especially concerning the full variety of CLCursor
and CLType
representations available via libclang. For further information, please see the libclang documentation.
Acknowledgement
Eli Bendersky's post Parsing C++ in Python with Clang has been an extremely helpful reference.