Tutorial On Precompilation

One of the main foci of development during Julia 1.6 has been to reduce latency, the delay between starting your session and getting useful work done. This is sometimes called “time to first plot,” although it applies to far more than just plotting. While a lot of work (and success) has gone into reducing latency in Julia 1.6, users and developers will naturally want to shrink it even more. This is the inaugural post in a short series devoted to the topic of what package developers can do to reduce latency for their users. This particular installment covers background material–some key underlying concepts and structures–that will hopefully be useful in later installments.

Sources of latency, and reducing it with `precompile`

Most of Julia’s latency is due to code loading and compilation. Julia’s dynamic nature also makes it vulnerable to invalidation and the subsequent need to recompile previously-compiled code; this topic has been covered in a previous blog post, and that material will not be rehashed here. In this series, it is assumed that invalidations are not a dominant source of latency. (You do not need to read the previous blog post to understand this one.)

In very rough terms, using SomePkg loads types and/or method definitions, after which calling SomePkg.f(args...) forces SomePkg.f to be compiled (if it hasn’t been already) for the specific types in args.... The primary focus of this series is to explore the opportunity to reduce the cost of compilation. We’ll focus on precompilation, the step announced by messages like

julia> using SomePkg
[ Info: Precompiling SomePkg [12345678-abcd-9876-efab-1234abcd5e6f]

or the related Precompiling project... output that occurs after updating packages on Julia 1.6. During precompilation, Julia writes module, type, and method definitions in an efficient serialized form. Precompilation in its most basic form happens nearly automatically, but with a bit of manual intervention developers also have an opportunity to save additional information: partial results of compilation, specifically the type inference stage of compilation. Because type inference takes time, this can reduce the latency for the first use of methods in the package.

To motivate this series, let’s start with a simple demonstration in which adding a single line to a package results in a five-fold decrease in latency. We’ll start with a package that we can define in a few lines (thanks to Julia’s metaprogramming capabilities) and that depends on very little external code, but which has been designed to have measurable latency. You can copy/paste the following into Julia’s REPL (be aware that it creates a package directory DemoPkg inside your current directory):

julia> using Pkg; Pkg.generate("DemoPkg")
  Generating  project DemoPkg:
    DemoPkg/Project.toml
    DemoPkg/src/DemoPkg.jl
Dict{String, Base.UUID} with 1 entry:
  "DemoPkg" => UUID("4d70085e-4304-44c2-b3c3-070197146bfa")

julia> typedefs = join(["struct DemoType$i <: AbstractDemoType x::Int end; DemoType$i(d::AbstractDemoType) = DemoType$i(d.x)" for i = 0:1000], '\n');

julia> codeblock = join(["    d = DemoType$i(d)" for i = 1:1000], '\n');

julia> open("DemoPkg/src/DemoPkg.jl", "w") do io
           write(io, """
           module DemoPkg

           abstract type AbstractDemoType end
           $typedefs

           function f(x)
               d = DemoType0(x)
               $codeblock
               return d
           end

           end
           """)
       end

After executing this, you can open the DemoPkg.jl file to see what f actually looks like. If we load the package, the first call DemoPkg.f(5) takes some time:

julia> push!(LOAD_PATH, "DemoPkg/");

julia> using DemoPkg

julia> tstart = time(); DemoPkg.f(5); tend=time(); tend-tstart
0.28725290298461914

but the second one (in the same session) is much faster:

julia> tstart = time(); DemoPkg.f(5); tend=time(); tend-tstart
0.0007619857788085938
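The same first-call versus second-call comparison can also be written with Base's `@elapsed` macro instead of the `tstart`/`tend` bookkeeping. The sketch below is only an illustration: `g` is a hypothetical stand-in function, not part of DemoPkg, and the timings you see will vary from machine to machine.

```julia
# g is a hypothetical stand-in function, not part of DemoPkg.
g(x) = sum(abs2, (x, x + 1, x + 2))

t_first  = @elapsed g(5)   # includes the time needed to compile g(::Int)
t_second = @elapsed g(5)   # reuses the code compiled by the first call

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

For a function this small the absolute numbers are tiny, but the first call will typically still be the slower of the two.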

The extra cost for the first invocation is the time spent compiling the method. We can save some of this time by precompiling it and saving the result to disk. All we need to do is add a single line to the module definition, either:

f(5), which executes f while the package is being precompiled (and remember, execution triggers compilation, the latter being our actual goal); or
precompile(f, (Int,)), if we don’t need the output of f(5) but only wish to trigger compilation of f for an Int argument.
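One detail worth knowing about the second option is that `precompile` returns a `Bool` indicating whether it found something to compile. The following is a minimal sketch outside of any package; `g` is a hypothetical function introduced only for this illustration.

```julia
# g is a hypothetical stand-in for a package function.
g(x::Int) = x + 1

# Compile g for an Int argument without executing it.
ok = precompile(g, (Int,))

# There is no method g(::String), so there is nothing to compile here.
no_method = precompile(g, (String,))

@show ok no_method
```

Checking the return value (for example with `@assert`) can be a cheap way to notice a directive that has silently stopped matching any method.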

Here we’ll choose precompile:

julia> open("DemoPkg/src/DemoPkg.jl", "w") do io
           write(io, """
           module DemoPkg

           abstract type AbstractDemoType end
           $typedefs

           function f(x)
               d = DemoType0(x)
               $codeblock
               return d
           end

           precompile(f, (Int,))            # THE CRUCIAL ADDITION!

           end
           """)
       end

Now start a fresh session, load the package (you’ll need that push!(LOAD_PATH, "DemoPkg/") again), and time it:

julia> tstart = time(); DemoPkg.f(5); tend=time(); tend-tstart
0.056242942810058594

julia> tstart = time(); DemoPkg.f(5); tend=time(); tend-tstart
0.0007371902465820312

It doesn’t eliminate all the latency, but at just one-fifth of the original this is a major improvement in responsiveness. The fraction of compilation time saved by precompile depends on the balance between type inference and other aspects of code generation, which in turn depends strongly on the nature of the code: “type-heavy” code, such as this example, often seems to be dominated by inference, whereas “type-light” code (e.g., code that does a lot of numeric computation with just a few types and operations) tends to be dominated by other aspects of code generation.
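To make the "one-fifth" figure concrete, here is the arithmetic on the two first-call timings quoted above. The variable names are ours; the numbers are copied from the REPL output in this post.

```julia
t_before = 0.28725290298461914    # first call without the precompile directive
t_after  = 0.056242942810058594   # first call with the precompile directive

speedup        = t_before / t_after       # how many times faster the first call became
fraction_saved = 1 - t_after / t_before   # share of the first-call latency removed

println("speedup: ", round(speedup, digits = 1), "x")
println("latency saved: ", round(100 * fraction_saved, digits = 1), "%")
```

So the remaining latency is a little under 20% of the original, consistent with the rough "five-fold" description.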

While currently precompile can only save the time spent on type-inference, in the long run it may be hoped that Julia will also save the results from later stages of compilation. If that happens, precompile will have even greater effect, and the savings will be less dependent on the balance between type-inference and other forms of code generation.

How does this magic work? During package precompilation, Julia creates a *.ji file typically stored in .julia/compiled/v1.x/, where 1.x is your version of Julia. Your *.ji file stores definitions of constants, types, and methods; this happens automatically while your package is being built. Optionally (if you’ve used a precompile directive, or executed methods while the package is being built), it may also include the results of type-inference.
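If you are curious which cache files exist on your own machine, a sketch like the following lists them. It assumes the default depot layout, in which the first entry of `DEPOT_PATH` is the `.julia` directory; on systems configured differently the directory may not exist, in which case the list is simply empty.

```julia
# Assumption: the first depot (usually ~/.julia) holds the compiled cache.
compiled_dir = joinpath(first(DEPOT_PATH), "compiled",
                        "v$(VERSION.major).$(VERSION.minor)")

ji_files = String[]
if isdir(compiled_dir)
    # Each package gets its own subdirectory containing one or more *.ji files.
    for (root, _, files) in walkdir(compiled_dir)
        for file in files
            endswith(file, ".ji") && push!(ji_files, joinpath(root, file))
        end
    end
end

println("Found ", length(ji_files), " cache file(s) under ", compiled_dir)
foreach(println, ji_files[1:min(end, 5)])   # print at most the first five
```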

Box 1 It might be natural to wonder, “how does precompile help? Doesn’t it just shift the cost of compilation to the time when I load the package?” The answer is “no,” because a *.ji file is not a recording of all the steps you take when you define the module: instead, it’s a snapshot of the results of those steps. If you define a package

module PackageThatPrints

println("This prints only during precompilation")

function __init__()
    println("This prints every time the package is loaded")
end

end

you’ll see that things that happen transiently do not “make it” into the precompile file: the first println displays only when you build the package, whereas the second one prints on subsequent using PackageThatPrints even when that doesn’t require rebuilding the package.

To “make it” into the precompile file, statements have to be linked to constants, types, methods, and other durable code constructs. The __init__ function is special in that it automatically gets called, if present, at the end of module-loading.

A precompile directive runs during precompilation, but the only things relevant for the *.ji file are the results (the compiled code) that it produces. Compiled objects (specifically the MethodInstances described below) may be written to the *.ji file, and when you load the package those objects get loaded as well. Loading the results of type inference does take some time, but typically it’s a fair bit quicker than computing inference results from scratch.

Now that we’ve introduced the promise of precompile, it’s time to acknowledge that this topic is complex. How do you know how much of your latency is due to type-inference? Moreover, even when type inference is the dominant source of latency, it turns out you can still find yourself in a circumstance where it is difficult to eliminate most of its cost. In previous Julia versions, this fact has led to more than a little frustration using precompile. One source of trouble was invalidation, which frequently “spoiled” precompilation on earlier Julia versions, but that has been greatly improved (mostly behind-the-scenes, i.e., without package developers needing to do anything) in Julia 1.6. With invalidations largely eliminated, the trickiest remaining aspect of precompilation is one of code ownership: where should the results of precompilation be stored? When a bit of code requires methods from one package or library and types from another, how do you (or how does Julia) decide where to store the compiled code?

In this blog post, we take a big step backwards and start peering under the hood. The goal is to understand why precompile sometimes has dramatic benefits, why sometimes it has nearly none at all, and when it fails how to rescue the situation. To do that, we’ll have to understand some of the “chain of dependencies” that link various bits of Julia code together.

Type-inference, MethodInstances, and backedges

We’ll introduce these concepts via a simple demo (readers are encouraged to try this and follow along). First, let’s open the Julia REPL and define the following methods:

double(x::Real) = 2x
calldouble(container) = double(container[1])
calldouble2(container) = calldouble(container)

calldouble2 calls calldouble which calls double on the first element in container. Let’s create a container object and run this code:

julia> c64 = [1.0]
1-element Vector{Float64}:
 1.0

julia> calldouble2(c64)    # running it compiles the methods for these types
2.0

Now, let’s take a brief trip into some internals to understand what Julia’s compiler did when preparing to run that statement. It will be easiest to use the MethodAnalysis package:

julia> using MethodAnalysis

julia> mi = methodinstance(double, (Float64,))
MethodInstance for double(::Float64)

methodinstance is a lot like which, except it asks about type-inferred code. We asked methodinstance to find an instance of double that had been inferred for a single Float64 argument; the fact that it returned a MethodInstance, rather than nothing, indicates that this instance already existed: the method had already been inferred for this argument type because we ran calldouble2(c64), which indirectly called double(::Float64). If you currently try methodinstance(double, (Int,)), you should get nothing, because we’ve never called double with an Int argument.
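The "nothing before, MethodInstance after" behavior can be checked directly. This sketch is meant for a fresh session and assumes the MethodAnalysis package is installed; the variable names `before` and `after` are ours.

```julia
using MethodAnalysis   # third-party package; must be installed in your environment

double(x::Real) = 2x   # same definition as in the demo

before = methodinstance(double, (Int,))   # double has not been called with an Int yet
double(1)                                  # running it compiles double(::Int)
after = methodinstance(double, (Int,))    # now an inferred instance should exist

@show before after
```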

One of the crucial features of type-inference is that it keeps track of dependencies:

julia> using AbstractTrees

julia> print_tree(mi)
MethodInstance for double(::Float64)
└─ MethodInstance for calldouble(::Vector{Float64})
   └─ MethodInstance for calldouble2(::Vector{Float64})

This indicates that the result for type-inference on calldouble2(::Vector{Float64}) depended on the result for calldouble(::Vector{Float64}), which in turn depended on double(::Float64). That should make sense: there is no way that Julia can know what type calldouble2 returns unless it understands what its callees do. This is our first example of a chain of dependencies that will be a crucial component of understanding how Julia decides where to stash the results of compilation. In encoding this dependency chain, the callee (e.g., double) stores a link to the caller (e.g., calldouble); as a consequence, these links are typically called backedges.

Box 2 Backedges don’t just apply to code you write yourself, and they can link code across modules. For example, to implement 2x, our double(::Float64) calls *(::Int, ::Float64):

julia> mi = methodinstance(*, (Int, Float64))
MethodInstance for *(::Int64, ::Float64)

We can see which Method this instance is from:

julia> mi.def
*(x::Number, y::Number) in Base at promotion.jl:322

This is defined in Julia’s own Base module. If we’ve run calldouble2(c64), our own double is listed as one of its backedges:

julia> direct_backedges(mi)
5-element Vector{Core.MethodInstance}:
 MethodInstance for parse_inf(::Base.TOML.Parser, ::Int64)
 MethodInstance for init(::Int64, ::Float64)
 MethodInstance for show_progress(::IOContext{IOBuffer}, ::Pkg.MiniProgressBars.MiniProgressBar)
 MethodInstance for show_progress(::IO, ::Pkg.MiniProgressBars.MiniProgressBar)
 MethodInstance for double(::Float64)

direct_backedges, as its name implies, returns a list of the compiled direct callers. (all_backedges returns both direct and indirect callers.) The specific list you get here may depend on what other packages you’ve loaded, and

julia> print_tree(mi)
MethodInstance for *(::Int64, ::Float64)
├─ MethodInstance for parse_inf(::Parser, ::Int64)
│  └─ MethodInstance for parse_number_or_date_start(::Parser)
│     └─ MethodInstance for parse_value(::Parser)
│        ├─ MethodInstance for parse_entry(::Parser, ::Dict{String, Any})
│        │  ├─ MethodInstance for parse_inline_table(::Parser)
│        │  │  ⋮
│        │  │
│        │  └─ MethodInstance for parse_toplevel(::Parser)
│        │     ⋮
│        │
│        └─ MethodInstance for parse_array(::Parser)
│           └─ MethodInstance for parse_value(::Parser)
│              ⋮
│
├─ MethodInstance for init(::Int64, ::Float64)
│  └─ MethodInstance for __init__()
├─ MethodInstance for show_progress(::IOContext{IOBuffer}, ::MiniProgressBar)
│  └─ MethodInstance for (::var"#59#63"{Int64, Bool, MiniProgressBar, Bool, PackageSpec})(::IOContext{IOBuffer})
├─ MethodInstance for show_progress(::IO, ::MiniProgressBar)
└─ MethodInstance for double(::Float64)
   └─ MethodInstance for calldouble(::Vector{Float64})
      └─ MethodInstance for calldouble2(::Vector{Float64})

might be dramatically more complex if you’ve loaded and used large packages that do a lot of computation.

Box 3 Generally, the set of backedges is a graph, not a tree: in real code, it’s possible for f to call itself (e.g., fibonacci(n) = fibonacci(n-1) + fibonacci(n-2)), or for f to call g which calls f. When following backedges, MethodAnalysis omits MethodInstances that appeared previously, thus performing a depth-first “search” of the graph. The results of this search pattern can be visualized as a tree.

Type inference behaves similarly: it caches its results, and thus infers each MethodInstance only once. (One wrinkle is constant propagation, which can cause the same MethodInstance to be re-inferred for different constant values.) As a consequence, inference also performs a depth-first search of the call graph.
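To see how skipping previously-visited nodes turns a graph with cycles into a tree, here is a small self-contained sketch. It does not use MethodAnalysis or real MethodInstances: the graph is a hypothetical `Dict` mapping each callee name to its direct callers, and `walk` is a helper written just for this illustration.

```julia
# Hypothetical backedge graph: each key is a callee, each value lists its direct callers.
backedges = Dict(
    "double"      => ["calldouble"],
    "calldouble"  => ["calldouble2", "recursive"],
    "calldouble2" => String[],
    "recursive"   => ["recursive", "calldouble"],   # self-call plus a cycle with calldouble
)

# Depth-first walk that skips nodes already seen, so cycles cannot loop forever.
function walk(graph, node, depth = 0, seen = Set{String}(), out = Tuple{Int,String}[])
    node in seen && return out
    push!(seen, node)
    push!(out, (depth, node))
    for caller in get(graph, node, String[])
        walk(graph, caller, depth + 1, seen, out)
    end
    return out
end

tree = walk(backedges, "double")
for (depth, name) in tree
    println("   "^depth, name)   # indent each node by its depth in the tree
end
```

Each node is printed once even though `recursive` and `calldouble` refer to each other, which is why the result can be drawn as a tree.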

The creation of backedges is more subtle than it may seem at first glance. To start getting a hint of some of the complexities, first note that currently these are the only inferred instances of these methods:

julia> methodinstances(double)
1-element Vector{Core.MethodInstance}:
 MethodInstance for double(::Float64)

julia> methodinstances(calldouble)
1-element Vector{Core.MethodInstance}:
 MethodInstance for calldouble(::Vector{Float64})

julia> methodinstances(calldouble2)
1-element Vector{Core.MethodInstance}:
 MethodInstance for calldouble2(::Vector{Float64})

While methodinstance(f, typs) returns a specific MethodInstance, methodinstances(f) returns all inferred instances of f.

Let’s see if we can get Julia to add some additional instances: let’s create a new container, but in a twist this time we’ll use one with abstract element type, so that Julia’s type-inference cannot accurately predict the type of elements in the container. The element type of our container will be AbstractFloat, an abstract type with several subtypes; every actual instance has to have a concrete type, and just to make sure it’s a new type (triggering new compilation) we’ll use Float32:

julia> cabs = AbstractFloat[1.0f0]   # store a `Float32` inside a `Vector{AbstractFloat}`
1-element Vector{AbstractFloat}:
 1.0f0

julia> calldouble2(cabs)             # compile for these new types
2.0f0

Now let’s look at the available instances:

julia> mis = methodinstances(double)
3-element Vector{Core.MethodInstance}:
 MethodInstance for double(::Float64)
 MethodInstance for double(::AbstractFloat)
 MethodInstance for double(::Float32)

We see that there are not two but three type-inferred instances of double: one for Float64, one for Float32, and one for AbstractFloat. Let’s check the backedges of each:

julia> print_tree(mis[1])
MethodInstance for double(::Float64)
└─ MethodInstance for calldouble(::Vector{Float64})
   └─ MethodInstance for calldouble2(::Vector{Float64})

julia> print_tree(mis[2])
MethodInstance for double(::AbstractFloat)

julia> print_tree(mis[3])
MethodInstance for double(::Float32)

Why does the first have backedges to calldouble and then to calldouble2, but the other two do not? Moreover, why does every instance of calldouble have backedges to calldouble2

julia> mis = methodinstances(calldouble)
2-element Vector{Core.MethodInstance}:
 MethodInstance for calldouble(::Vector{Float64})
 MethodInstance for calldouble(::Vector{AbstractFloat})

julia> print_tree(mis[1])
MethodInstance for calldouble(::Vector{Float64})
└─ MethodInstance for calldouble2(::Vector{Float64})

julia> print_tree(mis[2])
MethodInstance for calldouble(::Vector{AbstractFloat})
└─ MethodInstance for calldouble2(::Vector{AbstractFloat})

in seeming contradiction of the fact that some instances of double lack backedges to calldouble? The results here reflect the success or failure of concrete type-inference. In contrast with Float64 and Float32, AbstractFloat is not a concrete type:

julia> isconcretetype(Float32)
true

julia> isconcretetype(AbstractFloat)
false

It may surprise some readers that Vector{AbstractFloat} is concrete:

julia> isconcretetype(Vector{Float32})
true

julia> isconcretetype(Vector{AbstractFloat})
true

The container is concrete–it has a fully-specified storage scheme and layout in memory–even if the elements are not.
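The distinction between the container's declared element type and the type of the value actually stored can be inspected directly. This is a small sketch using only Base functions, with the same `cabs` container as above.

```julia
cabs = AbstractFloat[1.0f0]

@show eltype(cabs)                    # the declared (abstract) element type
@show typeof(cabs[1])                 # the concrete type of the stored value
@show isconcretetype(typeof(cabs))    # the container type itself
@show isconcretetype(eltype(cabs))    # its element type
```

Inference only gets to see `eltype(cabs)`; the concrete type of `cabs[1]` is known only once the code runs.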

Exercise 1 Is AbstractVector{AbstractFloat} abstract or concrete? How about AbstractVector{Float32}? Check your answers using isconcretetype.

To look more deeply into the implications of concreteness and inference, a useful tool is @code_warntype. You can see the difference between c64 and cabs, especially if you run this in the REPL yourself where you can see the red highlighting:

julia> @code_warntype calldouble2(c64)
Variables
  #self#::Core.Const(calldouble2)
  container::Vector{Float64}

Body::Float64
1 ─ %1 = Main.calldouble(container)::Float64
└──      return %1

julia> @code_warntype calldouble2(cabs)
Variables
  #self#::Core.Const(calldouble2)
  container::Vector{AbstractFloat}

Body::Any
1 ─ %1 = Main.calldouble(container)::Any
└──      return %1

Note that only the return type (::Float64 vs ::Any) differs between these; this is what accounts for the fact that calldouble has backedges to calldouble2 in both cases, because in both cases the specific caller/callee chain can be successfully inferred. The really big differences emerge one level lower:

julia> @code_warntype calldouble(c64)
Variables
  #self#::Core.Const(calldouble)
  container::Vector{Float64}

Body::Float64
1 ─ %1 = Base.getindex(container, 1)::Float64
│   %2 = Main.double(%1)::Float64
└──      return %2

julia> @code_warntype calldouble(cabs)
Variables
  #self#::Core.Const(calldouble)
  container::Vector{AbstractFloat}

Body::Any
1 ─ %1 = Base.getindex(container, 1)::AbstractFloat
│   %2 = Main.double(%1)::Any
└──      return %2

In the first case, getindex was guaranteed to return a Float64, but in the second case it’s only known to be an AbstractFloat. Moreover, type-inference cannot predict a concrete type for the return of double(::AbstractFloat), though it can for double(::Float64). Consequently the call with ::AbstractFloat is made via runtime dispatch, where execution pauses, Julia asks for the concrete type of the object, and then it makes the appropriate call to double (in the case of cabs[1], to double(::Float32)).
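If you only want the inferred return type rather than the full `@code_warntype` listing, `Base.return_types` offers a programmatic view of the same information. The sketch below redefines the two demo methods so that it runs on its own; `rt64` and `rtabs` are names chosen for this illustration.

```julia
double(x::Real) = 2x
calldouble(container) = double(container[1])

# Base.return_types returns one inferred return type per matching method;
# calldouble has a single method, so `only` extracts that one entry.
rt64  = only(Base.return_types(calldouble, (Vector{Float64},)))
rtabs = only(Base.return_types(calldouble, (Vector{AbstractFloat},)))

@show rt64 isconcretetype(rt64)
@show rtabs isconcretetype(rtabs)
```

A non-concrete inferred return type, as in the second case, is a quick hint that a call somewhere in the chain is being made by runtime dispatch.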

For completeness, what happens if we add another container with concrete eltype?

julia> c32 = [1.0f0]
1-element Vector{Float32}:
 1.0

julia> calldouble2(c32)
2.0f0

julia> mis = methodinstances(double)
3-element Vector{Core.MethodInstance}:
 MethodInstance for double(::Float64)
 MethodInstance for double(::AbstractFloat)
 MethodInstance for double(::Float32)

julia> print_tree(mis[1])
MethodInstance for double(::Float64)
└─ MethodInstance for calldouble(::Vector{Float64})
   └─ MethodInstance for calldouble2(::Vector{Float64})

julia> print_tree(mis[2])
MethodInstance for double(::AbstractFloat)

julia> print_tree(mis[3])
MethodInstance for double(::Float32)
└─ MethodInstance for calldouble(::Vector{Float32})
   └─ MethodInstance for calldouble2(::Vector{Float32})

So now both concretely-inferred versions of double link all the way back to calldouble2, but only when the element type of the container is also concrete. A single MethodInstance may be called by multiple MethodInstances, but most commonly a backedge is created only when the call can be inferred.

Exercise 2 Does Julia ever compile methods, and introduce backedges, for abstract types? Start a fresh session and, instead of using the definitions above, define double using @nospecialize:

double(@nospecialize(x::Real)) = 2x

Now compare what kind of backedges you get with c64 and cabs. It may be most informative to quit your session and start fresh between trying these two different container types. You’ll see that Julia is quite the opportunist when it comes to specialization!

Precompilation and backedges

Let’s turn the example above into a package:

julia> using Pkg; Pkg.generate("BackedgeDemo")
  Generating  project BackedgeDemo:
    BackedgeDemo/Project.toml
    BackedgeDemo/src/BackedgeDemo.jl
Dict{String, Base.UUID} with 1 entry:
  "BackedgeDemo" => UUID("35dad884-25a6-48ad-b13b-11b63ee56c40")

julia> open("BackedgeDemo/src/BackedgeDemo.jl", "w") do io
           write(io, """
           module BackedgeDemo

           double(x::Real) = 2x
           calldouble(container) = double(container[1])
           calldouble2(container) = calldouble(container)

           precompile(calldouble2, (Vector{Float32},))
           precompile(calldouble2, (Vector{Float64},))
           precompile(calldouble2, (Vector{AbstractFloat},))

           end
           """)
       end
282

You can see we created a package and defined those three methods. Crucially, we’ve also added three precompile directives, all for the top-level calldouble2. We did not add any explicit precompile directives for its callees calldouble, double, or anything needed by double (like * to implement 2*x).

Now let’s load this package and see if we have any MethodInstances:

julia> push!(LOAD_PATH, "BackedgeDemo/")
4-element Vector{String}:
 "@"
 "@v#.#"
 "@stdlib"
 "BackedgeDemo/"

julia> using BackedgeDemo
[ Info: Precompiling BackedgeDemo [44c70eed-03a3-46c0-8383-afc033fb6a27]

julia> using MethodAnalysis

julia> methodinstances(BackedgeDemo.double)
3-element Vector{Core.MethodInstance}:
 MethodInstance for double(::Float32)
 MethodInstance for double(::Float64)
 MethodInstance for double(::AbstractFloat)

Hooray! Even though we’ve not used this code in this session, the type-inferred MethodInstances are already there! (This is true only because of those precompile directives.) You can also verify that the same backedges get created as when we ran this code interactively above. We have successfully saved the results of type inference.

These MethodInstances got cached in BackedgeDemo.ji. It’s worth noting that even though the precompile directive got issued from this package, MethodInstances for methods defined in other packages or libraries can be saved as well. For example, Julia does not come pre-built with the inferred code for Int * Float32: in a fresh session,

julia> using MethodAnalysis

julia> mi = methodinstance(*, (Int, Float32))

returns nothing (the MethodInstance doesn’t exist), whereas if we’ve loaded BackedgeDemo then

julia> mi = methodinstance(*, (Int, Float32))
MethodInstance for *(::Int64, ::Float32)

julia> mi.def        # what Method is this MethodInstance from?
*(x::Number, y::Number) in Base at promotion.jl:322

So even though the method is defined in Base, because BackedgeDemo needed this type-inferred code it got stashed in BackedgeDemo.ji.

This is fantastic, because it means the complete results of type-inference can be saved, even when they cross boundaries between packages and libraries. Nevertheless, there are significant limitations to this ability to stash MethodInstances from other modules. Most crucially, *.ji files can only hold code they “own,” either:

for a method defined in the package
through a chain of backedges to a method defined by the package

Exercise 3 To see this limitation in action, delete the precompile(calldouble2, (Vector{Float32},)) directive from BackedgeDemo.jl, so that it has only

precompile(calldouble2, (Vector{Float64},))
precompile(calldouble2, (Vector{AbstractFloat},))

but then add

precompile(*, (Int, Float32))

in an attempt to force inference of that method anyway.

Start a fresh session and load the package (it should precompile again), and check whether methodinstance(*, (Int, Float32)) returns a MethodInstance or nothing. Also run print_tree on the results of each item in methodinstances(BackedgeDemo.double).

Where there is no “chain of ownership” to BackedgeDemo, Julia doesn’t know where to stash the MethodInstances that get created by precompile. Those MethodInstances do get created, but they are not incorporated into the *.ji file because there are no module-owned MethodInstances that they link back to. Consequently, we can’t precompile methods defined in other modules in and of themselves; we can only do it if those methods are linked by backedges to this package.

In practice, this means that even when packages add precompile directives, if there are a lot of type-inference failures the results can be very incomplete and the consequential savings may be small.

Quiz Add a new type to BackedgeDemo:

export SCDType
struct SCDType end

and a precompile directive for Base.push!:

precompile(push!, (Vector{SCDType}, SCDType))

Now load the package and check whether the corresponding MethodInstance exists. If not, can you think of a way to get that MethodInstance added to the *.ji file?

Answer is at the bottom of this post.

Box 4 precompile can also be passed a complete Tuple-type: precompile(calldouble2, (Vector{AbstractFloat},)) can alternatively be written

precompile(Tuple{typeof(calldouble2), Vector{AbstractFloat}})

This form appears frequently if precompile directives are issued by code that inspects MethodInstances, because this signature is stored in the specTypes field of a MethodInstance:

julia> mi = methodinstance(BackedgeDemo.double, (AbstractFloat,))
MethodInstance for double(::AbstractFloat)

julia> mi.specTypes
Tuple{typeof(BackedgeDemo.double), AbstractFloat}
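
As a small sketch of the equivalence described in Box 4, the snippet below issues the same directive in both forms and builds the Tuple-type by hand. It assumes stand-alone copies of the tutorial's functions defined in Main (not inside BackedgeDemo); the variable names argtypes, sig, ok_split, and ok_sig are purely illustrative.

```julia
# Stand-alone copies of the tutorial's functions (an assumption: defined in Main,
# not loaded from the BackedgeDemo package).
double(x::Real) = 2x
calldouble(container) = double(container[1])
calldouble2(container) = calldouble(container)

argtypes = (Vector{AbstractFloat},)

# Form 1: the function plus a tuple of argument types.
ok_split = precompile(calldouble2, argtypes)

# Form 2: a single complete Tuple-type, i.e. typeof(f) followed by the argument types.
sig = Tuple{typeof(calldouble2), argtypes...}
ok_sig = precompile(sig)

println(sig)
println((ok_split, ok_sig))
```

Both calls return a Bool, so either form can be checked the same way.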

Box 5 One other topic we’ve not yet discussed is that when precompile fails, it does so “almost” silently:

julia> methods(double)
# 1 method for generic function "double":
[1] double(x::Real) in BackedgeDemo at /tmp/BackedgeDemo/src/BackedgeDemo.jl:3

julia> precompile(double, (String,))
false

Even though double can’t be compiled for String, the corresponding precompile doesn’t error, it only returns false. If you want to monitor the utility of your precompile directives, sometimes it’s useful to preface them with @assert; all’s well if precompilation succeeds, but if changes to the package mean that the precompile directive has “gone bad,” then you get an error. Hopefully, such errors would be caught before shipping the package to users!
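
Here is a minimal sketch of the @assert pattern, again using a stand-alone copy of double rather than the packaged one. The helper checked_precompile is hypothetical (it is not part of Base); it is included only to show a gentler alternative that warns instead of throwing.

```julia
# Stand-alone copy of the tutorial's method (an assumption: defined in Main).
double(x::Real) = 2x

# Valid directive: precompile returns true, so the @assert passes silently.
@assert precompile(double, (Float64,))

# Stale directive: no method of double accepts a String, so precompile returns false.
# Prefixing this one with @assert would throw an AssertionError.
stale = precompile(double, (String,))
println("precompile(double, (String,)) returned ", stale)

# Hypothetical helper (not in Base): warn rather than throw when a directive fails.
function checked_precompile(f, argtypes)
    ok = precompile(f, argtypes)
    ok || @warn "precompile directive failed" f argtypes
    return ok
end

checked_precompile(double, (Float32,))   # succeeds quietly
checked_precompile(double, (String,))    # logs a warning and returns false
```

Whether to throw or merely warn is a design choice: an error is hard to miss during package testing, whereas a warning keeps the package loadable if a directive goes stale.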

Summary

In this tutorial, we’ve learned about MethodInstances, backedges, inference, and precompilation. Some important take-home messages are:

you can store the results of type-inference with explicit precompile directives
to be useful, precompile has to be able to establish a chain of ownership to some package
chains-of-ownership are bigger and more complete when type-inference succeeds

An important conclusion is that precompilation works better when type inference succeeds. For some packages, time invested in improving inferrability can make your precompile directives work better.
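
As a rough illustration of what "inference succeeds" means here, the sketch below asks Julia for the inferred return type of calldouble for a concrete and an abstract element type. It relies on Base.return_types, an unexported reflection function, and the helper inferred_concretely is a hypothetical name introduced only for this example. It is a quick check under those assumptions, not a substitute for proper inference-analysis tooling.

```julia
# Stand-alone copies of the tutorial's functions (an assumption: defined in Main).
double(x::Real) = 2x
calldouble(container) = double(container[1])

# Hypothetical helper: true if every matching method is inferred to return a concrete type.
function inferred_concretely(f, argtypes)
    rts = Base.return_types(f, argtypes)   # unexported reflection function in Base
    return !isempty(rts) && all(isconcretetype, rts)
end

# Concrete element type: the call to double can be inferred.
println(inferred_concretely(calldouble, (Vector{Float64},)))

# Abstract element type: container[1] is only known to be an AbstractFloat,
# so the return type cannot be inferred concretely.
println(inferred_concretely(calldouble, (Vector{AbstractFloat},)))
```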

Looking ahead

Future installments will focus on describing some powerful new tools:

tools to measure how inference is spending its time
tools to help make decisions about (de)specialization
tools to detect and fix inference failures
tools to generate effective precompile directives

Stay tuned!

Answer to quiz Directly precompiling push!(::Vector{SCDType}, ::SCDType) fails, because while your package “owns” SCDType, it does not own the method of push!.

However, if you add a method that calls push! and then precompile it,

dopush() = push!(SCDType[], SCDType())
precompile(dopush, ())

then the MethodInstance for push!(::Vector{SCDType}, ::SCDType) will be added to the package through the backedge to dopush (which you do own).

This was an artificial example, but in more typical cases this happens organically through the functionality of your package. But again, this works only for inferrable calls.

By Tim Holy
Source Julia Programming Language