Created
July 12, 2023 19:27
-
-
Save mdempsky/63c93ba1b4a1eb0b3fc301d036386a2d to your computer and use it in GitHub Desktop.
Inlining thoughts
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
* Go notes | |
To fit into Go's current package-oriented compilation mode, we can | |
define a set of top-down root entry-points into each package. | |
Any global variable initializations and init functions are always part | |
of the root. | |
For package main, the main function is also a root. | |
For non-main packages, the package's exported objects are its roots. | |
Similar to the old "crawler" algorithm used by iexport. | |
Beyond escape analysis, it's probably worth knowing when pointers flow | |
to an unsafe.Pointer conversion, because that indicates the pointed-to | |
object can be modified arbitrarily. (Relatedly: go.dev/issue/58625.) | |
I remember also looking into making string->[]byte or []byte->string | |
conversions more efficient. go.dev/issue/2205, go.dev/issue/43752. | |
* Inlining and flow analysis | |
Callahan et al, 1986. | |
"Interprocedural constant propagation" | |
https://dl.acm.org/doi/10.1145/12276.13327 | |
Introduced the idea to use "jump" functions to summarize data flow | |
within functions, so that they can be wired up into interprocedural | |
analyses. In Go's separate compilation model, we'd probably write this | |
out via export data. | |
Jagannathan and Wright, 1996. | |
"Flow-directed Inlining" | |
https://dl.acm.org/doi/10.1145/231379.231417 | |
Waddell and Dybvig, 1997. | |
"Fast and Effective Procedure Inlining" | |
https://link.springer.com/chapter/10.1007/BFb0032732 | |
Ashley, 1997. | |
"The effectiveness of flow analysis for inlining" | |
https://dl.acm.org/doi/10.1145/258949.258959 | |
A bunch of papers on the idea of using global control flow analysis to | |
improve inlining. I haven't read and understood them well enough just | |
yet to really understand their comparisons. | |
Gilray et al, 2016. | |
"Pushdown control-flow analysis for free" | |
https://dl.acm.org/doi/10.1145/2837614.2837631 | |
Recent control flow analysis algorithm, which claims significant | |
improvements in precision and performance. Seems interesting to read | |
into further. | |
Though the "efficient" O(N^3) figure mentioned in the abstract is a | |
bit concerning. | |
** TODO | |
Theodoridis et al, 2022. | |
"Understanding and exploiting optimal function inlining" | |
https://dl.acm.org/doi/10.1145/3503222.3507744 | |
Just found this today. Seems infeasible for a production compiler | |
mode, but it does have an "autotuner" that seems like it would be | |
interesting to apply across the Go ecosystem to learn better inlining | |
heuristics. | |
It mentions a bunch of inlining heuristics: [7, 20, 23, 24, 28] | |
I also noticed some citations of PGO-driven inlining, e.g.: | |
Ayers et al, 1997. | |
"Aggressive inlining" | |
https://dl.acm.org/doi/10.1145/258916.258928 | |
Prokopec et al, 2019. | |
"An Optimization-Driven Incremental Inline Substitution Algorithm for Just-in-Time Compilers" | |
https://ieeexplore.ieee.org/document/8661171 | |
* Fixpoint algorithms | |
Kim et al, 2019. | |
"Deterministic parallel fixpoint computation" | |
https://dl.acm.org/doi/10.1145/3371082 | |
Introduces the idea of a "weak partial order" (WPO), which organizes a | |
dependency graph into nested directed acyclic graphs (DAGs) of | |
strongly-connected components (SCCs). Provides an almost-linear time | |
algorithm to construct a WPO from a dependency graph, and a | |
deterministic parallel algorithm to compute a fixpoint. | |
Inserts "widening" points to break all dependency cycles. | |
Requires the dependency graph to be known in advance. | |
I've implemented this in Go and verified that it produces the same | |
result as the current liveness analysis pass's naive | |
iterate-until-fixpoint algorithm. | |
Bourdoncle, 2005. | |
"Efficient chaotic iteration strategies with widenings" | |
https://link.springer.com/chapter/10.1007/BFb0039704 | |
Widely cited paper about fixpoint calculation with "weak topological | |
orders" (WTO); these are generalized into WPOs in the above paper, | |
which allow parallel evaluation. | |
I haven't actually found a copy to read yet. According to the | |
introduction section in the preview, they have a way to extend the | |
algorithm to interprocedural analysis. I'm interested to read that. | |
Van Es et al, 2020. | |
"A Parallel Worklist Algorithm for Modular Analyses" | |
https://ieeexplore.ieee.org/document/9252039 | |
Straight forward worklist algorithm that allows very high | |
parallelism. Each individual task is farmed out to a separate worker, | |
which has access to a snapshot of global state. It carries out its | |
changes and reports back to the coordinator, who merges the results | |
into the global state, and restarts any workers that are affected. | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment