How it works
Import Integrity has to analyze how every file in a codebase imports and exports relative to every other file, while running through a per-file synchronous plugin API. This page describes the four-phase pipelined algorithm it uses to do this efficiently, and how caching at each phase keeps re-runs fast.
The four phases are:
- Parsing each file's imports and exports into an analysis-friendly form
- Resolving module specifiers to actual files
- Traversing the resulting graph to link imports to their ultimate exports
- Connecting cross-package data for monorepo analysis
Each phase has different caching characteristics, which is why they're isolated. See Performance and Accuracy for benchmark numbers showing how this design plays out in practice.
Import Integrity includes an editor mode that tightly integrates with the caching layers at each phase of the algorithm, and also polls the file system for changes. This mode keeps import/export information up to date, even if the LSP server never sees those changes, ensuring you never get stale lint errors and keeping your editor snappy.
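The polling described above could look something like the following minimal sketch. It assumes that changes are detected by comparing file modification times between polls. The snapshot and diffSnapshots helpers are hypothetical and are not part of Import Integrity's API. In a real editor mode, each reported path would then be evicted from the per-phase caches.

```typescript
import { statSync } from "node:fs";

// Hypothetical snapshot: file path -> last-modified time in milliseconds.
type Snapshot = Map<string, number>;

// Record the current mtime of each path; paths that no longer exist are skipped.
function snapshot(paths: Iterable<string>): Snapshot {
  const result: Snapshot = new Map();
  for (const path of paths) {
    try {
      result.set(path, statSync(path).mtimeMs);
    } catch {
      // The file disappeared between listing and stat; treat it as absent.
    }
  }
  return result;
}

// Compare two polls and report which files were added, removed, or modified.
function diffSnapshots(previous: Snapshot, current: Snapshot) {
  const added: string[] = [];
  const removed: string[] = [];
  const modified: string[] = [];
  for (const [path, mtime] of current) {
    const before = previous.get(path);
    if (before === undefined) added.push(path);
    else if (before !== mtime) modified.push(path);
  }
  for (const path of previous.keys()) {
    if (!current.has(path)) removed.push(path);
  }
  return { added, removed, modified };
}
```

A poll loop (for example, one driven by setInterval) would re-list the project's files on each tick and take a new snapshot. It would then diff that snapshot against the previous one and invalidate cache entries only for the reported paths.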
Phase 1: AST analysis
This phase reads every non-ignored file inside packageRootDir that has a known JavaScript or TypeScript extension (.js, .mjs, .cjs, .jsx, .ts, .mts, .cts, .tsx) and parses each file into an AST. In monorepos, this happens independently for each discovered package. The AST is then converted into a compact form tailored to import/export analysis.
For example, the import statement import { foo } from './bar' is boiled down to:
```ts
{
  importAlias: 'foo',
  importName: 'foo',
  importType: 'single',
  isTypeImport: false,
  moduleSpecifier: './bar',
  statementNodeRange: [0, 27],
  reportNodeRange: [9, 12]
}
```
Because of file reads and AST parsing, this phase is by far the most performance-intensive of the four. It accounts for about 80% of total execution time on a cold cache. At the same time, the information computed for each file is completely independent of every other file. The caching layer exploits this independence: a change to one file never invalidates the cache entry of any other file.
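The following rough sketch makes the boiled-down form concrete by producing the same shape for simple named imports. It uses a regular expression purely as a stand-in for real AST parsing. Import Integrity parses a full AST, whereas a regex would mishandle comments, strings, and many other forms. The ImportEntry type mirrors the fields shown above. The extractNamedImports helper is hypothetical. It handles only statement-level "import type" and assumes reportNodeRange covers the whole specifier text.

```typescript
// Mirrors the fields in the example entry above.
interface ImportEntry {
  importAlias: string;
  importName: string;
  importType: "single";
  isTypeImport: boolean;
  moduleSpecifier: string;
  statementNodeRange: [number, number];
  reportNodeRange: [number, number];
}

// Regex stand-in for an AST: matches `import [type] { ... } from '...'`.
const NAMED_IMPORT = /import\s+(type\s+)?\{([^}]*)\}\s*from\s*(['"])([^'"]+)\3;?/g;

function extractNamedImports(source: string): ImportEntry[] {
  const entries: ImportEntry[] = [];
  for (const match of source.matchAll(NAMED_IMPORT)) {
    const start = match.index ?? 0;
    const [text, typeKeyword, specifierList, , moduleSpecifier] = match;
    // Source offset of the first character inside the braces.
    let cursor = start + text.indexOf("{") + 1;
    for (const rawSpecifier of specifierList.split(",")) {
      const trimmed = rawSpecifier.trim();
      const specStart = cursor + (rawSpecifier.length - rawSpecifier.trimStart().length);
      cursor += rawSpecifier.length + 1; // +1 skips the comma
      if (trimmed === "") continue; // trailing comma
      // `foo as bar` -> importName "foo", importAlias "bar"; `foo` -> both "foo".
      const [importName, importAlias = importName] = trimmed.split(/\s+as\s+/);
      entries.push({
        importAlias,
        importName,
        importType: "single",
        isTypeImport: typeKeyword !== undefined,
        moduleSpecifier,
        statementNodeRange: [start, start + text.length],
        reportNodeRange: [specStart, specStart + trimmed.length],
      });
    }
  }
  return entries;
}
```

Running extractNamedImports("import { foo } from './bar'") yields a single entry whose ranges match the example: [0, 27] for the statement and [9, 12] for foo.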
On the VS Code codebase, this phase takes 2.1 seconds of the 2.6-second total on a cold cache. These numbers were measured on the same system used for the comparison benchmarks. Because this phase is so cacheable, subsequent file edits in the editor take only ~1ms.
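Each file's phase 1 output depends only on that file's contents, so a cache can be keyed, and invalidated, one file at a time. The following minimal sketch assumes that entries are keyed by path and validated with a SHA-1 hash of the file contents. Import Integrity's actual cache key and storage are not described on this page. FileCache and the injected analyze callback are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hypothetical per-file cache: each entry depends only on its own file's contents.
class FileCache<T> {
  private entries = new Map<string, { hash: string; value: T }>();
  private readonly analyze: (path: string, source: string) => T;

  constructor(analyze: (path: string, source: string) => T) {
    this.analyze = analyze;
  }

  get(path: string, source: string): T {
    const hash = createHash("sha1").update(source).digest("hex");
    const cached = this.entries.get(path);
    // A hit only requires this file's content to be unchanged;
    // edits to other files never affect this entry.
    if (cached !== undefined && cached.hash === hash) return cached.value;
    const value = this.analyze(path, source);
    this.entries.set(path, { hash, value });
    return value;
  }

  // Called when a file is deleted; no other entry needs to be touched.
  delete(path: string): void {
    this.entries.delete(path);
  }
}
```

Editing one file causes exactly one re-analysis, which is the property that keeps editor updates cheap in this phase.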
Details for the information computed in this stage can be viewed in the types file for base information.
Phase 2: Module specifier resolution
This phase goes through every import and reexport entry from the first phase and resolves its module specifier. It is the second most performance-intensive phase, taking around 15% of total execution time. On VS Code, this phase takes 0.4 seconds out of 2.6 seconds total.
Import Integrity uses its own high-performance resolver to achieve this speed. It resolves each module specifier to one of three types, checked in a fixed order:
- A Node.js built-in module, as listed in the builtinModules array exported by node:module
- A file within the current packageRootDir, aka first party
- A third-party module
Module specifiers are resolved in this order because the list of built-in modules and the list of first-party files are already in memory, so the first two checks never touch the filesystem. Skipping file I/O makes resolution significantly faster than it would be otherwise. The resolver is on par with OXC's Rust-based resolver despite being written in JavaScript. This works because third-party resolution comes last. Any specifier not found in memory can default to being a third-party import, which is the only case that usually requires filesystem lookups.
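The following sketch shows the resolution order. It assumes that first-party files are held in an in-memory Set of absolute paths. It also assumes that relative specifiers are tried as-is, then with each extension listed in phase 1, and then as an index file. resolveSpecifier and the Resolution type are hypothetical. A complete resolver also has to handle cases omitted here, such as tsconfig path aliases and package.json "exports".

```typescript
import { builtinModules } from "node:module";
import { posix } from "node:path";

type Resolution =
  | { type: "builtin"; name: string }
  | { type: "firstParty"; path: string }
  | { type: "thirdParty"; specifier: string };

const BUILTINS = new Set(builtinModules);
const EXTENSIONS = [".js", ".mjs", ".cjs", ".jsx", ".ts", ".mts", ".cts", ".tsx"];

function resolveSpecifier(
  specifier: string,
  importingFile: string, // absolute path of the file containing the import
  firstPartyFiles: ReadonlySet<string>,
): Resolution {
  // 1. Built-ins: an in-memory set lookup, with no filesystem access.
  const bare = specifier.startsWith("node:") ? specifier.slice("node:".length) : specifier;
  if (BUILTINS.has(bare) || BUILTINS.has(specifier)) {
    return { type: "builtin", name: bare };
  }
  // 2. First party: candidate paths are checked against the in-memory file list.
  if (specifier.startsWith(".") || specifier.startsWith("/")) {
    const base = posix.resolve(posix.dirname(importingFile), specifier);
    const candidates = [
      base,
      ...EXTENSIONS.map((ext) => base + ext),
      ...EXTENSIONS.map((ext) => posix.join(base, "index" + ext)),
    ];
    for (const candidate of candidates) {
      if (firstPartyFiles.has(candidate)) return { type: "firstParty", path: candidate };
    }
  }
  // 3. Anything not found in memory defaults to third party. Only this case
  //    would need filesystem lookups (for example, into node_modules).
  return { type: "thirdParty", specifier };
}
```

In this sketch, resolving "./utils" from /repo/src/a.ts against a file set containing /repo/src/utils/index.ts returns that index file. The function never calls into the filesystem.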
In this phase, a change to one file can affect the information in another file. Even so, determining which files are affected is relatively straightforward, and a change typically touches only a small number of other files' cache entries. As a result, caching still measurably improves performance in this phase.
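One way to find the affected files is a reverse index from each resolved file to the files whose imports point at it. The following sketch assumes that resolutions are cached per importing file. Editing a file invalidates only that file's own entries, and deleting a file also invalidates its importers. Creating a file invalidates importers with relative specifiers that previously failed to resolve. ResolutionCache and its methods are hypothetical and do not describe Import Integrity's internals. The creation rule is deliberately simplified.

```typescript
// Hypothetical phase 2 cache. For each importing file, maps each module
// specifier to the first-party file it resolved to, or null if it was not
// found in memory (treated as third party).
type Resolutions = Map<string, string | null>;

class ResolutionCache {
  private byImporter = new Map<string, Resolutions>();
  // Reverse index: first-party file -> files with an import resolving to it.
  private importersOf = new Map<string, Set<string>>();

  set(importer: string, resolutions: Resolutions): void {
    this.evict(importer);
    this.byImporter.set(importer, resolutions);
    for (const target of resolutions.values()) {
      if (target === null) continue;
      const importers = this.importersOf.get(target) ?? new Set<string>();
      importers.add(importer);
      this.importersOf.set(target, importers);
    }
  }

  has(importer: string): boolean {
    return this.byImporter.has(importer);
  }

  // Drop one importer's entries and remove it from the reverse index.
  evict(importer: string): void {
    const old = this.byImporter.get(importer);
    if (old === undefined) return;
    this.byImporter.delete(importer);
    for (const target of old.values()) {
      if (target !== null) this.importersOf.get(target)?.delete(importer);
    }
  }

  // Editing a file's contents only changes that file's own specifiers.
  onFileEdited(path: string): string[] {
    this.evict(path);
    return [path];
  }

  // Deleting a file breaks every import that resolved to it.
  onFileDeleted(path: string): string[] {
    const stale = [path, ...(this.importersOf.get(path) ?? [])];
    stale.forEach((p) => this.evict(p));
    this.importersOf.delete(path);
    return stale;
  }

  // Simplification: a new file is assumed to affect only relative imports that
  // previously failed to resolve. This ignores cases where it would take
  // precedence over an existing match. Bare specifiers stay third party.
  onFileCreated(): string[] {
    const stale: string[] = [];
    for (const [importer, resolutions] of this.byImporter) {
      for (const [specifier, target] of resolutions) {
        if (target === null && specifier.startsWith(".")) {
          stale.push(importer);
          break;
        }
      }
    }
    stale.forEach((p) => this.evict(p));
    return stale;
  }
}
```

Each method returns the importers whose resolutions must be recomputed. This list is typically short, which is what makes caching worthwhile in this phase.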
Details for the information computed in this stage can be viewed in the types file for resolved information.
Phase 3: Import graph analysis
This third phase traverses the import/export graph built in the second phase to determine the ultimate source of every import and reexport. Along the way, it stores other useful information, such as a list of every file that imports a given export and a link from each import statement to a specific export statement.
This phase is the second least performance-intensive, representing only about 4% of total run time. On the VS Code codebase, this phase takes 100ms out of 2.6 seconds total.
Linking imports to exports can be non-trivial, especially if there are a lot of reexports. For example:
```ts
// a.ts
import b from './b'; // points to file d.ts!

// b.ts
export { default } from './c';

// c.ts
export { default } from './d';

// d.ts
export default 10; // Export for import in file a.ts!
```
As we've seen, this phase is not performance-intensive, because the first two phases do the heavy lifting. It is also the most entangled phase and the hardest to cache. In the example above, a change to c.ts can alter which export a.ts ultimately links to, even though a.ts itself did not change. As a result, Import Integrity does no caching during this phase, which has little effect on overall performance anyway.
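The reexport chain in the example can be followed with a small traversal. The following sketch assumes a simplified per-file record of local exports and named reexports, and it ignores "export *". The ModuleInfo shape, resolveExport, and linkImport are hypothetical and do not reflect Import Integrity's actual types. The traversal guards against reexport cycles. It also records which files import each ultimate export, mirroring the importer lists described above.

```typescript
// Hypothetical simplified view of one file after phases 1 and 2.
interface ModuleInfo {
  localExports: Set<string>; // names declared and exported in this file
  // Reexports: exported name -> resolved source file and the name there.
  reexports: Map<string, { from: string; name: string }>;
}

type ExportId = string; // "file#exportName"

// Follow reexports until reaching the file that actually declares the export.
function resolveExport(
  modules: Map<string, ModuleInfo>,
  file: string,
  name: string,
  seen = new Set<string>(),
): ExportId | undefined {
  const key = `${file}#${name}`;
  if (seen.has(key)) return undefined; // reexport cycle
  seen.add(key);
  const info = modules.get(file);
  if (info === undefined) return undefined; // e.g. third party or missing file
  if (info.localExports.has(name)) return key;
  const next = info.reexports.get(name);
  return next === undefined ? undefined : resolveExport(modules, next.from, next.name, seen);
}

// Link one import to its ultimate export and record the importing file.
function linkImport(
  modules: Map<string, ModuleInfo>,
  importersByExport: Map<ExportId, Set<string>>,
  importer: string,
  from: string,
  name: string,
): ExportId | undefined {
  const target = resolveExport(modules, from, name);
  if (target !== undefined) {
    const importers = importersByExport.get(target) ?? new Set<string>();
    importers.add(importer);
    importersByExport.set(target, importers);
  }
  return target;
}
```

With b.ts and c.ts modeled as reexports of default and d.ts as a local default export, linkImport for a.ts's default import from b.ts returns "d.ts#default".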
Details for the information computed in this stage can be viewed in the types file for analyzed information.
Phase 4: Monorepo analysis
This fourth phase collects the import graph analysis from each package in the monorepo to analyze cross-package imports and exports. It produces data similar to the third phase's, but it uses the third phase's results to short-circuit many of its computations.
This phase is the least performance-intensive, representing less than 1% of total run time. On the VS Code codebase, this phase takes 13ms out of 2.6 seconds total. Like the third phase, it is not easily cached, but caching would have a negligible effect on total performance anyway.
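The following sketch shows how phase 3 results could short-circuit cross-package linking. If each package already maps its public exports to their ultimate in-package sources, a cross-package import needs only a single lookup instead of a fresh graph traversal. The PackageAnalysis shape, the publicExports map, and linkCrossPackageImport are assumptions for illustration. Import Integrity's real data layout is described in its types file.

```typescript
// Hypothetical phase 3 output for one package: each export name of the
// package's entry point, mapped to its ultimate source ("file#name").
interface PackageAnalysis {
  publicExports: Map<string, string>;
  importersByExport: Map<string, Set<string>>;
}

// Link an import of `exportName` from workspace package `packageName`, made in
// `importer` (a file in another package), using that package's phase 3 data.
function linkCrossPackageImport(
  packages: Map<string, PackageAnalysis>,
  importer: string,
  packageName: string,
  exportName: string,
): string | undefined {
  const target = packages.get(packageName);
  if (target === undefined) return undefined; // not a workspace package: stays third party
  const source = target.publicExports.get(exportName);
  if (source === undefined) return undefined; // export not found in that package
  const importers = target.importersByExport.get(source) ?? new Set<string>();
  importers.add(importer);
  target.importersByExport.set(source, importers);
  return source;
}
```

The reexport chains inside the target package were already resolved in phase 3, so this step never walks them again.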
Details for the information computed in this stage can also be viewed in the types file for analyzed information. Data populated by this phase have comments indicating that they are phase 4 data.