Overview
Threadle: Population-scale network data management
Threadle is a high-performance software system for large, complex, multilayer and multimode network datasets, designed explicitly for working with the Swedish administrative registers. Unlike general-purpose network libraries, it emphasizes memory-efficient storage and fast querying of heterogeneous multimode network data.
Threadle is open source and built in .NET, with a Core module handling data models, processing, and file I/O for massive, multi-dimensional networks. A fast CLI console provides direct use or interop from other languages, including R. Precompiled binaries are available for Windows, macOS, and Linux, or it can be built from source.
Why Threadle?
Most existing tools, such as igraph and networkx, assume that edges are relatively homogeneous objects stored in adjacency lists or matrices. While these assumptions are perfectly fine for many datasets, they break down for real-world population-level data, where:
- relations are heterogeneous (different types of relations, different layers and social domains)
- data often includes both 1-mode and 2-mode relations
- networks include millions of nodes and/or billions of edges
- nodes can carry many kinds and types of attributes that need efficient internal storage
- derived 2-mode (bipartite) projections take up huge amounts of memory
Purpose-built to handle the complexity found in full-population administrative registers, Threadle handles these aspects efficiently, storing edges and hyperedges directly with their nodes and enabling queries as if 2-mode networks were projected—without extra memory cost.
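To make the projection cost concrete: projecting a single 2-mode group with k members yields k(k-1)/2 pairwise edges, whereas storing the memberships directly needs only k entries. The short sketch below compares the two counts. The function names and group sizes are illustrative assumptions, not Threadle code or register data.

```python
def projected_edge_count(group_size: int) -> int:
    """Undirected pairwise edges created by projecting one group."""
    return group_size * (group_size - 1) // 2


def membership_count(group_size: int) -> int:
    """Node-to-group memberships needed to store the same group."""
    return group_size


# Hypothetical group sizes (say a household, a school class, a large workplace).
group_sizes = [4, 30, 5000]

for k in group_sizes:
    # Columns: group size, memberships stored, edges if projected
    print(k, membership_count(k), projected_edge_count(k))
```

The gap grows quadratically with group size, which is why a handful of large groups can dominate the memory footprint of a projected network.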
Key features
Native multilayer support
Each layer is memory-optimized for its type and directionality.
Distributed edge ownership
Edges and hyperedges are stored with the nodes that own them, improving memory locality and query performance.
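As a rough illustration of node-owned edge storage, the sketch below keeps each node's alters in a set held under that node, with one such structure per layer and a directionality flag deciding what gets stored. The class `Layer1Mode`, its methods and the layer names are hypothetical and only convey the idea; Threadle's actual Core is written in .NET and its internal structures may differ.

```python
from collections import defaultdict


class Layer1Mode:
    """Hypothetical 1-mode layer: each node owns the set of its alters."""

    def __init__(self, directed: bool):
        self.directed = directed
        # node id -> set of alter ids reachable by an outgoing edge
        self.out_edges = defaultdict(set)
        # inbound sets are only kept for directed layers
        self.in_edges = defaultdict(set) if directed else None

    def add_edge(self, source: int, target: int) -> None:
        self.out_edges[source].add(target)
        if self.directed:
            self.in_edges[target].add(source)
        else:
            # a symmetric edge is stored with both endpoints
            self.out_edges[target].add(source)

    def has_edge(self, source: int, target: int) -> bool:
        # .get avoids creating empty entries for unknown nodes
        return target in self.out_edges.get(source, ())

    def alters(self, node: int) -> set:
        return set(self.out_edges.get(node, ()))


# A multilayer network is then a mapping from layer name to layer.
network = {
    "kinship": Layer1Mode(directed=False),
    "moved_near": Layer1Mode(directed=True),
}
network["kinship"].add_edge(1, 2)
network["moved_near"].add_edge(1, 3)
```

With this layout, checking whether an edge exists or listing a node's alters only touches the data held by that node, and no global edge list is scanned.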
Unified 1-mode and 2-mode architecture
Threadle stores:
- 1-mode (unipartite) edges (of layer-specific types)
- 2-mode (bipartite) relations as true hyperedges, not projections
This allows 2-mode data to be queried as if it were projected, without ever materializing the projection.
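A minimal sketch of how such projection-free querying can work, assuming memberships are indexed in both directions (node to hyperedges and hyperedge to nodes). The class `Layer2Mode` and its method names are hypothetical, not Threadle's API.

```python
from collections import defaultdict


class Layer2Mode:
    """Hypothetical 2-mode layer storing memberships, never the projection."""

    def __init__(self):
        self.members = defaultdict(set)      # hyperedge id -> member node ids
        self.memberships = defaultdict(set)  # node id -> hyperedge ids

    def add_membership(self, node: int, hyperedge: str) -> None:
        self.members[hyperedge].add(node)
        self.memberships[node].add(hyperedge)

    def has_edge(self, a: int, b: int) -> bool:
        """True if a and b share at least one hyperedge (a projected tie)."""
        groups_a = self.memberships.get(a, set())
        groups_b = self.memberships.get(b, set())
        return a != b and not groups_a.isdisjoint(groups_b)

    def alters(self, node: int) -> set:
        """All nodes sharing a hyperedge with node, computed on demand."""
        result = set()
        for hyperedge in self.memberships.get(node, ()):
            result |= self.members[hyperedge]
        result.discard(node)
        return result


# Illustrative data: persons 1-4 affiliated with workplaces "A" and "B".
workplaces = Layer2Mode()
for person, workplace in [(1, "A"), (2, "A"), (3, "A"), (3, "B"), (4, "B")]:
    workplaces.add_membership(person, workplace)

print(workplaces.has_edge(1, 3))  # persons 1 and 3 share workplace "A"
print(workplaces.alters(3))       # co-workers of person 3 across both workplaces
```

Memory grows with the number of memberships rather than with the number of projected pairs; the trade-off is that each query does a small set operation instead of a direct lookup.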
Built-in node attribute management
Fast addition, lookup and handling of node-owned attributes.
Efficient storage for population-scale administrative data
Designed for massive datasets with millions of nodes and billions of edges.
Clean separation of Core and frontend applications
- Core handles all modeling, processing, management and file handling
- A command-line interface (CLI) console with a full scripting language is provided, either for use as-is or for R interoperability
- Additional frontends can easily connect to Core, and Core can also be integrated into other systems needing Threadle capabilities.
What Threadle is (and isn't)
Threadle is not an algorithm library: it does not try to replicate igraph or networkx in terms of existing methods and metrics. Instead, it provides:
- safe, exception-controlled APIs
- efficient data structures optimized for fast access and minimal memory footprint
- basic analytical primitives (density, degree, random alter/node, etc.)
- a foundation on which specialized or domain-specific algorithms can be built
This makes Threadle suitable for researchers and other practitioners who need to compile, preprocess and store large multilayer relational datasets, and then query this data to compute more domain-specific analytical heuristics and metrics.
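To illustrate how primitives of this kind can serve as building blocks, the sketch below defines degree, density and random-alter sampling over a plain node-to-alters mapping, and composes them into a short random walk. All names and the toy data are assumptions for illustration; they are not Threadle functions or CLI commands.

```python
import random

# Hypothetical undirected layer: node id -> set of alter ids
adjacency = {
    1: {2, 3},
    2: {1},
    3: {1, 4},
    4: {3},
    5: set(),
}


def degree(adj: dict, node: int) -> int:
    return len(adj[node])


def density(adj: dict) -> float:
    """Share of possible undirected ties that are present."""
    n = len(adj)
    if n < 2:
        return 0.0
    # each undirected edge is stored at both endpoints, so halve the sum
    edges = sum(len(alters) for alters in adj.values()) / 2
    return edges / (n * (n - 1) / 2)


def random_alter(adj: dict, node: int, rng: random.Random):
    """Uniformly sampled alter of node, or None if the node is isolated."""
    alters = adj[node]
    if not alters:
        return None
    # sorting gives a stable order so a seeded rng is reproducible
    return rng.choice(sorted(alters))


# A domain-specific routine built on the primitive: a five-step random walk.
rng = random.Random(42)  # seeded only to make the sketch reproducible
walk = [1]
for _ in range(5):
    nxt = random_alter(adjacency, walk[-1], rng)
    if nxt is None:
        break
    walk.append(nxt)
```

The walk needs only one alter lookup per step, so routines like this depend on fast per-node access rather than on a large library of ready-made algorithms.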
"Threadle"?
Well, crocheting is fun, and Norrköping has a rich textile history - so... Yeah.