Overview

Threadle: Population-scale network data management

Threadle is a high-performance software system for large, complex, multilayer and multimode network datasets, designed specifically for working with the Swedish administrative registers. Unlike existing general-purpose network libraries, it emphasizes memory-efficient storage and fast querying of heterogeneous multimode network data.

Threadle is open source and built in .NET. Its Core module handles data models, processing, and file I/O for massive, multi-dimensional networks. A command-line (CLI) console supports direct use as well as interoperability with other languages, notably R through the threadleR library. Precompiled binaries are available for Windows, macOS, and Linux, and Threadle can also be built from source.

Why Threadle?

Most existing tools, such as igraph, networkx, and NetworKit, assume that edges are relatively homogeneous objects stored in adjacency lists or matrices. While these assumptions work well for many datasets, they break down for real-world population-level data, where:

relations are heterogeneous (different types of relations, different layers and social domains)
data often includes both 1-mode and 2-mode relations
networks include millions of nodes and/or billions of edges
nodes can carry many kinds and types of attributes that need efficient internal storage
derived 2-mode (bipartite) projections take up huge amounts of memory

Purpose-built for the complexity of full-population administrative registers, Threadle addresses these issues by storing edges and hyperedges directly with their nodes, and by allowing 2-mode networks to be queried as if they were projected, without the memory cost of materializing the projection.
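
A simple back-of-the-envelope illustration (plain arithmetic, not a Threadle benchmark) of why materialized projections are costly: a single 2-mode affiliation with k members, such as one workplace, needs k membership references when stored as a hyperedge, but its 1-mode projection contains one undirected edge for every pair of members:

```latex
\text{hyperedge: } k \text{ memberships}
\qquad \text{vs.} \qquad
\text{projection: } \binom{k}{2} = \frac{k(k-1)}{2} \text{ undirected edges}
```

For a workplace with k = 1,000 employees, that is 1,000 memberships against 1,000 × 999 / 2 = 499,500 edges, and the gap grows quadratically with the size of the affiliation.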

Key features

Native multilayer support

Each layer is memory-optimized for its type and directionality.

Distributed edge ownership

Edges and hyperedges are stored with the nodes that own them, improving memory locality and query performance.
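
A minimal sketch of node-owned edge storage, assuming hypothetical class names (`NodeEdges`, `OneModeLayer`) rather than Threadle's actual .NET types: each node keeps its own neighbor sets for a layer, directed layers track outbound and inbound ties separately, and undirected layers keep a single symmetric set.

```python
# Hypothetical sketch of node-owned edges; not Threadle's actual API.
from dataclasses import dataclass, field


@dataclass
class NodeEdges:
    """Edges owned by one node in one 1-mode layer."""
    outbound: set[int] = field(default_factory=set)
    inbound: set[int] = field(default_factory=set)  # stays empty for undirected layers


class OneModeLayer:
    """A 1-mode layer where each node stores its own edges."""

    def __init__(self, directed: bool) -> None:
        self.directed = directed
        self.nodes: dict[int, NodeEdges] = {}

    def _owner(self, node_id: int) -> NodeEdges:
        # Create the node's edge container lazily on first use.
        return self.nodes.setdefault(node_id, NodeEdges())

    def add_edge(self, source: int, target: int) -> None:
        self._owner(source).outbound.add(target)
        if self.directed:
            # Directed: the target records the incoming tie separately.
            self._owner(target).inbound.add(source)
        else:
            # Undirected: one symmetric neighbor set per node suffices.
            self._owner(target).outbound.add(source)

    def alters(self, node_id: int) -> set[int]:
        # Only the node's own container is read; no global edge list is scanned.
        edges = self.nodes.get(node_id)
        if edges is None:
            return set()
        return edges.outbound | edges.inbound


work = OneModeLayer(directed=False)
work.add_edge(1, 2)
work.add_edge(1, 3)
print(sorted(work.alters(1)))  # [2, 3]
```

Looking up a node's alters touches only that node's container, which is the locality benefit the paragraph above describes.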

Unified 1-mode and 2-mode architecture

Threadle stores:

1-mode (unipartite) edges (of layer-specific types)
2-mode (bipartite) relations as true hyperedges, not projections

This allows 2-mode data to be queried as if it were projected, without ever materializing the projection.
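
The idea of querying a 2-mode layer as if it were projected can be sketched as follows, assuming hypothetical names (`TwoModeLayer`, `add_affiliation`, `projected_alters`, `projected_tie`) that are not Threadle's API: each hyperedge keeps its members, each node keeps the hyperedges it belongs to, and projected ties are derived on demand.

```python
# Hypothetical sketch of on-demand projection over hyperedges; not Threadle's actual API.
from collections import defaultdict


class TwoModeLayer:
    def __init__(self) -> None:
        # Hyperedge name -> member node ids (e.g. a workplace and its employees).
        self.hyperedges: dict[str, set[int]] = {}
        # Node id -> names of the hyperedges it belongs to, stored with the node.
        self.memberships: dict[int, set[str]] = defaultdict(set)

    def add_affiliation(self, node_id: int, hyperedge: str) -> None:
        self.hyperedges.setdefault(hyperedge, set()).add(node_id)
        self.memberships[node_id].add(hyperedge)

    def projected_alters(self, node_id: int) -> set[int]:
        """Alters the node would have in the 1-mode projection, computed on demand."""
        alters: set[int] = set()
        for name in self.memberships.get(node_id, ()):
            alters |= self.hyperedges[name]
        alters.discard(node_id)  # a node is not its own alter
        return alters

    def projected_tie(self, a: int, b: int) -> bool:
        """True if a and b share at least one hyperedge."""
        return a != b and not self.memberships.get(a, set()).isdisjoint(
            self.memberships.get(b, set())
        )


layer = TwoModeLayer()
layer.add_affiliation(1, "workplace_A")
layer.add_affiliation(2, "workplace_A")
layer.add_affiliation(3, "school_B")
layer.add_affiliation(1, "school_B")
print(sorted(layer.projected_alters(1)))  # [2, 3]
print(layer.projected_tie(2, 3))  # False
```

No pairwise edge is ever stored: a query for one node touches only that node's memberships and the members of those hyperedges.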

Built-in node attribute management

Fast addition, lookup and handling of node-owned attributes.
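
One way node-owned attributes can be kept compact is to define each attribute's name and type once and store only small integer keys and values on the nodes. The sketch below assumes this design for illustration only; `Node`, `AttributeRegistry`, `define`, `set_value` and `get_value` are hypothetical names, not Threadle's API.

```python
# Hypothetical sketch of node-owned, typed attributes; not Threadle's actual API.
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    # Values keyed by a small integer index instead of a repeated attribute name.
    attrs: dict[int, object] = field(default_factory=dict)


class AttributeRegistry:
    """Attribute names and types are defined once; values live on the nodes."""

    def __init__(self) -> None:
        self._index: dict[str, int] = {}
        self._types: list[type] = []

    def define(self, name: str, value_type: type) -> None:
        if name in self._index:
            raise ValueError(f"attribute {name!r} already defined")
        self._index[name] = len(self._types)
        self._types.append(value_type)

    def set_value(self, node: Node, name: str, value: object) -> None:
        idx = self._index[name]  # raises KeyError for undefined attributes
        expected = self._types[idx]
        if not isinstance(value, expected):
            raise TypeError(f"{name!r} expects {expected.__name__}, got {type(value).__name__}")
        node.attrs[idx] = value

    def get_value(self, node: Node, name: str, default: object = None) -> object:
        return node.attrs.get(self._index[name], default)


reg = AttributeRegistry()
reg.define("birth_year", int)
reg.define("municipality", str)
person = Node(1)
reg.set_value(person, "birth_year", 1980)
reg.set_value(person, "municipality", "Norrköping")
print(reg.get_value(person, "municipality"))  # Norrköping
```

Checking types on write keeps invalid values out of the store; the trade-off is one extra index lookup per access.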

Efficient storage for population-scale administrative data

Designed for massive datasets with millions of nodes and billions of edges.

Clean separation of Core and frontend applications

Core handles all modeling, processing, management and file handling
A command-line interface (CLI) console with a full scripting language is provided, for use either on its own or from R via threadleR.
Threadle.CLIconsole runs either in text mode, to keep us humans happy, or in JSON mode, for easy interoperability with threadleR. Additional frontends can thus connect to Core through CLIconsole running in JSON mode, and Core can also be integrated (as a DLL) into other systems that need Threadle capabilities.

What Threadle is (and isn't)

Threadle is not an algorithm library: it does not try to replicate igraph or networkx in terms of existing methods and metrics. Instead, it provides:

safe, exception-controlled APIs
efficient data structures optimized for fast access and minimal memory footprint
basic analytical primitives (density, degree, random alter/node, etc.)
a foundation on which specialized or domain-specific algorithms can be built
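
The basic primitives listed above are simple to state. The following is a minimal sketch on a plain adjacency dictionary, not on Threadle's data structures or API; the function names are hypothetical. Density assumes no self-loops: E / (n(n-1)) for directed layers and 2E / (n(n-1)) for undirected ones.

```python
# Hypothetical sketch of basic primitives on a plain adjacency dict; not Threadle's API.
import random
from typing import Optional


def density(n_nodes: int, n_edges: int, directed: bool) -> float:
    """Share of possible ties that are present (self-loops excluded)."""
    if n_nodes < 2:
        return 0.0
    possible = n_nodes * (n_nodes - 1)
    if not directed:
        possible //= 2  # n(n-1) is always even, so this is exact
    return n_edges / possible


def degree(adjacency: dict[int, set[int]], node_id: int) -> int:
    return len(adjacency.get(node_id, ()))


def random_alter(adjacency: dict[int, set[int]], node_id: int,
                 rng: random.Random) -> Optional[int]:
    """Uniformly random neighbor of node_id, or None if it has none."""
    alters = adjacency.get(node_id)
    if not alters:
        return None
    return rng.choice(sorted(alters))  # sorting makes seeded runs reproducible


def random_node(adjacency: dict[int, set[int]], rng: random.Random) -> int:
    # Raises IndexError if the adjacency dict is empty.
    return rng.choice(list(adjacency))


adj = {1: {2, 3}, 2: {1}, 3: {1}, 4: set()}
n_edges = sum(len(a) for a in adj.values()) // 2  # undirected: each edge counted twice
print(degree(adj, 1))  # 2
print(density(len(adj), n_edges, directed=False))
```

Sampling primitives like `random_alter` take an explicit `random.Random` instance so that traversals and samples can be reproduced from a seed.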

This makes Threadle suitable for researchers and other practitioners who need to compile, preprocess and store large multilayer relational datasets, and then query them to compute domain-specific heuristics and metrics based on sampling and traversal.

"Threadle"?

Well, crocheting is fun, and Norrköping has a rich textile history - so... Yeah.