The IOTA blockchain has emerged as a distributed ledger technology tailored for the Internet of Things (IoT), offering a feeless, lightweight alternative to traditional chain-based systems. At its core lies the Tangle, a directed acyclic graph (DAG) that enables parallel transaction validation and decentralized consensus without energy-intensive mining. Despite growing interest in IOTA's applications, how the Tangle evolves over time has received little rigorous theoretical treatment.
This article presents the first comprehensive stochastic model characterizing the evolution of the IOTA Tangle in real-world network conditions. By analyzing official ledger snapshots from the IOTA mainnet, we uncover that the degree distribution of the Tangle follows a double Pareto Lognormal (dPLN) distribution—an atypical pattern not captured by conventional power-law or exponential models. To accurately estimate parameters of this complex distribution, we introduce a custom Expectation-Maximization (EM) algorithm that outperforms generic optimization methods in both stability and fitting quality.
Our findings offer deep insights into the intrinsic dynamics of IOTA’s network, enabling more accurate simulations, improved protocol design, and enhanced security analysis.
Understanding IOTA and the Tangle Structure
IOTA diverges from classical blockchains by replacing the linear chain with a DAG-based ledger called the Tangle. Each vertex in the Tangle represents a message—either a value transaction or data payload—and each directed edge signifies an approval from one message to another.
Unlike proof-of-work blockchains, IOTA requires no mining. Instead, a node must validate and approve two earlier messages before submitting its own, so validation work is shared among the participants who use the ledger. Because new messages attach to messages that nothing has approved yet, called tips, tips tend to be confirmed quickly while the ledger stays decentralized.
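As a minimal sketch of this structure, the Tangle can be held as a mapping from each message to the messages it approves. The message ids and the helper names `in_degrees` and `tips` are hypothetical, chosen only for illustration; they are not part of any IOTA library.

```python
from collections import Counter

# Hypothetical toy Tangle: message id -> ids of the earlier messages it approves.
approvals = {
    "genesis": [],
    "a": ["genesis"],
    "b": ["genesis"],
    "c": ["a", "b"],
    "d": ["a", "c"],
}

def in_degrees(approvals):
    """Count how many distinct messages approve each message."""
    counts = Counter({message: 0 for message in approvals})
    for targets in approvals.values():
        for target in set(targets):
            counts[target] += 1
    return dict(counts)

def tips(approvals):
    """Messages that no other message has approved yet."""
    degrees = in_degrees(approvals)
    return sorted(m for m, d in degrees.items() if d == 0)

print(in_degrees(approvals))  # {'genesis': 2, 'a': 2, 'b': 1, 'c': 1, 'd': 0}
print(tips(approvals))        # ['d']
```

Here `d` is the only tip because nothing approves it yet, while `genesis` and `a` each have in-degree 2.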
The distributed nature of IOTA means each node maintains a local copy of the Tangle. When new messages arrive in batches across different nodes, they are independently attached and later synchronized through propagation—a process known as tangle consolidation. This batch arrival behavior fundamentally alters the network’s growth dynamics compared to sequentially growing graphs.
Why Existing Network Models Fall Short
Traditional network models such as the Erdős–Rényi random graph or Barabási–Albert preferential attachment (PA) model assume either single-node sequential growth or attachment based solely on node degree ("rich-get-richer"). However, these assumptions fail to capture key features of IOTA’s Tangle:
- Batch Arrival Dynamics: Multiple nodes can simultaneously approve the same tip, leading to bursts of incoming edges.
- Complex Attachment Logic: Tip selection relies on subtangle analysis and probabilistic walks—not just degree values—making it incompatible with simple PA rules.
Empirical analysis shows that neither a power-law nor an exponential distribution adequately fits the observed degree data. Instead, we identify the double Pareto Lognormal (dPLN) distribution as the best descriptor. The same family has previously been used to describe file sizes, city populations, and mobile call graphs.
Stochastic Modeling of Tangle Growth
To model Tangle evolution, we adopt a two-component framework: batch attachment and state transition dynamics.
Batch Attachment Model
Messages arrive according to a multivariate Poisson process with rate λₜ and average batch size λₘ. Each new message selects s existing vertices (typically s=2) for approval, generating up to s·|Mₜ| new edges at time t. These attachments create variable-degree increments across nodes, necessitating a probabilistic treatment.
We define Mₜ as the set of incoming messages and partition resulting edges into subsets eₜ based on target vertices. This leads to non-uniform degree group growth, where multiple edges may attach to the same vertex within a single time window.
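The batch mechanism can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not taken from the paper: one batch arrives per time step, the batch size is Poisson with mean `mean_batch` (standing in for λₘ), every message in a batch sees the same set of tips, and tips are chosen uniformly at random rather than by IOTA's actual tip selection. The function names are hypothetical.

```python
import math
import random
from collections import Counter

def poisson(rng, mean):
    """Draw a Poisson variate with Knuth's multiplication method."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def simulate_batches(steps, mean_batch, s=2, seed=0):
    """Return the in-degree of every vertex after `steps` batch arrivals."""
    rng = random.Random(seed)
    in_degree = [0]  # vertex 0 is the genesis message
    for _ in range(steps):
        # All messages of the batch attach to the tips visible before the batch.
        visible_tips = [v for v, d in enumerate(in_degree) if d == 0]
        batch = poisson(rng, mean_batch)
        in_degree.extend([0] * batch)
        for _ in range(batch):
            # Each message approves up to s distinct tips (uniform choice: assumption).
            for target in {rng.choice(visible_tips) for _ in range(s)}:
                in_degree[target] += 1
    return in_degree

degrees = simulate_batches(steps=200, mean_batch=5.0)
histogram = Counter(degrees)  # in-degree k -> number of vertices with that degree
```

Because several messages of one batch can pick the same tip, a vertex may gain more than one edge within a single time window, which is the non-uniform growth described above.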
State Transition Analysis
Let Gₖ(t) denote the set of vertices with in-degree k at time t, and let sₖ(t) represent its size (degree group size). As new edges are added:
- “In”-Event: A vertex with degree < k receives enough approvals to reach degree k, increasing sₖ(t).
- “Out”-Event: A vertex with degree k gains additional approvals, moving to a higher degree group and decreasing sₖ(t).
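The two event types can be counted directly from the in-degrees before and after a batch. The sketch below assumes in-degrees are stored as lists indexed by vertex and that vertices created by the batch are appended with degree 0; `in_out_events` is a hypothetical helper name, and the example data are made up.

```python
from collections import Counter

def in_out_events(before, after, k):
    """Count "in" and "out" events for degree group k (k >= 1) across one batch."""
    # Vertices created by the batch did not exist before; treat them as degree 0.
    padded = before + [0] * (len(after) - len(before))
    ins = sum(1 for b, a in zip(padded, after) if b < k and a == k)
    outs = sum(1 for b, a in zip(padded, after) if b == k and a > k)
    return ins, outs

before = [2, 1, 1, 0]
after = [2, 2, 3, 1, 0, 0]  # two new vertices; vertex 2 jumps from degree 1 to 3

ins, outs = in_out_events(before, after, k=1)
change = Counter(after)[1] - Counter(before)[1]
print(ins, outs, change)  # 1 2 -1
```

For every k ≥ 1 the change in the group size equals the number of "in" events minus the number of "out" events, since in-degrees never decrease.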
These transitions resemble a stochastic diffusion process. We model the relative change in sₖ(t) using a stochastic differential equation (SDE):
$$ \frac{ds_k(t)}{s_k(t)} = \omega(t)dt + \sigma(t)dB(t) $$
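where $ \omega(t) $ is the drift (the expected relative growth rate of the degree group), $ \sigma(t) $ is the volatility, and $ dB(t) $ is the increment of a standard Brownian motion $ B(t) $ representing random fluctuations in attachment patterns.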
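To get a feel for this equation, it can be discretized with the Euler-Maruyama scheme. The sketch below assumes constant drift and volatility, treats the group size as a continuous quantity, and uses illustrative parameter values that do not come from the paper; `simulate_group_size` is a hypothetical name.

```python
import math
import random

def simulate_group_size(omega, sigma, t_end, steps, rng, s0=1.0):
    """Euler-Maruyama path of ds/s = omega dt + sigma dB; returns s(t_end)."""
    dt = t_end / steps
    s = s0
    for _ in range(steps):
        dB = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        s += s * (omega * dt + sigma * dB)
    return s

rng = random.Random(1)
finals = [simulate_group_size(0.1, 0.2, 1.0, 100, rng) for _ in range(2000)]
mean_log = sum(math.log(s) for s in finals) / len(finals)
# For constant coefficients, theory gives E[log s(t)] = (omega - sigma**2 / 2) * t.
expected_mean_log = (0.1 - 0.2 ** 2 / 2) * 1.0
```

With these settings the simulated `mean_log` should land close to `expected_mean_log`, up to sampling and discretization error.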
Deriving the Double Pareto Lognormal Distribution
Solving the SDE under constant growth and volatility yields a geometric Brownian motion, implying that $ s_k(t) $ follows a lognormal (LN) distribution over time. However, since observation times are themselves exponentially distributed (due to variable message arrival rates), the compounded effect results in a double Pareto Lognormal (dPLN) distribution.
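For reference, the intermediate step can be written out. Assuming constant drift $ \omega $ and volatility $ \sigma $, applying Itô's lemma to $ \log s_k(t) $ gives the closed-form solution

$$ s_k(t) = s_k(0)\exp\left(\left(\omega - \frac{\sigma^2}{2}\right)t + \sigma B(t)\right) $$

so that, for a fixed observation time $ t $ and a fixed initial size $ s_k(0) $,

$$ \log s_k(t) \sim \mathcal{N}\left(\log s_k(0) + \left(\omega - \frac{\sigma^2}{2}\right)t,\ \sigma^2 t\right) $$

When $ t $ is itself exponentially distributed, the exponent is a Brownian motion with drift stopped at an exponential time. Such a stopped process has an asymmetric Laplace distribution, which is the standard route to the two Pareto tails.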
The dPLN probability density function is given by:
$$ f_{dPLN}(x) = \frac{\alpha\beta}{\alpha + \beta} \left[ x^{-\alpha-1} A(\alpha) \Phi\left(\frac{\log x - \mu - \alpha\sigma^2}{\sigma}\right) + x^{\beta-1} A(-\beta) \Phi^c\left(\frac{\log x - \mu + \beta\sigma^2}{\sigma}\right) \right] $$
where:
- $ A(z) = \exp(z\mu + z^2\sigma^2/2) $
- $ \Phi $ is the standard normal CDF
- $ \Phi^c $ is its complement
This distribution captures both heavy tails (via Pareto components) and central concentration (via lognormal core), making it ideal for modeling real-world Tangle data with heterogeneous connectivity.
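The density above translates directly into code. The following sketch uses only the standard library and checks numerically that the function integrates to roughly one; the parameter values are illustrative and not fitted to any Tangle data, and `dpln_pdf` and `integrate_pdf` are hypothetical names.

```python
import math

def norm_cdf(z):
    """Standard normal CDF, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def dpln_pdf(x, alpha, beta, mu, sigma):
    """dPLN density for x > 0, following the formula in the text."""
    def A(z):
        return math.exp(z * mu + 0.5 * z * z * sigma * sigma)
    log_x = math.log(x)
    upper = x ** (-alpha - 1.0) * A(alpha) * norm_cdf(
        (log_x - mu - alpha * sigma ** 2) / sigma)
    # The complement of the CDF at z equals the CDF at -z.
    lower = x ** (beta - 1.0) * A(-beta) * norm_cdf(
        -(log_x - mu + beta * sigma ** 2) / sigma)
    return alpha * beta / (alpha + beta) * (upper + lower)

def integrate_pdf(alpha, beta, mu, sigma, lo=-20.0, hi=20.0, n=4000):
    """Trapezoid rule in y = log x, where the integrand is f(e^y) * e^y."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        y = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * dpln_pdf(math.exp(y), alpha, beta, mu, sigma) * math.exp(y)
    return total * h

total_mass = integrate_pdf(alpha=3.0, beta=2.0, mu=0.0, sigma=0.5)
```

Integrating on a logarithmic grid keeps both tails well resolved. For very extreme arguments the power terms can overflow, so a production implementation would more likely work with log-densities.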
Parameter Estimation Using an EM Algorithm
Standard gradient-based optimizers like BFGS struggle with dPLN parameter estimation: the likelihood is non-convex, and unconstrained updates can violate the positivity constraints on α, β, and σ. To overcome this, we design a dedicated Expectation-Maximization (EM) algorithm. It exploits the fact that the logarithm of a dPLN variable follows a normal-Laplace (nLP) distribution, which can be written as the sum of a latent normal variable and a latent skewed-Laplace variable.
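The latent-variable structure is easiest to see by sampling. A skewed-Laplace variable can be written as the difference of two independent exponentials with rates α and β, so a dPLN draw is the exponential of a normal plus that difference. The sketch below illustrates this decomposition only; it is not the paper's EM algorithm, and the parameter values and the name `sample_dpln` are hypothetical.

```python
import math
import random

def sample_dpln(alpha, beta, mu, sigma, rng):
    """Draw one dPLN variate as exp(normal + skewed Laplace)."""
    z = rng.gauss(mu, sigma)       # latent normal component
    e1 = rng.expovariate(alpha)    # exponential with mean 1/alpha (upper tail)
    e2 = rng.expovariate(beta)     # exponential with mean 1/beta (lower tail)
    return math.exp(z + e1 - e2)

rng = random.Random(7)
alpha, beta, mu, sigma = 3.0, 2.0, 0.0, 0.5
draws = [sample_dpln(alpha, beta, mu, sigma, rng) for _ in range(20000)]
mean_log = sum(math.log(x) for x in draws) / len(draws)
# The decomposition implies E[log X] = mu + 1/alpha - 1/beta.
expected_mean_log = mu + 1.0 / alpha - 1.0 / beta
```

In an EM setting the normal and exponential components are unobserved, and the E-step replaces them with their conditional expectations given the observed value.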
EM Framework Overview
- E-Step: Compute expected values of latent variables using current parameter estimates.
- M-Step: Maximize the augmented likelihood function to update parameters in closed form.
This iterative approach avoids numerical instability and converges reliably even with poor initial guesses. Closed-form solutions for expectations and updates ensure computational efficiency.
Evaluation on Real-World IOTA Mainnet Data
We validate our model using 112 official tangle snapshots from the IOTA Foundation spanning 2016–2020. Key statistics include:
- Average tangle size: ~1.7 million messages
- In-degree distributions computed per snapshot
- Fitting evaluated via root mean squared logarithmic error (rMSLE)
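For readers unfamiliar with the metric, a common definition of rMSLE is sketched below. It assumes the usual `log(1 + x)` form; the exact variant used in the study is not stated here, so treat this as an illustration rather than the evaluation code.

```python
import math

def rmsle(predicted, observed):
    """Root mean squared logarithmic error between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [
        (math.log1p(p) - math.log1p(o)) ** 2
        for p, o in zip(predicted, observed)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical degree-group sizes: observed counts versus a model's predictions.
observed_sizes = [900, 400, 150, 40, 10]
predicted_sizes = [880, 420, 140, 45, 8]
error = rmsle(predicted_sizes, observed_sizes)
```

Because the error is computed on logarithms, a relative miss on a small degree group weighs about as much as the same relative miss on a large one, which suits heavy-tailed data.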
Model Comparison Results
| Model | Number of parameters | Average rMSLE |
|---|---|---|
| Power Law | 1 | 1.84 |
| Exponential | 1 | 1.63 |
| Lognormal | 2 | 0.55 |
| dPLN (Ours) | 4 | 0.19 |
The dPLN model achieves significantly lower rMSLE across all intervals:
- Header (k ∈ [1,2]): Best fit for low-degree nodes (~45% of total)
- Middle (k ∈ [3,5]): Accurately captures dominant clusters
- Rear (k ≥ 6): Effectively models rare high-degree hubs
Graphical comparisons confirm that only dPLN consistently aligns with empirical data across all regions.
FAQ: Frequently Asked Questions
Q: Why is the dPLN distribution more suitable than power-law for IOTA?
A: A pure power law implies scale-free behavior, with a single tail exponent governing all degrees. In contrast, IOTA's tip selection spreads approvals across tips, creating a mix of many low-degree nodes and rare high-degree ones, which is the shape dPLN captures.
Q: Can this model predict future Tangle states?
A: Yes. With estimated parameters, one can simulate future degree distributions and assess scalability under varying load conditions.
Q: How does selfish behavior affect model accuracy?
A: While parasitic chain attacks introduce anomalies, they remain statistically rare. The dPLN model remains robust under moderate deviations from ideal attachment behavior.
Q: Is the EM algorithm publicly available?
A: Yes, our implementation is open-sourced to support further research and validation.
Implications for Future Research and Development
Our theoretical model opens several avenues:
- Designing efficient Tangle simulators without full protocol emulation
- Informing consensus protocol upgrades in IOTA 2.0+
- Enhancing anomaly detection for malicious attachment patterns
- Enabling performance benchmarking under realistic network dynamics
Moreover, the success of the EM-based estimator highlights the need for domain-specific tools in blockchain analytics: generic solvers often fall short on complex, heavy-tailed distributions such as dPLN.
Conclusion
We present the first theoretical characterization of Tangle evolution in the IOTA blockchain using stochastic analysis. By identifying the double Pareto Lognormal (dPLN) distribution as the governing law of degree growth and developing a robust EM-based parameter estimation algorithm, we provide a foundational framework for understanding IOTA’s network dynamics.
This work bridges empirical observations with formal modeling, offering valuable tools for researchers and developers working at the intersection of distributed ledgers and IoT infrastructure.
Core Keywords:
- IOTA blockchain
- Tangle evolution
- Directed acyclic graph (DAG)
- Double Pareto Lognormal (dPLN)
- Stochastic differential equation (SDE)
- Expectation-Maximization (EM) algorithm
- Network dynamics
- Parameter estimation