Deepak Gupta

By Deepak Gupta | First published May 24, 2026 | Updated May 25, 2026 | Blockchain

Blockchain Fundamentals: Hash Functions in Distributed Ledgers

How hash functions make distributed ledgers possible: block identity, Merkle trees, proof-of-work, double-spend prevention, and quantum-era migration.

Strip blockchain down to its primitives and you find a hash function doing most of the work. Block identity, chain linkage, Merkle proofs, proof-of-work, transaction signatures, double-spend prevention. Remove SHA-256 from Bitcoin and there is no Bitcoin. Remove Keccak from Ethereum and there is no Ethereum. Hash functions are the load-bearing wall of every distributed ledger.

This guide walks through the five jobs hash functions do inside a blockchain, with working code, and ends on the two questions that matter most for new designs in 2026: which algorithm, and what is the quantum migration plan.

What hash functions actually do in a blockchain

Five distinct jobs. Each one fails differently when the underlying hash function breaks.

Block identity. Every block has a hash that uniquely identifies it. Change one byte of block contents, you get a completely different hash. This is why "immutable" is not marketing language: changing history would require recomputing every subsequent hash on the chain.
Chain linkage. Each block contains the hash of the previous block. That single field is the entire chain. Tamper with block 100, every block from 101 to the tip becomes invalid.
Merkle commitments. Transactions inside a block are summarised by a Merkle root. This lets a light client verify that a transaction is in a block by downloading log(n) hashes instead of every transaction.
Proof of work. Mining is the brute-force search for a nonce that pushes the block hash below a target value, which shows up as a run of leading zeros in the hash. The work is purely computational; the difficulty is set by the protocol.
Signature digests. Transactions are signed over a hash of their contents, not their raw bytes. ECDSA and Schnorr sign a fixed-size digest produced by the hash function; Ed25519 takes the message itself and hashes it internally with SHA-512.
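The chain-linkage job is small enough to sketch in a few lines. The following is a toy illustration, not any real chain's format: the dictionary layout, the `|` separator, and the helper names `link_hash`, `build_chain`, and `first_invalid` are all assumptions made for the example.

```python
from hashlib import sha256
from typing import Optional

GENESIS_PARENT = "0" * 64  # assumed placeholder for "no previous block"

def link_hash(prev_hash: str, payload: str) -> str:
    """Hash a block's payload together with its predecessor's hash."""
    return sha256(f"{prev_hash}|{payload}".encode()).hexdigest()

def build_chain(payloads: list[str]) -> list[dict]:
    """Build a toy chain in which each entry commits to the previous hash."""
    chain, prev = [], GENESIS_PARENT
    for payload in payloads:
        block = {"prev": prev, "payload": payload, "hash": link_hash(prev, payload)}
        chain.append(block)
        prev = block["hash"]
    return chain

def first_invalid(chain: list[dict]) -> Optional[int]:
    """Return the index of the first block that fails validation, or None."""
    prev = GENESIS_PARENT
    for i, block in enumerate(chain):
        if block["prev"] != prev:
            return i  # linkage to the predecessor is broken
        if block["hash"] != link_hash(block["prev"], block["payload"]):
            return i  # contents no longer match the stored hash
        prev = block["hash"]
    return None

chain = build_chain(["a pays b", "b pays c", "c pays d"])
chain[1]["payload"] = "b pays mallory"           # tamper with block 1
print(first_invalid(chain))                      # 1: the self-hash no longer matches
chain[1]["hash"] = link_hash(chain[1]["prev"], chain[1]["payload"])
print(first_invalid(chain))                      # 2: the next block's link is now stale
```

Re-hashing the tampered block only moves the failure one block forward, which is why rewriting history means recomputing every later block.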

For the cryptographic-properties primer that underwrites all five jobs, see All about hashing algorithms. For the algorithm-selection lens, see Choosing the right hash algorithm: a decision framework.

The properties blockchain relies on

Blockchain leans hardest on three of the standard cryptographic hash properties.

Collision resistance. The most load-bearing property. If two different transactions could hash to the same digest, signatures over that digest would be ambiguous, double-spends would become trivially constructable, and Merkle proofs would lie. SHA-1 fell here in 2017, which is why no serious chain uses it.
Preimage resistance. An attacker who sees a block hash should not be able to work backwards to block contents that produce it. In a proof-of-work context, this is what forces miners to guess and check: there is no shortcut from a target hash back to a nonce.
Avalanche effect. Flipping one bit of input flips roughly half the output bits. This is what makes brute-force mining computationally meaningful: an attacker cannot incrementally improve a near-match nonce; each nonce is independent.
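The avalanche effect is easy to observe directly. The sketch below flips each bit of a sample message in turn and counts how many of the 256 output bits change; the message and the helper names `flip_bit` and `bit_difference` are illustrative assumptions.

```python
from hashlib import sha256

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of `data` with exactly one bit flipped."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)

def mean_bits_changed(message: bytes) -> float:
    """Average number of digest bits that change over all single-bit flips."""
    original = sha256(message).digest()
    diffs = [
        bit_difference(original, sha256(flip_bit(message, i)).digest())
        for i in range(len(message) * 8)
    ]
    return sum(diffs) / len(diffs)

print(f"{mean_bits_changed(b'block header bytes'):.1f} of 256 bits changed on average")
```

The average should land close to 128, with no visible relationship between which input bit was flipped and which output bits moved.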

SHA-256: the Bitcoin canon

SHA-256 is the most-used hash function in the cryptocurrency space. Bitcoin uses it twice (SHA-256(SHA-256(x))) for block headers, transaction IDs, and the Merkle tree. The reasons are historical (it was the gold standard when Bitcoin was designed in 2008) and structural (it remains unbroken, is hardware-accelerated, and has decades of cryptanalysis behind it).

from hashlib import sha256

def block_hash(block_data: str) -> str:
    """SHA-256 hash of serialised block data."""
    return sha256(block_data.encode()).hexdigest()
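The double hash mentioned above can be sketched as follows. The helper names `sha256d` and `display_hash` are assumptions for illustration; the detail worth noticing is that the second pass hashes the raw 32-byte digest of the first pass, not its hex string.

```python
from hashlib import sha256

def sha256d(data: bytes) -> bytes:
    """Double SHA-256: hash the raw digest of the first pass a second time."""
    return sha256(sha256(data).digest()).digest()

def display_hash(data: bytes) -> str:
    """Hex form with the byte order reversed, as Bitcoin tools display hashes."""
    return sha256d(data)[::-1].hex()

print(sha256d(b"hello").hex())
print(display_hash(b"hello"))
```

Hashing the hex text of the first digest instead of its bytes is a common slip and produces a different, incompatible value.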

The full case for SHA-2 across output sizes lives in SHA-2 family: SHA-256 vs SHA-384 vs SHA-512.

Why not MD5 or SHA-1

Both are broken for collision resistance. MD5 collisions can be produced in seconds on a laptop, so a blockchain built on it would be trivially forgeable. SHA-1 fell to Google and CWI's SHAttered collision in 2017, and the 2020 "SHA-1 is a Shambles" work put a chosen-prefix collision at roughly $45,000 of rented GPU time. Background reading: MD5: uses and vulnerabilities and SHA-1 legacy migration.

Why Ethereum uses Keccak

Ethereum uses Keccak-256, the original Keccak submission, which differs from the standardised SHA3-256 only in its padding. The choice was a 2014 hedge against future SHA-2 weaknesses. Keccak-256 and SHA3-256 offer the same security, but they produce different digests for the same input, so libraries for one cannot be swapped in for the other. The broader lesson is that each chain inherits the cryptographic choices of its founders. Deep dive: SHA-3 / Keccak deep-dive.

Merkle trees, the second key primitive

A Merkle tree hashes pairs of values bottom-up until a single root remains. Two practical superpowers fall out:

Light verification. A node can prove a transaction is in a block by sending log(n) sibling hashes (a Merkle proof), not the full block.
Tamper localisation. If the root changes, you know something in the tree changed. You can binary-search the tree to find which leaf differs.

from hashlib import sha256

def merkle_root(items: list[str]) -> str:
    if not items:
        return sha256(b"").hexdigest()
    layer = [sha256(x.encode()).hexdigest() for x in items]
    while len(layer) > 1:
        if len(layer) % 2 == 1:
            layer.append(layer[-1])  # duplicate last on odd count (Bitcoin convention)
        layer = [
            sha256((a + b).encode()).hexdigest()
            for a, b in zip(layer[::2], layer[1::2])
        ]
    return layer[0]
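The light-verification property can be sketched with a proof builder and a verifier that follow the same conventions as `merkle_root` above (hex-string concatenation, duplicate the last node on odd layers). The function names `merkle_proof` and `verify_proof` and the `(sibling_hash, sibling_is_left)` proof format are assumptions for this example, not a wire format used by any chain.

```python
from hashlib import sha256

def _h(text: str) -> str:
    return sha256(text.encode()).hexdigest()

def merkle_proof(items: list[str], index: int) -> tuple[str, list[tuple[str, bool]]]:
    """Return (root, proof) for items[index].

    Each proof step is (sibling_hash, sibling_is_left).
    """
    if not 0 <= index < len(items):
        raise IndexError("index out of range")
    layer = [_h(x) for x in items]
    proof = []
    while len(layer) > 1:
        if len(layer) % 2 == 1:
            layer.append(layer[-1])          # duplicate last node on odd count
        sibling = index ^ 1                  # the other member of this pair
        proof.append((layer[sibling], sibling < index))
        layer = [_h(a + b) for a, b in zip(layer[::2], layer[1::2])]
        index //= 2                          # position of the parent node
    return layer[0], proof

def verify_proof(item: str, proof: list[tuple[str, bool]], root: str) -> bool:
    """Recompute the path from a leaf to the root using only sibling hashes."""
    node = _h(item)
    for sibling, sibling_is_left in proof:
        node = _h(sibling + node) if sibling_is_left else _h(node + sibling)
    return node == root

txs = [f"tx{i}" for i in range(8)]
root, proof = merkle_proof(txs, 5)
print(len(proof), verify_proof("tx5", proof, root))   # 3 True
```

For eight transactions the verifier needs three sibling hashes rather than all eight leaves, and the saving grows with block size.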

Outside blockchain, Merkle trees also power Git, Cassandra anti-entropy, BitTorrent, and content-addressable storage (IPFS).

A minimal block structure

The smallest useful block carries an index, the previous-block hash, a Merkle root over its transactions, a timestamp, a nonce for proof-of-work, and its own hash.

from dataclasses import dataclass, field
from hashlib import sha256
from time import time

@dataclass
class Block:
    index: int
    transactions: list[str]
    previous_hash: str
    timestamp: float = field(default_factory=time)
    nonce: int = 0
    merkle_root: str = field(init=False)
    hash: str = field(init=False)

    def __post_init__(self):
        self.merkle_root = merkle_root(self.transactions)
        self.hash = self.compute_hash()

    def compute_hash(self) -> str:
        header = f"{self.index}|{self.merkle_root}|{self.timestamp}|{self.previous_hash}|{self.nonce}"
        return sha256(header.encode()).hexdigest()

The header serialisation is deliberate: pipe-separated, fixed field order, no JSON ambiguity. Real chains use compact binary serialisations (Bitcoin's block header is exactly 80 bytes) for the same determinism reason.
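A fixed-width binary header can be sketched with the standard `struct` module. The layout below mirrors the field order and widths of Bitcoin's 80-byte header (version, previous hash, Merkle root, time, bits, nonce, all little-endian), but it is an illustration rather than consensus code; `pack_header` and `unpack_header` are hypothetical helper names.

```python
import struct

# "<" = little-endian with no padding: int32, 32 bytes, 32 bytes, three uint32.
HEADER_FORMAT = "<i32s32sIII"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 4 + 32 + 32 + 4 + 4 + 4 = 80

def pack_header(version: int, prev_hash: bytes, merkle_root: bytes,
                timestamp: int, bits: int, nonce: int) -> bytes:
    """Serialise header fields into exactly 80 bytes."""
    # struct silently pads or truncates "32s" fields, so check lengths first.
    if len(prev_hash) != 32 or len(merkle_root) != 32:
        raise ValueError("hash fields must be exactly 32 bytes")
    return struct.pack(HEADER_FORMAT, version, prev_hash, merkle_root,
                       timestamp, bits, nonce)

def unpack_header(raw: bytes) -> tuple:
    """Inverse of pack_header; raises struct.error on a wrong-sized input."""
    return struct.unpack(HEADER_FORMAT, raw)

header = pack_header(1, bytes(32), bytes(32), 1_700_000_000, 0x1D00FFFF, 0)
print(len(header))  # 80
```

Because every field has a fixed width and position, two nodes that agree on the field values always produce the same bytes, and therefore the same hash.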

Proof-of-work mining

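Mining is a brute-force search for a nonce that drops the block hash below a difficulty target. Bitcoin encodes that target as a 256-bit number stored in compact form in the header; "n leading zeros" is a coarser way of expressing the same idea, and it is the one used here because it is easier to read.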

def mine(block: Block, difficulty: int) -> Block:
    """Search for a nonce whose hash has `difficulty` leading hex zeros."""
    target = "0" * difficulty
    while not block.hash.startswith(target):
        block.nonce += 1
        block.hash = block.compute_hash()
    return block
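Counting leading hex zeros only allows difficulty to change in steps of 16x. A numeric target gives finer control, and the sketch below shows the idea under some stated simplifications: it uses a single SHA-256 and reads the digest as a big-endian integer, whereas Bitcoin uses the double hash and a little-endian interpretation. `bits_to_target` follows Bitcoin's compact exponent-and-mantissa encoding (ignoring its sign bit); `meets_target` and `mine_bytes` are hypothetical helper names.

```python
from hashlib import sha256

def bits_to_target(bits: int) -> int:
    """Expand a compact target: top byte is a base-256 exponent, the rest a mantissa."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF  # the sign bit of the encoding is ignored here
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))

def meets_target(header: bytes, target: int) -> bool:
    """True if the header's digest, read as an integer, does not exceed the target."""
    return int.from_bytes(sha256(header).digest(), "big") <= target

def mine_bytes(prefix: bytes, target: int) -> int:
    """Try nonces in order until prefix + nonce hashes at or below the target."""
    nonce = 0
    while not meets_target(prefix + nonce.to_bytes(8, "big"), target):
        nonce += 1
    return nonce

easy_target = (1 << 244) - 1            # 12 leading zero bits, ~4096 tries on average
nonce = mine_bytes(b"toy header|", easy_target)
print(nonce, meets_target(b"toy header|" + nonce.to_bytes(8, "big"), easy_target))
```

In this framing, a difficulty of d leading hex zeros corresponds to a target of 2**(256 - 4*d) - 1, and verification is a single hash plus one integer comparison.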

The economic genius of proof-of-work is that the work is asymmetric: hard to produce, trivial to verify. Anyone can check a mined block in microseconds; producing it took the network's entire collective compute. That asymmetry is the foundation of Nakamoto consensus.

Double-spend prevention and chain validation

The hash chain itself is the double-spend defence. A transaction appears in exactly one block; that block has a hash; the next block commits to that hash; the chain advances. To double-spend, an attacker would have to rewrite the chain from the conflicting block forward, which means out-mining the rest of the network from that point.

def verify_chain(chain: list[Block]) -> bool:
    for i, current in enumerate(chain):
        # Linkage applies to every block except the first (genesis) block.
        if i > 0 and current.previous_hash != chain[i - 1].hash:
            return False  # broken linkage
        if current.hash != current.compute_hash():
            return False  # broken self-hash
        if current.merkle_root != merkle_root(current.transactions):
            return False  # broken merkle commitment
    return True

For the broader integrity-verification pattern (checksums, signed manifests, content addressing), see Data integrity verification.

Performance engineering

The dominant performance choices in production chains are about how the hash is invoked, not which hash to invoke.

Hardware acceleration

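SHA-256 has dedicated CPU instructions on most current hardware. The x86 SHA Extensions (SHA-NI) first appeared in Intel's low-power Goldmont cores in 2016, reached AMD with Zen in 2017, and arrived in mainstream Intel cores with Ice Lake in 2019; ARMv8's optional Cryptography Extensions have shipped in silicon since 2013. On hardware with the extension, hashing throughput is typically around 3-5x faster than the software path. Use a library that detects and uses the instructions at runtime, such as OpenSSL (which backs Python's hashlib on most builds) or the Rust `sha2` crate, and verify the behaviour of any other library before relying on it.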

Parallel Merkle construction

The Merkle tree's bottom layer can be hashed entirely in parallel. Each pair on the second layer depends only on its two children. Multi-threaded Merkle construction is straightforward and worth the engineering on chains with large blocks.

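from concurrent.futures import ProcessPoolExecutor
from hashlib import sha256

def leaf_hash(item: str) -> str:  # module-level so it can be pickled; a lambda cannot
    return sha256(item.encode()).hexdigest()

def parallel_leaf_hashes(items: list[str]) -> list[str]:
    with ProcessPoolExecutor() as ex:
        # chunksize batches items so inter-process overhead does not dwarf the hashing
        return list(ex.map(leaf_hash, items, chunksize=1024))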

Caching

A transaction's hash is a pure function of its serialised bytes, so it never changes once the transaction is final. Compute it once at the mempool layer and reuse it; never recompute it on the consensus path. For a long-running process, an LRU cache around the transaction-hash function is usually worth it.
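A minimal sketch of that cache, using the standard library's `functools.lru_cache`. The function name `tx_hash` and the cache size are assumptions; a real node would size the cache against its mempool limits.

```python
from functools import lru_cache
from hashlib import sha256

@lru_cache(maxsize=100_000)  # assumed bound; tune to the expected mempool size
def tx_hash(raw_tx: bytes) -> str:
    """Hash a serialised transaction; bytes are hashable, so they can key the cache."""
    return sha256(raw_tx).hexdigest()

tx_hash(b"serialised transaction bytes")   # computed
tx_hash(b"serialised transaction bytes")   # served from the cache
print(tx_hash.cache_info())
```

The cache key is the full transaction bytes, so memory use scales with transaction size as well as count. Keying on an object identifier instead is a possible refinement when transactions are large.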

Alternative hash choices for non-Bitcoin chains

BLAKE3 is often quoted as 5-10x faster than SHA-256 in software on modern hardware, though the gap narrows on CPUs with SHA-NI. It parallelises across cores by design and supports incremental hashing. Several newer chains (and storage layers under chains) use it. The trade-off is ecosystem support: every existing block explorer, wallet, and SDK assumes SHA-256 or Keccak. Deep dive: BLAKE2 and BLAKE3.

Quantum and the decade ahead

The quantum threat to hash functions is real but narrow. Grover's algorithm gives a quadratic speedup for preimage search, which effectively halves the security margin of a 256-bit hash to 128 bits. Still safe by modern standards. The collision-resistance side (which is what most blockchain operations rely on) is far less affected.

The bigger quantum risk for blockchains is not the hash, it is the signature scheme. ECDSA over secp256k1 (Bitcoin) and Ed25519 (most newer chains) are both broken by sufficiently large quantum computers running Shor's algorithm. The migration path is post-quantum signatures. NIST standardised two in 2024: ML-DSA (formerly Dilithium, lattice-based, FIPS 204) and SLH-DSA (formerly SPHINCS+, hash-based, ironically, FIPS 205). Falcon, also lattice-based, was selected alongside them, but its standard (FN-DSA) had not been finalised at that point.

What a 2026 chain designer should plan for:

Keep using SHA-256 (or upgrade to SHA-384 for new long-lived chains).
Treat the signature scheme as the migration risk, not the hash function.
Build algorithm-agility into the wire format: every signature carries an algorithm identifier, transitions are protocol-version events.
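One way to picture algorithm agility in a wire format is a tagged signature envelope. Everything in this sketch is hypothetical: the numeric identifiers, the `ALGORITHMS` and `ALLOWED_BY_VERSION` tables, and the 1-byte id plus 2-byte length layout are illustrative choices, not taken from any standard or existing chain.

```python
import struct
from dataclasses import dataclass

# Hypothetical registry of signature algorithms and the ids that name them.
ALGORITHMS = {1: "ecdsa-secp256k1", 2: "ed25519", 3: "ml-dsa-65"}
# Hypothetical policy: which algorithm ids each protocol version accepts.
ALLOWED_BY_VERSION = {1: {1, 2}, 2: {1, 2, 3}}

@dataclass(frozen=True)
class TaggedSignature:
    alg_id: int
    signature: bytes

    def encode(self) -> bytes:
        """1-byte algorithm id, 2-byte big-endian length, then the signature."""
        return struct.pack(">BH", self.alg_id, len(self.signature)) + self.signature

    @classmethod
    def decode(cls, blob: bytes, protocol_version: int) -> "TaggedSignature":
        """Parse an envelope, rejecting ids the given protocol version does not allow."""
        if len(blob) < 3:
            raise ValueError("envelope too short")
        alg_id, length = struct.unpack(">BH", blob[:3])
        if alg_id not in ALLOWED_BY_VERSION.get(protocol_version, set()):
            raise ValueError(f"algorithm {alg_id} not allowed in version {protocol_version}")
        body = blob[3:]
        if len(body) != length:
            raise ValueError("signature length mismatch")
        return cls(alg_id, body)

blob = TaggedSignature(2, b"\x01" * 64).encode()
print(TaggedSignature.decode(blob, protocol_version=1))
```

The verifier dispatches on `alg_id` instead of assuming a scheme, so adding a post-quantum algorithm becomes a new table entry activated at a protocol version rather than a change to the transaction format.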

Long-form treatment: The future of hashing: quantum resistance and beyond.

Scalability with hashes

Hashes also do most of the work in modern scalability designs.

Consistent hashing for sharding. The simplest mapping is transaction hash modulo shard count, but changing the shard count then remaps almost every transaction. A consistent-hashing ring keeps most assignments in place when shards are added or removed.
State channels and rollups. Compress N off-chain transactions into one on-chain commitment, typically a Merkle state root. Optimistic rollups (Arbitrum, Optimism) and ZK rollups (zkSync, StarkNet) both lean on hash trees to commit to state.
Hash-based payment channels. Lightning Network's HTLCs (hashed time-locked contracts) gate payments on revealing a preimage. Pure hash function plumbing.
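The hashlock half of an HTLC reduces to a preimage check plus a deadline. The sketch below models only that logic in plain Python, with hypothetical function names and integer timestamps; real HTLCs are enforced by on-chain scripts, not application code.

```python
import hmac
import secrets
from hashlib import sha256

def make_hashlock(preimage: bytes) -> bytes:
    """The receiver publishes this hash and keeps the preimage secret."""
    return sha256(preimage).digest()

def can_claim(hashlock: bytes, candidate: bytes, now: int, expiry: int) -> bool:
    """Funds are claimable before expiry by anyone who reveals the preimage."""
    matches = hmac.compare_digest(sha256(candidate).digest(), hashlock)
    return now < expiry and matches

def can_refund(now: int, expiry: int) -> bool:
    """After expiry the sender can take the funds back."""
    return now >= expiry

secret = secrets.token_bytes(32)
lock = make_hashlock(secret)
print(can_claim(lock, secret, now=100, expiry=200))        # True
print(can_claim(lock, b"wrong guess", now=100, expiry=200))  # False
```

Preimage resistance is what makes this safe: seeing the hashlock on every hop of a payment route reveals nothing useful about the secret until the receiver chooses to disclose it.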

def shard_for(transaction_hash: str, num_shards: int) -> int:
    return int(transaction_hash, 16) % num_shards
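The modulo mapping above reshuffles most transactions whenever `num_shards` changes. A consistent-hashing ring avoids that, and a minimal sketch looks like the following; the `HashRing` class, the shard names, and the choice of 64 virtual nodes per shard are assumptions for illustration.

```python
import bisect
from hashlib import sha256

def _point(key: str) -> int:
    """Map a string to a position on a 64-bit ring."""
    return int.from_bytes(sha256(key.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, shards: list[str], vnodes: int = 64):
        if not shards:
            raise ValueError("at least one shard is required")
        # Each shard owns several ring positions to even out the load.
        self._ring = sorted(
            (_point(f"{shard}#{v}"), shard) for shard in shards for v in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    def shard_for(self, transaction_hash: str) -> str:
        """Walk clockwise from the key's position to the next shard point."""
        i = bisect.bisect_right(self._points, _point(transaction_hash))
        return self._ring[i % len(self._ring)][1]  # wrap around past the last point

keys = [sha256(str(i).encode()).hexdigest() for i in range(2000)]
before = HashRing(["s0", "s1", "s2", "s3"])
after = HashRing(["s0", "s1", "s2", "s3", "s4"])
moved = sum(before.shard_for(k) != after.shard_for(k) for k in keys)
print(f"{moved} of {len(keys)} keys moved after adding a shard")
```

Adding a fifth shard moves roughly a fifth of the keys, and every key that moves lands on the new shard. With the modulo mapping, going from four to five shards would move about four fifths of them.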

Key takeaways

Hash functions are the load-bearing wall of every distributed ledger. Get the choice and the implementation right.
SHA-256 remains the canonical choice; SHA-384 for long-lived new chains; Keccak when the ecosystem uses it.
Merkle trees turn O(n) verification into O(log n). They are the second key primitive after the hash function itself.
Performance lives in hardware intrinsics, parallel Merkle construction, and aggressive caching of immutable hashes.
The 2026 migration risk is the signature scheme (Shor), not the hash function (Grover). Plan algorithm-agility.

FAQ

Why does Bitcoin double-hash with SHA-256?

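Satoshi never documented the reason, but the double SHA-256 ("SHA-256d") is widely read as a belt-and-braces defence against length-extension attacks on the underlying Merkle-Damgård construction. Length extension is a property of the construction itself, so single SHA-256 is still susceptible to it; it simply has little bearing on how Bitcoin uses hashes. Either way, the double hash is baked into the protocol and removing it would be a hard fork.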

Could quantum computers break Bitcoin tomorrow?

No. The quantum threat is gradual. Cryptographically relevant quantum computers are commonly estimated to be a decade or more away, and even then the signature scheme (ECDSA) breaks before the hash (SHA-256). The realistic risk window is the post-quantum signature migration, not a sudden hash collapse.

What's the right hash function for a new blockchain in 2026?

SHA-256 if you want ecosystem compatibility (wallets, block explorers, SDKs). BLAKE3 if you want raw performance and control your own ecosystem. Keccak-256 if you need to interoperate with Ethereum-style tooling (note that this is the original Keccak, not the standardised SHA3-256). Avoid MD5 and SHA-1 entirely.

Are Merkle trees only for blockchain?

No. Git is essentially a Merkle DAG (every commit ID is a hash over its tree plus its parents). Cassandra and Amazon's original Dynamo design use Merkle trees for anti-entropy. BitTorrent v2 uses them for piece verification. IPFS is built around Merkle-linked content addressing. The pattern is older than Bitcoin: Ralph Merkle described it in 1979.

Why is SHA-256 considered safe when it is so widely used?

Widely used is exactly why. SHA-256 has been under continuous cryptanalysis since 2002, with billions of dollars of Bitcoin mining incentives to find a weakness. None has been found. The combination of standardisation, hardware acceleration, and a 20-year clean track record is what makes it the safe default in 2026.

How do I keep my chain design quantum-resistant without overengineering?

Two practical moves. First, use SHA-384 instead of SHA-256 for any new chain expected to operate past 2035. Second, design your signature scheme as a versioned field so you can ship a post-quantum signature upgrade as a protocol-version event without rewriting the chain. Hash agility is cheap; signature agility is the one that matters.
