Love and Suffering of Archive Nodes

Matthias, also known as "Ghosty", knows a lot about full nodes – including how they strain hard drives, CPUs, and memory. In a guest post, he shares his experiences.

By Ghosty

Operating an archive node for blockchains like Ethereum or Polygon is labor-intensive and costly. It's more or less the pinnacle of node setups. The node requires good hardware, maintenance (good monitoring), and time – at least whenever monitoring kicks in or a new release is due. Typically, one only takes this on for a specific reason.

An archive node is more than a full node. A full node "only" stores the entire blockchain, whereas an archive node additionally stores every state that each smart contract and transaction ever had (i.e., at every block height). However, the additional data of an archive node can be reconstructed from the data of a full node.

Since this often leads to discussions on Twitter or in online forums, I like to use a Bitcoin analogy: an archive node for Ethereum is akin to a Bitcoin node that stores the UTXO (unspent transaction output) set valid at every block height. This is not particularly interesting for Bitcoin because (a) this data can be reconstructed much faster and (b) it does not hold any special information. It's different with Ethereum, because there are smart contracts whose intermediate states are stored, making it possible to directly query, for example, how much USDT an account held at any given time (block height).
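
To make that concrete, here is a minimal sketch of the kind of query only an archive node can answer: the JSON-RPC `eth_call` request for an ERC-20 `balanceOf` at a historical block. The token address is USDT on mainnet, but the account address and block number are placeholders for illustration; the code only builds the request, it does not send it.

```python
import json

# Sketch: build (not send) the JSON-RPC request asking "how much of this
# token did this account hold at block N?". Answering it for an arbitrary
# old block requires an archive node; a full node only keeps recent state.
BALANCE_OF_SELECTOR = "0x70a08231"  # first 4 bytes of keccak256("balanceOf(address)")

def erc20_balance_call(token: str, account: str, block_number: int) -> dict:
    # ABI encoding: 4-byte selector + the address left-padded to 32 bytes
    arg = account.lower().removeprefix("0x").rjust(64, "0")
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "eth_call",
        "params": [{"to": token, "data": BALANCE_OF_SELECTOR + arg},
                   hex(block_number)],
    }

req = erc20_balance_call(
    "0xdAC17F958D2ee523a2206206994597C13D831ec7",  # USDT contract (mainnet)
    "0x0000000000000000000000000000000000000001",  # placeholder account
    15_000_000,                                    # some historical block
)
print(json.dumps(req))
```

Sent to a non-archive node, such a call for a sufficiently old block simply errors out with "missing trie node" or similar, because the historical state has been pruned.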

I synchronized my first archive node for the Ethereum ecosystem in 2019. It was out of pure curiosity: I had purchased a lot of server hardware for a project (96 cores, 768 GiB RAM, over 40 TiB SSD storage) and wanted to push it to the limit. So I synchronized an archive node for Ethereum using Geth, which took exactly 7 days and 22 hours. I still remember it so precisely because I discussed the topic on coinforum.de and documented my progress.

Long-term, better reasons are needed. That's a significant difference between Ethereum and Bitcoin. Bitcoiners synchronize the entire blockchain out of idealism. In Ethereum, most users rely on node providers like Infura or Alchemy. These providers allow roughly 100,000 free API calls, sufficient for smaller projects and wallets.

When and for whom is an archive node useful?

It becomes critical when one wants or needs to conduct blockchain analyses. If, for instance, you want to know which accounts held what amounts of a token on Ethereum at any time, you need to walk the blockchain block by block to find all addresses and then query the historical balance for each ERC-20 transaction. This quickly exceeds the free API calls provided by Alchemy and Infura and can become expensive fast. In such cases, an archive node is worthwhile, because scanning all blocks leads to at least as many API calls as there are blocks in the blockchain. Ethereum currently has over 20 million blocks.
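
A back-of-envelope calculation shows why this adds up. The per-call price below is a made-up illustration, since provider pricing varies and a real scan needs more than one call per block:

```python
# Rough cost estimate for scanning every block via a hosted provider.
# The price per 1,000 calls is hypothetical; real pricing varies, and a
# full scan needs at least one call per block, usually several.
blocks = 20_000_000        # Ethereum mainnet has over 20 million blocks
free_calls = 100_000       # rough free tier mentioned above
price_per_1000 = 0.10      # assumed: 10 cents per 1,000 paid calls

paid_calls = max(0, blocks - free_calls)
cost_usd = paid_calls / 1000 * price_per_1000
print(f"{paid_calls:,} paid calls -> about ${cost_usd:,.0f}")
```

Even under these charitable assumptions, a single full-chain pass lands in the four-figure range – and analyses are rarely done in a single pass.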

I now operate such nodes for clients. The reasons can sometimes be absurd. For instance, a company offers assistance in complying with EU regulations, which require demonstrating the CO2 consumption of each crypto asset. If you have a token on Ethereum, you must trace back all past transactions and swaps to calculate the gas used by the token. The gas corresponds to the required computational power and thus serves as an indicator of CO2 consumption.

This is, of course, ludicrous. As a proof-of-stake blockchain, Ethereum consumes very little energy, and synchronizing an archive node likely generates far more CO2 than you could ever save through the regulation (if at all). But it's not my job to ask such questions, but rather to set up nodes.

What resources are needed to operate an Ethereum archive node?

The differences between nodes and clients can be substantial. I have experience with Ethereum, Polygon, and Solana.

Using the Nethermind client, Ethereum requires about 15 terabytes of storage for the execution-layer data, 8 CPU cores (preferably 16 for synchronization), and 128 gigabytes of RAM. An alternative client is Erigon, which requires only three terabytes. Erigon stores data differently than Nethermind, using a flat architecture that only saves deltas between blocks. This may result in slightly longer query times for historical data but saves a lot of space.

For a server, the demands are moderate. Once synchronized, less is needed. The biggest challenge is the hard drives, or rather the SSDs. To understand why, you must know how smart contracts work in the Ethereum Virtual Machine (EVM). I'll keep it brief: each smart contract has its own virtual memory space, which may load different parts depending on the account. This leads to numerous read and write operations (I/O) with each new block and during synchronization.

Therefore, traditional hard drives (HDDs) are out of the question; SSDs are needed. Users of Nethermind must chain several SSDs together to store at least 15 terabytes. This can quickly cost a four-figure amount if done with AWS. With Erigon, you could theoretically synchronize an archive node on a high-performance PC.

Polygon and Solana

And Ethereum is still relatively resource-efficient! Other EVM-compatible blockchains like Polygon, or non-EVM-compatible ones like Solana, face similar problems: to synchronize, state after state must be calculated and stored, and every smart contract requires numerous I/O operations.

Imagine what happens if you shorten block times, as with Polygon: even faster drives are needed to handle the requests. While 6,000 IOPS suffice for Ethereum, 20,000 IOPS are recommended for synchronizing with Erigon on Polygon. This further drives up costs, likely amounting to 8,000 euros a month if adhering to the system requirements for an archive node.
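
A naive way to see why shorter block times drive up I/O: if state updates arrive with every block, the update rate scales with blocks per second. The sketch below is a deliberately crude linear model using approximate block times; real clients batch writes, which is why the actual recommendation (20,000 IOPS) sits below the naive bound:

```python
# Crude linear model: I/O load scales with blocks per second.
# Block times are approximate; clients batch writes, so this is only an
# upper-bound intuition, not a sizing formula.
eth_block_time_s = 12.0     # Ethereum: one block every ~12 seconds
polygon_block_time_s = 2.0  # Polygon PoS: roughly one every ~2 seconds
eth_iops = 6_000            # sufficient for Ethereum, per the text

naive_polygon_iops = eth_iops * (eth_block_time_s / polygon_block_time_s)
print(naive_polygon_iops)   # 36000.0 -- batching brings the real need to ~20,000
```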

Erigon is typically used as the client for Polygon archive nodes. While it needs only 3 terabytes of storage for Ethereum, it requires 10 terabytes for Polygon. And Polygon is significantly younger than Ethereum. So one can already project how resource consumption will look in the future – and how much storage a Polygon archive node would hypothetically need with a different client.

Solana is even more extreme. I once operated two nodes, for mainnet and testnet, to participate in the Solana grant program. These were not archive nodes but validators. Even validators require a CPU with at least 12 cores, and at least 128 GB of RAM for the testnet and 256 GB for the mainnet (for context: at least one Solana outage was caused because too many validators had "only" 128 GB of RAM). The network connection should also be good, with system requirements calling for at least 1 Gbit/s synchronous bandwidth (two validators bear a constant load of about 300 Mbit/s), with a monthly traffic volume of around 100 TiB per validator. Such infrastructure is hard to find in many places in Germany.

A regular Solana node already needs 2 terabytes, and an archive node, according to online sources, would need about 100 terabytes. The costs are enormous.
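
To put "enormous" into a rough number – the per-terabyte price below is an assumption for illustration, not a real quote:

```python
# Hypothetical monthly storage bill for a Solana archive node.
# The EUR/TB price is assumed; actual NVMe/cloud pricing varies widely.
archive_tb = 100           # ~100 TB, per the online sources cited above
eur_per_tb_month = 25      # assumption: managed fast-storage price

monthly_eur = archive_tb * eur_per_tb_month
print(monthly_eur)  # 2500 -- for storage alone, before CPU, RAM, and traffic
```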

Can this work out in the long run?

There are doubts about whether this can work long term. Personally, I believe in technological progress. For instance, when I synchronized my Geth node in 2019, it took a good week. Today, with more powerful hardware, it takes hardly any longer, despite the blockchain being much larger. This is due not only to better hardware but also to improved algorithms, such as the block processing in Nethermind v1.26, which is 30 to 50 percent faster.

Hardware and software are becoming increasingly powerful, particularly for servers. And even today, hypothetically investing 250,000 euros would be enough to operate an archive node for Solana for the next ten years. As long as there's a reason and a business model, someone will do it. So I'm not worried that archive nodes will cease to exist.

It's not enough, however, to simply invest more money into hardware for scaling. Once a blockchain becomes widespread, it hits scalability limits of its own. Without many accounts, many projects easily achieve a five- or six-figure transactions-per-second rate in synthetic benchmarks. This is because accounts and current states easily fit in the CPU cache, and benchmarks are often optimized for parallelism for marketing purposes.

But with more users, as with Bitcoin, things get tight. Even there, with only 7 transactions per second, it becomes critical at 4 gigabytes of RAM because the UTXO set is so large. This state is even larger and more complex in other blockchains. That's why one shouldn't believe the lab promises of blockchain developers. It gets more challenging in practice.

Moreover, there's a natural barrier: latency. Data traffic requires time to travel from one node to another, with the speed of light as a physical limit. A research paper ("Information Propagation in the Bitcoin Network") concluded that a block interval of at least 12.6 seconds is required for a block to propagate to most nodes worldwide. This is where Ethereum's block interval comes from. More modern blockchains like Solana have much shorter intervals, leading nodes to concentrate mainly in Europe, the USA, and data centers.
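
The physical floor is easy to estimate: even at the speed of light in vacuum, a one-way trip halfway around the Earth takes tens of milliseconds – and fiber (roughly two-thirds of c), routing hops, and block validation add considerably more, which is why propagation to most nodes takes seconds rather than milliseconds:

```python
# Lower bound: one-way trip halfway around the Earth at light speed in vacuum.
# Real networks are far slower: fiber carries light at roughly 2/3 c, and
# routing, validation, and retransmission add on top of that.
EARTH_CIRCUMFERENCE_KM = 40_075
SPEED_OF_LIGHT_KM_S = 299_792

one_way_ms = (EARTH_CIRCUMFERENCE_KM / 2) / SPEED_OF_LIGHT_KM_S * 1000
print(round(one_way_ms, 1))  # ~66.8 ms in vacuum; in fiber roughly 100 ms
```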

This brings us back to the old question of decentralization versus scalability. With only a few nodes close together, scaling is easier. But you don't have to be a Bitcoin maximalist to wish for full and archive nodes spread around the world.

And no one notices …

Recently, I synchronized another Ethereum (Nethermind) and a Polygon (Erigon) archive node. Both clients had bugs preventing synchronization as an archive node. While a bug had sneaked into Nethermind due to the optimization mentioned above – quickly fixed after a brief apology ("well, we don't optimize for running as an archive node") – the bug in Erigon remained open, even though a developer had been assigned to it. It likely isn't considered very important.

The bugs were caused by upgrades altering block processing. They aren't noticed when an archive node is already running, nor when synchronizing a regular node. It's only when you go through the trouble of synchronizing a new archive node that you notice them – sometimes after weeks, depending on the server.

The fact that these bugs existed for weeks shows how rarely someone synchronizes a new archive node. This may lead to such errors going unnoticed for a long time, potentially rendering synchronization of new archive nodes impossible without anyone realizing it – precisely because it happens so rarely.

In my case, things worked out well. I reported the issue to Nethermind and helped the developers fix it. For Erigon, the error was already known but the fix was scheduled for a future version when it hit me. I'm now using an older version – the impact is less significant for Polygon. However, if such a bug digs in deeper because archive nodes are synchronized ever less frequently and the CI/CD of node development isn't improved, it might not be as easy to fix.

So, it remains possible – but a challenge.
