Checkpoint Sync Safety
By Adrian Sutton
Apart from being awesomely fast, checkpoint sync also exists to ensure that you can safely sync despite the limitations of weak subjectivity. The initial state you use is considered trusted - you are telling your beacon node that this state is the canonical chain and it should ignore all others. So it’s important to ensure you get the right state.
Get It From Somewhere You Trust
The simplest and best way to ensure the state is right is to get it from somewhere trusted. There are a few options.
The best source is one of your own nodes. For setups that run multiple nodes that’s easy, as they can get an initial state for a new node from any of their existing nodes. Even people running single nodes may be able to use a state from their own node in some cases. For example, if you need to re-sync your node or are switching clients, you can store the current finalized state from your node before stopping it, then use that as the initial state for your new sync. In both these cases the solution is completely trustless - you’re using data from your own node so there’s no need to trust anyone else and no need for further verification.
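As a rough sketch of saving the finalized state from your own node before stopping it, the snippet below assumes the node exposes the standard beacon REST API debug endpoint for states and will return raw SSZ bytes when asked for application/octet-stream. The base URL, port, and output file name are assumptions for illustration, and the exact path version may differ between clients and releases, so check your client’s documentation.

```python
import urllib.request


def finalized_state_request(base_url: str) -> urllib.request.Request:
    """Build a request for the finalized state as raw SSZ bytes.

    Assumes the standard beacon API debug endpoint is enabled on the node.
    """
    url = base_url.rstrip("/") + "/eth/v2/debug/beacon/states/finalized"
    # Ask for SSZ rather than JSON so the file can be used as an initial state.
    return urllib.request.Request(url, headers={"Accept": "application/octet-stream"})


def save_finalized_state(base_url: str, out_path: str) -> int:
    """Download the finalized state to out_path and return the bytes written."""
    total = 0
    with urllib.request.urlopen(finalized_state_request(base_url)) as resp:
        with open(out_path, "wb") as out:
            # States are large, so stream in 1 MiB chunks instead of reading at once.
            while chunk := resp.read(1 << 20):
                out.write(chunk)
                total += len(chunk)
    return total


if __name__ == "__main__":
    # Hypothetical local node address and file name.
    size = save_finalized_state("http://localhost:5051", "finalized-state.ssz")
    print(f"saved {size} bytes")
```

The saved file can then be passed to the new node as its initial state, using whatever option your client provides for that.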
If you can’t get the state from your own node you’ll have to get it from someone else. A friend or family member you trust that runs their own node would be an excellent source. This isn’t trustless, but will usually still have a very high level of trust even without any further verification.
Otherwise you’ll have to get the state from a public provider. Currently that’s just Infura, but ideally more options will be available in the future. We’re beyond personal trust circles but Infura is certainly still a very reputable provider so most people would still have a reasonable level of trust in them.
The final option is to get the state from some random person on the internet. Seems crazy and is definitely not something to be trusted, but it is still an option if you verify the state against more trusted sources. Early on, before Infura supported the API to download the state, this was actually the most common way people used checkpoint sync. I would just periodically put a state up in a GitHub repo so they could access it.
Verify The State
If you get the state from a source you don’t fully trust, you’ll need to verify it. You can do this by calculating the hash tree root of the state, then checking that against one or more block explorers. In essence you’re aggregating your trust from multiple services until you (hopefully) reach a level you’re comfortable with.
The main problem with this is that you’ll need a tool to calculate the hash tree root of the state itself. It’s simpler to just use the state to sync your node then confirm that the block roots your node reports match block explorers. If you wind up at the right chain head, you must have started from a canonical state. You will likely want to disable any validators you run until you’ve verified the blocks - otherwise you may attest to something you don’t actually trust.
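To make the block root comparison concrete, here is a minimal sketch. It assumes your node serves the standard beacon API block root endpoint on a local port, and that you have copied the root for the same slot by hand from each block explorer you want to check. The URL, the explorer names, and the helper functions are all hypothetical, since there is no agreed explorer API to query.

```python
import json
import urllib.request


def normalize_root(root: str) -> str:
    """Lower-case a hex root and make sure it has a 0x prefix."""
    root = root.strip().lower()
    return root if root.startswith("0x") else "0x" + root


def fetch_local_block_root(base_url: str, slot: int) -> str:
    """Ask your own node for the block root at a slot (standard beacon API)."""
    url = f"{base_url.rstrip('/')}/eth/v1/beacon/blocks/{slot}/root"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["data"]["root"]


def check_roots(local_root: str, explorer_roots: dict) -> tuple:
    """Split explorers into those that match the local root and those that don't."""
    local = normalize_root(local_root)
    matches = [n for n, r in explorer_roots.items() if normalize_root(r) == local]
    mismatches = [n for n, r in explorer_roots.items() if normalize_root(r) != local]
    return matches, mismatches


if __name__ == "__main__":
    slot = 1234567  # hypothetical slot; pick a recent finalized one
    local = fetch_local_block_root("http://localhost:5051", slot)
    # Roots copied manually from explorers you trust (placeholder values).
    explorers = {"explorer-a": "0x...", "explorer-b": "0x..."}
    ok, bad = check_roots(local, explorers)
    print("agree:", ok, "disagree:", bad)
```

Any mismatch means either the explorer or your node is on a different chain, so keep validators disabled until every source you care about agrees.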
Why Can’t The Beacon Node Do It For Me?
There have been some proposals for beacon nodes to automatically verify the state using heuristics like whether the state matches what the majority of your peers have. I’m personally very skeptical of such ideas because if you could trust the information from the network, you wouldn’t need checkpoint sync in the first place. The fundamental challenge introduced by weak subjectivity is that your node simply can’t determine what the canonical chain is (if it’s been offline for too long) and so has to be told. Your node’s peers are essentially the adversary we’re trying to defend against with checkpoint sync, so we want to avoid using any information from them to second-guess the initial state.
The beacon node could automate the process of checking against multiple block explorers but there are two problems with that.
Firstly, there isn’t an agreed API to perform that check so we’d need to either design that and get block explorers to support it or write custom code in clients to support each block explorer.
Secondly, and more problematically, client developers would have to decide which block explorers are trustworthy and embed the list into their clients. There’s already a lot of responsibility in being a client dev and a lot of trust the community puts in us - we really don’t want to expand that by also being responsible for deciding which services users should trust. There could just be a config option so users could specify their own list of explorers to verify against, but that’s a pretty clunky UX and it’s very unlikely users would go to the effort of finding suitable URLs and specifying them. Besides, it’s probably more work for them than just verifying the block roots manually.