Lotus is an implementation of the Filecoin Distributed Storage Network. A Lotus node syncs blockchains that follow the Filecoin protocol, validating the blocks and state transitions. The specification for the Filecoin protocol can be found here.
For information on how to set up and operate a Lotus node, please follow the instructions here.
At a high level, a Lotus node comprises the following components:
FIXME: No mention of block production here, cross-reference with schomatis's miner doc
We discuss some key Filecoin concepts here, aiming to explain them by contrasting them with analogous concepts in other well-known blockchains like Ethereum. We only provide brief descriptions here; elaboration can be found in the spec.
Unlike in Ethereum, a block can have multiple parents in Filecoin. We thus refer to the parent set of a block, instead of a single parent. A tipset is any set of blocks that share the same parent set.
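As a rough illustration of the definition above, the following Go sketch groups blocks into candidate tipsets by their parent set. The `Block` type and the `GroupByParents` helper are hypothetical names for this example only and are not part of the Lotus codebase; plain strings stand in for CIDs.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Block is a hypothetical, simplified header: an ID plus the IDs of its parents.
type Block struct {
	ID      string
	Parents []string
}

// parentKey builds an order-independent key for a block's parent set.
func parentKey(b Block) string {
	ps := append([]string(nil), b.Parents...) // copy so the input is not reordered
	sort.Strings(ps)
	return strings.Join(ps, ",")
}

// GroupByParents buckets blocks that share the same parent set.
// Each bucket is a candidate tipset in the sense described above.
func GroupByParents(blocks []Block) map[string][]Block {
	out := make(map[string][]Block)
	for _, b := range blocks {
		k := parentKey(b)
		out[k] = append(out[k], b)
	}
	return out
}

func main() {
	blocks := []Block{
		{ID: "c", Parents: []string{"a", "b"}},
		{ID: "d", Parents: []string{"b", "a"}}, // same parent set as "c"
		{ID: "e", Parents: []string{"a"}},
	}
	for parents, group := range GroupByParents(blocks) {
		fmt.Printf("parents=%s blocks=%d\n", parents, len(group))
	}
}
```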
There is no concept of "block difficulty" in Filecoin. Instead, the weight of a tipset is simply the number of blocks in the chain that ends in that tipset. Note that a longer chain can have less weight than a shorter chain with more blocks per tipset.
We also allow for "null" tipsets, which include zero blocks. This allows miners to "skip" a round, and build on top of an imaginary empty tipset if they want to.
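The weight rule can be sketched in a few lines of Go. This is a simplified model that follows the description here (weight as a block count); the `Weight` function and the slice-of-sizes representation are assumptions made for the example, and the spec remains the reference for the actual weight calculation.

```go
package main

import "fmt"

// Weight sums the block counts of every tipset in a chain.
// Each entry is the number of blocks in one tipset; 0 models a null tipset.
func Weight(tipsetSizes []int) int {
	total := 0
	for _, n := range tipsetSizes {
		total += n
	}
	return total
}

func main() {
	long := []int{1, 1, 0, 1, 1} // five rounds, one of them null
	short := []int{2, 2, 1}      // three rounds, more blocks per tipset
	fmt.Println(Weight(long), Weight(short)) // prints "4 5"
}
```

Here the shorter chain is the heavier one, which is the situation the note above warns about.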
We call the heaviest tipset in a chain the "head" of the chain.
An Actor is analogous to a smart contract in Ethereum. Filecoin does not allow users to define their own actors, but comes with several builtin actors, which can be thought of as pre-compiled contracts.
A Message is analogous to a transaction in Ethereum.
Sync refers to the process by which a Lotus node synchronizes to the heaviest chain being advertised by its peers. At a high level, Lotus syncs in a manner similar to most other blockchains; a Lotus node listens to the various chains its peers claim to be at, picks the heaviest one, requests the blocks in the chosen chain, and validates each block in that chain, running all state transitions along the way.
We now discuss the various stages of the sync process.
When a Lotus node connects to a new peer, we exchange the head of our chain with the new peer through the
hello protocol. If the peer's head is heavier than ours, we try to sync to it. Note that we do NOT update our chain head at this stage.
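The decision made at this stage might look like the sketch below. The `Hello` struct and `ShouldSync` function are hypothetical stand-ins, not the actual hello protocol types, and the genesis comparison is an assumption added for the example.

```go
package main

import "fmt"

// Hello is a hypothetical summary of what a peer tells us about its chain.
type Hello struct {
	Genesis    string   // identifier of the peer's genesis block
	Head       []string // block identifiers in the peer's head tipset
	HeadWeight int      // weight of that head
}

// ShouldSync reports whether we should try to sync to the peer's head.
// It does not modify our own head; that only happens after validation.
func ShouldSync(ours, theirs Hello) bool {
	if ours.Genesis != theirs.Genesis {
		return false // assumption: peers on a different genesis are ignored
	}
	return theirs.HeadWeight > ours.HeadWeight
}

func main() {
	ours := Hello{Genesis: "g", Head: []string{"b1"}, HeadWeight: 10}
	theirs := Hello{Genesis: "g", Head: []string{"c1", "c2"}, HeadWeight: 12}
	fmt.Println(ShouldSync(ours, theirs)) // prints "true"
}
```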
Note: The API refers to these stages as
We proceed in the sync process by requesting block headers from the peer, moving back from their head, until we reach a tipset that we have in common (such a common tipset must exist, though it may simply be the genesis block). The functionality can be found in
If the common tipset is our head, we treat the sync as a "fast-forward", else we must drop part of our chain to connect to the peer's head (referred to as "forking").
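A minimal sketch of this header walk, under simplifying assumptions: tipsets are identified by plain string keys, each tipset has a single parent key, and a map stands in for requesting headers from the peer. `Header`, `FindCommon`, and `Classify` are hypothetical names for this example.

```go
package main

import "fmt"

// Header is a hypothetical tipset header: its key and its parent's key.
type Header struct {
	Key, Parent string
}

// FindCommon walks back from theirHead, "requesting" headers from remote,
// until it reaches a tipset key we already know locally.
func FindCommon(known map[string]bool, remote map[string]Header, theirHead string) (string, []Header, error) {
	var fetched []Header
	cur := theirHead
	for !known[cur] {
		h, ok := remote[cur]
		if !ok {
			return "", nil, fmt.Errorf("peer could not provide header %q", cur)
		}
		fetched = append(fetched, h)
		cur = h.Parent
	}
	return cur, fetched, nil
}

// Classify names the kind of sync implied by the common tipset.
func Classify(common, ourHead string) string {
	if common == ourHead {
		return "fast-forward"
	}
	return "fork"
}

func main() {
	known := map[string]bool{"genesis": true, "a": true, "b": true} // our chain, head "b"
	remote := map[string]Header{
		"d": {Key: "d", Parent: "c"},
		"c": {Key: "c", Parent: "b"},
	}
	common, fetched, err := FindCommon(known, remote, "d")
	if err != nil {
		fmt.Println("sync failed:", err)
		return
	}
	fmt.Println(common, len(fetched), Classify(common, "b")) // prints "b 2 fast-forward"
}
```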
FIXME: This next para might be best replaced with a link to the validation doc
Some of the possible causes of failure in this stage include:
Note: The API refers to this stage as
Having acquired the headers and found a common tipset, we then move forward, requesting the full blocks, including the messages.
For each block, we first confirm the syntactic validity of the block (SPECK-CHECK), which includes the syntactic validity of messages included in the block. We then apply the messages, running all the state transitions, and compare the state root we calculate with the provided state root.
FIXME: This next para might be best replaced with a link to the validation doc
Some of the possible causes of failure in this stage include:
The core functionality can be found in
Syncer::checkBlockMessages() performing syntactic validation of messages.
Note: The API refers to this stage as
If all validations pass we will now set that head as our heaviest tipset in
ChainStore. We already have the full state, since we calculated it during the sync process.
FIXME (aayush) I don't fully understand the next 2 paragraphs, but it seems important. Confirm and polish. Relevant issue in IPFS: https://github.com/ipfs/ipfs-docs/issues/264
It is important to note at this point that, similar to the IPFS architecture of addressing by content and not by location/address (FIXME: check and link to IPFS docs), the "actual" chain stored in the node repo is relative to which CID we look for. We always have stored a series of Filecoin blocks pointing to other blocks, each a potential chain in itself by following its parent's reference, and its parent's parent, and so on up to the genesis block. (FIXME: We need a diagram here, one of the Filecoin blog entries might have something similar to what we are describing here.) It depends only on where (location) we start looking. The only address/location reference we hold of the chain, a relative reference, is the
heaviest pointer. This is reflected by the fact that we don't store it in the
Blockstore by a fixed, absolute, CID that reflects its contents, as this will change each time we sync to a new head (FIXME: link to the immutability IPFS doc that I need to write).
FIXME: Create a further reading appendix, move this next para to it, along with other extraneous content
This is one of the few items we store in
Datastore by key, location, allowing its contents to change on every sync. This is reflected in the
(*ChainStore) writeHead() function (called by
takeHeaviestTipSet() above) where we reference the pointer by the explicit
chainHeadKey address (the string
"head", not a hash embedded in a CID), and similarly in
(*ChainStore).Load() when we start the node and create the
ChainStore. Compare this to a Filecoin block or message, which is immutable and stored in the
Blockstore by CID; once created, it never changes.
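The contrast between content-addressed blocks and the mutable head pointer can be illustrated with a toy store. This sketch is not the Lotus `ChainStore`: the `Store` type and its methods are hypothetical, and a hex-encoded SHA-256 digest stands in for a real CID.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

const headKey = "head" // fixed location key, not derived from any content

// Store is a hypothetical stand-in for a block store plus a metadata store.
type Store struct {
	blocks map[string][]byte // keyed by content hash: entries never change
	meta   map[string]string // keyed by name: entries may be overwritten
}

func NewStore() *Store {
	return &Store{blocks: map[string][]byte{}, meta: map[string]string{}}
}

// PutBlock stores data under the hash of its own contents and returns that key.
func (s *Store) PutBlock(data []byte) string {
	sum := sha256.Sum256(data)
	key := hex.EncodeToString(sum[:])
	s.blocks[key] = data
	return key
}

// SetHead overwrites the head pointer; the key stays the same on every sync.
func (s *Store) SetHead(blockKey string) { s.meta[headKey] = blockKey }

// Head returns the block key the head pointer currently refers to.
func (s *Store) Head() string { return s.meta[headKey] }

func main() {
	s := NewStore()
	first := s.PutBlock([]byte("block 1"))
	second := s.PutBlock([]byte("block 2"))
	s.SetHead(first)
	s.SetHead(second) // same key "head", new value
	fmt.Println(s.Head() == second, len(s.blocks)) // prints "true 2"
}
```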
A Lotus node also listens for new blocks broadcast by its peers over the
gossipsub channel (see FIXME for more). If we have validated such a block's parent tipset, and adding it to our tipset at its height would lead to a heavier head, then we validate and add this block. The validation described is identical to that invoked during the sync process (indeed, it's the same codepath).
In Filecoin, the chain state at any given point is a collection of data stored under a root CID encapsulated in the
StateTree, and accessed through the
StateManager. The state at the chain's head is thus easily tracked and updated in a state root CID. (FIXME: Talk about CIDs somewhere, we might want to explain some of the modify/flush/update-root mechanism here.)
Recall that a tipset is a set of blocks that have identical parents (that is, that are built on top of the same tipset). The genesis tipset comprises the genesis block(s), and has some state corresponding to it.
StateManager are responsible for computing the state that results from applying a tipset. This involves applying all the messages included in the tipset, and performing implicit operations like awarding block rewards.
Any valid block built on top of a tipset
ts should have its Parent State Root equal to the result of calculating the tipset state of
ts. Note that this means that all blocks in a tipset must have the same Parent State Root (which is to be expected, since they have the same parent tipset).
When StateManager::computeTipsetState() is called with a tipset,
ts, it retrieves the parent state root of the blocks in
ts. It also creates a list of
BlockMessages, which wraps the BLS and SecP messages in a block along with the miner that produced the block.
Control then flows to
StateManager::ApplyBlocks(), which builds a VM to apply the messages given to it. The VM is initialized with the parent state root of the blocks in
ts. We apply the blocks in
ts in order (see FIXME for ordering of blocks in a tipset).
For each block, we prepare to apply the ordered messages (first BLS, then SecP). Before applying a message, we check if we have already applied a message with that CID within the scope of this method. If so, we simply skip that message; this is how duplicate messages included in the same tipset are skipped (with only the miner of the "first" block to include the message getting the reward). For the actual process of message application, see FIXME (need an internal link here); for now we simply assume that the outcome of the VM applying a message is either an error, or a
MessageReceipt and some other information.
We treat an error from the VM as a showstopper; there is no recovery, and no meaningful state can be computed for
ts. Given a successful receipt, we add the rewards and penalties to what the miner has earned so far. Once all the messages included in a block have been applied (or skipped if they're a duplicate), we use an implicit message to call the Reward Actor. This awards the miner their reward for having won a block, and also awards / penalizes them based on the message rewards and penalties we tracked.
We then proceed to apply the next block in
ts, using the same VM. This means that the state changes that result from applying a message are visible when applying all subsequent messages, even if they are included in a different block.
Having applied all the blocks, we send one more implicit message, to the Cron Actor, which handles operations that must be performed at the end of every epoch (see FIXME for more). The resulting state after calling the Cron Actor is the computed state of the tipset.
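The control flow described in this section can be condensed into the following sketch. It is a simplified model, not the real `ApplyBlocks()`: the `Message` and `BlockMessages` types are cut down to what the example needs, message execution is replaced by adding up a per-message reward, and the Reward Actor and Cron Actor calls are reduced to comments and simple bookkeeping.

```go
package main

import "fmt"

// Message is a hypothetical message with an identifier and the reward it yields.
type Message struct {
	CID    string
	Reward int
}

// BlockMessages pairs a block's miner with its BLS and SecP messages.
type BlockMessages struct {
	Miner     string
	BLS, SecP []Message
}

// ApplyTipset applies every block in order, skipping messages already applied
// earlier in the same tipset, and returns per-miner earnings and applied CIDs.
func ApplyTipset(blocks []BlockMessages, blockReward int) (map[string]int, []string) {
	seen := map[string]bool{}
	earned := map[string]int{}
	var applied []string
	for _, b := range blocks {
		msgRewards := 0
		ordered := append(append([]Message(nil), b.BLS...), b.SecP...) // BLS first, then SecP
		for _, m := range ordered {
			if seen[m.CID] {
				continue // duplicate: only the first block to include it is rewarded
			}
			seen[m.CID] = true
			applied = append(applied, m.CID)
			msgRewards += m.Reward
		}
		// Stands in for the implicit message to the Reward Actor.
		earned[b.Miner] += blockReward + msgRewards
	}
	// The implicit Cron Actor message would run here, after all blocks.
	return earned, applied
}

func main() {
	shared := Message{CID: "m1", Reward: 3}
	blocks := []BlockMessages{
		{Miner: "minerA", BLS: []Message{shared}},
		{Miner: "minerB", BLS: []Message{shared}, SecP: []Message{{CID: "m2", Reward: 5}}},
	}
	earned, applied := ApplyTipset(blocks, 10)
	fmt.Println(earned["minerA"], earned["minerB"], applied) // prints "13 15 [m1 m2]"
}
```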
The Virtual Machine (VM) is responsible for executing messages. The Lotus Virtual Machine invokes the appropriate methods in the builtin actors, and provides a
Runtime interface to the builtin actors that exposes their state, allows them to take certain actions, and meters their gas usage. The VM also performs balance transfers, creates new account actors as needed, and tracks the gas reward, penalty, return value, and exit code.
The primary entrypoint of the VM is the
ApplyMessage() method. This method should not return an error unless something goes unrecoverably wrong.
The first thing this method does is assess if the message provided meets any of the penalty criteria. If so, a penalty is issued, and the method returns. Next, the entire gas cost of the message is transferred to a temporary gas holder account. It is from this gas holder that gas will be deducted; if it runs out of gas, the message fails. Any unused gas in this holder will be refunded to the message's sender at the end of message execution.
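The gas holder accounting can be sketched as follows. This is an illustrative model only: the `Account` type and `ApplyGas` function are hypothetical, the penalty check is reduced to "the sender cannot cover the full gas cost", and gas is charged as a single total rather than metered step by step.

```go
package main

import "fmt"

// Account is a hypothetical account with a token balance.
type Account struct{ Balance int }

// ApplyGas moves the full gas cost into a temporary holder, deducts what the
// message used, and refunds the rest to the sender.
// ok reports whether execution stayed within the gas limit; penalized reports
// that the message was rejected before any gas was transferred.
func ApplyGas(sender *Account, gasLimit, gasPrice, gasUsed int) (ok bool, refund int, penalized bool) {
	cost := gasLimit * gasPrice
	if sender.Balance < cost {
		return false, 0, true // simplified stand-in for the penalty criteria
	}
	sender.Balance -= cost
	holder := cost // temporary gas holder

	ok = gasUsed <= gasLimit
	if !ok {
		gasUsed = gasLimit // ran out of gas: the message fails, all gas is consumed
	}
	holder -= gasUsed * gasPrice

	sender.Balance += holder // refund whatever is left in the holder
	return ok, holder, false
}

func main() {
	sender := &Account{Balance: 100}
	ok, refund, penalized := ApplyGas(sender, 10, 2, 4)
	fmt.Println(ok, refund, penalized, sender.Balance) // prints "true 12 false 92"
}
```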
The VM then increments the sender's nonce, takes a snapshot of the state, and invokes
send() method creates a
Runtime for the subsequent message execution. It then transfers the message's value to the recipient, creating a new account actor if needed.
We use reflection to translate a Filecoin message for the VM to an actual Go function, relying on the VM's
invoker structure. Each actor has its own set of codes defined in
invoker structure maps the builtin actors' CIDs to a list of
invokeFunc (one per exported method), which each take the
Runtime (for state manipulation) and the serialized input parameters.
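The shape of such a dispatch table can be sketched without the reflection details. In this example a plain string stands in for an actor's code CID, and the `Invoker`, `Register`, and `Invoke` names are hypothetical; the real invoker builds its table through reflection.

```go
package main

import "fmt"

// Runtime is a hypothetical, minimal runtime exposing mutable actor state.
type Runtime struct{ State map[string]int }

// invokeFunc takes the runtime and the serialized input parameters.
type invokeFunc func(rt *Runtime, params []byte) ([]byte, error)

// Invoker maps an actor code (a string here, a CID in practice) to its methods.
type Invoker struct{ table map[string][]invokeFunc }

func NewInvokerSketch() *Invoker { return &Invoker{table: map[string][]invokeFunc{}} }

// Register records one invokeFunc per exported method, indexed by method number.
func (inv *Invoker) Register(code string, methods ...invokeFunc) {
	inv.table[code] = methods
}

// Invoke looks up the actor and method, then runs the method.
func (inv *Invoker) Invoke(code string, method int, rt *Runtime, params []byte) ([]byte, error) {
	methods, ok := inv.table[code]
	if !ok {
		return nil, fmt.Errorf("no actor registered for code %q", code)
	}
	if method < 0 || method >= len(methods) {
		return nil, fmt.Errorf("actor %q has no method %d", code, method)
	}
	return methods[method](rt, params)
}

func main() {
	inv := NewInvokerSketch()
	inv.Register("counter", func(rt *Runtime, params []byte) ([]byte, error) {
		rt.State["n"] += len(params) // toy method: add the parameter length
		return []byte("done"), nil
	})
	rt := &Runtime{State: map[string]int{}}
	out, err := inv.Invoke("counter", 0, rt, []byte("abc"))
	fmt.Println(string(out), err, rt.State["n"]) // prints "done <nil> 3"
}
```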
FIXME (aayush) Polish this next para.
The basic layout (without reflection details) of
(*invoker).transform() is as follows. From each actor registered in
NewInvoker() we take its
Exports() methods converting them to
invokeFuncs. The actual method is wrapped in another function that takes care of decoding the serialized parameters and the runtime; this function is passed to
shimCall(), which will encapsulate the actor's code being run inside a
defer function to
recover() from panics (we fail in the actor's code with panics to unwind the stack). The return values will then be (CBOR) marshaled and returned to the VM.
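The panic-and-recover pattern described here is standard Go and can be shown in isolation. In this sketch, `callWithRecover`, `abort`, and the reduced `ActorError` type are hypothetical stand-ins; the exit code used for unexpected panics is an arbitrary value chosen for the example.

```go
package main

import "fmt"

// ActorError is a hypothetical, reduced error type carrying an exit code.
type ActorError struct {
	ExitCode int
	Msg      string
}

// abort is how actor code in this sketch fails: it panics to unwind the stack.
func abort(code int, msg string) {
	panic(&ActorError{ExitCode: code, Msg: msg})
}

// callWithRecover runs fn and converts any panic into an *ActorError.
func callWithRecover(fn func(params []byte) []byte, params []byte) (ret []byte, err *ActorError) {
	defer func() {
		if r := recover(); r != nil {
			if ae, ok := r.(*ActorError); ok {
				err = ae // deliberate abort from actor code
				return
			}
			// Any other panic is reported with an arbitrary example exit code.
			err = &ActorError{ExitCode: 1, Msg: fmt.Sprint(r)}
		}
	}()
	return fn(params), nil
}

func main() {
	ret, err := callWithRecover(func(p []byte) []byte { return p }, []byte("ok"))
	fmt.Println(string(ret), err == nil) // prints "ok true"

	_, err = callWithRecover(func(p []byte) []byte {
		abort(16, "illegal argument")
		return nil
	}, nil)
	fmt.Println(err.ExitCode, err.Msg) // prints "16 illegal argument"
}
```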
Once method invocation is complete (including any subcalls), we return to
ApplyMessage(), which receives the serialized response and the
ActorError. The sender will be charged the appropriate amount of gas for the returned response, which gets put into the
The method then refunds any unused gas to the sender, sets up the gas reward for the miner, and wraps all of this into an
ApplyRet, which is returned.
When we launch a Lotus node with the command
./lotus daemon (see here for more), the node is created through dependency injection. This relies on reflection, which makes some of the references hard to follow. The node sets up all of the subsystems it needs to run, such as the repository, the network connections, the chain sync service, etc. This setup is orchestrated through calls to the
node.Override function. The structure of each call indicates the type of component it will set up (many defined in
node/modules/dtypes/), and the function that will provide it. The dependency is implicit in the argument of the provider function.
As an example, consider the
modules.ChainStore() function that provides the
ChainStore structure. It takes as one of its parameters the
ChainBlockstore type, which becomes one of its dependencies. For the node to be built successfully the
ChainBlockstore will need to be provided before
ChainStore, a requirement that is made explicit in another
Override() call that sets the provider of that type as the
The repo is the directory where all of a node's information is stored. The node is entirely defined by its repo, which makes it easy to port to another location. This one-to-one relationship means we can speak of the node as the repo it is associated with, instead of the daemon process that runs from that repo.
Only one daemon can be running with an associated repo at a time. A process signals that it is running a node associated with a particular repo by creating and acquiring a
lsof ~/.lotus/repo.lock
# COMMAND PID
# lotus 52356
Trying to launch a second daemon hooked to the same repo leads to a
repo is already locked (lotus daemon already running) error.
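The single-owner behavior can be imitated with an exclusively created lock file. This is a sketch under stated assumptions, not the locking mechanism Lotus uses: `AcquireLock` and `Demo` are hypothetical helpers, and exclusive file creation (`O_EXCL`) is swapped in as the simplest portable way to show the idea.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// AcquireLock creates repo.lock inside dir, failing if it already exists.
// The returned function releases the lock by removing the file.
func AcquireLock(dir string) (func() error, error) {
	path := filepath.Join(dir, "repo.lock")
	f, err := os.OpenFile(path, os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0o600)
	if err != nil {
		if os.IsExist(err) {
			return nil, errors.New("repo is already locked")
		}
		return nil, err
	}
	release := func() error {
		f.Close()
		return os.Remove(path)
	}
	return release, nil
}

// Demo locks a temporary repo twice and returns the error from the second
// attempt, plus any unexpected setup error.
func Demo() (error, error) {
	dir, err := os.MkdirTemp("", "repo-sketch")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	release, err := AcquireLock(dir)
	if err != nil {
		return nil, err
	}
	defer release()

	_, second := AcquireLock(dir) // a second "daemon" on the same repo
	return second, nil
}

func main() {
	second, err := Demo()
	fmt.Println(second, err) // prints "repo is already locked <nil>"
}
```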
node.Repo() function (
node/builder.go) contains most of the dependencies (specified as
Override() calls) needed to properly set up the node's repo. We list the most salient ones here.
ChainBlockstore: Data related to the node state is saved in the repo's
Datastore, an IPFS interface defined here. Lotus creates this interface from a Badger DB in
FsRepo. Every piece of data is fundamentally a key-value pair in the
datastore directory of the repo. There are several abstractions laid on top of it that appear throughout the code depending on how we access it, but it is important to remember that we're always accessing it from the same place.
FIXME: Maybe mention the
Batching interface as the developer will stumble upon it before reaching the
FIXME: IPFS blocks vs Filecoin blocks ideally happens before this / here
Blockstore interface structures the key-value pair into the CID format for the key and the
Block interface for the value. The
Block value is just a raw string of bytes addressed by its hash, which is included in the CID key.
ChainBlockstore creates a
Blockstore in the repo under the
/blocks namespace. Every key stored there will have the
blocks prefix so that it does not collide with other stores that use the same repo.
FIXME: Link to IPFS documentation about DAG, CID, and related; in particular, we need a diagram that shows how we wrap each datastore inside the next layer (datastore, batching, block store, gc, etc).
modules.Datastore() creates a
dtypes.MetadataDS, which is an alias for the basic
Datastore interface. Metadata is stored here under the
/metadata prefix. (FIXME: Explain what is metadata in contrast with the block store, namely we store the pointer to the heaviest chain, we might just link to that unwritten section here later.)
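The effect of these prefixes can be illustrated with a toy key-value store. The `Datastore` map and `Namespaced` wrapper below are hypothetical and far simpler than the real interfaces; they only show how two views can share one underlying store without key collisions.

```go
package main

import "fmt"

// Datastore is a hypothetical flat key-value store shared by all views.
type Datastore map[string][]byte

// Namespaced is a view over a Datastore that prefixes every key.
type Namespaced struct {
	prefix string
	ds     Datastore
}

// Put stores value under prefix + "/" + key in the shared store.
func (n Namespaced) Put(key string, value []byte) {
	n.ds[n.prefix+"/"+key] = value
}

// Get reads a value through the same prefix.
func (n Namespaced) Get(key string) ([]byte, bool) {
	v, ok := n.ds[n.prefix+"/"+key]
	return v, ok
}

func main() {
	shared := Datastore{}
	blocks := Namespaced{prefix: "/blocks", ds: shared}
	metadata := Namespaced{prefix: "/metadata", ds: shared}

	blocks.Put("k", []byte("raw block bytes"))
	metadata.Put("k", []byte("pointer to head"))

	b, _ := blocks.Get("k")
	m, _ := metadata.Get("k")
	fmt.Println(string(b), "|", string(m), "|", len(shared))
	// prints "raw block bytes | pointer to head | 2"
}
```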
FIXME: Explain the key store related calls (maybe remove, per Schomatis)
LockedRepo(): This method doesn't create or initialize any new structures, but rather registers an
OnStop hook that will close the locked repository associated with it on shutdown.
FIXME: This section needs to be clarified / corrected...I don't fully understand the config differences (what do they have in common, if anything?)
At the end of the
Repo() function we see two mutually exclusive configuration calls based on the
ApplyIf(isType(repo.FullNode), ConfigFullNode(c)),
ApplyIf(isType(repo.StorageMiner), ConfigStorageMiner(c)),
As we said, the repo fully identifies the node so a repo type is also a node type, in this case a full node or a storage miner. (FIXME: What is the difference between the two, does full imply miner?) In this case the
daemon command will create a
FullNode; this is specified in the command logic itself in
FsRepo created (and passed to
node.Repo()) will be initiated with that type (see
FIXME: Much of this might need to be subsumed into the p2p section
node.Online() configuration function (
node/builder.go) initializes components that involve connecting to, or interacting with, the Filecoin network. These connections are managed through the libp2p stack (FIXME link to this section when it exists). We discuss some of the components found in the full node type (that is, included in the
modules.ChainStore() creates the
store.ChainStore) that wraps the stores previously instantiated in
Repo(). It is the main point of entry for the node to all chain-related data (FIXME: this is incorrect, we sometimes access its underlying block store directly, and probably shouldn't). It also holds the crucial
heaviest pointer, which indicates the current head of the chain.
ChainBlockservice() establish a BitSwap connection (FIXME libp2p link) to exchange chain information in the form of
blocks.Blocks stored in the repo. (See sync section for more details, the Filecoin blocks and messages are backed by these raw IPFS blocks that together form the different structures that define the state of the current/heaviest chain.)
HandleIncomingMessages() start the services in charge of processing new Filecoin blocks and messages from the network (see
<undefined> for more information about the topics the node is subscribed to, FIXME: should that be part of the libp2p section or should we expand on gossipsub separately?).
RunHello(): starts the services to both send (
(*Service).SayHello()) and receive (
hello messages. When nodes establish a new connection with each other, they exchange these messages to share chain-related information (namely their genesis block and their heaviest tipset).
NewSyncer() creates the
Syncer structure and starts the services related to the chain sync process (FIXME link).
We can establish the dependency relations by looking at the parameters that each function needs and by understanding the architecture of the node and how the different components relate to each other (the chief purpose of this document).
As an example, the sync mechanism depends on the node being able to exchange different IPFS blocks with the network, so as to be able to request the "missing pieces" needed to construct the chain. This dependency is reflected by
NewSyncer() having a
blocksync.BlockSync parameter, which in turn depends on
ChainExchange(). The chain exchange service further depends on the chain store to save and retrieve chain data, which is reflected in
ChainGCBlockstore as a parameter (which is just a wrapper around
ChainBlockstore capable of garbage collection).
This block store is the same store underlying the chain store, which is an indirect dependency of
NewSyncer() (through the
StateManager). (FIXME: This last line is flaky, we need to resolve the hierarchy better, we sometimes refer to the chain store and sometimes to its underlying block store. We need a diagram to visualize all the different components just mentioned otherwise it is too hard to follow. We probably even need to skip some of the connections mentioned.)